RE: Solr Search Handler Suggestion

2017-01-26 Thread Moenieb Davids
Hi Mikhail,

The per row scenario would cater for queries that is looking at specific rows 
with.
For example, I need address and bank details of a member that is stored on a 
different core

I guess what I am trying to do is get Solr search functionality that is similar 
to DB, something which I can easily plug my marious corporate solutions into o 
that they can retrieve info

-Original Message-
From: Mikhail Khludnev [mailto:m...@apache.org] 
Sent: 26 January 2017 09:23 AM
To: solr-user
Subject: Re: Solr Search Handler Suggestion

Hello,  Moenieb.

It'd worth to mention that it's not effective to include, java-user@ in this 
thread.
Also, this proposal is purposed DIH, that's worth to be mentioned in subj.
Then, this config looks like it will issue solr request per every parent row 
that's deadly inefficient.

On Wed, Jan 25, 2017 at 10:53 AM, Moenieb Davids  wrote:

> Hi Guys,
>
> Just an Idea for easier config of search handlers:
>
> Will it be feasible to configure a search handler that has its own 
> schema based on the current core as well as inserting nested objects 
> from cross core queries.
>
> Example (for illustration purpose, ignore syntax :) ) 
>
>   
>
>   
>
>   
> http://localhost:8983/solr/items; query="user_liking_this:${thiscore.id}"
> >
>   
>   
> 
>
>   
> http://localhost:8983/solr/items; query="user_liking_this:${thiscore.id}"
> >
>   
>   
> 
>
> 
>
> This will allow you to create endpoints to interact and return fields 
> and their values from others cores and seems to be possibly easier to manage?
>
>
>
>
>
>
>
>
>
>
> 
> ===
> GPAA e-mail Disclaimers and confidential note
>
> This e-mail is intended for the exclusive use of the addressee only.
> If you are not the intended recipient, you should not use the contents 
> or disclose them to any other person. Please notify the sender 
> immediately and delete the e-mail. This e-mail is not intended nor 
> shall it be taken to create any legal relations, contractual or otherwise.
> Legally binding obligations can only arise for the GPAA by means of a 
> written instrument signed by an authorised signatory.
> 
> ===
>



--
Sincerely yours
Mikhail Khludnev










===
GPAA e-mail Disclaimers and confidential note 

This e-mail is intended for the exclusive use of the addressee only.
If you are not the intended recipient, you should not use the contents 
or disclose them to any other person. Please notify the sender immediately 
and delete the e-mail. This e-mail is not intended nor 
shall it be taken to create any legal relations, contractual or otherwise. 
Legally binding obligations can only arise for the GPAA by means of 
a written instrument signed by an authorised signatory.
===


Commit/callbacks doesn't happen on core close

2017-01-26 Thread saiks
Hi All,

While I was testing out the transient feature of Solr cores, I found this
issue.

I had my transientCacheSize set to "1" in solr.xml and I have created two
cores with these properties
loadOnStartup=false
transient=true

I had my softCommit time set to 5 seconds and hardCommit to 10 seconds.

Now, when I try to ingest into two cores one by one and if I do it before
the hard commit time, i.e. ingest into two cores within 10 seconds, commit
only happens for the second ingested core and also only second cores
postCommit listeners get called and not the first.

I have tried to see why and found this
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java#L789

where call to "commit()" method on "closeWriter()" is commented out.
So, could someone please tell me how to make sure commit and the callbacks
happen on core close or is there any other way to solve this problem.

Thank you



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Commit-callbacks-doesn-t-happen-on-core-close-tp4316015.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Latest advice on G1 collector?

2017-01-26 Thread Jeff Wartes

Adding my anecdotes: 

I’m using heavily tuned ParNew/CMS. This is a SolrCloud collection, but 
per-node I’ve got a 28G heap and a 200G index. The large heap turned out to be 
necessary because certain operations in Lucene allocate memory based on things 
other than result size, (index size typically, or field cardinality) and small 
bursts of queries that used these allocations would otherwise cause overflows 
directly into tenured. Currently, I get a ParNew collection with a 200ms pause 
every few seconds. CMS collections happen every few hours.

I tested G1 at one point, and with a little tuning got it about the same pause 
levels as the current configuration, but given a rough equality, I stuck with 
Lucene’s recommendation for CMS.

As a result of this conversation, I actually went and tested the Parallel 
collector last night. I was frankly astonished to find that with no real tuning 
my query latency dropped across the board, from a few percent improvement in 
p50 to an almost 50% improvement in p99. Small collections happened at roughly 
the same rate, but with meaningfully lower pauses, and the pause durations were 
much more consistent.
However, (and Shawn called this) after about an hour at full query load it 
accumulated enough to do a large collection, and it took a 14 second pause to 
shed about 20G from the heap. That’s less than the default SolrCloud ZK 
timeout, but still impolite.

My conclusion is that Parallel is a lot better at cleaning eden space than 
ParNew, (why?) but losing the concurrent tenured collection is still pretty 
nasty.

Even so, I’m seriously considering whether I should be using Parallel in 
production as a result of this experiment. I have a reasonable number of 
replicas, and I use 98th percentile backup requests 
(https://github.com/whitepages/SOLR-4449) on shard fan-out requests, so having 
a node suddenly lock up very occasionally only really hurts the top-level 
queries addressed to that node, not the shard-requests. If I did backup 
requests at the SolrJ client level too, the overall latency savings might still 
be worth considering.



On 1/25/17, 5:35 PM, "Walter Underwood"  wrote:

> On Jan 25, 2017, at 5:19 PM, Shawn Heisey  wrote:
> 
>  It seems that Lucene/Solr
> creates a lot of references as it runs, and collecting those in parallel
> offers a significant performance advantage.

This is critical for any tuning. Most of the query time allocations in Solr 
have the lifetime of a single request. Query parsing, result scoring, all that 
is garbage after the HTTP response is sent. So the GC must be configured with a 
large young generation (Eden, Nursery, whatever). If that generation cannot 
handle all the short-lived allocations under heavy load, they will be allocated 
from tenured space.

Right now, we run with an 8G heap and 2G of young generation space with 
CMS/ParNew. We see a major GC every 30-60 minutes, depending on load.

Cache evictions will always be garbage in tenured space, so we cannot avoid 
major GCs. The oldest non-accessed objects are evicted, and those will almost 
certainly be tenured.

All this means that Solr puts a heavy burden on the GC. A combination of 
many short-lived allocations plus a steady flow of tenured garbage.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)





RE: Indexing nested documents giving back unrelated parents when asking for children

2017-01-26 Thread Fabien Renaud
Is anyone thinking this is a normal behavior? To get random children?
And to have a different (correct) behavior if you commit after sending each 
document?

Fabien



-Original Message-
From: Fabien Renaud [mailto:fabien.ren...@findwise.com] 
Sent: den 24 januari 2017 19:23
To: solr-user@lucene.apache.org
Subject: RE: Indexing nested documents giving back unrelated parents when 
asking for children

But the problem is already there with only two levels.

If I change the code to add document to Solr by the following:
   client1.add(doc1);
   client1.commit();
   client1.add(doc4);
   client1.commit();

Then things work as expected as I get the follwing result (as well as the 
correct parent-child relation between 1,2 and 4;5):
"docs": [
  {
"id": "2"
  },
  {
"id": "5"
  }
]

Fabien
-Original Message-
From: Mikhail Khludnev [mailto:gge...@gmail.com]
Sent: den 24 januari 2017 19:02
To: solr-user 
Subject: RE: Indexing nested documents giving back unrelated parents when 
asking for children

Fabien,
Giving this you have three levels can you update the sample code accordingly? I 
might already replied on such question earlier, iirc filter should enumerate 
all types beside of the certain one.

24 янв. 2017 г. 16:21 пользователь "Fabien Renaud" < 
fabien.ren...@findwise.com> написал:

I know it works as expected when I set type_s:up as you describe. But I was 
expecting no children at all in my query.

In my real query I have a document with several children and thus can't specify 
a specific type with childFilter. And I can't give back all children because 
some of them do not make any sense at all.
And the problem appears for an intermediate node (which has children and which 
itself a child of another).

Fabien

-Original Message-
From: Mikhail Khludnev [mailto:m...@apache.org]
Sent: den 24 januari 2017 14:06
To: solr-user 
Subject: Re: Indexing nested documents giving back unrelated parents when 
asking for children

Hello Fabien,

I believe parentFilter should be type_s:up, and consequently the type_s:up 
should go in fq.

On Tue, Jan 24, 2017 at 3:30 PM, Fabien Renaud 
wrote:

> Hello,
>
> I'm wondering if I missed something in my code (which uses solrj 6.3):
>
> public class Main {
>
> private SolrClient client1;
>
> public void run() {
> client1 = new
> HttpSolrClient.Builder("http://localhost:8983/solr
> ").build();
>
> SolrInputDocument doc1 = new SolrInputDocument();
>
> doc1.addField("id", "1");
> doc1.addField("type_s", "up");
> SolrInputDocument doc2 = new SolrInputDocument();
>
> doc2.addField("id", "2");
> doc2.addField("type_s", "down");
>
> doc1.addChildDocument(doc2);
>
> SolrInputDocument doc4 = new SolrInputDocument();
> doc4.addField("id", "4");
> doc4.addField("type_s", "up");
>
> SolrInputDocument doc5 = new SolrInputDocument();
> doc5.addField("id", "5");
> doc5.addField("type_s", "down");
>
> doc4.addChildDocument(doc5);
>
> try {
> client1.add("techproducts", Arrays.asList(doc1,doc4));
> } catch (Exception e) {
> System.out.println("Indexing failed" + e);
> }
> }
>
> If I start Solr 6.3 using bin/start start -e techproduct and ask the
> following:
>
> http://localhost:8983/solr/techproducts/select?fl=*,[
> child%20parentFilter=type_s:down]=type_s:down=on=*:*=js
> on
>
>
> then I get:
>
> {
>   "docs": [
> {
>   "id": "2",
>   "type_s": "down"
> },
> {
>   "id": "5",
>   "type_s": "down",
>   "_childDocuments_": [
> {
>   "id": "1",
>   "type_s": "up"
> }
>   ]
> }
>   ]
> }
>
> which seems to be a bug for me. Or did I miss something?
> Notice that the relations "2 is a child of 1" and "5 is a child of 4"
> are working fine. It's just that I get extra (unwanted and unrelated)
relations.
>
> Notice that at some point I manage to get back two documents with the 
> __same__ id (with different version). I'm not able to reproduce this 
> but I guess it could be related.
>
> Fabien
>
>


--
Sincerely yours
Mikhail Khludnev


Documents issue

2017-01-26 Thread KRIS MUSSHORN

Running the latest crawl from Nutch to SOLR 5.4.1 it seems that my copy fields 
do not work as expected anymore. 

 
 
 


Why would copyField ignore the default all of a sudden? 

I've not made any significant changes to SOLR and none at all to nutch. 
{
  "response":{"numFound":699,"start":0,"docs":[
  {
"metatag.doctype":"Articles",
"facet_metatag_doctype":"Articles"}, 
_snipped a bunch of articles _ 
{
"metatag.doctype":"Dispatches",
"facet_metatag_doctype":"Dispatches"}, 

_snipped a bunch of Dispatches_
  
  {
"metatag.doctype":"Other"},
  {
"metatag.doctype":"Other"},
  {
"metatag.doctype":"Other"},
  {
"metatag.doctype":"Other"},
  {
"metatag.doctype":"Other"} 

_snipped a bunch of Other_
  ]
  },
  "facet_counts":{
"facet_queries":{},
"facet_fields":{
  "facet_metatag_doctype":[
"Dispatches",38,
"Articles",33]},
"facet_dates":{},
"facet_ranges":{},
"facet_intervals":{},
"facet_heatmaps":{}}} 



Re: How to alter the facet query limit default

2017-01-26 Thread KRIS MUSSHORN

Alexandre, 
Thanks. 
I will refactor my schema to eliminate the period seperated values in a field 
names and try your suggestion. 
I'll let you know how it goes. 

Kris 
- Original Message -

From: "Alexandre Rafalovitch"  
To: "solr-user"  
Sent: Thursday, January 26, 2017 11:40:49 AM 
Subject: Re: How to alter the facet query limit default 

facet.limit? 
f..facet.limit? (not sure how that would work with field 
name that contains dots) 

Docs are at: https://cwiki.apache.org/confluence/display/solr/Faceting 

Regards, 
Alex. 
 
http://www.solr-start.com/ - Resources for Solr users, new and experienced 


On 26 January 2017 at 10:36, KRIS MUSSHORN  wrote: 
> SOLR 5.4.1 i am running a query with multiple facet fields. 
> _snip_ 
> select?q=*%3A*=metatag.date.prefix4+DESC=7910=metatag.date.prefix7=json=true=true=metatag.date.prefix7
>  =metatag.date.prefix4=metatag.doctype 
> 
> field metatag.date.prefix7 has way more facets than the default of 100. 
> 
> How would I set up solr, or modify my query, to ensure that the facets return 
> all values. 
> 
> TIA, 
> 
> Kris 
> 



Re: Upgrade SOLR version - facets perfomance regression

2017-01-26 Thread billnbell
Are you using docvalues ? Try that it might help.

Bill Bell
Sent from mobile


> On Jan 26, 2017, at 10:38 AM, Bhawna Asnani  wrote:
> 
> Hi,
> I am experiencing a similar issue. We have tried method uif but that didn't
> help much. There is still some performance degradation.
> Perhaps some underlying changes in the lucene version its using.
> 
> Will switching to JSON facet API help in this case? We have 5 nodes/single
> shard in our production setup.
> 
> On Tue, Jan 24, 2017 at 4:34 AM, alessandro.benedetti > wrote:
> 
>> Hi Solr,
>> I admit the issue you mentioned has not been transparently solved, and
>> indeed you would need to explicitly use the method=uif to get 4.10.1
>> behavior.
>> 
>> This is valid if you were using  fc/fcs approaches with high cardinality
>> fields.
>> 
>> In the case you facet method is enum ( Term Enumeration), the issue has
>> been
>> transparently solved (
>> https://issues.apache.org/jira/browse/SOLR-9176 )
>> 
>> Cheers
>> 
>> 
>> 
>> --
>> View this message in context: http://lucene.472066.n3.
>> nabble.com/Upgrade-SOLR-version-facets-perfomance-
>> regression-tp4315027p4315512.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>> 


Re: Solrcloud backup delete

2017-01-26 Thread Hrishikesh Gadre
Hi Johan,

Once the backup is created successfully, Solr does not play any role in
managing the backup copies and it is left up to the user. You may want to
build a script which maintains last N backup copies (and delete old ones).

If you end up building such script, see if you can submit a patch against
SOLR-9744 

Thanks
Hrishikesh


On Wed, Jan 25, 2017 at 11:33 PM, Johan Kooijman 
wrote:

> Hi all,
>
> I see I can easily create/restore backups of the entire collection
> https://cwiki.apache.org/confluence/display/solr/Collections+API.
>
> I now have a situation where these backups fill up a disk, so I need to get
> rid of some. On the URL above I don't see an API call to delete a backup.
>
> What whould be the preferred method for deleting these old backups?
>
> --
> Met vriendelijke groeten / With kind regards,
> Johan Kooijman
>


Re: Upgrade SOLR version - facets perfomance regression

2017-01-26 Thread Bhawna Asnani
Hi,
I am experiencing a similar issue. We have tried method uif but that didn't
help much. There is still some performance degradation.
Perhaps some underlying changes in the lucene version its using.

Will switching to JSON facet API help in this case? We have 5 nodes/single
shard in our production setup.

On Tue, Jan 24, 2017 at 4:34 AM, alessandro.benedetti  wrote:

> Hi Solr,
> I admit the issue you mentioned has not been transparently solved, and
> indeed you would need to explicitly use the method=uif to get 4.10.1
> behavior.
>
> This is valid if you were using  fc/fcs approaches with high cardinality
> fields.
>
> In the case you facet method is enum ( Term Enumeration), the issue has
> been
> transparently solved (
> https://issues.apache.org/jira/browse/SOLR-9176 )
>
> Cheers
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/Upgrade-SOLR-version-facets-perfomance-
> regression-tp4315027p4315512.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: How to alter the facet query limit default

2017-01-26 Thread Alexandre Rafalovitch
facet.limit?
f..facet.limit? (not sure how that would work with field
name that contains dots)

Docs are at: https://cwiki.apache.org/confluence/display/solr/Faceting

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 26 January 2017 at 10:36, KRIS MUSSHORN  wrote:
> SOLR 5.4.1 i am running a query with multiple facet fields.
> _snip_ 
> select?q=*%3A*=metatag.date.prefix4+DESC=7910=metatag.date.prefix7=json=true=true=metatag.date.prefix7
>  =metatag.date.prefix4=metatag.doctype
>
> field metatag.date.prefix7 has way more facets than the default of 100.
>
> How would I set up solr, or modify my query, to ensure that the facets return 
> all values.
>
> TIA,
>
> Kris
>


Re: After migrating to SolrCloud

2017-01-26 Thread Alexandre Rafalovitch
If you can't figure it out, you can dynamically change log status for
the particular package and/or you could enable access logs to see the
request.

Or run Wireshark on the network and see what's going on (when you have
a hammer!!!).

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 26 January 2017 at 10:51, Erick Erickson  wrote:
> You probably have an old HTTP jar somewhere in your classpath that's
> being found first. Or you have some client somewhere using an old HTTP
> version.
>
> Best,
> Erick
>
> On Thu, Jan 26, 2017 at 7:49 AM, marotosg  wrote:
>> Hi All,
>> I have migrated Solr from older versio 3.6 to SolrCloud 6.2 and all good but
>> there are almost every second some WARN messages in the logs.
>>
>> HttpParser
>> bad HTTP parsed: 400 HTTP/0.9 not supported for
>> HttpChannelOverHttp@16a84451{r=0,c=false,a=IDLE,uri=null}
>>
>> Anynone knows where are these coming from?
>>
>> Thanks
>>
>>
>>
>>
>> --
>> View this message in context: 
>> http://lucene.472066.n3.nabble.com/After-migrating-to-SolrCloud-tp4315943.html
>> Sent from the Solr - User mailing list archive at Nabble.com.


Re: After migrating to SolrCloud

2017-01-26 Thread Erick Erickson
You probably have an old HTTP jar somewhere in your classpath that's
being found first. Or you have some client somewhere using an old HTTP
version.

Best,
Erick

On Thu, Jan 26, 2017 at 7:49 AM, marotosg  wrote:
> Hi All,
> I have migrated Solr from older versio 3.6 to SolrCloud 6.2 and all good but
> there are almost every second some WARN messages in the logs.
>
> HttpParser
> bad HTTP parsed: 400 HTTP/0.9 not supported for
> HttpChannelOverHttp@16a84451{r=0,c=false,a=IDLE,uri=null}
>
> Anynone knows where are these coming from?
>
> Thanks
>
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/After-migrating-to-SolrCloud-tp4315943.html
> Sent from the Solr - User mailing list archive at Nabble.com.


After migrating to SolrCloud

2017-01-26 Thread marotosg
Hi All,
I have migrated Solr from older versio 3.6 to SolrCloud 6.2 and all good but
there are almost every second some WARN messages in the logs. 

HttpParser
bad HTTP parsed: 400 HTTP/0.9 not supported for
HttpChannelOverHttp@16a84451{r=0,c=false,a=IDLE,uri=null}

Anynone knows where are these coming from?

Thanks




--
View this message in context: 
http://lucene.472066.n3.nabble.com/After-migrating-to-SolrCloud-tp4315943.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to alter the facet query limit default

2017-01-26 Thread KRIS MUSSHORN
SOLR 5.4.1 i am running a query with multiple facet fields. 
_snip_ 
select?q=*%3A*=metatag.date.prefix4+DESC=7910=metatag.date.prefix7=json=true=true=metatag.date.prefix7
 =metatag.date.prefix4=metatag.doctype 

field metatag.date.prefix7 has way more facets than the default of 100. 

How would I set up solr, or modify my query, to ensure that the facets return 
all values. 

TIA, 

Kris 



Re: no dataimport-handler defined!

2017-01-26 Thread Shawn Heisey
On 1/26/2017 7:44 AM, Chris Rogers wrote:
> Just tested the DIH example in 6.4 (bin/solr -e dih)
>
> Getting the same “No dataimport-handler defined!” for every one of the cores 
> installed as part of the example.

Repeating a reply already posted elsewhere on this thread:

It's a bug.

https://issues.apache.org/jira/browse/SOLR-10035

Easy enough to fix manually, hopefully 6.4.1 will work out of the box.

Thanks,
Shawn



Re: no dataimport-handler defined!

2017-01-26 Thread Alexandre Rafalovitch
Chris,

Shawn has already provided a workaround and a JIRA reference earlier
in this thread. Could you review his message and see if his solution
solves it for you. There might be a 6.4.1 soon and it will be fixed
there as well.

Regards,
   Alex

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 26 January 2017 at 09:44, Chris Rogers
 wrote:
> Hi Alex,
>
> Just tested the DIH example in 6.4 (bin/solr -e dih)
>
> Getting the same “No dataimport-handler defined!” for every one of the cores 
> installed as part of the example.
>
> Cheers,
> Chris
>
>
> On 24/01/2017, 15:07, "Alexandre Rafalovitch"  wrote:
>
> Strange.
>
> If you run a pre-built DIH example, do any of the cores work? (not the
> RSS one, that is broken anyway).
>
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 24 January 2017 at 08:32, Chris Rogers
>  wrote:
> > Hi Alex,
> >
> > I’m editing the solrconfig.xml file at /solr/server/solr/tei_config (ie 
> the one generated from the configset when the node was created).
> >
> > I’m running standalone, not cloud.
> >
> > I’m restarting sole after every change. Do I need to reload the core 
> instead of restarting?
> >
> > I’ve also tried replacing the relative path to the .jar with an 
> absolute path to the dist directory. Still didn’t work.
> >
> > Thanks,
> > Chris
> >
> > On 24/01/2017, 13:20, "Alexandre Rafalovitch"  
> wrote:
> >
> > Which solrconfig.xml are you editing and what kind of Solr install 
> are
> > you running (cloud?). And did you reload the core.
> >
> > I suspect you are not editing the file that is actually in use. For
> > example, if you are running a cloud setup, the solrconfig.xml on the
> > filesystem is disconnected from the config actually in use that is
> > stored in ZooKeeper. You would need to reupload it for change to 
> take
> > effect.
> >
> > You also may need to reload the core for changes to take effect.
> >
> > Regards,
> >Alex.
> > 
> > http://www.solr-start.com/ - Resources for Solr users, new and 
> experienced
> >
> >
> > On 24 January 2017 at 07:43, Chris Rogers
> >  wrote:
> > > Hi all,
> > >
> > > Having frustrating issues with getting SOLR 6.4.0 to recognize 
> the existence of my DIH config. I’m using Oracle Java8 jdk on Ubuntu 14.04.
> > >
> > > The DIH .jar file appears to be loading correctly. There are no 
> errors in the SOLR logs. It just says “Sorry, no dataimport-handler defined” 
> in the SOLR admin UI.
> > >
> > > My config files are listed below. Can anyone spot any mistakes 
> here?
> > >
> > > Many thanks,
> > > Chris
> > >
> > > # solrconfig.xml ##
> > >
> > >regex=".*dataimporthandler-.*\.jar" />
> > >
> > > …
> > >
> > >class="org.apache.solr.handler.dataimport.DataImportHandler">
> > > 
> > >   DIH-data-config.xml
> > > 
> > >   
> > >
> > > # DIH-data-config.xml (in the same dir as solrconfig.xml) 
> ##
> > >
> > > 
> > >   
> > >   
> > > 
> > >  > > fileName=".*xml"
> > > newerThan="'NOW-5YEARS'"
> > > recursive="true"
> > > rootEntity="false"
> > > dataSource="null"
> > > 
> baseDir="/home/bodl-tei-svc/sites/bodl-tei-svc/var/data/tolkein_tei">
> > >
> > >   
> > >
> > >> >   forEach="/TEI" url="${f.fileAbsolutePath}" 
> transformer="RegexTransformer" >
> > >  xpath="/TEI/teiHeader/fileDesc/titleStmt/title"/>
> > >  xpath="/TEI/teiHeader/fileDesc/publicationStmt/publisher"/>
> > >  xpath="/TEI/teiHeader/fileDesc/sourceDesc/msDesc/msIdentifier/altIdentifier/idno"/>
> > >   
> > >
> > > 
> > >
> > >   
> > > 
> > >
> > >
> > > --
> > > Chris Rogers
> > > Digital Projects Manager
> > > Bodleian Digital Library Systems and Services
> > > chris.rog...@bodleian.ox.ac.uk
> >
> >
>
>


Re: no dataimport-handler defined!

2017-01-26 Thread Chris Rogers
Hi Alex,

Just tested the DIH example in 6.4 (bin/solr -e dih)

Getting the same “No dataimport-handler defined!” for every one of the cores 
installed as part of the example.

Cheers,
Chris


On 24/01/2017, 15:07, "Alexandre Rafalovitch"  wrote:

Strange.

If you run a pre-built DIH example, do any of the cores work? (not the
RSS one, that is broken anyway).

Regards,
   Alex.

http://www.solr-start.com/ - Resources for Solr users, new and experienced


On 24 January 2017 at 08:32, Chris Rogers
 wrote:
> Hi Alex,
>
> I’m editing the solrconfig.xml file at /solr/server/solr/tei_config (ie 
the one generated from the configset when the node was created).
>
> I’m running standalone, not cloud.
>
> I’m restarting sole after every change. Do I need to reload the core 
instead of restarting?
>
> I’ve also tried replacing the relative path to the .jar with an absolute 
path to the dist directory. Still didn’t work.
>
> Thanks,
> Chris
>
> On 24/01/2017, 13:20, "Alexandre Rafalovitch"  wrote:
>
> Which solrconfig.xml are you editing and what kind of Solr install are
> you running (cloud?). And did you reload the core.
>
> I suspect you are not editing the file that is actually in use. For
> example, if you are running a cloud setup, the solrconfig.xml on the
> filesystem is disconnected from the config actually in use that is
> stored in ZooKeeper. You would need to reupload it for change to take
> effect.
>
> You also may need to reload the core for changes to take effect.
>
> Regards,
>Alex.
> 
> http://www.solr-start.com/ - Resources for Solr users, new and 
experienced
>
>
> On 24 January 2017 at 07:43, Chris Rogers
>  wrote:
> > Hi all,
> >
> > Having frustrating issues with getting SOLR 6.4.0 to recognize the 
existence of my DIH config. I’m using Oracle Java8 jdk on Ubuntu 14.04.
> >
> > The DIH .jar file appears to be loading correctly. There are no 
errors in the SOLR logs. It just says “Sorry, no dataimport-handler defined” in 
the SOLR admin UI.
> >
> > My config files are listed below. Can anyone spot any mistakes here?
> >
> > Many thanks,
> > Chris
> >
> > # solrconfig.xml ##
> >
> >   
> >
> > …
> >
> >   
> > 
> >   DIH-data-config.xml
> > 
> >   
> >
> > # DIH-data-config.xml (in the same dir as solrconfig.xml) ##
> >
> > 
> >   
> >   
> > 
> >  > fileName=".*xml"
> > newerThan="'NOW-5YEARS'"
> > recursive="true"
> > rootEntity="false"
> > dataSource="null"
> > 
baseDir="/home/bodl-tei-svc/sites/bodl-tei-svc/var/data/tolkein_tei">
> >
> >   
> >
> >>   forEach="/TEI" url="${f.fileAbsolutePath}" 
transformer="RegexTransformer" >
> > 
> > 
> > 
> >   
> >
> > 
> >
> >   
> > 
> >
> >
> > --
> > Chris Rogers
> > Digital Projects Manager
> > Bodleian Digital Library Systems and Services
> > chris.rog...@bodleian.ox.ac.uk
>
>




Re: Solr Cloud - How to maintain the addresses of the zookeeper servers

2017-01-26 Thread Shawn Heisey
On 1/26/2017 6:30 AM, David Michael Gang wrote:
> I want to set up a solr cloud with x nodes and have 3 zookeepers servers.
> As i understand the following parties need to know all zookeeper servers:
> * All zookeeper servers
> * All solr cloud nodes
> * All solr4j cloud smart clients
>
> So let's say if i make it hard coded and then want to add 2 zookeeper
> nodes, I would have to update many places. This makes it hard to maintain
> it.
> How do you manage this? Is there a possibility to get the list of zookeeper
> services dynamically? Any other idea?

ZK 3.5, which is currently in ALPHA state, will have dynamic cluster
membership.  This will mean that you can add/remove servers and
everything that is connected will adjust automatically.  All clients
(like Solr) and servers will need to be updated to the new version. 
Currently Solr uses ZK 3.4.6.

Even after Solr is updated to a 3.5 version of ZK and the code gets any
changes that might be required, the list of servers that Solr and SolrJ
clients *start* with might still need manual adjustment.  After the
upgrade to 3.5, if at least one of the servers in the start list is
correct, the rest might be discovered, but it's better to keep that list
current.

Thanks,
Shawn



Re: Solr Cloud - How to maintain the addresses of the zookeeper servers

2017-01-26 Thread David Michael Gang
Hi Markus and Jan,

Thanks for the quick response and good ideas.
I will look for the puppet direction. We already use puppet, so this is
easy to add

Thanks a lot,
David

On Thu, Jan 26, 2017 at 3:38 PM Markus Jelsma 
wrote:

> Or you can administate the nodes via configuration management  software
> such as Salt, Puppet, etc. If we add a Zookeeper to our list of Zookeepers,
> it is automatically updated in solr.in.sh file on all nodes and separate
> clusters.
>
> If you're looking for easy maintenance that is :)
>
> Markus
>
> -Original message-
> > From:Jan Høydahl 
> > Sent: Thursday 26th January 2017 14:34
> > To: solr-user@lucene.apache.org
> > Subject: Re: Solr Cloud - How to maintain the addresses of the zookeeper
> servers
> >
> > Hi,
> >
> > Hardcoding your zk server addresses is a key factor to stability in your
> cluster.
> > If this was some kind of magic, and the magic failed, EVERYTHING would
> come to a halt :)
> > And since changing ZK is something you do very seldom, I think it is not
> too hard to
> >
> > 1. push new solr.in.sh file to all nodes
> > 2. restart all ndoes
> >
> > --
> > Jan Høydahl, search solution architect
> > Cominvent AS - www.cominvent.com
> >
> > > 26. jan. 2017 kl. 14.30 skrev David Michael Gang <
> michaelg...@gmail.com>:
> > >
> > > Hi all,
> > >
> > > I want to set up a solr cloud with x nodes and have 3 zookeepers
> servers.
> > > As i understand the following parties need to know all zookeeper
> servers:
> > > * All zookeeper servers
> > > * All solr cloud nodes
> > > * All solr4j cloud smart clients
> > >
> > > So let's say if i make it hard coded and then want to add 2 zookeeper
> > > nodes, I would have to update many places. This makes it hard to
> maintain
> > > it.
> > > How do you manage this? Is there a possibility to get the list of
> zookeeper
> > > services dynamically? Any other idea?
> > > I wanted to hear from your expereince how to achieve this task
> effectively.
> > >
> > > Thanks,
> > > David
> >
> >
>


RE: Solr Cloud - How to maintain the addresses of the zookeeper servers

2017-01-26 Thread Markus Jelsma
Or you can administate the nodes via configuration management  software such as 
Salt, Puppet, etc. If we add a Zookeeper to our list of Zookeepers, it is 
automatically updated in solr.in.sh file on all nodes and separate clusters.

If you're looking for easy maintenance that is :)

Markus
 
-Original message-
> From:Jan Høydahl 
> Sent: Thursday 26th January 2017 14:34
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Cloud - How to maintain the addresses of the zookeeper 
> servers
> 
> Hi,
> 
> Hardcoding your zk server addresses is a key factor to stability in your 
> cluster.
> If this was some kind of magic, and the magic failed, EVERYTHING would come 
> to a halt :)
> And since changing ZK is something you do very seldom, I think it is not too 
> hard to
> 
> 1. push new solr.in.sh file to all nodes
> 2. restart all ndoes
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
> > 26. jan. 2017 kl. 14.30 skrev David Michael Gang :
> > 
> > Hi all,
> > 
> > I want to set up a solr cloud with x nodes and have 3 zookeepers servers.
> > As i understand the following parties need to know all zookeeper servers:
> > * All zookeeper servers
> > * All solr cloud nodes
> > * All solr4j cloud smart clients
> > 
> > So let's say if i make it hard coded and then want to add 2 zookeeper
> > nodes, I would have to update many places. This makes it hard to maintain
> > it.
> > How do you manage this? Is there a possibility to get the list of zookeeper
> > services dynamically? Any other idea?
> > I wanted to hear from your expereince how to achieve this task effectively.
> > 
> > Thanks,
> > David
> 
> 


Re: Solr Cloud - How to maintain the addresses of the zookeeper servers

2017-01-26 Thread Jan Høydahl
Hi,

Hardcoding your zk server addresses is a key factor to stability in your 
cluster.
If this was some kind of magic, and the magic failed, EVERYTHING would come to 
a halt :)
And since changing ZK is something you do very seldom, I think it is not too 
hard to

1. push new solr.in.sh file to all nodes
2. restart all ndoes

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 26. jan. 2017 kl. 14.30 skrev David Michael Gang :
> 
> Hi all,
> 
> I want to set up a solr cloud with x nodes and have 3 zookeepers servers.
> As i understand the following parties need to know all zookeeper servers:
> * All zookeeper servers
> * All solr cloud nodes
> * All solr4j cloud smart clients
> 
> So let's say if i make it hard coded and then want to add 2 zookeeper
> nodes, I would have to update many places. This makes it hard to maintain
> it.
> How do you manage this? Is there a possibility to get the list of zookeeper
> services dynamically? Any other idea?
> I wanted to hear from your expereince how to achieve this task effectively.
> 
> Thanks,
> David



Solr Cloud - How to maintain the addresses of the zookeeper servers

2017-01-26 Thread David Michael Gang
Hi all,

I want to set up a solr cloud with x nodes and have 3 zookeepers servers.
As i understand the following parties need to know all zookeeper servers:
* All zookeeper servers
* All solr cloud nodes
* All solr4j cloud smart clients

So let's say if i make it hard coded and then want to add 2 zookeeper
nodes, I would have to update many places. This makes it hard to maintain
it.
How do you manage this? Is there a possibility to get the list of zookeeper
services dynamically? Any other idea?
I wanted to hear from your expereince how to achieve this task effectively.

Thanks,
David


Re: Pass Analyzed Field to SignatureUpdateProcessorFactory

2017-01-26 Thread Leonidas Zagkaretos
Finally, I was able to implement desirable behavior using your suggestions
as follows:

- Added StatelessScriptUpdateProcessorFactory before
SignatureUpdateProcessorFactory in order to analyze "field1" and set
analyzed value to "field1_tmp_ss"
- Passed "field1_tmp_ss" to SignatureUpdateProcessorFactory
- Used IgnoreFieldUpdateProcessorFactory to ignore "field1_tmp_ss" from
document stored

Everything seems to work fine and as expected.

Thank you very much,
Have a nice day,

Leonidas

2017-01-25 19:19 GMT+02:00 Alexandre Rafalovitch :

> It might be possible by sticking additional update request processors
> before the signature one. For example clone field, regex instead of
> tokenizing on the clone, then signature. If a clone is too much of a
> burden, it may even be possible to then add IgnoreField URP to remove
> it or map it in the schema to index/store/docValues=false field.
>
> Regards,
>Alex.
> P.s. The full all-in-one list of URPs is available at:
> http://www.solr-start.com/info/update-request-processors/
>
> 
> http://www.solr-start.com/ - Resources for Solr users, new and experienced
>
>
> On 25 January 2017 at 12:00, Markus Jelsma 
> wrote:
> > Hello,
> >
> > This is not possible out of the box, you would need to manually pass the
> input through an analyzer with a tokenizer and your steming token filter,
> and put the output together again.
> >
> > Markus
> >
> >
> >
> > -Original message-
> >> From:Leonidas Zagkaretos 
> >> Sent: Wednesday 25th January 2017 17:51
> >> To: solr-user@lucene.apache.org
> >> Subject: Pass Analyzed Field to SignatureUpdateProcessorFactory
> >>
> >> Hi all,
> >>
> >> We have successfully integrated Solr in our application, and now we are
> >> facing a requirement where the application should be able to search for
> >> duplicate records in Solr core based on equality in 3 distinct fields.
> >>
> >> Tried using SignatureUpdateProcessorFactory as described in
> >> https://cwiki.apache.org/confluence/display/solr/De-Duplication and
> >> Lookup3Signature and everything seems to work fine, signature field is
> >> being filled with unique hash values.
> >>
> >> One issue we have, is that we need to pass to
> >> SignatureUpdateProcessorFactory the stemmed value of 1 of 3 fields.
> >> Currenty, the following documents produce different hash values, and we
> >> need them to produce unique.
> >> Analysis for field1 and values "value1_a" and "value1_b" produce stemmed
> >> value "value1"
> >>
> >> documentA: {
> >> field1: value1_a,
> >> field2: value2,
> >> field3: value3,
> >> signature: hash_value1
> >> }
> >>
> >> documentB: {
> >> field1: value1_b,
> >> field2: value2,
> >> field3: value3,
> >> signature: hash_value2
> >> }
> >>
> >> I would like to ask whether it is possible to have required behavior,
> and
> >> some tips about how to accomplish this task.
> >>
> >> Thank you in advance,
> >>
> >> Leonidas
> >>
>