Re: SolrCloud Replication Failure

2018-10-31 Thread Kevin Risden
I haven't dug into why this is happening but it definitely reproduces. I
removed the local requirements (port mapping and such) from the gist you
posted (very helpful). I confirmed this fails locally and on Travis CI.

https://github.com/risdenk/test-solr-start-stop-replica-consistency

I don't even see the first update (num 10 -> 20) getting applied. After
the first update there is no more change.
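The check here boils down to fetching the same docs from each replica (queried with distrib=false so each replica answers for itself) and diffing them. A minimal sketch of that comparison over two already-parsed /select responses — the payloads below are made up for illustration, not taken from the test run:

```python
# Diff the docs two replicas return for the same query; the response dicts
# mimic Solr's JSON response format, with illustrative values only.

def mismatched_docs(resp_a, resp_b, key="id", field="num"):
    """Return ids whose `field` value differs between the two responses."""
    docs_a = {d[key]: d.get(field) for d in resp_a["response"]["docs"]}
    docs_b = {d[key]: d.get(field) for d in resp_b["response"]["docs"]}
    return sorted(
        doc_id
        for doc_id in docs_a.keys() | docs_b.keys()
        if docs_a.get(doc_id) != docs_b.get(doc_id)
    )

solr_1 = {"response": {"docs": [{"id": "1", "num": 30}]}}  # leader's view
solr_2 = {"response": {"docs": [{"id": "1", "num": 20}]}}  # stale replica
print(mismatched_docs(solr_1, solr_2))  # → ['1']
```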

Kevin Risden


On Wed, Oct 31, 2018 at 8:26 PM Jeremy Smith  wrote:

> Thanks Erick, this is 7.5.0.
> 
> From: Erick Erickson 
> Sent: Wednesday, October 31, 2018 8:20:18 PM
> To: solr-user
> Subject: Re: SolrCloud Replication Failure
>
> What version of Solr? This code was pretty much rewritten in 7.3, IIRC.
>
> On Wed, Oct 31, 2018, 10:47 Jeremy Smith wrote:
> > Hi all,
> >
> >  We are currently running a moderately large instance of standalone
> > solr and are preparing to switch to solr cloud to help us scale up.  I
> have
> > been running a number of tests using docker locally and ran into an issue
> > where replication is consistently failing.  I have pared down the test
> case
> > as minimally as I could.  Here's a link for the docker-compose.yml (I put
> > it in a directory called solrcloud_simple) and a script to run the test:
> >
> >
> > https://gist.github.com/smithje/2056209fc4a6fb3bcc8b44d0b7df3489
> >
> >
> > Here's the basic idea behind the test:
> >
> >
> > 1) Create a cluster with 2 nodes (solr-1 and solr-2), 1 shard, and 2
> > replicas (each node gets a replica).  Just use the default schema,
> although
> > I've also tried our schema and got the same result.
> >
> >
> > 2) Shut down solr-2
> >
> >
> > 3) Add 100 simple docs, just id and a field called num.
> >
> >
> > 4) Start solr-2 and check that it received the documents.  It did!
> >
> >
> > 5) Update a document, commit, and check that solr-2 received the update.
> > It did!
> >
> >
> > 6) Stop solr-2, update the same document, start solr-2, and make sure
> that
> > it received the update.  It did!
> >
> >
> > 7) Repeat step 6 with a new value.  This time solr-2 reverts back to what
> > it had in step 5.
> >
> >
> > I believe the main issue comes from this in the logs:
> >
> >
> > solr-2_1  | 2018-10-31 17:04:26.135 INFO
> > (recoveryExecutor-4-thread-1-processing-n:solr-2:8082_solr
> > x:test_shard1_replica_n2 c:test s:shard1 r:core_node4) [c:test s:shard1
> > r:core_node4 x:test_shard1_replica_n2] o.a.s.u.PeerSync PeerSync:
> > core=test_shard1_replica_n2 url=http://solr-2:8082/solr  Our versions
> are
> > newer. ourHighThreshold=1615861330901729280
> > otherLowThreshold=1615861314086764545 ourHighest=1615861330901729280
> > otherHighest=1615861335081353216
> >
> > PeerSync thinks the versions on solr-2 are newer for some reason, so it
> > doesn't try to sync from solr-1.  In the final state, solr-2 will always
> > have a lower version for the updated doc than solr-1.  I've tried this
> with
> > different commit strategies, both auto and manual, and it doesn't seem to
> > make any difference.
> >
> > Is this a bug with solr, an issue with using docker, or am I just
> > expecting too much from solr?
> >
> > Thanks for any insights you may have,
> >
> > Jeremy
> >
> >
> >
>


Re: SolrCloud Replication Failure

2018-10-31 Thread Jeremy Smith
Thanks Erick, this is 7.5.0.

From: Erick Erickson 
Sent: Wednesday, October 31, 2018 8:20:18 PM
To: solr-user
Subject: Re: SolrCloud Replication Failure

What version of Solr? This code was pretty much rewritten in 7.3, IIRC.

On Wed, Oct 31, 2018, 10:47 Jeremy Smith wrote:

> Hi all,
>
>  We are currently running a moderately large instance of standalone
> solr and are preparing to switch to solr cloud to help us scale up.  I have
> been running a number of tests using docker locally and ran into an issue
> where replication is consistently failing.  I have pared down the test case
> as minimally as I could.  Here's a link for the docker-compose.yml (I put
> it in a directory called solrcloud_simple) and a script to run the test:
>
>
> https://gist.github.com/smithje/2056209fc4a6fb3bcc8b44d0b7df3489
>
>
> Here's the basic idea behind the test:
>
>
> 1) Create a cluster with 2 nodes (solr-1 and solr-2), 1 shard, and 2
> replicas (each node gets a replica).  Just use the default schema, although
> I've also tried our schema and got the same result.
>
>
> 2) Shut down solr-2
>
>
> 3) Add 100 simple docs, just id and a field called num.
>
>
> 4) Start solr-2 and check that it received the documents.  It did!
>
>
> 5) Update a document, commit, and check that solr-2 received the update.
> It did!
>
>
> 6) Stop solr-2, update the same document, start solr-2, and make sure that
> it received the update.  It did!
>
>
> 7) Repeat step 6 with a new value.  This time solr-2 reverts back to what
> it had in step 5.
>
>
> I believe the main issue comes from this in the logs:
>
>
> solr-2_1  | 2018-10-31 17:04:26.135 INFO
> (recoveryExecutor-4-thread-1-processing-n:solr-2:8082_solr
> x:test_shard1_replica_n2 c:test s:shard1 r:core_node4) [c:test s:shard1
> r:core_node4 x:test_shard1_replica_n2] o.a.s.u.PeerSync PeerSync:
> core=test_shard1_replica_n2 url=http://solr-2:8082/solr  Our versions are
> newer. ourHighThreshold=1615861330901729280
> otherLowThreshold=1615861314086764545 ourHighest=1615861330901729280
> otherHighest=1615861335081353216
>
> PeerSync thinks the versions on solr-2 are newer for some reason, so it
> doesn't try to sync from solr-1.  In the final state, solr-2 will always
> have a lower version for the updated doc than solr-1.  I've tried this with
> different commit strategies, both auto and manual, and it doesn't seem to
> make any difference.
>
> Is this a bug with solr, an issue with using docker, or am I just
> expecting too much from solr?
>
> Thanks for any insights you may have,
>
> Jeremy
>
>
>


Re: hdfs - documents missing after hard poweroff

2018-10-31 Thread Kevin Risden
Also do you have auto add replicas turned on for these collections over
HDFS?

Kevin Risden


On Wed, Oct 31, 2018 at 8:20 PM Kevin Risden  wrote:

> So I'm definitely curious what is going on here.
>
> Are you still able to reproduce this? Can you check whether files have been
> modified on HDFS? I'd be curious if the tlogs or the index are changing
> underneath across the different restarts. Since there is no new indexing, I
> would guess not, but it's something to check.
>
> Can you run CheckIndex on the index to make sure it's not corrupt when you
> don't get the full result set?
>
> Kevin Risden
>
>
> On Tue, Oct 16, 2018 at 10:23 AM Kyle Fransham 
> wrote:
>
>> Hi,
>>
>> Sometimes after a full poweroff of the solr cloud nodes, we see missing
>> documents from the index. Is there anything about our setup or our
>> recovery
>> procedure that could cause this? Details are below:
>>
>> We see the following (somewhat random) behaviour:
>>
>>  - add 10 documents to index. Commit.
>>  - query for all documents - 10 documents returned.
>>  - restart all solr nodes and reset the collection (procedure is below).
>>  - query for all documents - 10 documents returned.
>>  - restart+reset all again. - sometimes 7, 8, 9, or 10 documents returned.
>>
>> To summarize, after a full reboot of all the solr nodes, we are finding
>> that (sometimes) not all documents are in the index. This situation
>> doesn't
>> remedy itself by waiting. Restarting all will sometimes re-add them,
>> sometimes not.
>>
>> Our procedure for recovering from a hard poweroff is:
>>  - manually delete all *.lock files from the index folders on hdfs.
>>  - fully delete the znode from zookeeper.
>>  - re-add an empty znode in zookeeper.
>>  - start up all solr nodes.
>>  - re-add the configset.
>>  - re-issue the collection create command.
>>
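For reference, the procedure above can be pinned down as an ordered checklist of shell commands. Every path, host, and name below is a placeholder (zkcli.sh's clear/makepath/upconfig are the stock Solr ZooKeeper tool's commands, but the exact invocations will differ per environment):

```python
# Kyle's recovery procedure as ordered (description, command) pairs.
# All paths/hosts/names are hypothetical -- substitute your own.
RECOVERY_STEPS = [
    ("delete lock files", "hdfs dfs -rm '/solr/*/data/index/*.lock'"),
    ("delete znode", "zkcli.sh -zkhost zk1:2181 -cmd clear /solr"),
    ("re-create znode", "zkcli.sh -zkhost zk1:2181 -cmd makepath /solr"),
    ("start solr nodes", "bin/solr start -cloud -z zk1:2181/solr"),
    ("re-add configset", "zkcli.sh -zkhost zk1:2181/solr -cmd upconfig"
                         " -confname conf1 -confdir ./conf"),
    ("create collection", "curl 'http://solr-1:8983/solr/admin/collections"
                          "?action=CREATE&name=col1&numShards=18'"),
]
for description, command in RECOVERY_STEPS:
    print(description, "->", command)
```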
>> After doing the above, we find that we are able to see all of the files in
>> the index about 60% of the time. Other times, we are missing some
>> documents.
>>
>> Some other things about our environment:
>>  - we're doing this test with 1 collection that has 18 shards distributed
>> across 3 solr cloud nodes.
>>  - solr version 7.5.0
>>  - hdfs is not running on the solr nodes, and is not being restarted.
>>
>> Any thoughts or tips are greatly appreciated,
>>
>> Kyle
>>
>> --
>> CONFIDENTIALITY NOTICE: The information contained in this email is
>> privileged and confidential and intended only for the use of the
>> individual
>> or entity to whom it is addressed.   If you receive this message in
>> error,
>> please notify the sender immediately at 613-729-1100 and destroy the
>> original message and all copies. Thank you.
>>
>


Re: hdfs - documents missing after hard poweroff

2018-10-31 Thread Kevin Risden
So I'm definitely curious what is going on here.

Are you still able to reproduce this? Can you check whether files have been
modified on HDFS? I'd be curious if the tlogs or the index are changing
underneath across the different restarts. Since there is no new indexing, I
would guess not, but it's something to check.

Can you run CheckIndex on the index to make sure it's not corrupt when you
don't get the full result set?

Kevin Risden


On Tue, Oct 16, 2018 at 10:23 AM Kyle Fransham 
wrote:

> Hi,
>
> Sometimes after a full poweroff of the solr cloud nodes, we see missing
> documents from the index. Is there anything about our setup or our recovery
> procedure that could cause this? Details are below:
>
> We see the following (somewhat random) behaviour:
>
>  - add 10 documents to index. Commit.
>  - query for all documents - 10 documents returned.
>  - restart all solr nodes and reset the collection (procedure is below).
>  - query for all documents - 10 documents returned.
>  - restart+reset all again. - sometimes 7, 8, 9, or 10 documents returned.
>
> To summarize, after a full reboot of all the solr nodes, we are finding
> that (sometimes) not all documents are in the index. This situation doesn't
> remedy itself by waiting. Restarting all will sometimes re-add them,
> sometimes not.
>
> Our procedure for recovering from a hard poweroff is:
>  - manually delete all *.lock files from the index folders on hdfs.
>  - fully delete the znode from zookeeper.
>  - re-add an empty znode in zookeeper.
>  - start up all solr nodes.
>  - re-add the configset.
>  - re-issue the collection create command.
>
> After doing the above, we find that we are able to see all of the files in
> the index about 60% of the time. Other times, we are missing some
> documents.
>
> Some other things about our environment:
>  - we're doing this test with 1 collection that has 18 shards distributed
> across 3 solr cloud nodes.
>  - solr version 7.5.0
>  - hdfs is not running on the solr nodes, and is not being restarted.
>
> Any thoughts or tips are greatly appreciated,
>
> Kyle
>
> --
>


Re: SolrCloud Replication Failure

2018-10-31 Thread Erick Erickson
What version of Solr? This code was pretty much rewritten in 7.3, IIRC.

On Wed, Oct 31, 2018, 10:47 Jeremy Smith wrote:

> Hi all,
>
>  We are currently running a moderately large instance of standalone
> solr and are preparing to switch to solr cloud to help us scale up.  I have
> been running a number of tests using docker locally and ran into an issue
> where replication is consistently failing.  I have pared down the test case
> as minimally as I could.  Here's a link for the docker-compose.yml (I put
> it in a directory called solrcloud_simple) and a script to run the test:
>
>
> https://gist.github.com/smithje/2056209fc4a6fb3bcc8b44d0b7df3489
>
>
> Here's the basic idea behind the test:
>
>
> 1) Create a cluster with 2 nodes (solr-1 and solr-2), 1 shard, and 2
> replicas (each node gets a replica).  Just use the default schema, although
> I've also tried our schema and got the same result.
>
>
> 2) Shut down solr-2
>
>
> 3) Add 100 simple docs, just id and a field called num.
>
>
> 4) Start solr-2 and check that it received the documents.  It did!
>
>
> 5) Update a document, commit, and check that solr-2 received the update.
> It did!
>
>
> 6) Stop solr-2, update the same document, start solr-2, and make sure that
> it received the update.  It did!
>
>
> 7) Repeat step 6 with a new value.  This time solr-2 reverts back to what
> it had in step 5.
>
>
> I believe the main issue comes from this in the logs:
>
>
> solr-2_1  | 2018-10-31 17:04:26.135 INFO
> (recoveryExecutor-4-thread-1-processing-n:solr-2:8082_solr
> x:test_shard1_replica_n2 c:test s:shard1 r:core_node4) [c:test s:shard1
> r:core_node4 x:test_shard1_replica_n2] o.a.s.u.PeerSync PeerSync:
> core=test_shard1_replica_n2 url=http://solr-2:8082/solr  Our versions are
> newer. ourHighThreshold=1615861330901729280
> otherLowThreshold=1615861314086764545 ourHighest=1615861330901729280
> otherHighest=1615861335081353216
>
> PeerSync thinks the versions on solr-2 are newer for some reason, so it
> doesn't try to sync from solr-1.  In the final state, solr-2 will always
> have a lower version for the updated doc than solr-1.  I've tried this with
> different commit strategies, both auto and manual, and it doesn't seem to
> make any difference.
>
> Is this a bug with solr, an issue with using docker, or am I just
> expecting too much from solr?
>
> Thanks for any insights you may have,
>
> Jeremy
>
>
>


RE: Overseer could not get tags

2018-10-31 Thread Vadim Ivanov
Hi, Chris
I had the same messages in the Solr log while testing 7.4 and 7.5.
The only remedy I've found is increasing the header size in
/opt/solr/server/etc/jetty.xml
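For reference, in a stock Solr 7.x install the header-size knobs in that file are the requestHeaderSize/responseHeaderSize settings on Jetty's HttpConfiguration — roughly the following, where 65536 is only an example value, not a recommendation:

```xml
<!-- in /opt/solr/server/etc/jetty.xml, inside the HttpConfiguration block -->
<Set name="requestHeaderSize">
  <Property name="solr.jetty.request.header.size" default="65536" />
</Set>
<Set name="responseHeaderSize">
  <Property name="solr.jetty.response.header.size" default="65536" />
</Set>
```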


After a Solr restart - no more annoying messages.

> -Original Message-
> From: Chris Ulicny [mailto:culicny@iq.media]
> Sent: Wednesday, October 31, 2018 7:40 PM
> To: solr-user
> Subject: Re: Overseer could not get tags
> 
> I've managed to replicate this issue with the 7.5.0 release as well by
> starting up a single instance of solr in cloud mode (on windows) and
> uploading the security.json file below to it.
> 
> After a short while, the "could not get tags from node..." messages start
> coming through every 60 seconds. The accompanying logged error and stack
> trace are also included below.
> 
> Is there a JIRA ticket for this issue (or a directly related one)? I
> couldn't seem to find one.
> 
> Thanks,
> Chris
> 
> *security.json:*
> {
> "authentication":{"blockUnknown":true,"class":"solr.BasicAuthPlugin",
> "credentials":{
> "solradmin":"...",
> "solrreader":"...",
> "solrwriter":"..."}
> },
> "authorization":{"class":"solr.RuleBasedAuthorizationPlugin",
> "permissions":[
> {"name":"read","role":"reader"},
> {"name":"security-read","role":"reader"},
> {"name":"schema-read","role":"reader"},
> {"name":"config-read","role":"reader"},
> {"name":"core-admin-read","role":"reader"},
> {"name":"collection-admin-read","role":"reader"},
> {"name":"update","role":"writer"},
> {"name":"security-edit","role":"admin"},
> {"name":"schema-edit","role":"admin"},
> {"name":"config-edit","role":"admin"},
> {"name":"core-admin-edit","role":"admin"},
> {"name":"collection-admin-edit","role":"admin"},
> {"name":"all","role":"admin"}],
> "user-role":{
> "solradmin":["reader","writer","admin"],
> "solrreader":["reader"],
> "solrwriter":["reader","writer"]}
> }
> }
> 
> *StackTrace:*
> 2018-10-31 16:20:01.994 WARN  (MetricsHistoryHandler-12-thread-1) [   ]
> o.a.s.c.s.i.SolrClientNodeStateProvider could not get tags from node
> ip:8080_solr
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
> from server at http://ip:8080/solr: Expected mime type
> application/octet-stream but got text/html. 
> 
> 
> Error 401 require authentication
> 
> HTTP ERROR 401
> Problem accessing /solr/admin/metrics. Reason:
> require authentication
> 
> 
> 
> at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:607)
> ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:58]
> at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
> ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:58]
> at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
> ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:58]
> at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
> ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:58]
> at org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider$ClientSnitchCtx.invoke(SolrClientNodeStateProvider.java:342)
> ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:58]
> at org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.fetchReplicaMetrics(SolrClientNodeStateProvider.java:195)
> ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:58]
> at org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider$AutoScalingSnitch.getRemoteInfo(SolrClientNodeStateProvider.java:241)
> ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:58]
> at org.apache.solr.common.cloud.rule.ImplicitSnitch.getTags(ImplicitSnitch.java:76)
> ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:58]
> at org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.fetchTagValues(SolrClientNodeStateProvider.java:139)
> ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:58]
> at org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.getNodeValues(SolrClientNodeStateProvider.java:128)
> ~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df - jimczi - 2018-09-18 13:07:58]
> at org.apache.solr.handler.admin.MetricsHistoryHandler.collectGlobalMetrics(MetricsHistoryHandler.java:498)
> ~[solr-core-7.5.0.jar:7.5.0

RE: Odd Scoring behavior

2018-10-31 Thread Webster Homer
The KeywordRepeat and RemoveDuplicates were added to support better wildcard 
matching. Removing the duplicates just removes those terms that weren't 
stemmed. 

This seems like a subtle bug to me

-Original Message-
From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
Sent: Tuesday, October 30, 2018 4:55 PM
To: solr-user@lucene.apache.org
Subject: RE: Odd Scoring behavior

Hello Webster,

It smells like KeywordRepeat. In general it is not a problem if all terms are
scored twice. But you also have RemoveDuplicates, which means that in some
cases a term in one field is scored twice but only once in the other field,
and then you have a problem.
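That interaction can be sketched with a toy token pipeline — a simplified model of KeywordRepeat + stemming + RemoveDuplicates, not Lucene's actual implementation (the stemmer here is just a lookup table):

```python
# Toy model: KeywordRepeat emits each token twice (verbatim + stemmed),
# then RemoveDuplicates drops the copy when stemming was a no-op.
STEMS = {"bovine": "bovin", "serum": "serum", "albumin": "albumin"}

def analyze(tokens, remove_duplicates=True):
    out = []
    for tok in tokens:
        out.append(tok)                  # keyword-repeated original
        out.append(STEMS.get(tok, tok))  # stemmed copy
    if not remove_duplicates:
        return out
    deduped, prev = [], None
    for tok in out:
        if tok != prev:
            deduped.append(tok)
        prev = tok
    return deduped

print(analyze(["bovine", "serum", "albumin"]))
# ['bovine', 'bovin', 'serum', 'albumin'] -- only the stemmable term ends up
# indexed twice, so term frequencies (and hence scores) become uneven
```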

Due to lack of replies, in the end I chose to remove the RemoveDuplicates
filter, so that everything is always scored twice. This 'solution' at least
solved the general scoring problem of searching across many fields.

Thus far there is no real solution to this problem, as far as I know.

Regards,
Markus

http://lucene.472066.n3.nabble.com/Multiple-languages-boosting-and-stemming-and-KeywordRepeat-td4389086.html

 
 
-Original message-
> From:Webster Homer 
> Sent: Tuesday 30th October 2018 22:34
> To: solr-user@lucene.apache.org
> Subject: Odd Scoring behavior
> 
> I noticed that sometimes query matches seem to get counted twice when they 
> are scored. This will happen if the fieldtype is being stemmed, and there is 
> a matching synonym.
> It seems that the score for the field is 2X higher than it should be. We see 
> this only when there is a matching synonym that has a stemmed term in it.
> 
> 
> We have this synonym defined:
> bsa, bovine serum albumin
> 
> We have this fieldtype:
>  positionIncrementGap="100">
>   
> 
>  words="lang/stopwords_en.txt" />
> 
> 
> 
> 
> 
>  
> 
> 
>  words="lang/stopwords_en.txt" />
>  synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> 
> 
> 
> 
>   
> 
> 
> Which is used as:
>  indexed="true" stored="true" required="false" multiValued="false" />
> 
> When we query this field using the eDismax query parser the field, 
> search_en_root_name seems to contribute twice to the score for this query:
> bovine serum albumin
> 
> once for the base query, and once for the stemmed form of the query:
> bovin serum albumin
> 
> If we remove the synonym it will only be counted once. We only see this 
> behavior If part of the synonym can be stemmed. This seems odd and has the 
> effect of overpowering boosts on other fields.
> 
> The explain plan without synonym
> {
>   "responseHeader":{
> "zkConnected":true,
> "status":0,
> "QTime":44,
> "params":{
>   "mm":"2<-25%",
>   "fl":"searchmv_pno, search_en_p_pri_name [explain style=nl]",
>   "group.limit":"1",
>   "q.op":"OR",
>   "sort":"score desc,sort_en_name asc ,sort_ds asc,  search_pid asc",
>   "group.ngroups":"true",
>   "q":"bovine serum albumin",
>   "tie":".45",
>   "defType":"edismax",
>   "group.sort":"sort_ds asc, score desc",
>   "qf":"search_en_p_pri_name_min^7500
> search_en_root_name_min^12000 search_en_p_pri_name^3000
> search_pid^2500 searchmv_pno^2500 searchmv_cas_number^2500
> searchmv_p_skus^2500 search_lform_lc^2500  search_en_root_name^2500
> searchmv_en_s_pri_name^2500 searchmv_en_keywords^2500
> searchmv_lookahead_terms^2000 searchmv_user_term^2000
> searchmv_en_acronym^1500 searchmv_en_synonyms^1500
> searchmv_concat_sku^1000 search_concat_pno^1000
> searchmv_en_name_suf^1000 searchmv_component_cas^1000
> search_lform^1000 searchmv_pno_genr^500 search_concat_pno_genr^500
> searchmv_p_skus_genr^500 search_eform search_mol_form 
> searchmv_component_molform searchmv_en_descriptions searchmv_en_chem_comp 
> searchmv_en_attributes searchmv_en_page_title search_mdl_number 
> searchmv_xref_comparable_pno searchmv_xref_comparable_sku 
> searchmv_xref_equivalent_pno searchmv_xref_exact_pno searchmv_xref_exact_sku 
> searchmv_vendor_sku searchmv_material_number search_en_sortkey searchmv_rtecs 
> search_color_idx search_beilstein search_ecnumber search_egecnumber 
> search_femanumber searchmv_isbn",
>   "group.field":"id_s",
>   "_":"1540331449276",
>   "group":"true"}},
>   "grouped":{
> "id_s":{
>   "matches":4701,
>   "ngroups":4393,
>   "groups":[{
>   "groupValue":"bovineserumalbumin123459048468",
>   "doclist":{"numFound":57,"start":0,"docs":[
>   {
> "search_en_p_pri_name":"Bovine Serum Albumin",
> "searchmv_pno":["A2153"],
> "[explain]":{
>   "match":true,
>   "value":38145.117,
>   "description":"max plus 0.45 times others of:",
>   "details":[{
>   "match":true,
>   "value":10434.111,
>   "description":"sum of:",

SolrCloud Replication Failure

2018-10-31 Thread Jeremy Smith
Hi all,

 We are currently running a moderately large instance of standalone solr 
and are preparing to switch to solr cloud to help us scale up.  I have been 
running a number of tests using docker locally and ran into an issue where 
replication is consistently failing.  I have pared down the test case as 
minimally as I could.  Here's a link for the docker-compose.yml (I put it in a 
directory called solrcloud_simple) and a script to run the test:


https://gist.github.com/smithje/2056209fc4a6fb3bcc8b44d0b7df3489


Here's the basic idea behind the test:


1) Create a cluster with 2 nodes (solr-1 and solr-2), 1 shard, and 2 replicas 
(each node gets a replica).  Just use the default schema, although I've also 
tried our schema and got the same result.


2) Shut down solr-2


3) Add 100 simple docs, just id and a field called num.


4) Start solr-2 and check that it received the documents.  It did!


5) Update a document, commit, and check that solr-2 received the update.  It 
did!


6) Stop solr-2, update the same document, start solr-2, and make sure that it 
received the update.  It did!


7) Repeat step 6 with a new value.  This time solr-2 reverts back to what it 
had in step 5.
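Steps 3 and 5-7 boil down to POSTing JSON bodies to the collection's /update handler; the payloads can be built like this (the collection name test matches the log below, but the num values and the endpoint port are illustrative):

```python
import json

def add_docs_payload(n=100):
    """Step 3: n simple docs with just an id and a num field."""
    return json.dumps([{"id": str(i), "num": 10} for i in range(n)])

def update_doc_payload(doc_id, new_num):
    """Steps 5-7: atomic update of one doc's num field."""
    return json.dumps([{"id": doc_id, "num": {"set": new_num}}])

# POST each body to e.g. http://localhost:8081/solr/test/update?commit=true
print(len(json.loads(add_docs_payload())))  # → 100
print(update_doc_payload("42", 20))         # → [{"id": "42", "num": {"set": 20}}]
```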


I believe the main issue comes from this in the logs:


solr-2_1  | 2018-10-31 17:04:26.135 INFO  
(recoveryExecutor-4-thread-1-processing-n:solr-2:8082_solr 
x:test_shard1_replica_n2 c:test s:shard1 r:core_node4) [c:test s:shard1 
r:core_node4 x:test_shard1_replica_n2] o.a.s.u.PeerSync PeerSync: 
core=test_shard1_replica_n2 url=http://solr-2:8082/solr  Our versions are 
newer. ourHighThreshold=1615861330901729280 
otherLowThreshold=1615861314086764545 ourHighest=1615861330901729280 
otherHighest=1615861335081353216

PeerSync thinks the versions on solr-2 are newer for some reason, so it doesn't 
try to sync from solr-1.  In the final state, solr-2 will always have a lower 
version for the updated doc than solr-1.  I've tried this with different commit 
strategies, both auto and manual, and it doesn't seem to make any difference.
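One way to sanity-check those numbers: SolrCloud's _version_ values pack a millisecond wall-clock timestamp into the upper bits, with a per-millisecond counter in the low 20 bits. That layout is an internal detail of Solr's VersionInfo, so treat this decoding as a debugging aid rather than a stable API:

```python
from datetime import datetime, timezone

def version_to_time(version):
    """Decode the wall-clock time embedded in a SolrCloud _version_ value."""
    ms = version >> 20  # drop the low 20 counter bits
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

# ourHighest from the PeerSync log line above:
print(version_to_time(1615861330901729280))
# → 2018-10-31 17:04:10 UTC, consistent with the 17:04:26 log timestamp
```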

Is this a bug with solr, an issue with using docker, or am I just expecting too 
much from solr?

Thanks for any insights you may have,

Jeremy




Re: Overseer could not get tags

2018-10-31 Thread Chris Ulicny
I've managed to replicate this issue with the 7.5.0 release as well by
starting up a single instance of solr in cloud mode (on windows) and
uploading the security.json file below to it.

After a short while, the "could not get tags from node..." messages start
coming through every 60 seconds. The accompanying logged error and stack
trace are also included below.

Is there a JIRA ticket for this issue (or a directly related one)? I
couldn't seem to find one.

Thanks,
Chris

*security.json:*
{
"authentication":{"blockUnknown":true,"class":"solr.BasicAuthPlugin",
"credentials":{
"solradmin":"...",
"solrreader":"...",
"solrwriter":"..."}
},
"authorization":{"class":"solr.RuleBasedAuthorizationPlugin",
"permissions":[
{"name":"read","role":"reader"},
{"name":"security-read","role":"reader"},
{"name":"schema-read","role":"reader"},
{"name":"config-read","role":"reader"},
{"name":"core-admin-read","role":"reader"},
{"name":"collection-admin-read","role":"reader"},
{"name":"update","role":"writer"},
{"name":"security-edit","role":"admin"},
{"name":"schema-edit","role":"admin"},
{"name":"config-edit","role":"admin"},
{"name":"core-admin-edit","role":"admin"},
{"name":"collection-admin-edit","role":"admin"},
{"name":"all","role":"admin"}],
"user-role":{
"solradmin":["reader","writer","admin"],
"solrreader":["reader"],
"solrwriter":["reader","writer"]}
}
}
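One quick sanity check before uploading a file like this: every role referenced by a permission should be granted to at least one user. A sketch over a pared-down copy of the config (credentials omitted; the full permission list works the same way):

```python
import json

# Pared-down stand-in for the security.json above (values illustrative).
SECURITY = json.loads("""
{
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "permissions": [
      {"name": "read", "role": "reader"},
      {"name": "update", "role": "writer"},
      {"name": "all", "role": "admin"}
    ],
    "user-role": {
      "solradmin": ["reader", "writer", "admin"],
      "solrreader": ["reader"],
      "solrwriter": ["reader", "writer"]
    }
  }
}
""")

def unassigned_roles(security):
    """Roles used in permissions but granted to no user."""
    authz = security["authorization"]
    used = {p["role"] for p in authz["permissions"]}
    granted = {r for roles in authz["user-role"].values() for r in roles}
    return sorted(used - granted)

print(unassigned_roles(SECURITY))  # → []
```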

*StackTrace:*
2018-10-31 16:20:01.994 WARN  (MetricsHistoryHandler-12-thread-1) [   ]
o.a.s.c.s.i.SolrClientNodeStateProvider could not get tags from node
ip:8080_solr
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://ip:8080/solr: Expected mime type
application/octet-stream but got text/html. 


Error 401 require authentication

HTTP ERROR 401
Problem accessing /solr/admin/metrics. Reason:
require authentication



at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:607)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider$ClientSnitchCtx.invoke(SolrClientNodeStateProvider.java:342)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.fetchReplicaMetrics(SolrClientNodeStateProvider.java:195)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider$AutoScalingSnitch.getRemoteInfo(SolrClientNodeStateProvider.java:241)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.common.cloud.rule.ImplicitSnitch.getTags(ImplicitSnitch.java:76)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.fetchTagValues(SolrClientNodeStateProvider.java:139)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.client.solrj.impl.SolrClientNodeStateProvider.getNodeValues(SolrClientNodeStateProvider.java:128)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
at
org.apache.solr.handler.admin.MetricsHistoryHandler.collectGlobalMetrics(MetricsHistoryHandler.java:498)
~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:55]
at
org.apache.solr.handler.admin.MetricsHistoryHandler.collectMetrics(MetricsHistoryHandler.java:371)
~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:55]
at
org.apache.solr.handler.admin.MetricsHistoryHandler.lambda$new$0(MetricsHistoryHandler.java:231)
~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:55]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[?:1.8.0_121]
at 

Re: streaming expressions substring-evaluator

2018-10-31 Thread Aroop Ganguly
Thanks for the note Joel.


> On Oct 31, 2018, at 5:55 AM, Joel Bernstein  wrote:
> 
> The replace operator is going to be "replaced" :)
> 
> Let's create an umbrella ticket for string operations and list out what
> would be nice to have. They can probably be added very quickly.
> 
> 
> Joel Bernstein
> http://joelsolr.blogspot.com/
> 
> 
> On Wed, Oct 31, 2018 at 8:49 AM Gus Heck  wrote:
> 
>> Probably ReplaceWithSubstringOperation (similar to
>> ReplaceWithFieldOperation, though that would probably add another class and
>> be subject to https://issues.apache.org/jira/browse/SOLR-9661)
>> 
>> On Wed, Oct 31, 2018 at 8:32 AM Joel Bernstein  wrote:
>> 
>>> I don't think there is a substring or similar function. This would be
>> quite
>>> nice to add along with other string manipulations.
>>> 
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>> 
>>> 
>>> On Wed, Oct 31, 2018 at 2:37 AM Aroop Ganguly 
>>> wrote:
>>> 
 Hey Team
 
 
 Is there a way to extract a part of a string field and group by on it
>> and
 obtain a histogram ?
 
 for example the field value is a DateTime of the form 20180911T00, and
 I want to do a substring like substring(field1,0,7), and then run a
 streaming expression of the form:
 
 rollup(
select(
 search(col1,fl=“field1”,sort=“field1 asc”), substring(field1,0,7)
>> as
 date)
   ,on= date, count(*)
 )
 
 Is there a substring operator available or an alternate in streaming
 expressions?
 
 Thanks
 Aroop
>>> 
>> 
>> 
>> --
>> http://www.the111shift.com
>> 



Re: Solr cloud - poweroff procedure

2018-10-31 Thread Walter Underwood
“Take backups” is whatever you need for your environment. In AWS, we snapshot 
the EBS volumes, and so on.

Backing up the Solr install and home directories would be good. There are some 
core.properties files in there that seem to be useful. Honestly, I don’t have a 
complete handle on the details of naming and properties for cores in Solr Cloud.

For the Zookeeper ensemble, remember that each host has a different value in 
the myid file.
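That myid check can be automated before power-on — a small sketch where the dict stands in for the contents of each host's myid file (hostnames and values are illustrative):

```python
def check_myids(myids):
    """myids: {host: contents of that host's zookeeper myid file}."""
    ids = {host: int(text) for host, text in myids.items()}
    if len(set(ids.values())) != len(ids):
        raise ValueError("duplicate myid values: %s" % ids)
    return ids

print(check_myids({"zk1": "1\n", "zk2": "2\n", "zk3": "3\n"}))
# → {'zk1': 1, 'zk2': 2, 'zk3': 3}
```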

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Oct 31, 2018, at 5:17 AM, lstusr 5u93n4  wrote:
> 
> Hi,
> 
> Yes, zookeeper is external, and yes, we'll definitely wait until after solr
> has stopped to bring it down.
> 
> Thanks for the tip about disabling `autoAddReplicas`, we definitely don't
> want the shards moving around during the process.
> 
> Wunder, your point 3 mentions "take backups". Given that our data is on
> hdfs (not co-located with the solr servers) and backed up separately, what
> else would you recommend backing up?  The contents of the `solr.home.home`
> folder seem like good candidates... anything else? Let's say one of the
> servers gets dropped during the move, is it sufficient to restore the
> contents of `solr.home.home` onto a new server with the same
> hostname/solrVersion/zookeeperConfig and bring it up in the same way as the
> others?
> 
> Thanks all,
> 
> Kyle
> 
> 
> On Wed, 31 Oct 2018 at 05:22, Shalin Shekhar Mangar 
> wrote:
> 
>> In case you are using a recent Solr 7.x version with collections that have
>> autoAddReplicas=true, you should disable the auto add replicas feature
>> before powering off so that Solr does not decide to move replicas around
>> because nodes have been lost. See
>> 
>> https://lucene.apache.org/solr/guide/7_4/solrcloud-autoscaling-auto-add-replicas.html#using-cluster-property-to-enable-autoaddreplicas
>> 
>> On Wed, Oct 31, 2018 at 3:27 AM lstusr 5u93n4  wrote:
>> 
>>> Hi All,
>>> 
>>> We have a solr cloud running 3 shards, 3 hosts, 6 total NRT replicas, and
>>> the data directory on hdfs. It has 950 million documents in the index,
>>> occupying 700GB of disk space.
>>> 
>>> We need to completely power off the system to move it.
>>> 
>>> Are there any actions we should take on shutdown to help the process?
>>> Anything we should expect on power on?
>>> 
>>> Thanks,
>>> 
>>> Kyle
>>> 
>> 
>> 
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>> 



RE: Merging data from different sources

2018-10-31 Thread Martin Frank Hansen (MHQ)
Hi Markus,

Thanks for your reply!

I hope I can make it work as well 

-Original Message-
From: Markus Jelsma 
Sent: 30. oktober 2018 22:02
To: solr-user@lucene.apache.org
Subject: RE: Merging data from different sources

Hello Martin,

We also use a URP for this in some cases. We index documents to some 
collection; the URP reads a field from that document which is an ID in another 
collection. So we fetch that remote Solr document on the fly and use its 
fields to enrich the incoming document.

It is very straightforward and works very well.
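The enrichment flow Markus describes can be sketched roughly as follows. Real Solr update request processors are Java classes; this is a language-neutral Python sketch, and the field names ("ref_id") and the in-memory lookup standing in for the second collection are invented:

```python
# Sketch: the incoming document carries an ID pointing into another
# collection; we fetch that document and copy its fields into the incoming one.

remote_collection = {  # stand-in for the second Solr collection
    "abc-1": {"title": "Remote title", "category": "news"},
}

def enrich(doc, lookup):
    """Merge fields of the referenced remote document into the incoming doc."""
    remote = lookup.get(doc.get("ref_id"), {})
    merged = dict(remote)
    merged.update(doc)  # incoming fields win on conflict
    return merged

incoming = {"id": "42", "ref_id": "abc-1"}
print(enrich(incoming, remote_collection))
# {'title': 'Remote title', 'category': 'news', 'id': '42', 'ref_id': 'abc-1'}
```

In a real URP the lookup would be a query against the other collection at indexing time.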

Regards,
Markus



-Original message-
> From:Martin Frank Hansen (MHQ) 
> Sent: Tuesday 30th October 2018 21:55
> To: solr-user@lucene.apache.org
> Subject: RE: Merging data from different sources
>
> Hi Alex,
>
> Thanks for your help. I will take a look at the update-request-processor.
>
> I wonder if there is a way to link documents together, so that they always 
> show up together should one of the documents match a search query?
>
> -Original Message-
> From: Alexandre Rafalovitch 
> Sent: 30. oktober 2018 13:16
> To: solr-user 
> Subject: Re: Merging data from different sources
>
> Maybe
> https://lucene.apache.org/solr/guide/7_5/update-request-processors.htm
> l#atomicupdateprocessorfactory
>
> Regards,
> Alex
>
> On Tue, Oct 30, 2018, 7:57 AM Martin Frank Hansen (MHQ),  wrote:
>
> > Hi,
> >
> > I am trying to merge files from different sources and with different
> > content (except for one key-field) , how can this be done in Solr?
> >
> > An example could be:
> >
> > Document 1
> > 
> > 001  Unique id
> > for Document 1
> > test-123
> > …
> > 
> >
> > Document 2
> > 
> > abcdefgh   Unique id
> > for Document 2
> > test-123
> > …
> > 
> >
> > In the above case I would like to merge on Journalnumber thus ending
> > up with something like this:
> >
> >  
> > 001  Unique id
> > for the merge
> > test-123
> > abcdefgh   Reference id
> > for Document 2.
> > …
> > 
> >
> > How would I go about this? I was thinking about embedded documents,
> > but since I am not indexing the different data sources at the same
> > time I don’t think it will work. The ideal result would be to have
> > Document 2 imbedded in Document 1.
> >
> > I am currently using a schema that contains all fields from Document
> > 1 and Document 2.
> >
> > I really hope that Solr can handle this, and any help/feedback is
> > much appreciated.
> >
> > Best regards
> >
> > Martin
> >
> >
> >
> >
> > Protection of your personal data is important to us. Here you can
> > read KMD’s Privacy Policy
> > outlining how we process your personal data.
> >
> > Please note that this message may contain confidential information.
> > If you have received this message by mistake, please inform the
> > sender of the mistake by sending a reply, then delete the message
> > from your system without making, distributing or retaining any copies of it.
> > Although we believe that the message and any attachments are free
> > from viruses and other errors that might affect the computer or
> > it-system where it is received and read, the recipient opens the message at 
> > his or her own risk.
> > We assume no responsibility for any loss or damage arising from the
> > receipt or use of this message.
> >
>


Re: streaming expressions substring-evaluator

2018-10-31 Thread Joel Bernstein
The replace operator is going to be "replaced" :)

Let's create an umbrella ticket for string operations and list out what
would be nice to have. They can probably be added very quickly.


Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Oct 31, 2018 at 8:49 AM Gus Heck  wrote:

> Probably ReplaceWithSubstringOperation (similar to
> ReplaceWithFieldOperation, though that would probably add another class and be
> subject to https://issues.apache.org/jira/browse/SOLR-9661)
>
> On Wed, Oct 31, 2018 at 8:32 AM Joel Bernstein  wrote:
>
> > I don't think there is a substring or similar function. This would be
> quite
> > nice to add along with other string manipulations.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> >
> > On Wed, Oct 31, 2018 at 2:37 AM Aroop Ganguly 
> > wrote:
> >
> > > Hey Team
> > >
> > >
> > > Is there a way to extract a part of a string field and group by on it
> and
> > > obtain a histogram ?
> > >
> > > for example the field value is a DateTime of the form: 20180911T00 and
> > > I want to do a substring like substring(field1,0,7), and then do a
> > > streaming expression of the form :
> > >
> > > rollup(
> > > select(
> > >  search(col1,fl=“field1”,sort=“field1 asc”), substring(field1,0,7)
> as
> > > date)
> > >,on= date, count(*)
> > > )
> > >
> > > Is there a substring operator available or an alternate in streaming
> > > expressions?
> > >
> > > Thanks
> > > Aroop
> >
>
>
> --
> http://www.the111shift.com
>


Re: streaming expressions substring-evaluator

2018-10-31 Thread Gus Heck
Probably ReplaceWithSubstringOperation (similar to
ReplaceWithFieldOperation, though that would probably add another class and be
subject to https://issues.apache.org/jira/browse/SOLR-9661)

On Wed, Oct 31, 2018 at 8:32 AM Joel Bernstein  wrote:

> I don't think there is a substring or similar function. This would be quite
> nice to add along with other string manipulations.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
>
> On Wed, Oct 31, 2018 at 2:37 AM Aroop Ganguly 
> wrote:
>
> > Hey Team
> >
> >
> > Is there a way to extract a part of a string field and group by on it and
> > obtain a histogram ?
> >
> > for example the field value is a DateTime of the form: 20180911T00 and
> > I want to do a substring like substring(field1,0,7), and then do a
> > streaming expression of the form :
> >
> > rollup(
> > select(
> >  search(col1,fl=“field1”,sort=“field1 asc”), substring(field1,0,7) as
> > date)
> >,on= date, count(*)
> > )
> >
> > Is there a substring operator available or an alternate in streaming
> > expressions?
> >
> > Thanks
> > Aroop
>


-- 
http://www.the111shift.com


Re: streaming expressions substring-evaluator

2018-10-31 Thread Joel Bernstein
I don't think there is a substring or similar function. This would be quite
nice to add along with other string manipulations.

Joel Bernstein
http://joelsolr.blogspot.com/


On Wed, Oct 31, 2018 at 2:37 AM Aroop Ganguly 
wrote:

> Hey Team
>
>
> Is there a way to extract a part of a string field and group by on it and
> obtain a histogram ?
>
> for example the field value is a DateTime of the form: 20180911T00 and
> I want to do a substring like substring(field1,0,7), and then do a
> streaming expression of the form :
>
> rollup(
> select(
>  search(col1,fl=“field1”,sort=“field1 asc”), substring(field1,0,7) as
> date)
>,on= date, count(*)
> )
>
> Is there a substring operator available or an alternate in streaming
> expressions?
>
> Thanks
> Aroop


Re: Solr cloud - poweroff procedure

2018-10-31 Thread lstusr 5u93n4
Hi,

Yes, zookeeper is external, and yes, we'll definitely wait until after solr
has stopped to bring it down.

Thanks for the tip about disabling `autoAddReplicas`, we definitely don't
want the shards moving around during the process.

Wunder, your point 3 mentions "take backups". Given that our data is on
hdfs (not co-located with the solr servers) and backed up separately, what
else would you recommend backing up?  The contents of the `solr.home.home`
folder seem like good candidates... anything else? Let's say one of the
servers gets dropped during the move, is it sufficient to restore the
contents of `solr.home.home` onto a new server with the same
hostname/solrVersion/zookeeperConfig and bring it up in the same way as the
others?

Thanks all,

Kyle


On Wed, 31 Oct 2018 at 05:22, Shalin Shekhar Mangar 
wrote:

> In case you are using a recent Solr 7.x version with collections that have
> autoAddReplicas=true, you should disable the auto add replicas feature
> before powering off so that Solr does not decide to move replicas around
> because nodes have been lost. See
>
> https://lucene.apache.org/solr/guide/7_4/solrcloud-autoscaling-auto-add-replicas.html#using-cluster-property-to-enable-autoaddreplicas
>
> On Wed, Oct 31, 2018 at 3:27 AM lstusr 5u93n4  wrote:
>
> > Hi All,
> >
> > We have a solr cloud running 3 shards, 3 hosts, 6 total NRT replicas, and
> > the data directory on hdfs. It has 950 million documents in the index,
> > occupying 700GB of disk space.
> >
> > We need to completely power off the system to move it.
> >
> > Are there any actions we should take on shutdown to help the process?
> > Anything we should expect on power on?
> >
> > Thanks,
> >
> > Kyle
> >
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: Synonyms relationships

2018-10-31 Thread Doug Turnbull
Synonyms in Solr are really a kind of "programmer's" tool, useful for
mapping terms to other terms. This need not correspond to linguistic
notions of synonymy or hypernymy/hyponymy.

That being said, there are probably half a dozen approaches for doing these
kinds of taxonomical relationships in Solr on top of synonyms.

Here's some resources / techniques we use at OpenSource Connections for
clients
https://www.youtube.com/watch?v=90F30PS-884
https://opensourceconnections.com/blog/2017/11/21/solr-synonyms-mea-culpa/
https://opensourceconnections.com/blog/2016/12/23/elasticsearch-synonyms-patterns-taxonomies/

(last one is ES, but same ideas apply...)
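One common pattern from those resources is to encode "narrower than"/"broader than" relations as one-way expansions, so a narrow query term can optionally also match its broader ancestors. A minimal sketch of the idea (the taxonomy and terms are invented, and real deployments would compile this into synonym rules rather than expand at query time in application code):

```python
# Hypothetical taxonomy-aware expansion on top of plain synonyms: narrower
# terms expand one way to their broader terms, so searching "oak" can also
# match "tree" and "plant", but "tree" does not drag in every species.

broader = {"oak": "tree", "pine": "tree", "tree": "plant"}

def expand(term, generalize=True):
    """Return the term plus its broader ancestors (when generalizing)."""
    terms = [term]
    while generalize and term in broader:
        term = broader[term]
        terms.append(term)
    return terms

print(expand("oak"))         # ['oak', 'tree', 'plant']
print(expand("oak", False))  # ['oak']
```

The `generalize` flag corresponds to Nicolas's wish to turn the broadening behavior on or off per search.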

Best,
-Doug

On Wed, Oct 31, 2018 at 6:20 AM Nicolas Paris 
wrote:

> Hi
>
> Does Solr provide a way to describe synonym relationships such as
> "equivalent to", "narrower than", "broader than"?
>
> It turns out both postgres and oracle do, but I can't find any related
> information in the documentation.
>
> This is useful for choosing whether or not to generalize the search terms.
>
> Thanks,
>
>
> --
> nicolas
>
-- 
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug


Re: Re: SolrCloud scaling/optimization for high request rate

2018-10-31 Thread Sofiya Strochyk


> The logfiles on your servers should be verbose enough to indicate what 
> machines are handling which parts of the request.

Yes, generally I see the following entries in the logs:

1. 
df=_text_=false=_id=score=4=0=true=fq===24=2==1540984948280=true=javabin
2. df=_text_=false==64=0===24=2==1540984948280==true=javabin
3. q===0===24=2.2=json

Request type #3 (the full request) is seen only once across all shards, 
and I suppose it is the original/aggregating request. The shard is 
different every time, so load balancing is working.
Request #1 (get IDs by query) is always present for one replica of each 
shard.
Request #2 (get fields by IDs) is, however, sometimes missing even 
though request #1 has a non-zero number of hits for that shard. I 
don't know whether this indicates a problem or is working as expected.
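A missing request #2 on a shard that had hits is consistent with how the two-phase distributed query works: the get-fields phase is only sent to shards whose documents survived the global merge of the per-shard top results. A minimal sketch (shard names, ids, and scores invented):

```python
# Phase 1: each shard returns (id, score) pairs for its top docs.
# Phase 2 (get fields by IDs) only targets shards that placed docs in the
# globally merged top `rows` -- so a shard with only low-scoring hits gets
# no second request.

shard_hits = {
    "shard1": [("a", 9.0), ("b", 8.5)],
    "shard2": [("c", 1.2)],  # has hits, but they score too low
}

def shards_needing_fields(shard_hits, rows):
    merged = sorted(
        ((score, doc, shard)
         for shard, hits in shard_hits.items()
         for doc, score in hits),
        reverse=True)[:rows]
    return sorted({shard for _, _, shard in merged})

print(shards_needing_fields(shard_hits, rows=2))  # ['shard1']
```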
> Only one attachment made it to the list.  I'm surprised that ANY of 
> them made it -- usually they don't.  Generally you need to use a file 
> sharing website and provide links.  Dropbox is one site that works 
> well.  Gist might also work.
>
> The GC log that made it through (solr_gc.log.7.1) is only two minutes 
> long.  Nothing useful can be learned from a log that short.  It is 
> also missing the information at the top about the JVM that created it, 
> so I'm wondering if you edited the file so it was shorter before 
> including it.
>
> Thanks,
> Shawn

You are right, sorry, I didn't know this :)
(there is a 1 MB limit on attachments, which is why I trimmed the log)
Here are the full GC logs: 1 
 2 

and images: 1  2 
3 



--
Sofiia Strochyk
s...@interlogic.com.ua
InterLogic
www.interlogic.com.ua




Synonyms relationships

2018-10-31 Thread Nicolas Paris
Hi

Does Solr provide a way to describe synonym relationships such as
"equivalent to", "narrower than", "broader than"?

It turns out both postgres and oracle do, but I can't find any related
information in the documentation.

This is useful for choosing whether or not to generalize the search terms.

Thanks,


-- 
nicolas


Re: Solr cloud - poweroff procedure

2018-10-31 Thread Shalin Shekhar Mangar
In case you are using a recent Solr 7.x version with collections that have
autoAddReplicas=true, you should disable the auto add replicas feature
before powering off so that Solr does not decide to move replicas around
because nodes have been lost. See
https://lucene.apache.org/solr/guide/7_4/solrcloud-autoscaling-auto-add-replicas.html#using-cluster-property-to-enable-autoaddreplicas
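The CLUSTERPROP toggle described in the linked guide can be driven from a short script. A hedged sketch that only builds the request URL (the host and port are assumptions; uncomment an actual HTTP call against your own cluster):

```python
# Sketch: build the Collections API URL for flipping the autoAddReplicas
# cluster property off before the power-off and back on afterwards.
from urllib.parse import urlencode

def clusterprop_url(host, name, val):
    params = urlencode({"action": "CLUSTERPROP", "name": name, "val": val})
    return f"http://{host}/solr/admin/collections?{params}"

# Disable before shutdown, re-enable once all nodes are back:
print(clusterprop_url("localhost:8983", "autoAddReplicas", "false"))
# http://localhost:8983/solr/admin/collections?action=CLUSTERPROP&name=autoAddReplicas&val=false
# urllib.request.urlopen(...)  # send it against a live node
```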

On Wed, Oct 31, 2018 at 3:27 AM lstusr 5u93n4  wrote:

> Hi All,
>
> We have a solr cloud running 3 shards, 3 hosts, 6 total NRT replicas, and
> the data directory on hdfs. It has 950 million documents in the index,
> occupying 700GB of disk space.
>
> We need to completely power off the system to move it.
>
> Are there any actions we should take on shutdown to help the process?
> Anything we should expect on power on?
>
> Thanks,
>
> Kyle
>


-- 
Regards,
Shalin Shekhar Mangar.


streaming expressions substring-evaluator

2018-10-31 Thread Aroop Ganguly
Hey Team


Is there a way to extract a part of a string field, group by it, and 
obtain a histogram?

For example, the field value is a DateTime of the form 20180911T00, and 
I want to do a substring like substring(field1,0,7) and then run a streaming 
expression of the form:

rollup(
select(
 search(col1,fl=“field1”,sort=“field1 asc”), substring(field1,0,7) as date)
   ,on= date, count(*)
)

Is there a substring operator available or an alternate in streaming 
expressions?

Thanks
Aroop
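The computation the expression above is after can be sketched in plain Python: slice the first seven characters of each field1 value (mirroring substring(field1,0,7) literally) and count per bucket. Sample values are invented:

```python
# Equivalent of rollup(select(search(...), substring(field1,0,7) as date),
# on=date, count(*)): bucket by a prefix of the datetime string and count.
from collections import Counter

values = ["20180911T00", "20180912T00", "20181001T00"]
histogram = Counter(v[:7] for v in values)
print(dict(histogram))  # {'2018091': 2, '2018100': 1}
```

Note that a 7-character slice of yyyyMMdd stops mid-day; a 6-character slice would give clean per-month buckets.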