Re: Doing what <copyField> does using SolrJ API

2020-09-16 Thread Steven White
Hi everyone,

I figured it out.  It is as simple as creating a List and using
that as the value for the SolrInputDocument.addField() API.
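
For the archives, here is a minimal sketch of what I ended up doing (the
field names are made up; the point is that addField() accepts a List as the
value, which emulates <copyField> client-side):

    import java.util.Arrays;
    import org.apache.solr.common.SolrInputDocument;

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("title", "Solr in Action");
    doc.addField("author", "Jane Doe");
    // client-side equivalent of <copyField source="*" dest="allText"/>
    doc.addField("allText", Arrays.asList("Solr in Action", "Jane Doe"));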

Thanks,

Steven


On Wed, Sep 16, 2020 at 9:13 PM Steven White  wrote:

> Hi everyone,
>
> I want to avoid creating a <copyField dest="..." source="OneFieldOfMany"/> in my schema (there will be over 1000 of them and
> maybe more so managing it will be a pain).  Instead, I want to use SolrJ
> API to do what <copyField> does.  Any example of how I can do this?  If
> there is an example online, that would be great.
>
> Thanks in advance.
>
> Steven
>


Doing what <copyField> does using SolrJ API

2020-09-16 Thread Steven White
Hi everyone,

I want to avoid creating a <copyField dest="..." source="OneFieldOfMany"/> in my schema (there will be over 1000 of them and
maybe more so managing it will be a pain).  Instead, I want to use SolrJ
API to do what <copyField> does.  Any example of how I can do this?  If
there is an example online, that would be great.

Thanks in advance.

Steven


Re: NPE Issue with atomic update to nested document or child document through SolrJ

2020-09-16 Thread Pratik Patel
Looking at some other unit tests in repo, I tried an approach using
UpdateRequest as follows.

SolrInputDocument sdoc = new SolrInputDocument(  );
> sdoc.addField( "id", testChildPOJO.id() );
> sdoc.setField( "fieldName",
> java.util.Collections.singletonMap("set", testChildPOJO.fieldName() +
> postfix) );
> final UpdateRequest req = new UpdateRequest();
> req.withRoute( pojo1.id() );
> req.add(sdoc);
>
> collection.client.request( req, collection.getCollectionName()
> );
> req.commit( collection.client, collection.getCollectionName());


But this also results in the SAME Null Pointer Exception.

Looking at the source code, it looks like "fieldPath" is null below.



>  AtomicUpdateDocumentMerger.getFieldFromHierarchy(SolrInputDocument
> completeHierarchy, String fieldPath) {
> final List<String> docPaths =
> StrUtils.splitSmart(fieldPath.substring(1), '/');
> ..
>}
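
For reference, the nested-document fields in my schema follow the 8.x ref
guide; roughly this (a sketch, not a verbatim copy of my schema):

    <fieldType name="_nest_path_" class="solr.NestPathField" />
    <field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
    <field name="_nest_path_" type="_nest_path_" />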


Any idea what's wrong here?

Thanks

On Wed, Sep 16, 2020 at 1:27 PM Pratik Patel  wrote:

> Hello Everyone,
>
> I am trying to update a field of a child document using atomic updates
> feature. I am using solr and solrJ version 8.5.0
>
> I have ensured that my schema satisfies the conditions for atomic updates
> and I am able to do atomic updates on normal documents but with nested
> child documents, I am getting a Null Pointer Exception. Following is the
> simple test which I am trying.
>
> TestPojo  pojo1  = new TestPojo().cId( "abcd" )
>>  .conceptid( "c1" )
>>  .storeid( storeId )
>>  .testChildPojos(
>> Collections.list( testChildPOJO, testChildPOJO2,
>>  testChildPOJO3 )
>> );
>> TestChildPOJO testChildPOJO = new TestChildPOJO().cId(
>> "c1_child1" )
>>   .conceptid( "c1" )
>>   .storeid( storeId )
>>   .fieldName(
>> "c1_child1_field_value1" )
>>   .startTime(
>> Date.from( now.minus( 10, ChronoUnit.DAYS ) ) )
>>   .integerField_iDF(
>> 10 )
>>
>> .booleanField_bDF(true);
>> // index pojo1 with child testChildPOJO
>> SolrInputDocument sdoc = new SolrInputDocument();
>> sdoc.addField( "_route_", pojo1.cId() );
>> sdoc.addField( "id", testChildPOJO.cId() );
>> sdoc.addField( "conceptid", testChildPOJO.conceptid() );
>> sdoc.addField( "storeid", testChildPOJO.cId() );
>> sdoc.setField( "fieldName", java.util.Collections.singletonMap("set",
>> Collections.list(testChildPOJO.fieldName() + postfix) ) ); // modify field
>> "fieldName"
>> collection.client.add( sdoc );   // results in NPE!
>
>
> Stack Trace:
>
> ERROR org.apache.solr.client.solrj.impl.BaseCloudSolrClient - Request to
>> collection [collectionTest2] failed due to (500)
>> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
>> from server at
>> http://172.15.1.100:8081/solr/collectionTest2_shard1_replica_n1:
>> java.lang.NullPointerException
>> at
>> org.apache.solr.update.processor.AtomicUpdateDocumentMerger.getFieldFromHierarchy(AtomicUpdateDocumentMerger.java:308)
>> at
>> org.apache.solr.update.processor.AtomicUpdateDocumentMerger.mergeChildDoc(AtomicUpdateDocumentMerger.java:405)
>> at
>> org.apache.solr.update.processor.DistributedUpdateProcessor.getUpdatedDocument(DistributedUpdateProcessor.java:711)
>> at
>> org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionAdd(DistributedUpdateProcessor.java:374)
>> at
>> org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionAdd$0(DistributedUpdateProcessor.java:339)
>> at org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50)
>> at
>> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:339)
>> at
>> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:225)
>> at
>> org.apache.solr.update.processor.DistributedZkUpdateProcessor.processAdd(DistributedZkUpdateProcessor.java:245)
>> at
>> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
>> at
>> org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:110)
>> at
>> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:332)
>> at
>> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readIterator(JavaBinUpdateRequestCodec.java:281)
>> at
>> 

Re: Unexpected Performance decrease when upgrading Solr 5.5.2 to 8.5.2

2020-09-16 Thread matthew sporleder
Did you re-work your schema at all?  There are new primitive types,
new Lucene versions, DocValues, etc.
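
For example (a hedged snippet; field names are placeholders): the old Trie
numeric types were deprecated in favour of the Point types, which you'd
typically declare with docValues enabled:

    <fieldType name="plong" class="solr.LongPointField" docValues="true"/>
    <field name="price" type="plong" indexed="true" stored="true"/>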

On Wed, Sep 16, 2020 at 12:40 PM Keene Chen  wrote:
>
> Hi,
>
> Thanks for pointing that out. I've linked the images below:
>
> solr5_response_times.png
> 
>
> solr8_response_times.png
> 
>
> solr5_throughput.png
> 
>
> solr8_throughput.png
> 
>
> Regards,
> Keene
>
>
> On Wed, 16 Sep 2020 at 09:09, Colvin Cowie 
> wrote:
>
> > Hello,
> >
> > Your images won't appear on the mailing list. You'll need to post them
> > elsewhere and link to them.
> >
> > On Tue, 15 Sep 2020 at 09:44, Keene Chen  wrote:
> >
> > > Hi Solr users community,
> > >
> > >
> > > We have been doing some performance tests on Solr 5.5.2 and Solr 8.5.2 as
> > > part of an upgrading process, and we have noticed some reduced
> > performance
> > > for certain types of requests, particularly those that request a large
> > > number of rows, eg. 1. Would anyone have an explanation as to why the
> > > performance degrades, and what areas can be looked at in order to improve
> > > its performance?
> > >
> > > The performance test example below was carried out using 18000 of such
> > > queries, running at a constant throughput as specified by the label in
> > the
> > > x-axis. “Rpm” here stands for “requests per minute”.
> > >
> > > Solr 8.5’s maximum response times are consistently better. However, the
> > > 95th and 99th percentile are comparably worse than Solr 5.5’s response
> > > times.
> > > [image: image.png]
> > > [image: image.png]
> > >
> > > The maximum throughput for solr 8.5 is reached sooner than Solr 5.5 at
> > > around 4 requests per second.
> > >
> > >
> > > [image: image.png]
> > > [image: image.png]
> > >
> > > Regards,
> > > Keene
> > >
> > > --
> > >
> > >
> > > Keene Chen  | Senior Software Developer
> > >
> > >
> > >
> >
>


Re: Unexpected Performance decrease when upgrading Solr 5.5.2 to 8.5.2

2020-09-16 Thread Toke Eskildsen
Keene Chen  wrote:
> We have been doing some performance tests on Solr 5.5.2
> and Solr 8.5.2 as part of an upgrading process, and we have
> noticed some reduced performance for certain types of
> requests, particularly those that request a large number of
> rows, eg. 1.

Solr 5→8… One large change is the switch to streaming DocValues in Solr 7,
which has an effect on random access speed, especially if there are many
documents per shard and only some of the documents have a value for the field.
This should have been improved in Solr 8, so a factor-10 slowdown (median, 60
users) is surprising to me.

Could you tell us how many fields you return for each document and how many of 
these are marked as Stored or/and DocValues? Also approximately how many 
documents you have per shard?

And a sanity check: The default behaviour of document retrieval with regard to 
DocValues changed with schema version 1.6 
(https://issues.apache.org/jira/browse/SOLR-8220). Have you checked that the 
same number of fields are returned for the two setups?
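
A quick way to compare (host and collection name are placeholders): fetch one
document with the default field list on both versions and count the fields
that come back, since with schema version 1.6+ useDocValuesAsStored=true pulls
docValues-only fields into the results:

    curl 'http://localhost:8983/solr/mycollection/select?q=*:*&rows=1&wt=json'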

- Toke Eskildsen


NPE Issue with atomic update to nested document or child document through SolrJ

2020-09-16 Thread Pratik Patel
Hello Everyone,

I am trying to update a field of a child document using atomic updates
feature. I am using solr and solrJ version 8.5.0

I have ensured that my schema satisfies the conditions for atomic updates
and I am able to do atomic updates on normal documents but with nested
child documents, I am getting a Null Pointer Exception. Following is the
simple test which I am trying.

TestPojo  pojo1  = new TestPojo().cId( "abcd" )
>  .conceptid( "c1" )
>  .storeid( storeId )
>  .testChildPojos(
> Collections.list( testChildPOJO, testChildPOJO2,
>  testChildPOJO3 )
> );
> TestChildPOJO testChildPOJO = new TestChildPOJO().cId(
> "c1_child1" )
>   .conceptid( "c1" )
>   .storeid( storeId )
>   .fieldName(
> "c1_child1_field_value1" )
>   .startTime(
> Date.from( now.minus( 10, ChronoUnit.DAYS ) ) )
>   .integerField_iDF(
> 10 )
>
> .booleanField_bDF(true);
> // index pojo1 with child testChildPOJO
> SolrInputDocument sdoc = new SolrInputDocument();
> sdoc.addField( "_route_", pojo1.cId() );
> sdoc.addField( "id", testChildPOJO.cId() );
> sdoc.addField( "conceptid", testChildPOJO.conceptid() );
> sdoc.addField( "storeid", testChildPOJO.cId() );
> sdoc.setField( "fieldName", java.util.Collections.singletonMap("set",
> Collections.list(testChildPOJO.fieldName() + postfix) ) ); // modify field
> "fieldName"
> collection.client.add( sdoc );   // results in NPE!


Stack Trace:

ERROR org.apache.solr.client.solrj.impl.BaseCloudSolrClient - Request to
> collection [collectionTest2] failed due to (500)
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
> from server at
> http://172.15.1.100:8081/solr/collectionTest2_shard1_replica_n1:
> java.lang.NullPointerException
> at
> org.apache.solr.update.processor.AtomicUpdateDocumentMerger.getFieldFromHierarchy(AtomicUpdateDocumentMerger.java:308)
> at
> org.apache.solr.update.processor.AtomicUpdateDocumentMerger.mergeChildDoc(AtomicUpdateDocumentMerger.java:405)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.getUpdatedDocument(DistributedUpdateProcessor.java:711)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.doVersionAdd(DistributedUpdateProcessor.java:374)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.lambda$versionAdd$0(DistributedUpdateProcessor.java:339)
> at org.apache.solr.update.VersionBucket.runWithLock(VersionBucket.java:50)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:339)
> at
> org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:225)
> at
> org.apache.solr.update.processor.DistributedZkUpdateProcessor.processAdd(DistributedZkUpdateProcessor.java:245)
> at
> org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
> at
> org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:110)
> at
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:332)
> at
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readIterator(JavaBinUpdateRequestCodec.java:281)
> at
> org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:338)
> at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283)
> at
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$StreamingCodec.readNamedList(JavaBinUpdateRequestCodec.java:236)
> at
> org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:303)
> at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:283)
> at
> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:196)
> at
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:127)
> at
> org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:122)
> at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:70)
> at
> org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
> at
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
> at
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:211)
> at org.apache.solr.core.SolrCore.execute(SolrCore.java:2596)
> at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:802)
> at 

Re: Unexpected Performance decrease when upgrading Solr 5.5.2 to 8.5.2

2020-09-16 Thread Keene Chen
Hi,

Thanks for pointing that out. I've linked the images below:

solr5_response_times.png


solr8_response_times.png


solr5_throughput.png


solr8_throughput.png


Regards,
Keene


On Wed, 16 Sep 2020 at 09:09, Colvin Cowie 
wrote:

> Hello,
>
> Your images won't appear on the mailing list. You'll need to post them
> elsewhere and link to them.
>
> On Tue, 15 Sep 2020 at 09:44, Keene Chen  wrote:
>
> > Hi Solr users community,
> >
> >
> > We have been doing some performance tests on Solr 5.5.2 and Solr 8.5.2 as
> > part of an upgrading process, and we have noticed some reduced
> performance
> > for certain types of requests, particularly those that request a large
> > number of rows, eg. 1. Would anyone have an explanation as to why the
> > performance degrades, and what areas can be looked at in order to improve
> > its performance?
> >
> > The performance test example below was carried out using 18000 of such
> > queries, running at a constant throughput as specified by the label in
> the
> > x-axis. “Rpm” here stands for “requests per minute”.
> >
> > Solr 8.5’s maximum response times are consistently better. However, the
> > 95th and 99th percentile are comparably worse than Solr 5.5’s response
> > times.
> > [image: image.png]
> > [image: image.png]
> >
> > The maximum throughput for solr 8.5 is reached sooner than Solr 5.5 at
> > around 4 requests per second.
> >
> >
> > [image: image.png]
> > [image: image.png]
> >
> > Regards,
> > Keene
> >
> > --
> >
> >
> > Keene Chen  | Senior Software Developer
> >
> >
> >
> >
>




New replicas not being added to new nodes

2020-09-16 Thread Drew Kidder
I have the following setup:
- 6 shards on a collection called "jobs-09022020"
- 6 TLOG replicas (used only for indexing)
- 6 PULL replicas (used only for searching)

I want to scale the PULL replicas, 6 at a time, and have Solr add replicas
to that new node set when they come up and delete them when that node set
goes down. I have uploaded and verified the following autoscale policy to
the /api/cluster/autoscaling API:

{
  "set-trigger": {
"name": "node_added_trigger",
"event": "nodeAdded",
"waitFor": "5s",
"preferredOperation": "ADDREPLICA",
"replicaType": "PULL",
"enabled": true,
"actions": [
  {
"name": "compute_plan",
"class": "solr.ComputePlanAction",
"collections": "jobs-09022020"
  },
  {
"name": "execute_plan",
"class": "solr.ExecutePlanAction"
  }
]
  },
  "set-trigger": {
"name": "node_lost_trigger",
"event": "nodeLost",
"waitFor": "90s",
"preferredOperation": "DELETENODE",
"enabled": true
  }
}

However, when I add the new nodes, no new replicas are created. What am I
missing? Here is the output of the /admin/collections?action=COLSTATUS API
(minus the "shards" information):

"jobs-09022020":{
  "stateFormat":2,
  "znodeVersion":135,
  "properties":{
    "autoAddReplicas":"false",
    "maxShardsPerNode":"1",
    "nrtReplicas":"3",
    "policy":"search-blue",
    "pullReplicas":"2",
    "replicationFactor":"3",
    "router":{
      "field":"id",
      "name":"compositeId"
    },
    "tlogReplicas":"2"
  },
  "activeShards":6,
  "inactiveShards":0,
  "schemaNonCompliant":["(NONE)"]
}

My questions:
1. Should I expect to see individual shard replicas coming up as the nodes
come up or do all the nodes have to be available before the shard placement
can happen?
2. What am I missing on the collection to allow for new PULL replicas to
come up via the autoscaling trigger I have above?
Note: this is very similar to Shane Brook's question regarding "Replication
not occurring to newly added SOLRCloud nodes" previously posted to this
list.
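
If it helps diagnose, the autoscaling introspection endpoints (paths as I
understand them in 8.x, so treat this as a sketch) can show whether the
trigger ever fired and what placement plan was computed:

    curl http://localhost:8983/api/cluster/autoscaling/suggestions
    curl http://localhost:8983/api/cluster/autoscaling/history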

--
Drew(i...@gmail.com)
http://wyntermute.dyndns.org/blog/

-- I Drive Way Too Fast To Worry About Cholesterol.


Re: "timeAllowed" param with "numFound" having a count value but doc list is empty

2020-09-16 Thread Mark Robinson
Thanks Colvin!
All the responses were helpful.

Best
Mark

On Wed, Sep 16, 2020 at 4:06 AM Colvin Cowie 
wrote:

> Hi Mark,
>
> If queries taking 10 (or however many) seconds isn't acceptable, then
> either you need to a) prevent or optimize those queries, b) improve the
> performance of your index, c) use timeAllowed and accept that queries
> taking that long may fail or provide incomplete results, or d) a
> combination of the above.
>
> If you use timeAllowed then you have to accept the possibility that a query
> won't complete within the time allowed. Therefore you need to be able to
> deal with the possibility of the query failing or of it returning
> incomplete results.
>
> In our use of Solr, if a query exceeds timeAllowed we always treat it as a
> failure, even if it might have returned partial results, and return a 5xx
> response from our own server since we don't want to serve incomplete
> results ever. But you could attempt to return whatever results you do
> receive, perhaps with a warning message for your client indicating what
> happened.
>
>
> On Wed, 16 Sep 2020 at 02:05, Mark Robinson 
> wrote:
>
> > Thanks Dominique!
> > So is this parameter generally recommended or not. I wanted to try with a
> > value of 10s. We are not using it now.
> > My goal is to prevent a query from running more than 10s on the solr
> server
> > and choking it.
> >
> > What is the general recommendation.
> >
> > Thanks!
> > Mark
> >
> > On Tue, Sep 15, 2020 at 5:38 PM Dominique Bejean <
> > dominique.bej...@eolya.fr>
> > wrote:
> >
> > > Hi,
> > >
> > > 1. Yes, your analysis is correct
> > >
> > > 2. Yes, it can occur too with a very slow query.
> > >
> > > Regards
> > >
> > > Dominique
> > >
> > > Le mar. 15 sept. 2020 à 15:14, Mark Robinson 
> a
> > > écrit :
> > >
> > > > Hi,
> > > >
> > > > When in a sample query I used "timeAllowed" as low as 10mS, I got
> value
> > > for
> > > >
> > > > "numFound" as say 2000, but no docs were returned. But when I
> increased
> > > the
> > > >
> > > > value for timeAllowed to be in seconds, never got this scenario.
> > > >
> > > >
> > > >
> > > > I have 2 qns:-
> > > >
> > > > 1. Why does numFound have a value like say 2000 or even 6000 but no
> > > >
> > > > documents actually returned. During document collection is
> calculation
> > of
> > > >
> > > numFound done first and doc collection later? Is doc list empty
> > > because,by
> > > >
> > > > the time doc collection started the timeAllowed cut off took effect?
> > > >
> > > >
> > > >
> > > > 2. If I give timeAllowed a value say, 10s or above do you think the
> > above
> > > >
> > > > scenario of valid count displayed in numFound, but doc list empty can
> > > ever
> > > >
> > > > occur still, as there is more time before cut-off to retrieve at
> least
> > > one
> > > >
> > > > doc ?
> > > >
> > > >
> > > >
> > > > Thanks!
> > > >
> > > > Mark
> > > >
> > > >
> > >
> >
>


Need to update SOLR_HOME in the solr service script and getting errors

2020-09-16 Thread Victor Kretzer
My setup is two Solr nodes running on separate Azure Ubuntu 18.04 LTS VMs using
an external ZooKeeper ensemble.
I installed Solr 6.6.6 using the install script and then followed the steps for
enabling SSL. I am able to start Solr, add collections and the like using the
bin/solr script.

Example:
/opt/solr$ sudo bin/solr start -cloud -s cloud/test2 -force

However, if I restart the machine or attempt to start solr using the installed 
service, it naturally goes back to the default SOLR_HOME in the 
/etc/default/solr.in.sh script: "/var/solr/data"

I've tried updating SOLR_HOME to "/opt/solr/cloud/test2" but then when I start 
the service I see the following error on the Admin Dashboard:
SolrCore Initialization Failures
mycollection_shard1_replica1: 
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: 
/opt/solr-6.6.6/cloud/test2/mycollection_shard1_replica1/data/index/write.lock
Please check your logs for more information


I'm including what I believe to be the pertinent information from the logs 
below:
I suspect this is a permission issue because the solr user created by the 
install script isn't allowed access to  /opt/solr but I'm new to Linux and 
haven't completely wrapped my head around the way permissions work with it. Am 
I correct in guessing the cause of the error and, if so, how do I correct this 
so that the service can be used to run my instances?
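
If it is permissions, I'm guessing the fix looks something like the following
(untested; this assumes the 'solr' user/group that the install script creates,
and that the service reads SOLR_HOME from /etc/default/solr.in.sh):

    sudo chown -R solr:solr /opt/solr/cloud/test2

and in /etc/default/solr.in.sh:

    SOLR_HOME=/opt/solr/cloud/test2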

java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: 
Unable to create core [mycollection_shard1_replica1]
  at java.util.concurrent.FutureTask.report(FutureTask.java:122)
  at java.util.concurrent.FutureTask.get(FutureTask.java:192)
  at 
org.apache.solr.core.CoreContainer.lambda$load$6(CoreContainer.java:594)
  at 
com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
  at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.common.SolrException: Unable to create core 
[mycollection_shard1_replica1]
  at 
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:966)
  at 
org.apache.solr.core.CoreContainer.lambda$load$5(CoreContainer.java:565)
  at 
com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:197)
  ... 5 more
Caused by: org.apache.solr.common.SolrException: 
/opt/solr-6.6.6/cloud/test2/mycollection_shard1_replica1/data/index/write.lock
  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:977)
  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:830)
  at 
org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:950)
  ... 7 more
Caused by: java.nio.file.AccessDeniedException: 
/opt/solr-6.6.6/cloud/test2/mycollection_shard1_replica1/data/index/write.lock
  at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:84)
  at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
  at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
  at 
sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
  at java.nio.channels.FileChannel.open(FileChannel.java:287)
  at java.nio.channels.FileChannel.open(FileChannel.java:335)
  at 
org.apache.lucene.store.NativeFSLockFactory.obtainFSLock(NativeFSLockFactory.java:113)
  at 
org.apache.lucene.store.FSLockFactory.obtainLock(FSLockFactory.java:41)
  at 
org.apache.lucene.store.BaseDirectory.obtainLock(BaseDirectory.java:45)
  at 
org.apache.lucene.store.FilterDirectory.obtainLock(FilterDirectory.java:104)
  at 
org.apache.lucene.index.IndexWriter.isLocked(IndexWriter.java:4776)
  at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:709)
  at org.apache.solr.core.SolrCore.<init>(SolrCore.java:923)


Thanks for the help,
Victor


Re: "timeAllowed" param with "numFound" having a count value but doc list is empty

2020-09-16 Thread Mark Robinson
Thanks much Bram!

Best,
Mark

On Wed, Sep 16, 2020 at 3:59 AM Bram Van Dam  wrote:

> There are a couple of open issues related to the timeAllowed parameter.
> For instance it currently doesn't work in conjunction with the
> cursorMark parameter [1]. And on Solr 7 it doesn't work at all [2].
>
> But other than that, when users have a lot of query flexibility, it's a
> pretty good idea to limit them somehow. You don't want your users to
> blow up your servers.
>
> [1] https://issues.apache.org/jira/browse/SOLR-14413
>
> [2] https://issues.apache.org/jira/browse/SOLR-9882
>
>  - Bram
>
> On 16/09/2020 03:04, Mark Robinson wrote:
> > Thanks Dominique!
> > So is this parameter generally recommended or not. I wanted to try with a
> > value of 10s. We are not using it now.
> > My goal is to prevent a query from running more than 10s on the solr
> server
> > and choking it.
> >
> > What is the general recommendation.
> >
> > Thanks!
> > Mark
> >
> > On Tue, Sep 15, 2020 at 5:38 PM Dominique Bejean <
> dominique.bej...@eolya.fr>
> > wrote:
> >
> >> Hi,
> >>
> >> 1. Yes, your analysis is correct
> >>
> >> 2. Yes, it can occur too with a very slow query.
> >>
> >> Regards
> >>
> >> Dominique
> >>
> >> Le mar. 15 sept. 2020 à 15:14, Mark Robinson 
> a
> >> écrit :
> >>
> >>> Hi,
> >>>
> >>> When in a sample query I used "timeAllowed" as low as 10mS, I got value
> >> for
> >>>
> >>> "numFound" as say 2000, but no docs were returned. But when I increased
> >> the
> >>>
> >>> value for timeAllowed to be in seconds, never got this scenario.
> >>>
> >>>
> >>>
> >>> I have 2 qns:-
> >>>
> >>> 1. Why does numFound have a value like say 2000 or even 6000 but no
> >>>
> >>> documents actually returned. During document collection is calculation
> of
> >>>
> >>> numFound done first and doc collection later? Is doc list empty
> >> because,by
> >>>
> >>> the time doc collection started the timeAllowed cut off took effect?
> >>>
> >>>
> >>>
> >>> 2. If I give timeAllowed a value say, 10s or above do you think the
> above
> >>>
> >>> scenario of valid count displayed in numFound, but doc list empty can
> >> ever
> >>>
> >>> occur still, as there is more time before cut-off to retrieve at least
> >> one
> >>>
> >>> doc ?
> >>>
> >>>
> >>>
> >>> Thanks!
> >>>
> >>> Mark
> >>>
> >>>
> >>
> >
>
>


Re: Solr Cloud 8.5.1 - HDFS and Erasure Coding

2020-09-16 Thread Jörn Franke
I am not aware of a test. However, keep
in mind that HDFS support will be deprecated.

Additionally - you can configure erasure coding in HDFS on a per-folder /
per-file basis, so in the worst case you could just keep the folder for Solr
on the standard replicated HDFS mode.
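
E.g. something like this with the Hadoop 3 "hdfs ec" tool (the path and
policy name are only examples):

    hdfs ec -getPolicy -path /solr
    hdfs ec -unsetPolicy -path /solr
    hdfs ec -setPolicy -path /solr -policy RS-6-3-1024k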

Erasure coding has several limitations (e.g. around appending to files), so I
would be at least sceptical whether it works, and would do extensive testing.

> On 16.09.2020 at 15:41, Joe Obernberger wrote:
> 
> Anyone use Solr with Erasure Coding on HDFS?  Is that supported?
> 
> Thank you
> 
> -Joe
> 


Solr Cloud 8.5.1 - HDFS and Erasure Coding

2020-09-16 Thread Joe Obernberger

Anyone use Solr with Erasure Coding on HDFS?  Is that supported?

Thank you

-Joe



Re: Solr training

2020-09-16 Thread Charlie Hull
I do of course mean 'Group Discounts': you don't get a discount for 
being in a 'froup' sadly (I wasn't even aware that was a thing!)


Charlie

On 16/09/2020 13:26, Charlie Hull wrote:


Hi all,

We're running our Solr Think Like a Relevance Engineer training 6-9 Oct
- you can find out more & book tickets at 
https://opensourceconnections.com/training/solr-think-like-a-relevance-engineer-tlre/


The course is delivered over 4 half-days from 9am EST / 2pm BST / 3pm 
CET and is led by Eric Pugh who co-wrote the first book on Solr and is 
a Solr Committer. It's suitable for all members of the search team - 
search engineers, data scientists, even product owners who want to 
know how Solr search can be measured & tuned. Delivered by working 
relevance engineers the course features practical exercises and will 
give you a great foundation in how to use Solr to build great search.


The early bird discount expires end of this week so do book soon if
you're interested! Froup discounts also available. We're also running 
a more advanced course on Learning to Rank a couple of weeks later - 
you can find all our training courses and dates at 
https://opensourceconnections.com/training/


Cheers

Charlie

--
Charlie Hull
OpenSource Connections, previously Flax

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web:www.o19s.com



--
Charlie Hull
OpenSource Connections, previously Flax

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.o19s.com



Solr training

2020-09-16 Thread Charlie Hull

Hi all,

We're running our Solr Think Like a Relevance Engineer training 6-9 Oct -
you can find out more & book tickets at 
https://opensourceconnections.com/training/solr-think-like-a-relevance-engineer-tlre/


The course is delivered over 4 half-days from 9am EST / 2pm BST / 3pm 
CET and is led by Eric Pugh who co-wrote the first book on Solr and is a 
Solr Committer. It's suitable for all members of the search team - 
search engineers, data scientists, even product owners who want to know 
how Solr search can be measured & tuned. Delivered by working relevance 
engineers the course features practical exercises and will give you a 
great foundation in how to use Solr to build great search.


The early bird discount expires end of this week so do book soon if
you're interested! Froup discounts also available. We're also running a 
more advanced course on Learning to Rank a couple of weeks later - you 
can find all our training courses and dates at 
https://opensourceconnections.com/training/


Cheers

Charlie

--
Charlie Hull
OpenSource Connections, previously Flax

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.o19s.com



Solr waitForMerges() causing leaderless shard during shutdown

2020-09-16 Thread Ramsey Haddad (BLOOMBERG/ LONDON)
Hi Solr community,

We have been investigating an issue in our solr (7.5.0) setup where the 
shutdown of our solr node takes quite some time (3-4 minutes) during which we 
are effectively leaderless.
After investigating and digging deeper we were able to track it down to segment 
merges which happen before a solr core is closed.

 stack trace when killing the node 


Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.131-b11 mixed mode):

"Attach Listener" #150736 daemon prio=9 os_prio=0 tid=0x7f6da4002000 
nid=0x13292 waiting on condition [0x]
java.lang.Thread.State: RUNNABLE

"coreCloseExecutor-22-thread-1" #150733 prio=5 os_prio=0 tid=0x7f6d54020800 
nid=0x11b61 in Object.wait() [0x7f6c98564000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:4672)
    - locked <0x0005499908c0> (a org.apache.solr.update.SolrIndexWriter)
    at org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:2559)
    - locked <0x0005499908c0> (a org.apache.solr.update.SolrIndexWriter)
    at org.apache.lucene.index.IndexWriter.shutdown(IndexWriter.java:1036)
    at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1078)
    at org.apache.solr.update.SolrIndexWriter.close(SolrIndexWriter.java:286)
    at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:892)
    at org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:105)
    at org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:399)
    - locked <0x00054e150cc0> (a org.apache.solr.update.DefaultSolrCoreState)
    at org.apache.solr.update.SolrCoreState.decrefSolrCoreState(SolrCoreState.java:83)
    at org.apache.solr.core.SolrCore.close(SolrCore.java:1574)
    at org.apache.solr.core.SolrCores.lambda$close$0(SolrCores.java:141)
    at org.apache.solr.core.SolrCores$$Lambda$443/1058423472.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)




The situation is as follows -

1. The first thing that happens is the request handlers being closed at -
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/core/SolrCore.java#L1588

2. Then it tries to close the index writer via -
https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/core/SolrCore.java#L1610

3. When closing the index writer, it waits for any pending merges to finish at -
https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/index/IndexWriter.java#L1236

Now, if this waitForMerges() takes a long time (3-4 minutes), the instance 
won't shut down for the whole of that time, but because of *Step 1* it will stop
accepting any requests.

This becomes a problem when this node has a leader replica and it is stuck on 
waitForMerges() after closing its reqHandlers. We are in a situation where
the leader is not accepting requests but has not given away the leadership, so 
we are in a leaderless phase.


This issue triggers when we turn around our nodes, which causes a brief period
of leaderless shards and leads to potential data loss.

My question is -
1. How to avoid this situation, given that we have big segment sizes and
merging the largest segments is going to take some time?
We do not want to reduce the segment size as it will impact our search 
performance which is crucial.
2. Should Solr ideally not do the waitForMerges() step before closing the 
request handlers?


Merge Policy config and segment size -


<mergePolicyFactory class="org.apache.solr.index.SortingMergePolicyFactory">
  <str name="sort">time_of_arrival desc</str>
  <str name="wrapped.prefix">inner</str>
  <str name="inner.class">org.apache.solr.index.TieredMergePolicyFactory</str>
  <int name="inner.maxMergeAtOnce">16</int>
  <int name="inner.maxMergedSegmentMB">20480</int>
</mergePolicyFactory>




Re: Unexpected Performance decrease when upgrading Solr 5.5.2 to 8.5.2

2020-09-16 Thread Colvin Cowie
Hello,

Your images won't appear on the mailing list. You'll need to post them
elsewhere and link to them.

On Tue, 15 Sep 2020 at 09:44, Keene Chen  wrote:

> Hi Solr users community,
>
>
> We have been doing some performance tests on Solr 5.5.2 and Solr 8.5.2 as
> part of an upgrading process, and we have noticed some reduced performance
> for certain types of requests, particularly those that request a large
> number of rows, eg. 1. Would anyone have an explanation as to why the
> performance degrades, and what areas can be looked at in order to improve
> its performance?
>
> The performance test example below was carried out using 18000 of such
> queries, running at a constant throughput as specified by the label in the
> x-axis. “Rpm” here stands for “requests per minute”.
>
> Solr 8.5’s maximum response times are consistently better. However, the
> 95th and 99th percentile are comparably worse than Solr 5.5’s response
> times.
> [image: image.png]
> [image: image.png]
>
> The maximum throughput for solr 8.5 is reached sooner than Solr 5.5 at
> around 4 requests per second.
>
>
> [image: image.png]
> [image: image.png]
>
> Regards,
> Keene
>
> --
>
>
> Keene Chen  | Senior Software Developer
>
>
>


Re: "timeAllowed" param with "numFound" having a count value but doc list is empty

2020-09-16 Thread Colvin Cowie
Hi Mark,

If queries taking 10 (or however many) seconds isn't acceptable, then
either you need to a) prevent or optimize those queries, b) improve the
performance of your index, c) use timeAllowed and accept that queries
taking that long may fail or provide incomplete results, or d) a
combination of the above.

If you use timeAllowed then you have to accept the possibility that a query
won't complete within the time allowed. Therefore you need to be able to
deal with the possibility of the query failing or of it returning
incomplete results.

In our use of Solr, if a query exceeds timeAllowed we always treat it as a
failure, even if it might have returned partial results, and return a 5xx
response from our own server since we don't want to serve incomplete
results ever. But you could attempt to return whatever results you do
receive, perhaps with a warning message for your client indicating what
happened.
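
For illustration, a minimal SolrJ sketch of that pattern (the collection name
is a placeholder and "client" is assumed to be an existing SolrClient; the
partialResults flag in the response header is what we check for):

    SolrQuery query = new SolrQuery("*:*");
    query.set(CommonParams.TIME_ALLOWED, 10000); // 10 second budget
    QueryResponse rsp = client.query("mycollection", query);
    // Solr sets partialResults=true in the header when the budget was exceeded
    if (Boolean.TRUE.equals(rsp.getResponseHeader().get("partialResults"))) {
        // treat as a failure, or surface the partial docs with a warning
    }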


On Wed, 16 Sep 2020 at 02:05, Mark Robinson  wrote:

> Thanks Dominique!
> So is this parameter generally recommended or not. I wanted to try with a
> value of 10s. We are not using it now.
> My goal is to prevent a query from running more than 10s on the solr server
> and choking it.
>
> What is the general recommendation.
>
> Thanks!
> Mark
>
> On Tue, Sep 15, 2020 at 5:38 PM Dominique Bejean <
> dominique.bej...@eolya.fr>
> wrote:
>
> > Hi,
> >
> > 1. Yes, your analysis is correct
> >
> > 2. Yes, it can occur too with a very slow query.
> >
> > Regards
> >
> > Dominique
> >
> > Le mar. 15 sept. 2020 à 15:14, Mark Robinson  a
> > écrit :
> >
> > > Hi,
> > >
> > > When in a sample query I used "timeAllowed" as low as 10mS, I got value
> > for
> > >
> > > "numFound" as say 2000, but no docs were returned. But when I increased
> > the
> > >
> > > value for timeAllowed to be in seconds, never got this scenario.
> > >
> > >
> > >
> > > I have 2 qns:-
> > >
> > > 1. Why does numFound have a value like say 2000 or even 6000 but no
> > >
> > > documents actually returned. During document collection is calculation
> of
> > >
> > > numFound done first and doc collection later? Is doc list empty
> > because,by
> > >
> > > the time doc collection started the timeAllowed cut off took effect?
> > >
> > >
> > >
> > > 2. If I give timeAllowed a value say, 10s or above do you think the
> above
> > >
> > > scenario of valid count displayed in numFound, but doc list empty can
> > ever
> > >
> > > occur still, as there is more time before cut-off to retrieve at least
> > one
> > >
> > > doc ?
> > >
> > >
> > >
> > > Thanks!
> > >
> > > Mark
> > >
> > >
> >
>


Re: "timeAllowed" param with "numFound" having a count value but doc list is empty

2020-09-16 Thread Bram Van Dam
There are a couple of open issues related to the timeAllowed parameter.
For instance it currently doesn't work in conjunction with the
cursorMark parameter [1]. And on Solr 7 it doesn't work at all [2].

But other than that, when users have a lot of query flexibility, it's a
pretty good idea to limit them somehow. You don't want your users to
blow up your servers.

[1] https://issues.apache.org/jira/browse/SOLR-14413

[2] https://issues.apache.org/jira/browse/SOLR-9882

 - Bram

On 16/09/2020 03:04, Mark Robinson wrote:
> Thanks Dominique!
> So is this parameter generally recommended or not. I wanted to try with a
> value of 10s. We are not using it now.
> My goal is to prevent a query from running more than 10s on the solr server
> and choking it.
> 
> What is the general recommendation.
> 
> Thanks!
> Mark
> 
> On Tue, Sep 15, 2020 at 5:38 PM Dominique Bejean 
> wrote:
> 
>> Hi,
>>
>> 1. Yes, your analysis is correct
>>
>> 2. Yes, it can occur too with a very slow query.
>>
>> Regards
>>
>> Dominique
>>
>> Le mar. 15 sept. 2020 à 15:14, Mark Robinson  a
>> écrit :
>>
>>> Hi,
>>>
>>> When in a sample query I used "timeAllowed" as low as 10mS, I got value
>> for
>>>
>>> "numFound" as say 2000, but no docs were returned. But when I increased
>> the
>>>
>>> value for timeAllowed to be in seconds, never got this scenario.
>>>
>>>
>>>
>>> I have 2 qns:-
>>>
>>> 1. Why does numFound have a value like say 2000 or even 6000 but no
>>>
>>> documents actually returned. During document collection is calculation of
>>>
>>> numFound done first and doc collection later? Is doc list empty
>> because,by
>>>
>>> the time doc collection started the timeAllowed cut off took effect?
>>>
>>>
>>>
>>> 2. If I give timeAllowed a value say, 10s or above do you think the above
>>>
>>> scenario of valid count displayed in numFound, but doc list empty can
>> ever
>>>
>>> occur still, as there is more time before cut-off to retrieve at least
>> one
>>>
>>> doc ?
>>>
>>>
>>>
>>> Thanks!
>>>
>>> Mark
>>>
>>>
>>
>