Solr 8.0.0 Customized Indexing

2019-06-24 Thread Anuj Bhargava
Customized indexing - date specific

We have a huge database spanning more than 10 years. How can I index just
some of the records - say, the last 30 days? One of the fields in the
database is *date_upload*, which contains the date when the record was
uploaded.

Currently using cron to index -
curl -q "http://localhost:8983/solr/newsdata/dataimport?command=full-import&clean=true&commit=true"
>/dev/null 2>&1
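
Since *date_upload* already lives in the source database, one hedged option
is a DIH delta-import so the cron job only touches the last 30 days. A
minimal data-config.xml sketch - the entity/table name "news", the "id"
primary key, and the MySQL-style date arithmetic are all assumptions to
adapt:

  <entity name="news" pk="id"
          query="SELECT * FROM news"
          deltaQuery="SELECT id FROM news
                      WHERE date_upload &gt;= NOW() - INTERVAL 30 DAY"
          deltaImportQuery="SELECT * FROM news WHERE id='${dih.delta.id}'"/>

The cron entry would then call delta-import instead of full-import:

  curl -q "http://localhost:8983/solr/newsdata/dataimport?command=delta-import&commit=true" >/dev/null 2>&1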


Re: Solr filter query on text fields

2019-06-24 Thread Wei
Thanks Shawn! I didn't notice the asterisks were created during copy/paste -
one lesson learned :)
Does that mean that when fq is applied to a text field, it does a text match
against the analyzed field, just like q does in a query field? While for
string fields it is an exact match?
If it is a phrase query, what are the values for related parameters such as
ps?

Thanks,
Wei

On Mon, Jun 24, 2019 at 4:51 PM Shawn Heisey  wrote:

> On 6/24/2019 5:37 PM, Wei wrote:
> > <field name="description" type="..." indexed="true" stored="true"/>
>
> I'm assuming that the asterisks here are for emphasis, that they are not
> actually present.  This can be very confusing.  It is far better to
> relay the precise information and not try to emphasize anything.
>
> > For query q=*:*&fq=description:”ice cream”,  the filter query returns
> > matches for “ice cream bar”  and “vanilla ice cream” , but does not match
> > for “ice cold cream”.
> >
> > The results seem neither exact match nor phrase match. What's the
> expected
> > behavior for fq on text fields?  I have tried to look into the solr docs
> > but there is no clear explanation.
>
> If the quotes are present in what you actually sent to Solr, then that
> IS a phrase query.  And that is why it did not match your third example.
>
> Try one of these instead:
>
> q=*:*&fq=description:(ice cream)
>
> q=*:*&fq=(description:ice description:cream)
>
> Thanks,
> Shawn
>


Re: Solr filter query on text fields

2019-06-24 Thread Shawn Heisey

On 6/24/2019 5:37 PM, Wei wrote:
<field name="description" type="..." indexed="true" stored="true"/>

I'm assuming that the asterisks here are for emphasis, that they are not 
actually present.  This can be very confusing.  It is far better to 
relay the precise information and not try to emphasize anything.



For query q=*:*&fq=description:”ice cream”,  the filter query returns
matches for “ice cream bar”  and “vanilla ice cream” , but does not match
for “ice cold cream”.

The results seem neither exact match nor phrase match. What's the expected
behavior for fq on text fields?  I have tried to look into the solr docs
but there is no clear explanation.


If the quotes are present in what you actually sent to Solr, then that 
IS a phrase query.  And that is why it did not match your third example.


Try one of these instead:

q=*:*&fq=description:(ice cream)

q=*:*&fq=(description:ice description:cream)

Thanks,
Shawn
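
To make the distinction concrete, a hedged illustration (hypothetical core
name, quotes and spaces URL-encoded):

  # phrase query: the terms must appear adjacent and in order,
  # so "ice cold cream" does not match
  curl 'http://localhost:8983/solr/mycore/select?q=*:*&fq=description:%22ice%20cream%22'

  # boolean query: each term is matched independently,
  # so "ice cold cream" matches as well
  curl 'http://localhost:8983/solr/mycore/select?q=*:*&fq=description:(ice%20cream)'

Either way the fq runs through the field's query analyzer, just like q does;
the difference from a string field is the analysis, not the fq mechanism.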


Solr filter query on text fields

2019-06-24 Thread Wei
Hi,

I have always used Solr fq on string fields. Recently I need to apply
fq on one text field, defined as follows:


<field name="description" type="..." indexed="true" stored="true"/>
[fieldType and analyzer definition stripped by the mailing list archive]


For query q=*:*&fq=description:”ice cream”,  the filter query returns
matches for “ice cream bar”  and “vanilla ice cream” , but does not match
for “ice cold cream”.

The results seem neither exact match nor phrase match. What's the expected
behavior for fq on text fields?  I have tried to look into the solr docs
but there is no clear explanation.

Thanks,
Wei


Replication issue with version 0 index in SOLR 7.5

2019-06-24 Thread Patrick Bordelon
Hi,

We recently upgraded to Solr 7.5 in AWS; we had previously been running Solr
6.5. In our current configuration we have our applications broken into a
single-instance primary environment and a multi-instance replica
environment, each behind its own load balancer.

Until recently we've been able to reload the primary without the replicas
updating until there was a full index. However, when we upgraded to 7.5, we
started noticing that after terminating and rebuilding a primary instance,
the associated replicas would all start showing 0 documents in all indexes.
After some research we believe we've tracked the issue down to SOLR-11293.

SOLR-11293 changes: https://issues.apache.org/jira/browse/SOLR-11293

This fix changed the way the replication handler checks before updating a
replica when the primary has an empty index, whether from deleting the old
index or from terminating the instance.

This is the code as it was in the 6.5 replication handler:

  if (latestVersion == 0L) {
    if (forceReplication && commit.getGeneration() != 0) {
      // since we won't get the files for an empty index,
      // we just clear ours and commit
      RefCounted<IndexWriter> iw =
          solrCore.getUpdateHandler().getSolrCoreState().getIndexWriter(solrCore);
      try {
        iw.get().deleteAll();
      } finally {
        iw.decref();
      }
      SolrQueryRequest req = new LocalSolrQueryRequest(solrCore,
          new ModifiableSolrParams());
      solrCore.getUpdateHandler().commit(new CommitUpdateCommand(req, false));
    }
  }


Without forced replication, the replica won't perform the deleteAll
operation and will keep the old index until a new index version is created.

However, in 7.5 the code was changed to this:

  if (latestVersion == 0L) {
    if (commit.getGeneration() != 0) {
      // since we won't get the files for an empty index,
      // we just clear ours and commit
      log.info("New index in Master. Deleting mine...");
      RefCounted<IndexWriter> iw =
          solrCore.getUpdateHandler().getSolrCoreState().getIndexWriter(solrCore);
      try {
        iw.get().deleteAll();
      } finally {
        iw.decref();
      }
      assert TestInjection.injectDelayBeforeSlaveCommitRefresh();
      if (skipCommitOnMasterVersionZero) {
        openNewSearcherAndUpdateCommitPoint();
      } else {
        SolrQueryRequest req = new LocalSolrQueryRequest(solrCore,
            new ModifiableSolrParams());
        solrCore.getUpdateHandler().commit(new CommitUpdateCommand(req, false));
      }
    }
  }

With the removal of the forceReplication check, we believe the replica
always deletes its index when it detects that a new version-0 index has been
created.

This is a problem, as we can't afford to have active replicas with 0
documents on them in the event of a failure of the primary. Since we can't
control the termination of AWS instances, any primary outage risks
jeopardizing the replicas' viability.

Is there a way to restore this functionality in a current or future release?
We are willing to upgrade to a later version, including the latest, if it
will help resolve this problem.

If you suggest we use a load balancer health check to prevent this: we
already do. However, the load balancer type we are using (application) has a
feature that allows traffic through when all instances under it are failing.
This bypasses our health check and still allows the replicas to poll the
primary even when it's not fully loaded. We can't change load balancer
types, as there are other features we rely on that we can't give up right
now.
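
A hedged operational mitigation, rather than a code-level fix: the
replication handler supports disabling polling on replicas before the
primary is taken down, then re-enabling it once the rebuilt primary has a
full index again (core name hypothetical):

  # on each replica, before terminating/rebuilding the primary
  curl "http://replica-host:8983/solr/mycore/replication?command=disablepoll"

  # after the primary is rebuilt and fully indexed
  curl "http://replica-host:8983/solr/mycore/replication?command=enablepoll"

This does not cover unplanned terminations, which is the harder part of the
problem described above.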




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Solr Cloud Kerberos cookie rejected spnego

2019-06-24 Thread Rakesh Enjala
Hi Team,

We enabled SolrCloud 7.4.0 with Kerberos. While creating a collection we get
the error below:

org.apache.http.impl.auth.HttpAuthenticator; NEGOTIATE authentication
error: No valid credentials provided (Mechanism level: No valid credentials
provided (Mechanism level: Server not found in Kerberos database (7)))
org.apache.http.client.protocol.ResponseProcessCookies; Cookie rejected
[hadoop.auth="", version:0, domain:xxx.xxx.com, path:/, expiry: Illegal
domain attribute "". Domain of origin: "localhost"

We enabled krb5 debug and were able to find the actual problem: the sname is
HTTP/localh...@realm.com when it should be HTTP/<hostname>@DOMAIN1.COM, not
localhost.

solr.in.sh

SOLR_AUTH_TYPE="kerberos"
SOLR_AUTHENTICATION_OPTS="-DauthenticationPlugin=org.apache.solr.security.KerberosPlugin
-Djava.security.auth.login.config=/solr/jaas.conf
-Dsun.security.krb5.debug=true -Dsolr.kerberos.cookie.domain=<hostname>
-Dsolr.kerberos.name.rules=DEFAULT
-Dsolr.kerberos.principal=HTTP/<hostname>@DOMAIN1.COM
-Dsolr.kerberos.keytab=/solr/HTTP.keytab"

Please help me out!
*Regards,*
*Rakesh Enjala*
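
A hedged pointer rather than a confirmed fix: an sname of
HTTP/localh...@realm.com usually means the node is advertising itself as
localhost. Assuming that is what is happening here, pinning the FQDN in
solr.in.sh so it matches the HTTP service principal may help (hostname and
realm below are placeholders):

  # solr.in.sh sketch - substitute the real FQDN and realm
  SOLR_HOST="solr01.domain1.com"
  # and use the same FQDN in the Kerberos options:
  #   -Dsolr.kerberos.cookie.domain=solr01.domain1.com
  #   -Dsolr.kerberos.principal=HTTP/solr01.domain1.com@DOMAIN1.COM

  # verify the principal actually exists in the keytab
  klist -kt /solr/HTTP.keytab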




Solr 8.1.0: WordDelimiterGraphFilterFactory and StopFilterFactory interaction causes java.lang.ArrayIndexOutOfBoundsException

2019-06-24 Thread Arseny Shurunov
Query

textField:(and.and. catch)

causes

{
  "responseHeader":{
"status":500,
"QTime":3396},
  "error":{
"msg":"0",
"trace":"java.lang.ArrayIndexOutOfBoundsException: 0\n\tat
org.apache.lucene.util.QueryBuilder.newSynonymQuery(QueryBuilder.java:653)\n\tat
org.apache.solr.parser.SolrQueryParserBase.newSynonymQuery(SolrQueryParserBase.java:617)\n\tat
org.apache.lucene.util.QueryBuilder.analyzeGraphBoolean(QueryBuilder.java:533)\n\tat
org.apache.lucene.util.QueryBuilder.createFieldQuery(QueryBuilder.java:320)\n\tat
org.apache.lucene.util.QueryBuilder.createFieldQuery(QueryBuilder.java:240)\n\tat
org.apache.solr.parser.SolrQueryParserBase.newFieldQuery(SolrQueryParserBase.java:524)\n\tat
org.apache.solr.parser.QueryParser.newFieldQuery(QueryParser.java:62)\n\tat
org.apache.solr.parser.SolrQueryParserBase.getFieldQuery(SolrQueryParserBase.java:1122)\n\tat
org.apache.solr.parser.QueryParser.MultiTerm(QueryParser.java:593)\n\tat
org.apache.solr.parser.QueryParser.Query(QueryParser.java:142)\n\tat
org.apache.solr.parser.QueryParser.Clause(QueryParser.java:282)\n\tat
org.apache.solr.parser.QueryParser.Query(QueryParser.java:162)\n\tat
org.apache.solr.parser.QueryParser.TopLevelQuery(QueryParser.java:131)\n\tat
org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:260)\n\tat
org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:49)\n\tat
org.apache.solr.search.QParser.getQuery(QParser.java:173)\n\tat
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:159)\n\tat
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:272)\n\tat
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)\n\tat
org.apache.solr.core.SolrCore.execute(SolrCore.java:2566)\n\tat
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:756)\n\tat
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:542)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:397)\n\tat
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)\n\tat
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1602)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:540)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)\n\tat
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1588)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1345)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)\n\tat
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)\n\tat
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1557)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)\n\tat
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)\n\tat
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)\n\tat
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:220)\n\tat
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)\n\tat
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)\n\tat
org.eclipse.jetty.server.Server.handle(Server.java:502)\n\tat
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:364)\n\tat
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)\n\tat
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)\n\tat
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)\n\tat
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)\n\tat
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)\n\tat
java.lang.Thread.run(Thread.java:748)\n",
"code":500}}

The bug appeared in 8.1.0 due to a change in
Query newSynonymQuery(Term[] terms)
in
org.apache.lucene.util.QueryBuilder
combined with the field type definition below.

[field and fieldType definitions stripped by the mailing list archive]
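
The trace suggests newSynonymQuery was handed an empty term array:
WordDelimiterGraphFilterFactory splits a token like "and.and." into two
"and" tokens, StopFilterFactory then removes every token at that position,
and QueryBuilder.newSynonymQuery reads terms[0] from the now-empty array.
The poster's schema was lost in the archive; a hedged reconstruction of an
analyzer chain of the shape the subject line describes (not the exact
schema) would be:

  <fieldType name="textField" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterGraphFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    </analyzer>
  </fieldType>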



Re: Solr 6.5.1 SpellCheckCollator StringIndexOutOfBoundsException

2019-06-24 Thread Gonzalo Carracedo
Hi Erick,

It seems to be a very similar issue. However, no one has replied with a
solution for it yet.

Regards,
Gonzalo

On Fri, 21 Jun 2019 at 17:32, Erick Erickson 
wrote:

> Possibly https://issues.apache.org/jira/browse/SOLR-13360
>
> > On Jun 21, 2019, at 4:44 AM, Gonzalo Carracedo <
> gonzalo.carrac...@thecommercepartnership.com> wrote:
> >
> > StringIndexOutOfBoundsException
>
>


RE: Is Solr can do that ?

2019-06-24 Thread Bruno Mannina
Hi Toke,

Thanks for sharing this experience; it's very useful for me to get a first
overview of what I will need. To summarize, I will:
- learn about Tika
- ask a lot of questions, like the frequency of Solr data adds/updates
- determine the number of users
- size CPU/RAM/HDD
- run a first test with a representative sample

And of course get some good expertise :)

Thanks,
Bruno


-Message d'origine-
De : Toke Eskildsen [mailto:t...@kb.dk]
Envoyé : samedi 22 juin 2019 11:36
À : solr_user lucene_apache
Objet : Re: Is Solr can do that ?

Matheo Software Info  wrote:
> My question is very simple ☺ I would like to know if Solr can process
> around 30To of data (Pdf, Text, Word, etc…) ?

Simple answer: Yes. Assuming 30To means 30 terabytes.

> What is the best way to index this huge data ? several servers ?
> several shards ? other ?

As other participants have mentioned, it is hard to give numbers. What we
can do is share experience.

We are doing webarchive indexing and I guess there would be quite an overlap 
with your content as we also use Tika. One difference is that the images in a 
webarchive are quite cheap to index, so you'll probably need (relatively) more 
hardware than we use. Very roughly we used 40 CPU-years to index 600 (700? I 
forget) TB of data in one of our runs. Scaling to your 30TB this suggests 
something like 2 CPU-years, or a couple of months for a 16 core machine.
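
The scaling arithmetic, spelled out:

  40 CPU-years / 600 TB ≈ 0.067 CPU-years per TB
  0.067 CPU-years/TB × 30 TB ≈ 2 CPU-years
  2 CPU-years / 16 cores ≈ 1.5 months of wall-clock time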

This is just to get a ballpark: You will do yourself a huge favor by building a 
test-setup and process 1 TB or so of your data to get _your_ numbers, before 
you design your indexing setup. It is our experience that the analyzing part 
(Tika) takes much more power than the Solr indexing part: At our last run we 
had 30-40 CPU-cores doing Tika (and related analysis) feeding into a Solr 
running on a 4-core machine on spinning drives.


As for the Solr setup for search, you need to describe in detail what your
requirements are before we can give you suggestions. Is the index updated all
the time, in batches, or one-off? How many concurrent users? Are the searches
interactive or batch jobs? What kind of aggregations do you need?

In our setup we build separate collections that are merged to single
segments and never updated. Our use varies between very few interactive
users and a lot of batch jobs. Scaling this specialized setup to your corpus
size would require about 3TB of SSD, 64GB RAM and 4 CPU-cores, divided among
4 shards. You are likely to need quite a lot more than that, so this is just
to say that at this scale the use of the index matters _a lot_.

- Toke Eskildsen





RE: Is Solr can do that ?

2019-06-24 Thread Bruno Mannina
Hello Erick,

Well, I do not know Tika; I will of course study it.

Thanks for the info concerning SolrJ and Tika.

Bruno

-Message d'origine-
De : Erick Erickson [mailto:erickerick...@gmail.com]
Envoyé : vendredi 21 juin 2019 19:10
À : solr-user@lucene.apache.org
Objet : Re: Is Solr can do that ?

What Sam said.

Here’s something to get you started on how and why it’s better to use Tika
directly rather than shipping the docs to Solr and having
ExtractingRequestHandler do it on the Solr side:
https://lucidworks.com/2012/02/14/indexing-with-solrj/
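
The linked article's approach, sketched with SolrJ and Tika (hypothetical
core and field names; assumes solr-solrj and tika-parsers on the classpath):

  import java.io.InputStream;
  import java.nio.file.Files;
  import java.nio.file.Path;
  import java.nio.file.Paths;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.common.SolrInputDocument;
  import org.apache.tika.metadata.Metadata;
  import org.apache.tika.parser.AutoDetectParser;
  import org.apache.tika.sax.BodyContentHandler;

  public class TikaIndexer {
    public static void main(String[] args) throws Exception {
      Path file = Paths.get(args[0]);
      BodyContentHandler text = new BodyContentHandler(-1); // -1: no size limit
      Metadata meta = new Metadata();
      try (InputStream in = Files.newInputStream(file)) {
        // extract plain text client-side instead of on the Solr server
        new AutoDetectParser().parse(in, text, meta);
      }
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", file.toString());
      doc.addField("content", text.toString());
      try (HttpSolrClient solr = new HttpSolrClient.Builder(
          "http://localhost:8983/solr/mycore").build()) {
        solr.add(doc);
        solr.commit();
      }
    }
  }

This keeps the expensive Tika parsing off the Solr nodes, which matters at
the 30 TB scale discussed in this thread.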

Best,
Erick

> On Jun 21, 2019, at 9:56 AM, Samuel Kasimalla  wrote:
>
> Hi Bruno,
>
> Assuming you meant 30TB, the first step is to use the Tika parser and
> convert the rich documents into plain text.
>
> We need the number of documents; the unofficial word on the street is
> about 50 million documents per shard. Of course a lot of parameters are
> involved in this - it's a simple question, but the answer is not so simple :).
>
> Hope this helps.
>
> Thanks
> Sam
> https://www.linkedin.com/in/skasimalla/
>
> On Fri, Jun 21, 2019 at 12:49 PM Matheo Software Info <
> i...@matheo-software.com> wrote:
>
>> Dear Solr User,
>>
>>
>>
>> My question is very simple ☺ I would like to know if Solr can process
>> around 30To of data (Pdf, Text, Word, etc…) ?
>>
>>
>>
>> What is the best way to index this huge data ? several servers ?
>> several shards ? other ?
>>
>>
>>
>> Many thanks for your information,
>>
>>
>>
>>
>>
>> Cordialement, Best Regards
>>
>> Bruno Mannina
>>
>> www.matheo-software.com
>>
>> www.patent-pulse.com
>>
>> Tél. +33 0 970 738 743
>>
>> Mob. +33 0 634 421 817
>>
>> <#m_149119889610705423_DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>





RE: Is Solr can do that ?

2019-06-24 Thread Bruno Mannina
Hello Shawn,

Good news that Solr can do that.

I know that with 30TB of data, hardware will be the first thing to get.
Concerning expertise, that's the real problem for me.

First, I think I will do several tests to see how Solr works with non-XML
documents (I only have experience with XML documents).

Thanks,
Bruno

On 6/21/2019 10:32 AM, Matheo Software Info wrote:
> My question is very simple ☺ I would like to know if Solr can process
> around 30To of data (Pdf, Text, Word, etc.) ?
>
> What is the best way to index this huge data ? several servers ?
> several shards ? other ?

Sure, Solr can do that.  Whether you have enough resources or expertise
available to accomplish it is an entirely different question.

Handling that much data is likely going to require a LOT of expensive
hardware.  The index will almost certainly need to be sharded.  Knowing
exactly what numbers are involved is impossible with the information
available ... and even with more information, it will most likely require
experimentation with your actual data to find an optimal solution.

https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Thanks,
Shawn





RE: Is Solr can do that ?

2019-06-24 Thread Bruno Mannina
Hello Sam,

First, thanks for your answer.

I don't yet know the number of documents; I just know that they will be
Text, Pdf, Word, Xls, etc...
I will try to get more info about the number of documents.

I don't know Tika; I will investigate it.

Thanks,
Bruno


-Message d'origine-
De : Samuel Kasimalla [mailto:skasima...@gmail.com]
Envoyé : vendredi 21 juin 2019 18:56
À : solr-user@lucene.apache.org
Objet : Re: Is Solr can do that ?

Hi Bruno,

Assuming you meant 30TB, the first step is to use the Tika parser and
convert the rich documents into plain text.

We need the number of documents; the unofficial word on the street is about
50 million documents per shard. Of course a lot of parameters are involved
in this - it's a simple question, but the answer is not so simple :).

Hope this helps.

Thanks
Sam
https://www.linkedin.com/in/skasimalla/

On Fri, Jun 21, 2019 at 12:49 PM Matheo Software Info < 
i...@matheo-software.com> wrote:

> Dear Solr User,
>
>
>
> My question is very simple ☺ I would like to know if Solr can process
> around 30To of data (Pdf, Text, Word, etc…) ?
>
>
>
> What is the best way to index this huge data ? several servers ?
> several shards ? other ?
>
>
>
> Many thanks for your information,
>
>
>
>
>
> Cordialement, Best Regards
>
> Bruno Mannina
>
> www.matheo-software.com
>
> www.patent-pulse.com
>
> Tél. +33 0 970 738 743
>
> Mob. +33 0 634 421 817
>
>





Re: SolrCloud: Configured socket timeouts not reflecting

2019-06-24 Thread Rahul Goswami
Hi Gus,

I have created a pull request for SOLR-12550
(https://issues.apache.org/jira/browse/SOLR-12550) and updated the affected
Solr version (7.2.1) in the comments. The provided fix is on branch_7_2. I
haven't tried reproducing the issue on the latest version, but I see that
the code for this part is different on master.

Regards,
Rahul
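
For reference, a sketch of setting these timeouts in solr.xml instead of via
-D system properties (values in milliseconds; one hour shown):

  <solr>
    <solrcloud>
      <int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:3600000}</int>
      <int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>
    </solrcloud>
    <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
      <int name="socketTimeout">${socketTimeout:3600000}</int>
      <int name="connTimeout">${connTimeout:60000}</int>
    </shardHandlerFactory>
  </solr>

As the thread establishes, on 7.2.1 the distributed-update socket timeout
was being overridden by a hard-coded default in ConcurrentUpdateSolrClient
regardless of this configuration, which is what the patch addresses.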

On Thu, Jun 20, 2019 at 8:22 PM Rahul Goswami  wrote:

> Hi Gus,
> Thanks for the response and for referencing the umbrella JIRA for these
> kinds of issues. I see that it won't solve the problem since the builder object
> which is used to instantiate a ConcurrentUpdateSolrClient itself doesn't
> contain the timeout values. I did create a local solr-core binary to try
> the patch nevertheless, but it didn't help as I anticipated. I'll update
> the JIRA and submit a patch.
>
> Thank you,
> Rahul
>
> On Thu, Jun 20, 2019 at 11:35 AM Gus Heck  wrote:
>
>> Hi Rahul,
>>
>> Did you try the patch in that issue? Also food for thought:
>> https://issues.apache.org/jira/browse/SOLR-13457
>>
>> -Gus
>>
>> On Tue, Jun 18, 2019 at 5:52 PM Rahul Goswami 
>> wrote:
>>
>> > Hello,
>> >
>> > I was looking into the code to try to get to the root of this issue.
>> Looks
>> > like this is an issue after all (as of 7.2.1 which is the version we are
>> > using), but wanted to confirm on the user list before creating a JIRA. I
>> > found that the soTimeout property of ConcurrentUpdateSolrClient class
>> (in
>> > the code referenced below) remains null and hence the default of 60
>> ms
>> > is set as the timeout in HttpPost class instance variable "method".
>> >
>> >
>> https://github.com/apache/lucene-solr/blob/e6f6f352cfc30517235822b3deed83df1ee144c6/solr/solrj/src/java/org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrClient.java#L334
>> >
>> >
>> > When the call is finally made in the below line, the Httpclient does
>> > contain the configured timeout (as in solr.xml or
>> -DdistribUpdateSoTimeout)
>> > but gets overridden by the hard default of 600000 in the "method"
>> > parameter of the execute call.
>> >
>> >
>> >
>> https://github.com/apache/lucene-solr/blob/e6f6f352cfc30517235822b3deed83df1ee144c6/solr/solrj/src/java/org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrClient.java#L348
>> >
>> >
>> > The hard default of 600000 is set here:
>> >
>> >
>> https://github.com/apache/lucene-solr/blob/e6f6f352cfc30517235822b3deed83df1ee144c6/solr/solrj/src/java/org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrClient.java#L333
>> >
>> >
>> > I tried to create a local patch with the below fix which works fine:
>> >
>> >
>> https://github.com/apache/lucene-solr/blob/86fe24cbef238d2042d68494bd94e2362a2d996e/solr/core/src/java/org/apache/solr/update/StreamingSolrClients.java#L69
>> >
>> >
>> >
>> > client = new ErrorReportingConcurrentUpdateSolrClient.Builder(url, req,
>> > errors)
>> >   .withHttpClient(httpClient)
>> >   .withQueueSize(100)
>> >   .withSocketTimeout(getSocketTimeout(req))
>> >   .withThreadCount(runnerCount)
>> >   .withExecutorService(updateExecutor)
>> >   .alwaysStreamDeletes()
>> >   .build();
>> >
>> > private int getSocketTimeout(SolrCmdDistributor.Req req) {
>> > if(req==null) {
>> >   return UpdateShardHandlerConfig.DEFAULT_DISTRIBUPDATESOTIMEOUT;
>> > }
>> >
>> > return
>> >
>> >
>> req.cmd.req.getCore().getCoreContainer().getConfig().getUpdateShardHandlerConfig().getDistributedSocketTimeout();
>> >   }
>> >
>> > I found this open JIRA on this issue:
>> >
>> >
>> >
>> https://issues.apache.org/jira/browse/SOLR-12550?jql=text%20~%20%22distribUpdateSoTimeout%22
>> >
>> >
>> > Should I update the JIRA with this ?
>> >
>> > Thanks,
>> > Rahul
>> >
>> >
>> >
>> >
>> > On Thu, Jun 13, 2019 at 12:00 AM Rahul Goswami 
>> > wrote:
>> >
>> > > Hello,
>> > >
>> > > I am running Solr 7.2.1 in cloud mode. To overcome a setup hardware
>> > > bottleneck, I tried to configure distribUpdateSoTimeout and
>> socketTimeout
>> > > to a value greater than the default 10 mins. I did this by passing
>> these
>> > as
>> > > system properties at Solr start up time (-DdistribUpdateSoTimeout and
>> > > -DsocketTimeout  ). The Solr admin UI shows these values in the
>> Dashboard
>> > > args section. As a test, I tried setting each of them to one hour
>> > > (3600000). However, I start seeing socket read timeouts within a few
>> > > mins.
>> > > Looks like the values are not taking effect. What am I missing? If
>> this
>> > is
>> > > a known issue, is there a JIRA for it ?
>> > >
>> > > Thanks,
>> > > Rahul
>> > >
>> >
>>
>>
>> --
>> http://www.needhamsoftware.com (work)
>> http://www.the111shift.com (play)
>>
>