Re: Production Issue: TIMED_WAITING - Will net.ipv4.tcp_tw_reuse=1 help?

2020-08-11 Thread Doss
Hi Dominique,

Our issue looks similar to the one discussed here:
https://github.com/eclipse/jetty.project/issues/4105

What are your views on this?

Thanks,
Mohandoss.


Re: Production Issue: TIMED_WAITING - Will net.ipv4.tcp_tw_reuse=1 help?

2020-08-10 Thread Doss
Hi Dominique,

Thanks for the response.

I don't think I would use JVM version 14. OpenJDK 11 is in my opinion the
best choice among the LTS versions.

>> We will try changing it.

You changed a lot of default values. Any specific reasons? It seems very
aggressive!

>> Our product team wants data to be reflected in near real time.
mergePolicyFactory, mergeScheduler: these are based on our oldest SOLR
cluster, where tweaking these parameters gave good results.

You have to analyze GC on all nodes!

>> I checked the GC on the other nodes and found no issues. I shared the GC
report of the node that gets into trouble very frequently.

Your heap is very big. Judging by the full GC frequency, I don't think you
really need such a big heap for indexing only. Maybe you will once you start
running queries.

>> Heap sizing is based on the select requests we are expecting. We expect
around 10 to 15 million per day. We plan to increase CPU before routing
select traffic.
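
To put the expected select volume in perspective, here is a small back-of-the-envelope sketch that converts requests per day into an average rate per second. The class and method names are invented for illustration, and the result is only a daily average: 10 to 15 million per day works out to roughly 116 to 174 requests per second, and real peaks will be higher.

```java
final class SelectRateEstimate {
    // Seconds in one day: 24 h * 60 min * 60 s = 86,400.
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Average requests per second for a given daily volume.
    // Assumption: traffic is spread evenly, which real traffic is not.
    static double averagePerSecond(long requestsPerDay) {
        return (double) requestsPerDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        long low = 10_000_000L;   // lower bound quoted in the message
        long high = 15_000_000L;  // upper bound quoted in the message
        System.out.printf("low:  %.1f req/s on average%n", averagePerSecond(low));
        System.out.printf("high: %.1f req/s on average%n", averagePerSecond(high));
    }
}
```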

Did you check your network performance?

>> We did check the sar reports but could not pinpoint an issue. We use a
10 Gbps connection. Is there any SOLR metrics API that gives network-related
information? Please suggest other ways to dig into this further.

Did you check the Zookeeper logs?

>> We never looked at the Zookeeper logs. We will check and share them. Is
there any particular kind of information to watch out for?

Regards,
Doss



Re: Production Issue: TIMED_WAITING - Will net.ipv4.tcp_tw_reuse=1 help?

2020-08-10 Thread Dominique Bejean
Doss,

See below.

Dominique


On Mon, Aug 10, 2020 at 17:41, Doss  wrote:

> Hi Dominique,
>
> Thanks for your response. Please find the details below, and do let me know if
> I missed anything.
>
>
> *- hardware architecture and sizing*
> >> CentOS 7, VMs, 4 CPUs, 66GB RAM, 16GB heap, 250GB SSD
>
>
> *- JVM version / settings*
> >> Red Hat, Inc. OpenJDK 64-Bit Server VM, version:"14.0.1 14.0.1+7" -
> Default Settings including GC
>

I don't think I would use JVM version 14. OpenJDK 11 is in my opinion the
best choice among the LTS versions.


>
> *- Solr settings*
> >> softCommit: 15000 (15 sec), autoCommit: 30 (5 mins)
> <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
>   <int name="maxMergeAtOnce">30</int> 100 30.0
> </mergePolicyFactory>
> <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
>   <int name="maxMergeCount">18</int>
>   <int name="maxThreadCount">6</int>
> </mergeScheduler>
>

You changed a lot of default values. Any specific reasons? It seems very
aggressive!


>
>
> *- collections and queries information   *
> >> One collection with 4 shards and 3 replicas, 3.5 million records, 150
> columns (mostly integer fields). Average doc size is 350kb. Inserts/updates:
> 0.5 million, spread across the whole day (peak time being 6PM to 10PM);
> selects have not started yet. Once a day we do a delta import of certain
> multivalued fields with a good amount of data.
>
> *- gc logs or gceasy results*
>
> Easy GC Report says GC health is good, one server's gc report:
> https://drive.google.com/file/d/1C2SqEn0iMbUOXnTNlYi46Gq9kF_CmWss/view?usp=sharing
> CPU Load Pattern:
> https://drive.google.com/file/d/1rjRMWv5ritf5QxgbFxDa0kPzVlXdbySe/view?usp=sharing
>
>
You have to analyze GC on all nodes!
Your heap is very big. Judging by the full GC frequency, I don't think you
really need such a big heap for indexing only. Maybe you will once you start
running queries.

Did you check your network performance?
Did you check the Zookeeper logs?




Re: Production Issue: TIMED_WAITING - Will net.ipv4.tcp_tw_reuse=1 help?

2020-08-10 Thread Doss
Hi Dominique,

Thanks for your response. Please find the details below, and do let me know if
I missed anything.


*- hardware architecture and sizing*
>> CentOS 7, VMs, 4 CPUs, 66GB RAM, 16GB heap, 250GB SSD


*- JVM version / settings*
>> Red Hat, Inc. OpenJDK 64-Bit Server VM, version:"14.0.1 14.0.1+7" -
Default Settings including GC


*- Solr settings*
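>> softCommit: 15000 (15 sec), autoCommit: 30 (5 mins)
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">30</int> 100 30.0
</mergePolicyFactory>
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxMergeCount">18</int>
  <int name="maxThreadCount">6</int>
</mergeScheduler>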


*- collections and queries information   *
>> One collection with 4 shards and 3 replicas, 3.5 million records, 150
columns (mostly integer fields). Average doc size is 350kb. Inserts/updates:
0.5 million, spread across the whole day (peak time being 6PM to 10PM);
selects have not started yet. Once a day we do a delta import of certain
multivalued fields with a good amount of data.

*- gc logs or gceasy results*

Easy GC Report says GC health is good, one server's gc report:
https://drive.google.com/file/d/1C2SqEn0iMbUOXnTNlYi46Gq9kF_CmWss/view?usp=sharing
CPU Load Pattern:
https://drive.google.com/file/d/1rjRMWv5ritf5QxgbFxDa0kPzVlXdbySe/view?usp=sharing



Thanks,
Doss.





Re: Production Issue: TIMED_WAITING - Will net.ipv4.tcp_tw_reuse=1 help?

2020-08-10 Thread Dominique Bejean
Hi Doss,

Seeing a lot of connections in the TCP TIME_WAIT state is common on
infrastructure with heavy TCP traffic, as in a LAMP stack when the Apache
server can no longer connect to the MySQL/MariaDB database.
In that case, tweaking net.ipv4.tcp_tw_reuse is a possible solution (but
never net.ipv4.tcp_tw_recycle, as you suggested in your previous post). This
is well explained in this great article:
https://vincent.bernat.ch/en/blog/2014-tcp-time-wait-state-linux
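
As a rough way to check whether TIME_WAIT sockets are piling up at all, the sketch below counts them from the Linux /proc/net/tcp table, where the fourth column holds the socket state in hex and 06 is TIME_WAIT. The class and method names are made up for illustration, only IPv4 sockets are covered (IPv6 sockets live in /proc/net/tcp6), and this is a quick diagnostic sketch, not a monitoring tool.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

final class TimeWaitCounter {
    // Hex value of the TIME_WAIT state in the "st" column of /proc/net/tcp.
    static final String TIME_WAIT_HEX = "06";

    // Counts TIME_WAIT entries in the given lines of a /proc/net/tcp dump.
    // The first line is the column header and is skipped.
    static long countTimeWait(List<String> procNetTcpLines) {
        return procNetTcpLines.stream()
                .skip(1)
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .map(line -> line.split("\\s+"))
                // Columns: sl, local_address, rem_address, st, ...
                .filter(cols -> cols.length > 3 && cols[3].equalsIgnoreCase(TIME_WAIT_HEX))
                .count();
    }

    public static void main(String[] args) throws IOException {
        Path table = Paths.get("/proc/net/tcp");
        if (Files.isReadable(table)) {
            System.out.println("TIME_WAIT sockets (IPv4): "
                    + countTimeWait(Files.readAllLines(table)));
        } else {
            System.out.println("/proc/net/tcp is not readable on this system");
        }
    }
}
```

If the count stays low while the problem is happening, the TCP TIME_WAIT state is probably not the bottleneck and net.ipv4.tcp_tw_reuse is unlikely to change much.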

However, in general and more specifically in your case, I would investigate
the root cause of your issue rather than look for a workaround.

Can you provide more information about your use case (so far we know: 3-node
SOLR 8.3.1 NRT + 3-node Zookeeper ensemble)?

   - hardware architecture and sizing
   - JVM version / settings
   - Solr settings
   - collections and queries information
   - gc logs or gceasy results

Regards

Dominique





Re: Production Issue: TIMED_WAITING - Will net.ipv4.tcp_tw_reuse=1 help?

2020-08-10 Thread Doss
Hi,

In the Solr 8.3.1 source I see the following, which I assume could be the
reason for the error "Max requests queued per destination 3000 exceeded for
HttpDestination":

solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
   private static final int MAX_OUTSTANDING_REQUESTS = 1000;
solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
 available = new Semaphore(MAX_OUTSTANDING_REQUESTS, false);
solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java:
 return MAX_OUTSTANDING_REQUESTS * 3;

How can I increase this?
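
To make the arithmetic behind the quoted lines concrete, here is a minimal sketch of a semaphore-based limiter built from the same two expressions. The class OutstandingRequestLimiter and its methods are hypothetical and are not the Http2SolrClient code; only the constant 1000, the non-fair Semaphore and the "* 3" factor come from the grep output above. It shows that 1000 * 3 gives the 3000 seen in the error message, and how a fixed number of permits caps the requests in flight.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

final class OutstandingRequestLimiter {
    // Value taken from the grep output quoted above.
    static final int MAX_OUTSTANDING_REQUESTS = 1000;

    // One permit per in-flight request; non-fair, as in the quoted line.
    private final Semaphore available = new Semaphore(MAX_OUTSTANDING_REQUESTS, false);

    // Same expression as the quoted "return MAX_OUTSTANDING_REQUESTS * 3;".
    static int maxQueuedPerDestination() {
        return MAX_OUTSTANDING_REQUESTS * 3;
    }

    // Tries to reserve a slot for one request; false if none frees up in time.
    boolean tryBegin(long timeoutMillis) {
        try {
            return available.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            return false;
        }
    }

    // Releases the slot once the request has completed.
    void end() {
        available.release();
    }

    int freeSlots() {
        return available.availablePermits();
    }

    public static void main(String[] args) {
        OutstandingRequestLimiter limiter = new OutstandingRequestLimiter();
        System.out.println("queue limit: " + maxQueuedPerDestination());
        if (limiter.tryBegin(100)) {
            System.out.println("free slots while one request runs: " + limiter.freeSlots());
            limiter.end();
        }
    }
}
```

Because the constant is declared private static final in the quoted source, raising it would presumably mean rebuilding SolrJ; whether a larger limit would fix anything or just delay the error is a separate question.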



Production Issue: TIMED_WAITING - Will net.ipv4.tcp_tw_reuse=1 help?

2020-08-09 Thread Doss
Hi,

We have a 3-node SOLR (8.3.1 NRT) cluster plus a 3-node Zookeeper ensemble. Now
and then we hit "Max requests queued per destination 3000 exceeded for
HttpDestination".

After a restart everything works fine until the next occurrence. Once the
problem has occurred we see a great many TIMED_WAITING threads:

Server 1:
   *7722* threads are in TIMED_WAITING
   ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@151d5f2f")
Server 2:
   *4046* threads are in TIMED_WAITING
   ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@1e0205c3")
Server 3:
   *4210* threads are in TIMED_WAITING
   ("lock":"java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5ee792c0")
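
For reference, counts like these can be produced by grouping a JVM's threads by state. The sketch below does this for the JVM it runs in, using only standard JDK calls; the class name ThreadStateCounter is invented for illustration, and inspecting a running Solr node would instead need a thread dump (for example from jstack) or the Solr admin UI. Note that TIMED_WAITING is a Java thread state (a thread waiting with a timeout), which is a different thing from the TCP TIME_WAIT socket state that net.ipv4.tcp_tw_reuse affects.

```java
import java.util.EnumMap;
import java.util.Map;

final class ThreadStateCounter {
    // Groups all live threads of the current JVM by their Thread.State.
    static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            counts.merge(thread.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints one line per state that has at least one thread.
        countByState().forEach((state, count) ->
                System.out.println(state + ": " + count));
    }
}
```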

Please suggest whether net.ipv4.tcp_tw_reuse=1 will help, or how we can
increase the 3000 limit.

Sorry, since I haven't got any response to my previous query, I am
creating this as a new thread.

Thanks,
Mohandoss.