Re: Frequent recovery of nodes in SolrCloud

2014-10-16 Thread Sachin Kale
Also, the PingRequestHandler is configured with a healthcheck file:

server-enabled.txt
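
(That is, roughly the stock handler definition from the 4.x example
solrconfig.xml; only the healthcheckFile name is confirmed above, the
invariants below are the shipped defaults and an assumption on my part:)

<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <lst name="invariants">
    <str name="q">solrpingquery</str>
  </lst>
  <!-- presence of this file toggles the node in/out of rotation -->
  <str name="healthcheckFile">server-enabled.txt</str>
</requestHandler>

HAProxy can then probe each node with something like "option httpchk GET
/solr/admin/ping" and take it out of rotation when the handler reports it
disabled.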


On Fri, Oct 17, 2014 at 9:07 AM, Sachin Kale  wrote:

> On the ZooKeeper side, we have the following configuration:
> tickTime=2000
> dataDir=/var/lib/zookeeper
> clientPort=2181
> initLimit=5
> syncLimit=2
> server.1=192.168.70.27:2888:3888
> server.2=192.168.70.64:2889:3889
> server.3=192.168.70.26:2889:3889
>
> Also, in solr.xml, we have zkClientTimeout set to 30000.


Re: Frequent recovery of nodes in SolrCloud

2014-10-16 Thread Sachin Kale
On the ZooKeeper side, we have the following configuration:
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=192.168.70.27:2888:3888
server.2=192.168.70.64:2889:3889
server.3=192.168.70.26:2889:3889

Also, in solr.xml, we have zkClientTimeout set to 30000.
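
For reference, in the new-style solr.xml this setting lives in the
<solrcloud> section, roughly as below; the surrounding entries are the
stock ones, not necessarily our exact file:

<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${jetty.port:8983}</int>
    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
  </solrcloud>
</solr>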

On Fri, Oct 17, 2014 at 7:27 AM, Erick Erickson 
wrote:

> And what is your ZooKeeper timeout? When it's too short, that can lead
> to this behavior.
>
> Best,
> Erick


Re: Frequent recovery of nodes in SolrCloud

2014-10-16 Thread Sachin Kale
- Why do you have to keep two nodes on some machines?
    - These are very powerful machines (32 cores, 64GB RAM) and our index
size is only 1GB. We are allocating 7GB to each JVM, so we thought it would
be OK to have two instances on the same machine.

- Physical hardware or virtual machines?
    - Physical hardware.

- What is the size of this index?
    - 1GB.

- Is this all on a local network or are there links with potential outages
or failures in between?
    - Local network.

- What is the query load?
    - 10K requests per minute (roughly 170 queries per second).

- Have you had a look at garbage collection?
    - GC time is generally 5-10%. I have attached a screenshot.
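
(For a closer look, GC activity can also be logged; with the HotSpot JVM
the usual flags are something like the following, where the heap size and
log path are placeholders rather than our actual startup script:)

java -Xms7g -Xmx7g \
     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
     -XX:+PrintGCApplicationStoppedTime \
     -Xloggc:/var/log/solr/gc.log \
     -jar start.jar

Full-GC pauses approaching the ZooKeeper session timeout would explain
nodes dropping into recovery.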

- Do you use the internal ZooKeeper?
    - No. We have set up an external ZooKeeper ensemble with 3 instances.
The ZooKeeper configuration is as follows:

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=192.168.70.27:2888:3888
server.2=192.168.70.64:2889:3889
server.3=192.168.70.26:2889:3889

Also, in solr.xml, we have zkClientTimeout set to 30000.

- How many nodes?
    - 3.

- Any observers?
    - I don't know what observers are. Can you please explain?
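
(For reference: an observer is a non-voting ZooKeeper ensemble member that
serves reads without adding to write-quorum cost. One would be declared in
zoo.cfg roughly as follows; the fourth server here is hypothetical:)

peerType=observer                           # only in the observer's own zoo.cfg
server.4=192.168.70.99:2890:3890:observer   # in every node's server list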

- What kind of load does ZooKeeper show?
    - Load looks normal, I guess. I need to double-check.

- How much RAM do these nodes have available?
    - Each Solr node has 7GB allocated. For ZooKeeper, we have not set the
heap size explicitly.

- Do some servers get into swapping?
    - Not sure. How do I check that?
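
(On Linux, a quick way to check is to look at swap usage and swap-in/out
rates, e.g.:)

free -m        # the Swap: row shows total/used/free swap in MB
vmstat 1 5     # nonzero si/so columns mean pages are actively swapping
swapon -s      # lists active swap devices and their usage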


On Fri, Oct 17, 2014 at 2:04 AM, "Jürgen Wagner (DVT)" <
juergen.wag...@devoteam.com> wrote:

>  Hello,
>   you have one shard and 11 replicas? Hmm...
>
> - Why do you have to keep two nodes on some machines?
> - Physical hardware or virtual machines?
> - What is the size of this index?
> - Is this all on a local network or are there links with potential outages
> or failures in between?
> - What is the query load?
> - Have you had a look at garbage collection?
> - Do you use the internal Zookeeper?
> - How many nodes?
> - Any observers?
> - What kind of load does Zookeeper show?
> - How much RAM do these nodes have available?
> - Do some servers get into swapping?
> - ...
>
> How about some more details in terms of sizing and topology?
>
> Cheers,
> --Jürgen
>
>
> On 16.10.2014 18:41, sachinpkale wrote:
>
> Hi,
>
> Recently we have shifted to SolrCloud (4.10.1) from a traditional
> Master-Slave configuration. We have only one collection, and it has only
> one shard. The cloud cluster contains 12 nodes in total (on 8 machines;
> on 4 of those machines we run two instances each), out of which one is
> the leader.
>
> Whenever I look at the cluster status using http://:/solr/#/~cloud, it
> shows at least one node (sometimes 2-3) as recovering. We are using an
> HAProxy load balancer, and it too frequently shows nodes as recovering.
> This happens for all nodes in the cluster.
>
> What could be the problem here? How do I check this in the logs?
>


Manual leader election in SolrCloud

2014-10-13 Thread Sachin Kale
Is it possible to elect the leader manually in SolrCloud 4.10.1?


-Sachin-


Re: Master-Slave setup using SolrCloud

2014-10-04 Thread Sachin Kale
Apparently, there was a bug in Solr 4.10.0 that was causing the
NullPointerExceptions: SOLR-6501
<https://issues.apache.org/jira/browse/SOLR-6501>.
We have updated our production Solr to 4.10.1.




Re: Master-Slave setup using SolrCloud

2014-10-02 Thread Sachin Kale
If I look into the logs, many times I get only the following line, without
any stack trace:

ERROR - 2014-10-02 19:35:25.516; org.apache.solr.common.SolrException;
java.lang.NullPointerException

These exceptions do not come continuously; they start once every 10-15
minutes. But once they start, there are 800-1000 such exceptions one after
another. Could this be related to cache warmup?

I can provide the following information regarding the setup:
We are now using Solr 4.10.0.
Memory allocated to each Solr instance is 7GB. I guess that is more than
sufficient for a 1GB index, right?
Indexes are stored on a normal, local filesystem.
I am using three caches:
Query result cache: size 4096, autowarmCount 2048
Filter cache: size 8192, autowarmCount 4096
Document cache: size 4096
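
(In solrconfig.xml terms that corresponds roughly to the following; the
cache class and initialSize values are assumptions:)

<queryResultCache class="solr.LRUCache" size="4096"
                  initialSize="4096" autowarmCount="2048"/>
<filterCache class="solr.LRUCache" size="8192"
             initialSize="8192" autowarmCount="4096"/>
<documentCache class="solr.LRUCache" size="4096"
               initialSize="4096" autowarmCount="0"/>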

I am experimenting with maxTime for both soft and hard commits, after
referring to the following:
http://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Hence, I set the following:


<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:900000}</maxTime>
</autoSoftCommit>


Also, we are getting the following warning many times:

java.lang.NumberFormatException: For input string: "5193.0"

Earlier we were on Solr 4.4.0, and when we upgraded to 4.10.0, we pointed it
at the same index we had been using with 4.4.0.
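
(If the old 4.4-format segments are suspected, one option is to rewrite the
index in the current format before 4.10.0 opens it, using Lucene's
IndexUpgrader; the index path here is illustrative:)

java -cp lucene-core-4.10.0.jar \
     org.apache.lucene.index.IndexUpgrader -verbose /var/solr/collection1/data/index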

On Thu, Oct 2, 2014 at 7:11 PM, Shawn Heisey  wrote:

> On 10/2/2014 6:58 AM, Sachin Kale wrote:
> > We are trying to move our traditional master-slave Solr configuration to
> > SolrCloud. As our index size is very small (around 1 GB), we have only
> > one shard.
> > So basically, we have the same master-slave setup, with one leader and
> > 6 replicas.
> > We are experimenting with the maxTime of both autoCommit and
> > autoSoftCommit. Currently, autoCommit maxTime is 15 minutes and
> > autoSoftCommit is 1 minute (let me know if these values do not make
> > sense).
> >
> > Caches are set such that warmup time is at most 20 seconds.
> >
> > We have continuous indexing requests, mostly for updating existing
> > documents. A few requests are for deleting/adding documents.
> >
> > The problem we are facing is that we are getting very frequent
> > NullPointerExceptions.
> > We get 200-300 such exceptions within a period of 30 seconds, and then
> > for the next few minutes it works fine.
> >
> > Stacktrace of the NullPointerException:
> >
> > ERROR - 2014-10-02 18:09:38.464; org.apache.solr.common.SolrException;
> > null:java.lang.NullPointerException
> >     at org.apache.solr.handler.component.QueryComponent.returnFields(QueryComponent.java:1257)
> >     at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:720)
> >     at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:695)
> >
> > I am not sure what would be causing it. My guess is that we get these
> > exceptions whenever it is trying to replay the tlog. Is anything wrong
> > in my configuration?
>
> Your automatic commit settings are fine.  If you had tried to use a very
> small maxTime like 1000 (1 second), I would tell you that it's probably
> too short.
>
> The tlogs only get replayed when a core is first started or reloaded.
> These appear to be errors during queries, having nothing at all to do
> with indexing.
>
> I can't be sure with the available information (no Solr version,
> incomplete stacktrace, no info about what request caused and received
> the error), but if I had to guess, I'd say you probably changed your
> schema so that certain fields are now required that weren't required
> before, and didn't reindex, so those fields are not present on every
> document.  Or it might be that you added a uniqueKey and didn't reindex,
> and that field is not present on every document.
>
> http://wiki.apache.org/solr/HowToReindex
>
> Thanks,
> Shawn
>
>


Master-Slave setup using SolrCloud

2014-10-02 Thread Sachin Kale
Hello,

We are trying to move our traditional master-slave Solr configuration to
SolrCloud. As our index size is very small (around 1 GB), we have only one
shard.
So basically, we have the same master-slave setup, with one leader and 6
replicas.
We are experimenting with the maxTime of both autoCommit and autoSoftCommit.
Currently, autoCommit maxTime is 15 minutes and autoSoftCommit is 1 minute
(let me know if these values do not make sense).
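
(In solrconfig.xml those settings look roughly like this; the millisecond
values are the stated minutes converted, and openSearcher=false follows the
usual recommendation rather than anything confirmed here:)

<autoCommit>
  <maxTime>900000</maxTime>   <!-- 15 minutes -->
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>60000</maxTime>    <!-- 1 minute -->
</autoSoftCommit>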

Caches are set such that warmup time is at most 20 seconds.

We have continuous indexing requests, mostly for updating existing
documents. A few requests are for deleting/adding documents.

The problem we are facing is that we are getting very frequent
NullPointerExceptions.
We get 200-300 such exceptions within a period of 30 seconds, and then for
the next few minutes it works fine.

Stacktrace of the NullPointerException:

ERROR - 2014-10-02 18:09:38.464; org.apache.solr.common.SolrException;
null:java.lang.NullPointerException
    at org.apache.solr.handler.component.QueryComponent.returnFields(QueryComponent.java:1257)
    at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:720)
    at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:695)

I am not sure what would be causing it. My guess is that we get these
exceptions whenever it is trying to replay the tlog. Is anything wrong in my
configuration?


-Sachin-