Hey Roland!

When you look at the Admin UI in the Cloud tab, do you see both
instances as active?

Also, how are you querying the nodes?

One tricky thing at the moment is that if you are not using a 'smart'
client like the SolrJ CloudSolrServer, and you directly query the node
that never recovered - I *think* it will happily respond to queries.

We should probably look into that and file a JIRA issue if it's a
problem. Perhaps we need some defensive checks so that a node that did
not recover properly won't serve queries if that is not already
enforced.

Part of the issue is that we continue serving queries even if we are
not connected to ZooKeeper. Perhaps we need to use our last published
state as the defensive check and only serve queries if that state
was active.
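
A rough sketch of the kind of check I mean, in plain Java. Every name
here (QueryGate, PublishedState, publish, mayServeQueries) is made up
for illustration and is not an existing Solr class or method. The
assumption is that the node records each state it publishes and that
the request path consults that record rather than the live ZooKeeper
connection.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical states a node might publish to the cluster (not Solr's own constants).
enum PublishedState { ACTIVE, RECOVERING, RECOVERY_FAILED, DOWN }

// Hypothetical gate consulted before a node answers a query.
final class QueryGate {
    // Last state this node published; assumed DOWN until something is published.
    private final AtomicReference<PublishedState> lastPublished =
            new AtomicReference<>(PublishedState.DOWN);

    // Record the state each time the node publishes one.
    void publish(PublishedState state) {
        lastPublished.set(state);
    }

    // Serve queries only if the last published state was ACTIVE.
    // This does not look at whether the ZooKeeper session is currently connected.
    boolean mayServeQueries() {
        return lastPublished.get() == PublishedState.ACTIVE;
    }
}
```

With something like this, a node that published "recovering" and then
lost its ZooKeeper session would keep refusing queries until it
published "active" again.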

Now it's another issue if the node said it was active and it hadn't
fully recovered. That we would want to investigate.

- Mark

On Tue, Dec 11, 2012 at 6:23 AM, Roland Villemoes <r...@alpha-solutions.dk> 
wrote:
> Hi There,
>
>
>
> We have a 2-instance, 1-shard setup running Solr 4
> (4.0.0.2012.10.06.03.04.33).
>
> Each instance runs on its own server, and a separate ZooKeeper runs on one
> of these machines.
>
>
>
> We had a bad experience that originated from a network error:
>
>
>
> From the log:
>
> Caused by: java.io.IOException: Connection reset by peer
>     at sun.nio.ch.FileDispatcher.read0(Native Method)
>     at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>     at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:251)
>     at sun.nio.ch.IOUtil.read(IOUtil.java:218)
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:254)
>     at org.mortbay.io.nio.ChannelEndPoint.fill(ChannelEndPoint.java:131)
>
> Solr tries to do a commit, and then we see the following in the log, as
> something is wrong and it tries to recover:
>
>
>
> Dec 8, 2012 2:36:33 AM org.apache.solr.common.cloud.ZkStateReader$2 process
> INFO: A cluster state change has occurred - updating...
> Dec 8, 2012 2:36:34 AM org.apache.solr.cloud.RecoveryStrategy run
> INFO: Starting recovery process.  core=default1_English recoveringAfterStartup=true
>
>
>
> It seems to have problems contacting ZooKeeper due to the network
> problems.
>
>
>
> INFO: Unable to reconnect to ZooKeeper service, session 0x13b6666d0ed00d4 has expired, closing socket connection
>
>
>
> The problem is: the Solr instance established itself with around 30% of the
> documents that were in the other index. I would have liked it to withdraw
> from the cluster and leave all handling of queries to the other server.
>
> When the network worked again, the Solr instances stayed like this, with
> the full index on one server and 30% on the other. This resulted in “funny”
> results from queries: sometimes correct, sometimes not.
>
>
>
>
>
> med venlig hilsen/best regards
>
>
>
> Roland Villemoes
>
> Tel: (+45) 22 69 59 62
>
> E-Mail: mailto:r...@alpha-solutions.dk
>
>
>
> Alpha Solutions A/S
>
> Sølvgade 10, 1.sal, 1307 København K
>
> Tel: (+45) 70 20 65 38
>
> Web: http://www.alpha-solutions.dk
>
>
>


