Solrcloud updating issue.

2017-06-29 Thread Wudong Liu
Hi All:
We are trying to index a large number of documents in SolrCloud and keep
seeing the following error: org.apache.solr.common.SolrException: Service
Unavailable, or org.apache.solr.common.SolrException: Service Unavailable

but with a similar stack:

request: http://wp-np2-c0:8983/solr/uniprot/update?wt=javabin=2
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:320)
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.run(ConcurrentUpdateSolrClient.java:185)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$57/936653983.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


The settings are:
There are 5 nodes in the cluster, each with 16 GB of memory. The collection
is defined with 5 shards and a replication factor of 2. The total number of
documents is about 90 million, and each document is quite large.
We also have 5 ZooKeeper instances, running on the same nodes.

On the Solr side, we see errors like:
solr.log.3-Error from server at http://wp-np2-c4.ebi.ac.uk:8983/solr/uniprot_shard5_replica1: Server Error
solr.log.3-request: http://wp-np2-c4.ebi.ac.uk:8983/solr/uniprot_shard5_replica1/update?update.distrib=TOLEADER=http%3A%2F%2Fwp-np2-c0.ebi.ac.uk%3A8983%2Fsolr%2Funiprot_shard2_replica1%2F=javabin=2
solr.log.3-Remote error message: Async exception during distributed update: Connect to wp-np2-c2.ebi.ac.uk:8983 timed out
solr.log.3- at org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:948)
solr.log.3- at org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1679)
solr.log.3- at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182)
--
solr.log.3- at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
solr.log.3- at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
solr.log.3- at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
solr.log.3- at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
solr.log.3- at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
solr.log.3- at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
solr.log.3- at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
solr.log.3- at java.lang.Thread.run(Thread.java:745)


The strange part is that this exception does not seem to be caught by the
try/catch block in our main thread. The cluster also appears to be in good
health (all nodes up) after the job finishes; we are just missing lots of
documents!
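
One possible explanation for the uncaught exception: the client-side stack
above shows the failure inside ConcurrentUpdateSolrClient$Runner, running
under a ThreadPoolExecutor, so it occurs on a background thread rather than
in the thread that submitted the documents. The sketch below is a
stdlib-only analogy, not SolrJ code; BackgroundSender and its methods are
hypothetical names chosen for illustration. It shows why a try/catch around
the enqueueing call sees nothing, and how an error hook can record the
failures instead. SolrJ's ConcurrentUpdateSolrClient has a
handleError(Throwable) method that can be overridden for a similar purpose;
check the Javadoc for your version before relying on it.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Stdlib-only analogy; no SolrJ classes are used here.
class AsyncSendErrors {

    // Hypothetical stand-in for a client that queues documents and sends
    // them on background threads.
    static class BackgroundSender {
        private final ExecutorService pool = Executors.newFixedThreadPool(2);
        private final List<Throwable> errors = new CopyOnWriteArrayList<>();

        // Returns immediately; the simulated send runs later on a worker thread.
        void add(String doc) {
            pool.execute(() -> {
                try {
                    send(doc);
                } catch (Exception e) {
                    handleError(e); // the failure surfaces here, not in the caller of add()
                }
            });
        }

        // Simulated network call: documents whose id starts with "bad" fail.
        private void send(String doc) throws Exception {
            if (doc.startsWith("bad")) {
                throw new Exception("Service Unavailable: " + doc);
            }
        }

        // Error hook: record the failure so the main thread can inspect it later.
        void handleError(Throwable t) {
            errors.add(t);
        }

        // Waits for queued work to finish, then returns every recorded failure.
        List<Throwable> closeAndGetErrors() {
            pool.shutdown();
            try {
                pool.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
            return errors;
        }
    }

    // Returns {exceptions caught by the try/catch, failures recorded by the hook}.
    static int[] run() {
        BackgroundSender sender = new BackgroundSender();
        int caughtInMain = 0;
        try {
            sender.add("doc-1");
            sender.add("bad-doc-2");
            sender.add("doc-3");
        } catch (Exception e) {
            caughtInMain++; // not reached: add() only enqueues the work
        }
        List<Throwable> failures = sender.closeAndGetErrors();
        return new int[] {caughtInMain, failures.size()};
    }

    public static void main(String[] args) {
        int[] r = run();
        System.out.println("caught in main: " + r[0] + ", recorded by hook: " + r[1]);
    }
}
```

In this sketch the main thread catches nothing while the hook records one
failure, which matches the symptom of a job that finishes "cleanly" but with
documents missing. Checking the recorded failures after the indexing run is
one way to detect and retry the lost updates.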

Any suggestions on where we should look to resolve this problem?

Best Regards,
Wudong


Can solrcloud be running on a read-only filesystem?

2017-06-02 Thread Wudong Liu
Hi All:

We have a normal build/stage -> prod setup for our production pipeline:
we build the Solr index in the build environment, and the index is then
copied to the prod environment.

SolrCloud in prod seems to work fine when the file system backing it is
writable. However, we see many errors when the file system is read-only.
When the Solr nodes are restarted with the new data, many exceptions are
thrown saying that the tlog file cannot be opened for writing; some of the
nodes eventually get stuck in the recovering phase and never come back
online in the cloud.
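
To confirm that the tlog errors come from the file system rather than from
file ownership or permissions, a small probe along these lines can be run
as the Solr user against the directory in question. This is a minimal
sketch: the class name TlogDirCheck is hypothetical, and the directory to
test is passed in as an argument because the actual tlog location depends
on the installation (commonly a tlog directory under each core's data
directory, but that is an assumption to verify).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not part of Solr.
class TlogDirCheck {

    // Returns true only if a file can actually be created (and removed) in dir.
    // Creating a probe file is a stricter check than inspecting permission bits,
    // since a read-only mount rejects writes regardless of those bits.
    static boolean canCreateFiles(Path dir) {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try {
            Path probe = Files.createTempFile(dir, "write-probe", ".tmp");
            Files.delete(probe);
            return true;
        } catch (IOException | SecurityException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Directory to test; defaults to the current directory if none is given.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(dir.toAbsolutePath() + " writable: " + canCreateFiles(dir));
    }
}
```

If the probe fails on the directory where Solr tries to open its transaction
log, the exceptions at restart are expected, since the update log needs to
create and append to files there.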

Just wondering whether anyone has any experience running SolrCloud on a
read-only file system. Is it possible at all?

Regards,
Wudong