proper routing (from non-Java client) in solr cloud 5.0.0

2015-04-14 Thread Ian Rose
Hi all -

I've just upgraded my dev install of Solr (cloud) from 4.10 to 5.0.  Our
client is written in Go, for which I am not aware of an existing Solr
client, so we wrote our own.  One tricky bit was the routing logic: if a
document has routing prefix X and belongs to collection Y, we need to know
which Solr node to connect to.  Previously we accomplished this by watching
the clusterstate.json file in ZooKeeper - at startup and whenever it
changes, the client parses the file contents to build a routing table.

However, in 5.0 newly created collections do not show up in
clusterstate.json but instead have their own state.json document.  Are
there any recommendations for how to handle this from the client?  The
obvious answer is to watch every collection's state.json document, but we
run a lot of collections (~1000 currently, and growing) so I'm concerned
about keeping that many watches open at the same time (should I be?).  How
does the SolrJ client handle this?

Thanks!
- Ian
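
The routing table described above could be sketched in Go roughly as follows. This is only an illustration, not the poster's actual client: it assumes a state document is a JSON object keyed by collection name, with `shards`, `range`, `state`, `replicas`, `base_url`, `core` and `leader` fields as in the sample embedded in the code, and it takes an already computed signed 32-bit routing hash as input (computing that hash from the routing prefix is out of scope here). The names `Route`, `BuildRoutes` and `Lookup` are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// Only the fields this sketch needs; the JSON layout is an assumption.
type replica struct {
	BaseURL string          `json:"base_url"`
	Core    string          `json:"core"`
	State   string          `json:"state"`
	Leader  json.RawMessage `json:"leader"` // tolerate "true" or true
}

type shard struct {
	Range    string             `json:"range"`
	State    string             `json:"state"`
	Replicas map[string]replica `json:"replicas"`
}

type collection struct {
	Shards map[string]shard `json:"shards"`
}

// Route maps an inclusive signed 32-bit hash range to a shard leader URL.
type Route struct {
	Min, Max int32
	Shard    string
	URL      string
}

// parseRange turns a hex range such as "80000000-ffffffff" into signed bounds.
func parseRange(s string) (int32, int32, error) {
	parts := strings.SplitN(s, "-", 2)
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("bad hash range %q", s)
	}
	lo, err := strconv.ParseUint(parts[0], 16, 32)
	if err != nil {
		return 0, 0, err
	}
	hi, err := strconv.ParseUint(parts[1], 16, 32)
	if err != nil {
		return 0, 0, err
	}
	return int32(uint32(lo)), int32(uint32(hi)), nil
}

// BuildRoutes parses a state document into per-collection routes,
// keeping only active shards that have a range and an active leader.
func BuildRoutes(data []byte) (map[string][]Route, error) {
	var state map[string]collection
	if err := json.Unmarshal(data, &state); err != nil {
		return nil, err
	}
	table := make(map[string][]Route)
	for name, coll := range state {
		for shardName, sh := range coll.Shards {
			if sh.State != "active" || sh.Range == "" {
				continue
			}
			lo, hi, err := parseRange(sh.Range)
			if err != nil {
				return nil, err
			}
			for _, r := range sh.Replicas {
				isLeader := strings.Trim(string(r.Leader), `"`) == "true"
				if isLeader && r.State == "active" {
					url := r.BaseURL + "/" + r.Core
					table[name] = append(table[name], Route{lo, hi, shardName, url})
				}
			}
		}
	}
	return table, nil
}

// Lookup returns the route whose range contains hash, if any.
func Lookup(routes []Route, hash int32) (Route, bool) {
	for _, r := range routes {
		if r.Min <= hash && hash <= r.Max {
			return r, true
		}
	}
	return Route{}, false
}

// sample is a made-up state document for a two-shard collection "Y".
const sample = `{"Y":{"shards":{
 "shard1":{"range":"80000000-ffffffff","state":"active","replicas":{
  "core_node1":{"state":"active","base_url":"http://host1:8983/solr","core":"Y_shard1_replica1","leader":"true"}}},
 "shard2":{"range":"0-7fffffff","state":"active","replicas":{
  "core_node2":{"state":"active","base_url":"http://host2:8983/solr","core":"Y_shard2_replica1","leader":"true"}}}}}}`

func main() {
	routes, err := BuildRoutes([]byte(sample))
	if err != nil {
		panic(err)
	}
	r, ok := Lookup(routes["Y"], 42)
	fmt.Println(r.Shard, r.URL, ok)
}
```

Because the same parser can be fed either the shared clusterstate.json or a single collection's state.json (under the layout assumption above), the client only has to merge the resulting maps; how the documents are fetched or watched is a separate concern.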


Re: proper routing (from non-Java client) in solr cloud 5.0.0

2015-04-14 Thread Hrishikesh Gadre
Hi Ian,

As far as I understand, SolrJ does not use ZooKeeper watches but instead
caches the information (along with a TTL). You can find more information
here:

https://issues.apache.org/jira/browse/SOLR-5473
https://issues.apache.org/jira/browse/SOLR-5474

Regards
Hrishikesh




Re: proper routing (from non-Java client) in solr cloud 5.0.0

2015-04-14 Thread Ian Rose
Hi Hrishikesh,

Thanks for the pointers - I had not looked at SOLR-5474
(https://issues.apache.org/jira/browse/SOLR-5474) previously.  Interesting
approach...  I think we will stick with trying to keep ZK watches open from
all clients to all collections for now, but if that starts to be a
bottleneck it's good to know the route that SolrJ has chosen...

cheers,
Ian


