Re: IndexSearcher.search(query, collect)
Mikhail

We do add new nodes with our custom results in some cases... just curious: does that preclude us from doing what we're trying to do above? FWIW, we can avoid the custom nodes if we had to.

Chetan

On Wed, Jul 15, 2015 at 12:39 PM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
> On Wed, Jul 15, 2015 at 10:46 AM, Chetan Vora wrote:
>> Hi all
>>
>> I asked a related question before but couldn't get any response (see
>> SolrQueryRequest in SolrCloud vs Standalone Solr), asking it differently
>> here.
>>
>> Is there a way to invoke
>>
>> IndexSearcher.search(Query, Collector) over a SolrCloud collection so that
>> it invokes the search/collect implicitly on individual shards of the
>> collection? If not, how does one do this explicitly?
>>
>> I have a usecase that was implemented using custom request handler in
>> standalone Solr and we're trying to move to SolrCloud.
>
> In your custom request handler, do you add any new "nodes" into the response,
> or do you just modify the standard response structure?
>
>> It is necessary for
>> us to understand how to do the above so we can use SolrCloud
>> functionality.
>>
>> Thanks and would *really really* appreciate ANY help.
>>
>> Regards
>> CV
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
Re: IndexSearcher.search(query, collect)
Erick

Thanks for your response and for the pointers! This will be a good starting point; I will go through these. The good news is that in our usecase, we don't really care about the two passes. In fact, our results are ConstantScore so we only need to aggregate (i.e. sum) the results from each shard.

Regards
Chetan

On Wed, Jul 15, 2015 at 12:14 PM, Erick Erickson wrote:
> bq: Is there a way to invoke IndexSearcher.search(Query, Collector)
>
> Problem is that this question doesn't make a lot of sense to me.
> IndexSearcher is, by definition, local to a single Lucene
> instance. Distributed requests are a whole different beast. If you're going
> to try to use custom request handlers in a distributed environment
> (SolrCloud), you need to abstract up a level.
> Here are some places to start:
>
> https://cwiki.apache.org/confluence/display/solr/Distributed+Search+with+Index+Sharding
> http://wiki.apache.org/solr/WritingDistributedSearchComponents
>
> The thing to be aware of is that the "usual" way of writing this
> involves two passes. Say you want to return the top 10 docs and have 5
> shards. The first pass sends the request to one replica of each shard.
> Each returns its top 10 docs, but only the doc ID and score (or sort
> criteria). Then the aggregator (whichever node received the original
> request) sorts those 50 docs into the true top N and sends a second
> request to each of the shards hosting one of those docs for the contents
> of the doc.
>
> Now, you can probably bypass a lot of that if you're happy with
> returning the topN lists from all the shards. This two-pass mechanism
> was put in place to handle, say, a 100 shard system where you wouldn't
> want to transmit all the top N from every shard.
>
> HTH,
> Erick
>
> On Wed, Jul 15, 2015 at 8:46 AM, Chetan Vora wrote:
> > Hi all
> >
> > I asked a related question before but couldn't get any response (see
> > SolrQueryRequest in SolrCloud vs Standalone Solr), asking it differently
> > here.
> >
> > Is there a way to invoke
> >
> > IndexSearcher.search(Query, Collector) over a SolrCloud collection so that
> > it invokes the search/collect implicitly on individual shards of the
> > collection? If not, how does one do this explicitly?
> >
> > I have a usecase that was implemented using custom request handler in
> > standalone Solr and we're trying to move to SolrCloud. It is necessary for
> > us to understand how to do the above so we can use SolrCloud functionality.
> >
> > Thanks and would *really really* appreciate ANY help.
> >
> > Regards
> > CV
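
The two-pass flow Erick describes can be sketched in plain Java. This is only an illustration of the merge logic, not Solr's implementation: `TopNMerge`, `Hit`, `merge` and `idsByShard` are hypothetical names, and the shard names and scores in `main` are made up. Pass one merges each shard's local top-N list (ID and score only) into the global top N; pass two is planned by grouping the winning IDs by the shard that holds them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; none of these types are Solr or Lucene classes.
final class TopNMerge {

    // A first-pass hit: which shard returned it, the document ID, and its score.
    static final class Hit {
        final String shard;
        final String id;
        final float score;

        Hit(String shard, String id, float score) {
            this.shard = shard;
            this.id = id;
            this.score = score;
        }
    }

    // Pass 1 merge: combine every shard's local top-N list and keep the global top n.
    static List<Hit> merge(List<List<Hit>> perShard, int n) {
        List<Hit> all = new ArrayList<>();
        for (List<Hit> shardHits : perShard) {
            all.addAll(shardHits);
        }
        // Highest score first; ties are broken by ID so the order is deterministic.
        all.sort((a, b) -> {
            int byScore = Float.compare(b.score, a.score);
            return byScore != 0 ? byScore : a.id.compareTo(b.id);
        });
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    // Pass 2 planning: group the winning IDs by the shard that holds each document.
    static Map<String, List<String>> idsByShard(List<Hit> winners) {
        Map<String, List<String>> plan = new LinkedHashMap<>();
        for (Hit h : winners) {
            plan.computeIfAbsent(h.shard, k -> new ArrayList<>()).add(h.id);
        }
        return plan;
    }

    public static void main(String[] args) {
        // Made-up first-pass results from two shards.
        List<List<Hit>> perShard = new ArrayList<>();
        perShard.add(Arrays.asList(new Hit("shard1", "a", 0.9f), new Hit("shard1", "b", 0.2f)));
        perShard.add(Arrays.asList(new Hit("shard2", "c", 0.5f), new Hit("shard2", "d", 0.1f)));

        List<Hit> winners = merge(perShard, 2);
        System.out.println(idsByShard(winners));
    }
}
```

For the ConstantScore case Chetan mentions, the sort is unnecessary: the aggregator only has to add up one number per shard.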
IndexSearcher.search(query, collect)
Hi all

I asked a related question before but couldn't get any response (see SolrQueryRequest in SolrCloud vs Standalone Solr), asking it differently here.

Is there a way to invoke IndexSearcher.search(Query, Collector) over a SolrCloud collection so that it invokes the search/collect implicitly on individual shards of the collection? If not, how does one do this explicitly?

I have a usecase that was implemented using custom request handler in standalone Solr and we're trying to move to SolrCloud. It is necessary for us to understand how to do the above so we can use SolrCloud functionality.

Thanks and would *really really* appreciate ANY help.

Regards
CV
SolrQueryRequest in SolrCloud vs Standalone Solr
Hi all

We have a cluster of standalone Solr cores (Solr 4.3) for which we had built some custom requesthandlers and filters which do query processing using the Terms API. I'm now trying to port the custom functionality to work in the Solr Cloud world.

Old configuration had standalone cores with the requesthandler embedded into each:

   core1 -> requesthandler plugin
   core2 -> requesthandler plugin

We built an external (non-Solr) component that sent every query request to each core and aggregated the results. When processing the request, within each request handler, it obtained an index searcher by doing

   SolrIndexSearcher searcher = solrQueryRequest.getSearcher();

followed by searcher.search()...

   Request1: http://localhost:xxx/solr/core1/plugin?q=blahblah
   Request2: http://localhost:xxx/solr/core2/plugin?q=blahblah

In the SolrCloud version, I expected things to work similarly but at the collection level.

New configuration:

   SolrCloud collection with plugin
   -> shard1
   -> shard2

So my expectation is that when I invoke

   SolrIndexSearcher searcher = solrQueryRequest.getSearcher()

... I obtain a searcher which can search against the collection, i.e. against all the shards. But this doesn't seem to happen. It seems that the searcher is executing the query only against shard1!

Note: I peeked into the SolrQueryRequest object using a debugger and it has a reference to a SolrCore object which just points to shard1.

   Request: http://localhost:xxx/solr/collection1/plugin?q=blahblah

Am I doing something wrong? Is my expectation of how it should work flawed? Any help would be appreciated.

Regards
CV
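
As a rough sketch of the external aggregator described in the old configuration above: one request is built per core, and the per-core results are summed. Everything here is an assumption made for illustration. `FanOutSum`, `coreUrl` and `sumOverCores` are hypothetical names, the `fetchCount` function stands in for the real HTTP call and response parsing, the query string is assumed to be URL-encoded already, and port 8983 in `main` is a placeholder because the original message elides the port.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

// Hypothetical illustration of a non-Solr aggregator; it performs no real HTTP calls.
final class FanOutSum {

    // Builds a per-core request URL of the form <base>/<core>/<handler>?q=<query>.
    // The query is assumed to be URL-encoded by the caller.
    static String coreUrl(String baseUrl, String core, String handler, String query) {
        return baseUrl + "/" + core + "/" + handler + "?q=" + query;
    }

    // Sends the same query to every core via fetchCount and sums the per-core counts.
    static long sumOverCores(String baseUrl, List<String> cores, String handler,
                             String query, ToLongFunction<String> fetchCount) {
        long total = 0;
        for (String core : cores) {
            total += fetchCount.applyAsLong(coreUrl(baseUrl, core, handler, query));
        }
        return total;
    }

    public static void main(String[] args) {
        // Placeholder base URL; the port is an assumption, not taken from the post.
        String base = "http://localhost:8983/solr";

        // Canned per-core counts standing in for real responses.
        Map<String, Long> canned = new HashMap<>();
        canned.put(coreUrl(base, "core1", "plugin", "blahblah"), 3L);
        canned.put(coreUrl(base, "core2", "plugin", "blahblah"), 4L);

        long total = sumOverCores(base, Arrays.asList("core1", "core2"),
                "plugin", "blahblah", url -> canned.get(url));
        System.out.println("total = " + total);
    }
}
```

Passing the fetch step in as a function keeps the summation independent of how each core is actually reached.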
Re: Solr Cloud: No live SolrServers available
47_shard1_replica1/
ERROR - 2015-05-20 10:36:25.037; org.apache.solr.cloud.SyncStrategy; No UpdateLog found - cannot sync
INFO - 2015-05-20 10:36:25.037; org.apache.solr.cloud.ShardLeaderElectionContext; We failed sync, but we have no versions - we can't sync in that case - we were active before, so become leader anyway
INFO - 2015-05-20 10:36:25.037; org.apache.solr.cloud.ShardLeaderElectionContext; I am the new leader: http://10.1.172.231:8987/solr/myapp47_shard1_replica1/ shard1
INFO - 2015-05-20 10:36:25.037; org.apache.solr.common.cloud.SolrZkClient; makePath: /collections/myapp47/leaders/shard1
INFO - 2015-05-20 10:36:25.045; org.apache.solr.cloud.OverseerElectionContext; I am going to be the leader 10.1.172.231:8987_solr
INFO - 2015-05-20 10:36:25.045; org.apache.solr.common.cloud.SolrZkClient; makePath: /overseer_elect/leader
INFO - 2015-05-20 10:36:25.047; org.apache.solr.cloud.Overseer; Overseer (id=93856238667694080-10.1.172.231:8987_solr-n_09) starting
INFO - 2015-05-20 10:36:25.060; org.apache.solr.cloud.OverseerCollectionProcessor; Process current queue of collection creations
INFO - 2015-05-20 10:36:25.061; org.apache.solr.cloud.OverseerCollectionProcessor; prioritizing overseer nodes
INFO - 2015-05-20 10:36:25.061; org.apache.solr.cloud.Overseer$ClusterStateUpdater; Starting to work on the main queue
INFO - 2015-05-20 10:36:25.063; org.apache.solr.common.cloud.ZkStateReader; Updating cloud state from ZooKeeper...
INFO - 2015-05-20 10:36:25.064; org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update state numShards=2 message={ "operation":"state", "state":"down", "base_url":"http://10.1.172.231:8987/solr", "core":"myapp47_shard2_replica1", "roles":null, "node_name":"10.1.172.231:8987_solr", "shard":"shard2", "collection":"myapp47", "numShards":"2", "core_node_name":"core_node2"}
INFO - 2015-05-20 10:36:25.067; org.apache.solr.cloud.DistributedQueue$LatchChildWatcher; LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
INFO - 2015-05-20 10:36:25.068; org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 1)
INFO - 2015-05-20 10:36:25.071; org.apache.solr.common.cloud.ZkStateReader; Updating cloud state from ZooKeeper...
INFO - 2015-05-20 10:36:25.072; org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update state numShards=2 message={ "operation":"state", "state":"down", "base_url":"http://10.1.172.231:8987/solr", "core":"myapp47_shard1_replica1", "roles":null, "node_name":"10.1.172.231:8987_solr", "shard":"shard1", "collection":"myapp47", "numShards":"2", "core_node_name":"core_node1"}
INFO - 2015-05-20 10:36:25.075; org.apache.solr.cloud.DistributedQueue$LatchChildWatcher; LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
INFO - 2015-05-20 10:36:25.078; org.apache.solr.cloud.DistributedQueue$LatchChildWatcher; LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
INFO - 2015-05-20 10:36:25.081; org.apache.solr.cloud.DistributedQueue$LatchChildWatcher; LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
INFO - 2015-05-20 10:36:25.083; org.apache.solr.cloud.DistributedQueue$LatchChildWatcher; LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
INFO - 2015-05-20 10:36:25.086; org.apache.solr.cloud.DistributedQueue$LatchChildWatcher; LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
INFO - 2015-05-20 10:36:25.190; org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 1)

On Wed, May 20, 2015 at 10:44 AM, Chetan Vora wrote:
> Erick
>
> Thanks for your response.
>
> Logs don't seem to show any explicit errors (I have log level at INFO).
>
> I am attaching the logs from a 4.7 start and a 5.1 start here. Note that
> both logs seem to show the shards as "Down" initially but for 5.1, the
> state changes to Active later on.
>
> Also, note that all the config files, libraries, jarfiles etc are the
> same for both Solr instances.
>
> Regards
>
> On Tue, May 19, 2015 at 11:57 AM, Erick Erickson wrote:
>> What you've done _looks_
Re: Solr Cloud: No live SolrServers available
Erick

Thanks for your response.

Logs don't seem to show any explicit errors (I have log level at INFO).

I am attaching the logs from a 4.7 start and a 5.1 start here. Note that both logs seem to show the shards as "Down" initially but for 5.1, the state changes to Active later on.

Also, note that all the config files, libraries, jarfiles etc are the same for both Solr instances.

Regards

On Tue, May 19, 2015 at 11:57 AM, Erick Erickson wrote:
> What you've done _looks_ correct at a glance. Take a look at the Solr
> logs. Don't bother trying to index things unless and until your nodes
> are "active", it won't happen.
>
> My first guess is that you have some error in your schema or
> solrconfig.xml files, syntax errors, typos, class names that are
> mis-typed, jars that are missing, whatever.
>
> If that's true, the Solr log (or the screen if you're just running
> from the command line) will show big ugly stack traces.
>
> If nothing shows up in the logs then I'm puzzled, but what you
> describe is consistent with what I've seen in terms of having bad
> configs and trying to create a collection.
>
> Best,
> Erick
>
> On Tue, May 19, 2015 at 4:33 AM, Chetan Vora wrote:
> > Hi all
> >
> > We have a cluster of standalone Solr cores (Solr 4.3) for which we had
> > built some custom plugins. I'm now trying to prototype converting the
> > cluster to a Solr Cloud cluster. This is how I am trying to deploy the
> > cores (in 4.7.2).
> >
> > 1. Start solr with zookeeper embedded.
> >
> >    java -DzkRun -Djetty.port=8985 -jar start.jar
> >
> > 2. Upload a config into Zookeeper (same config as the standalone cores)
> >
> >    zkcli.bat -zkhost localhost:9985 -cmd upconfig -confdir myconfig -confname myconfig
> >
> > 3. Create a new collection (mycollection) of 2 shards using the
> >    Collections API
> >
> >    http://localhost:8985/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=1&maxShardsPerNode=2&collection.configName=myconfig
> >
> > So at this point I have two shards under my solr directory with the
> > appropriate core.properties.
> >
> > But when I go to http://localhost:8985/solr/#/~cloud, I see that the two
> > shards' status is "Down" when they are supposed to be active by default.
> >
> > And when I try to index documents in them using SolrJ (via the
> > CloudSolrServer API), I get the error "No live SolrServers available to
> > handle this request". I restarted Solr but same issue.
> >
> >    private CloudSolrServer cloudSolr;
> >    cloudSolr = new CloudSolrServer(zkHOST);
> >    cloudSolr.setZkClientTimeout(zkClientTimeout);
> >    cloudSolr.setDefaultCollection(collectionName);
> >    cloudSolr.connect();
> >    cloudSolr.add(doc);
> >
> > What am I doing wrong? I did a lot of digging around and saw an old Jira
> > bug saying that Solr Cloud shards won't be active until there are some
> > documents in the index. If that is the reason, that's kind of like a
> > catch-22 isn't it?
> >
> > So anyways, I also tried adding some test documents manually and committed
> > to see if things improved. Now on the shard statistics page, it correctly
> > gives me the Numdocs count but when I try to query it says "no servers
> > hosting shard". I next tried passing in shards.tolerant=true as a query
> > parameter and search, but no cigar. It says 0 documents found.
> >
> > Any help would be appreciated. My main objective is to rebuild the old
> > standalone cores using SolrCloud and test to see if our custom
> > requesthandlers still work as expected. And at this point, I can't index
> > documents inside of the 4.7 Solr Cloud collection I have created. I am
> > trying to use a 4.x SolrCloud release as it seems the internal APIs have
> > changed quite a bit for the 5.x releases and our custom requesthandlers
> > don't work anymore as expected.
> >
> > Thanks and Regards
Solr Cloud: No live SolrServers available
Hi all

We have a cluster of standalone Solr cores (Solr 4.3) for which we had built some custom plugins. I'm now trying to prototype converting the cluster to a Solr Cloud cluster. This is how I am trying to deploy the cores (in 4.7.2).

1. Start solr with zookeeper embedded.

   java -DzkRun -Djetty.port=8985 -jar start.jar

2. Upload a config into Zookeeper (same config as the standalone cores)

   zkcli.bat -zkhost localhost:9985 -cmd upconfig -confdir myconfig -confname myconfig

3. Create a new collection (mycollection) of 2 shards using the Collections API

   http://localhost:8985/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=1&maxShardsPerNode=2&collection.configName=myconfig

So at this point I have two shards under my solr directory with the appropriate core.properties.

But when I go to http://localhost:8985/solr/#/~cloud, I see that the two shards' status is "Down" when they are supposed to be active by default.

And when I try to index documents in them using SolrJ (via the CloudSolrServer API), I get the error "No live SolrServers available to handle this request". I restarted Solr but same issue.

   private CloudSolrServer cloudSolr;
   cloudSolr = new CloudSolrServer(zkHOST);
   cloudSolr.setZkClientTimeout(zkClientTimeout);
   cloudSolr.setDefaultCollection(collectionName);
   cloudSolr.connect();
   cloudSolr.add(doc);

What am I doing wrong? I did a lot of digging around and saw an old Jira bug saying that Solr Cloud shards won't be active until there are some documents in the index. If that is the reason, that's kind of like a catch-22 isn't it?

So anyways, I also tried adding some test documents manually and committed to see if things improved. Now on the shard statistics page, it correctly gives me the Numdocs count but when I try to query it says "no servers hosting shard". I next tried passing in shards.tolerant=true as a query parameter and search, but no cigar. It says 0 documents found.

Any help would be appreciated. My main objective is to rebuild the old standalone cores using SolrCloud and test to see if our custom requesthandlers still work as expected. And at this point, I can't index documents inside of the 4.7 Solr Cloud collection I have created. I am trying to use a 4.x SolrCloud release as it seems the internal APIs have changed quite a bit for the 5.x releases and our custom requesthandlers don't work anymore as expected.

Thanks and Regards
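
The port numbers and the CREATE request in the steps above can be cross-checked with a small helper. This is a sketch only: `CollectionCreateUrl`, `embeddedZkPort` and `createUrl` are hypothetical names, not part of Solr or SolrJ. It assumes the usual behaviour that an embedded ZooKeeper started with -DzkRun listens on the Solr port plus 1000 (which is why step 2 uses 9985 with Jetty on 8985), and it assembles the Collections API URL from the same parameters used in step 3.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Solr or SolrJ.
final class CollectionCreateUrl {

    // Assumption: with -DzkRun the embedded ZooKeeper listens on the Solr (Jetty) port + 1000.
    static int embeddedZkPort(int jettyPort) {
        return jettyPort + 1000;
    }

    // Builds a Collections API CREATE URL; values are assumed to need no URL-encoding.
    static String createUrl(String host, int jettyPort, String name, int numShards,
                            int replicationFactor, int maxShardsPerNode, String configName) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "CREATE");
        params.put("name", name);
        params.put("numShards", Integer.toString(numShards));
        params.put("replicationFactor", Integer.toString(replicationFactor));
        params.put("maxShardsPerNode", Integer.toString(maxShardsPerNode));
        params.put("collection.configName", configName);

        StringBuilder url = new StringBuilder("http://" + host + ":" + jettyPort
                + "/solr/admin/collections?");
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                url.append('&');
            }
            url.append(e.getKey()).append('=').append(e.getValue());
            first = false;
        }
        return url.toString();
    }

    public static void main(String[] args) {
        int jettyPort = 8985;
        System.out.println("zkhost = localhost:" + embeddedZkPort(jettyPort));
        System.out.println(createUrl("localhost", jettyPort, "mycollection", 2, 1, 2, "myconfig"));
    }
}
```

Note that maxShardsPerNode=2 matters here: with a single node and numShards=2, both shards have to land on the same node.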
How to use NumericTermsRangeEnum from NumericRangeQuery
Hi all

I was trying to use the above enum to do some range search on dates. This enum is returned by NumericRangeQuery.getTermsEnum(), but I realized that this is a protected method of the class, and since this is a final class, I can't see how I can use it. Maybe I'm missing something?

Would appreciate any pointers.

Thanks