Re: IndexSearcher.search(query, collect)

2015-07-15 Thread Chetan Vora
Mikhail

We do add new nodes with our custom results in some cases. Just curious:
does that preclude us from doing what we're trying to do above? FWIW, we
can avoid the custom nodes if we have to.

Chetan

On Wed, Jul 15, 2015 at 12:39 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

>
>
> On Wed, Jul 15, 2015 at 10:46 AM, Chetan Vora 
> wrote:
>
>> Hi all
>>
>> I asked a related question before but couldn't get any response (see
>> SolrQueryRequest in SolrCloud vs Standalone Solr), asking it differently
>> here.
>>
>> Is there a way to invoke
>>
>> IndexSearcher.search(Query, Collector) over a SolrCloud collection so that
>> it invokes the search/collect implicitly on individual shards of the
>> collection? If not, how does one do this explicitly?
>>
>> I have a use case that was implemented using a custom request handler in
>> standalone Solr and we're trying to move to SolrCloud.
>
>
> In your custom request handler, do you add any new "nodes" to the response,
> or do you just modify the standard response structure?
>
> It is necessary for
>> us to understand how to do the above so we can use SolrCloud
>> functionality.
>>
>> Thanks and would *really really* appreciate ANY help.
>>
>> Regards
>> CV
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> 
>


Re: IndexSearcher.search(query, collect)

2015-07-15 Thread Chetan Vora
Erick

Thanks for your response and for the pointers! This will be a good starting
point; I will go through these.

The good news is that in our use case, we don't really care about the two
passes. In fact, our results are ConstantScore, so we only need to aggregate
(i.e., sum) the results from each shard.
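
With constant-score results there is nothing to re-rank, so the merge step reduces to adding up per-shard numbers. A minimal sketch in plain Java, assuming each shard's handler returns a map from some key (for example, a term) to a count. The class and method names are hypothetical and not part of Solr:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Solr: merges per-shard counts by summing them. */
final class ShardCountMerger {

    /** Adds up the counts reported by every shard, key by key. */
    static Map<String, Long> sum(List<Map<String, Long>> perShardCounts) {
        Map<String, Long> total = new LinkedHashMap<>();
        for (Map<String, Long> shard : perShardCounts) {
            for (Map.Entry<String, Long> e : shard.entrySet()) {
                // merge() stores the value if the key is new, otherwise adds to it.
                total.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> shard1 = new LinkedHashMap<>();
        shard1.put("apple", 3L);
        shard1.put("pear", 1L);
        Map<String, Long> shard2 = new LinkedHashMap<>();
        shard2.put("apple", 2L);
        System.out.println(sum(Arrays.asList(shard1, shard2))); // {apple=5, pear=1}
    }
}
```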

Regards
Chetan



On Wed, Jul 15, 2015 at 12:14 PM, Erick Erickson 
wrote:

> bq: Is there a way to invoke IndexSearcher.search(Query, Collector)
>
> Problem is that this question doesn't make a lot of sense to me.
> IndexSearcher is, by definition, local to a single Lucene
> instance. Distributed requests are a whole different beast. If you're going
> to try to use custom request handlers in a distributed environment
> (SolrCloud), you need to abstract up a level. Here are some places to start:
>
>
> https://cwiki.apache.org/confluence/display/solr/Distributed+Search+with+Index+Sharding
> http://wiki.apache.org/solr/WritingDistributedSearchComponents
>
> The thing to be aware of is that the "usual" way of writing this
> involves two passes. Say you want to return the top 10 docs and have 5
> shards. The first pass sends the request to one replica of each shard.
> Each returns its top 10 docs, but only the doc ID and score (or sort
> criteria). Then the aggregator (whichever node received the original
> request) sorts those 50 docs into the true top N and sends a second
> request to each of the shards hosting one of those docs for the contents
> of the doc.
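
The merge step between the two passes can be sketched in plain Java. This only illustrates the idea and is not Solr's actual implementation; `ShardHit`, `TopNMerger`, `mergeTopN`, and `shardsToFetch` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical first-pass result: the doc ID, its score, and the shard it came from. */
final class ShardHit {
    final String shard;
    final String docId;
    final float score;

    ShardHit(String shard, String docId, float score) {
        this.shard = shard;
        this.docId = docId;
        this.score = score;
    }
}

final class TopNMerger {

    /** Pools the per-shard top lists and keeps the n highest-scoring hits overall. */
    static List<ShardHit> mergeTopN(List<List<ShardHit>> perShardTop, int n) {
        List<ShardHit> all = new ArrayList<>();
        for (List<ShardHit> hits : perShardTop) {
            all.addAll(hits);
        }
        // Highest score first; ties are broken by doc ID so the order is deterministic.
        all.sort(Comparator.comparingDouble((ShardHit h) -> h.score).reversed()
                .thenComparing(h -> h.docId));
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    /** Second pass: only shards that host a winning doc need to be asked for contents. */
    static Set<String> shardsToFetch(List<ShardHit> winners) {
        Set<String> shards = new TreeSet<>();
        for (ShardHit h : winners) {
            shards.add(h.shard);
        }
        return shards;
    }

    public static void main(String[] args) {
        List<ShardHit> s1 = Arrays.asList(new ShardHit("shard1", "a", 0.9f),
                                          new ShardHit("shard1", "b", 0.4f));
        List<ShardHit> s2 = Arrays.asList(new ShardHit("shard2", "c", 0.7f),
                                          new ShardHit("shard2", "d", 0.1f));
        List<ShardHit> top = mergeTopN(Arrays.asList(s1, s2), 2);
        System.out.println(shardsToFetch(top)); // [shard1, shard2]
    }
}
```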
>
> Now, you can probably bypass a lot of that if you're happy with returning
> the top-N lists from all the shards. This two-pass mechanism was put in
> place to handle, say, a 100-shard system where you wouldn't want to
> transmit all the top N from every shard.
>
> HTH,
> Erick
>
>
> On Wed, Jul 15, 2015 at 8:46 AM, Chetan Vora  wrote:
> > Hi all
> >
> > I asked a related question before but couldn't get any response (see
> > SolrQueryRequest in SolrCloud vs Standalone Solr), asking it differently
> > here.
> >
> > Is there a way to invoke
> >
> > IndexSearcher.search(Query, Collector) over a SolrCloud collection so that
> > it invokes the search/collect implicitly on individual shards of the
> > collection? If not, how does one do this explicitly?
> >
> > I have a use case that was implemented using a custom request handler in
> > standalone Solr and we're trying to move to SolrCloud. It is necessary for
> > us to understand how to do the above so we can use SolrCloud functionality.
> >
> > Thanks and would *really really* appreciate ANY help.
> >
> > Regards
> > CV
>


IndexSearcher.search(query, collect)

2015-07-15 Thread Chetan Vora
Hi all

I asked a related question before but couldn't get any response (see
SolrQueryRequest in SolrCloud vs Standalone Solr), asking it differently
here.

Is there a way to invoke

IndexSearcher.search(Query, Collector) over a SolrCloud collection so that
it invokes the search/collect implicitly on individual shards of the
collection? If not, how does one do this explicitly?

I have a use case that was implemented using a custom request handler in
standalone Solr and we're trying to move to SolrCloud. It is necessary for
us to understand how to do the above so we can use SolrCloud functionality.

Thanks and would *really really* appreciate ANY help.

Regards
CV


SolrQueryRequest in SolrCloud vs Standalone Solr

2015-07-08 Thread Chetan Vora
Hi all

We have a cluster of standalone Solr cores (Solr 4.3) for which we had
built  some custom requesthandlers and filters which do query processing
using the Terms API. I'm now trying to port the custom functionality to
work in the Solr Cloud world.

Old configuration had standalone cores with the requesthandler embedded
into each:

core1

-> requesthandler plugin

core2

-> requesthandler plugin

We built an external (non-Solr) component that sent every query request to
each core and aggregated the results. When processing a request, each
request handler obtained an index searcher by doing

SolrIndexSearcher searcher = solrQueryRequest.getSearcher();

followed by

searcher.search()...

Request1: http://localhost:xxx/solr/core1/plugin?q=blahblah

Request2: http://localhost:xxx/solr/core2/plugin?q=blahblah
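
The fan-out done by the external component can be sketched roughly as follows. Only the URL pattern `/solr/<core>/plugin?q=...` comes from the requests above; the class name, the method name, and the port used in the usage note are placeholders for illustration:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds one request URL per standalone core. */
final class CoreFanOut {

    static List<String> buildUrls(String solrBase, List<String> cores,
                                  String handler, String query) {
        String encoded;
        try {
            // Escape the query so it is safe to place in the URL.
            encoded = URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        List<String> urls = new ArrayList<>();
        for (String core : cores) {
            urls.add(solrBase + "/" + core + "/" + handler + "?q=" + encoded);
        }
        return urls;
    }
}
```

For example, `CoreFanOut.buildUrls("http://localhost:8983/solr", Arrays.asList("core1", "core2"), "plugin", "blahblah")` yields one URL per core; the caller then sends each request and aggregates the responses itself.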


In the SolrCloud version, I expected things to work similarly but at the
collection level.

New configuration:

SolrCloud collection with plugin

-> shard1

-> shard2

So my expectation is when I invoke

SolrIndexSearcher searcher = solrQueryRequest.getSearcher() ...


I obtain a searcher which can search against the collection, i.e., against
all the shards. But this doesn't seem to happen. It seems that the searcher
is executing the query only against shard1!

Note: I peeked into the SolrQueryRequest object using a debugger and it has
a reference to a SolrCore object which just points to shard1.

Request: http://localhost:xxx/solr/collection1/plugin?q=blahblah

Am I doing something wrong? Is my expectation of how it should work flawed?

Any help would be appreciated.

Regards

CV


Re: Solr Cloud: No live SolrServers available

2015-05-20 Thread Chetan Vora
47_shard1_replica1/
ERROR - 2015-05-20 10:36:25.037; org.apache.solr.cloud.SyncStrategy; No
UpdateLog found - cannot sync
INFO  - 2015-05-20 10:36:25.037;
org.apache.solr.cloud.ShardLeaderElectionContext; We failed sync, but we
have no versions - we can't sync in that case - we were active before, so
become leader anyway
INFO  - 2015-05-20 10:36:25.037;
org.apache.solr.cloud.ShardLeaderElectionContext; I am the new leader:
http://10.1.172.231:8987/solr/myapp47_shard1_replica1/ shard1
INFO  - 2015-05-20 10:36:25.037; org.apache.solr.common.cloud.SolrZkClient;
makePath: /collections/myapp47/leaders/shard1
INFO  - 2015-05-20 10:36:25.045;
org.apache.solr.cloud.OverseerElectionContext; I am going to be the leader
10.1.172.231:8987_solr
INFO  - 2015-05-20 10:36:25.045; org.apache.solr.common.cloud.SolrZkClient;
makePath: /overseer_elect/leader
INFO  - 2015-05-20 10:36:25.047; org.apache.solr.cloud.Overseer; Overseer
(id=93856238667694080-10.1.172.231:8987_solr-n_09) starting
INFO  - 2015-05-20 10:36:25.060;
org.apache.solr.cloud.OverseerCollectionProcessor; Process current queue of
collection creations
INFO  - 2015-05-20 10:36:25.061;
org.apache.solr.cloud.OverseerCollectionProcessor; prioritizing overseer
nodes
INFO  - 2015-05-20 10:36:25.061;
org.apache.solr.cloud.Overseer$ClusterStateUpdater; Starting to work on the
main queue
INFO  - 2015-05-20 10:36:25.063;
org.apache.solr.common.cloud.ZkStateReader; Updating cloud state from
ZooKeeper...
INFO  - 2015-05-20 10:36:25.064;
org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update state
numShards=2 message={
  "operation":"state",
  "state":"down",
  "base_url":"http://10.1.172.231:8987/solr",
  "core":"myapp47_shard2_replica1",
  "roles":null,
  "node_name":"10.1.172.231:8987_solr",
  "shard":"shard2",
  "collection":"myapp47",
  "numShards":"2",
  "core_node_name":"core_node2"}
INFO  - 2015-05-20 10:36:25.067;
org.apache.solr.cloud.DistributedQueue$LatchChildWatcher; LatchChildWatcher
fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
INFO  - 2015-05-20 10:36:25.068;
org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change:
WatchedEvent state:SyncConnected type:NodeDataChanged
path:/clusterstate.json, has occurred - updating... (live nodes size: 1)
INFO  - 2015-05-20 10:36:25.071;
org.apache.solr.common.cloud.ZkStateReader; Updating cloud state from
ZooKeeper...
INFO  - 2015-05-20 10:36:25.072;
org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update state
numShards=2 message={
  "operation":"state",
  "state":"down",
  "base_url":"http://10.1.172.231:8987/solr",
  "core":"myapp47_shard1_replica1",
  "roles":null,
  "node_name":"10.1.172.231:8987_solr",
  "shard":"shard1",
  "collection":"myapp47",
  "numShards":"2",
  "core_node_name":"core_node1"}
INFO  - 2015-05-20 10:36:25.075;
org.apache.solr.cloud.DistributedQueue$LatchChildWatcher; LatchChildWatcher
fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
INFO  - 2015-05-20 10:36:25.078;
org.apache.solr.cloud.DistributedQueue$LatchChildWatcher; LatchChildWatcher
fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
INFO  - 2015-05-20 10:36:25.081;
org.apache.solr.cloud.DistributedQueue$LatchChildWatcher; LatchChildWatcher
fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
INFO  - 2015-05-20 10:36:25.083;
org.apache.solr.cloud.DistributedQueue$LatchChildWatcher; LatchChildWatcher
fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
INFO  - 2015-05-20 10:36:25.086;
org.apache.solr.cloud.DistributedQueue$LatchChildWatcher; LatchChildWatcher
fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged
INFO  - 2015-05-20 10:36:25.190;
org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change:
WatchedEvent state:SyncConnected type:NodeDataChanged
path:/clusterstate.json, has occurred - updating... (live nodes size: 1)


On Wed, May 20, 2015 at 10:44 AM, Chetan Vora  wrote:

> Erick
>
> Thanks for your response.
>
> Logs don't seem to show any explicit errors (I have log level at INFO).
>
> I am attaching the logs from a 4.7 start and a 5.1 start here. Note that
> both logs seem to show the shards as "Down" initially, but for 5.1 the
> state changes to Active later on.
>
> Also, note that all the config files, libraries, jar files, etc. are the
> same for both Solr instances.
>
> Regards
>
>
> On Tue, May 19, 2015 at 11:57 AM, Erick Erickson 
> wrote:
>
>> What you've done _looks_ 

Re: Solr Cloud: No live SolrServers available

2015-05-20 Thread Chetan Vora
Erick

Thanks for your response.

Logs don't seem to show any explicit errors (I have log level at INFO).

I am attaching the logs from a 4.7 start and a 5.1 start here. Note that
both logs seem to show the shards as "Down" initially, but for 5.1 the
state changes to Active later on.

Also, note that all the config files, libraries, jar files, etc. are the same
for both Solr instances.

Regards


On Tue, May 19, 2015 at 11:57 AM, Erick Erickson 
wrote:

> What you've done _looks_ correct at a glance. Take a look at the Solr
> logs. Don't bother trying to index things unless and until your nodes
> are "active", it won't happen.
>
> My first guess is that you have some error in your schema or
> solrconfig.xml files, syntax errors, typos, class names that are
> mis-typed, jars that are missing, whatever.
>
> If that's true, the Solr log (or the screen if you're just running
> from the command line) will show big ugly stack traces.
>
> If nothing shows up in the logs then I'm puzzled, but what you
> describe is consistent with what I've seen in terms of having bad
> configs and trying to create a collection.
>
> Best,
> Erick
>
> On Tue, May 19, 2015 at 4:33 AM, Chetan Vora  wrote:
> > Hi all
> >
> > We have a cluster of standalone Solr cores (Solr 4.3) for which we had
> > built  some custom plugins. I'm now trying to prototype converting the
> > cluster to a Solr Cloud cluster. This is how I am trying to deploy the
> > cores (in 4.7.2).
> >
> > 1. Start Solr with ZooKeeper embedded:
> >
> >    java -DzkRun -Djetty.port=8985 -jar start.jar
> >
> > 2. Upload a config into ZooKeeper (same config as the standalone cores):
> >
> >    zkcli.bat -zkhost localhost:9985 -cmd upconfig -confdir myconfig -confname myconfig
> >
> > 3. Create a new collection (mycollection) of 2 shards using the
> >    Collections API:
> >
> >    http://localhost:8985/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=1&maxShardsPerNode=2&collection.configName=myconfig
> >
> > So at this point I have two shards under my solr directory with the
> > appropriate core.properties
> >
> > But when I go to http://localhost:8985/solr/#/~cloud, I see that the two
> > shards' status is "Down" when they are supposed to be active by default.
> >
> > And when I try to index documents in them using SolrJ (via the
> > CloudSolrServer API), I get the error "No live SolrServers available to
> > handle this request". I restarted Solr but got the same issue.
> >
> > private CloudSolrServer cloudSolr;
> > cloudSolr = new CloudSolrServer(zkHOST);
> > cloudSolr.setZkClientTimeout(zkClientTimeout);
> > cloudSolr.setDefaultCollection(collectionName);
> > cloudSolr.connect();
> > cloudSolr.add(doc);
> >
> > What am I doing wrong? I did a lot of digging around and saw an old Jira
> > bug saying that Solr Cloud shards won't be active until there are some
> > documents in the index. If that is the reason, that's kind of like a
> > catch-22 isn't it?
> >
> > So anyways, I also tried adding some test documents manually and committed
> > to see if things improved. Now on the shard statistics page, it correctly
> > gives me the Numdocs count but when I try to query it says "no servers
> > hosting shard". I next tried passing in shards.tolerant=true as a query
> > parameter and search, but no cigar. It says 0 documents found.
> >
> > Any help would be appreciated. My main objective is to rebuild the old
> > standalone cores using SolrCloud and test to see if our custom
> > requesthandlers still work as expected. And at this point, I can't index
> > documents inside of the 4.7 Solr Cloud collection I have created. I am
> > trying to use a 4.x SolrCloud release as it seems the internal APIs have
> > changed quite a bit for the 5.x releases and our custom requesthandlers
> > don't work anymore as expected.
> >
> > Thanks and Regards
>


Solr Cloud: No live SolrServers available

2015-05-19 Thread Chetan Vora
Hi all

We have a cluster of standalone Solr cores (Solr 4.3) for which we had
built  some custom plugins. I'm now trying to prototype converting the
cluster to a Solr Cloud cluster. This is how I am trying to deploy the
cores (in 4.7.2).

   1. Start Solr with ZooKeeper embedded:

      java -DzkRun -Djetty.port=8985 -jar start.jar

   2. Upload a config into ZooKeeper (same config as the standalone cores):

      zkcli.bat -zkhost localhost:9985 -cmd upconfig -confdir myconfig -confname myconfig

   3. Create a new collection (mycollection) of 2 shards using the
      Collections API:

      http://localhost:8985/solr/admin/collections?action=CREATE&name=mycollection&numShards=2&replicationFactor=1&maxShardsPerNode=2&collection.configName=myconfig
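
For reference, the CREATE request in step 3 can also be assembled programmatically. The sketch below uses only the parameters that appear in that URL; the class name is hypothetical, and the code only builds the URL string without sending it:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: assembles a Collections API CREATE URL. */
final class CreateCollectionUrl {

    static String build(String solrBase, String name, int numShards,
                        int replicationFactor, int maxShardsPerNode, String configName) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "CREATE");
        params.put("name", name);
        params.put("numShards", String.valueOf(numShards));
        params.put("replicationFactor", String.valueOf(replicationFactor));
        params.put("maxShardsPerNode", String.valueOf(maxShardsPerNode));
        params.put("collection.configName", configName);

        // Assumption: the values contain only URL-safe characters, so nothing is escaped.
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(p.getKey() + "=" + p.getValue());
        }
        return solrBase + "/admin/collections?" + query;
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8985/solr", "mycollection", 2, 1, 2, "myconfig"));
    }
}
```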

So at this point I have two shards under my solr directory with the
appropriate core.properties

But when I go to http://localhost:8985/solr/#/~cloud, I see that the two
shards' status is "Down" when they are supposed to be active by default.

And when I try to index documents in them using SolrJ (via the CloudSolrServer
API), I get the error "No live SolrServers available to handle this
request". I restarted Solr but got the same issue.

private CloudSolrServer cloudSolr;
cloudSolr = new CloudSolrServer(zkHOST);
cloudSolr.setZkClientTimeout(zkClientTimeout);
cloudSolr.setDefaultCollection(collectionName);
cloudSolr.connect();
cloudSolr.add(doc);

What am I doing wrong? I did a lot of digging around and saw an old Jira
bug saying that Solr Cloud shards won't be active until there are some
documents in the index. If that is the reason, that's kind of like a
catch-22 isn't it?

So anyways, I also tried adding some test documents manually and committed
to see if things improved. Now on the shard statistics page, it correctly
gives me the Numdocs count but when I try to query it says "no servers
hosting shard". I next tried passing in shards.tolerant=true as a query
parameter and search, but no cigar. It says 0 documents found.

Any help would be appreciated. My main objective is to rebuild the old
standalone cores using SolrCloud and test to see if our custom
requesthandlers still work as expected. And at this point, I can't index
documents inside of the 4.7 Solr Cloud collection I have created. I am
trying to use a 4.x SolrCloud release as it seems the internal APIs have
changed quite a bit for the 5.x releases and our custom requesthandlers
don't work anymore as expected.

Thanks and Regards


How to use NumericTermsRangeEnum from NumericRangeQuery

2013-09-26 Thread Chetan Vora
Hi all

I was trying to use the above enum to do some range search on dates... this
enum is returned by NumericRangeQuery.getTermsEnum() but I realized that
this is a protected method of the class and since this is a final class, I
can't see how I can use it. Maybe I'm missing something?

Would appreciate any pointers.

Thanks