Re: AnalyticsQuery fails on a sharded collection
Re: AnalyticsQuery fails on a sharded collection
OK, some more info: it's not aggregating because the doc values it's using for grouping are the unique ID field's.

There are some big differences in the overall flow between searches against a single-shard collection and searches against a multi-shard collection. In a single-shard collection the AnalyticsQuery is called once, and there's only one pass through the delegating collector. If someone could explain what's going on in a multi-shard search, that would help a lot, I think.

My test collection has two shards, each with one replica. For this search:

.../aggr?q=*:*=VENDOR_NAME=VENDOR_NAME+asc

the user has selected just one field to view, so I make VENDOR_NAME the group-by field. This is what I see while debugging:

1. custom AnalyticsQuery is instantiated and the "fl" param is VENDOR_NAME + [AggregationStats]
2. custom AnalyticsQuery is instantiated (again) and the "fl" param is id + [AggregationStats]
3. custom AnalyticsQuery is instantiated (again) and the "fl" param is id + [AggregationStats]
4. getAnalyticsCollector() is called (fl is id + [AggregationStats])
5. getAnalyticsCollector() is called again (fl is id + [AggregationStats])
6. custom DelegatingCollector finish() is called
7. custom DelegatingCollector finish() is called
8. custom AnalyticsQuery is instantiated and the "fl" param is VENDOR_NAME + [AggregationStats] + id + [AggregationStats]
9. custom AnalyticsQuery is instantiated and the "fl" param is VENDOR_NAME + [AggregationStats] + id + [AggregationStats]

And from the log:

INFO - 2016-08-11 09:19:47.245; [ShardTest1 shard1_1 core_node4 ShardTest1_shard1_1_replica1] org.apache.solr.core.SolrCore; [ShardTest1_shard1_1_replica1] webapp=/solr path=/aggr params={distrib=false=/aggr=id=4=0=true=VENDOR_NAME+asc={!AggregationPostFilter+count%3DCount+spend%3DINVOICE_AMOUNT}=http://localhost:8983/solr/ShardTest1_shard1_1_replica1/|http://localhost:8984/solr/ShardTest1_shard1_1_replica2/=10=2=*:*=1470925120206=true=javabin&_=1470925120222} hits=12096 status=0 QTime=64734

INFO - 2016-08-11 09:19:48.876; [ShardTest1 shard1_0 core_node3 ShardTest1_shard1_0_replica1] org.apache.solr.core.SolrCore; [ShardTest1_shard1_0_replica1] webapp=/solr path=/aggr params={distrib=false=/aggr=id=4=0=true=VENDOR_NAME+asc={!AggregationPostFilter+count%3DCount+spend%3DINVOICE_AMOUNT}=http://localhost:8983/solr/ShardTest1_shard1_0_replica1/|http://localhost:8984/solr/ShardTest1_shard1_0_replica2/=10=2=*:*=1470925120206=true=javabin&_=1470925120222} hits=12062 status=0 QTime=66365

INFO - 2016-08-11 09:19:50.952; [ShardTest1 shard1_1 core_node4 ShardTest1_shard1_1_replica1] org.apache.solr.core.SolrCore; [ShardTest1_shard1_1_replica1] webapp=/solr path=/aggr params={distrib=false=/aggr=VENDOR_NAME=[AggregationStats]=id=64={!AggregationPostFilter+count%3DCount+spend%3DINVOICE_AMOUNT}=http://localhost:8983/solr/ShardTest1_shard1_1_replica1/|http://localhost:8984/solr/ShardTest1_shard1_1_replica2/=2=*:*=1470925120206=100713,940122,44812,210965,584851=true=javabin&_=1470925120222} status=0 QTime=2070

INFO - 2016-08-11 09:19:53.176; [ShardTest1 shard1_0 core_node3 ShardTest1_shard1_0_replica1] org.apache.solr.core.SolrCore; [ShardTest1_shard1_0_replica1] webapp=/solr path=/aggr params={distrib=false=/aggr=VENDOR_NAME=[AggregationStats]=id=64={!AggregationPostFilter+count%3DCount+spend%3DINVOICE_AMOUNT}=http://localhost:8983/solr/ShardTest1_shard1_0_replica1/|http://localhost:8984/solr/ShardTest1_shard1_0_replica2/=2=*:*=1470925120206=533737,44864,100672,940123,96752=true=javabin&_=1470925120222} status=0 QTime=4293

INFO - 2016-08-11 09:19:53.178; [ShardTest1 shard1_0 core_node3 ShardTest1_shard1_0_replica1] org.apache.solr.core.SolrCore; [ShardTest1_shard1_0_replica1] webapp=/solr path=/aggr params={q=*:*=true=VENDOR_NAME=VENDOR_NAME+asc=json&_=1470925120222} hits=24158 status=0 QTime=72972

--
View this message in context: http://lucene.472066.n3.nabble.com/AnalyticsQuery-fails-on-a-sharded-collection-tp4289274p4291301.html
Sent from the Solr - User mailing list archive at Nabble.com.
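The debugging steps above suggest that the per-shard requests arrive with "fl" rewritten to id + [AggregationStats], so a filter that derives its group-by field from "fl" would group on the unique ID there. The following is only a minimal sketch of that effect in plain Java, with no Solr classes involved. The parameter name groupFields, the class name, and the comma-separated form of "fl" are hypothetical and chosen for illustration; whether a dedicated parameter reaches the shards unchanged would need to be verified, although the log does show the filter's own local params (count, spend) arriving intact.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustration only: plain maps stand in for request parameters.
final class GroupByFieldDemo {

    // Split a comma-separated field list, skipping transformer entries such as [AggregationStats].
    static List<String> plainFields(String csv) {
        List<String> fields = new ArrayList<>();
        for (String part : csv.split(",")) {
            String name = part.trim();
            if (!name.isEmpty() && !name.startsWith("[")) {
                fields.add(name);
            }
        }
        return fields;
    }

    // Fragile variant: the group-by fields are whatever "fl" happens to contain.
    static List<String> fromFieldList(Map<String, String> params) {
        return plainFields(params.getOrDefault("fl", ""));
    }

    // Hypothetical alternative: read a dedicated parameter that is not rewritten per phase.
    static List<String> fromDedicatedParam(Map<String, String> params) {
        return plainFields(params.getOrDefault("groupFields", ""));
    }

    // "fl" as seen in debugging step 1 (top-level request).
    static Map<String, String> topLevelRequest() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("fl", "VENDOR_NAME,[AggregationStats]");
        p.put("groupFields", "VENDOR_NAME"); // hypothetical parameter
        return p;
    }

    // "fl" as seen in debugging steps 2 to 5 (per-shard requests).
    static Map<String, String> shardRequest() {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("fl", "id,[AggregationStats]");
        p.put("groupFields", "VENDOR_NAME"); // hypothetical parameter, assumed to be forwarded
        return p;
    }

    public static void main(String[] args) {
        System.out.println("top-level, from fl:          " + fromFieldList(topLevelRequest()));
        System.out.println("shard, from fl:              " + fromFieldList(shardRequest()));
        System.out.println("shard, from dedicated param: " + fromDedicatedParam(shardRequest()));
    }
}
```

Grouping on id makes every document its own group, which would match the symptom of every document being returned.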
Re: AnalyticsQuery fails on a sharded collection
Quick update: the NPE was related to the way I passed params into the Query via solrconfig.xml. It works fine on a single-shard collection, but something about it was masking the unique ID field in a multi-shard environment. Anyway, I was able to fix that by cleaning up the request handler config:

{!AggregationPostFilter count=Count spend=INVOICE_AMOUNT} [AggregationStats]

Now my post filter completes without errors (!) but it doesn't work: it returns every single document matched by the query (q) param. It isn't aggregating. (Broken record.) It still works correctly on a single-shard collection. With this query, it should do exactly what the collapsing filter does (and yes, that works perfectly):

.../aggr?q=*:*=VENDOR_NAME=VENDOR_NAME+asc
Re: AnalyticsQuery fails on a sharded collection
        t.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
            // The dummy scorer carries the aggregated value used for sorting.
            if (SearchPreProcessor.SortBy.COUNT.equals(sortBy)) {
                dummy.score = statsArray.get(docId).getCount();
            } else if (SearchPreProcessor.SortBy.SPEND.equals(sortBy)) {
                dummy.score = (float) statsArray.get(docId).getSpend();
            }

            // Advance to the segment (leaf context) that contains this global doc id.
            while (docId >= nextDocBase) {
                currentContext++;
                currentDocBase = contexts[currentContext].docBase;
                nextDocBase = currentContext+1 < contexts.length ? contexts[currentContext+1].docBase : maxDoc;
                super.leafDelegate = super.delegate.getLeafCollector(contexts[currentContext]);
                super.leafDelegate.setScorer(dummy);
            }

            // Convert the global doc id to a segment-local id before collecting.
            int contextDoc = docId-currentDocBase;
            dummy.docId = contextDoc;
            super.leafDelegate.collect(contextDoc);
        }

        rb.rsp.add(TOTAL_DOCS_STAT, Integer.valueOf(totalDocs));

        if (super.delegate instanceof DelegatingCollector) {
            ((DelegatingCollector) super.delegate).finish();
        }
    }

    private class FieldOrdinals {
        private final int[] ords;

        FieldOrdinals(int[] ords) {
            this.ords = ords;
        }

        int[] getOrds() {
            return ords;
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(ords);
        }

        @Override
        public boolean equals(Object obj) {
            // instanceof guard avoids a ClassCastException (and handles null).
            return obj instanceof FieldOrdinals
                && Arrays.equals(ords, ((FieldOrdinals) obj).getOrds());
        }
    }

    private class DummyScorer extends Scorer {
        float score;
        int docId;

        DummyScorer() {
            super(null);
        }

        @Override
        public float score() throws IOException {
            return score;
        }

        @Override
        public int freq() throws IOException {
            return 0;
        }

        @Override
        public int advance(int i) throws IOException {
            return -1;
        }

        @Override
        public long cost() {
            return 0;
        }

        @Override
        public int docID() {
            return docId;
        }

        @Override
        public int nextDoc() throws IOException {
            return 0;
        }
    }
}
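The while loop in finish() above maps a global doc id to a segment and a segment-local id by walking the recorded doc bases. Here is a small standalone sketch of the same arithmetic using plain int arrays instead of LeafReaderContext objects, with explicit guards for the case where no segments were recorded. The class name, method names, and sample doc bases are made up for illustration; this is not the Lucene API and not necessarily how the original collector should be fixed.

```java
// Illustration only: docBases[i] is the first global doc id of segment i (ascending, docBases[0] == 0).
final class DocBaseResolver {

    // Returns the index of the segment that contains globalDoc.
    static int leafIndex(int[] docBases, int maxDoc, int globalDoc) {
        if (docBases == null || docBases.length == 0) {
            // Fail with a clear message rather than a NullPointerException further down.
            throw new IllegalStateException("no segments were recorded for this searcher");
        }
        if (globalDoc < 0 || globalDoc >= maxDoc) {
            throw new IllegalArgumentException("doc id out of range: " + globalDoc);
        }
        int leaf = 0;
        // Advance while the next segment starts at or before the requested doc id.
        while (leaf + 1 < docBases.length && globalDoc >= docBases[leaf + 1]) {
            leaf++;
        }
        return leaf;
    }

    // Returns the segment-local doc id, i.e. the global id minus the segment's doc base.
    static int localDoc(int[] docBases, int maxDoc, int globalDoc) {
        return globalDoc - docBases[leafIndex(docBases, maxDoc, globalDoc)];
    }

    public static void main(String[] args) {
        int[] docBases = {0, 5, 12}; // three hypothetical segments
        int maxDoc = 20;
        for (int doc : new int[] {4, 7, 12, 19}) {
            System.out.println("global " + doc
                + " -> segment " + leafIndex(docBases, maxDoc, doc)
                + ", local " + localDoc(docBases, maxDoc, doc));
        }
    }
}
```

A guard like the first check is one way to turn a null or empty context array into a readable error while debugging a case such as the NPE described in the original post.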
Re: AnalyticsQuery fails on a sharded collection
Thanks Joel! However, I've come to realize that upgrading to Solr 6 is not a near-term reality due to the Java 8 requirement. I don't want anyone to waste their time debugging my code, at least not until I've made time to really work through it myself. I was just looking for a pointer on generalities: if the collector works with a single shard but not two, perhaps look at A and B.

Joel Bernstein wrote
> ...
>
> As far using a MergeStrategy, I would suggest creating a streaming
> expression that handles the merge. This is a much cleaner approach. An
> example of how this works can be seen in this patch:
>
> https://issues.apache.org/jira/secure/attachment/12820171/SOLR-9252.patch
>
> The AnalyticsQuery in this case is:
>
> TextLogisticRegressionQParserPlugin.java
>
> The expression is:
>
> TextLogitStream.java
>
> The TextLogitStream has sample code for calling the shards and merging
> the results.
>
> If you want to use this approach the following patch is needed so you
> can add your own streaming expression:
>
> https://issues.apache.org/jira/browse/SOLR-9103
>
> This will likely be in 6.2
Re: AnalyticsQuery fails on a sharded collection
The finish() method operates on the search node, not the aggregator node. So whether it's distributed shouldn't affect how it runs. If you can post your code I might be able to see the issue.

As far as using a MergeStrategy, I would suggest creating a streaming expression that handles the merge. This is a much cleaner approach. An example of how this works can be seen in this patch:

https://issues.apache.org/jira/secure/attachment/12820171/SOLR-9252.patch

The AnalyticsQuery in this case is:

TextLogisticRegressionQParserPlugin.java

The expression is:

TextLogitStream.java

The TextLogitStream has sample code for calling the shards and merging the results.

If you want to use this approach the following patch is needed so you can add your own streaming expression:

https://issues.apache.org/jira/browse/SOLR-9103

This will likely be in 6.2

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jul 27, 2016 at 5:36 PM, tedsolr <tsm...@sciquest.com> wrote:

> I'm looking to create a merge strategy for a custom QParserPlugin I have.
> The plugin works fine on collections with one shard. I was very surprised
> to see it throw an exception when I ran it against a sharded collection.
> So my question is a bit of a shot in the dark. I'll first note that the
> CollapsingQParserPlugin included with Solr works as expected on my test
> collection with two shards.
>
> The NPE occurs in my DelegatingCollector's finish() method as it's setting
> the next doc base. It appears I have a null LeafReaderContext. Without
> knowing anything about my code, what is it about multiple shards that might
> throw off a collector like this?
>
> thanks!
> v5.2.1
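Since finish() runs on each search node, each shard can only aggregate the documents it holds, and the same group value may come back from more than one shard. As a rough sketch of what a merge step on the aggregator has to do, the plain-Java example below sums per-shard count and spend by group key. It uses neither Solr's MergeStrategy interface nor the streaming expression classes; the class names, vendor names, and numbers are invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: merges partial aggregates returned by several shards.
final class ShardStatsMerge {

    // Partial aggregate for one group value (for example one VENDOR_NAME) on one shard.
    static final class Stats {
        long count;
        double spend;

        Stats(long count, double spend) {
            this.count = count;
            this.spend = spend;
        }
    }

    // Sum counts and spend per group key across all shards; keys are sorted ascending.
    static Map<String, Stats> merge(List<Map<String, Stats>> perShard) {
        Map<String, Stats> merged = new TreeMap<>();
        for (Map<String, Stats> shard : perShard) {
            for (Map.Entry<String, Stats> e : shard.entrySet()) {
                Stats total = merged.computeIfAbsent(e.getKey(), k -> new Stats(0L, 0.0));
                total.count += e.getValue().count;
                total.spend += e.getValue().spend;
            }
        }
        return merged;
    }

    // Made-up results from two shards; "ACME" appears on both.
    static List<Map<String, Stats>> sampleShards() {
        Map<String, Stats> shard1 = new LinkedHashMap<>();
        shard1.put("ACME", new Stats(3, 100.0));
        shard1.put("GLOBEX", new Stats(1, 20.0));

        Map<String, Stats> shard2 = new LinkedHashMap<>();
        shard2.put("ACME", new Stats(2, 50.5));
        shard2.put("INITECH", new Stats(4, 75.25));

        List<Map<String, Stats>> shards = new ArrayList<>();
        shards.add(shard1);
        shards.add(shard2);
        return shards;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Stats> e : merge(sampleShards()).entrySet()) {
            System.out.println(e.getKey() + " count=" + e.getValue().count + " spend=" + e.getValue().spend);
        }
    }
}
```

Sums and counts merge this simply because they are additive; a statistic such as an average would need its numerator and denominator carried separately from each shard.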
AnalyticsQuery fails on a sharded collection
I'm looking to create a merge strategy for a custom QParserPlugin I have. The plugin works fine on collections with one shard. I was very surprised to see it throw an exception when I ran it against a sharded collection. So my question is a bit of a shot in the dark. I'll first note that the CollapsingQParserPlugin included with Solr works as expected on my test collection with two shards.

The NPE occurs in my DelegatingCollector's finish() method as it's setting the next doc base. It appears I have a null LeafReaderContext. Without knowing anything about my code, what is it about multiple shards that might throw off a collector like this?

thanks!
v5.2.1