Re: AnalyticsQuery fails on a sharded collection

2016-08-11 Thread Joel Bernstein
plica1] org.apache.solr.core.SolrCore;
> [ShardTest1_shard1_1_replica1] webapp=/solr path=/aggr
> params={distrib=false=/aggr=VENDOR_NAME=[AggregationStats]=id&
> shards.purpose=64={!AggregationPostFilter+count%
> 3DCount+spend%3DINVOICE_AMOUNT}=http://localhost:8983/solr/
> ShardTest1_shard1_1_replica1/|http://localhost:8984/solr/
> ShardTest1_shard1_1_replica2/=2=*:*=
> 1470925120206=100713,940122,44812,210965,584851&
> isShard=true=javabin&_=1470925120222}
> status=0 QTime=2070
>
> INFO  - 2016-08-11 09:19:53.176; [ShardTest1 shard1_0 core_node3
> ShardTest1_shard1_0_replica1] org.apache.solr.core.SolrCore;
> [ShardTest1_shard1_0_replica1] webapp=/solr path=/aggr
> params={distrib=false=/aggr=VENDOR_NAME=[AggregationStats]=id&
> shards.purpose=64={!AggregationPostFilter+count%
> 3DCount+spend%3DINVOICE_AMOUNT}=http://localhost:8983/solr/
> ShardTest1_shard1_0_replica1/|http://localhost:8984/solr/
> ShardTest1_shard1_0_replica2/=2=*:*=
> 1470925120206=533737,44864,100672,940123,96752&
> isShard=true=javabin&_=1470925120222}
> status=0 QTime=4293
>
> INFO  - 2016-08-11 09:19:53.178; [ShardTest1 shard1_0 core_node3
> ShardTest1_shard1_0_replica1] org.apache.solr.core.SolrCore;
> [ShardTest1_shard1_0_replica1] webapp=/solr path=/aggr
> params={q=*:*=true=VENDOR_NAME=VENDOR_NAME+
> asc=json&_=1470925120222}
> hits=24158 status=0 QTime=72972
>
>
>
>
> --
> View this message in context: http://lucene.472066.n3.
> nabble.com/AnalyticsQuery-fails-on-a-sharded-collection-
> tp4289274p4291301.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: AnalyticsQuery fails on a sharded collection

2016-08-11 Thread tedsolr
OK, some more info ... it's not aggregating because the doc values it's using
for grouping are the unique ID field's. There are some big differences in
the whole flow between searches against a single shard collection, and
searches against a multi-shard collection. In a single shard collection the
AnalyticsQuery is called one time, and there's only one pass through the
delegating collector. If someone could explain what's going on in a
multi-shard search, that would help a lot, I think. My test collection has
two shards, and each one has a replica.

For this search
.../aggr?q=*:*=VENDOR_NAME=VENDOR_NAME+asc 
The user has selected just one field to view, so I make VENDOR_NAME the
group by field.

This is what I see while debugging:
1. custom AnalyticsQuery is instantiated and the "fl" param is VENDOR_NAME +
[AggregationStats]
2. custom AnalyticsQuery is instantiated (again) and the "fl" param is id +
[AggregationStats]
3. custom AnalyticsQuery is instantiated (again) and the "fl" param is id +
[AggregationStats]
4. getAnalyticsCollector() is called (fl is id + [AggregationStats])
5. getAnalyticsCollector() is called again (fl is id + [AggregationStats])
6. custom DelegatingCollector finish() is called
7. custom DelegatingCollector finish() is called
8. custom AnalyticsQuery is instantiated and the "fl" param is VENDOR_NAME +
[AggregationStats] + id +  [AggregationStats]
9. custom AnalyticsQuery is instantiated and the "fl" param is VENDOR_NAME +
[AggregationStats] + id +  [AggregationStats]
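
For reference, the numeric values in the shard requests can be read against the
purpose flags in Solr's ShardRequest class: 4 is PURPOSE_GET_TOP_IDS and 64 is
PURPOSE_GET_FIELDS. In the first phase each shard is asked only for the unique
key (plus sort values), which would explain why "fl" shows up as id in steps 2
through 5. The sketch below is plain Java with no Solr dependency. The
groupField parameter and both helper names are hypothetical, and reading the
group-by field from a dedicated parameter instead of "fl" is only one possible
workaround under that assumption, not something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ShardPurposeSketch {
    // Bit values as defined in Solr's ShardRequest class.
    static final int PURPOSE_GET_TOP_IDS = 0x04;
    static final int PURPOSE_GET_FIELDS = 0x40;

    /** Names the distributed-search phases encoded in a shards.purpose value. */
    static List<String> describe(int purpose) {
        List<String> phases = new ArrayList<>();
        if ((purpose & PURPOSE_GET_TOP_IDS) != 0) {
            phases.add("GET_TOP_IDS");
        }
        if ((purpose & PURPOSE_GET_FIELDS) != 0) {
            phases.add("GET_FIELDS");
        }
        return phases;
    }

    /**
     * Hypothetical helper: prefers a dedicated "groupField" parameter (an
     * invented name) and falls back to "fl" only when it is absent.
     */
    static String groupField(Map<String, String> params) {
        String explicit = params.get("groupField");
        return explicit != null ? explicit : params.get("fl");
    }

    public static void main(String[] args) {
        System.out.println(describe(4));   // [GET_TOP_IDS]
        System.out.println(describe(64));  // [GET_FIELDS]

        // Phase one as seen by a shard: fl has been rewritten to the unique key.
        Map<String, String> phaseOne = new LinkedHashMap<>();
        phaseOne.put("fl", "id");
        phaseOne.put("groupField", "VENDOR_NAME");
        System.out.println(groupField(phaseOne)); // VENDOR_NAME
    }
}
```

Any such parameter would also have to reach the shards. The logs suggest the
{!AggregationPostFilter ...} local params are forwarded with each shard
request, so that may be a workable place to carry it.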

And from the log:

INFO  - 2016-08-11 09:19:47.245; [ShardTest1 shard1_1 core_node4
ShardTest1_shard1_1_replica1] org.apache.solr.core.SolrCore;
[ShardTest1_shard1_1_replica1] webapp=/solr path=/aggr
params={distrib=false=/aggr=id=4=0=true=VENDOR_NAME+asc={!AggregationPostFilter+count%3DCount+spend%3DINVOICE_AMOUNT}=http://localhost:8983/solr/ShardTest1_shard1_1_replica1/|http://localhost:8984/solr/ShardTest1_shard1_1_replica2/=10=2=*:*=1470925120206=true=javabin&_=1470925120222}
hits=12096 status=0 QTime=64734 

INFO  - 2016-08-11 09:19:48.876; [ShardTest1 shard1_0 core_node3
ShardTest1_shard1_0_replica1] org.apache.solr.core.SolrCore;
[ShardTest1_shard1_0_replica1] webapp=/solr path=/aggr
params={distrib=false=/aggr=id=4=0=true=VENDOR_NAME+asc={!AggregationPostFilter+count%3DCount+spend%3DINVOICE_AMOUNT}=http://localhost:8983/solr/ShardTest1_shard1_0_replica1/|http://localhost:8984/solr/ShardTest1_shard1_0_replica2/=10=2=*:*=1470925120206=true=javabin&_=1470925120222}
hits=12062 status=0 QTime=66365 

INFO  - 2016-08-11 09:19:50.952; [ShardTest1 shard1_1 core_node4
ShardTest1_shard1_1_replica1] org.apache.solr.core.SolrCore;
[ShardTest1_shard1_1_replica1] webapp=/solr path=/aggr
params={distrib=false=/aggr=VENDOR_NAME=[AggregationStats]=id=64={!AggregationPostFilter+count%3DCount+spend%3DINVOICE_AMOUNT}=http://localhost:8983/solr/ShardTest1_shard1_1_replica1/|http://localhost:8984/solr/ShardTest1_shard1_1_replica2/=2=*:*=1470925120206=100713,940122,44812,210965,584851=true=javabin&_=1470925120222}
status=0 QTime=2070 

INFO  - 2016-08-11 09:19:53.176; [ShardTest1 shard1_0 core_node3
ShardTest1_shard1_0_replica1] org.apache.solr.core.SolrCore;
[ShardTest1_shard1_0_replica1] webapp=/solr path=/aggr
params={distrib=false=/aggr=VENDOR_NAME=[AggregationStats]=id=64={!AggregationPostFilter+count%3DCount+spend%3DINVOICE_AMOUNT}=http://localhost:8983/solr/ShardTest1_shard1_0_replica1/|http://localhost:8984/solr/ShardTest1_shard1_0_replica2/=2=*:*=1470925120206=533737,44864,100672,940123,96752=true=javabin&_=1470925120222}
status=0 QTime=4293 

INFO  - 2016-08-11 09:19:53.178; [ShardTest1 shard1_0 core_node3
ShardTest1_shard1_0_replica1] org.apache.solr.core.SolrCore;
[ShardTest1_shard1_0_replica1] webapp=/solr path=/aggr
params={q=*:*=true=VENDOR_NAME=VENDOR_NAME+asc=json&_=1470925120222}
hits=24158 status=0 QTime=72972 






Re: AnalyticsQuery fails on a sharded collection

2016-08-10 Thread tedsolr
Quick update: the NPE was related to the way in which I passed params into
the Query via solrconfig.xml. It works fine on a single-shard collection, but
something about it was masking the unique ID field in a multi-shard
environment. Anyway, I was able to fix that by cleaning up the request
handler config:

{!AggregationPostFilter count=Count spend=INVOICE_AMOUNT}
[AggregationStats]

Now my post filter completes without errors (!) but it doesn't work - it
returns every single document specified by the query (q) param. It isn't
aggregating. (Broken record) It still works correctly on a single shard
collection. With this query, it should do exactly what the collapsing filter
does (and yes, that works perfectly):

.../aggr?q=*:*=VENDOR_NAME=VENDOR_NAME+asc





Re: AnalyticsQuery fails on a sharded collection

2016-08-10 Thread tedsolr
        t.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
            if (SearchPreProcessor.SortBy.COUNT.equals(sortBy)) {
                dummy.score = statsArray.get(docId).getCount();
            } else if (SearchPreProcessor.SortBy.SPEND.equals(sortBy)) {
                dummy.score = (float) statsArray.get(docId).getSpend();
            }

            // Advance to the segment (leaf) that contains this global doc ID.
            while (docId >= nextDocBase) {
                currentContext++;
                currentDocBase = contexts[currentContext].docBase;
                nextDocBase = currentContext + 1 < contexts.length
                        ? contexts[currentContext + 1].docBase : maxDoc;

                super.leafDelegate =
                        super.delegate.getLeafCollector(contexts[currentContext]);
                super.leafDelegate.setScorer(dummy);
            }

            // Collectors expect segment-local doc IDs.
            int contextDoc = docId - currentDocBase;
            dummy.docId = contextDoc;
            super.leafDelegate.collect(contextDoc);
        }

        rb.rsp.add(TOTAL_DOCS_STAT, Integer.valueOf(totalDocs));

        if (super.delegate instanceof DelegatingCollector) {
            ((DelegatingCollector) super.delegate).finish();
        }
    }

    private class FieldOrdinals {
        private final int[] ords;

        FieldOrdinals(int[] ords) {
            this.ords = ords;
        }

        int[] getOrds() {
            return ords;
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(ords);
        }

        @Override
        public boolean equals(Object obj) {
            // Guard the cast so a null or foreign object compares unequal
            // instead of throwing.
            return obj instanceof FieldOrdinals
                    && Arrays.equals(ords, ((FieldOrdinals) obj).getOrds());
        }
    }

    private class DummyScorer extends Scorer {
        float score;
        int docId;

        DummyScorer() {
            super(null);
        }

        @Override
        public float score() throws IOException {
            return score;
        }

        @Override
        public int freq() throws IOException {
            return 0;
        }

        @Override
        public int advance(int i) throws IOException {
            return -1;
        }

        @Override
        public long cost() {
            return 0;
        }

        @Override
        public int docID() {
            return docId;
        }

        @Override
        public int nextDoc() throws IOException {
            return 0;
        }
    }
}
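
The inner while loop in finish() translates a global doc ID into a segment and
a segment-local doc ID using each leaf's docBase. A minimal standalone sketch
of that mapping follows. It is plain Java with no Lucene dependency, and the
class and method names are hypothetical. The explicit bounds checks are an
addition for illustration: they turn an out-of-range doc ID or an empty leaf
list into a clear error rather than a failure deeper in the loop. Whether that
is what happens in the sharded case is not established here.

```java
public class LeafMappingSketch {
    /**
     * Returns {leafIndex, localDocId} for a global doc ID, given the docBase
     * of every leaf in ascending order and the index-wide maxDoc.
     */
    static int[] toLeaf(int[] docBases, int maxDoc, int globalDoc) {
        if (docBases.length == 0) {
            throw new IllegalStateException("no leaves to map into");
        }
        if (globalDoc < 0 || globalDoc >= maxDoc) {
            throw new IllegalArgumentException(
                    "doc " + globalDoc + " is outside [0, " + maxDoc + ")");
        }
        int leaf = 0;
        // Move forward while the next leaf starts at or before the document.
        while (leaf + 1 < docBases.length && globalDoc >= docBases[leaf + 1]) {
            leaf++;
        }
        return new int[] {leaf, globalDoc - docBases[leaf]};
    }

    public static void main(String[] args) {
        int[] docBases = {0, 100, 250}; // three segments
        int[] hit = toLeaf(docBases, 400, 120);
        System.out.println("leaf " + hit[0] + ", local doc " + hit[1]); // leaf 1, local doc 20
    }
}
```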





Re: AnalyticsQuery fails on a sharded collection

2016-07-28 Thread tedsolr
Thanks Joel! However, I've come to realize that upgrading to Solr 6 is not a
near-term reality due to the Java 8 requirement.

I don't want anyone to waste their time debugging my code. At least not
until I've made time to really work through it myself. I was just looking
for a pointer on generalities - if the collector works with a single shard
but not two, perhaps look at A and B.


Joel Bernstein wrote
> ...
> 
> As far as using a MergeStrategy, I would suggest creating a streaming
> expression that handles the merge. This is a much cleaner approach. An
> example of how this works can be seen in this patch:
> 
> https://issues.apache.org/jira/secure/attachment/12820171/SOLR-9252.patch
> 
> The AnalyticsQuery in this case is:
> 
> TextLogisticRegressionQParserPlugin.java
> 
> The expression is:
> 
> TextLogitStream.java
> 
> The TextLogitStream has sample code for calling the shards and merging
> the results.
> 
> If you want to use this approach the following patch is needed so you
> can add your own streaming expression:
> 
> https://issues.apache.org/jira/browse/SOLR-9103
> 
> This will likely be in 6.2







Re: AnalyticsQuery fails on a sharded collection

2016-07-27 Thread Joel Bernstein
The finish() method operates on the search node, not the aggregator node.
So whether it's distributed shouldn't affect how it runs. If you can post
your code I might be able to see the issue.

As far as using a MergeStrategy, I would suggest creating a streaming
expression that handles the merge. This is a much cleaner approach. An
example of how this works can be seen in this patch:

https://issues.apache.org/jira/secure/attachment/12820171/SOLR-9252.patch

The AnalyticsQuery in this case is:

TextLogisticRegressionQParserPlugin.java

The expression is:

TextLogitStream.java

The TextLogitStream has sample code for calling the shards and merging
the results.
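
To illustrate only the merge step (this is not the TextLogitStream code, and it
uses no Solr API), here is a minimal sketch that combines per-shard partial
aggregates keyed by the group value. The Stats class, its count and spend
fields, and the sample vendor names are assumptions chosen to mirror the
count/spend aggregation discussed in this thread. It avoids Java 8 features.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShardMergeSketch {
    /** Partial aggregate for one group value on one shard. */
    static final class Stats {
        final long count;
        final double spend;

        Stats(long count, double spend) {
            this.count = count;
            this.spend = spend;
        }

        Stats plus(Stats other) {
            return new Stats(count + other.count, spend + other.spend);
        }
    }

    /** Sums the partial aggregates of all shards, group by group. */
    static Map<String, Stats> merge(List<Map<String, Stats>> perShard) {
        Map<String, Stats> merged = new TreeMap<>();
        for (Map<String, Stats> shard : perShard) {
            for (Map.Entry<String, Stats> e : shard.entrySet()) {
                Stats soFar = merged.get(e.getKey());
                merged.put(e.getKey(),
                        soFar == null ? e.getValue() : soFar.plus(e.getValue()));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Stats> shard1 = new HashMap<>();
        shard1.put("ACME", new Stats(2, 10.0));
        Map<String, Stats> shard2 = new HashMap<>();
        shard2.put("ACME", new Stats(3, 20.0));
        shard2.put("GLOBEX", new Stats(1, 5.0));

        List<Map<String, Stats>> shards = new ArrayList<>();
        shards.add(shard1);
        shards.add(shard2);

        for (Map.Entry<String, Stats> e : merge(shards).entrySet()) {
            System.out.println(e.getKey() + " count=" + e.getValue().count
                    + " spend=" + e.getValue().spend);
        }
    }
}
```

The point of the sketch is that a group such as a vendor can have documents on
more than one shard, so each shard's result is only partial until the
aggregator sums them.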

If you want to use this approach the following patch is needed so you
can add your own streaming expression:

https://issues.apache.org/jira/browse/SOLR-9103

This will likely be in 6.2

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Jul 27, 2016 at 5:36 PM, tedsolr <tsm...@sciquest.com> wrote:

> I'm looking to create a merge strategy for a custom QParserPlugin I have.
> The
> plugin works fine on collections with one shard. I was very surprised to
> see
> it throw an exception when I ran it against a sharded collection. So my
> question is a bit of a shot in the dark. I'll first note that the
> CollapsingQParserPlugin included with Solr works as expected on my test
> collection with two shards.
>
> The NPE occurs in my DelegatingCollector's finish() method as it's setting
> the next doc base. It appears I have a null LeafReaderContext. Without
> knowing anything about my code, what is it about multiple shards that might
> throw off a collector like this?
>
> thanks!
> v5.2.1
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/AnalyticsQuery-fails-on-a-sharded-collection-tp4289274.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


AnalyticsQuery fails on a sharded collection

2016-07-27 Thread tedsolr
I'm looking to create a merge strategy for a custom QParserPlugin I have. The
plugin works fine on collections with one shard. I was very surprised to see
it throw an exception when I ran it against a sharded collection. So my
question is a bit of a shot in the dark. I'll first note that the
CollapsingQParserPlugin included with Solr works as expected on my test
collection with two shards.

The NPE occurs in my DelegatingCollector's finish() method as it's setting
the next doc base. It appears I have a null LeafReaderContext. Without
knowing anything about my code, what is it about multiple shards that might
throw off a collector like this?

thanks!
v5.2.1 


