Re: Grouping and result pagination

2017-03-21 Thread Shawn Heisey
On 3/21/2017 10:34 AM, Shawn Heisey wrote:
> Restating the original problem:  I cannot paginate through the groups
> in a grouped query.  The first page works, subsequent pages do not.  I
> have a distributed index.  Co-locating documents in the same group
> onto the same shard is going to require a complete redesign of
> indexing.  It's something that could be done, but not without a LOT of
> work.

Strange thing ... now when I try a paginated query, it works.  I have no
idea what I was doing differently before when it wasn't working.

solr-impl version:
4.9-SNAPSHOT 1680667 - solr - 2015-05-20 14:23:11

I have discovered that I can't get the query to work at all on 6.3.0
with my schema even without pagination.  I've encountered this bug again:

https://issues.apache.org/jira/browse/SOLR-8088

Thanks,
Shawn



Re: Grouping and result pagination

2017-03-21 Thread Shawn Heisey

On 3/17/2017 9:26 AM, Shawn Heisey wrote:
> On 3/17/2017 9:07 AM, Erick Erickson wrote:
>> "group.ngroups and group.facet require that all documents in each
>> group must be co-located on the same shard in order for accurate
>> counts to be returned."
>
> That is not how things work right now.  The index has 170 million
> documents in it, split into six large cold shards and a very small hot
> shard.


Restating the original problem:  I cannot paginate through the groups in 
a grouped query.  The first page works, subsequent pages do not.  I have 
a distributed index.  Co-locating documents in the same group onto the 
same shard is going to require a complete redesign of indexing.  It's 
something that could be done, but not without a LOT of work.


Should it be considered a bug that this doesn't work at all?  I call it 
a bug.  I'd be OK with being told that performance of paginated queries 
with grouping is terrible on a distributed index, but I'd like it to at 
least function.


Thanks,
Shawn



Re: Grouping and result pagination

2017-03-17 Thread Shawn Heisey
On 3/17/2017 9:07 AM, Erick Erickson wrote:
> I think the answer is that you have to co-locate the docs with the
> same value you're grouping by on the same shard whether in SolrCloud
> or not...
>
> Hmmm: from: 
> https://cwiki.apache.org/confluence/display/solr/Result+Grouping#ResultGrouping-DistributedResultGroupingCaveats
>
> "group.ngroups and group.facet require that all documents in each
> group must be co-located on the same shard in order for accurate
> counts to be returned."

That is not how things work right now.  The index has 170 million
documents in it, split into six large cold shards and a very small hot
shard.  The routing I'm using for the cold shards is the CRC32 hash of
the database primary key (different field than Solr's uniqueKey) run
through a mod function to determine shard number (0-5).  The hash/mod
calculation is done in the MySQL query.
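
For illustration only, here is a minimal Python sketch of the routing described above: CRC32 of the primary key, then mod 6 to pick a cold shard. The function name `cold_shard_for` is hypothetical, and hashing the key in its decimal string form is an assumption about how the MySQL query treats a numeric key. The real calculation lives in the MySQL query, not in Python.

```python
import zlib

NUM_COLD_SHARDS = 6  # the thread describes cold shards numbered 0-5


def cold_shard_for(primary_key: int) -> int:
    """Map a database primary key to a cold shard number (0-5)."""
    # Assumption: the key is hashed as its decimal string form.
    checksum = zlib.crc32(str(primary_key).encode("ascii"))
    return checksum % NUM_COLD_SHARDS


if __name__ == "__main__":
    for pk in (1, 2, 3, 1000, 1001):
        print(pk, "-> shard", cold_shard_for(pk))
```

Because the shard depends only on the primary key, two documents from the same set can land on different shards.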

Is pagination of a grouped query impossible with this index?

I suppose it's theoretically possible that I could hash the set name
instead of the DB primary key which would result in docs from a set
being co-located.  Would that help?  My worry with that approach is that
the cold shards would no longer have relatively uniform sizes.
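
A rough sketch of that trade-off, using made-up data: routing by set name keeps each set on one shard, but a single large set can make one shard much bigger than the rest. The corpus, the set names, and the helper names (`shard_for`, `route`) are all hypothetical; only the CRC32-mod-6 idea comes from the message above.

```python
import zlib
from collections import Counter, defaultdict

NUM_COLD_SHARDS = 6


def shard_for(value: str) -> int:
    """CRC32 of the value, reduced to a shard number 0-5."""
    return zlib.crc32(value.encode("utf-8")) % NUM_COLD_SHARDS


# Hypothetical corpus of (primary key, set name) pairs: one large set
# plus sixty single-document sets.
docs = [(pk, "big_set") for pk in range(600)]
docs += [(600 + i, f"set_{i}") for i in range(60)]


def route(docs, key):
    """Return (docs per shard, shards touched by each set) for a routing key."""
    sizes = Counter()
    shards_by_set = defaultdict(set)
    for pk, set_name in docs:
        shard = shard_for(key(pk, set_name))
        sizes[shard] += 1
        shards_by_set[set_name].add(shard)
    return sizes, shards_by_set


# Route the same documents two ways: by primary key and by set name.
pk_sizes, pk_sets = route(docs, lambda pk, name: str(pk))
set_sizes, set_sets = route(docs, lambda pk, name: name)

if __name__ == "__main__":
    print("docs per shard, by primary key:", sorted(pk_sizes.items()))
    print("docs per shard, by set name:   ", sorted(set_sizes.items()))
    print("shards holding big_set:",
          len(pk_sets["big_set"]), "vs", len(set_sets["big_set"]))
```

With set-name routing every set sits on exactly one shard, but the shard that receives `big_set` holds at least 600 of the 660 documents.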

Thanks,
Shawn



Re: Grouping and result pagination

2017-03-17 Thread Erick Erickson
I think the answer is that you have to co-locate the docs with the
same value you're grouping by on the same shard whether in SolrCloud
or not...

Hmmm: from: 
https://cwiki.apache.org/confluence/display/solr/Result+Grouping#ResultGrouping-DistributedResultGroupingCaveats

"group.ngroups and group.facet require that all documents in each
group must be co-located on the same shard in order for accurate
counts to be returned."
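
One way to picture that caveat is the toy sketch below. It assumes the coordinating node simply adds up the group counts reported by each shard; that mechanism is an assumption made for illustration, not something stated in the quoted documentation. The shard contents are invented.

```python
def summed_per_shard_group_count(shards):
    """Add up the number of distinct group values seen on each shard."""
    return sum(len(set(shard)) for shard in shards)


def true_group_count(shards):
    """Count distinct group values across the whole index."""
    return len(set().union(*shards))


# Hypothetical shards; each entry is the group value of one document.
scattered = [["a", "a", "b"], ["b", "c"], ["a", "c", "d"]]
co_located = [["a", "a", "a"], ["b", "b", "c", "c"], ["d"]]

if __name__ == "__main__":
    for name, shards in (("scattered", scattered), ("co-located", co_located)):
        print(name, summed_per_shard_group_count(shards), true_group_count(shards))
```

When a group spans several shards it is counted once per shard, so the summed figure overshoots. When groups are co-located the two numbers agree.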

Best,
Erick

On Fri, Mar 17, 2017 at 8:00 AM, Shawn Heisey wrote:
> We use pagination (start/rows) frequently with our queries.  Nothing
> unusual there.
>
> Now we have need to use grouping with a request like this, for a
> set-mode search, where only one document from each set is returned:
>
> http://idxb1.REDACTED.com:8981/solr/ncmain/lbcheck?q=*:*=true=set_name=set_lead%20desc=1=50
>
> We've worked through most of the problems encountered with this idea.
> The first page of results works perfectly.
>
> The remaining problem is that I cannot seem to paginate -- set the start
> value to 50, 100, etc.  I found some information saying that
> group.ngroups=true is required for pagination, so I added that.  I have
> found that occasionally I can load page two (rows=50&start=50), but that
> *most* of the time, I can't even get page two to load, and further pages
> have never worked.  The response contains no documents.
>
> The index is distributed (sharded), but not running SolrCloud.
>
> The server where I am trying this is running a SNAPSHOT build of 4.9.  I
> haven't had an opportunity yet to try a newer version -- we don't have
> newer versions on any of the machines for this index.  I can only
> upgrade as far as 5.3, because that's as far as we can go with a
> third-party plugin we are using.
>
> I found the following issue, which says it was fixed before 4.0 was
> released:
>
> https://issues.apache.org/jira/browse/SOLR-2207
>
> Does anyone know whether pagination with grouping is expected to work,
> and if so, how to do it?
>
> Thanks,
> Shawn
>


Grouping and result pagination

2017-03-17 Thread Shawn Heisey
We use pagination (start/rows) frequently with our queries.  Nothing
unusual there.

Now we have need to use grouping with a request like this, for a
set-mode search, where only one document from each set is returned:

http://idxb1.REDACTED.com:8981/solr/ncmain/lbcheck?q=*:*=true=set_name=set_lead%20desc=1=50

We've worked through most of the problems encountered with this idea. 
The first page of results works perfectly.

The remaining problem is that I cannot seem to paginate -- set the start
value to 50, 100, etc.  I found some information saying that
group.ngroups=true is required for pagination, so I added that.  I have
found that occasionally I can load page two (rows=50&start=50), but that
*most* of the time, I can't even get page two to load, and further pages
have never worked.  The response contains no documents.
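
As a sketch of the paging pattern being attempted, the snippet below builds the query string for a given page of groups. The parameter names (`group`, `group.field`, `group.ngroups`, `start`, `rows`) are standard Solr request parameters; the helper name `grouped_page_params` is hypothetical, and the sort from the request above is left out because that URL did not survive intact in the archive.

```python
from urllib.parse import urlencode, parse_qs


def grouped_page_params(page: int, rows: int = 50) -> str:
    """Build a query string for one page of a grouped query.

    With grouping in its default response format, start and rows
    count groups rather than individual documents.
    """
    params = {
        "q": "*:*",
        "group": "true",
        "group.field": "set_name",
        "group.ngroups": "true",
        "start": page * rows,  # page 0 -> start=0, page 1 -> start=50, ...
        "rows": rows,
    }
    return urlencode(params)


if __name__ == "__main__":
    for page in range(3):
        print(grouped_page_params(page))
```

Page two in the message above corresponds to `grouped_page_params(1)`, which sets `start=50` and `rows=50`.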

The index is distributed (sharded), but not running SolrCloud.

The server where I am trying this is running a SNAPSHOT build of 4.9.  I
haven't had an opportunity yet to try a newer version -- we don't have
newer versions on any of the machines for this index.  I can only
upgrade as far as 5.3, because that's as far as we can go with a
third-party plugin we are using.

I found the following issue, which says it was fixed before 4.0 was
released:

https://issues.apache.org/jira/browse/SOLR-2207

Does anyone know whether pagination with grouping is expected to work,
and if so, how to do it?

Thanks,
Shawn