Re: Query results vs. facets results

2012-07-16 Thread Erick Erickson
Ahhh, you need to look down another few lines. When you specify fq, there
should be a section of the debug output like
<arr name="filter_queries">
  .
  .
  .
</arr>

where the array is the parsed form of the filter queries. I was thinking about
comparing that with the parsed form of the q parameter in the non-filter
case to see what insight one could gain from that.
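A minimal Python sketch of that comparison follows. It assumes the response was requested as XML with debugQuery=on, and it pulls the parsed main query and the parsed filter queries out of the debug section using only the standard library. The helper name `parsed_queries` is hypothetical. The sample string is a trimmed, made-up response modeled on the debug output posted later in this thread.

```python
import xml.etree.ElementTree as ET

def parsed_queries(response_xml: str) -> dict:
    """Extract the parsed q and the parsed fq values from a Solr XML
    response that was requested with debugQuery=on."""
    root = ET.fromstring(response_xml)
    debug = root.find("lst[@name='debug']")
    if debug is None:  # debugQuery was not enabled
        return {"parsedquery": None, "parsed_filter_queries": []}
    parsed_q = debug.find("str[@name='parsedquery']")
    parsed_fq = debug.find("arr[@name='parsed_filter_queries']")
    return {
        "parsedquery": parsed_q.text if parsed_q is not None else None,
        # Absent when the request applied no filter query.
        "parsed_filter_queries": [s.text for s in parsed_fq] if parsed_fq is not None else [],
    }

# Hypothetical, trimmed response for illustration only.
sample = """<response>
  <lst name="debug">
    <str name="rawquerystring">*:*</str>
    <str name="parsedquery">MatchAllDocsQuery(*:*)</str>
    <arr name="filter_queries"><str>{!tag=dt}CITY:MILTON</str></arr>
    <arr name="parsed_filter_queries"><str>CITY:MILTON</str></arr>
  </lst>
</response>"""

print(parsed_queries(sample))
```

Putting the parsed q of one request next to the parsed fq of the other makes differences such as `ID:*` versus `MatchAllDocsQuery(*:*)` easy to spot.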

But there's already one difference: when you use *, you get
 <str name="parsedquery">ID:*</str>

Is it possible that you have some documents that do NOT have an ID field?
Try *:* rather than just *. I'm guessing that your default search field is ID
and you have some documents without an ID field. Not a good guess if ID
is your uniqueKey, though.

Try q=*:* -ID:* and see if you get 31 docs.
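The request URLs in this archive lost their & separators. The following hedged Python sketch builds that diagnostic request with the parameters encoded explicitly. The base URL and the core name db come from this thread. rows=0 is an added assumption so that only numFound comes back. The sketch only builds the URL; fetching it needs a running Solr instance.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

SELECT_URL = "http://localhost:8983/solr/db/select"  # core "db" from the thread

def select_url(params) -> str:
    """Join (name, value) pairs with '&' and percent-encode the values, so
    '*', ':' and spaces in Solr query syntax survive the trip intact."""
    return SELECT_URL + "?" + urlencode(params)

# Match every document, then subtract those that have any value in ID.
# rows=0: return only numFound, not the documents.
url = select_url([("q", "*:* -ID:*"), ("rows", "0")])
print(url)
# Decoding the query string gives back the original Solr syntax.
print(parse_qs(urlsplit(url).query)["q"])  # ['*:* -ID:*']
```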

Also note that if you _have_ specified ID as your uniqueKey _but_ you didn't
re-index afterwards (actually, I'd blow away the entire solrhome/data directory
and restart), you may have stale data in there that allowed documents to exist
that do not have uniqueKey fields.

Best
Erick

On Sun, Jul 15, 2012 at 4:49 PM, tudor <tudor.zaha...@gmail.com> wrote:
 Hi Erick,

 Thanks for the reply.

 The query:

 http://localhost:8983/solr/db/select?indent=on&version=2.2&q=CITY:MILTON&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.ngroups=true&group.truncate=true&debugQuery=on

 yields this in the debug section:

 <lst name="debug"><str name="rawquerystring">CITY:MILTON</str>
   <str name="querystring">CITY:MILTON</str>
   <str name="parsedquery">CITY:MILTON</str>
   <str name="parsedquery_toString">CITY:MILTON</str>
   <str name="QParser">LuceneQParser</str>

 There is no information about grouping.

 Second query:

 http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.truncate=true&facet=true&facet.field=CITY&facet.missing=true&group.ngroups=true&debugQuery=on

 yields this in the debug section:

 <lst name="debug">
   <str name="rawquerystring">*</str>
   <str name="querystring">*</str>
   <str name="parsedquery">ID:*</str>
   <str name="parsedquery_toString">ID:*</str>
   <str name="QParser">LuceneQParser</str>

 To be honest, these do not tell me too much. I would like to see some
 information about the grouping, since I believe this is where I am missing
 something.

 In the meantime, I have combined the two queries above, hoping to make some
 sense out of the results. The following query filters all the entries with
 the city name MILTON and groups together the ones with the same ID. Also,
 the query facets the entries on city, grouping the ones with the same ID. So
 the resulting numbers refer to the number of groups.

 http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq={!tag=dt}CITY:MILTON&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.truncate=true&facet=true&facet.field={!ex=dt}CITY&facet.missing=true&group.ngroups=true&debugQuery=on

 yields the same (for me perplexing) results:

 <lst name="grouped">
   <lst name="ID">
   <int name="matches">284</int>
   <int name="ngroups">134</int>

 (i.e.: fq says: 134 groups with CITY:MILTON)
 ...

 <lst name="facet_counts">
   <lst name="facet_queries"/>
   <lst name="facet_fields">
 ...
   <int name="MILTON">103</int>

 (i.e.: faceted search says: 103 groups with CITY:MILTON)

 I really believe that these different results have something to do with the
 grouping that Solr makes, but I do not know how to dig into this.

 Thank you and best regards,
 Tudor

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Query-results-vs-facets-results-tp3995079p3995156.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Query results vs. facets results

2012-07-16 Thread tudor

Erick Erickson wrote
 
 Ahhh, you need to look down another few lines. When you specify fq, there
 should be a section of the debug output like
 <arr name="filter_queries">
   .
   .
   .
 </arr>
 
 where the array is the parsed form of the filter queries. I was thinking
 about comparing that with the parsed form of the q parameter in the
 non-filter case to see what insight one could gain from that.
 
 

There is no filter_queries section because I do not use an fq in the first
two queries. I use one in the combined query, for which you can see the
output further below.
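For reference, the following minimal Python sketch writes the combined query as explicit (name, value) pairs, so the & separators lost in this archive are unambiguous. The values are taken from the combined queries in this thread, with the empty parameters dropped. Because the CITY facet uses {!ex=dt}, it is computed as if the dt-tagged filter were not applied, while the grouping counts still see that filter.

```python
from urllib.parse import urlencode, urlsplit, parse_qsl

params = [
    ("q", "*:*"),
    ("fq", "{!tag=dt}CITY:MILTON"),   # filter, tagged "dt"
    ("group", "true"),
    ("group.field", "ID"),
    ("group.ngroups", "true"),        # report the number of groups
    ("group.truncate", "true"),
    ("facet", "true"),
    ("facet.field", "{!ex=dt}CITY"),  # facet on CITY, ignoring the "dt" filter
    ("facet.missing", "true"),
    ("debugQuery", "on"),             # debug section; lists filter_queries when fq is set
]
url = "http://localhost:8983/solr/db/select?" + urlencode(params)
print(url)
```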


Erick Erickson wrote
 
 
 But there's already one difference: when you use *, you get
  <str name="parsedquery">ID:*</str>
 
 Is it possible that you have some documents that do NOT have an ID field?
 Try *:* rather than just *. I'm guessing that your default search field is
 ID and you have some documents without an ID field. Not a good guess if ID
 is your uniqueKey, though.
 
 Try q=*:* -ID:* and see if you get 31 docs.
 
 

All the entries have an ID, so q=*:* -ID:* yielded 0 results.
The same ID can appear multiple times; that is the reason for grouping the
results. Indeed, ID is the default search field.


Erick Erickson wrote
 
 
 Also note that if you _have_ specified ID as your uniqueKey _but_ you
 didn't re-index afterwards (actually, I'd blow away the entire
 solrhome/data directory and restart), you may have stale data in there
 that allowed documents to exist that do not have uniqueKey fields.
 
 

For Solr's unique id I use a field of type
<fieldType name="uuid" class="solr.UUIDField" indexed="true" /> (which, of
course, has a different name than the default search field ID), so it should
not be a problem.

I have re-indexed the data, and I get a somewhat different result. This is
the query:

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*:*&fq={!tag=dt}CITY:MILTON&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=STR_ENTERPRISE_ID&group.truncate=true&facet=true&facet.field={!ex=dt}CITY&facet.missing=true&group.ngroups=true&debugQuery=on

And the results as well as the debug information:

<lst name="grouped">
  <lst name="ID">
  <int name="matches">284</int>
  <int name="ngroups">134</int>
  <arr name="groups"></arr>
   ...

<lst name="facet_counts">
  <lst name="facet_queries"/>
<lst name="facet_fields">
<lst name="CITY">
  ...
<int name="MILTON">89</int>
  ...

<lst name="debug">
  <str name="rawquerystring">*:*</str>
  <str name="querystring">*:*</str>
  <str name="parsedquery">MatchAllDocsQuery(*:*)</str>
  <str name="parsedquery_toString">*:*</str>
  <lst name="explain"></lst>
  <str name="QParser">LuceneQParser</str>
  <arr name="filter_queries">
  <str>{!tag=dt}CITY:MILTON</str>
  </arr><arr name="parsed_filter_queries">
  <str>CITY:MILTON</str>
  </arr>
  <lst name="timing"></lst>
</lst>

So now fq says: 134 groups with CITY:MILTON and faceted search says: 83
groups with CITY:MILTON. 

How can I see some information about the grouping in Solr?

Thanks Erick!

Regards,
Tudor




Re: Query results vs. facets results

2012-07-15 Thread Erick Erickson
q and fq queries don't necessarily run through the same query parser, see:
http://wiki.apache.org/solr/SimpleFacetParameters#facet.query_:_Arbitrary_Query_Faceting

So try adding debugQuery=on to both queries you submitted. My guess
is that if you look at the parsed queries, you'll see something that explains
your differences. If not, paste the results back and we can take a look.

BTW, ignore all the explain bits for now; the important bit is the parsed form
of q and fq in your queries.
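If you already have a request URL, a small helper like the following one adds debugQuery=on without disturbing the other parameters, including empty ones such as fq=. It is a hypothetical convenience sketch using only Python's standard library, not part of Solr. The example URL is a shortened form of the first query in this thread.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def with_debug(url: str) -> str:
    """Return url with debugQuery=on set, keeping every other parameter
    (and its order) as it was."""
    parts = urlsplit(url)
    # keep_blank_values=True preserves parameters like "fq=".
    params = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != "debugQuery"]
    params.append(("debugQuery", "on"))
    return urlunsplit(parts._replace(query=urlencode(params)))

first = ("http://localhost:8983/solr/db/select?q=CITY:MILTON&fq="
         "&group=true&group.field=NAME&group.ngroups=true")
print(with_debug(first))
```

Calling it twice on the same URL still yields a single debugQuery parameter, because any existing value is dropped before the new one is appended.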

Best
Erick

On Sat, Jul 14, 2012 at 5:11 AM, tudor <tudor.zaha...@gmail.com> wrote:
 Hello,

 I am new to Solr and I am running some tests with our data in Solr. We are
 using version 3.6 and the data is imported from a DB2 database using Solr's
 DIH. We have defined a single entity in the db-data-config.xml, which is an
 equivalent of the following query:
 <entity name="connections"
 query="
 SELECT C.NAME,
        F.CITY
 FROM
 NAME_CONNECTIONS AS C
 JOIN NAME_DETAILS AS F
 ON C.DETAILS_NAME = F.NAME
 ">
 </entity>

 This might lead to some names appearing multiple times in the result set.
 This is OK.

 For the unique ID in the schema, we are using a solr.UUIDField:

 <fieldType name="uuid" class="solr.UUIDField" indexed="true" />
 <field name="id" type="uuid" indexed="true"
 stored="true" default="NEW"/>

 All the searchable fields are declared as indexed and stored.

 I am aware of the fact that this is a very crude configuration, but for the
 tests that I am running it is fine.

 The problem that I have is the different result counts that I receive when I
 do equivalent queries for searching and faceting. For example, running the
 following query

 http://localhost:8983/solr/db/select?indent=on&version=2.2&q=CITY:MILTON&fq=&start=0&rows=100&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=NAME&group.ngroups=true&group.truncate=true

 yields

 <int name="ngroups">134</int>

 as a result, which is exactly what we expect.

 On the other hand, running

 http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=NAME&group.truncate=true&facet=true&facet.field=CITY&group.ngroups=true

 yields

 <lst name="facet_counts">
 <lst name="facet_queries"/>
   <lst name="facet_fields">
    <lst name="CITY">
 <int name="MILTON">103</int>

 I would expect to have the same number (134) in this facet result as well.
 Could you please let me know why these two results are different?

 Thank you,
 Tudor
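The DIH entity in the question can produce several documents for one name. The following Python/sqlite3 sketch runs the same join to show why. The table and column names come from the entity definition, but the rows are made up, and ORDER BY is added only to make the output deterministic. DIH turns each result row of the entity into one Solr document, each with its own generated UUID, so a name linked to several details rows is indexed several times. Grouping on NAME collapses those repeated documents.

```python
import sqlite3

# In-memory stand-in for the DB2 tables. Table and column names follow the
# DIH entity; the rows themselves are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE NAME_DETAILS (NAME TEXT, CITY TEXT);
    CREATE TABLE NAME_CONNECTIONS (NAME TEXT, DETAILS_NAME TEXT);
    INSERT INTO NAME_DETAILS VALUES ('D1', 'MILTON'), ('D2', 'YORK');
    INSERT INTO NAME_CONNECTIONS VALUES
        ('ACME', 'D1'), ('ACME', 'D2'), ('BETA', 'D1');
""")

# The entity's query: one output row per matching connection/details pair.
# ORDER BY is not in the original query; it only fixes the output order.
rows = conn.execute("""
    SELECT C.NAME, F.CITY
    FROM NAME_CONNECTIONS AS C
    JOIN NAME_DETAILS AS F ON C.DETAILS_NAME = F.NAME
    ORDER BY C.NAME, F.CITY
""").fetchall()

# Each row would become its own Solr document, so ACME is indexed twice.
print(rows)  # [('ACME', 'MILTON'), ('ACME', 'YORK'), ('BETA', 'MILTON')]
```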





Re: Query results vs. facets results

2012-07-15 Thread tudor
Hi Erick,

Thanks for the reply.

The query:
 
http://localhost:8983/solr/db/select?indent=on&version=2.2&q=CITY:MILTON&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.ngroups=true&group.truncate=true&debugQuery=on

yields this in the debug section:

<lst name="debug"><str name="rawquerystring">CITY:MILTON</str>
  <str name="querystring">CITY:MILTON</str>
  <str name="parsedquery">CITY:MILTON</str>
  <str name="parsedquery_toString">CITY:MILTON</str>
  <str name="QParser">LuceneQParser</str>

in the explain section. There is no information about grouping.

Second query:

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq=&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.truncate=true&facet=true&facet.field=CITY&facet.missing=true&group.ngroups=true&debugQuery=on

yields this in the debug section:

<lst name="debug">
  <str name="rawquerystring">*</str>
  <str name="querystring">*</str>
  <str name="parsedquery">ID:*</str>
  <str name="parsedquery_toString">ID:*</str>
  <str name="QParser">LuceneQParser</str>

To be honest, these do not tell me too much. I would like to see some
information about the grouping, since I believe this is where I am missing
something.

In the meantime, I have combined the two queries above, hoping to make some
sense out of the results. The following query filters all the entries with
the city name MILTON and groups together the ones with the same ID. Also,
the query facets the entries on city, grouping the ones with the same ID. So
the resulting numbers refer to the number of groups.

http://localhost:8983/solr/db/select?indent=on&version=2.2&q=*&fq={!tag=dt}CITY:MILTON&start=0&rows=10&fl=*&wt=&explainOther=&hl.fl=&group=true&group.field=ID&group.truncate=true&facet=true&facet.field={!ex=dt}CITY&facet.missing=true&group.ngroups=true&debugQuery=on

yields the same (for me perplexing) results:

<lst name="grouped">
  <lst name="ID">
  <int name="matches">284</int>
  <int name="ngroups">134</int>

(i.e.: fq says: 134 groups with CITY:MILTON)
...

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
   ...
  <int name="MILTON">103</int>

(i.e.: faceted search says: 103 groups with CITY:MILTON)

I really believe that these different results have something to do with the
grouping that Solr makes, but I do not know how to dig into this.
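One way to reason about the two numbers is a toy Python model. It is an illustration only, not Solr's code. It rests on two assumptions. First, group.truncate=true has its documented meaning: facet counts are based only on the most relevant document of each group. Second, {!ex=dt} makes the CITY facet ignore the MILTON filter. Under these assumptions, ngroups counts groups with at least one MILTON document, while the facet counts groups whose representative document is in MILTON, so the second number can be smaller. The documents are hypothetical. The model also uses "first document seen" as a stand-in for "most relevant", which Solr determines from its ranking.

```python
# Hypothetical documents as (group key, CITY) pairs. As with the DIH join,
# one key can occur in several documents with different cities.
docs = [
    ("ACME", "YORK"), ("ACME", "MILTON"),
    ("BETA", "MILTON"),
    ("GAMMA", "MILTON"), ("GAMMA", "MILTON"),
    ("DELTA", "YORK"),
]

def ngroups_with_filter(docs, city):
    """Model of fq=CITY:<city> with group.ngroups=true: the number of
    distinct groups that have at least one document in <city>."""
    return len({key for key, c in docs if c == city})

def truncated_facet_count(docs, city):
    """Model of facet.field={!ex=dt}CITY with group.truncate=true: keep one
    representative document per group (here, the first one seen, standing
    in for the most relevant) and count representatives in <city>."""
    representative = {}
    for key, c in docs:
        representative.setdefault(key, c)
    return sum(1 for c in representative.values() if c == city)

print(ngroups_with_filter(docs, "MILTON"))    # 3: ACME, BETA, GAMMA
print(truncated_facet_count(docs, "MILTON"))  # 2: ACME's representative is in YORK
```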

Thank you and best regards,
Tudor
