Inconsistent facet ranges when using distributed search in Solr 4.3

2013-07-31 Thread Jose Aguilar
Hi all,

I am seeing some inconsistent behavior with facets, specifically range facets, 
on Solr 4.3. Running the same distributed query several times (pressing F5 in 
the browser) produces different facet ranges, because sometimes the response 
leaves out some of the buckets. The search results are always correct as far 
as I can tell; it is only the range facets that sometimes miss ranges.

Has anyone seen this behavior in Solr before? Any recommendations on how to 
troubleshoot this issue?

Here are some details and an example:

As an example of what I am seeing, take this query, in which I'll be faceting 
on the docnumber field:

http://SERVER:8081/solr/shard1/myhandler?

shards=SERVER:8081/solr/shard1,SERVER:8081/solr/shard2,SERVER:8081/solr/shard3
shards.qt=myhandler
facet=true
facet.field=docnumber
f.docnumber.facet.sort=index
facet.range=docnumber
f.docnumber.facet.range.start=0
f.docnumber.facet.range.gap=100
f.docnumber.facet.range.end=10
f.docnumber.facet.limit=1000
facet.mincount=1
q=type:document
wt=xml
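
For anyone who wants to reproduce this outside the browser, here is a minimal sketch that assembles the parameters above into a single request URL. The host, port, core and handler names are the placeholders from this post, and the function name build_facet_url is just an illustrative helper, not part of Solr.

```python
from urllib.parse import urlencode

# Placeholders copied from the post; replace with real host and core names.
BASE = "http://SERVER:8081/solr/shard1/myhandler"
SHARDS = ["SERVER:8081/solr/shard%d" % i for i in (1, 2, 3)]


def build_facet_url(base=BASE, shards=SHARDS):
    """Join the request parameters with '&' and URL-encode the values."""
    params = [
        ("shards", ",".join(shards)),
        ("shards.qt", "myhandler"),
        ("facet", "true"),
        ("facet.field", "docnumber"),
        ("f.docnumber.facet.sort", "index"),
        ("facet.range", "docnumber"),
        ("f.docnumber.facet.range.start", "0"),
        ("f.docnumber.facet.range.gap", "100"),
        ("f.docnumber.facet.range.end", "10"),  # value exactly as posted
        ("f.docnumber.facet.limit", "1000"),
        ("facet.mincount", "1"),
        ("q", "type:document"),
        ("wt", "xml"),
    ]
    return base + "?" + urlencode(params)


print(build_facet_url())
```

Requesting that URL in a loop and saving each response makes it easier to compare the three variants described below.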

When I run it, I get one of the following three responses, seemingly at random 
(I haven't been able to notice a pattern so far):

1. Get 859 results (correct), but nothing on the facet ranges:

...
<result name="response" numFound="859" start="0" maxScore="8.006225">
...
<lst name="facet_ranges">
<lst name="docnumber">
<lst name="counts"/>
<int name="gap">100</int>
<int name="start">0</int>
<int name="end">10</int>
</lst>
</lst>

2. Get 859 results (correct), and the correct number of facets come up in the 
facet ranges (118+109+119+122+134+100+100+57=859):

...
<result name="response" numFound="859" start="0" maxScore="8.006225">
...
<lst name="facet_ranges">
<lst name="docnumber">
<lst name="counts">
<int name="0">118</int>
<int name="100">109</int>
<int name="200">119</int>
<int name="300">122</int>
<int name="400">134</int>
<int name="500">100</int>
<int name="600">100</int>
<int name="700">57</int>
</lst>
<int name="gap">100</int>
<int name="start">0</int>
<int name="end">10</int>
</lst>
</lst>

3. Get 859 results (correct), and only a partial number of facet ranges 
(118+109+119+122+134=602 vs. 859 results):

...
<result name="response" numFound="859" start="0" maxScore="8.006225">
...
<lst name="facet_ranges">
<lst name="docnumber">
<lst name="counts">
<int name="0">118</int>
<int name="100">109</int>
<int name="200">119</int>
<int name="300">122</int>
<int name="400">134</int>
</lst>
<int name="gap">100</int>
<int name="start">0</int>
<int name="end">10</int>
</lst>
</lst>
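
The comparison between the bucket totals and numFound can be automated. The sketch below parses a saved XML response, sums the range buckets and reports how many documents are unaccounted for. It assumes every matching document has exactly one docnumber value inside the faceted range (which is what the sums above rely on); the function names and the trimmed sample are illustrative only.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of response 3 above, wrapped so that it is well-formed XML.
SAMPLE = """<response>
<result name="response" numFound="859" start="0" maxScore="8.006225"/>
<lst name="facet_counts"><lst name="facet_ranges">
<lst name="docnumber">
<lst name="counts">
<int name="0">118</int>
<int name="100">109</int>
<int name="200">119</int>
<int name="300">122</int>
<int name="400">134</int>
</lst>
<int name="gap">100</int>
</lst>
</lst></lst>
</response>"""


def range_counts(xml_text, field):
    """Return (numFound, {bucket_start: count}) for one range facet."""
    root = ET.fromstring(xml_text)
    num_found = int(root.find(".//result[@name='response']").get("numFound"))
    counts = root.find(
        ".//lst[@name='facet_ranges']/lst[@name='%s']/lst[@name='counts']" % field
    )
    buckets = {} if counts is None else {e.get("name"): int(e.text) for e in counts}
    return num_found, buckets


def missing_docs(xml_text, field):
    """Number of found documents that no range bucket accounts for."""
    num_found, buckets = range_counts(xml_text, field)
    return num_found - sum(buckets.values())


print(missing_docs(SAMPLE, "docnumber"))  # 859 - 602 = 257
```

A result of 0 corresponds to response 2, 859 to response 1, and anything in between to a partial response like number 3.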

I am using Solr 4.3 (4.3.0 1477023), with these parameters:

Facet-related:
facet=true
facet.field=docnumber
f.docnumber.facet.sort=index
facet.range=docnumber
f.docnumber.facet.range.start=0
f.docnumber.facet.range.gap=100
f.docnumber.facet.range.end=10
f.docnumber.facet.limit=1000
facet.mincount=1

For distributed search (environment has 3 cores in the same box):

shards=SERVER:8081/solr/shard1,SERVER:8081/solr/shard2,SERVER:8081/solr/shard3
shards.qt=myhandler

And the query:
q=type:document
wt=xml

It is also worth noting that the facet field section does come up with the 
correct facets; the issue seems to affect only the facet ranges (unless I am 
missing something). In the responses for all three examples above, the 
facet_fields list has all the values for docnumber, from 1 to 756, even when 
the facet ranges are missing buckets.

<lst name="facet_fields">
<lst name="docnumber">
<int name="1">1</int>
<int name="2">2</int>
... (continues on from 3 to 754) ...
<int name="755">1</int>
<int name="756">1</int>
</lst>
</lst>


Thanks,


Jose. 

Timeout when calling Luke request handler after migrating from Solr 3.5 to 3.6.1

2012-11-19 Thread Jose Aguilar
Hi all,

As part of our business logic we query the Luke request handler to extract the 
fields in the index from our code using the following url:

http://server:8080/solr/admin/luke?wt=json&numTerms=0

This worked fine with Solr 3.5, but with 3.6.1 the call never returns: it 
hangs, and there is no error message in the server logs. Has anyone seen this, 
or does anyone have an idea of what may be causing it?

The Luke request handler is configured by default; we didn't change its 
configuration. solr/admin/stats.jsp shows the following for it:

name: /admin/luke
class: org.apache.solr.handler.admin.LukeRequestHandler
version: $Revision: 1242152 $
description: Lucene Index Browser. Inspired and modeled after Luke: 
http://www.getopt.org/luke/
stats: handlerStart : 1353373022984
requests : 0
errors : 0
timeouts : 0
totalTime : 0
avgTimePerRequest : NaN
avgRequestsPerSecond : 0.0

We are running Apache Tomcat 6.0.35 with JDK 1.7.0_03, in case that rings a 
bell. The index has about

Alternatively, our requirement is to get the list of fields in the index, 
including dynamic fields – is there any other way to obtain this at runtime? It 
is an application that runs on a separate process from Solr, and may even run 
on a separate box, thus the Luke call.
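
For reference, the call our application makes is roughly equivalent to the sketch below. It is a simplified stand-in, not our production code: the function names are illustrative, the client-side timeout value is arbitrary, and it assumes the JSON response carries a top-level "fields" object keyed by field name.

```python
import json
from urllib.request import urlopen

# URL from this post; "server" is a placeholder host name.
LUKE_URL = "http://server:8080/solr/admin/luke?wt=json&numTerms=0"


def fetch_luke(url=LUKE_URL, timeout_seconds=10):
    """Call the Luke handler; the timeout bounds blocking socket operations,
    so a handler that never answers raises an error instead of hanging."""
    with urlopen(url, timeout=timeout_seconds) as resp:
        return json.loads(resp.read().decode("utf-8"))


def field_names(luke_response):
    """Field names in the index, assuming 'fields' maps name -> details."""
    return sorted(luke_response.get("fields", {}))


# Hand-written example of the assumed response shape (not real server output).
example = {"fields": {"title": {"type": "text"}, "attr_color": {"type": "string"}}}
print(field_names(example))
```

With the client-side timeout the application at least fails fast, but it does not explain why the handler stopped responding after the upgrade.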

Thank you for any help you can provide.

Jose.


Re: Problem with spellchecker

2012-10-02 Thread Jose Aguilar
Thank you for your help, the whole team overlooked this simple error. It
was driving us crazy! :)

Thanks!! 

Jose.

On 10/2/12 1:23 AM, Markus Jelsma markus.jel...@openindex.io wrote:

The problem is your stray double quote:
str name=queryAnalyzerFieldTypetext_general_fr/str

I'd think this would throw an exception somewhere.
 
 
-Original message-
 From:Jose Aguilar jagui...@searchtechnologies.com
 Sent: Tue 02-Oct-2012 01:40
 To: solr-user@lucene.apache.org
 Subject: Problem with spellchecker
 
 We have configured two spellcheckers, English and French, in Solr 4 BETA.
Each spellchecker works with a specific search handler. The English
spellchecker works as expected with any word, regardless of case. The
French spellchecker, on the other hand, only works with lowercase words.
If the first letter is uppercase, the spellchecker returns no suggestions
unless we add the spellcheck.q parameter with that term. To clarify
further, this doesn't return any corrections:
 
 http://localhost:8984/solr/collection1/handler?wt=xml&q=Systme
 
 But this one works as expected:
 
 
 http://localhost:8984/solr/collection1/handler?wt=xml&q=Systme&spellcheck.q=Systme
 
 According to this page
(http://wiki.apache.org/solr/SpellCheckComponent#q_OR_spellcheck.q),
the spellcheck.q parameter shouldn't be required:
 
 If spellcheck.q is defined, then it is used, otherwise the original
input query is used
 
 Are we missing something?  We double checked the configuration settings
for English which is working fine and it seems well configured.
 
 Here is an extract of the spellcheck component configuration for French
language
 
   <searchComponent name="spellcheckfr" class="solr.SpellCheckComponent">
     <str name="queryAnalyzerFieldType">text_general_fr</str>
     <lst name="spellchecker">
       <str name="name">default</str>
       <str name="field">SpellingFr</str>
       <str name="classname">solr.DirectSolrSpellChecker</str>
       <str name="distanceMeasure">internal</str>
       <float name="accuracy">0.5</float>
       <int name="maxEdits">2</int>
       <int name="minPrefix">1</int>
       <int name="maxInspections">5</int>
       <int name="minQueryLength">4</int>
       <float name="maxQueryFrequency">0.01</float>
       <str name="buildOnCommit">true</str>
     </lst>
   </searchComponent>
 
 Thanks for any help
 



Problem with spellchecker

2012-10-01 Thread Jose Aguilar
We have configured two spellcheckers, English and French, in Solr 4 BETA. Each 
spellchecker works with a specific search handler. The English spellchecker 
works as expected with any word, regardless of case. The French spellchecker, 
on the other hand, only works with lowercase words. If the first letter is 
uppercase, the spellchecker returns no suggestions unless we add the 
spellcheck.q parameter with that term. To clarify further, this doesn't 
return any corrections:

http://localhost:8984/solr/collection1/handler?wt=xml&q=Systme

But this one works as expected:

http://localhost:8984/solr/collection1/handler?wt=xml&q=Systme&spellcheck.q=Systme

According to this page 
(http://wiki.apache.org/solr/SpellCheckComponent#q_OR_spellcheck.q), the 
spellcheck.q parameter shouldn't be required:

If spellcheck.q is defined, then it is used, otherwise the original input 
query is used

Are we missing something?  We double checked the configuration settings for 
English which is working fine and it seems well configured.

Here is an extract of the spellcheck component configuration for French language

  <searchComponent name="spellcheckfr" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">text_general_fr</str>
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">SpellingFr</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <str name="distanceMeasure">internal</str>
      <float name="accuracy">0.5</float>
      <int name="maxEdits">2</int>
      <int name="minPrefix">1</int>
      <int name="maxInspections">5</int>
      <int name="minQueryLength">4</int>
      <float name="maxQueryFrequency">0.01</float>
      <str name="buildOnCommit">true</str>
    </lst>
  </searchComponent>

Thanks for any help


Enforce overall Solr timeout

2012-01-18 Thread Jose Aguilar
Hi all,

Is there a setting to enforce an overall timeout for Solr? For example, we 
set timeAllowed=2000 in solrconfig.xml (using version 3.5), but as far as I 
can tell that only applies to the search phase: if the search takes more than 
2 seconds, it returns partial results with partialResults=true. The rest of 
the processing time (faceting, highlighting, etc.) is not covered by the 
timeAllowed setting.

Is there something that can be done so that, for example, if a Solr call 
takes more than, say, 5 seconds overall, the request is killed and an error, 
an empty response, or something similar is returned?
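
To illustrate the behavior we are after, here is a minimal client-side sketch that gives up on a call after an overall deadline. It is only a workaround idea, not a Solr feature: the helper names are hypothetical, and it stops waiting on the caller's side without cancelling whatever work is still running inside Solr.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def call_with_deadline(fn, deadline_seconds, *args, **kwargs):
    """Run fn in a worker thread and stop waiting after deadline_seconds."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args, **kwargs)
    try:
        # Raises TimeoutError if fn has not finished within the deadline.
        return future.result(timeout=deadline_seconds)
    finally:
        # Do not block on a call that is still running; the worker thread
        # finishes on its own in the background.
        executor.shutdown(wait=False)


def slow_query():
    # Stand-in for a Solr request that takes too long (hypothetical).
    time.sleep(0.5)
    return "late response"


try:
    call_with_deadline(slow_query, 0.1)
except TimeoutError:
    print("gave up after the deadline")
```

What we would really like is the server-side equivalent, so that the request stops consuming resources once the deadline has passed.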

--
Jose Aguilar.


Using sort_values (fsv=true parameter) and Field Collapsing (group=true) at the same time

2011-12-27 Thread Jose Aguilar
Hi all,

I am using Solr 4.0 trunk with the Field Collapsing feature 
(http://wiki.apache.org/solr/FieldCollapsing), and I notice that when it is 
used together with the fsv=true parameter, the sort_values section disappears 
from the response. I haven't found much information about the fsv parameter, 
so I am turning to the list to see if someone can help us out, or shed some 
light on whether the two features are incompatible (which is what I think is 
happening, because of the field collapse implementation). Pointers on how to 
achieve a similar effect would also be welcome.

We use fsv=true to help debug why one document was sorted above another under 
certain sort orders in our application. It is a great way to visualize the 
sort keys, and it saves us debugging time.

To clarify further, we send this query to Solr expecting the grouped, 
sort_values and debug tags to be in the response, with the sort_values 
arrays corresponding to the first element of each group:

http://localhost:8983/solr/select?wt=xml&fl=*&q=solr+memory&group=true&group.field=manu_exact&fsv=true&debugQuery=on…

But we don't get the sort_values part back, we only get the following 
top-level tags in the response:
<response>
<lst name="responseHeader">…
<lst name="grouped">…
<lst name="debug">…
</response>

If we don't use Field Collapsing, and instead send in something like this:

http://localhost:8983/solr/select?wt=xml&fl=*&q=solr+memory&fsv=true&debugQuery=on…

Then we do get the sort_values element in the response:

<response>
<lst name="responseHeader">
<lst name="sort_values">
<result name="response">…
<lst name="debug">
</response>

Is there some incompatibility between the two features? Any other way to 
retrieve this information in a way that would be compatible with field 
collapsing?
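
In case it helps anyone reproduce this, the small sketch below lists the top-level elements of a saved XML response and reports whether sort_values is among them. The two sample documents are skeletons that mirror the outlines above, with all content removed, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Skeletons of the two responses outlined above (contents stripped).
GROUPED = ('<response><lst name="responseHeader"/><lst name="grouped"/>'
           '<lst name="debug"/></response>')
PLAIN = ('<response><lst name="responseHeader"/><lst name="sort_values"/>'
         '<result name="response"/><lst name="debug"/></response>')


def top_level_names(xml_text):
    """(tag, name attribute) for each direct child of <response>."""
    root = ET.fromstring(xml_text)
    return [(child.tag, child.get("name")) for child in root]


def has_sort_values(xml_text):
    return ("lst", "sort_values") in top_level_names(xml_text)


print(has_sort_values(GROUPED))  # False
print(has_sort_values(PLAIN))    # True
```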

Thanks,

Jose Aguilar