Re: Read past EOF error due to broken connection

2011-06-22 Thread pravesh
First commit and then try again to search.

You can also use Lucene's CheckIndex tool to check and fix your index (it may
remove some corrupt segments from your index).

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Read-past-EOF-error-due-to-broken-connection-tp3091247p3094334.html
Sent from the Solr - User mailing list archive at Nabble.com.


Problem in accessing a variable's changed value outside of if block in javascript code

2011-06-22 Thread Romi
$("#submit").click(function () {
  var query = getquerystring(); // get the query string entered by the user
  var newquery = query;

  // get the JSON response from the Solr server
  $.getJSON("http://192.168.1.9:8983/solr/db/select/?wt=json&start=0&rows=100&q=" + query + "&json.wrf=?",
    function (result) {
      //$.each(result.response.docs, function(result){

      if (result.response.numFound == 0) {
        // no hits: ask again with spellcheck switched on
        $.getJSON("http://192.168.1.9:8983/solr/db/select/?wt=json&start=0&rows=100&q=" + query + "&spellcheck=true&json.wrf=?",
          function (result) {
            $.each(result.spellcheck.suggestions, function (i, item) {
              newquery = item.suggestion;
            });
          });
      }
    });
});

In the JavaScript code above, the variable newquery initially holds the value of
query. When the if condition is true, its value is changed inside the block. My
problem is that I do not see the changed value outside the if block, which is
where I need it. How can I do this?
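
A likely cause, offered as a sketch rather than a definitive fix: $.getJSON is asynchronous, so any code placed after the if block runs before the inner callback has assigned newquery. One way around this is to continue the work inside a callback. In the sketch below, resolveQuery, buildUrl and fetchJson are hypothetical names that do not appear in the original post; fetchJson(url, callback) stands in for $.getJSON so that the flow can be exercised without a browser, and the shape of the spellcheck response is assumed from the poster's own code.

```javascript
// Hypothetical helpers, not part of the original post.
// fetchJson(url, callback) stands in for $.getJSON.
var BASE = "http://192.168.1.9:8983/solr/db/select/?wt=json&start=0&rows=100&q=";

function buildUrl(query, extraParams) {
  return BASE + encodeURIComponent(query) + (extraParams || "");
}

// Calls done(finalQuery) once the final query string is known.
function resolveQuery(query, fetchJson, done) {
  fetchJson(buildUrl(query), function (result) {
    if (result.response.numFound !== 0) {
      done(query); // the original query already has hits
      return;
    }
    // No hits: repeat the request with spellcheck enabled.
    fetchJson(buildUrl(query, "&spellcheck=true"), function (spell) {
      var newquery = query;
      var suggestions = (spell.spellcheck && spell.spellcheck.suggestions) || [];
      suggestions.forEach(function (item) {
        // Same access pattern as the original code: item.suggestion
        if (item && item.suggestion) {
          newquery = item.suggestion;
        }
      });
      done(newquery); // the changed value is available here, not earlier
    });
  });
}
```

With jQuery, fetchJson could be something like function (url, cb) { $.getJSON(url + "&json.wrf=?", cb); }, and the code that needs the changed value would go into the done callback.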


-
Thanks & Regards
Romi
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-in-accessing-a-variable-s-changed-value-outside-of-if-block-in-javascript-code-tp3094342p3094342.html


Re: Solr Clustering For Multiple Pages

2011-06-22 Thread nilay....@gmail.com
Thanks a lot. I thought I was not doing it the correct way.

-
Regards
Nilay Tiwari
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Clustering-For-Multiple-Pages-tp3085507p3094379.html


Re: Solr Clustering For Multiple Pages

2011-06-22 Thread nilay....@gmail.com
Can you please tell me how I can apply a filter to clustered data in Solr?

Currently I store the doc id and topic name in a Map, look up the ids for a topic
in the Map, and then pass them to Solr joined with OR conditions.

Is there any other way to do this?
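
For reference, the approach described above might look roughly like the sketch below. It is only an illustration: buildIdFilter and topicToIds are hypothetical names, "id" is an assumed unique-key field, and the sample topics and ids are made up. The result is meant to be sent as an fq (filter query) parameter. Very long OR lists can run into the maxBooleanClauses limit in solrconfig.xml, which is one reason to look for another way.

```javascript
// Hypothetical sketch: topicToIds is a plain object standing in for the Map
// described above; "id" is an assumed unique-key field name.
function buildIdFilter(topicToIds, topic, idField) {
  var ids = topicToIds[topic] || [];
  if (ids.length === 0) {
    return null; // nothing to filter on
  }
  var field = idField || "id";
  var quoted = ids.map(function (id) {
    // Quote each id and escape embedded quotes and backslashes.
    return '"' + String(id).replace(/(["\\])/g, "\\$1") + '"';
  });
  return field + ":(" + quoted.join(" OR ") + ")";
}

// Made-up sample data for illustration only.
var topicToIds = { "Earrings": [12250, 12254], "Rings": [13001] };
var fq = buildIdFilter(topicToIds, "Earrings");
var url = "/solr/select?q=*:*&fq=" + encodeURIComponent(fq);
```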



-
Regards
Nilay Tiwari
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Clustering-For-Multiple-Pages-tp3085507p3094390.html


Re: Parse solr json object

2011-06-22 Thread lee carroll
try this mail list
http://docs.jquery.com/Discussion
or this doc
http://api.jquery.com/jQuery.each/
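
As a rough sketch of where the jQuery.each documentation leads: assuming the JSON response (wt=json) mirrors the XML quoted below, highlighting is an object keyed by document id, and each entry maps a field name to an array of snippets. collectHighlights is a hypothetical helper name and the sample object is abbreviated from the quoted XML. Plain loops are used so the example runs without jQuery; $.each(result.highlighting, function (docId, fields) { ... }) would visit the same pairs.

```javascript
// Hypothetical helper: flattens the highlighting section into a list.
function collectHighlights(result, fieldName) {
  var out = [];
  var highlighting = result.highlighting || {};
  Object.keys(highlighting).forEach(function (docId) {
    var snippets = highlighting[docId][fieldName] || [];
    snippets.forEach(function (snippet) {
      out.push({ id: docId, snippet: snippet });
    });
  });
  return out;
}

// Sample shaped like the response in the quoted message (text shortened).
var sample = {
  highlighting: {
    "12250": { description: ["These <em>elegant</em> and fluid earrings"] },
    "12254": { description: ["These <em>elegant</em> and fluid earrings"] }
  }
};
var hits = collectHighlights(sample, "description");
```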


On 21 June 2011 07:32, Romi romijain3...@gmail.com wrote:
> Hi, to enable highlighting I want to parse a JSON object. For readability
> I have included the XML format of that JSON object. Please tell me how I
> should parse this object using
> $.each(, function(i,item){
>
> so that I can get the highlighted result.
>
> <lst name="highlighting">
>   <lst name="12250">
>     <arr name="description">
>       <str>
>         These <em>elegant</em> and fluid earrings have six round prong-set and
>         twenty-six faceted briolette
>       </str>
>     </arr>
>   </lst>
>   <lst name="12254">
>     <arr name="description">
>       <str>
>         These <em>elegant</em> and fluid earrings have six round prong-set and
>         twenty-six faceted briolette
>       </str>
>     </arr>
>   </lst>
>
> Thanks & Regards
> Romi
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Parse-solr-json-object-tp3089470p3089470.html



Re: Solr Clustering For Multiple Pages

2011-06-22 Thread Stanislaw Osinski
I don't quite follow, I must admit. Maybe it's faceting you're after?

http://wiki.apache.org/solr/SolrFacetingOverview

Staszek




strange utf-8 problem

2011-06-22 Thread ramires

I use Solr 4 trunk to index some sites with Nutch 1-2-rc4. When I try to
index 300k documents with Solr 4 I get an error.
When I use Solr 1.4.1 there is no such problem. I have installed
Solr 4 on Tomcat 5 and 6 and on Jetty 7 and 8; it makes no difference.

For Solr 1.4.1 I use apache-solr-core-1.4.0.jar and apache-solr-solrj-1.4.0.jar
because of javabin errors.

here is problematic chars.  Sao Tom���nd Princip���STP

SEVERE: java.lang.RuntimeException: [was class java.io.CharConversionException] Invalid UTF-8 character 0x at char #681112, byte #700315)
    at com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
    at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
    at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
    at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
    at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:266)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:126)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1308)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1323)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:476)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:480)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:225)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:937)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:406)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:183)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:871)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:247)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:110)
    at org.eclipse.jetty.server.Server.handle(Server.java:346)
    at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:589)
    at org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.java:1065)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:915)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:220)
    at org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:411)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:535)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:40)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:529)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.CharConversionException: Invalid UTF-8 character 0x at char #681112, byte #700315)
    at com.ctc.wstx.io.UTF8Reader.reportInvalid(UTF8Reader.java:335)
    at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:249)
    at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
    at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
    at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
    at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
    at com.ctc.wstx.sr.BasicStreamReader.readTextSecondary(BasicStreamReader.java:4628)
    at com.ctc.wstx.sr.BasicStreamReader.readCoalescedText(BasicStreamReader.java:4126)
    at com.ctc.wstx.sr.BasicStreamReader.finishToken(BasicStreamReader.java:3701)
    at com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3649)
    ... 32 more
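
The trace shows the XML parser (Woodstox) rejecting a byte sequence in the posted update as invalid UTF-8. One way to narrow this down, sketched here under assumptions, is to check the bytes of each document before they are sent and find the ones that fail. isValidUtf8 is a hypothetical helper, not part of Solr or Nutch, and the sample bytes are made up for illustration. The check does not explain why Solr 1.4.1 accepts the same input.

```javascript
// Hypothetical pre-flight check, not part of Solr or Nutch.
function isValidUtf8(bytes) {
  try {
    // With fatal: true, decode() throws on malformed input instead of
    // silently substituting replacement characters.
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch (e) {
    return false;
  }
}

// Made-up samples: the same word encoded two different ways.
var good = new Uint8Array([0x53, 0xC3, 0xA3, 0x6F]); // "São" as UTF-8
var bad = new Uint8Array([0x53, 0xE3, 0x6F]);        // "São" as ISO-8859-1
```

If the failing documents turn out to be in a single-byte encoding such as ISO-8859-1, converting them to UTF-8 before indexing would be the next thing to try.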


--
View this message in context: 
http://lucene.472066.n3.nabble.com/strange-utf-8-problem-tp3094473p3094473.html


Re: MultiValued facet behavior question

2011-06-22 Thread Dennis de Boer
Hi Bill,

yes, you absolutely do make sense. I posted the exact same question to this
mailing list (subject: faceting on multivalued fields), but got no response
out of it. A friend of mine is now helping out.

I hope someone on the list can give us some advice. I'll post our findings
to this topic.

Regards,
Dennis


On Wed, Jun 22, 2011 at 5:37 AM, Bill Bell billnb...@gmail.com wrote:

> Doing it with q=specialities:Cardiologist or
> q=Cardiologist&defType=dismax&qf=specialties
> does not matter, the issue is how I see facets. I want the facets to only
> show the one match,
> and not all the multiValued fields in specialties that match...
>
> Example,
>
> Name|specialties
> Bell|Cardiologist
> Smith|Cardiologist,Family Doctor
> Adams|Cardiologist,Family Doctor,Internist
>
> When I facet.field=specialties I get:
>
> Cardiologist: 3
> Internist: 1
> Family Doctor: 1
>
> I only want it to return:
>
> Cardiologist: 3
>
> Because this matches exactly... Facet on the field that matches and only
> return the number for that.
>
> It can get more complicated. Here is another example:
>
> q=cardiology&defType=dismax&qf=specialties
>
> (Cardiology and cardiologist are stems)...
>
> But I don't really know which value in Cardiologist match perfectly.
>
> Again, I only want it to return:
>
> Cardiologist: 3
>
> If I searched on q=internist&defType=dismax&qf=specialties, I want the
> result to be:
>
> Internist: 1
>
> Does this all make sense?
>
> On 6/21/11 8:23 PM, Darren Govoni dar...@ontrenet.com wrote:
>
>> So are you saying that for all results for cardiologist,
>> you don't want facets not matching Cardiologist to be
>> returned as facets?
>>
>> what happens when you make q=specialities:Cardiologist?
>> instead of just q=Cardiologist?
>>
>> Seems that if you make the query on the field, then all
>> your results will necessarily qualify and you can discard
>> any additional facets you don't want (e.g. that don't
>> match the initial query term).
>>
>> Maybe you can write what you see now, with what you
>> want to help clarify.
>>
>> On 06/21/2011 09:47 PM, Bill Bell wrote:
>>> I have a field: specialties that is multiValued.
>>>
>>> It indicates the doctor's specialties: cardiologist, internist, etc.
>>>
>>> When someone does a search: Cardiologist, I use
>>>
>>> q=cardiologist&defType=dismax&qf=specialties&facet=true&facet.field=specialties
>>>
>>> What I want to come out in the facet is the Cardiologist (since it
>>> matches exactly) and the number that matches: 700.
>>> I don't want to see the other values that are not Cardiologist.
>>>
>>> Now I see:
>>>
>>> Cardiologist: 700
>>> Internist: 45
>>> Family Doctor: 20
>>>
>>> This means that several Cardiologist's are also internists and family
>>> doctors. When it matches exactly, I don't want to see Internists, Family
>>> Doctors. How do I send a query to Solr with a condition.
>>> Facet.query=specialties:Cardiologist&facet.field=specialties
>>>
>>> Then if the query returns something use it, otherwise use the field one?
>>>
>>> Other ideas?





Re: MultiValued facet behavior question

2011-06-22 Thread lee carroll
Can your front-end app normalize the q parameter, either with a drop-down
or a type-ahead derived from the values in the specialties
field? That way q will match value(s) in your facet results. I'm not
sure what you are trying to achieve, though, so maybe I'm off the mark.








Re: MultiValued facet behavior question

2011-06-22 Thread lee carroll
Oh sorry, I forgot to add:
Often facet fields are not stemmed or heavily analysed. The facet
values are taken from the index.







Re: MultiValued facet behavior question

2011-06-22 Thread Michael Kuhlmann
On 22.06.2011 05:37, Bill Bell wrote:
> It can get more complicated. Here is another example:
>
> q=cardiology&defType=dismax&qf=specialties
>
> (Cardiology and cardiologist are stems)...
>
> But I don't really know which value in Cardiologist match perfectly.
>
> Again, I only want it to return:
>
> Cardiologist: 3

You would never get Cardiologist: 3 as the facet result, because if
Cardiologist were in your index, it would be impossible to find it when
searching for cardiology (unless you manage to write some strange
tokenizer that translates cardiology to Cardiologist at query time,
including the upper-case letter).

Facets are always taken from the index, so when you query for a facet value
it usually either matches exactly or not at all.

-Kuli


Re: Problem with field collapsing of patched Solr 1.4

2011-06-22 Thread Thalaiselvam
Hi,

I am using Solr field collapsing. It works perfectly with the default sorting
(score), but when we sort on more than one field with pagination, it
returns incorrect results.

Could anyone help to solve this?

Thanks in advance...

Regards
Thalaiselvam N


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-field-collapsing-of-patched-Solr-1-4-tp2678850p3094555.html


Re: MultiValued facet behavior question

2011-06-22 Thread Bill Bell
Here is an example using exampledocs and trunk 4.0:

http://localhost:8983/solr/select/?q=cat:%22hard%20drive%22&version=2.2&start=0&rows=10&indent=on&facet=true&facet.field=cat&facet.query={!lucene}cat:%22hard%20drive%22&facet.mincount=1

Results:

<result name="response" numFound="2" start="0">
Etc
<lst name="facet_queries">
  <int name="{!lucene}cat:&quot;hard drive&quot;">2</int>
</lst>
<lst name="facet_fields">
  <lst name="cat">
    <int name="electronics">2</int>
    <int name="hard drive">2</int>
  </lst>
</lst>

Notice that the facet_queries count of 2 is the same as numFound=2.
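
As an illustration only, the request above and the comparison of the two counts could be scripted along these lines. buildFacetQueryString and facetQueryMatchesNumFound are hypothetical names. The sketch assumes wt=json is added to the request (the post itself uses the XML response) and that facet_counts.facet_queries in the JSON response is keyed by the facet.query string as sent.

```javascript
// Hypothetical helper; parameter names are the ones used in the URL above,
// plus wt=json, which is an assumption of this sketch.
function buildFacetQueryString(field, value) {
  var phrase = field + ':"' + value + '"';
  var params = new URLSearchParams();
  params.append("q", phrase);
  params.append("rows", "10");
  params.append("wt", "json");
  params.append("facet", "true");
  params.append("facet.field", field);
  params.append("facet.query", "{!lucene}" + phrase);
  params.append("facet.mincount", "1");
  return params.toString(); // percent-encoded, joined with &
}

// True when the facet.query count equals numFound, as in the results above.
function facetQueryMatchesNumFound(json, facetQuery) {
  var counts = json.facet_counts.facet_queries;
  return counts[facetQuery] === json.response.numFound;
}
```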

But I have no way to make facet.field count only the matches.

The algorithm I have in mind:

Loop through the multiValued field and match on "hard drive". Ignore the other
values in there when building the facet list.











Re: MultiValued facet behavior question

2011-06-22 Thread Bill Bell
You can type q=cardiology and match on cardiologist. If stemming did not
work you can just add a synonym:

cardiology,cardiologist

But that is not the issue. The issue is around multiValued fields and
facets. You would expect a user who is searching on the multiValued field
to match on some values in there. For example, they type Cardiologist and it
matches on the value Cardiologist, so it matches in the multiValued field.
That part works. Then, when I output the facet, I need a different
behavior than the default: I need the facet to output only the value that
matches (scored), NOT ALL VALUES in the multiValued field.

I hope that makes sense.






Re: MultiValued facet behavior question

2011-06-22 Thread lee carroll
Hi Bill, can you explain a little bit more about why you need this?
Knowing the motivation
might suggest a different solution, not just one involving faceting.








Re: MultiValued facet behavior question

2011-06-22 Thread Dennis de Boer
Hi Bill,

as far as I now understand, with the help of my friend, you can't.
Multivalued fields don't work that way.
You can, however, always filter the facet results manually in the JSP. You
know what the user chose as a facet.
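
A minimal sketch of that kind of manual filtering follows. It is shown in JavaScript, though the same loop could live in the JSP. filterFacet is a hypothetical helper name, and the sample counts are made up. It assumes the JSON response format, where a facet field arrives by default as a flat list of alternating values and counts.

```javascript
// Hypothetical helper: keeps only the facet values the user actually chose.
// flatFacet is assumed to be [value, count, value, count, ...].
function filterFacet(flatFacet, chosenValues) {
  var wanted = chosenValues.map(function (v) { return v.toLowerCase(); });
  var kept = [];
  for (var i = 0; i + 1 < flatFacet.length; i += 2) {
    var value = flatFacet[i];
    var count = flatFacet[i + 1];
    // Case-insensitive exact match against the chosen values.
    if (wanted.indexOf(String(value).toLowerCase()) !== -1) {
      kept.push({ value: value, count: count });
    }
  }
  return kept;
}

// Made-up sample counts for illustration.
var specialties = ["Cardiologist", 3, "Family Doctor", 2, "Internist", 1];
var shown = filterFacet(specialties, ["cardiologist"]);
```

This only hides the unwanted values on the client. It does not help when the search term and the facet value differ, for example cardiology versus Cardiologist.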

The issue I ran into is when you have additional facet fields, for example
when you also have country as a facet field. Now when you search for
Cardiologist, it also returns Internist and Family Doctor, as you described.
What Solr then also returns for the country list are the countries for
Cardiologist, but also for Internist and Family Doctor. This is not what
you want.

I don't think what we want here is supported out of the box by Solr.


Regards,
Dennis







Re: MultiValued facet behavior question

2011-06-22 Thread Michael Kuhlmann
On 22.06.2011 09:49, Bill Bell wrote:
> You can type q=cardiology and match on cardiologist. If stemming did not
> work you can just add a synonym:
>
> cardiology,cardiologist

Okay, synonyms are the only realistic way I can think of to get a match.

Stemming won't work on a facet field; you wouldn't get Cardiologist: 3
as the result but cardiolog: 3 or something like that instead.

Normally, you declare a facet field explicitly for faceting, and not
for searching, exactly because stemming and tokenizing on facet fields
don't make sense.

And the short answer is: No, that's not possible.

-Kuli


Understanding query explain information

2011-06-22 Thread Alexander Ramos Jardim
Hi guys,

I have some doubts about how to correctly read the debugQuery
output. I have a field named itemName in my index. It is a plain text field.
When I run a simple query, ?q=itemName:iPad, I end up with the
query result shown at the end of this message.

I am simply trying to understand why these strings produced such scores. As
far as I can tell, the only difference between them is the field
norm, as all the other factors stay the same.

Now, how do I get these field norm values? The field norm is the result of this
formula, right?

1 / sqrt(terms), where terms is the number of terms in my field
after it is indexed

Well, if this is true, the field norm for my first document should be 0.5
(1/sqrt(4)), as "Livro - IPAD - O Guia do Profissional" ends up with the
terms livro|ipad|guia|profissional as tokens.

What am I forgetting to take into account?
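
The arithmetic can be checked with a short sketch. lengthNorm and fieldWeight are hypothetical helper names; the numbers are copied from the explain section of the response. The sketch confirms that each score is tf × idf × fieldNorm, and that the reported norm of 0.4375 does not equal 1/sqrt(n) for any whole number of terms. That would be consistent with Lucene of this era storing each norm in a single byte, which makes stored values coarser than the formula, and with index-time boosts being folded into the same number. It is also possible that the analyzer produces a different number of tokens than expected. Neither explanation is confirmed by the post.

```javascript
// Hypothetical helpers for checking the arithmetic in the explain output.
function lengthNorm(numTerms) {
  return 1 / Math.sqrt(numTerms);
}

function fieldWeight(tf, idf, fieldNorm) {
  return tf * idf * fieldNorm;
}

// Values copied from the explain section of the response.
var idf = 8.413407;
var first = fieldWeight(1.0, idf, 0.4375);  // reported score: 3.6808658
var second = fieldWeight(1.0, idf, 0.375);  // reported score: 3.1550279
var third = fieldWeight(1.0, idf, 0.3125);  // reported score: 2.6291897

// The norm the poster expected for four terms.
var expected = lengthNorm(4);

// Number of terms a norm of 0.4375 would imply if it were stored exactly.
var impliedTerms = 1 / (0.4375 * 0.4375);
```

Here expected is 0.5, while impliedTerms comes out at roughly 5.22, which is not a whole number.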

<?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">3</int>
 <lst name="params">
  <str name="debugQuery">on</str>
  <str name="start">0</str>
  <str name="rows">10</str>
  <arr name="indent">
   <str>on</str>
   <str>on</str>
  </arr>
  <str name="fl">itemName,score</str>
  <str name="version">2.2</str>
  <str name="q">itemName:ipad</str>
 </lst>
</lst>
<result name="response" numFound="161" start="0" maxScore="3.6808658">
 <doc>
  <float name="score">3.6808658</float>
  <str name="itemName">Livro - IPAD - O Guia do Profissional</str>
 </doc>
 <doc>
  <float name="score">3.1550279</float>
  <str name="itemName">Leitor de Cartão para Ipad - Mobimax</str>
 </doc>
 <doc>
  <float name="score">3.1550279</float>
  <str name="itemName">Sleeve para iPad</str>
 </doc>
 <doc>
  <float name="score">3.1550279</float>
  <str name="itemName">Sleeve de Neoprene para iPad</str>
 </doc>
 <doc>
  <float name="score">3.1550279</float>
  <str name="itemName">Carregador de parede para iPad</str>
 </doc>
 <doc>
  <float name="score">2.6291897</float>
  <str name="itemName">Case Envelope para iPad - Black - Built NY</str>
 </doc>
 <doc>
  <float name="score">2.6291897</float>
  <str name="itemName">Case Protetora p/ IPad de Silicone Duo - Browm - Iskin</str>
 </doc>
 <doc>
  <float name="score">2.6291897</float>
  <str name="itemName">Case Protetora p/ IPad de Silicone Duo - Clear - Iskin</str>
 </doc>
 <doc>
  <float name="score">2.6291897</float>
  <str name="itemName">Case p/ iPad Sleeve - Black - Built NY</str>
 </doc>
 <doc>
  <float name="score">2.6291897</float>
  <str name="itemName">Bolsa de Proteção p/ iPad Preta - Geonav</str>
 </doc>
</result>
<lst name="debug">
 <str name="rawquerystring">itemName:ipad</str>
 <str name="querystring">itemName:ipad</str>
 <str name="parsedquery">itemName:ipad</str>
 <str name="parsedquery_toString">itemName:ipad</str>
 <lst name="explain">
  <str name="7369507">
3.6808658 = (MATCH) fieldWeight(itemName:ipad in 102507), product of:
  1.0 = tf(termFreq(itemName:ipad)=1)
  8.413407 = idf(docFreq=165, maxDocs=275239)
  0.4375 = fieldNorm(field=itemName, doc=102507)
</str>
  <str name="739">
3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226401), product of:
  1.0 = tf(termFreq(itemName:ipad)=1)
  8.413407 = idf(docFreq=165, maxDocs=275239)
  0.375 = fieldNorm(field=itemName, doc=226401)
</str>
  <str name="7356941">
3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226409), product of:
  1.0 = tf(termFreq(itemName:ipad)=1)
  8.413407 = idf(docFreq=165, maxDocs=275239)
  0.375 = fieldNorm(field=itemName, doc=226409)
</str>
  <str name="7356931">
3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226447), product of:
  1.0 = tf(termFreq(itemName:ipad)=1)
  8.413407 = idf(docFreq=165, maxDocs=275239)
  0.375 = fieldNorm(field=itemName, doc=226447)
</str>
  <str name="7360321">
3.1550279 = (MATCH) fieldWeight(itemName:ipad in 226583), product of:
  1.0 = tf(termFreq(itemName:ipad)=1)
  8.413407 = idf(docFreq=165, maxDocs=275239)
  0.375 = fieldNorm(field=itemName, doc=226583)
</str>
  <str name="7428354">
2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223178), product of:
  1.0 = tf(termFreq(itemName:ipad)=1)
  8.413407 = idf(docFreq=165, maxDocs=275239)
  0.3125 = fieldNorm(field=itemName, doc=223178)
</str>
  <str name="7366074">
2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223196), product of:
  1.0 = tf(termFreq(itemName:ipad)=1)
  8.413407 = idf(docFreq=165, maxDocs=275239)
  0.3125 = fieldNorm(field=itemName, doc=223196)
</str>
  <str name="7366068">
2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223831), product of:
  1.0 = tf(termFreq(itemName:ipad)=1)
  8.413407 = idf(docFreq=165, maxDocs=275239)
  0.3125 = fieldNorm(field=itemName, doc=223831)
</str>
  <str name="7428358">
2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223856), product of:
  1.0 = tf(termFreq(itemName:ipad)=1)
  8.413407 = idf(docFreq=165, maxDocs=275239)
  0.3125 = fieldNorm(field=itemName, doc=223856)
</str>
  <str name="7422680">
2.6291897 = (MATCH) fieldWeight(itemName:ipad in 223908), product of:
  1.0 = tf(termFreq(itemName:ipad)=1)
  8.413407 = idf(docFreq=165, maxDocs=275239)
  0.3125 = fieldNorm(field=itemName, doc=223908)
</str>
 </lst>
 <str name="QParser">LuceneQParser</str>
 <lst name="timing">
  <double name="time">3.0</double>
  <lst name="prepare">
   <double name="time">1.0</double>

Re: char sets accepted via xml

2011-06-22 Thread Tom Gross

Hi,

I also have this issue with Solr 3.2.0. It is probably this:
https://issues.apache.org/jira/browse/SOLR-2381

Tom

On 06/15/2011 02:09 PM, Mark Cunningham wrote:

Hi,

If you submit information to solr using xml, does the server assume you're
using unicode encoded in utf8? And does it accept the whole range of
possible characters in unicode? (For example, characters that require
multiple bytes when encoded in utf-8).

I'm getting quite a few "Invalid UTF-8 middle byte 0x20 (at char #408, byte
#-1)" errors (with different bytes/characters) that seem to be coming from
characters such as the trademark symbol or registered or some characters
that look like normal characters (such as a dash). It comes out as UTF-8
code units (E2 80 93) using this very handy website
http://rishida.net/tools/conversion/

I tried inserting <?xml version="1.0" encoding="utf-8"?> at the start of the
xml however this didn't seem to make much difference.

Anyone else have these issues or know what they might be coming from?

Mark
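
A "middle byte 0x20" complaint usually means the parser saw the lead byte of a multi-byte sequence followed by a plain space. One possible cause is text in a single-byte encoding such as ISO-8859-1 being sent while the document declares UTF-8. The XML declaration only labels the bytes, it does not convert them. The sketch below is one way to check a payload on the client before posting it. It uses only the JDK, and the class name is hypothetical. It does not rule out a server-side issue such as the one in the JIRA ticket linked above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Checks whether a byte array is well-formed UTF-8 (hypothetical helper). */
final class Utf8Check {

    static boolean isValidUtf8(byte[] bytes) {
        try {
            // REPORT makes the decoder throw instead of silently substituting.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // An en dash encoded as UTF-8 is the three bytes E2 80 93.
        byte[] enDashUtf8 = "\u2013".getBytes(StandardCharsets.UTF_8);
        // "café " in ISO-8859-1 ends with E9 20: a lead byte, then a space.
        byte[] latin1 = "caf\u00e9 ".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 en dash valid:  " + isValidUtf8(enDashUtf8));
        System.out.println("Latin-1 bytes valid:  " + isValidUtf8(latin1));
    }
}
```

If the second kind of input turns up, the fix is on the producing side: encode the string as UTF-8 when writing the request body rather than relying on the platform default charset.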




--
Auther of the book Plone 3 Multimedia - http://amzn.to/dtrp0C

Tom Gross
email.@toms-projekte.de
skype.tom_gross
web.http://toms-projekte.de
blog...http://blog.toms-projekte.de



Conflict in wildcard query and spellchecker in solr search

2011-06-22 Thread Romi
Using Solr search, when I search for rin* it runs a wildcard query and I get
the result for ring, but when I search for Rin* it runs the spellchecker and
then gives the result for ring. Why is that? Please explain.

-
Thanks  Regards
Romi


Re: Conflict in wildcard query and spellchecker in solr search

2011-06-22 Thread Markus Jelsma
Wildcard queries are not analyzed. Lowercase your query beforehand.

On Wednesday 22 June 2011 14:08:48 Romi wrote:
 Using solr search when i search for rin* it run wildcard query and i get
 the result for ring but when i search for Rin* it run spellchecker and
 then gives the result for ring. why so ?? please explain
 

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-06-22 Thread Surendra
Hi Chris ,Andreas

I have upgraded to solr 3.2 ... everything seems fine now. I will have to
integrate this to my application and observe if any further issues...again
thanks for your patience and time...

--Surendra




Re: MultiValued facet behavior question

2011-06-22 Thread lee carroll
Hi Bill,

So that part works. Then when I output the facet, I need a different
behavior than the default. I need
The facet to only output the value that matches (scored) - NOT ALL VALUES
in the multiValued field.

I think it makes sense?

Why do you need this ? If your use case is faceted navigation then not showing
all the facet terms which match your query would be mis-leading to your users.
The fact is your data indicates Ben the cardiologist is also a GP etc.
Is it not valid for
your users to be able to further filter on cardiologists who are also
specialists in x other disciplines ? If the specialisms are mutually
exclusive then your data will reflect this.

The fact is x number of cardiologists match and x number of GP's match etc

I may be missing the point here as you have not said why you need to do this ?

cheers lee c


On 22 June 2011 09:34, Michael Kuhlmann s...@kuli.org wrote:
 Am 22.06.2011 09:49, schrieb Bill Bell:
 You can type q=cardiology and match on cardiologist. If stemming did not
 work you can just add a synonym:

 cardiology,cardiologist

 Okay, synonyms are the only way I can think of a realistic match.

 Stemming won't work on a facet field; you wouldn't get Cardiologist: 3
 as the result but cardiolog: 3 or something like that instead.

 Normally, you use declare facet field explicitly for facetting, and not
 for searching, exactly because stemming and tokenizing on facet fields
 don't make sense.

 And the short answer is: No, that's not possible.

 -Kuli



Re: Conflict in wildcard query and spellchecker in solr search

2011-06-22 Thread Romi
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="6" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I am using this fieldType with these filters applied. For wildcard searches,
do I need to include more filters, or is some other configuration needed?

-
Thanks  Regards
Romi


Search is taking long-long time.

2011-06-22 Thread Mohammad Shariq
I am running two Solr shards. I have indexed 100 million docs in each shard
(each is 50 GB and only 'id' is stored).
My searches have become very slow, taking around 2-3 seconds.
Below is my query:

http://solrHost1:8080/solr/select?shards=solrHost1:8080/solr,solrHost2:8080/solr&q=QUERY&fq=FilterQuery&fl=id&start=0&rows=100&indent=on&sort=time desc

QUERY and FilterQuery are below:

QUERY = Online Shopping AND ( Amex OR Am ex OR American express OR
americanexpress )
FilterQuery = time:[1308659371 TO 1308745771] AND category:news AND
lang:English

How can I boost the query performance?
The default search field is title (text).

-- 
Thanks and Regards
Mohammad Shariq


Re: Conflict in wildcard query and spellchecker in solr search

2011-06-22 Thread Markus Jelsma
No, wildcard queries are not analyzed. They are not _passed_ through your 
analyzers. If you lowercase at index-time, you must lowercase it outside of 
Solr before sending a query.
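
As a minimal sketch of what "lowercase it outside of Solr" can look like in a Java client: the helper below lowercases the term the user typed and URL-encodes it before it is appended to the select URL. The class and method names are hypothetical, and it assumes the input is a single user term. Lowercasing a whole query string would also lowercase operators such as AND and OR, so only the user-supplied part is touched here. The same idea applies in a JavaScript front end with toLowerCase().

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Prepares a user-typed wildcard term for the q parameter (hypothetical helper). */
final class WildcardQueryPrep {

    /** Lowercases the term so it can match tokens lowercased at index time. */
    static String toQueryParam(String userTerm) {
        // Locale.ROOT avoids locale-specific surprises such as the Turkish dotless i.
        String lowered = userTerm.trim().toLowerCase(Locale.ROOT);
        // URLEncoder leaves '*' untouched, so the wildcard survives encoding.
        return "q=" + URLEncoder.encode(lowered, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Rin*" and "rin*" now produce the same request parameter.
        System.out.println(toQueryParam("Rin*"));
        System.out.println(toQueryParam("rin*"));
    }
}
```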

On Wednesday 22 June 2011 14:35:12 Romi wrote:
 I am using this fieldtype and applied these filters. for wildcard searches
 do i need to include some more filters or what other configurations are
 needed
 

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: MultiValued facet behavior question

2011-06-22 Thread Dennis de Boer
Hi Lee,

since I have the same problem, I might as well try to answer this question.

You want this behaviour to make things clear for your users. If they select
cardiologists, does it make sense to also show family doctors as a
facet value to the user?
The same thing goes for the facets that are related to family doctors. They
are returned as well, making it even more unclear for the end user.






Re: Conflict in wildcard query and spellchecker in solr search

2011-06-22 Thread Romi
how can I lowercase query outside of Solr before sending a query?

-
Thanks  Regards
Romi


Re: Understanding query explain information

2011-06-22 Thread lee carroll
Hi, are you using synonyms?



On 22 June 2011 10:30, Alexander Ramos Jardim
alexander.ramos.jar...@gmail.com wrote:
 Hi guys,

 I am getting some doubts about how to correctly understand the debugQuery
 output. I have a field named itemName in my index. This is a text field,
 just that. When I quqery a simple ?q=itemName:iPad , I end up with the
 following query result.

 Simply trying to understand why these strings generated such scores, and as
 far as I can understand, the only difference between them is the field
 norms, as all the other results maintain themselves.

 Now, how do I get these field norm values? Field Norm is the result of this
 formula right?

 *1/square root of (terms)*,* where terms is the number of terms in my field
 after it is indexed*


 Well, if this is true, the field norm for my first document should be 0.5
 (1/sqrt(4)) as  Livro - IPAD - O Guia do Profissional ends up with the
 terms livro|ipad|guia|profissional as tokens.

 What I am forgetting to take into account?


Re: MultiValued facet behavior question

2011-06-22 Thread Mike Sokolov


On 06/22/2011 04:01 AM, Dennis de Boer wrote:

> Hi Bill,
>
> as far as I understood now, with the help of my friend, you can't.
> Multivalued fields don't work that way.
> You can however always filter the facet results manually in the JSP. You
> know what the user chose as a facet.
   
Yes - that is the most sensible suggestion: if you want to display the 
facets the user chose, and only those, regardless of what was found in 
the index, then I think you know what to do!
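
A minimal sketch of that manual filtering, assuming the facet counts from the Solr response have already been copied into a map and the application knows which values the user selected. The class name, method name, and sample values are illustrative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Keeps only the facet values the user actually chose (hypothetical helper). */
final class SelectedFacetFilter {

    static Map<String, Long> keepSelected(Map<String, Long> facetCounts,
                                          Set<String> selected) {
        // LinkedHashMap preserves the order in which Solr returned the values.
        Map<String, Long> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Long> entry : facetCounts.entrySet()) {
            if (selected.contains(entry.getKey())) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>();
        counts.put("Cardiologist", 3L);
        counts.put("Internist", 2L);
        counts.put("Family doctor", 1L);

        // Only the value the user clicked on is shown in the rendered page.
        System.out.println(keepSelected(counts, Set.of("Cardiologist")));
    }
}
```

This hides the co-occurring values in the display only. The counts Solr returns are unchanged, so the other values remain available if the page later needs them.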

> The issue I ran into is when you have additional facet fields. For example
> when you also have country as a facet field. Now when you search for
> Cardiologist, it also returns Internist and family doctor as you described.
> What Solr now also returns for the country list are the countries for
> Cardiologist, but also for Internist and family doctor. This is not what
> you want.
   
I don't think this is accurate.  Your query matches some set of
documents - the facet values shown will only be those that occur in that
set.  If some internists' countries are shown when the user selects
Cardiologist, that is because those internists are also cardiologists,
right?


-Mike


Tika Jax-RS and DIH

2011-06-22 Thread Tod

Mattmann, Chris A (388J) <chris.a.mattmann at jpl.nasa.gov> writes:

> Hi Jo,
>
> You may consider checking out Tika trunk, where we recently have a Tika
> JAX-RS web service [1] committed as part of the tika-server module. You
> could probably wire DIH into it and accomplish the same thing.
>
> Cheers,
> Chris
>
> [1] https://issues.apache.org/jira/browse/TIKA-593



Chris - could you elaborate on using Tika Jax-RS and DIH?  How 
production ready is it?  Could you summarize the steps necessary to get 
it to work?  Any examples yet?


I'd be happy to work with you to get something out to the group.


Thanks - Tod


Re: Search is taking long-long time.

2011-06-22 Thread Ahmet Arslan
 I am running two solrShards. I have
 indexed 100 million docs in each shard (
 each are 50 GB and only 'id' is stored).
 My search have became very slow. Its taking around 2-3
 seconds.
 below is my query :
 
 http://solrHost1:8080/solr/select?shards=solrHost1:8080/solr,solrHost2:8080/solrq=
 QUERYfq=FilterQueryfl=idstart=0rows=100indent=onsort=time
 desc
 
 QUERY and FilterQuery is below :
 
 QUERY = Online Shopping AND ( Amex OR Am ex OR American
 express OR
 americanexpress )
 FilterQuery = time:[1308659371 TO 1308745771] AND
 category:news AND
 lang:English
 
 How to boost the query perfomance.
 default search filed is title( text).

If the fieldType of time is not trie-based, you can change it to tdate, tint,
etc. to speed up range queries.

If you don't update your index frequently, you can use separate filter queries
(fq) for your clauses, to benefit from caching:
fq=category:news&fq=lang:English
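
To illustrate the separate-fq suggestion, here is a small sketch that builds the request parameters with one fq per clause, so each clause can be cached on its own. The class and method names are hypothetical, and the filter values are taken from the question above. Note that the benefit depends on how often the index is committed, since cached filters are discarded when a new searcher is opened.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Builds q plus one fq parameter per filter clause (hypothetical helper). */
final class FilterQueryParams {

    static String build(String q, String... filters) {
        StringBuilder sb = new StringBuilder("q=").append(encode(q));
        for (String filter : filters) {
            // Each clause gets its own fq instead of being ANDed into one.
            sb.append("&fq=").append(encode(filter));
        }
        return sb.toString();
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build(
                "americanexpress",
                "category:news",
                "lang:English",
                "time:[1308659371 TO 1308745771]"));
    }
}
```

The category and language filters rarely change between requests, so they are the likeliest to be reused from the cache. The time range changes with every request and may be better left out of the cached set.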

http://wiki.apache.org/solr/SolrPerformanceFactors
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed


Re: MultiValued facet behavior question

2011-06-22 Thread lee carroll
Hi Dennis,

I think maybe I just disagree. You're not showing facet counts for
Cardiologists and Family Doctors independently. The Family Doctor
count will be all Family Doctors who are also Cardiologists.

This allows users to further filter Cardiologists who are also family
Doctors. (this could be of use to them ??)

If your front end app implements the filtering as a list of fq=xxx
then that would make for consistent results ?

I don't see how not showing that some cardiologists are also Family
Doctors is a better user experience... But again you might have a very
specific use case?





Re: MultiValued facet behavior question

2011-06-22 Thread Dennis de Boer
Well, the use case is rather simple. It is not so much a use case as a user
experience.

If I have a list of values I can facet on, for example :
A
B
C
D
E

And I click on B, does it make sense for the user to display
B
C
E

after the selection? Just because items in B are C and E items as well?
As a user I chose B because I'm interested in B items. I do not care if they
are also C and E items.
Technically this is correct, but functionally the user doesn't care,
because it is not what they searched for.

In this case they were searching for a cardiologist. Do I care that a
cardiologist is also a family doctor? No. So I also do not want to see this
as a facet value presented to me in the frontend.
In the item details you can show that the cardiologist is also a family
doctor. That is fine, but not as an available facet option if you have just
chosen a speciality you want to filter on.

Does it make sense?


 



Re: Search is taking long-long time.

2011-06-22 Thread Mohammad Shariq
This is how my 'time' field looks in the schema:
<field name="time" type="tint" indexed="true" stored="false"/>

Also, I am doing frequent updates to Solr (every 5 minutes).






-- 
Thanks and Regards
Mohammad Shariq


RE: MultiValued facet behavior question

2011-06-22 Thread Bob Sandiford
Hi, Bill (and others).

I post this for what it's worth - it's a very specialized resolution we wrote 
to a similar requirement that may help with your (and similar) requirements.

Caveats abound [1]

We're running 3.1.

We wanted to be able to return facets which matched on our actual search, 
rather than all facets from the entire result set.  For example, if a user 
searches for author 'Twain', we present to them a list of facets which match 
'Twain', and exclude facets where 'Twain' is not found.  (Now - we don't tell 
our users that these are 'facet' values - we just present an alpha-sorted list 
of author names with a count of associated documents) So, we search our Author 
search field to identify matching documents, get all the facets (i.e. normal 
Solr processing to this point), and then filter that facet set to include only 
those that match our original search.
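
The idea of keeping only the facet values that themselves match the search can be sketched without any Solr internals. The version below is a deliberate simplification and not the implementation described in this message: a case-insensitive substring test stands in for running the parsed query against each value, and the class name, method name, and sample data are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Keeps facet values that contain the search term (simplified stand-in). */
final class MatchingFacetFilter {

    static Map<String, Integer> keepMatching(Map<String, Integer> facets,
                                             String searchTerm) {
        String needle = searchTerm.toLowerCase(Locale.ROOT);
        Map<String, Integer> kept = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : facets.entrySet()) {
            // A real implementation would analyze and score the value instead.
            if (entry.getKey().toLowerCase(Locale.ROOT).contains(needle)) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Integer> authors = new LinkedHashMap<>();
        authors.put("Twain, Mark", 12);
        authors.put("Warner, Charles Dudley", 1); // co-author on a matching document
        authors.put("Twain, Shania", 2);

        // Only the values that contain the searched name are kept.
        System.out.println(keepMatching(authors, "twain"));
    }
}
```

A substring test ignores stemming, synonyms, and fuzzy matching, which is presumably why the approach described here runs the original query through the schema's analyzer instead.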


We added our own extra facet parameter (facet.sirsidynix.filter.facets) to
instruct Solr when to do this special facet filtering. We modified the
SimpleFacets method getTermCounts right before the final 'return counts;',
like this:

    // Custom SirsiDynix code.
    if (params.getBool(FacetParams.FACET_SIRSIDYNIX_FILTER_FACETS, false))
    {
        counts = filterCounts(field, counts);
    }
    return counts;

and added the method 'filterCounts()' to this class. It basically wraps
things up to run the search against each facet value: it sets up MemoryIndex
instances based on our schema, inserts the facet value, and runs our
original query against the MemoryIndex.  Anything that matches has a score > 0,
and those are the only ones we keep:

/**
 * Custom SirsiDynix code:
 * Filters counts down to only those entries that match the original
 * query.  Does this by using Lucene's MemoryIndex - a very fast, in-memory,
 * single document index that can have queries run against it.
 * For each string value in counts, we create a MemoryIndex and run the
 * original query against it.  Anything with a score > 0 means a 'hit', so
 * the value matches the original query, and we'll retain it.  Score 0 means
 * no hit (i.e. was a facet value that was associated with a document that
 * matched the query, but the facet value itself didn't match the query).
 * @param field name of the field that the facet values came from.
 * @param counts Lucene's list of facet values.
 * @return filtered set, only those matching the original query.
 */
private NamedList filterCounts(String field, NamedList counts)
{
    if (!field.endsWith("_facet"))
    {
        return counts;
    }
    // Trim off _facet
    String fieldBase = field.substring(0, field.length() - 6);
    // Builds fields to search against.
    // Note that original came from (e.g.) AUTHOR_facet.
    // And, original search would have been for INITIAL_AUTHOR_SRCH_boost as well as
    // SUBSEQUENT_AUTHOR_SRCH_boost (and fuzzy's).  However, we're only searching
    // one string at a time, so we'll shove it into the single-valued INITIAL_xxx
    // fields.  That will be good enough for the Query to be able to correctly
    // evaluate against the document.
    String fieldBoost = "INITIAL_" + fieldBase + "_SRCH_boost";
    String fieldFuzzy = "INITIAL_" + fieldBase + "_SRCH_fuzzy";
    NamedList newCounts = new NamedList();

    IndexSchema schema = searcher.getSchema();
    SchemaField schemaField = schema.getField(fieldBoost);
    FieldType fieldType = schemaField.getType();
    Analyzer fieldAnalyzer = fieldType.getAnalyzer();

    SchemaField schemaFuzzyField = schema.getField(fieldFuzzy);
    FieldType fuzzyFieldType = schemaFuzzyField.getType();
    Analyzer fuzzyFieldAnalyzer = fuzzyFieldType.getAnalyzer();

    for (int i = 0; i < counts.size(); i++)
    {
        // Index the single facet value on its own, then score the original query.
        String testValue = counts.getName(i);
        MemoryIndex index = new MemoryIndex();
        index.addField(fieldBoost, testValue, fieldAnalyzer);
        index.addField(fieldFuzzy, testValue, fuzzyFieldAnalyzer);
        float score = index.search(rb.getQuery());
        if (score > 0.0f)
        {
            newCounts.add(testValue, counts.getVal(i));
        }
    }

    return newCounts;
}

A bit of explanation on our schema will be in order here.

1) We've suffixed all our facet fields with _facet - hence that first if 
statement.
2) We have matching 'searchable' and 'facet' fields, names basically differ 
only in the suffix.  So, we strip off '_facet' and append '_boost' and '_fuzzy' 
(our two field types for searching against (and possibly applying boosts), and 
doing fuzzy matching against).  (You'll see it's not exactly that - but you can 
hopefully modify your version to match your schema)  Basically the idea is that 
we can derive the field name(s) against which the original search was issued 
from the 

Re: upgrading to Tika 0.9 on Solr 1.4.1

2011-06-22 Thread Mattmann, Chris A (388J)
Glad it worked out!

Cheers,
Chris

On Jun 22, 2011, at 5:14 AM, Surendra wrote:

 Hi Chris ,Andreas
 
 I have upgraded to solr 3.2 ... everything seems fine now. I will have to
 integrate this to my application and observe if any further issues...again
 thanks for your patience and time...
 
 --Surendra
 
 


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++



Re: Weird optimize performance degradation

2011-06-22 Thread Santiago Bazerque
Thanks for your answers, Erick & Mohammad!

I'll get back to the list if I have more specific info about this issue. So
far the index is performing normally again.

Best,
Santiago

On Mon, Jun 20, 2011 at 9:29 AM, Erick Erickson erickerick...@gmail.com wrote:

 Hmmm, that is odd, anyone else want to chime in here?

 But optimizing isn't going to help with the strange commit
 times, it'll only make it worse. It's not doing you much if
 any good, so I'd think about not optimizing

 About the commit times in general. Depending upon when the
 merge happens, lots of work can go on under the covers.

 Here's a detailed look at merging...
 http://juanggrande.wordpress.com/2011/02/07/merge-policy-internals/

 But the short form is that, depending upon the number of
 segments and the merge policy, you may periodically hit
 a commit that copies, perhaps,  #all# the current segments
 into a single segment, which will create a large pause.

 But it's always possible that something's wonky with documents
 that have a very large number of fields.

 There's some interesting work being done on trunk to flatten
 out this curve, but that's not going to do you much good
 in the 3.x code line...

 Best
 Erick

 On Sun, Jun 19, 2011 at 10:32 AM, Santiago Bazerque sbazer...@gmail.com wrote:
  Hello Erick, thanks for your answer!
 
  Yes, our over-optimization is mainly due to paranoia over these strange
  commit times. The long optimize time persisted in all the subsequent
  commits, and this is consistent with what we are seeing in other production
  indexes that have the same problem. Once the anomaly shows up, it never
  commits quickly again.

  I combed through the last 50k documents that were added before the first
  slow commit. I found one with a larger than usual number of fields (I didn't
  write down the number, but it was a few thousand).

  I deleted it, and the following optimize was normal again (110 seconds). So
  I'm pretty sure a document with lots of fields is the cause of the slowdown.
 
  If that would be useful, I can do some further testing to confirm this
  hypothesis and send the document to the list.
 
  Thanks again for your answer.
 
  Best,
  Santiago
 
  On Sun, Jun 19, 2011 at 10:21 AM, Erick Erickson erickerick...@gmail.com wrote:
 
  First, there's absolutely no reason to optimize this often, if at all. Older
  versions of Lucene would search faster on an optimized index, but
  this is no longer necessary. Optimize will reclaim data from
  deleted documents, but is generally recommended to be performed
  fairly rarely, often at off-peak hours.

  Note that optimize will re-write your entire index into a single new segment,
  so following your pattern it'll take longer and longer each time.

  But the speed change happening at 500,000 documents is suspiciously
  close to the default mergeFactor of 10 X 50,000. Do subsequent
  optimizes (i.e. on the 750,000th document) still take that long? But
  this doesn't make sense because if you're optimizing instead of
  committing, each optimize should reduce your index to 1 segment and
  you'll never hit a merge.

  So I'm a little confused. If you're really optimizing every 50K docs, what
  I'd expect to see is successively longer times, and at the end of each
  optimize I'd expect there to be only one segment in your index.
 
  Are you sure you're not just seeing successively longer times on each
  optimize and just noticing it after 10?
 
  Best
  Erick
 
  On Sun, Jun 19, 2011 at 6:04 AM, Santiago Bazerque sbazer...@gmail.com wrote:
   Hello!
  
   Here is a puzzling experiment:
  
   I build an index of about 1.2MM documents using SOLR 3.1. The index has a
   large number of dynamic fields (about 15.000). Each document has about 100
   fields.

   I add the documents in batches of 20, and every 50.000 documents I optimize
   the index.

   The first 10 optimizes (up to exactly 500k documents) take less than a
   minute and a half.

   But the 11th and all subsequent commits take north of 10 minutes. The commit
   logs look identical (in the INFOSTREAM.txt file), but what used to be

   Jun 19, 2011 4:03:59 AM IW 13 [Sun Jun 19 04:03:59 EDT 2011; Lucene Merge
   Thread #0]: merge: total 50 docs

   Jun 19, 2011 4:04:37 AM IW 13 [Sun Jun 19 04:04:37 EDT 2011; Lucene Merge
   Thread #0]: merge store matchedCount=2 vs 2

   now eats a lot of time:

   Jun 19, 2011 4:37:06 AM IW 14 [Sun Jun 19 04:37:06 EDT 2011; Lucene Merge
   Thread #0]: merge: total 55 docs

   Jun 19, 2011 4:46:42 AM IW 14 [Sun Jun 19 04:46:42 EDT 2011; Lucene Merge
   Thread #0]: merge store matchedCount=2 vs 2

   What could be happening between those two lines that takes 10 minutes at
   full CPU, when the same step with 50k fewer docs took so much less time?
  
  
   Thanks in advance,
  
   Santiago
  
 
 



SEVERE: java.lang.NoSuchFieldError: core Solr branch3.x

2011-06-22 Thread Markus Jelsma

Hi,

Today's checkout (Solr Specification Version: 3.4.0.2011.06.22.16.10.08) 
produces the exception below on startup. The same exception, with a very similar 
stack trace, occurs on commit and on add. The example schema and docs will 
reproduce the error.

Jun 22, 2011 4:11:57 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NoSuchFieldError: core
at 
org.apache.lucene.index.SegmentTermDocs.<init>(SegmentTermDocs.java:48)
at 
org.apache.lucene.index.SegmentReader.termDocs(SegmentReader.java:491)
at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:1005)
at 
org.apache.lucene.index.SegmentReader.termDocs(SegmentReader.java:484)
at 
org.apache.solr.search.SolrIndexReader.termDocs(SolrIndexReader.java:321)
at 
org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:101)
at 
org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:298)
at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:524)
at 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:320)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1178)
at 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1066)
at 
org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:358)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:258)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at 
org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:54)
at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1177)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)



-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


size of synonyms.txt

2011-06-22 Thread Bernd Fehling

While trying some synonyms.txt files I noticed a huge increase of heap usage.

synonyms_1.txt -- 6645 lines (2826104 bytes in size)
results in 66364 entries in SynonymMap with 730MB heap usage.
Startup time about 2 minutes.

synonyms_2.txt -- 6645 lines (5384884 bytes in size)
results in 115168 entries in SynonymMap with 3.3GB heap usage.
Startup time about 4 minutes.


How large is your synonyms.txt?


Are there any limitations (e.g. file size, number of synonyms, ...)?


How do you deal with _really_ large numbers of synonyms?


To the experts:
Why not read synonyms from a file on demand? Is it only because memory is faster?


Regards,
Bernd


Re: rename a core to same name of existing core

2011-06-22 Thread Stefan Matheis
Koji,

the Description on http://wiki.apache.org/solr/CoreAdmin#CREATE is:

*quote*
If a core with the same name exists, while the new created core is
initalizing, the old one will continue to accept requests. Once it
has finished, all new request will go to the new core, and the old
core will be unloaded.
*/quote*

I guess, same handling for other actions, like rename.

Regards
Stefan

2011/6/21 Koji Sekiguchi k...@r.email.ne.jp:
 I accidentally renamed a core to the same name as an existing core, e.g. using 
 example-DIH:

 http://localhost:8983/solr/admin/cores?action=RENAME&core=db&other=tika

 I expected Solr to throw an exception, but it worked, and the existing core
 (tika) is gone.

 Is this a known bug (I couldn't find an open issue in JIRA) or intended 
 behavior?

 koji
 --
 http://www.rondhuit.com/en/



Re: commit time and lock

2011-06-22 Thread Ranveer

Dear all,

Kindly help me..

thanks

On Tuesday 21 June 2011 11:46 AM, Jonty Rhods wrote:

I am using SolrJ to index the data. I have around 5 docs indexed. Because the
server stops responding at commit time due to the lock, I was measuring the
commit time:

double starttemp = System.currentTimeMillis();
server.add(docs);
server.commit();
System.out.println("total time in commit = " + (System.currentTimeMillis() -
starttemp)/1000);

It takes around 9 seconds to commit the 5000 docs with 15 fields. However, I
am not sure whether the index lock is held from the server.add(docs); call
onwards or only during server.commit();.

If I change the above to the following

server.add(docs);
double starttemp = System.currentTimeMillis();
server.commit();
System.out.println("total time in commit = " + (System.currentTimeMillis() -
starttemp)/1000);

then the commit time becomes less than 1 second. I am not sure which one is
right.

please help.

regards
Jonty
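
The two snippets above measure different things: the first times add plus commit together, the second times the commit alone. A minimal sketch of timing each phase separately follows. PhaseTimer is a made-up helper, and the two Runnables only sleep as stand-ins; in real use they would wrap server.add(docs) and server.commit().

```java
import java.util.concurrent.TimeUnit;

class PhaseTimer {
    // Runs the task and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // Stand-ins only: replace the bodies with server.add(docs) and server.commit().
        Runnable add = new Runnable() { public void run() { sleep(50); } };
        Runnable commit = new Runnable() { public void run() { sleep(10); } };

        long addMs = timeMillis(add);
        long commitMs = timeMillis(commit);
        System.out.println("add = " + addMs + " ms, commit = " + commitMs
                + " ms, total = " + (addMs + commitMs) + " ms");
    }
}
```

Reporting both numbers shows how much of the 9 seconds goes to sending and indexing the documents and how much to the commit itself.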





Re: Read past EOF error due to broken connection

2011-06-22 Thread Anuj Kumar
Hi Pravesh,

Thanks for your reply. I tried both the approaches-

Commit fails with this exception-

Exception in thread "main" org.apache.solr.common.SolrException: Severe
errors in solr configuration.  Check your log files for more detailed
information on what may be wrong.  If you want solr to continue after
configuration errors, change:
 <abortOnConfigurationError>false</abortOnConfigurationError>  in solr.xml
 -
java.lang.RuntimeException: java.io.IOException: read past EOF at
org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1091) at
org.apache.solr.core.SolrCore.<init>(SolrCore.java:585) at
org.apache.solr.core.CoreContainer.create(CoreContainer.java:463) at
org.apache.solr.core.CoreContainer.load(CoreContainer.java:316) at
org.apache.solr.core.CoreContainer.load(CoreContainer.java:207) at
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:130)
at
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94) at
org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97) at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:140) at
org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499) at
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at
org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50) at
org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130) at
org.mortbay.jetty.Server.doStart(Server.java:224) at org

And CheckIndex fails with this exception-

Opening index @ ./index/

ERROR: could not read any segments file in directory
java.io.IOException: read past EOF
at
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:207)
at
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
at
org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:40)
at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:71)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:268)
at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:358)
at
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:753)
at
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:592)
at 

Re: MultiValued facet behavior question

2011-06-22 Thread Mike Sokolov
We always remove the facet filter when faceting: in other words, for a 
good user experience, you generally want to show facets based on the 
query excluding any restriction based on the facets.
So in your example (facet B selected), we would continue to show *all* 
facets.  Only if you performed a search using some other filter 
(proximity, gender, etc), would we restrict the facet list.


-Mike

On 06/22/2011 09:42 AM, Dennis de Boer wrote:

Well, the use case is rather simple. It is not so much a use case as a user
experience.

If I have a list of values I can facet on, for example :
A
B
C
D
E

And I click on B, does it make sense for the user to display
B
C
E

after the selection? Just because items in B are C and E items as well?
As a user I chose B because I'm interested in B items. I do not care if they
are also C and E items.
Technically this is correct, but functionally the user doesn't care
because it is not what they searched for.

In this case they were searching for a cardiologist. Do I care that a
cardiologist is also a family doctor? No. So I also do not want to see this
as a facet value presented to me in the frontend logic.
In the item details you can show that the cardiologist is also a family
doctor. That is fine, but not as an available facet option, if you just
chose a speciality you want to filter on.

Does it make sense?


On Wed, Jun 22, 2011 at 3:31 PM, lee carroll lee.a.carr...@googlemail.com wrote:

   

Hi Dennis,

I think maybe I just disagree. You're not showing facet counts for
cardiologists and Family Doctors independently. The Family Doctor
count will be all Family Doctors who are also Cardiologists.

This allows users to further filter Cardiologists who are also family
Doctors. (this could be of use to them ??)

If your front end app implements the filtering as a list of fq=xxx
then that would make for consistent results ?

I don't see how not showing that some cardiologists are also Family
Doctors is a better user experience... But again you might have a very
specific use case?

On 22 June 2011 13:44, Dennis de Boer datdeb...@gmail.com wrote:
 

Hi Lee,

since I have the same problem, I might as well try to answer this question.

You want this behaviour to make things clear for your users. If they select
cardiologists, does it make sense to also show family doctors as a
facet value to the user?
The same thing goes for the facets that are related to family doctors. They
are returned as well, thus making it even more unclear for the end-user.



On Wed, Jun 22, 2011 at 2:27 PM, lee carroll lee.a.carr...@googlemail.com wrote:

   

Hi Bill,

 

So that part works. Then when I output the facet, I need a different
behavior than the default. I need
the facet to only output the value that matches (scored) - NOT ALL VALUES
in the multiValued field.

I think it makes sense?
   

Why do you need this? If your use case is faceted navigation then not
showing all the facet terms which match your query would be misleading to your
users.
The fact is your data indicates Ben the cardiologist is also a GP etc.
Is it not valid for your users to be able to further filter on cardiologists
who are also specialists in x other disciplines? If the specialisms are mutually
exclusive then your data will reflect this.

The fact is x number of cardiologists match and x number of GPs match etc.

I may be missing the point here as you have not said why you need to do
this?

cheers lee c


On 22 June 2011 09:34, Michael Kuhlmann s...@kuli.org wrote:
 

Am 22.06.2011 09:49, schrieb Bill Bell:
   

You can type q=cardiology and match on cardiologist. If stemming did not
work you can just add a synonym:

cardiology,cardiologist

Okay, synonyms are the only realistic way I can think of to get a match.

Stemming won't work on a facet field; you wouldn't get "Cardiologist: 3"
as the result but "cardiolog: 3" or something like that instead.

Normally, you declare a facet field explicitly for faceting, and not
for searching, exactly because stemming and tokenizing on facet fields
don't make sense.
And the short answer is: No, that's not possible.

-Kuli

   
 
   
 
   


Re: rename a core to same name of existing core

2011-06-22 Thread Koji Sekiguchi

Stefan,

 I guess, same handling for other actions, like rename.

I agree. Thank you for the pointer!

koji

(11/06/22 23:16), Stefan Matheis wrote:

Koji,

the Description on http://wiki.apache.org/solr/CoreAdmin#CREATE is:

*quote*
If a core with the same name exists, while the new created core is
initalizing, the old one will continue to accept requests. Once it
has finished, all new request will go to the new core, and the old
core will be unloaded.
*/quote*

I guess, same handling for other actions, like rename.

Regards
Stefan

2011/6/21 Koji Sekiguchi k...@r.email.ne.jp:

I accidentally renamed a core to the same name as an existing core, e.g. using 
example-DIH:

http://localhost:8983/solr/admin/cores?action=RENAME&core=db&other=tika

I expected Solr to throw an exception, but it worked, and the existing core
(tika) is gone.

Is this a known bug (I couldn't find an open issue in JIRA) or intended 
behavior?

koji
--
http://www.rondhuit.com/en/






--
http://www.rondhuit.com/en/


response time for pdf indexing

2011-06-22 Thread libnova
Hi !

 

We are using Zend Search, which is based on Lucene. Our queries against indexed
PDF documents take longer than 2 seconds.

 

We want to switch to Solr to try to solve this problem.

i. Can anyone tell me the typical response time for queries on PDF documents in Solr?

ii. Can anyone suggest some strategies to reduce this response time?

Note: the PDF is not indexed directly. It is first converted to text
and then indexed together with some additional information we need.

 

Thank you.

 

---

Rode González

 




Re: Exception using Analyze from the Solr Admin app

2011-06-22 Thread karthik
Any help on this would be really appreciated.

I just set up a brand new Solr installation & still got this exception.
I can see that this has something to do with the classpath, but I am not able to
figure out exactly what is causing the issue.
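
One way to narrow down a classpath problem like this is to print where a class is actually loaded from. The sketch below uses only the JDK; JarLocator is a made-up helper name, and to be meaningful it has to run inside the webapp's classloader (for example from a throwaway JSP or servlet), which is an assumption about the setup. Separately, the trace shows the compiled analysis_jsp class calling TokenStream.next(), a method that no longer exists in Lucene 3.x, so a stale compiled JSP in Tomcat's work directory may also be worth ruling out.

```java
import java.security.CodeSource;

class JarLocator {
    // Returns the jar or directory a class was loaded from, if it can be determined.
    static String locate(String className) {
        try {
            // Load without initializing, so static initializers do not run.
            Class<?> c = Class.forName(className, false, JarLocator.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "(bootstrap or unknown)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not found: " + className;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace in this thread.
        System.out.println(locate("org.apache.lucene.analysis.TokenStream"));
    }
}
```

If the printed location is not the lucene-core jar under the webapp's WEB-INF/lib, an older Lucene jar elsewhere on the classpath is being picked up first.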

-- karthik

On Mon, Jun 13, 2011 at 4:23 PM, karthik kmoha...@gmail.com wrote:

 Hi Everyone,

 I am new to the Solr world and just started playing around with it. I had
 everything up & running and suddenly the Analyze functionality started
 throwing an exception when I tried using it. It was working a few days ago &
 suddenly it stopped working & started throwing this exception.

 This is a Solr 3.1 setup running on tomcat-7.0.11. The exception trace is:

 -
 Jun 13, 2011 4:04:19 PM org.apache.solr.common.SolrException log
 SEVERE: org.apache.jasper.JasperException: javax.servlet.ServletException:
 java.lang.NoSuchMethodError:
 org.apache.lucene.analysis.TokenStream.next()Lorg/apache/lucene/analysis/Token;
 at
 org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:534)
 at
 org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:442)
 at
 org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:391)
 at
 org.apache.jasper.servlet.JspServlet.service(JspServlet.java:334)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
 at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:304)
 at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
 at
 org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:684)
 at
 org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:471)
 at
 org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:402)
 at
 org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:329)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
 at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
 at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
 at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
 at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164)
 at
 org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:498)
 at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
 at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
 at
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:562)
 at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
 at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:394)
 at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:243)
 at
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:188)
 at
 org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:302)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
 Caused by: javax.servlet.ServletException: java.lang.NoSuchMethodError:
 org.apache.lucene.analysis.TokenStream.next()Lorg/apache/lucene/analysis/Token;
 at
 org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:911)
 at
 org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:840)
 at
 org.apache.jsp.admin.analysis_jsp._jspService(analysis_jsp.java:725)
 at
 org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
 at
 org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:419)
 ... 26 more
 Caused by: java.lang.NoSuchMethodError:
 org.apache.lucene.analysis.TokenStream.next()Lorg/apache/lucene/analysis/Token;
 at
 org.apache.jsp.admin.analysis_jsp.getTokens(analysis_jsp.java:118)
 at
 org.apache.jsp.admin.analysis_jsp._jspService(analysis_jsp.java:696)
 ... 29 more

 -

 I verified that the webapps/app/WEB-INF/lib has all the latest 3.1.0
 JAR files in them. Any pointers to fix this issue would be great.

 Thanks,
 Karthik





[Announce] Solr 3.2 with RankingAlgorithm

2011-06-22 Thread Nagendra Nagarajayya

Hi!

I would like to announce the availability of Solr 3.2 with 
RankingAlgorithm. Please download and give the new version a try. This 
version of RankingAlgorithm exposes a Lucene-compatible API, so almost 
all of the Solr features should work as is.


Note:
NRT support will be available by next week.

Sincerely,

- Nagendra Nagarajayya
http://solr-ra.tgels.com
http://rankingalgorithm.tgels.com


Re: ampersand, dismax, combining two fields, one of which is keywordTokenizer

2011-06-22 Thread Jonathan Rochkind

Yeah, I see your points. It's complicated. I'm not sure either.

But the thing is:

 in order to use a feature like that you'd have to really think hard about
 the query analysis of your fields, and which ones will produce which
 tokens in which situations

You need to think really hard about the (index and query) analysis of 
your fields and which ones will produce which tokens _now_, if you are 
using multiple fields in a 'qf' with differing analysis, and using a 
percent mm. (Or similarly an mm that varies depending on how many terms).


That's what I've come to realize: that's the status quo. If your qf 
fields don't all have identical analysis, right _now_ you need to think 
really hard about the analysis and how it might affect 
'mm', including for edge case queries.  If you don't, you likely have 
edge case queries (at least) which aren't behaving how you expected 
(whether or not you notice, or have it brought to your attention by users).


Or you can just make sure all fields in your qf have identical analysis, 
and then you don't have to worry about it. But that's not always 
practical, a lot of the power of dismax qf ends up being combining 
fields with different analysis.


So I was trying to think of a way to make this less so, but still be 
able to take advantage of dismax, but I think you're right that maybe 
there isn't any, or at least nothing we've come up with yet.


Maybe what I really need is a query parser that does not do disjunction 
maximum at all, but somehow still combines different 'qf' type fields 
with different boosts on each field. I personally don't _necessarily_ 
need the actual disjunction max calculation, but I do need combining 
of multiple fields with different boosts. Of course, I'm not sure exactly 
how it would combine multiple fields if not by disjunction maximum, but 
perhaps one is conceivable that wouldn't be subject to this particular 
gotcha with differing analysis.


I also remain kind of confused about how the existing dismax figures out 
how many terms for the 'mm' type calculations. If someone wanted to 
explain that,  I would find it enlightening and helpful for 
understanding what's going on.


Jonathan

On 6/21/2011 10:20 PM, Chris Hostetter wrote:

: not other) setups/intentions.  It's counter-intuitive to me that adding
: a field to the 'qf' set results in _fewer_ hits than the same 'qf' set

agreed .. but that's where looking at the debug info comes in: it shows
that the reason for that behavior is that your old qf treated part of your
input as garbage, while the new field respects it and uses it in the
calculation.

mind you: the fewer hits behavior only happens when using a percentage
value in mm ... if you had mm=2 you'd get more results, but you've asked
for 66% (or whatever) and with that new qf there is a different number
of clauses produced by query parsing.
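
To make the clause-count effect concrete, here is a minimal Python sketch. It assumes a positive percentage mm is applied to the number of optional clauses and rounded down; the function names, the two clause lists and the sample document are hypothetical stand-ins for illustration, not Solr code.

```python
def min_should_match(clause_count, percent):
    """Clauses that must match for a positive percentage mm.

    Assumption: the fractional part is dropped (rounded down).
    """
    return (clause_count * percent) // 100


# Hypothetical query "Foo & Bar": the old qf drops "&", the new qf keeps it.
OLD_QF_CLAUSES = ["Foo", "Bar"]
NEW_QF_CLAUSES = ["Foo", "&", "Bar"]

# Hypothetical document that contains Foo and Bar but no "&" token.
DOC_TERMS = {"Foo", "Bar"}


def is_hit(clauses, percent, doc_terms=DOC_TERMS):
    """True if the document matches at least the computed number of clauses."""
    matched = sum(1 for clause in clauses if clause in doc_terms)
    return matched >= min_should_match(len(clauses), percent)


for percent in (75, 100):
    print(percent, is_hit(OLD_QF_CLAUSES, percent), is_hit(NEW_QF_CLAUSES, percent))
```

With mm=100% the same document is a hit under the old qf (2 of 2 clauses) but not under the new one (2 of 3 clauses), which is the "fewer hits" effect described above. With a fixed mm=2 it would be a hit in both cases.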

: I wonder if it would be a good idea to have a parameter to (e)dismax
: that told it which of these two behaviors to use? The one where the
: 'term count' is based on the maximum number of terms from any field in
: the 'qf', and one where it's based on the minimum number of terms
: produced from any field in the qf?  I am still not sure how feasible

even in your use case, i don't think you are fully considering what that
would produce.  imagine that an mmType=min param existed and gave you what
you're asking for.  Now imagine that you have two fields, one named
simple that strips all punctuation and one named complex that doesn't,
and you have a query like this...

q=Foo & Bar
qf=simple complex
mm=100%
mmType=min

   * Foo produces tokens for all qf
   * & only produces tokens for some qf (complex)
   * Bar produces tokens for all qf

your mmType would say there are only 2 tokens that we can query across
all fields, so our computed minShouldMatch should be 100% of 2 == 2

sounds good so far right?

the problem is you still have a query clause coming from that &
character ... you have 3 real clauses, one of which is that term query for
complex:& which means that with your (computed) minShouldMatch of 2 you
would see matches for any doc that happened to have indexed the & symbol
in the complex field and also matched *either* of Foo or Bar (in either
field)

So while a lot of your results would match both Foo and Bar, you'd
still get a bunch of weird results.
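
The scenario can be modeled in a few lines of Python. This is only a sketch under stated assumptions: mmType=min is the hypothetical parameter under discussion (it does not exist in Solr), the per-field token lists are what each analyzer is assumed to produce, and a clause counts as matched when any qf field that produced the token also has it indexed.

```python
# Tokens each qf field's query analyzer is assumed to produce for "Foo & Bar".
QUERY_TOKENS = {
    "simple": ["Foo", "Bar"],        # strips punctuation
    "complex": ["Foo", "&", "Bar"],  # keeps punctuation
}


def clauses(query_tokens):
    """One optional clause per distinct token across all qf fields."""
    seen = []
    for tokens in query_tokens.values():
        for tok in tokens:
            if tok not in seen:
                seen.append(tok)
    return seen


def matched_clauses(doc, query_tokens):
    """Count clauses for which some field both produced and indexed the token."""
    count = 0
    for tok in clauses(query_tokens):
        if any(tok in toks and tok in doc.get(field, ())
               for field, toks in query_tokens.items()):
            count += 1
    return count


# Hypothetical mmType=min: 100% of the smallest per-field token count.
mm_min = min(len(toks) for toks in QUERY_TOKENS.values())

# Hypothetical documents: one has Foo and "&" but no Bar, one has Foo and Bar.
weird_doc = {"simple": ["Foo"], "complex": ["Foo", "&"]}
good_doc = {"simple": ["Foo", "Bar"], "complex": ["Foo", "Bar"]}

for name, doc in (("weird", weird_doc), ("good", good_doc)):
    print(name, matched_clauses(doc, QUERY_TOKENS) >= mm_min)
```

Both documents satisfy the computed minShouldMatch of 2, even though the first one never mentions Bar. That is the kind of weird result described above.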

: Or maybe a feature where you tell dismax, the number of tokens produced
: by field X, THAT's the one you should use for your 'term count' for mm,

Hmmm maybe.  i'd have to see a patch in action and play with it, to
really think it through ... hmmm ... honestly i really can't imagine how
that would be helpful in general...

in order to use a feature like that you'd have to really think hard about
the query analysis of your fields, and which ones will produce which
tokens in which situations in order to make sure you pick the *right*
value for that param -- but once you've done that hard 

Re: Exception using Analyze from the Solr Admin app

2011-06-22 Thread Stefan Matheis
Karthik,

could you attach/pastebin your schema and also the text you're trying
to analyze?

Regards
Stefan

On Wed, Jun 22, 2011 at 5:29 PM, karthik kmoha...@gmail.com wrote:
 any help on this would be really appreciated.

 i just setup a totally brand new setup of solr & still got this exception ..
 I can see that this would be something to do with classpath, but not able to
 figure out exactly what is causing this issue.

 -- karthik

 On Mon, Jun 13, 2011 at 4:23 PM, karthik kmoha...@gmail.com wrote:

 Hi Everyone,

 I am new to the Solr world and just started playing around with it. I had
 everything up & running and suddenly the Analyze functionality started
 throwing an exception when i tried using it. It was working a few days ago &
 suddenly it stopped working & started throwing this exception.

 This is a Solr 3.1 setup running on tomcat-7.0.11. The exception trace is:

 -
 Jun 13, 2011 4:04:19 PM org.apache.solr.common.SolrException log
 SEVERE: org.apache.jasper.JasperException: javax.servlet.ServletException:
 java.lang.NoSuchMethodError:
 org.apache.lucene.analysis.TokenStream.next()Lorg/apache/lucene/analysis/Token;
         at
 org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:534)
         at
 org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:442)
         at
 org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:391)
         at
 org.apache.jasper.servlet.JspServlet.service(JspServlet.java:334)
         at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
         at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:304)
         at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
         at
 org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:684)
         at
 org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:471)
         at
 org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:402)
         at
 org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:329)
         at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
         at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
         at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
         at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
         at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164)
         at
 org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:498)
         at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
         at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
         at
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:562)
         at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
         at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:394)
         at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:243)
         at
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:188)
         at
 org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:302)
         at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
         at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
         at java.lang.Thread.run(Thread.java:619)
 Caused by: javax.servlet.ServletException: java.lang.NoSuchMethodError:
 org.apache.lucene.analysis.TokenStream.next()Lorg/apache/lucene/analysis/Token;
         at
 org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:911)
         at
 org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:840)
         at
 org.apache.jsp.admin.analysis_jsp._jspService(analysis_jsp.java:725)
         at
 org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
         at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
         at
 org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:419)
         ... 26 more
 Caused by: java.lang.NoSuchMethodError:
 org.apache.lucene.analysis.TokenStream.next()Lorg/apache/lucene/analysis/Token;
         at
 org.apache.jsp.admin.analysis_jsp.getTokens(analysis_jsp.java:118)
         at
 org.apache.jsp.admin.analysis_jsp._jspService(analysis_jsp.java:696)
         ... 29 more

 -

 I verified that the webapps/app/WEB-INF/lib has all the latest 3.1.0
 JAR files in them. Any pointers to fix this issue would be great.

 Thanks,
 Karthik






Re: MultiValued facet behavior question

2011-06-22 Thread Darren Govoni
How is that different from doing a field search and just counting the 
results?

If you only want the facet of the searched term (input), then why not just
combine that with the result count and use that?

Facets are more useful when you _don't_ know the distribution of values
across a result set because they weren't included in the search criteria.

Maybe this needs a new name or handler than facet.

What am I missing?

On 06/22/2011 03:44 AM, Bill Bell wrote:

Here is an example using exampledocs and trunk 4.0:

http://localhost:8983/solr/select/?q=cat:%22hard%20drive%22&version=2.2&start=0&rows=10&indent=on&facet=true&facet.field=cat&facet.query={!lucene}cat:%22hard%20drive%22&facet.mincount=1

Results:

<result name="response" numFound="2" start="0">
Etc
<lst name="facet_queries">
<int name="{!lucene}cat:hard drive">2</int>
</lst>
<lst name="facet_fields">
<lst name="cat">
 <int name="electronics">2</int>
 <int name="hard drive">2</int>
</lst></lst>

Notice that the facet_queries count 2 is the same as the numFound=2.

But I have no way to use facet.field to count the matches.

The algorithm -

Loop through multiValued field and match on hard drive. Ignore other
values in there when setting the facet list
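
As a rough illustration of that algorithm, here is a client-side Python sketch. It assumes the documents have already been fetched, uses hypothetical sample data modeled on the doctor example in this thread, and treats "match" as a case-insensitive exact comparison; it is not how Solr computes facets internally.

```python
from collections import Counter

# Hypothetical documents modeled on the doctor example in this thread.
DOCS = [
    {"name": "Bell", "specialties": ["Cardiologist"]},
    {"name": "Smith", "specialties": ["Cardiologist", "Family Doctor"]},
    {"name": "Adams", "specialties": ["Cardiologist", "Family Doctor", "Internist"]},
]


def matching_facets(docs, field, term):
    """Count only the values of a multiValued field that equal the search term.

    Other values stored in the same field are ignored when building the list.
    """
    wanted = term.lower()
    counts = Counter()
    for doc in docs:
        for value in doc.get(field, []):
            if value.lower() == wanted:
                counts[value] += 1
    return dict(counts)


print(matching_facets(DOCS, "specialties", "cardiologist"))
print(matching_facets(DOCS, "specialties", "internist"))
```

Stemmed or synonym matches (cardiology vs. Cardiologist) would need a different comparison than the plain equality used here.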




On 6/22/11 1:19 AM, Dennis de Boer datdeb...@gmail.com wrote:


Hi Bill,

yes, you absolutely do make sense. I posted the exact same question to this
mailing list (subject: faceting on multivalued fields), but got no response
out of it. A friend of mine is now helping out.

I hope someone on the list can give us some advice. I'll post our findings
to this topic.

Regards,
Dennis


On Wed, Jun 22, 2011 at 5:37 AM, Bill Bell billnb...@gmail.com wrote:


Doing it with q=specialities:Cardiologist or
q=Cardiologist&defType=dismax&qf=specialties
does not matter, the issue is how I see facets. I want the facets to only
show the one match,
and not all the multiValued fields in specialties that match...

Example,

Name|specialties
Bell|Cardiologist
Smith|Cardiologist,Family Doctor
Adams|Cardiologist,Family Doctor,Internist

When I facet.field=specialties I get:

Cardiologist: 3
Internist: 1
Family Doctor: 2


I only want it to return:

Cardiologist: 3

Because this matches exactly... Facet on the field that matches and only
return the number for that.

It can get more complicated. Here is another example:

q=cardiology&defType=dismax&qf=specialties


(Cardiology and cardiologist are stems)...

But I don't really know which value in Cardiologist match perfectly.

Again, I only want it to return:

Cardiologist: 3

If I searched on q=internist&defType=dismax&qf=specialties, I want the
result to be:


Internist: 1


Does this all make sense?







On 6/21/11 8:23 PM, Darren Govoni dar...@ontrenet.com wrote:


So are you saying that for all results for cardiologist,
you don't want facets not matching Cardiologist to be
returned as facets?

what happens when you make q=specialities:Cardiologist?
instead of just q=Cardiologist?

Seems that if you make the query on the field, then all
your results will necessarily qualify and you can discard
any additional facets you don't want (e.g. that don't
match the initial query term).

Maybe you can write what you see now, with what you
want to help clarify.

On 06/21/2011 09:47 PM, Bill Bell wrote:

I have a field: specialties that is multiValued.

It indicates the doctor's specialties: cardiologist, internist, etc.

When someone does a search: Cardiologist, I use

q=cardiologist&defType=dismax&qf=specialties&facet=true&facet.field=specialties

What I want to come out in the facet is the Cardiologist (since it matches
exactly) and the number that matches: 700.
I don't want to see the other values that are not Cardiologist.

Now I see:

Cardiologist: 700
Internist: 45
Family Doctor: 20

This means that several Cardiologists are also internists and family
doctors. When it matches exactly, I don't want to see Internists, Family
Doctors. How do I send a query to Solr with a condition?
Facet.query=specialties:Cardiologist&facet.field=specialties

Then if the query returns something use it, otherwise use the field one?

Other ideas?













Re: size of synonyms.txt

2011-06-22 Thread Robert Muir
On Wed, Jun 22, 2011 at 10:14 AM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de wrote:
 While trying some synonyms.txt files I noticed a huge increase of heap
 usage.

 synonyms_1.txt -- 6645 lines (2826104 bytes in size)
 results in 66364 entries in SynonymMap with 730MB heap usage.
 Startup time about 2 minutes.

 synonyms_2.txt -- 6645 lines (5384884 bytes in size)
 results in 115168 entries in SynonymMap with 3.3GB heap usage.
 Startup time about 4 minutes.


 What is your size of synonyms.txt?


 Any limitations (e.g. file size, number of synonyms, ...)?


 How to deal with _really_ large numbers of synonyms?


 To the experts:
 Why not using synonyms from a file, just because memory is faster?


Hi,

I think we should look at implementing synonyms with an FST, to reduce
the ram usage.
I also think this would make it easier for us to minimize the number
of captureState/restoreState that it does,
because it would just be a more natural way to handle all the
multi-word cases... this could actually speed up the analysis time for
this filter.


Re: size of synonyms.txt

2011-06-22 Thread Darren Govoni
I once tried to load wordnet synsets as a synonym file and it was 
prohibitively slow and unusable. fyi.


On 06/22/2011 12:23 PM, Robert Muir wrote:

On Wed, Jun 22, 2011 at 10:14 AM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de  wrote:

While trying some synonyms.txt files I noticed a huge increase of heap
usage.

synonyms_1.txt --  6645 lines (2826104 bytes in size)
results in 66364 entries in SynonymMap with 730MB heap usage.
Startup time about 2 minutes.

synonyms_2.txt --  6645 lines (5384884 bytes in size)
results in 115168 entries in SynonymMap with 3.3GB heap usage.
Startup time about 4 minutes.


What is your size of synonyms.txt?


Any limitations (e.g. file size, number of synonyms, ...)?


How to deal with _really_ large numbers of synonyms?


To the experts:
Why not using synonyms from a file, just because memory is faster?


Hi,

I think we should look at implementing synonyms with an FST, to reduce
the ram usage.
I also think this would make it easier for us to minimize the number
of captureState/restoreState that it does,
because it would just be a more natural way to handle all the
multi-word cases... this could actually speed up the analysis time for
this filter.




Re: MultiValued facet behavior question

2011-06-22 Thread Jonathan Rochkind
Okay, so since you put cardiologist in the 'q', you only want facet 
values that have 'cardiologist' (or 'Cardiologist') to show up in the 
facet list.


In general, there's no good way to do that.

But.

If you want to do some client-side processing before you submit the 
query to Solr, and on the client side you can figure out exactly what 
you want: then you could try to play around with facet.filter or 
facet.query, to see if you can make it do what you want. It may or may 
not work out, depending on exactly your use pattern, which you still 
haven't articulated very well, but you can mess around with it and see 
what you can do.


Ie, if you KNOW (that is, your own app code knows, when creating the 
Solr request) that you only want the facet value for Cardiologist 
(including exact case), you can try facet.query=specialty:Cardiologist


Your app code would also have to pull out the results specially; they won't 
be in the Solr response in the same way ordinary facet.field results are. It 
also requires your query value to match _exactly_ (case, punctuation, etc) 
the value in the index: cardiologist and Cardiologist are not the same.


I think Solr 3.1 has some regex based facet.filter abilities that might 
be useful, and help you get around the 'exact match' issues, but watch 
out for performance.
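
A sketch of the facet.query approach from the client side, in Python. It only builds the request parameters and reads a count out of an already parsed JSON response; the helper names are hypothetical, the sample response is hand-written for illustration, and the field name specialties is taken from the original question.

```python
from urllib.parse import urlencode


def build_params(user_query, field, exact_value):
    """Ask Solr for a single facet.query count instead of a full facet.field list."""
    facet_query = f"{field}:{exact_value}"
    params = [
        ("q", user_query),
        ("defType", "dismax"),
        ("qf", field),
        ("facet", "true"),
        ("facet.query", facet_query),
        ("wt", "json"),
    ]
    return facet_query, urlencode(params)


def facet_query_count(response, facet_query):
    """facet.query counts sit under facet_counts.facet_queries, keyed by the query text."""
    return response["facet_counts"]["facet_queries"].get(facet_query)


facet_query, query_string = build_params("cardiologist", "specialties", "Cardiologist")
print(query_string)

# Hand-written stand-in for a parsed JSON response; the count is the one
# quoted in the original question, not a measured result.
sample_response = {
    "facet_counts": {
        "facet_queries": {"specialties:Cardiologist": 700},
        "facet_fields": {},
    }
}
print(facet_query_count(sample_response, facet_query))
```

A value containing spaces, such as Family Doctor, would need to be sent as a quoted phrase in the facet.query, which this sketch does not handle.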





On 6/21/2011 11:37 PM, Bill Bell wrote:

Doing it with q=specialities:Cardiologist or
q=Cardiologist&defType=dismax&qf=specialties
does not matter, the issue is how I see facets. I want the facets to only
show the one match,
and not all the multiValued fields in specialties that match...

Example,

Name|specialties
Bell|Cardiologist
Smith|Cardiologist,Family Doctor
Adams|Cardiologist,Family Doctor,Internist

When I facet.field=specialties I get:

Cardiologist: 3
Internist: 1
Family Doctor: 2


I only want it to return:

Cardiologist: 3

Because this matches exactly... Facet on the field that matches and only
return the number for that.

It can get more complicated. Here is another example:

q=cardiology&defType=dismax&qf=specialties


(Cardiology and cardiologist are stems)...

But I don't really know which value in Cardiologist match perfectly.

Again, I only want it to return:

Cardiologist: 3

If I searched on q=internist&defType=dismax&qf=specialties, I want the
result to be:


Internist: 1


Does this all make sense?







On 6/21/11 8:23 PM, Darren Govoni dar...@ontrenet.com wrote:


So are you saying that for all results for cardiologist,
you don't want facets not matching Cardiologist to be
returned as facets?

what happens when you make q=specialities:Cardiologist?
instead of just q=Cardiologist?

Seems that if you make the query on the field, then all
your results will necessarily qualify and you can discard
any additional facets you don't want (e.g. that don't
match the initial query term).

Maybe you can write what you see now, with what you
want to help clarify.

On 06/21/2011 09:47 PM, Bill Bell wrote:

I have a field: specialties that is multiValued.

It indicates the doctor's specialties: cardiologist, internist, etc.

When someone does a search: Cardiologist, I use

q=cardiologist&defType=dismax&qf=specialties&facet=true&facet.field=specialties

What I want to come out in the facet is the Cardiologist (since it matches
exactly) and the number that matches: 700.
I don't want to see the other values that are not Cardiologist.

Now I see:

Cardiologist: 700
Internist: 45
Family Doctor: 20

This means that several Cardiologists are also internists and family
doctors. When it matches exactly, I don't want to see Internists, Family
Doctors. How do I send a query to Solr with a condition?
Facet.query=specialties:Cardiologist&facet.field=specialties

Then if the query returns something use it, otherwise use the field one?

Other ideas?









Re: size of synonyms.txt

2011-06-22 Thread Bernd Fehling

 On Wed, Jun 22, 2011 at 10:14 AM, Bernd Fehling
 bernd.fehl...@uni-bielefeld.de wrote:
  While trying some synonyms.txt files I noticed a huge increase of heap
  usage.
 
  synonyms_1.txt -- 6645 lines (2826104 bytes in size)
  results in 66364 entries in SynonymMap with 730MB heap usage.
  Startup time about 2 minutes.
 
  synonyms_2.txt -- 6645 lines (5384884 bytes in size)
  results in 115168 entries in SynonymMap with 3.3GB heap usage.
  Startup time about 4 minutes.
 
 
  What is your size of synonyms.txt?
 
 
  Any limitations (e.g. file size, number of synonyms, ...)?
 
 
  How to deal with _really_ large numbers of synonyms?
 
 
  To the experts:
  Why not using synonyms from a file, just because memory is faster?
 
 
 Hi,
 
 I think we should look at implementing synonyms with an FST, to reduce
 the ram usage.
 I also think this would make it easier for us to minimize the number
 of captureState/restoreState that it does,
 because it would just be a more natural way to handle all the
 multi-word cases... this could actually speed up the analysis time for
 this filter.

Wow you can read between the lines ;-)
Exactly what I have on my mind.


RE: response time for pdf indexing

2011-06-22 Thread Steven A Rowe
Hi Rode,

Have you seen http://wiki.apache.org/solr/SolrPerformanceFactors ?

Steve

 -Original Message-
 From: Rode González (libnova) [mailto:r...@libnova.es]
 Sent: Wednesday, June 22, 2011 11:30 AM
 To: solr-user@lucene.apache.org
 Cc: dan...@silvereme.com; Gonzalo Iglesias; Leo; Marcos; Mario Crespo
 (Silvereme); 'Rode'
 Subject: response time for pdf indexing
 
 Hi !
 
 
 
 We are using Zend Search based on Lucene. Our queries against indexed PDFs
 take longer than 2 seconds.
 
 
 
 We want to change to solr to try to solve this problem.
 
 i. Can anyone tell me the response time for queries on pdf documents on
 solr?
 
 
 ii. Can anyone tell me some strategies to reduce this response time?
 
 
 
 Note: the pdf is not indexed in a simple way. The pdf is first converted to
 text and then indexed with some additional information needed.
 
 
 
 Thank you.
 
 
 
 ---
 
 Rode González
 
 
 



Re: Exception using Analyze from the Solr Admin app

2011-06-22 Thread karthik
Thanks for offering to help Stefan.

I just resolved the issue. It was some crazy thing within Tomcat (I still
need to find out what it was). I just backed up my old tomcat installation &
just created a new instance of tomcat & deployed my solr installation in
there & everything started working fine as it was before.

I will compare the 2 tomcat folders to see what was different and respond
back with my findings.
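
One way to do that comparison is with Python's standard filecmp module. This is a minimal sketch, not a diagnosis: the function name and labels are made up for illustration, and the thread does not confirm which files actually differed. Since the failing frame is in a compiled JSP class, a stale copy under Tomcat's work directory is one plausible thing to look for.

```python
import filecmp


def diff_trees(old_dir, new_dir, prefix=""):
    """Yield (kind, relative_path) for entries that differ between two trees.

    Note: dircmp compares files shallowly by default (size and modification
    time first), so content is only read when those signatures differ.
    """
    cmp = filecmp.dircmp(old_dir, new_dir)
    for name in cmp.left_only:
        yield ("only_in_old", prefix + name)
    for name in cmp.right_only:
        yield ("only_in_new", prefix + name)
    for name in cmp.diff_files:
        yield ("differs", prefix + name)
    # Recurse into directories that exist on both sides.
    for name, sub in cmp.subdirs.items():
        yield from diff_trees(sub.left, sub.right, prefix + name + "/")


# Usage (paths are placeholders for the two Tomcat installations):
#   for kind, path in sorted(diff_trees("/path/to/old-tomcat", "/path/to/new-tomcat")):
#       print(kind, path)
```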

-- karthik

On Wed, Jun 22, 2011 at 11:48 AM, Stefan Matheis 
matheis.ste...@googlemail.com wrote:

 Karthik,

 could you attach/pastebin your schema and also the text you're trying
 to analyze?

 Regards
 Stefan

 On Wed, Jun 22, 2011 at 5:29 PM, karthik kmoha...@gmail.com wrote:
  any help on this would be really appreciated.
 
  i just setup a totally brand new setup of solr & still got this exception ..
  I can see that this would be something to do with classpath, but not able to
  figure out exactly what is causing this issue.
 
  -- karthik
 
  On Mon, Jun 13, 2011 at 4:23 PM, karthik kmoha...@gmail.com wrote:
 
  Hi Everyone,
 
  I am new to the Solr world and just started playing around with it. I had
  everything up & running and suddenly the Analyze functionality started
  throwing an exception when i tried using it. It was working a few days ago &
  suddenly it stopped working & started throwing this exception.
 
  This is a Solr 3.1 setup running on tomcat-7.0.11. The exception trace is:
 
  -
  Jun 13, 2011 4:04:19 PM org.apache.solr.common.SolrException log
  SEVERE: org.apache.jasper.JasperException:
 javax.servlet.ServletException:
  java.lang.NoSuchMethodError:
 
 org.apache.lucene.analysis.TokenStream.next()Lorg/apache/lucene/analysis/Token;
  at
 
 org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:534)
  at
 
 org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:442)
  at
  org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:391)
  at
  org.apache.jasper.servlet.JspServlet.service(JspServlet.java:334)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
  at
 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:304)
  at
 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  at
 
 org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:684)
  at
 
 org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:471)
  at
 
 org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:402)
  at
 
 org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:329)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:275)
  at
 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
  at
 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  at
 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
  at
 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:164)
  at
 
 org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:498)
  at
 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
  at
 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
  at
 
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:562)
  at
 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
  at
 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:394)
  at
 
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:243)
  at
 
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:188)
  at
 
 org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:302)
  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:619)
  Caused by: javax.servlet.ServletException: java.lang.NoSuchMethodError:
 
 org.apache.lucene.analysis.TokenStream.next()Lorg/apache/lucene/analysis/Token;
  at
 
 org.apache.jasper.runtime.PageContextImpl.doHandlePageException(PageContextImpl.java:911)
  at
 
 org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:840)
  at
  org.apache.jsp.admin.analysis_jsp._jspService(analysis_jsp.java:725)
  at
  org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
  at 

Re: MultiValued facet behavior question

2011-06-22 Thread Darren Govoni

Yeah, I agree with that last statement.

It seems to me that the use case where it _might_ matter is where
you have a query for MORE than one.

q=cardiologist OR family

and in that case, it MIGHT be useful to separate the facets
in a XOR sense where you don't get cross-pollution. But
the original poster didn't indicate this scenario originally.

Maybe for that, Solr's grouping mechanism will help?
Although I have not used it myself.

On 06/22/2011 09:31 AM, lee carroll wrote:

Hi Dennis,

I think maybe I just disagree. You're not showing facet counts for
cardiologists and Family Doctors independently. The Family Doctor
count will be all Family Doctors who are also Cardiologists.

This allows users to further filter Cardiologists who are also family
Doctors. (this could be of use to them ??)

If your front end app implements the filtering as a list of fq=xxx
then that would make for consistent results ?
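
For illustration, a front end could build such a request as below. This is a Python sketch under assumptions: the helper name is hypothetical, the field name specialties comes from the original question, and values are wrapped in double quotes so multi-word values stay one phrase (values that themselves contain quotes would need escaping, which is not handled here).

```python
from urllib.parse import urlencode


def select_query(q, filters, facet_field):
    """Build a query string where each selected facet value is its own fq parameter."""
    params = [("q", q), ("facet", "true"), ("facet.field", facet_field)]
    params += [("fq", f'{field}:"{value}"') for field, value in filters]
    return urlencode(params)


# The user searched for everything, then clicked two facet values in turn.
selected = [("specialties", "Cardiologist"), ("specialties", "Family Doctor")]
query_string = select_query("*:*", selected, "specialties")
print(query_string)
```

Because every fq narrows the result set further, the facet counts returned alongside it describe only the documents that passed all selected filters.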

I don't see how not showing that some cardiologists are also Family
Doctors is a better user experience... But again you might have a very
specific use case?

On 22 June 2011 13:44, Dennis de Boer datdeb...@gmail.com wrote:

Hi Lee,

since I have the same problem, I might as well try to answer this question.

You want this behaviour to make things clear for your users. If they select
cardiologists, does it make sense to also show family doctors as a
facet value to the user?
The same thing goes for the facets that are related to family doctors. They
are returned as well, thus making it even more unclear for the end-user.



On Wed, Jun 22, 2011 at 2:27 PM, lee carroll
lee.a.carr...@googlemail.com wrote:


Hi Bill,


So that part works. Then when I output the facet, I need a different
behavior than the default. I need
The facet to only output the value that matches (scored) - NOT ALL VALUES
in the multiValued field.
I think it makes sense?

Why do you need this? If your use case is faceted navigation then not
showing all the facet terms which match your query would be misleading to
your users.
The fact is your data indicates Ben the cardiologist is also a GP etc.
Is it not valid for your users to be able to further filter on
cardiologists who are also specialists in x other disciplines? If the
specialisms are mutually exclusive then your data will reflect this.

The fact is x number of cardiologists match and x number of GP's match etc

I may be missing the point here as you have not said why you need to do
this ?

cheers lee c


On 22 June 2011 09:34, Michael Kuhlmann s...@kuli.org wrote:

On 22.06.2011 09:49, Bill Bell wrote:

You can type q=cardiology and match on cardiologist. If stemming did not
work you can just add a synonym:

cardiology,cardiologist

Okay, synonyms are the only way I can think of a realistic match.

Stemming won't work on a facet field; you wouldn't get Cardiologist: 3
as the result but cardiolog: 3 or something like that instead.

Normally, you declare a facet field explicitly for faceting, and not
for searching, exactly because stemming and tokenizing on facet fields
don't make sense.

And the short answer is: No, that's not possible.

-Kuli





Re: MultiValued facet behavior question

2011-06-22 Thread Gino Rodrigues
An interesting live scenario for this matter:
http://www.bondfaro.com.br/  (brazilian site)

The query ipad returns results spread across many categories (links
on the left, teasers in the center). The Tablet category (facet) is
one of them.

The query tablet does exactly the same as clicking Tablet in the
search for ipad. Note the breadcrumb
(Início > Informática > Tablet > tablet)

In this case of a broad term, that exactly matches a product facet, it
totally makes sense for the user. In general, it tends to make more
sense as the search shifts from full-text to structured metadata.

So, is it possible to turn q=cardiologist into
q=specialities:Cardiologist by boosting an exact match on a facet
label?
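
Short of a boost inside Solr, one client-side approximation is to rewrite the query before sending it. The Python sketch below is exactly that and nothing more: the function name and label list are hypothetical, the labels are assumed to be known to the application in advance, and this is a plain rewrite rather than a boost.

```python
def rewrite_query(user_query, facet_labels, field="specialties"):
    """Return a fielded query when the input exactly matches a known facet label.

    The comparison ignores case and surrounding whitespace; any other input
    is returned unchanged so it still runs as a normal full-text query.
    """
    by_lower = {label.lower(): label for label in facet_labels}
    label = by_lower.get(user_query.strip().lower())
    if label is None:
        return user_query
    return f'{field}:"{label}"'


# Hypothetical labels, e.g. cached from an earlier facet.field request.
LABELS = ["Cardiologist", "Internist", "Family Doctor"]

print(rewrite_query("cardiologist", LABELS))
print(rewrite_query("heart doctor", LABELS))
```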


Re: SEVERE: java.lang.NoSuchFieldError: core Solr branch3.x

2011-06-22 Thread Yonik Seeley
I just tried branch_3x and couldn't reproduce this.
Looks like maybe there is something wrong with your build, or some old
class files left over somewhere being picked up.

-Yonik
http://www.lucidimagination.com



On Wed, Jun 22, 2011 at 10:15 AM, Markus Jelsma
markus.jel...@openindex.io wrote:

 Hi,

 Today's checkout (Solr Specification Version: 3.4.0.2011.06.22.16.10.08)
 produces the exception below on start up. The same exception with a very similar
 stack trace occurs when committing and adding. Example schema and docs will
 reproduce the error.

 Jun 22, 2011 4:11:57 PM org.apache.solr.common.SolrException log
 SEVERE: java.lang.NoSuchFieldError: core
        at org.apache.lucene.index.SegmentTermDocs.<init>(SegmentTermDocs.java:48)
        at org.apache.lucene.index.SegmentReader.termDocs(SegmentReader.java:491)
        at org.apache.lucene.index.IndexReader.termDocs(IndexReader.java:1005)
        at org.apache.lucene.index.SegmentReader.termDocs(SegmentReader.java:484)
        at org.apache.solr.search.SolrIndexReader.termDocs(SolrIndexReader.java:321)
        at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:101)
        at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:298)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:524)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:320)
        at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1178)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1066)
        at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:358)
        at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:258)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
        at org.apache.solr.core.QuerySenderListener.newSearcher(QuerySenderListener.java:54)
        at org.apache.solr.core.SolrCore$3.call(SolrCore.java:1177)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)



 --
 Markus Jelsma - CTO - Openindex
 http://www.linkedin.com/in/markus17
 050-8536620 / 06-50258350



sorting by termfreq on trunk doesn't work?

2011-06-22 Thread Jason Toy
I am trying to use sorting by the termfreq function using the trunk code
since termfreq was added in the 4.0 code base.
I run this query:
http://127.0.0.1:8983/solr/select/?q=librarian&sort=termfreq(all_lists_text,librarian)%20desc

but I get:

HTTP ERROR 500

Problem accessing /solr/select/. Reason:

null

java.lang.NullPointerException
at org.apache.solr.search.function.TermFreqValueSource$1.reset(TermFreqValueSource.java:53)
at org.apache.solr.search.function.TermFreqValueSource$1.<init>(TermFreqValueSource.java:49)
at org.apache.solr.search.function.TermFreqValueSource.getValues(TermFreqValueSource.java:44)
at org.apache.solr.search.function.ValueSource$ValueSourceComparator.setNextReader(ValueSource.java:188)
at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:97)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:544)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:313)
at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1190)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1078)
at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:346)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:400)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:231)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1308)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)





Is termfreq stable and how can I run this query?


-- 
- sent from my mobile
6176064373
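
As an aside on the request itself: a function query in the sort parameter contains parentheses, a comma, and a space, so it can help to percent-encode each parameter rather than write the URL by hand. The following is a minimal sketch using only Python's standard library. The host, field name, and term come from the message above; the helper name build_termfreq_sort_url is hypothetical and not part of Solr. It only builds the URL and does not work around the exception.

```python
from urllib.parse import urlencode, quote

# Select handler URL taken from the message above.
SOLR_SELECT = "http://127.0.0.1:8983/solr/select/"


def build_termfreq_sort_url(term, field, order="desc"):
    """Build a select URL that sorts by termfreq(field,term).

    Hypothetical helper for illustration only; not part of Solr.
    """
    params = {
        "q": term,
        "sort": f"termfreq({field},{term}) {order}",
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return SOLR_SELECT + "?" + urlencode(params, quote_via=quote)


url = build_termfreq_sort_url("librarian", "all_lists_text")
print(url)
```

The parentheses and comma come out percent-encoded, which a servlet container should decode back to the same sort expression.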


Re: sorting by termfreq on trunk doesn't work?

2011-06-22 Thread Yonik Seeley
Thanks for the problem report.  It turns out we didn't check for a
null pointer when there were no terms in a field for a segment.
I've just committed a fix to trunk.

-Yonik
http://www.lucidimagination.com
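
The kind of guard described above can be illustrated with a small sketch. This is not the committed patch and does not use Lucene's API: it assumes a toy model in which each segment is a dict mapping a field name to its terms, and a field with no terms in a segment is simply missing. The function name term_freq and the data layout are assumptions made for illustration.

```python
def term_freq(segment_terms, field, term, doc_id):
    """Return how often `term` occurs in `field` of `doc_id` in one segment.

    `segment_terms` maps field name -> {term -> {doc_id -> frequency}}.
    A field with no terms in this segment is absent from the mapping.
    """
    terms = segment_terms.get(field)
    if terms is None:
        # The guard: the field has no terms in this segment, so every
        # document gets frequency 0 instead of triggering an error.
        return 0
    postings = terms.get(term)
    if postings is None:
        # The field has terms, but not this one.
        return 0
    return postings.get(doc_id, 0)


segment_with_terms = {"all_lists_text": {"librarian": {0: 3, 2: 1}}}
segment_without_terms = {}  # the field has no terms in this segment

print(term_freq(segment_with_terms, "all_lists_text", "librarian", 0))
print(term_freq(segment_without_terms, "all_lists_text", "librarian", 0))
```

Without the first check, the lookup on a missing field would fail for the empty segment, which is the analogue of the NullPointerException in the report.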



On Wed, Jun 22, 2011 at 10:28 PM, Jason Toy jason...@gmail.com wrote:
 I am trying to use sorting by the termfreq function using the trunk code
 since termfreq was added in the 4.0 code base.
 I run this query:
 http://127.0.0.1:8983/solr/select/?q=librarian&sort=termfreq(all_lists_text,librarian)%20desc

 but I get:

 HTTP ERROR 500

 Problem accessing /solr/select/. Reason:

    null

 java.lang.NullPointerException
        at org.apache.solr.search.function.TermFreqValueSource$1.reset(TermFreqValueSource.java:53)
        at org.apache.solr.search.function.TermFreqValueSource$1.<init>(TermFreqValueSource.java:49)
        at org.apache.solr.search.function.TermFreqValueSource.getValues(TermFreqValueSource.java:44)
        at org.apache.solr.search.function.ValueSource$ValueSourceComparator.setNextReader(ValueSource.java:188)
        at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:97)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:544)
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:313)
        at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1190)
        at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1078)
        at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:346)
        at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:400)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:231)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1308)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)





 Is termfreq stable and how can I run this query?


 --
 - sent from my mobile
 6176064373



Re: Search is taking long-long time.

2011-06-22 Thread pravesh
Were your searches always slow, or only since you made some changes at the
index/config/schema level?
Is it due to the 5-minute index updates? Are you warming your searchers?

Thanx
Pravesh

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-is-taking-long-long-time-tp3095306p3098552.html
Sent from the Solr - User mailing list archive at Nabble.com.


Query time noun, verb boosting

2011-06-22 Thread Pooja Verlani
Hi,

At query time, I want to build the Lucene query so that it boosts
only the nouns in the query, or some concept that exists in the index. Is
there any way to do this, or any ideas for a workaround?


Regards,
Pooja


Re: Query time noun, verb boosting

2011-06-22 Thread Anshum
What do you mean by 'noun or some concept'? It would be better if you could
give a concrete example.
For detecting parts of speech, there are many libraries you could use, but I
didn't understand the part about boosting terms from the index.


--
Anshum Gupta
http://ai-cafe.blogspot.com


On Thu, Jun 23, 2011 at 11:02 AM, Pooja Verlani pooja.verl...@gmail.com wrote:

 Hi,

 At the query time, I want to make the lucene query such that it should
 boost
 only the noun from the query or some concept existing in the index. Are
 there any possibilities or any possible ideas that can be worked around?


 Regards,
 Pooja



Re: Query time noun, verb boosting

2011-06-22 Thread Pooja Verlani
Hi,

Say, for example, the query is "manmohan singh dancing". I want a compulsory
match on the nouns, while the verb is not important to me: I would rather get
results for manmohan singh than for dancing. If I can extract the nouns and
verbs, or can find out that my index contains the concept (or at least the
identity) manmohan singh, I would like to define rules that do a strict
(compulsory) match on the noun (concept) and a loose match (non-compulsory
boosting) on the verb.

Basically, I want to avoid getting zero results from a compulsory match on all
3 tokens of the query (in this case manmohan singh dancing). Instead, I want a
compulsory match on manmohan singh, since that exists in my index, and dancing
should not be a compulsory match, so that I get a non-zero number of results.

Hope this explains.
Any suggestions?

Regards,
Pooja
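
One way to express the rule described above is with the classic Lucene query parser syntax, where a leading + marks a clause as required and an unprefixed clause is optional when the default operator is OR. The sketch below assumes the tokens have already been tagged by some part-of-speech tagger or concept lookup, which is not shown. The function name build_query, the tag labels, and the boost value are hypothetical, and tokens are assumed to contain no query-syntax special characters.

```python
def build_query(tagged_tokens, optional_boost=None):
    """Turn (token, tag) pairs into a Lucene query string.

    Tokens tagged "NOUN" become required clauses (+token); every other
    token stays optional, so it affects ranking but not matching.
    """
    clauses = []
    for token, tag in tagged_tokens:
        if tag == "NOUN":
            clauses.append(f"+{token}")
        elif optional_boost is not None:
            # Optional clause with an explicit boost, e.g. dancing^0.5
            clauses.append(f"{token}^{optional_boost}")
        else:
            clauses.append(token)
    return " ".join(clauses)


# Tags are supplied by hand here; a real tagger would produce them.
tagged = [("manmohan", "NOUN"), ("singh", "NOUN"), ("dancing", "VERB")]
query = build_query(tagged)
print(query)
```

With this query, documents must contain manmohan and singh, while dancing only raises the score of documents that also contain it, so the result set is not emptied by the verb.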


On Thu, Jun 23, 2011 at 11:07 AM, Anshum ansh...@gmail.com wrote:

 What would you mean by 'noun or some concept'. Would be better if you could
 give a rather concrete example.
 About detecting parts of speech, you could use a lot of libraries but I
 didn't get about boosting terms from the Index.


 --
 Anshum Gupta
 http://ai-cafe.blogspot.com


 On Thu, Jun 23, 2011 at 11:02 AM, Pooja Verlani pooja.verl...@gmail.com
 wrote:

  Hi,
 
  At the query time, I want to make the lucene query such that it should
  boost
  only the noun from the query or some concept existing in the index. Are
  there any possibilities or any possible ideas that can be worked around?
 
 
  Regards,
  Pooja