Search Query (Should I use fq)

2011-12-30 Thread reeuv
I am looking to write a query in which a user will enter two conditions i.e. 
  Search for description:text where
category:someCategory

So what's the best way to query it?

1. q = (description:text) AND (category:someCategory)

or

2. q = (description:text) AND (fq=category:someCategory)

Or is there a better way than the ones written above?

Thanks
Rahul

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-Query-Should-I-use-fq-tp3620521p3620521.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Search Query (Should I use fq)

2011-12-30 Thread Ahmet Arslan
 I am looking to write a query in
 which a user will enter two conditions i.e. 
                
   Search for description:text where
 category:someCategory
 
 So what's the best way to query it?
 
 1. q = (description:text) AND (category:someCategory)
 
 or
 
 2. q = (description:text) AND (fq=category:someCategory)
 
 Or is there a better way than the ones written above?

Your second example is actually invalid. The correct syntax is:
q=description:text&fq=category:someCategory

It seems that fq is more appropriate for category:someCategory, if that query 
would possibly be issued again.  Filter queries are cached. 
http://wiki.apache.org/solr/CommonQueryParameters#fq
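Ahmet's recommendation boils down to two separate request parameters rather than one boolean query. A minimal sketch in Python (q and fq are real Solr parameters; the field names are just the ones from this thread, and the base URL you append this to is whatever your Solr core's select handler is):

```python
from urllib.parse import urlencode

# The main query (q) matches and scores; the filter query (fq) restricts
# the result set without affecting scores, and its doc set is cached
# independently in Solr's filterCache.
params = {
    "q": "description:text",
    "fq": "category:someCategory",
}
query_string = urlencode(params)
print(query_string)  # q=description%3Atext&fq=category%3AsomeCategory
```

Appended to e.g. `http://localhost:8983/solr/select?`, every later request reusing the same fq string hits the filter cache instead of re-running that clause.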


Re: 3.5 QueryResponseWriter

2011-12-30 Thread Aleksander Akerø

Den 30.12.2011 06:03, skrev Chris Hostetter:

: Looks like you've experienced the issue described with fixes here:
:http://www.lucidimagination.com/search/document/48b9e75fe68be4b7

but specifically, since you've already copied the jar file in question,
and are now getting a class not found for the *baseclass*, it suggests you
have a different problem

:  What I have done thus far is basically just to copy the /example/solr
: folder, install the webapp .war file in a tomcat instance and start up.
:    At first it complained about the VelocityResponseWriter, so I created
: a /lib folder in /$SOLR_HOME and added the velocity jar from dist. That
: seemed to take care of the VRW error. But now I get a
: NoClassDefFoundError which says something about QueryResponseWriter. So

...that suggests that it is loading VRW at a higher (or lower, depending on
how you look at it) classloader than where it loads the rest of the solr
jars.

if you are using the example solr setup, then it sounds like you copied
the jar to example/lib (which is where the jetty jars live) instead of
example/solr/lib (which would be a new lib folder in the $SOLR_HOME dir).

unfortunately, people frequently get these confused, which is one of the
reasons I have started encouraging people to just use the <lib/>
declarations in their solrconfig.xml file instead of making a single lib
dir in $SOLR_HOME.  (but either way, you'll need to remove the copy of the
VRW jar you've got loading in the system classpath before either approach
will work)



-Hoss

Well, what I did was to create a lib directory within $SOLR_HOME ( 
$SOLR_HOME/lib ), and that is where I put the VRW jar found in the dist 
folder. Then what I did to the solrconfig was basically to uncomment all 
of the <lib .../> statements and use <lib dir="../lib" />. The solrconfig is 
placed as normal in $SOLR_HOME/conf. Is it wrong to do so?


To me this QueryResponseWriter thing doesn't necessarily have anything 
to do with VRW. Are all of the libraries from the /dist and /contrib 
folders necessary for startup? Because my routine in setting up solr is 
to only copy the /solr folder from /example into my tomcat 
environment. So everything above the /solr folder does not exist. When 
I want to use additional features I mainly copy the needed jar files 
into $SOLR_HOME/lib as explained above. So this jetty lib folder you are 
talking about does not exist for me.


Getting results in (reverse) order they were indexed

2011-12-30 Thread reeuv
Is there any possible way to get the results back from Solr in the reverse
order they were indexed (i.e. the documents that were most recently added
should be first in the results)?

I know I can add an indexedAt=NOW field of type date and sort on it in desc
order.

But if I have a paginated web application giving 10 results each time, every
time the user goes to the next page, Solr has to re-evaluate all the results,
sort the whole data set on date, and return the 10 relevant documents, which
I think is a lot of overhead.

Is there a good approach to deal with this problem ? 



Re: Search Query (Should I use fq)

2011-12-30 Thread reeuv
Thanks for your help, iorixxx.

If you can help me solve one of my other questions as well, that would be
great:

http://lucene.472066.n3.nabble.com/Getting-results-in-reverse-order-they-were-indexed-td3620577.html


RE: Solr, SQL Server's LIKE

2011-12-30 Thread Chantal Ackermann

The problem with wildcard searches is that the input is not
analyzed. For English, this might not be such a problem (except if you
expect case-insensitive search). But then again, you don't get that with
LIKE, either. Ngrams bring that and more.

What I think is often forgotten when comparing 'like' and Solr search
is:
Solr's analyzers allow not only for case-insensitive search but also for
other analysis such as removing diacritics, and this is also applied when
sorting (you have to create a separate index in the DB, as well, if you
want that).

Say you have the following names:
'Van Hinden'
'van Hinden'
'Música'
'Musil'

like 'mu%' - no hits
like 'Mu%' - 1 hit
like 'van%' - 1 hit
like 'hin%' - no hits

with Solr whitespace or standard tokenizer, ngrams, and diacritics and
lowercase filters (no wildcard search):
'mu'/'Mu' - 2 hits sorted ignoring case and diacritics
'van' - 2 hits
'hin' - 2 hits


(This is written down from experience. I haven't checked those examples
explicitly.)
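Chantal's comparison can be simulated with a toy script. This is not Solr itself, just a rough model of a whitespace tokenizer + lowercase + diacritics-folding + ngram chain, contrasted with a case-sensitive SQL LIKE 'prefix%', using the four names from the message:

```python
import unicodedata

def fold(s):
    # lowercase + strip diacritics: roughly what a lowercase filter plus
    # an ASCII-folding filter does
    nfkd = unicodedata.normalize("NFKD", s)
    return "".join(c for c in nfkd if not unicodedata.combining(c)).lower()

def ngrams(s, n_min=2, n_max=4):
    return {s[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(s) - n + 1)}

names = ["Van Hinden", "van Hinden", "Música", "Musil"]

def sql_like_prefix(prefix):
    # case-sensitive LIKE 'prefix%' on the raw, unanalyzed string
    return [n for n in names if n.startswith(prefix)]

def solr_search(term):
    # whitespace-tokenize each name, fold it, and index ngrams per token
    hits = []
    for n in names:
        grams = set()
        for tok in n.split():
            grams |= ngrams(fold(tok))
        if fold(term) in grams:
            hits.append(n)
    return hits

print(sql_like_prefix("mu"))  # []
print(sql_like_prefix("Mu"))  # ['Musil']
print(solr_search("mu"))      # ['Música', 'Musil']
print(solr_search("hin"))     # ['Van Hinden', 'van Hinden']
```

The exact hit counts match the ones Chantal lists: LIKE misses 'Música' and 'hin', while the analyzed ngram index finds both regardless of case, diacritics, or position in the name.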

Cheers,
Chantal



On Fri, 2011-12-30 at 02:00 +0100, Chris Hostetter wrote:
 : Thanks. I know I'll be able to utilize some of Solr's free text 
 : searching capabilities in other search types in this project. The 
 : product manager wants this particular search to exactly mimic "LIKE%".
   ...
 : Ex: If I search "Albatross" I want "Albert" to be excluded completely, 
 : rather than having a low score.
 
 please be specific about the types of queries you want. ie: we need more 
 than one example of the type of input you want to provide, the type of 
 matches you want to see for that input, and the type of matches you want 
 to get back.
 
 in your first message you said you need to match company titles "pretty 
 exactly" but then seem to contradict yourself by saying that SQL's LIKE 
 command fits the bill -- even though the SQL LIKE command exists 
 specifically for in-exact matches on field values.
 
 Based on your one example above of "Albatross", you don't need anything 
 special: don't use ngrams, don't use stemming, don't use fuzzy anything -- 
 just search for "Albatross" and it will match "Albatross" but not 
 "Albert".  if you want "Albatross" to match "Albatross Road" use some 
 basic tokenization.
 
 If all you really care about is prefix searching (which seems suggested by 
 your "LIKE%" comment above, which I'm guessing is shorthand for something 
 similar to LIKE 'ABC%'), so that queries like "abc" and "abcd" both 
 match "abcdef" and "abcd" but neither of them match abcd 
 then just use prefix queries (ie: abcd*) -- they should be plenty 
 efficient for your purposes.  you only need to worry about ngrams when you 
 want to efficiently match in the middle of a string. (ie: TITLE LIKE 
 '%ABC%')
 
 
 -Hoss
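The prefix-query behavior Hoss describes can be sketched with a toy model. The term 'zabcd' below is made up here purely to illustrate the mid-string case that prefix queries do not cover:

```python
# Hypothetical indexed terms; 'zabcd' contains "abcd" only mid-string.
terms = ["abcdef", "abcd", "zabcd"]

def prefix_query(prefix):
    # conceptually what a Lucene PrefixQuery (e.g. q=field:abcd*) matches
    return [t for t in terms if t.startswith(prefix)]

print(prefix_query("abc"))   # ['abcdef', 'abcd']
print(prefix_query("abcd"))  # ['abcdef', 'abcd']
# 'zabcd' is never matched by a prefix query; matching in the middle of
# a string (SQL: LIKE '%ABC%') is the case that needs ngrams.
```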



RE: Best practices for installing and maintaining Solr configuration

2011-12-30 Thread Brandon Ramirez
I actually have read that, and I have Solr up and running on Tomcat.  I didn't 
realize that it was copying example/ (including Jetty, etc.) that was being recommended 
against, rather than the $SOLR_HOME I created by copying example/solr/.

Thanks for the tips on upgrading.  I'll keep that in our documentation.


Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848 
Software Engineer II | Element K | www.elementk.com

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, December 29, 2011 8:18 AM
To: solr-user@lucene.apache.org
Subject: Re: Best practices for installing and maintaining Solr configuration

This should help: http://wiki.apache.org/solr/SolrTomcat

The difference here is that you're not copying the example directory, you're 
copying the example/solr directory. And this is basically just to get the 
configuration files and directory structure right. You're not copying 
executables, jars, wars, or any of that stuff from example. You get the war 
file from the dist directory, and that should contain all the executables, etc.


As to your other questions:
1. If at all possible, upping the match version and reindexing
   are good things to do.
2. It's also a good idea to update the config files. Alternatively,
   you can diff the config files between releases to see what the
   changes are and selectively add them to your config file.
   But you should test, test, test before rolling out into prod.

My rule of thumb for upgrading is to just not upgrade minor releases unless 
there are compelling reasons. The CHANGES.txt file will identify major 
additions.

There are good reasons not to get too far behind on major (i.e. 3.x -> 4.x) 
releases, the primary one being that Solr only makes an effort to be 
backwards-compatible through one major release, i.e. 1.4 can be read by 3.x 
(there was no 2.x Solr release), but no attempt will be made by 4.x code to 
read 1.x indexes.

Hope this helps
Erick

On Wed, Dec 28, 2011 at 8:49 AM, Brandon Ramirez brandon_rami...@elementk.com 
wrote:
 Hi List,
 I've seen several Solr developers mention the fact that people often copy 
 example/ to become their solr installation and that that is not recommended.  
 We are rebuilding our search functionality to use Solr and will be deploying 
 it in a few weeks.

 I have read the README, several wiki articles, mailing list and browsed the 
 Solr distribution.  The example/ directory seems to be the only configuration 
 I can find.  So, I have to ask: what is the recommended way to install Solr?

 What about maintaining it?  For example, is it wise to up the 
 luceneMatchVersion and re-index with every upgrade?  When new configuration 
 options are added in new versions of Solr, should we worry about updating our 
 configuration to include them?  I realize these may be vague questions and 
 the answers could be case-by-case, but some general or high-level 
 documentation may help.

 Thanks!


 Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848
 Software Engineer II | Element K | www.elementk.com





Solr memory usage

2011-12-30 Thread Bai Shen
I have Solr running on a single machine with 8GB of RAM.  Right now I have
about 1.5 million documents indexed, which has produced a 30GB index.  When
I look in top, the tomcat process hosting Solr says that it's
using 38GB VIRT, 6.6GB RES, and 2GB SHR.

The machine is showing a completely full swap file and very little memory
free.  Is this because solr is trying to load the entire index into
memory?  The searches are still responsive, so it doesn't seem to be
affecting performance.

Thanks.


Re: NoClassDefFoundError: org/apache/solr/common/params/SolrParams

2011-12-30 Thread Bruno Adam Osiek

Thanks.

I'm still working on this issue with no success so far.  I'm 
reinstalling my whole development environment right now, for I have 
probably messed it up while attempting to find the reason for this 
error message.


Bruno

On 12/29/2011 08:27 PM, Dyer, James wrote:

The SolrParams class is in the solrj.jar file so you should verify that this is in the classpath.  Also see 
if it is listed in the manifest.mf file in the war's META-INF dir.  If you're running this on a server within 
Eclipse and letting Eclipse do the deploy, my experience is it can be frustrating at times to get Eclipse to 
get the dependencies right.  In this case look at the Java EE Module Dependencies screen in 
Eclipse.  I often resort to hand-editing the org.eclipse.wst.common.component file in the 
project's .settings directory.

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Bruno Adam Osiek [mailto:baos...@gmail.com]
Sent: Thursday, December 29, 2011 4:17 PM
To: solr-user@lucene.apache.org
Subject: NoClassDefFoundError: org/apache/solr/common/params/SolrParams

Hi,

I'm trying to deploy a Solrj based application into JBoss AS 7 using
Eclipse Indigo. When deploying it I get the following error message:



ERROR [org.jboss.msc.service.fail] (MSC service thread 1-4) MSC1:
Failed to start service
jboss.deployment.unit.SolrIntegration.war.POST_MODULE:
org.jboss.msc.service.StartException in service
jboss.deployment.unit.SolrIntegration.war.POST_MODULE: Failed to
process phase POST_MODULE of deployment SolrIntegration.war
  at
org.jboss.as.server.deployment.DeploymentUnitPhaseService.start(DeploymentUnitPhaseService.java:121)
  at
org.jboss.msc.service.ServiceControllerImpl$StartTask.startService(ServiceControllerImpl.java:1824)
  at
org.jboss.msc.service.ServiceControllerImpl$StartTask.run(ServiceControllerImpl.java:1759)
  at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
[:1.7.0_02]
  at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
[:1.7.0_02]
  at java.lang.Thread.run(Thread.java:722) [:1.7.0_02]
*Caused by: java.lang.NoClassDefFoundError:
org/apache/solr/common/params/SolrParams*
  at java.lang.Class.getDeclaredConstructors0(Native Method) [:1.7.0_02]
  at java.lang.Class.privateGetDeclaredConstructors(Class.java:2404)
[:1.7.0_02]
  at java.lang.Class.getConstructor0(Class.java:2714) [:1.7.0_02]
  at java.lang.Class.getConstructor(Class.java:1674) [:1.7.0_02]
  at
org.jboss.as.web.deployment.jsf.JsfManagedBeanProcessor.deploy(JsfManagedBeanProcessor.java:105)
  at
org.jboss.as.server.deployment.DeploymentUnitPhaseService.start(DeploymentUnitPhaseService.java:115)
  ... 5 more
*Caused by: java.lang.ClassNotFoundException:
org.apache.solr.common.params.SolrParams* from [Module
deployment.SolrIntegration.war:main from Service Module Loader]
  at
org.jboss.modules.ModuleClassLoader.findClass(ModuleClassLoader.java:191)
  at
org.jboss.modules.ConcurrentClassLoader.performLoadClassChecked(ConcurrentClassLoader.java:361)
  at
org.jboss.modules.ConcurrentClassLoader.performLoadClassChecked(ConcurrentClassLoader.java:333)
  at
org.jboss.modules.ConcurrentClassLoader.performLoadClass(ConcurrentClassLoader.java:310)
  at
org.jboss.modules.ConcurrentClassLoader.loadClass(ConcurrentClassLoader.java:103)
  ... 11 more
===

I have searched with no success for a solution.

I've managed to deploy successfully *solr.war* into JBoss.

Any help will be welcomed.

Regards.




Re: 3.5 QueryResponseWriter

2011-12-30 Thread Chris Hostetter

: Well, what I did was to create a lib directory within $SOLR_HOME (
: $SOLR_HOME/lib ), and that is where I put the VRW jar found in the dist
: folder. Then what I did to the solrconfig was basically to uncomment all of the
: <lib .../> statements and use <lib dir="../lib" />. The solrconfig is placed as
: normal in $SOLR_HOME/conf. Is it wrong to do so?

Hmmm... that may be the cause of the problem -- you don't need to do both.  
either add a <lib/> directive pointed at where you put VRW *OR* put it in 
$SOLR_HOME/lib ... don't do both (that may confuse your JVM classloader 
... I'd have to sit down and really think through what's happening there)

: To me this QueryResponseWriter thing doesn't necessarily have anything to do
: with VRW. Are all of the libraries from the /dist and contrib folder necessary

it has everything to do with VRW .. VelocityResponseWriter is a subclass 
of QueryResponseWriter; when VRW is 
loaded the JVM will attempt to resolve its entire Java class 
inheritance tree, using the classloader that VRW was loaded with, which 
then delegates up the tree of classloaders (looking at other specified 
lib dirs, and the solr.war, and the container classloader, and the 
bootloader, etc...).  if the classpath is wonky (ie: VRW exists in 
two places, for example) then you can get errors like this even if class 
loader X has already loaded a base class (like QRW), if it's not consulted 
in the delegation.
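The delegation behavior Hoss describes can be modeled with a toy sketch. This is not the JVM, just an illustration of why a class resolved high in the parent chain cannot see a base class that lives only in a lower (webapp) loader:

```python
class Loader:
    """Toy model of JVM parent-first classloader delegation."""
    def __init__(self, name, classes, parent=None):
        self.name, self.classes, self.parent = name, set(classes), parent

    def can_load(self, cls):
        return cls in self.classes or bool(self.parent and self.parent.can_load(cls))

    def load(self, cls):
        # delegate up first; only look locally if no ancestor has the class
        if self.parent and self.parent.can_load(cls):
            return self.parent.load(cls)
        if cls in self.classes:
            return f"{cls}@{self.name}"
        raise LookupError(f"NoClassDefFoundError: {cls}")

# The misconfiguration from the thread: the VRW jar sits on the
# container's classpath, while QueryResponseWriter lives only in solr.war.
container = Loader("container", {"VelocityResponseWriter"})
webapp = Loader("solr.war", {"QueryResponseWriter"}, parent=container)

# VRW resolves at the container level...
print(webapp.load("VelocityResponseWriter"))  # VelocityResponseWriter@container

# ...so its superclass is also resolved there, where solr.war classes are
# invisible, because delegation only goes up the chain, never down:
try:
    container.load("QueryResponseWriter")
except LookupError as e:
    print(e)  # NoClassDefFoundError: QueryResponseWriter
```

Moving the VRW jar down into the webapp-level lib directory makes both classes resolvable from the same loader, which is exactly the fix suggested in the thread.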

: for startup? Because my routine in setting up solr is to only copy the /solr
: folder from /example and into my tomcat environment. So everything above the

well, for starters, you should not copy example/solr into "my tomcat 
environment" .. the only thing that needs to be copied into tomcat is the 
solr.war (either by putting it in the webapps directory, or by pointing to 
it from a context file) and then solr.war is the only thing that needs to 
know about (your copy of) the example/solr directory (either using a 
system property or jndi).  But as I said: classpaths are a pain in the 
ass; if you actually *copy* example/solr somewhere in your tomcat 
installation dir, it's *possible* that you have done so in a way that 
tomcat is finding things like the VRW jar in a higher classpath before 
it ever even loads the solr.war and the QRW base class.

: /solr folder does not exist. When I want to use additional features i mainly
: copy the needed jar files into $SOLR_HOME/lib as explained above. So this

that should work fine ... as long as:

  a) you aren't telling solr to load the same jars more than once (see above 
about not needing <lib/> if you use $SOLR_HOME/lib) ... you can check if 
that's happening by looking at your log messages on solr startup and see 
if there are duplicates in the list of jars solr says it's adding to the 
classpath

  b) tomcat isn't already loading these jars ... this is harder to 
recognize, but the safe way to do it is to keep all of these jars the hell 
away from tomcat.


-Hoss


Re: Getting results in (reverse) order they were indexed

2011-12-30 Thread Chris Hostetter


: Is there any possible way to get the results back from Solr in the reverse
: order they were indexed (i.e.  the documents that was most recently added
: should be the first in the result)
: 
: I know I can add a indexedAt=NOW field of type date and sort on it in desc
: order.
: 
: But if I have a paginated web application giving 10 results each time, every
: time user goes to the next page, Solr has to re-evaluate all the results,
: sort the whole data set on date and return the 10 documents relevant. Which
: I think is a lot of overhead. 

the overhead you are speculating about is largely imagined .. it can be 
problematic if you have users paging very deep into the results, but for normal 
user traffic you aren't going to see any problems with the computational 
effort of loading page two  (if you use the example solr configs, and 
display 10 results per page, solr will automatically cache pages 1-5 for you 
when page #1 is requested -- see queryResultWindowSize for details)
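A toy model of the caching Hoss mentions. The window size of 50 matches the example solrconfig.xml default; the cache-key scheme and doc names here are a simplification for illustration, not Solr's actual implementation:

```python
# A request for rows 0-9 computes and caches a whole window of doc ids,
# so pages 2-5 of the same query are served from the cache.
WINDOW = 50  # queryResultWindowSize in the example solrconfig.xml

cache = {}
evaluations = 0  # how many times the full sort/score work actually ran

def search(query, start, rows):
    global evaluations
    window_start = (start // WINDOW) * WINDOW
    key = (query, window_start)
    if key not in cache:
        evaluations += 1
        # stand-in for running the query and sorting the full result set
        cache[key] = [f"doc{window_start + i}" for i in range(WINDOW)]
    window = cache[key]
    offset = start - window_start
    return window[offset:offset + rows]

page1 = search("foo", 0, 10)   # evaluates the query, caches rows 0-49
page2 = search("foo", 10, 10)  # pure cache hit
print(evaluations)  # 1
```

So the "re-evaluate everything for page two" overhead never happens for shallow paging; the cached window absorbs it.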

If you don't want to use a specific indexedAt field, you can use 
sort=_docid_ desc, but it's not 100% guaranteed to be the reverse of the order 
added if you use a MergePolicy that re-orders segments.  (and even this 
will have the same imaginary overhead for loading page #2, 3, 4, etc... as 
if you sort on a field -- all this type of approach saves you is the 
overhead of the FieldCache)

https://wiki.apache.org/solr/CommonQueryParameters#sort

-Hoss


RE: Solr, SQL Server's LIKE

2011-12-30 Thread Devon Baumgarten
Hoss,

Thanks. You've answered my question. To clarify, what I should have asked for 
instead of 'exact' was 'not fuzzy'. For some reason it didn't occur to me that 
I didn't need n-grams to use the wildcard. Your asking me to clarify what I 
meant made me realize that the n-grams are the source of all my current 
problems. :)

Thanks!

Devon Baumgarten


-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Thursday, December 29, 2011 7:00 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr, SQL Server's LIKE




Re: a question on jmx solr exposure

2011-12-30 Thread Chris Hostetter

:   Well, we don't use multicore feature of SOLR, so in our case SOLR
:  instances
:   are just separate web-apps. The web-app loading order probably then
:  affects
:   on which app gets hold of a jmx 'pipe'.

A feature was added in SOLR-1843 specifically to help address this potential 
collision, by allowing you to override the rootName used in your <jmx/> 
declaration (by default it's solr/${corename}), but looking at the issue 
now I see it was only committed to trunk...

https://issues.apache.org/jira/browse/SOLR-1843

...even though this looks like a fairly straightforward candidate to 
merge back to 3x.  I'll look into it.


-Hoss


Re: MLT as a nested query

2011-12-30 Thread Chris Hostetter

: is it possible to use MLT as a nested query? I tried the following:
: select?q=field1:foo field2:bar AND _query_:"{!mlt fl=mltField mindf=1
mintf=1 mlt.match.include=false}selectField:baz"

MLT functionality exists in two forms: as a component, that decorates 
results produced by another search (similar to highlighting and faceting), 
and as a handler that produces a main result set based on an MLT query (so 
highlighting and faceting happen to the results of the MLT itself)...

https://wiki.apache.org/solr/MoreLikeThis

In order for what you are describing to work, someone would have to 
implement an MLT QParser, but no one has ever attempted that to my 
knowledge.  I have considered looking into it, and I suspect it would be 
somewhat straightforward to do, but only for single node instances -- 
there is no way I know of for a QParser to sanely generate a query like 
MLT based on the terms of distributed shards.


-Hoss


Re: reposting highlighting questions

2011-12-30 Thread Chris Hostetter

: I am new to solr/xml/xslt, and trying to figure out how to display 
: search query fields highlighted in html. I can enable the highlighting 
: in the query, and I think I get the correct xml response back (see 
: below: I search using 'Contents' and the highlighting is shown with 
: <strong> and </strong>). However, I cannot figure out what to add to the 
: xslt file to transform it into html. I think it is a question of defining 

i think you are looking for the disable-output-escaping="yes" option on 
your <xsl:value-of select="..."/> expression to echo out the highlighted 
string.  hard to be sure since you didn't actually provide any example of 
the XSLT or xpath you are trying to use.
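A minimal XSLT sketch of what Hoss suggests, assuming the standard Solr XML response layout -- the XPath and the `<p>` wrapper are guesses at a typical setup, not taken from the original poster's stylesheet:

```xml
<!-- For each highlighting snippet in the response, emit it without
     escaping, so the <strong>...</strong> markup Solr inserted reaches
     the browser as real HTML tags. -->
<xsl:for-each select="response/lst[@name='highlighting']/lst/arr/str">
  <p><xsl:value-of select="." disable-output-escaping="yes"/></p>
</xsl:for-each>
```

Note that disable-output-escaping is only honored when the XSLT processor controls serialization to text/html; processors feeding a further transformation stage may ignore it.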


-Hoss


Re: Decimal Mapping problem

2011-12-30 Thread Chris Hostetter

: Try to cast MySQL decimal data type to string, i.e.
: 
: CAST( IF(drt.discount IS NULL,'0',(drt.discount/100)) AS CHAR) as discount
: (or CAST AS TEXT)

...to clarify here, the values you are seeing are what happens when the DB 
returns to DIH a value in a type it doesn't understand -- in this case 
it's a byte array.  DIH isn't sure what to do with this byte array, so it 
just calls the java toString() method on it.

casting that byte array to something DIH understands (like a string) is 
one way to solve the problem, but the other would be to use some SQL 
expression that always returns a consistent type, so the SQL server knows 
what type to declare in its response -- in your example you are sometimes 
returning a string (if NULL, you return the string '0') and sometimes 
returning a number (if not null, drt.discount/100) 

use SQL that always returns a number, and this problem will also go away.
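For example (the table name discount_rate_table is hypothetical -- only the drt alias and the discount column appear in the thread):

```sql
-- COALESCE keeps both branches numeric, so the driver reports a numeric
-- column type and DIH never receives a raw byte array to toString().
SELECT COALESCE(drt.discount / 100, 0) AS discount
FROM discount_rate_table drt;
```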


-Hoss


Highlighting with prefix queries and maxBooleanClause

2011-12-30 Thread Michael Lissner

This question has come up a few times, but I've yet to see a good solution.

Basically, if I have highlighting turned on and do a query for q=*, I 
get an error that maxBooleanClauses has been exceeded. Granted, this is 
a silly query, but a user might do something similar. My expectation is 
that queries that work when highlighting is OFF should continue working 
when it is ON.


What's the best solution for queries like this? Is it simply to catch 
the error and then up maxBooleanClauses? Or to turn off highlighting 
when this error occurs?


Or am I doing something altogether wrong?

This is the query I'm using to cause the error:

http://localhost:8983/solr/select/?q=*&start=0&rows=20&hl=true&hl.fl=text


Changing hl to false makes the query go through.

I'm using Solr 4.0.0-dev

The traceback is:

SEVERE: org.apache.lucene.search.BooleanQuery$TooManyClauses: 
maxClauseCount is set to 1024
at 
org.apache.lucene.search.ScoringRewrite$1.checkMaxClauseCount(ScoringRewrite.java:68)
at 
org.apache.lucene.search.ScoringRewrite$ParallelArraysTermCollector.collect(ScoringRewrite.java:159)
at 
org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:81)
at 
org.apache.lucene.search.ScoringRewrite.rewrite(ScoringRewrite.java:114)
at 
org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:312)
at 
org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:155)
at 
org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:144)
at 
org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:384)
at 
org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216)
at 
org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184)
at 
org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:205)
at 
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:511)
at 
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:402)
at 
org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:121)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1478)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at 
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)

at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)

at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)


Thanks,

Mike


Re: issues with WordDelimiterFilter

2011-12-30 Thread Chris Hostetter

: I'm having an issue with the way the WordDelimiterFilter parses compound 
: words. My field declaration is simple, looks like this:
: 
:   <analyzer type="index">
:     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
:     <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"/>
:     <filter class="solr.LowerCaseFilterFactory"/>
:   </analyzer>

you haven't said anything about what your query time analyzer looks like 
-- based on your other comments, I'm going to assume it just uses 
whitespaceTokenizer and lower case filter w/o WDF at all -- but if you 
don't have any query analyzer declared, that means the analyzer above is 
used in both cases, which is most likely not what you want.

: : When indexing 'fokker-planck' I do get tokens for fokker, 
: : planck, and fokker-planck. But in that case the fokker-planck token 
: : is followed by a 'planck' token. The analysis looks like this.

that is expected - when WDF splits up a token (and keeps the original) it 
puts the first of the split tokens at the same position as the original, 
and each other split token follows in subsequent positions -- positions in 
token streams are simple integer increments, so there is no way to say 
that the split "fokker" and "planck" tokens appear in that sequence *and* 
that they both appear at the same position as the original "fokker-planck"

: So in the case where fokker-plank is the first token there should be no 
: second token; it's already been used if the first was matched. The 

that type of logic (hierarchical sequences of tokens) is just not possible 
with lucene.

: problem manifests itself when doing phrase searches...
: 
: "Fokker-Plank equations" won't find the exact phrase "Fokker-Plank 
: equations", because it sees the term planck as sitting between Fokker-Plank and 
: equations. Hope that makes sense! Should I submit this as a bug?

for phrase queries like this to work when using WDF, it's necessary to 
use some slop in your phrase query (to overcome the position gaps 
introduced by the split out tokens) ... either that, or turn off 
preserveOriginal and use a query analyzer that also splits at query time

: As it stands it would return a true hit (erroneously, I believe) on the 
: phrase search "fokker planck", so really all 3 tokens should be returned 

Hmmm... if you do *not* want a phrase search for "fokker planck" to match 
documents containing "fokker-planck", then why are you using WDF at all?

: at offset 0 and there should be no second token so phrase searches are 
: preserved.

if all the tokens wound up in the exact same position, then a 
phrase query for "fokker planck" would still match this document (so it 
wouldn't solve your problem), but you would also get matches for things 
like the phrase "planck fokker" -- which is not likely what *anyone* 
would expect.


-Hoss


Re: [Solr Event Listener plug-in] Execute query search from SolrCore - Java Code

2011-12-30 Thread Alessandro Benedetti
Ok, I have made progress: I built my architecture, and I execute queries
inside the postCommit method, and they are launched as I want.
But the core can't see the newly updated documents, and the commit ends
only after the postCommit method has ended!!
But I have to see the newly updated documents; this is the principal need of
my plugin.
How can I search the newly indexed documents? How can I open the new
searcher? And where?
Inside postCommit seems to be no good...
Any suggestions?

2011/12/29 Alessandro Benedetti benedetti.ale...@gmail.com

 Hi guys,
 I'm developing a custom SolrEventListener, and inside the PostCommit()
 method I need to execute some queries and collect results.
 In my SolrEventListener class, I have a SolrCore
 object (org.apache.solr.core.SolrCore) and a list of queries (Strings).

 How can I use the SolrCore to optimally parse the queries (I have to
 parse them like the Solr query parser does) and launch them?

 I'm fighting with Searchers and Execute methods in the solrCore object,
 but I don't know which is the best way to do this ...

 Cheers






-- 
--

Benedetti Alessandro
Personal Page: http://tigerbolt.altervista.org

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: [Solr Event Listener plug-in] Execute query search from SolrCore - Java Code

2011-12-30 Thread Alessandro Benedetti
I have tried to open a new searcher and make a forced commit inside the
postCommit method of the listener, but it caused many issues.
How can I let the commit complete and then call the postCommit method of
the listener with the logic inside (with a lot of queries on the last
committed docs)?

Cheers

2011/12/31 Alessandro Benedetti benedetti.ale...@gmail.com

 Ok, I have made progress: I built my architecture, and I execute queries
 inside the postCommit method, and they are launched as I want.
 But the core can't see the newly updated documents, and the commit ends
 only after the postCommit method has ended!!
 But I have to see the newly updated documents; this is the principal need
 of my plugin.
 How can I search the newly indexed documents? How can I open the new
 searcher? And where?
 Inside postCommit seems to be no good...
 Any suggestions?


 2011/12/29 Alessandro Benedetti benedetti.ale...@gmail.com

 Hi guys,
 I'm developing a custom SolrEventListener, and inside the PostCommit()
 method I need to execute some queries and collect results.
 In my SolrEventListener class, I have a SolrCore
 object (org.apache.solr.core.SolrCore) and a list of queries (Strings).

 How can I use the SolrCore to optimally parse the queries (I have to
 parse them like the Solr query parser does) and launch them?

 I'm fighting with Searchers and Execute methods in the solrCore object,
 but I don't know which is the best way to do this ...

 Cheers






-- 
--

Benedetti Alessandro
Personal Page: http://tigerbolt.altervista.org

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: How to test solr filter

2011-12-30 Thread Chris Hostetter

: References: 1324729256338-3610466.p...@n3.nabble.com
:  4ef69412.3040...@r.email.ne.jp
: In-Reply-To: 4ef69412.3040...@r.email.ne.jp
: Subject: How to test solr filter

https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.


-Hoss


Re: Best practices for installing and maintaining Solr configuration

2011-12-30 Thread Chris Hostetter

: I've seen several Solr developers mention the fact that people often 
: copy example/ to become their solr installation and that that is not 
: recommended.  We are rebuilding our search functionality to use Solr and 
: will be deploying it in a few weeks.

do you have specific examples from the mailing list of the recommendations 
you are seeing?

in my opinion there's nothing wrong with copying the solr example to use 
as the basis for your own configs -- that's why it's there.

Where people tend to run into problems (in my opinion) is...

a) they never change the configs.  those configs are examples.  they 
showcase features, and the comments suggest best practices.  that doesn't 
mean you need to shoe-horn your data into the declared fields, it doesn't 
mean you *have* to use dynamic fields in order to add a field with a new 
name. you should feel free to customize the configs to meet your needs.
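For instance (a hypothetical sketch -- field names and types are placeholders), customizing the schema means declaring explicit fields for your own data instead of leaning on the example's dynamic-field catch-alls:

```xml
<!-- hypothetical schema.xml customization: explicit fields for your data -->
<field name="sku" type="string" indexed="true" stored="true"/>
<field name="description" type="text" indexed="true" stored="true"/>
```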

b) they assume that when upgrading, they should throw out their old 
configs and copy the newer example configs.  you should by all means 
*consult* the new configs, because there may be new features in there that 
you should consider, or new comments suggesting new best practices, but if 
you completely throw out your old configs you *have* to reindex.  some 
people either don't realize that and are surprised when they get weird 
errors, or take it for granted that they should *always* reindex even 
though it isn't necessary.

My advice: when starting a new collection from scratch: base the configs 
on the example from the current version of Solr you are using, and 
customize.  when upgrading Solr: consult the new example config, and 
cut/paste anything you think you would like to have into your existing 
configs, considering the implications of reindexing (ie: changing field 
types).  when adding a new collection to an existing installation: decide 
if it's really different from what you've already got, in which case 
base the configs off of the current example; or if it's similar to some 
collection you already have, base the configs off of those.

: What about maintaining it?  For example, Is it wise to up the 
: luceneMatchVersion and re-index with every upgrade?  When new 

only if you have read the CHANGES.txt and feel like there is something 
about the new features or modified behavior that suggests to you that you 
want those changes -- if you are happy with the existing behavior, leave 
it alone.  The flip side is that if for some reason you decide you need to 
re-index everything, then you should consider bumping the 
luceneMatchVersion up to get what is now considered the correct behavior 
of those classes -- but just like upgrading, you should test these 
behavior changes and verify they are really what you want.


-Hoss


hl.maxAnalyzedChars seems does not work

2011-12-30 Thread Min Yang
Hi,
My situation is that highlighting a text field (about 2M per document) took
too much time (1s).

So I want to limit the characters highlighter analyze.

These are my highlighting parameters, which seem to have no problem:

   <bool name="hl">true</bool>
   <str name="hl.fl">text</str>
   <int name="hl.snippets">1</int>
   <int name="hl.maxAnalyzedChars">200</int>
   <str name="f.text.hl.alternateField">text</str>
   <int name="hl.maxAlternateFieldLength">100</int>

The parameter hl.maxAlternateFieldLength does not seem to work...


Re: Solr memory usage

2011-12-30 Thread Otis Gospodnetic
Hi Bai,

Solr doesn't try to load the whole index into memory, no.
You can control how much memory Tomcat uses with the -Xmx Java command-line 
parameter.
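For example (a sketch -- the file location and heap sizes are placeholders to adapt to your install), Tomcat typically picks up JVM options via the CATALINA_OPTS environment variable:

```shell
# Hypothetical example: cap the JVM heap Tomcat (and thus Solr) can use.
# Put this in Tomcat's bin/setenv.sh or your service config.
export CATALINA_OPTS="-Xms512m -Xmx4g"
echo "$CATALINA_OPTS"
```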

 
Otis


Performance Monitoring SaaS for Solr - 
http://sematext.com/spm/solr-performance-monitoring/index.html




 From: Bai Shen baishen.li...@gmail.com
To: solr-user@lucene.apache.org 
Sent: Friday, December 30, 2011 9:16 AM
Subject: Solr memory usage
 
I have Solr running on a single machine with 8GB of RAM.  Right now I have
about 1.5 million documents indexed, which has produced a 30GB index.  When
I look in top, the Tomcat process hosting Solr says that it's
using 38GB of VIRT, 6.6G RES, and 2GB SHR.

The machine is showing a completely full swap file and very little memory
free.  Is this because Solr is trying to load the entire index into
memory?  The searches are still responsive, so it doesn't seem to be
affecting performance.

Thanks.