Re: How do I make sure the resulting documents contain the query terms?

2011-06-07 Thread Gabriele Kahlout
Sorry for being unclear, and thank you for answering.
Consider the following documents A(k0,k1,k2), B(k1,k2,k3), and C(k0,k2,k3),
where A,B,C are document identifiers and the ks in bracket with each are the
terms each contains.
So Solr inverted index should be something like:

k0 --> A | C
k1 --> A | B
k2 --> A | B | C
k3 --> B | C

Now let q=k1, how do I make sure C doesn't appear as a result since it
doesn't contain any occurrence of k1?

On Tue, Jun 7, 2011 at 12:21 AM, Erick Erickson erickerick...@gmail.com wrote:

 I'm having a hard time understanding what you're driving at, can
 you provide some examples? This *looks* like filter queries,
 but I think you already know about those...

 Best
 Erick

 On Mon, Jun 6, 2011 at 4:00 PM, Gabriele Kahlout
 gabri...@mysimpatico.com wrote:
  Hello,
 
  I've seen that through boosting it's possible to influence the scoring
  function, but what I would like is sort of a boolean property. In some
 way
  it's to search only the indexed documents by that keyword (or the
  intersection/union) rather than the whole set.
  Is this supported in any way?
 
 
  --
  Regards,
  K. Gabriele
 
 




-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains [LON] or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
time(x) < Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with X.
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: How do I make sure the resulting documents contain the query terms?

2011-06-07 Thread pravesh
k0 --> A | C
k1 --> A | B
k2 --> A | B | C
k3 --> B | C
Now let q=k1, how do I make sure C doesn't appear as a result since it
doesn't contain any occurrence of k1?
Do we bother to do that? Now, that's what Lucene does :)



Re: How do I make sure the resulting documents contain the query terms?

2011-06-07 Thread Gabriele Kahlout
On Tue, Jun 7, 2011 at 8:43 AM, pravesh suyalprav...@yahoo.com wrote:

 k0 --> A | C
 k1 --> A | B
 k2 --> A | B | C
 k3 --> B | C
 Now let q=k1, how do I make sure C doesn't appear as a result since it
 doesn't contain any occurrence of k1?
 Do we bother to do that? Now, that's what Lucene does :)

 Lucene/Solr doesn't do that; it ranks documents based on a scoring
function, and with that it lacks the possibility of specifying that a
particular term must appear (the closest way I know of is boosting it).

The solution would be a way to tell Solr/Lucene which documents/indices to
query, i.e. query only the union/intersection of the documents in which
k1,...,kn appear, instead of querying all indexed documents and applying the
ranking function (which will give weight to documents that contain
k1...kn).







-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains [LON] or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
time(x) < Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with X.
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: Master Slave help

2011-06-07 Thread Rohit Gupta
thanks Jayendra..






From: Jayendra Patil jayendra.patil@gmail.com
To: solr-user@lucene.apache.org
Sent: Tue, 7 June, 2011 6:55:58 AM
Subject: Re: Master Slave help

Do you mean the replication happens every time you restart the server?
If so, you need to modify the events on which you want the replication to happen.

Check for the replicateAfter tag and remove the startup option, if you
don't need it.

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- Replicate on 'startup' and 'commit'. 'optimize' is also a
         valid value for replicateAfter. -->
    <str name="replicateAfter">startup</str>
    <str name="replicateAfter">commit</str>

    <!-- Create a backup after 'optimize'. Other values can be
         'commit', 'startup'. It is possible to have multiple entries of this
         config string. Note that this is just for backup; replication does
         not require this. -->
    <!-- <str name="backupAfter">optimize</str> -->

    <!-- If configuration files need to be replicated, give the
         names here, separated by comma -->
    <str name="confFiles">schema.xml,stopwords.txt,elevate.xml</str>
    <!-- The default value of reservation is 10 secs. See the
         documentation below. Normally, you should not need to specify this -->
    <str name="commitReserveDuration">00:00:10</str>
  </lst>
</requestHandler>
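
A quick way to check what the handler is doing is to query it directly; a
hedged example, where host and port are placeholders and command=details is
the ReplicationHandler's status command:

curl "http://master-host:8983/solr/replication?command=details"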

Regards,
Jayendra

On Mon, Jun 6, 2011 at 11:24 AM, Rohit Gupta ro...@in-rev.com wrote:
 Hi,

 I have configured my master slave server and everything seems to be running
 fine; the replication completed the first time it ran. But every time I go to
 the replication link in the admin panel after restarting the server or on
 server startup, I notice the replication starting from scratch, or at least
 the stats show that.

 What could be wrong?

 Thanks,
 Rohit


Commit taking very long

2011-06-07 Thread Rohit Gupta
Hi,

My commit seems to be taking too much time. If you notice from the DataImport
status given below, to commit 1000 docs it's taking longer than 24 minutes:

<str name="status">busy</str>
<str name="importResponse">A command is still running...</str>
<lst name="statusMessages">
  <str name="Time Elapsed">0:24:43.156</str>
  <str name="Total Requests made to DataSource">1001</str>
  <str name="Total Rows Fetched">1658</str>
  <str name="Total Documents Skipped">0</str>
  <str name="Full Dump Started">2011-06-07 09:15:17</str>
  <str name="">
    Indexing completed. Added/Updated: 1000 documents. Deleted 0 documents.
  </str>
</lst>

What can be causing this? I have tried looking for a reason or a way to
improve this, but am just not able to find one. At this rate my documents
would never get indexed, given that I have more than 100,000 records coming
into the database every hour.

Regards,
Rohit

getting numberformat exception while using tika

2011-06-07 Thread Naveen Gupta
Hi

We are using the ExtractingRequestHandler and we are getting the following
error. We are giving a Microsoft docx file for indexing.

I think that this is something to do with the date field definition, but am
not very sure... what field type should we use?

2. We are trying to index jpg files (when we search over the name of the jpg,
it does not come back, though I am passing one in the id).

3. What about zip files or rar files? Does Tika with Solr handle these?

java.lang.NumberFormatException: For input string:
"2011-01-27T07:18:00Z"
at
java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Long.parseLong(Long.java:412)
at java.lang.Long.parseLong(Long.java:461)
at org.apache.solr.schema.TrieField.createField(TrieField.java:434)
at
org.apache.solr.schema.SchemaField.createField(SchemaField.java:98)
at
org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204)
at
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)
at
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:121)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126)
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:198)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:238)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)
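
(The trace shows Tika's Last-Modified value, "2011-01-27T07:18:00Z", being
parsed as a long by a Trie field. A hedged sketch of a fix; the field name,
id and file name below are assumptions, while fmap.* is the stock
ExtractingRequestHandler way to rename Tika metadata fields:

In schema.xml:
  <field name="last_modified" type="date" indexed="true" stored="true"/>

At extract time:
  curl "http://localhost:8983/solr/update/extract?literal.id=doc1&fmap.Last-Modified=last_modified&commit=true" -F "myfile=@test.docx")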

Thanks
Naveen


How many fields can SOLR handle?

2011-06-07 Thread roySolr
Hello,

I have a Solr implementation with 1m products. Every product has some
information; let's say a television has some information about pixels and
inches, and a computer has information about harddisk, cpu, gpu. When a user
searches for computer I want to show the correct facets. An example:

User search for Computer

Facets:

  CPU
   AMD(10)
   Intel(300)

  GPU
   Nvidia(20)
   Ati(290)

Every product has different facets. I have something like this in my schema:

 <dynamicField name="*_FACET" type="facetType" indexed="true" stored="true"
     multiValued="true"/>

In Solr I now have a lot of fields: CPU_FACET, GPU_FACET etc. How many
fields can Solr handle?

Another question: is it possible to add the FACET fields automatically to my
query, i.e. facet.field=*_FACET? Now I first do a request to a DB to get the
FACET titles and add these to the request: facet.field=cpu_FACET,gpu_FACET.
I'm afraid that *_FACET is an overkill solution.
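
For reference, the repeated-parameter form of such a request; a hedged
example using the field names from above, with host and core path assumed:

http://localhost:8983/solr/select?q=computer&facet=true&facet.field=CPU_FACET&facet.field=GPU_FACET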








function queries scope

2011-06-07 Thread Marco Martinez
Hi,

I need to use the function query operations with the score of a given
query, but only on the docset that I get from the query, and I don't know if
this is possible.

Example:

q=shops in madrid  returns > 1 docs with a specific score for each doc

but now i need to do some stuff like

q=sum(product(2,query(shops in madrid),productValueField)) but this will
return all the docs in my index.


I know that I can do it via filter queries, e.g. q=sum(product(2,query(shops
in madrid),productValueField))&fq=shops in madrid, but this will do the query
two times and I don't want this because performance is important to our
application.


Is there another approach to accomplish that?


Thanks in advance,

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


Indexing Mediawiki

2011-06-07 Thread Tod
I have a need to index an internal instance of MediaWiki.  I'd like to
use DIH if I can, since I have access to the database, but the example
provided on the Solr wiki uses a MediaWiki dump XML file.


Does anyone have any experience using DIH in this manner?  Am I barking
up the wrong tree, and would I be better off dumping and indexing the wiki
instead?
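
For the DIH route, a minimal sketch of a data-config, assuming MySQL and
MediaWiki's default page/revision/text tables; the driver, URL and
credentials are placeholders:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/wikidb" user="wiki" password="secret"/>
  <document>
    <!-- join each page to the text of its latest revision -->
    <entity name="page"
            query="SELECT p.page_id, p.page_title, t.old_text
                   FROM page p
                   JOIN revision r ON r.rev_id = p.page_latest
                   JOIN text t ON t.old_id = r.rev_text_id">
      <field column="page_id" name="id"/>
      <field column="page_title" name="title"/>
      <field column="old_text" name="text"/>
    </entity>
  </document>
</dataConfig>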




Thanks - Tod


solr 3.1 java.lang.NoClassDefFoundError org/carrot2/core/ControllerFactory

2011-06-07 Thread bryan rasmussen
As per the subject, I am getting java.lang.NoClassDefFoundError:
org/carrot2/core/ControllerFactory
when I try to run clustering.

I am using Solr 3.1:

I get the following error:

java.lang.NoClassDefFoundError: org/carrot2/core/ControllerFactory
at 
org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.init(CarrotClusteringEngine.java:74)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown 
Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at java.lang.Class.newInstance0(Unknown Source)
at java.lang.Class.newInstance(Unknown Source)
at 
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:412)
at 
org.apache.solr.handler.clustering.ClusteringComponent.inform(ClusteringComponent.java:203)
at 
org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:522)
at org.apache.solr.core.SolrCore.init(SolrCore.java:594)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:458)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:316)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207)
at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:130)
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94)
at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:713)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
at 
org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1282)
at 
org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:518)
at 
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:499)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
at org.mortbay.jetty.Server.doStart(Server.java:224)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.mortbay.start.Main.invokeMain(Main.java:194)
at org.mortbay.start.Main.start(Main.java:534)
at org.mortbay.start.Main.start(Main.java:441)
at org.mortbay.start.Main.main(Main.java:119)
Caused by: java.lang.ClassNotFoundException: org.carrot2.core.ControllerFactory
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.net.FactoryURLClassLoader.loadClass(Unknown Source)

using the following configuration


<searchComponent
    class="org.apache.solr.handler.clustering.ClusteringComponent"
    name="clustering">
  <lst name="engine">
    <str name="name">default</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>

    <!-- Engine-specific parameters -->
    <str name="LingoClusteringAlgorithm.desiredClusterCountBase">20</str>
  </lst>
</searchComponent>

<requestHandler name="/search"
    class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
  <!--
  By default, this will register the following components:

  <arr name="components">
    <str>query</str>
    <str>facet</str>
    <str>mlt</str>
    <str>highlight</str>
    <str>debug</str>
  </arr>
  -->
</requestHandler>

<requestHandler name="clusty" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>

    <bool name="clustering">true</bool>
    <str name="clustering.engine">default</str>
    <bool name="clustering.results">true</bool>

    <!-- Fields to cluster on -->
    <str name="carrot.title">title</str>
    <str name="carrot.snippet">all_text</str>

Re: Documents update

2011-06-07 Thread Denis Kuzmenok
Created the file and reloaded Solr: ExternalFileField works fine. If I
change the external files and do
curl http://127.0.0.1:4900/solr/site/update -H "Content-Type: text/xml"
--data-binary '<commit />'
then no changes are made. If I start Solr without external files and
then create them, they are not picked up.
What is wrong?

PS: Solr 3.2

 http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html

 On Tuesday 31 May 2011 15:41:32 Denis Kuzmenok wrote:
 Flags are stored to filter results and it's pretty heavily loaded; it's
 working fine, but I can't update the index very often just to keep the flags
 up to date =\
 Where can I read about using external fields / files?
 
  And it wouldn't work unless all the data is stored anyway. Currently
  there's no way to update a single field in a document, although there's
  work being done in that direction (see the column stride JIRA).
  
  What do you want to do with these fields? If it's to influence scoring,
  you could look at external fields.
  
  If the flags are a selection criteria, it's...harder. What are the flags
  used for? Could you consider essentially storing a map of the
  uniqueKey's and flags in a special document and having your app
  read that document and merge the results with the output? If this seems
  irrelevant, a more complete statement of the use-case would be helpful.
  
  Best
  Erick





Re: How do I make sure the resulting documents contain the query terms?

2011-06-07 Thread lee carroll
Gabriele
Lucene uses a combination of boolean and VSM for its IR.

A straightforward query for a keyword will only match docs with that keyword.

Now things quickly get subtle and complex the more sugar you add (more
complicated queries across fields and more complex
analysis chains), but I think the short answer to your question is: C
will not be returned, and it will not be scored either.

lee c

On 7 June 2011 08:30, Gabriele Kahlout gabri...@mysimpatico.com wrote:
 On Tue, Jun 7, 2011 at 8:43 AM, pravesh suyalprav...@yahoo.com wrote:

 k0 --> A | C
 k1 --> A | B
 k2 --> A | B | C
 k3 --> B | C
 Now let q=k1, how do I make sure C doesn't appear as a result since it
 doesn't contain any occurrence of k1?
 Do we bother to do that? Now, that's what Lucene does :)

 Lucene/Solr doesn't do that; it ranks documents based on a scoring
 function, and with that it lacks the possibility of specifying that a
 particular term must appear (the closest way I know of is boosting it).

 The solution would be a way to tell Solr/Lucene which documents/indices to
 query, i.e. query only the union/intersection of the documents in which
 k1,...,kn appear, instead of querying all indexed documents and applying the
 ranking function (which will give weight to documents that contain
 k1...kn).







 --
 Regards,
 K. Gabriele




clustering problems on 3.1

2011-06-07 Thread bryan rasmussen
I added the following to my configuration

  <lib dir="c:/projects/solrtest/dist/"
       regex="apache-solr-clustering-.*\.jar" />
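
The Carrot2 classes themselves ship separately from the
apache-solr-clustering jar; a hedged addition, assuming the stock Solr 3.1
layout where contrib/clustering/lib holds carrot2-core and its dependencies:

  <lib dir="c:/projects/solrtest/contrib/clustering/lib/" regex=".*\.jar" />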




<requestHandler name="clusty" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>

    <bool name="clustering">true</bool>
    <str name="clustering.engine">default</str>
    <bool name="clustering.results">true</bool>

    <!-- Fields to cluster on -->
    <str name="carrot.title">title</str>
    <str name="carrot.snippet">all_text</str>
    <str name="hl.fl">all_text title</str>
    <!-- for this field, we want no fragmenting, just highlighting -->
    <str name="f.name.hl.fragsize">150</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>


<searchComponent
    class="org.apache.solr.handler.clustering.ClusteringComponent"
    name="clustering">
  <lst name="engine">
    <str name="name">default</str>
    <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>

    <!-- Engine-specific parameters -->
    <str name="LingoClusteringAlgorithm.desiredClusterCountBase">20</str>
  </lst>
</searchComponent>

which ended up with the message
java.lang.NoClassDefFoundError: org/carrot2/core/ControllerFactory
and whenever I did a request I got a 404 response back, and

SEVERE: REFCOUNT ERROR: unreferenced org.apache.solr.SolrCore@14db38a4 (core1)
has a reference count of 1

appeared in my console.

Any suggestions?

Thanks,
Bryan Rasmussen


Re: Commit taking very long

2011-06-07 Thread Erick Erickson
Are you optimizing? That is unnecessary when committing, and is often the
culprit.
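
If DIH is the trigger, note that full-import optimizes by default; a hedged
example of turning that off (the handler path and params are the stock ones,
the core URL is an assumption):

http://localhost:8983/solr/dataimport?command=full-import&optimize=false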


Best
Erick

On Tue, Jun 7, 2011 at 5:42 AM, Rohit Gupta ro...@in-rev.com wrote:
 Hi,

 My commit seems to be taking too much time. If you notice from the DataImport
 status given below, to commit 1000 docs it's taking longer than 24 minutes:

 <str name="status">busy</str>
 <str name="importResponse">A command is still running...</str>
 <lst name="statusMessages">
   <str name="Time Elapsed">0:24:43.156</str>
   <str name="Total Requests made to DataSource">1001</str>
   <str name="Total Rows Fetched">1658</str>
   <str name="Total Documents Skipped">0</str>
   <str name="Full Dump Started">2011-06-07 09:15:17</str>
   <str name="">
     Indexing completed. Added/Updated: 1000 documents. Deleted 0 documents.
   </str>
 </lst>

 What can be causing this? I have tried looking for a reason or a way to
 improve this, but am just not able to find one. At this rate my documents
 would never get indexed, given that I have more than 100,000 records coming
 into the database every hour.

 Regards,
 Rohit


Re: problem: zooKeeper Integration with solr

2011-06-07 Thread Mohammad Shariq
How is this method
(http://localhost:8983/solr/select?shards=Machine:Port/SolrPath,Machine:Port/SolrPath&indent=true&q=query)
better than ZooKeeper? Could you please point me to any performance doc.


On 7 June 2011 08:18, bmdakshinamur...@gmail.com bmdakshinamur...@gmail.com
 wrote:

 Instead of integrating zookeeper, you could create shards over multiple
 machines and specify the shards while you are querying solr.
 Eg: http://localhost:8983/solr/select?shards=Machine:Port/SolrPath,Machine:Port/SolrPath&indent=true&q=query



 On Mon, Jun 6, 2011 at 5:59 PM, Mohammad Shariq shariqn...@gmail.com
 wrote:

  Hi folks,
  I am using Solr to index around 100mn docs.
  Now I am planning to move to cluster-based Solr, so that I can scale the
  indexing and searching process.
  Since SolrCloud is in the development stage, I am trying to index in a
  shard-based environment using ZooKeeper.

  I followed the steps from
  http://wiki.apache.org/solr/ZooKeeperIntegration but then also I am not
  able to do distributed search.
  Once I index the docs in one shard, I am not able to query from the other
  shard and vice-versa (using the query
  http://localhost:8180/solr/select/?q=itunes&version=2.2&start=0&rows=10&indent=on
  )
 
  I am running solr3.1 on ubuntu 10.10.
 
  please help me.
 
 
  --
  Thanks and Regards
  Mohammad Shariq
 



 --
 Thanks and Regards,
 DakshinaMurthy BM




-- 
Thanks and Regards
Mohammad Shariq


RE: SpellCheckComponent performance

2011-06-07 Thread Demian Katz
As I may have mentioned before, VuFind is actually doing two Solr queries for 
every search -- a base query that gets basic spelling suggestions, and a 
supplemental spelling-only query that gets shingled spelling suggestions.  If 
there's a way to get two different spelling responses in a single query, I'd 
love to hear about it...  but the double-querying doesn't seem to be a huge 
problem -- the delays I'm talking about are in the spelling portion of the 
initial query.  Just for the sake of completeness, here are both of my spelling 
field types:

<!-- Basic Text Field for use with Spell Correction -->
<fieldType name="textSpell" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="schema.UnicodeNormalizationFilterFactory"
        version="icu4j" composed="false" remove_diacritics="true"
        remove_modifiers="true" fold="true"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
<!-- More advanced spell checking field. -->
<fieldType name="textSpellShingle" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
        outputUnigrams="false"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
        outputUnigrams="false"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

...and here are the fields:

   <field name="spelling" type="textSpell" indexed="true" stored="true"/>
   <field name="spellingShingle" type="textSpellShingle" indexed="true"
       stored="true" multiValued="true"/>

As you can probably guess, I'm using spelling in my main query and 
spellingShingle in my supplemental query.

Here are stats on the spelling field:

{field=spelling,memSize=107830314,tindexSize=249184,time=25747,phase1=25150,nTerms=1343061,bigTerms=231,termInstances=40960454,uses=1}

(I obtained these numbers by temporarily adding the spelling field as a facet 
to my warming query -- probably not a very smart way to do it, but it was the 
only way I could figure out!  If there's a more elegant and accurate approach, 
I'd be interested to know what it is.)

I should also note that my basic spelling index is 114MB and my shingled 
spelling index is 931MB -- not outrageously large.  Is there a way to persuade 
Solr to load these into memory for faster performance?

thanks,
Demian

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Monday, June 06, 2011 6:23 PM
 To: solr-user@lucene.apache.org
 Subject: Re: SpellCheckComponent performance
 
 Hmmm, how are you configuring your spell checker? The first-time
 slowdown
 is probably due to cache warming, but subsequent 500 ms slowdowns
 seem odd. How many unique terms are there in your spellcheck index?
 
 It'd probably be best if you showed us your fieldtype and field
 definition...
 
 Best
 Erick
 
 On Mon, Jun 6, 2011 at 4:04 PM, Demian Katz demian.k...@villanova.edu
 wrote:
  I'm continuing to work on tuning my Solr server, and now I'm noticing
 that my biggest bottleneck is the SpellCheckComponent.  This is eating
 multiple seconds on most first-time searches, and still taking around
 500ms even on cached searches.  Here is my configuration:
 
   <searchComponent name="spellcheck"
       class="org.apache.solr.handler.component.SpellCheckComponent">
     <lst name="spellchecker">
       <str name="name">basicSpell</str>
       <str name="field">spelling</str>
       <str name="accuracy">0.75</str>
       <str name="spellcheckIndexDir">./spellchecker</str>
       <str name="queryAnalyzerFieldType">textSpell</str>
       <str name="buildOnOptimize">true</str>
     </lst>
   </searchComponent>
 
  I've done a bit of searching, but the best advice I could find for
 making the search component go faster involved reducing
 spellcheck.maxCollationTries, which doesn't even seem to apply to my
 settings.
 
  Does anyone have any advice on tuning this aspect of my
 configuration?  Are there any extra debug settings that might give
 deeper insight into how the component is spending its time?
 
  thanks,
  Demian
 


Re: [ANNOUNCEMENT] PHP Solr Extension 1.0.1 Stable Has Been Released

2011-06-07 Thread roySolr
Hello,

I have some problems with the installation of the new PECL package
solr-1.0.1.

I run these commands:

pecl uninstall solr-beta (to uninstall the old version, 0.9.11)
pecl install solr

The install runs, but then it gives the following error message:

/tmp/tmpKUExET/solr-1.0.1/solr_functions_helpers.c: In function
'solr_json_to_php_native':
/tmp/tmpKUExET/solr-1.0.1/solr_functions_helpers.c:1123: error: too many
arguments to function 'php_json_decode'
make: *** [solr_functions_helpers.lo] Error 1
ERROR: `make' failed

I have PHP version 5.2.17.

How can I fix this?






Re: java.lang.AbstractMethodError at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)

2011-06-07 Thread idivad
Finally figured out the problem.



Solr Cloud Query Question

2011-06-07 Thread Jamie Johnson
I am currently experimenting with the Solr Cloud code on trunk and just had
a quick question. Let's say my setup had 3 nodes a, b and c. Node a has
1000 results which meet a particular query, b has 2000 and c has 3000. When
executing this query and asking for row 900, what specifically happens? From
reading the Distributed Search wiki I would expect that node a responds with
900, node b responds with 900 and c responds with 900, and the coordinating
node is responsible for taking the top-scored items and throwing away the
rest. Is this correct, or is there some additional coordination that happens
where nodes a, b and c return back an id and a score, and the coordinating
node makes an additional request to get back the documents for the ids which
make up the top list?


Re: Solr Cloud Query Question

2011-06-07 Thread Yonik Seeley
On Tue, Jun 7, 2011 at 9:35 AM, Jamie Johnson jej2...@gmail.com wrote:
 I am currently experimenting with the Solr Cloud code on trunk and just had
 a quick question. Let's say my setup had 3 nodes a, b and c. Node a has
 1000 results which meet a particular query, b has 2000 and c has 3000. When
 executing this query and asking for row 900, what specifically happens? From
 reading the Distributed Search wiki I would expect that node a responds with
 900, node b responds with 900 and c responds with 900, and the coordinating
 node is responsible for taking the top-scored items and throwing away the
 rest. Is this correct, or is there some additional coordination that happens
 where nodes a, b and c return back an id and a score, and the coordinating
 node makes an additional request to get back the documents for the ids which
 make up the top list?

The latter is correct - the first phase only collects enough
information to merge ids from the shards, and then a second phase
requests the stored fields, highlighting, etc for the specific docs
that will be returned.

-Yonik
http://www.lucidimagination.com
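
(A hedged illustration of the two phases as they show up in shard request
logs; the values here are invented, but isShard, fl=id,score and ids are the
parameters distributed search actually uses:

phase 1:  .../select?q=...&isShard=true&rows=900&fl=id,score
phase 2:  .../select?q=...&isShard=true&ids=docA12,docB7,...)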


Re: function queries scope

2011-06-07 Thread Yonik Seeley
One way is to use the boost qparser:
http://search-lucene.com/jd/solr/org/apache/solr/search/BoostQParserPlugin.html
q={!boost b=productValueField}shops in madrid

Or you can use the edismax parser, which has a "boost" parameter that
does the same thing:
defType=edismax&q=shops in madrid&boost=productValueField


-Yonik
http://www.lucidimagination.com


On Tue, Jun 7, 2011 at 6:53 AM, Marco Martinez
mmarti...@paradigmatecnologico.com wrote:
 Hi,

 I need to use the function query operations with the score of a given
 query, but only on the docset that I get from the query, and I don't know if
 this is possible.

 Example:

 q=shops in madrid  returns > 1 docs with a specific score for each doc

 but now i need to do some stuff like

 q=sum(product(2,query(shops in madrid),productValueField)) but this will
 return all the docs in my index.


 I know that I can do it via filter queries, e.g. q=sum(product(2,query(shops
 in madrid),productValueField))&fq=shops in madrid, but this will do the query
 two times and I don't want this because performance is important to our
 application.


 Is there another approach to accomplish that?


 Thanks in advance,

 Marco Martínez Bautista
 http://www.paradigmatecnologico.com
 Avenida de Europa, 26. Ática 5. 3ª Planta
 28224 Pozuelo de Alarcón
 Tel.: 91 352 59 42



Re: function queries scope

2011-06-07 Thread Marco Martinez
Thanks, but it's not what I'm looking for, because the BoostQParserPlugin
multiplies the score of the query with the function queries defined in the b
param of the BoostQParserPlugin, and I can't use the edismax because we have
our own qparser. It seems that I have to code another qparser.


Thanks Yonik anyway,

Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª Planta
28224 Pozuelo de Alarcón
Tel.: 91 352 59 42


2011/6/7 Yonik Seeley yo...@lucidimagination.com

 One way is to use the boost qparser:

 http://search-lucene.com/jd/solr/org/apache/solr/search/BoostQParserPlugin.html
 q={!boost b=productValueField}shops in madrid

 Or you can use the edismax parser, which has a "boost" parameter that
 does the same thing:
 defType=edismax&q=shops in madrid&boost=productValueField


 -Yonik
 http://www.lucidimagination.com


 On Tue, Jun 7, 2011 at 6:53 AM, Marco Martinez
 mmarti...@paradigmatecnologico.com wrote:
  Hi,
 
  I need to use the function query operations with the score of a given
  query, but only on the docset that I get from the query, and I don't know
  if this is possible.
 
  Example:
 
  q=shops in madrid  returns > 1 docs with a specific score for each doc
 
  but now i need to do some stuff like
 
  q=sum(product(2,query(shops in madrid),productValueField)) but this will
  return all the docs in my index.
 
 
  I know that I can do it via filter queries, e.g.
  q=sum(product(2,query(shops in madrid),productValueField))&fq=shops in madrid,
  but this will do the query two times and I don't want this because
  performance is important to our application.
 
 
  Is there another approach to accomplish that?
 
 
  Thanks in advance,
 
  Marco Martínez Bautista
  http://www.paradigmatecnologico.com
  Avenida de Europa, 26. Ática 5. 3ª Planta
  28224 Pozuelo de Alarcón
  Tel.: 91 352 59 42
 



RE: SpellCheckComponent performance

2011-06-07 Thread Dyer, James
Demian,

If you omit spellcheckIndexDir from the configuration, it will create an 
in-memory spelling dictionary.  

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311
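
For reference, a hedged version of the earlier config with only the
directory omitted:

  <searchComponent name="spellcheck"
      class="org.apache.solr.handler.component.SpellCheckComponent">
    <lst name="spellchecker">
      <str name="name">basicSpell</str>
      <str name="field">spelling</str>
      <str name="accuracy">0.75</str>
      <!-- no spellcheckIndexDir: the dictionary is built in RAM -->
      <str name="queryAnalyzerFieldType">textSpell</str>
      <str name="buildOnOptimize">true</str>
    </lst>
  </searchComponent>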


-Original Message-
From: Demian Katz [mailto:demian.k...@villanova.edu] 
Sent: Tuesday, June 07, 2011 7:59 AM
To: solr-user@lucene.apache.org
Subject: RE: SpellCheckComponent performance

As I may have mentioned before, VuFind is actually doing two Solr queries for 
every search -- a base query that gets basic spelling suggestions, and a 
supplemental spelling-only query that gets shingled spelling suggestions.  If 
there's a way to get two different spelling responses in a single query, I'd 
love to hear about it...  but the double-querying doesn't seem to be a huge 
problem -- the delays I'm talking about are in the spelling portion of the 
initial query.  Just for the sake of completeness, here are both of my spelling 
field types:

<!-- Basic Text Field for use with Spell Correction -->
<fieldType name="textSpell" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="schema.UnicodeNormalizationFilterFactory"
        version="icu4j" composed="false" remove_diacritics="true"
        remove_modifiers="true" fold="true"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
        generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
<!-- More advanced spell checking field. -->
<fieldType name="textSpellShingle" class="solr.TextField"
    positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
        outputUnigrams="false"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
        outputUnigrams="false"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

...and here are the fields:

   <field name="spelling" type="textSpell" indexed="true" stored="true"/>
   <field name="spellingShingle" type="textSpellShingle" indexed="true"
       stored="true" multiValued="true"/>

As you can probably guess, I'm using spelling in my main query and 
spellingShingle in my supplemental query.

Here are stats on the spelling field:

{field=spelling,memSize=107830314,tindexSize=249184,time=25747,phase1=25150,nTerms=1343061,bigTerms=231,termInstances=40960454,uses=1}

(I obtained these numbers by temporarily adding the spelling field as a facet 
to my warming query -- probably not a very smart way to do it, but it was the 
only way I could figure out!  If there's a more elegant and accurate approach, 
I'd be interested to know what it is.)

I should also note that my basic spelling index is 114MB and my shingled 
spelling index is 931MB -- not outrageously large.  Is there a way to persuade 
Solr to load these into memory for faster performance?

thanks,
Demian

 -Original Message-
 From: Erick Erickson [mailto:erickerick...@gmail.com]
 Sent: Monday, June 06, 2011 6:23 PM
 To: solr-user@lucene.apache.org
 Subject: Re: SpellCheckComponent performance
 
 Hmmm, how are you configuring your spell checker? The first-time
 slowdown
 is probably due to cache warming, but subsequent 500 ms slowdowns
 seem odd. How many unique terms are there in your spellcheck index?
 
 It'd probably be best if you showed us your fieldtype and field
 definition...
 
 Best
 Erick
 
 On Mon, Jun 6, 2011 at 4:04 PM, Demian Katz demian.k...@villanova.edu
 wrote:
  I'm continuing to work on tuning my Solr server, and now I'm noticing
 that my biggest bottleneck is the SpellCheckComponent.  This is eating
 multiple seconds on most first-time searches, and still taking around
 500ms even on cached searches.  Here is my configuration:
 
   <searchComponent name="spellcheck"
       class="org.apache.solr.handler.component.SpellCheckComponent">
     <lst name="spellchecker">
       <str name="name">basicSpell</str>
       <str name="field">spelling</str>
       <str name="accuracy">0.75</str>
       <str name="spellcheckIndexDir">./spellchecker</str>
       <str name="queryAnalyzerFieldType">textSpell</str>
       <str name="buildOnOptimize">true</str>
     </lst>
   </searchComponent>
 
  I've done a bit of searching, but the best advice I could find for
 making the search component go faster involved reducing
 

Re: Nullpointer Exception in Solr 4.x in DebugComponent when using wildcard in facet value

2011-06-07 Thread Stefan Moises

Hi Yonik,

thanks, it's working in trunk now again... I had to re-index, though,
because of exceptions at startup. Did the index format change again
between trunk of early/mid May and the current trunk?


best regards,
Stefan

Am 03.06.2011 15:32, schrieb Yonik Seeley:

This bug was introduced during the cutover from strings to BytesRef on
TermRangeQuery.
I just committed a fix.

-Yonik
http://www.lucidimagination.com

On Fri, Jun 3, 2011 at 5:42 AM, Stefan Moises moi...@shoptimax.de wrote:

Hi,

in Solr 4.x (trunk version of mid May) I have noticed a NullPointerException
if I activate debugging (debug=true) and use a wildcard to filter
by facet value, e.g.
if I have a price field

...&debug=true&facet.field=price&fq=price:[500+TO+*]
I get

SEVERE: java.lang.RuntimeException: java.lang.NullPointerException
at
org.apache.solr.search.QueryParsing.toString(QueryParsing.java:538)
at
org.apache.solr.handler.component.DebugComponent.process(DebugComponent.java:77)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:239)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1298)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:465)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:852)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.NullPointerException
at
org.apache.solr.search.QueryParsing.toString(QueryParsing.java:402)
at
org.apache.solr.search.QueryParsing.toString(QueryParsing.java:535)

This used to work in Solr 1.4 and I was wondering if it's a bug or a new
feature and if there is a trick to get this working again?

Best regards,
Stefan







--
With best regards from Nürnberg,
Stefan Moises

***
Stefan Moises
Senior Software Developer

shoptimax GmbH
Guntherstraße 45 a
90461 Nürnberg
Amtsgericht Nürnberg HRB 21703
GF Friedrich Schreieck

Tel.: 0911/25566-25
Fax:  0911/25566-29
moi...@shoptimax.de
http://www.shoptimax.de
***




Re: Debugging a Solr/Jetty Hung Process

2011-06-07 Thread Chris Cowan
OK... The fix I thought would fix it didn't fix it (which was to use the 
commitWithin feature). What I can gather from `ps` is that the thread has pages 
locked in memory. Currently I'm using native locking for Solr. Would switching 
to simple help alleviate this problem?

Chris

On Jun 4, 2011, at 2:48 PM, Chris Cowan wrote:

 I found this thread that looks similar to what's happening on my system. I 
 think what happens is there are multiple commits happening at once from the 
 clients and it's causing the same issue. I'm going to use the commitWithin 
 argument to the updates to see if that fixes the problem. I will report back 
 with any findings.
 
 Chris
 
 On Jun 1, 2011, at 12:42 PM, Jonathan Rochkind wrote:
 
 First guess (and it really is just a guess) would be Java garbage
 collection taking over. There are some JVM parameters you can use to
 tune the GC process; especially if the machine is multi-core, making
 sure GC happens in a separate thread is helpful.
 
 But figuring out exactly what's going on requires confusing JVM 
 debugging of which I am no expert at either.
 
 On 6/1/2011 3:04 PM, Chris Cowan wrote:
 About once a day a Solr/Jetty process gets hung on my server consuming 100% 
 of one of the CPU's. Once this happens the server no longer responds to 
 requests. I've looked through the logs to try and see if anything stands 
 out but so far I've found nothing out of the ordinary.
 
 My current remedy is to log in and just kill the single process that's
 hung. Once that happens everything goes back to normal and I'm good for a
 day or so. I'm currently running the following:
 
 solr-jetty-1.4.0+ds1-1ubuntu1
 
 which is comprised of
 
 Solr 1.4.0
 Jetty 6.1.22
 on Ubuntu 10.10
 
 I'm pretty new to managing a Jetty/Solr instance so at this point I'm just 
 looking for advice on how I should go about troubleshooting this problem.
 
 Chris
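
(On the GC guess above, a hedged example of flags for a multi-core box on
the Sun JVM of that era; the heap size is illustrative only:

java -server -Xmx1024m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
     -verbose:gc -Xloggc:gc.log -jar start.jar

A thread dump of the hung process, e.g. jstack <pid>, would also show
whether the spinning thread is a GC thread or a request thread.)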
 



Re: Default query parser operator

2011-06-07 Thread Brian Lamb
I feel like this should be fairly easy to do but I just don't see anywhere
in the documentation on how to do this. Perhaps I am using the wrong search
parameters.

On Mon, Jun 6, 2011 at 12:19 PM, Brian Lamb
brian.l...@journalexperts.com wrote:

 Hi all,

 Is it possible to change the query parser operator for a specific field
 without having to explicitly type it in the search field?

 For example, I'd like to use:

 http://localhost:8983/solr/search/?q=field1:word token field2:parser
 syntax

 instead of

 http://localhost:8983/solr/search/?q=field1:word AND token field2:parser
 syntax

 But, I only want it to be applied to field1, not field2 and I want the
 operator to always be AND unless the user explicitly types in OR.

 Thanks,

 Brian Lamb



Solr Custom Installation

2011-06-07 Thread Federico Czerwinski
Hey there. I was wondering if Solr can be embedded into my Java web app. As
far as I know, Solr comes as a war, or bundled with Jetty if you don't have a
container. I've opened the war's web.xml and found out that it only has a
couple of servlets and filters, and that's it.

So, would it be possible to declare those servlets in *my* web.xml, and
include the appropriate jars in my classpath, instead of having another
webapp deployed in the container? Does Solr have the jars mavenized?

Thank you

Fede.


Re: How do I make sure the resulting documents contain the query terms?

2011-06-07 Thread Jonathan Rochkind
Um, normally that would never happen, because, well, like you say, the 
inverted index doesn't have docC for term K1, because doc C didn't 
include term K1.


If you search on q=K1, then how/why would docC ever be in your result 
set?  Are you seeing it in your result set? The question then would be 
_why_, what weird thing is going on to make that happen,  that's not 
expected.


The result set _starts_ from only the documents that actually include
the term. Boosting/relevancy ranking only affects what order these
documents appear in, but there's no reason document C should be in the
result set at all in your case of q=k1, where doc C is not indexed under k1.


On 6/7/2011 2:35 AM, Gabriele Kahlout wrote:

Sorry for being unclear, and thank you for answering.
Consider the following documents A(k0,k1,k2), B(k1,k2,k3), and C(k0,k2,k3),
where A,B,C are document identifiers and the ks in bracket with each are the
terms each contains.
So Solr inverted index should be something like:

k0 --> A | C
k1 --> A | B
k2 --> A | B | C
k3 --> B | C

Now let q=k1, how do I make sure C doesn't appear as a result since it
doesn't contain any occurrence of k1?


Re: Default query parser operator

2011-06-07 Thread Jonathan Rochkind

Nope, not possible.

I'm not even sure what it would mean semantically. If you had default 
operator OR ordinarily, but default operator AND just for field2, 
then what would happen if you entered:


field1:foo field2:bar field1:baz field2:bom

Where the heck would the ANDs and ORs go?  The operators are BETWEEN the 
clauses that specify fields, they don't belong to a field. In general, 
the operators are part of the query as a whole, not any specific field.


In fact, I'd be careful of your example query:
q=field1:foo bar field2:baz

I don't think that means what you think it means; I don't think the
field1 applies to the bar in that case. Although I could be wrong,
you definitely want to check it. You need field1:foo field1:bar,
or set the default field for the query to field1, or use parens
(although that will change the execution strategy and ranking):
q=field1:(foo bar)


At any rate, even if there's a way to specify this so it makes sense, 
no, Solr/lucene doesn't support any such thing.
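
What is supported is changing the default operator for the whole query, not
per field; a hedged example with the standard parser's q.op parameter:

http://localhost:8983/solr/search/?q=field1:word token&q.op=AND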




On 6/7/2011 10:56 AM, Brian Lamb wrote:

I feel like this should be fairly easy to do but I just don't see anywhere
in the documentation on how to do this. Perhaps I am using the wrong search
parameters.

On Mon, Jun 6, 2011 at 12:19 PM, Brian Lamb
brian.l...@journalexperts.com wrote:


Hi all,

Is it possible to change the query parser operator for a specific field
without having to explicitly type it in the search field?

For example, I'd like to use:

http://localhost:8983/solr/search/?q=field1:word token field2:parser
syntax

instead of

http://localhost:8983/solr/search/?q=field1:word AND token field2:parser
syntax

But, I only want it to be applied to field1, not field2 and I want the
operator to always be AND unless the user explicitly types in OR.

Thanks,

Brian Lamb



Re: Solr Custom Installation

2011-06-07 Thread Tomás Fernández Löbbe
Hi Federico, you can take a look at this wiki page:
http://wiki.apache.org/solr/EmbeddedSolr
Solr also has some Maven support;
see the ant target generate-maven-artifacts, don't know if that's what you
need.
Regards,
Tomás
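
A minimal sketch of the embedded route from that wiki page, assuming the
SolrJ 1.4/3.x API; the solr home path and core name are placeholders:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.core.CoreContainer;

public class EmbeddedExample {
    public static void main(String[] args) throws Exception {
        // solr home must contain solr.xml and the core's conf/ directory
        System.setProperty("solr.solr.home", "/path/to/solr/home");
        CoreContainer container = new CoreContainer.Initializer().initialize();
        SolrServer server = new EmbeddedSolrServer(container, "core1");

        // match-all query just to prove the core is up
        System.out.println(server.query(new SolrQuery("*:*")).getResults().getNumFound());

        container.shutdown();
    }
}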

On Tue, Jun 7, 2011 at 12:17 PM, Federico Czerwinski fed...@gmail.com wrote:

 Hey there. I was wondering if Solr can be embedded into my Java Web App. As
 far as I know, Solr comes as a war or bundled with Jetty if you don't have
 a
 container. I've opened the war's web.xml and found out that it only has a
 couple of servlets, filters and that's it.

 So, would it be possible to declare those servlets in *my* web.xml, and
 include the appropriate jars in my classpath, instead of having another
 webapp deployed in the container? Does Solr have the jars mavenized?

 Thank you

 Fede.



Re: How do I make sure the resulting documents contain the query terms?

2011-06-07 Thread Gabriele Kahlout
You are right, Lucene will return results based on my scoring function
implementation (the Similarity class,
http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/Similarity.html):

score(q,d) = coord(q,d) · queryNorm(q) · Σ_{t in q} ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )
It can be seen that whenever tf(t in d) =0 the whole score will be 0, so as
you say C will never be returned.

My issue is when the query has multiple terms (my example was too simple!),
and some are 'mandatory' while others are not. In that case I should make a
query that uses the '+' operator
(http://lucene.apache.org/java/2_9_1/queryparsersyntax.html), e.g. q=+k1.
I'm unsure I'll get the syntax right, but let's say k1 is mandatory and
k2 and k3 are optional; then q=k2 k3 +k1. I see that queries made through
solrj are received with + in place of the ' ' (default to OR), so
q=k2+k3++k1.



On Tue, Jun 7, 2011 at 5:23 PM, Jonathan Rochkind rochk...@jhu.edu wrote:

 Um, normally that would never happen, because, well, like you say, the
 inverted index doesn't have docC for term K1, because doc C didn't include
 term K1.

 If you search on q=K1, then how/why would docC ever be in your result set?
  Are you seeing it in your result set? The question then would be _why_,
 what weird thing is going on to make that happen,  that's not expected.

  The result set _starts_ from only the documents that actually include the
  term. Boosting/relevancy ranking only affects what order these documents
  appear in, but there's no reason document C should be in the result set at
  all in your case of q=k1, where doc C is not indexed under k1.


 On 6/7/2011 2:35 AM, Gabriele Kahlout wrote:

 Sorry for being unclear, and thank you for answering.
 Consider the following documents A(k0,k1,k2), B(k1,k2,k3), and
 C(k0,k2,k3),
 where A,B,C are document identifiers and the ks in bracket with each are
 the
 terms each contains.
 So Solr inverted index should be something like:

  k0 --> A | C
  k1 --> A | B
  k2 --> A | B | C
  k3 --> B | C

  Now let q=k1, how do I make sure C doesn't appear as a result since it
  doesn't contain any occurrence of k1?




-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains [LON] or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧
time(x) < Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with X.
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).


Re: How do I make sure the resulting documents contain the query terms?

2011-06-07 Thread Jonathan Rochkind
Okay, if you're using a custom similarity, I'm not sure what's going on, 
I'm not familiar with that.


But ordinarily, you are right, you would require k1 with +k1.

What you say about the + being lost suggests something is going wrong. 
Either you are not sending your query to Solr properly escaped, or 
there's a bug in your custom similarity or query parser, or (not too 
likely) there's a bug in Solr.


My experience is using the standard query parser, standard similarity
class, and contacting Solr via HTTP (are you using SolrJ or HTTP?). In
that case, when you send the q to Solr, you are responsible for
URI-encoding it when you send it. So if you want to send a query like
k2 k3 +k1, you need to URI-escape it first, and this is what you'd send:


q=k2+k3+%2Bk1

or, escaping spaces as %20 instead, which is actually more 'correct' 
with current standards:


q=k2%20k3%20%2Bk1

The important thing is that + escapes as %2B. You need to escape it
before sending it to Solr via an HTTP URI query string or HTTP form post
data. Yes, if you send a raw +, Solr will understand that as
representing a space, not an actual +. This is because the +
character is not 'safe'; it needs to be escaped. The programming
language of your choice probably already has a library function for
URI-escaping values.
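
For example, a small Java sketch (URLEncoder does form-encoding, which is
what Solr's HTTP endpoint accepts for query parameters):

import java.net.URLEncoder;

public class EncodeQuery {
  public static void main(String[] args) throws Exception {
    String q = "k2 k3 +k1";
    // Form-encoding: each space becomes '+', the literal '+' becomes %2B.
    System.out.println("q=" + URLEncoder.encode(q, "UTF-8"));
    // prints: q=k2+k3+%2Bk1
  }
}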


On 6/7/2011 11:36 AM, Gabriele Kahlout wrote:

You are right: Lucene will rank based on my scoring function
implementation (the Similarity class,
http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/Similarity.html):

score(q,d) = coord(q,d) · queryNorm(q) · Σ over t in q of
             ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )

(each factor is documented on the Similarity javadoc page linked above)
It can be seen that for a single-term query, whenever tf(t in d) = 0 the
whole score will be 0, so as you say C will never be returned.

My issue is when the query has multiple terms (my example was too simple!)
and some are 'mandatory' while others are not. In that case I should make a
query that uses the + operator
(http://lucene.apache.org/java/2_9_1/queryparsersyntax.html), e.g.
q=+k1.
I'm unsure I'll get the syntax right, but let's say k1 is mandatory and
k2 and k3 are optional; then q=k2 k3 +k1. I see that queries made through
SolrJ are received with + in place of the space (which defaults to OR), so
q=k2+k3++k1.








Data not always returned

2011-06-07 Thread Jerome Renard
Hi all,

I have a problem with my index. Even though I always index the same
data over and over again, whenever I run a couple of searches (they are
always the same, as they are issued by a unit test suite) I do not get the
same results: sometimes I get 3 successes and 2 failures, and sometimes it
is the other way around. It is unpredictable.

Here is what I am trying to do:

I created a new Solr core with its specific solrconfig.xml and schema.xml
This core stores a list of towns which I plan to use with an
auto-suggestion system, using ngrams (no Suggester)

The indexing process is always the same:
1. the import script deletes all documents in the core with
<delete><query>*:*</query></delete> and a <commit/>
2. the import script fetches data from postgres, 100 rows at a time
3. the import script adds these 100 documents and sends a <commit/>
4. once all the rows (around 40 000) have been imported, the script
sends an <optimize/> (roughly the cycle sketched in the SolrJ code below)
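
For reference, a minimal SolrJ sketch of that same cycle (an assumption on
my part: the real script may well be in another language, and the core URL,
field names and row values here are made up):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class TownIndexer {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr/towns");
    server.deleteByQuery("*:*");  // 1. empty the core
    server.commit();
    for (int offset = 0; offset < 40000; offset += 100) {
      List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
      for (int i = 0; i < 100; i++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", offset + i);                // hypothetical fields;
        doc.addField("name", "town-" + (offset + i));  // real values come from postgres
        batch.add(doc);
      }
      server.add(batch);  // 2-3. add the 100 documents
      server.commit();    //      and commit
    }
    server.optimize();    // 4. optimize once everything is in
  }
}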

Here is what happens:
I run the indexer once and search for 'foo': I get the results I expect, but
if I search for 'bar' I get nothing.
I reindex once again and search for 'foo': I get nothing, but if I
search for 'bar' I get results.
The search is made on the name field, which is a pretty common
TextField with ngrams.

I tried to physically remove the index (rm -rf path/to/index) and
reindex everything as well, and
not all searches work: sometimes the 'foo' search works, sometimes the
'bar' one.

I tried a lot of different things but now I am running out of ideas.
This is why I am asking for help.

Some useful informations :
Solr version : 3.1.0
Solr Implementation Version: 3.1.0 1085815 - grantingersoll -
2011-03-26 18:00:07
Lucene Implementation Version: 3.1.0 1085809 - 2011-03-26 18:06:58
Java 1.5.0_24 on Mac Os X
solrconfig.xml and schema.xml are attached

Thanks in advance for your help.




Question about tokenizing, searching and retrieving results.

2011-06-07 Thread Luis Cappa Banda
Hello!

My problem is as follows: I've got a field (indexed and stored set to
true) tokenized by whitespace and other patterns, with a gap of value
100. For example, if I index the following expression in the field I
mentioned:

*Expression*: A B C D E  ->  *Index*: tokenA | tokenB | tokenC | tokenD | tokenE

This behaviour is replicated in the search context, so any content
associated with this field during a search will be tokenized as I
explained. If I search the whole expression, the indexed document is
returned correctly as expected, but if I search something like:

*Expression*: A B C D E F G H I

it doesn't retrieve the document. What's happening? The expression should
match partially, and I thought that the document would be returned too. I
tried modifying the gap value but it doesn't work.


Thank you very much.


Re: Solr Cloud Query Question

2011-06-07 Thread Jamie Johnson
Thanks Yonik. I have a follow-on question: how does Solr ensure consistent
results across pages? For example, if we had my 3 theoretical Solr
instances again, and a, b and c each returned 100 documents with the same
score, and the user only requested 100 documents, how are those 100
documents chosen from the sets available from a, b and c if the documents
have the same score?

On Tue, Jun 7, 2011 at 9:38 AM, Yonik Seeley yo...@lucidimagination.com wrote:

 On Tue, Jun 7, 2011 at 9:35 AM, Jamie Johnson jej2...@gmail.com wrote:
  I am currently experimenting with the Solr Cloud code on trunk and just
  had a quick question. Let's say my setup had 3 nodes a, b and c. Node a
  has 1000 results which meet a particular query, b has 2000 and c has
  3000. When executing this query and asking for row 900, what specifically
  happens? From reading the Distributed Search wiki I would expect that
  node a responds with 900, node b responds with 900 and c responds with
  900, and the coordinating node is responsible for taking the top-scored
  items and throwing away the rest. Is this correct, or is there some
  additional coordination that happens where nodes a, b and c return back
  an id and a score, and the coordinating node makes an additional request
  to get back the documents for the ids which make up the top list?

 The latter is correct - the first phase only collects enough
 information to merge ids from the shards, and then a second phase
 requests the stored fields, highlighting, etc for the specific docs
 that will be returned.

 -Yonik
 http://www.lucidimagination.com



Re: Default query parser operator

2011-06-07 Thread Brian Lamb
Hi Jonathan,

Thank you for your reply. Your point about my example is a good one. So let
me try to restate using your example. Suppose I want to apply AND to any
search terms within field1.

Then

field1:foo field2:bar field1:baz field2:bom

would be written as

http://localhost:8983/solr/?q=field1:foo OR field2:bar OR field1:baz OR
field2:bom

But if they were written together like:

http://localhost:8983/solr/?q=field1:(foo baz) field2:(bar bom)

I would want it to be

http://localhost:8983/solr/?q=field1:(foo AND baz) OR field2:(bar OR bom)

But it sounds like you are saying that would not be possible.

Thanks,

Brian Lamb

On Tue, Jun 7, 2011 at 11:27 AM, Jonathan Rochkind rochk...@jhu.edu wrote:

 Nope, not possible.

 I'm not even sure what it would mean semantically. If you had default
 operator OR ordinarily, but default operator AND just for field2, then
 what would happen if you entered:

 field1:foo field2:bar field1:baz field2:bom

 Where the heck would the ANDs and ORs go?  The operators are BETWEEN the
 clauses that specify fields, they don't belong to a field. In general, the
 operators are part of the query as a whole, not any specific field.

 In fact, I'd be careful of your example query:
q=field1:foo bar field2:baz

 I don't think that means what you think it means, I don't think the
 field1 applies to the bar in that case. Although I could be wrong, but
 you definitely want to check it.  You need field1:foo field1:bar, or set
 the default field for the query to field1, or use parens (although that
 will change the execution strategy and ranking): q=field1:(foo bar)

 At any rate, even if there's a way to specify this so it makes sense, no,
 Solr/lucene doesn't support any such thing.




 On 6/7/2011 10:56 AM, Brian Lamb wrote:

 I feel like this should be fairly easy to do but I just don't see anywhere
 in the documentation on how to do this. Perhaps I am using the wrong
 search
 parameters.

 On Mon, Jun 6, 2011 at 12:19 PM, Brian Lamb
 brian.l...@journalexperts.com wrote:

  Hi all,

 Is it possible to change the query parser operator for a specific field
 without having to explicitly type it in the search field?

 For example, I'd like to use:

 http://localhost:8983/solr/search/?q=field1:word token field2:parser
 syntax

 instead of

 http://localhost:8983/solr/search/?q=field1:word AND token field2:parser
 syntax

 But, I only want it to be applied to field1, not field2 and I want the
 operator to always be AND unless the user explicitly types in OR.

 Thanks,

 Brian Lamb




Re: Question about tokenizing, searching and retrieving results.

2011-06-07 Thread Tomás Fernández Löbbe
My first guess would be that you are using AND as the default operator.
You can see the generated query by adding the parameter debugQuery=true.
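
For example (assuming the standard select handler):

http://localhost:8983/solr/select?q=A+B+C+D+E&debugQuery=true

The parsedquery section of the debug output shows exactly which clauses the
parser generated.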

On Tue, Jun 7, 2011 at 1:34 PM, Luis Cappa Banda luisca...@gmail.com wrote:



Re: Solr Cloud Query Question

2011-06-07 Thread Yonik Seeley
On Tue, Jun 7, 2011 at 1:01 PM, Jamie Johnson jej2...@gmail.com wrote:
 Thanks Yonik. I have a follow-on question: how does Solr ensure consistent
 results across pages? For example, if we had my 3 theoretical Solr
 instances again, and a, b and c each returned 100 documents with the same
 score, and the user only requested 100 documents, how are those 100
 documents chosen from the sets available from a, b and c if the documents
 have the same score?

Ties within a shard are broken by docid (just like lucene), and ties
across different shards are broken by comparing the shard ids... so
yes, it's consistent.

-Yonik
http://www.lucidimagination.com


Re: Default query parser operator

2011-06-07 Thread Jonathan Rochkind

There's no feature in Solr to do what you ask, no. I don't think.

On 6/7/2011 1:30 PM, Brian Lamb wrote:

Hi Jonathan,

Thank you for your reply. Your point about my example is a good one. So let
me try to restate using your example. Suppose I want to apply AND to any
search terms within field1.

Then

field1:foo field2:bar field1:baz field2:bom

would be written as

http://localhost:8983/solr/?q=field1:foo OR field2:bar OR field1:baz OR
field2:bom

But if they were written together like:

http://localhost:8983/solr/?q=field1:(foo baz) field2:(bar bom)

I would want it to be

http://localhost:8983/solr/?q=field1:(foo AND baz) OR field2:(bar OR bom)

But it sounds like you are saying that would not be possible.

Thanks,

Brian Lamb





Re: Question about tokenizing, searching and retrieving results.

2011-06-07 Thread Yonik Seeley
On Tue, Jun 7, 2011 at 12:34 PM, Luis Cappa Banda luisca...@gmail.com wrote:
 *Expression*: A B C D E F G H I

As written, this is equivalent to

*Expression*: A default_field:B default_field:C default_field:D
default_field:E default_field:F default_field:G default_field:H
default_field:I

Try *Expression*:(A B C D E F G H I)
or *Expression*:"A B C D E F G H I" for a phrase query.
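
A quick way to see the difference is to print what the Lucene query parser
(which the standard Solr parser builds on) produces. A sketch against
Lucene 3.x, with WhitespaceAnalyzer standing in for whatever analyzer the
field really uses:

import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.util.Version;

public class ParseDemo {
  public static void main(String[] args) throws Exception {
    QueryParser p = new QueryParser(Version.LUCENE_31, "default_field",
        new WhitespaceAnalyzer(Version.LUCENE_31));
    System.out.println(p.parse("Expression:A B C D E"));
    // -> Expression:A default_field:B default_field:C default_field:D default_field:E
    System.out.println(p.parse("Expression:(A B C D E)"));
    // -> Expression:A Expression:B Expression:C Expression:D Expression:E
    System.out.println(p.parse("Expression:\"A B C D E\""));
    // -> Expression:"A B C D E" (a single phrase query)
  }
}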

Oh, and I highly recommend sticking to java identifiers for field
names - it will make your life much easier in the future.

-Yonik
http://www.lucidimagination.com


Solr Cloud and Range Facets

2011-06-07 Thread Jamie Johnson
I have a Solr Cloud setup with 2 servers. When executing a query against
them of the form:

http://localhost:8983/solr/select/?distrib=true&q=*:*&facet=true&facet.mincount=1&facet.range=dateTime&f.dateTime.facet.range.gap=%2B1MONTH&f.dateTime.facet.range.start=2011-06-01T00%3A00%3A00Z-1YEAR&f.dateTime.facet.range.end=2011-07-01T00%3A00%3A00Z&f.dateTime.facet.mincount=1&start=0&rows=0

I am seeing that sometimes the date facet has a count, and other times it
does not. Specifically, I sometimes see:

<lst name="facet_ranges">
  <lst name="dateTime">
    <lst name="counts"/>
    <str name="gap">+1MONTH</str>
    <date name="start">2010-06-01T00:00:00Z</date>
    <date name="end">2011-07-01T00:00:00Z</date>
  </lst>
</lst>

and other times:

<lst name="facet_ranges">
  <lst name="dateTime">
    <lst name="counts">
      <int name="2011-06-01T00:00:00Z">250</int>
    </lst>
    <str name="gap">+1MONTH</str>
    <date name="start">2010-06-01T00:00:00Z</date>
    <date name="end">2011-07-01T00:00:00Z</date>
  </lst>
</lst>

What could be causing this inconsistency?


Compound word search not what I expected

2011-06-07 Thread kenf_nc
I have a field defined as:
<field name="content" type="text" indexed="true" stored="false"
       termVectors="true" multiValued="true"/>
where text is unmodified from the schema.xml example that came with Solr
1.4.1.

I have documents with some compound words indexed, words like Sandstone,
and in several cases words that are camel case, like MaxSize. If I query
using all lower case (sandstone or maxsize), I get the documents I expect.
If I query with proper case (Sandstone or Maxsize), I get the documents I
expect. However, if the query is camel case (MaxSize or SandStone), it
doesn't find the documents. In the case of MaxSize it is particularly
frustrating because that is the actual case of the word that was indexed.
Is this expected behavior? The query analyzer definition for the text field
type is:
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" ignoreCase="true" expand="true"
          synonyms="synonyms.txt"/>
  <filter class="solr.StopFilterFactory" enablePositionIncrements="true"
          words="stopwords.txt" ignoreCase="true"/>
  <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1"
          catenateAll="0" catenateNumbers="0" catenateWords="0"
          generateNumberParts="1" generateWordParts="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter language="English" class="solr.SnowballPorterFilterFactory"
          protected="protwords.txt"/>
</analyzer>

Is the order of the filters important? If LowerCaseFilterFactory came before
WordDelimiterFilterFactory, would that fix this? Would it break something
else?

Thanks,
Ken



Re: Compound word search not what I expected

2011-06-07 Thread Markus Jelsma
catenateWords should be set to true. Same goes for the index analyzer. 
preserveOriginal would also work.
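
For reference, the adjusted filter line might look something like this (a
sketch, untested; whether you also want catenateAll set depends on your
data):

<filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1"
        catenateAll="0" catenateNumbers="0" catenateWords="1"
        generateNumberParts="1" generateWordParts="1" preserveOriginal="1"/>

With preserveOriginal="1" the unmodified token (MaxSize) is kept alongside
the parts (Max, Size), so the camel-case form becomes searchable again after
re-indexing.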



How to deal with many files using solr external file field

2011-06-07 Thread Bohnsack, Sven
Hi all,

we're using Solr 1.4 and the external file field ([1]) for sorting our search
results. We have about 40,000 terms for which we use this sorting option.
Currently we're running into massive OutOfMemory problems and we're not quite
sure what the matter is. It seems that the garbage collector stops working or
some processes are going wild. However, Solr starts to allocate more and more
RAM until we experience this OutOfMemory exception.

We noticed the following:

For some terms one can see in the Solr log that some
java.io.FileNotFoundExceptions appear when Solr tries to load an external
file for a term for which there is no such file, e.g. Solr tries to load the
external score file for "trousers" but there is none in the /solr/data
folder.

Question: is it possible that those exceptions are responsible for the
OutOfMemory problem, or could it be due to the large(?) number of 40k terms
for which we want to sort the result via external file field?

I'm looking forward to your answers, suggestions and ideas :)


Regards
Sven


[1]: 
http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html


Available Solr Indexing strategies

2011-06-07 Thread zarni aung
Hi,

I am very new to Solr, and my client is trying to add full text search
capabilities to their product using Solr. They will also have a master
storage layer acting as the authoritative data store, which will also
serve metadata searches. Can you please point me to some indexing
strategies that people are using, for further research.

Thank you,

Zarni


Re: Data not always returned

2011-06-07 Thread Erick Erickson
Well, this is odd. Several questions:

1. What do your logs show? I'm wondering if somehow some data is getting
   rejected. I have no idea why that would be, but if you're seeing indexing
   exceptions that would explain it.
2. On the admin/stats page, are maxDocs and numDocs the same in the success
   and failure cases? And are they equal to 40,000?
3. What does debugQuery=on show in the two cases? I'd expect it to be
   identical, but...
4. In the admin/schema browser, look at your three fields and see if things
   like unique-terms are identical.
5. Are the rows being returned before indexing in the same order? I'm
   wondering if somehow you're getting documents overwritten by having the
   same id (uniqueKey).
6. Have you poked around with Luke to see what, if anything, is dissimilar?

These are shots in the dark, but my supposition is that somehow you're not
indexing what you expect; the questions above might give us a clue where to
look next.

Best
Erick

On Tue, Jun 7, 2011 at 12:02 PM, Jerome Renard jerome.ren...@gmail.com wrote:



Re: Compound word search not what I expected

2011-06-07 Thread Erick Erickson
WordDelimiterFilterFactory is doing this to you. It's not clear to me that
you want it in place at all.

Look at admin/analysis for that field to see how that filter breaks things
up; it's often surprising to people.

Best
Erick

On Tue, Jun 7, 2011 at 3:13 PM, kenf_nc ken.fos...@realestate.com wrote:



Re: Compound word search not what I expected

2011-06-07 Thread lee carroll
see
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

from the wiki

Example of generateWordParts=1 and catenateWords=1:
PowerShot -> 0:Power, 1:Shot, 1:PowerShot
(where 0,1,1 are token positions)
A's+B's&C's -> 0:A, 1:B, 2:C, 2:ABC
Super-Duper-XL500-42-AutoCoder! -> 0:Super, 1:Duper, 2:XL,
2:SuperDuperXL, 3:500, 4:42, 5:Auto, 6:Coder, 6:AutoCoder

One use for WordDelimiterFilter is to help match words with different
delimiters. One way of doing so is to specify generateWordParts=1
catenateWords=1 in the analyzer used for indexing, and
generateWordParts=1 in the analyzer used for querying. Given that
the current StandardTokenizer immediately removes many intra-word
delimiters, it is recommended that this filter be used after a
tokenizer that leaves them in place (such as WhitespaceTokenizer).


Re: Compound word search not what I expected

2011-06-07 Thread kenf_nc
I tried setting catenateWords=1 on the Query analyzer and that didn't do
anything. I think what I need is to set my Index Analyzer to have
preserveOriginal=1 and then re-index everything. That will be a pain, so
I'll do a small test to make sure first. I'm really surprised
preserveOriginal=1 isn't the default. It's like saying "slice and dice
this word so I can search on all kinds of partial matches... but do NOT let
me search on the actual word itself." I know it's not quite that, but it's
close. Anyway, I'm going to try the preserveOriginal parameter on
WordDelimiterFilterFactory, on both the Index and Query side and see what
happens.

Thanks for all the suggestions,
Ken



Re: Default query parser operator

2011-06-07 Thread lee carroll
Hi Brian could your front end app do this field query logic?

(assuming you have an app in front of solr)
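
(If it can, a minimal sketch of that rewrite; all of the glue code here is
hypothetical, and the only Solr-facing part is the final query string:)

import java.util.Arrays;
import java.util.List;

public class QueryBuilder {

  // Join field1 terms with AND, field2 terms with OR, then OR the clauses.
  static String build(List<String> field1Terms, List<String> field2Terms) {
    return "field1:(" + join(field1Terms, " AND ") + ")"
        + " OR field2:(" + join(field2Terms, " OR ") + ")";
  }

  static String join(List<String> terms, String op) {
    StringBuilder sb = new StringBuilder();
    for (String t : terms) {
      if (sb.length() > 0) sb.append(op);
      sb.append(t);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(build(Arrays.asList("foo", "baz"),
                             Arrays.asList("bar", "bom")));
    // -> field1:(foo AND baz) OR field2:(bar OR bom)
  }
}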



On 7 June 2011 18:53, Jonathan Rochkind rochk...@jhu.edu wrote:
 There's no feature in Solr to do what you ask, no. I don't think.






Solr Coldfusion Search Issue

2011-06-07 Thread Alejandro Delgadillo
Hi,

I'm having some trouble using Solr through ColdFusion. The problem right
now is that when I search for a term in a custom field, the results
sometimes have the value that I sent to the custom field and not to the
field that contains the text. This is the cfsearch syntax that I'm using:

<cfsearch collection="agenda,bitacoras"
 criteria='contents:#form.search# AND custom1:#form.tema# AND custom2:#form.dia# AND custom4:#form.anio# AND custom3:#form.mon#'
 name="result" status="meta" startrow="#url.start#" maxrows="#max#"
 contextpassages="5" contexthighlightbegin="<B>"
 contexthighlightend="</B>" suggestions="always">

Every custom field gets its value from a combo box or drop-down with a list
of options. The thing is that when the user sends a search for CUSTOM1,
sometimes the results include the same searched value in CONTENTS...

Does anyone have an idea on how to fix this?

I'll appreciate all the help I can get.

Regards.
Alex


Re: Solr Coldfusion Search Issue

2011-06-07 Thread lee carroll
Can you see the query actually presented to Solr in the logs?

Maybe capture that and then run it with debug set to true in the admin
pages.

Sorry I can't help directly with your syntax.


On 7 June 2011 23:06, Alejandro Delgadillo adelgadi...@febg.org wrote:



Re: Compound word search not what I expected

2011-06-07 Thread Markus Jelsma
You must set catenateWords at index time as well.



wildcard search

2011-06-07 Thread Thomas Fischer
Hello,

I am testing Solr 3.2 and have problems with wildcards.
I am indexing values like IA 300; IC 330; IA 317; IA 318 in a field GOK,
and can't find a way to search them with wildcards.
I want to use a wildcard search to match something like IA 31? but cannot
find a way to do so.
GOK:IA\ 38* doesn't work with the contents of GOK indexed as text.
Is there a way to index and search that would meet my requirements?

Thomas




Re: Solr Coldfusion Search Issue

2011-06-07 Thread Alejandro Delgadillo
Thanks Lee for the quick response.

Let me explain it a little bit better.

In the CFSEARCH tag you use the CRITERIA attribute. What it does by default
is send the user's search query via POST to Solr, against the field where
the text is stored; in this case, since I'm indexing PDF files, the
CONTENTS variable in Solr...

The problem is that it also sends the custom field criteria to the contents
variable, and that's why I have, for example:

If I search for the value 03 in CUSTOM1, it also searches for the same
value in CONTENTS. It works, since the results are filtered by the value,
but the contents display the same value, in this case 03.

Maybe... I'm not sure... there is another way to search custom fields using
the CFSEARCH tag. I've tried changing the order, but I still get the same
result...


On 6/7/11 4:14 PM, lee carroll lee.a.carr...@googlemail.com wrote:





Re: wildcard search

2011-06-07 Thread Erick Erickson
Yes there is, but you haven't provided enough information to
make a suggestion. What is the fieldType definition? What is
the field definition?

Two resources that'll help you greatly are:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

and the admin/analysis page...

Best
Erick

On Tue, Jun 7, 2011 at 6:23 PM, Thomas Fischer fischer...@aon.at wrote:





400 MB Fields

2011-06-07 Thread Otis Gospodnetic
Hello,

What are the biggest document fields that you've ever indexed in Solr or that 
you've heard of?  Ah, it must be Tom's Hathi trust. :)

I'm asking because I just heard of a case of an index where some documents
have a field that can be around 400 MB in size! I'm curious if anyone has
any experience with such monster fields?
Crazy?  Yes, sure.
Doable?

Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



Re: 400 MB Fields

2011-06-07 Thread Erick Erickson
From older (2.4) Lucene days, I once indexed the 23 volume Encyclopedia
of Michigan Civil War Volunteers in a single document/field, so it's probably
within the realm of possibility at least G...

Erick

On Tue, Jun 7, 2011 at 6:59 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:




Re: 400 MB Fields

2011-06-07 Thread Fuad Efendi
I think the question is strange... Maybe you are wondering about possible
OOM exceptions? I think we can pass Lucene a single document containing a
comma-separated list of term, term, ... (a few billion times)... except for
stored fields and TermVectorComponent...

I believe thousands of companies have already indexed millions of documents
with an average size of a few hundred MBytes... There should not be any
limits (except InputSource vs. ByteArray).

100,000 _unique_ terms vs. a single document containing 100,000,000,000,000
non-unique terms (and trying to store offsets)...

What about the spell checker feature? Has anyone tried to index a single
terabyte-sized document?

Personally, I have indexed only small (up to 1000 bytes) document fields,
but I believe 500 MB is a very common use case with PDFs (which vendors use
Lucene already? Eclipse, to index the Eclipse help file? Even Microsoft
uses Lucene...)


Fuad




On 11-06-07 7:02 PM, Erick Erickson erickerick...@gmail.com wrote:






Re: 400 MB Fields

2011-06-07 Thread Otis Gospodnetic
Hi,


 I think the question is strange... Maybe you are wondering about possible
 OOM exceptions?

No, that's an easier one. I was more wondering whether with 400 MB fields
(indexed, not stored) it becomes incredibly slow to:
* analyze
* commit / write to disk
* search

 I think we can pass to Lucene a single document containing a
 comma-separated list of term, term, ... (a few billion times)... except
 stored and TermVectorComponent...

Oh, I know it can be done, but I'm wondering how bad things (like the ones
above) get.

 I believe thousands of companies already indexed millions of documents
 with an average size of a few hundred MBytes... There should not be any
 limits (except

Which ones are you thinking about? What sort of documents?

 100,000 _unique_ terms vs. a single document containing
 100,000,000,000,000 non-unique terms (and trying to store offsets)

 Personally, I indexed only small (up to 1000 bytes) document fields, but
 I believe 500 MB is a very common use case with PDFs (which vendors use

Nah, PDF files may be big, but I think the text in them is often not *that*
big, unless those are PDFs of very big books.

Thanks,
Otis





Re: 400 MB Fields

2011-06-07 Thread Fuad Efendi
Hi Otis,


I am recalling the pagination feature; it is still unresolved (with the
default scoring implementation): even with small documents,
searching and retrieving documents 1 to 10 can take 0 milliseconds, but
from 100,000 to 100,010 can take a few minutes (I saw it with the trunk
version 6 months ago, and with very small documents, 100 million docs in
total); it is advisable to restrict search results to the top 1000 in any
case (as Google does)...

I believe things can go wrong; yes, most plain text retrieved from books
should be 2 KB per page, 500 pages := 1,000,000 bytes (or double it for
UTF-8).

Theoretically, it doesn't make any sense to index a BIG document containing
all terms from a dictionary without any term frequency calcs, but even
with them... I can't imagine we should index 1000s of docs where each is
just a (different) version of the whole of Wikipedia; that would be wrong
design...

Ok, use case: index a single HUGE document. What will we do? Create an
index with _the only_ document? All searches will return the same result
(or nothing)? Paginate it; split it into pages. I am pragmatic...


Fuad



On 11-06-07 8:04 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:





Re: 400 MB Fields

2011-06-07 Thread Lance Norskog
The Salesforce book is 2800 pages of PDF, last I looked.

What can you do with a field that big? Can you get all of the snippets?

On Tue, Jun 7, 2011 at 5:33 PM, Fuad Efendi f...@efendi.ca wrote:






-- 
Lance Norskog
goks...@gmail.com


RE: 400 MB Fields

2011-06-07 Thread Burton-West, Tom
Hi Otis, 

Our OCR fields average around 800 KB. My guess is that the largest docs we
index (in a single OCR field) are somewhere between 2 and 10 MB. We have
had issues where the in-memory representation of the document (the
in-memory index structures being built) is several times the size of the
text, so I would suspect that even with the largest ramBufferSizeMB you
might run into problems. (This is with the 3.x branch. Trunk might not have
this problem since it's much more memory efficient when indexing.)

Tom Burton-West
www.hathitrust.org/blogs




tika integration exception and other related queries

2011-06-07 Thread Naveen Gupta
Hi, can somebody answer this?

3. Can somebody give me an idea how to index a zip file?

1. While sending docx, we are getting the following error:

 java.lang.NumberFormatException: For input string: "2011-01-27T07:18:00Z"
 at
 java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
 at java.lang.Long.parseLong(Long.java:412)
 at java.lang.Long.parseLong(Long.java:461)
 at org.apache.solr.schema.TrieField.createField(TrieField.java:434)
 at
 org.apache.solr.schema.SchemaField.createField(SchemaField.java:98)
 at
 org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204)
 at
 org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)
 at
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
 at
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:121)
 at
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126)
 at
 org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:198)
 at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
 at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:238)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
 at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
 at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
 at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
 at
 org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
 at
 org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
 at
 org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
 at java.lang.Thread.run(Thread.java:619)



Thanks
Naveen
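
(The Long.parseLong in that trace suggests Tika's last-modified metadata,
the ISO date 2011-01-27T07:18:00Z, is being routed into a Trie long field.
If that guess is right, the receiving field needs a date type instead; a
sketch of what the schema.xml entry might look like, field name assumed:

<field name="last_modified" type="tdate" indexed="true" stored="true"/>

Alternatively, the fmap.* parameters of the ExtractingRequestHandler can
redirect the offending metadata field to a date-typed field.)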



On Tue, Jun 7, 2011 at 3:33 PM, Naveen Gupta nkgiit...@gmail.com wrote:

 Hi

 We are using the ExtractingRequestHandler and we are getting the following
 error when we give it a Microsoft docx file for indexing.

 I think this has something to do with the date field definition... but I'm
 not very sure... what field type should we use?

 2. we are trying to index a jpg (when we search over the name of the jpg,
 it does not come back... though I am passing an id)

 3. what about zip files or rar files... does Tika with Solr handle those?





