RE: Memcache for Solr

2010-09-01 Thread Hitendra Molleti
Apologies, did not realize it.

Thanks

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Tuesday, August 31, 2010 11:11 PM
To: solr-user@lucene.apache.org
Subject: Re: Memcache for Solr


: References: 4c7d1071.8000...@elyograg.org
: In-Reply-To: 4c7d1071.8000...@elyograg.org
: Subject: Memcache for Solr

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to 
an existing message, instead start a fresh email.  Even if you change the 
subject line of your email, other mail headers still track which thread 
you replied to and your question is hidden in that thread and gets less 
attention.   It makes following discussions in the mailing list archives 
particularly difficult.
See Also:  http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking



-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!





Implementing Memcache for Solr

2010-09-01 Thread Hitendra Molleti
Hi,

We were looking at implementing memcache for Solr.

Can someone who has already implemented this let us know if it is a good
option to go for, i.e. how effective is using memcache compared to Solr's
internal cache?

Also, are there any downsides to it, and is it difficult to implement?

Thanks

Hitendra



Re: Implementing Memcache for Solr

2010-09-01 Thread Grijesh.singh

My experience with memcache was not so good.
Finally I configured Solr's built-in cache for best performance.
With memcache we were caching queries, but Solr provides that already.
You can take a call after load testing with memcache and without memcache.


Re: Deploying Solr 1.4.1 in JbossAs 6

2010-09-01 Thread Grijesh.singh

1-extract the solr.war
2-edit the web.xml to set the solr/home param
3-re-create the solr.war
4-set up the solr home directory
5-copy the solr.war to the JBossAs 6 deploy directory
6-start the jboss server
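
For step 2, the solr/home param is the JNDI env-entry in the war's web.xml;
a minimal sketch (the path is only an example):

<env-entry>
  <env-entry-name>solr/home</env-entry-name>
  <env-entry-value>/opt/solr/home</env-entry-value>
  <env-entry-type>java.lang.String</env-entry-type>
</env-entry>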


Proximity search + Highlighting

2010-09-01 Thread Xavier Schepler

Hi,

can the highlighting component highlight terms only if the distance
between them matches the query?

I use these parameters:

hl=on&hl.fl=qFR,iFR,mFR,vlFR&hl.usePhraseHighlighter=false&hl.highlightMultiTerm=true&hl.simple.pre=<b>&hl.simple.post=</b>&hl.mergeContiguous=false


Re: Proximity search + Highlighting

2010-09-01 Thread Markus Jelsma
I think you need to enable usePhraseHighlighter in order to use the 
highlightMultiTerm parameter.
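
For example, the same parameter set with the phrase highlighter switched on:

hl=on&hl.fl=qFR,iFR,mFR,vlFR&hl.usePhraseHighlighter=true&hl.highlightMultiTerm=true&hl.simple.pre=<b>&hl.simple.post=</b>&hl.mergeContiguous=false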

 On Wednesday 01 September 2010 12:12:11 Xavier Schepler wrote:
 Hi,
 
 can the highlighting component highlight terms only if the distance
 between them matches the query ?
 I use those parameters :
 
 hl=on&hl.fl=qFR,iFR,mFR,vlFR&hl.usePhraseHighlighter=false&hl.highlightMultiTerm=true&hl.simple.pre=<b>&hl.simple.post=</b>&hl.mergeContiguous=false
 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



NullpointerException when combining spellcheck component and synonyms

2010-09-01 Thread Stefan Moises

 Hi there,

I am using Solr from SVN, 
https://svn.apache.org/repos/asf/lucene/dev/trunk (my last update/build 
on my dev server was in July I think)...


I've encountered a strange problem when using the Spellcheck component 
in combination with the SynonymFilter...

My text field is pretty standard, using the default synonyms.txt file:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
    <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
    <filter class="solr.SnowballPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

I have only added some terms at the end of synonyms.txt:
...
# Synonym mappings can be used for spelling correction too
pixima => pixma

tekanne => teekanne
teekane => teekanne
flashen => flaschen
flasen => flaschen

Here is my query and the exception... if I turn off spellcheck, 
everything works as expected and the synonyms are found...


INFO: [db] webapp=/solr path=/select 
params={mlt.minwl=3&spellcheck=true&facet=true&mlt.fl=oxmanu_oxid,oxvendor_oxid,oxtags,oxsearchkeys&spellcheck.q=flasen&mlt.mintf=1&facet.limit=-1&mlt=true&json.nl=map&hl.fl=oxtitle&hl.fl=oxshortdesc&hl.fl=oxlongdesc&hl.fl=oxtags&hl.fl=seodesc&hl.fl=seokeywords&wt=json&hl=true&rows=10&version=1.2&mlt.mindf=1&debugQuery=true&facet.sort=lex&start=0&q=flasen&facet.field=oxcat_oxid&facet.field=oxcat_oxidtitle&facet.field=oxprice&facet.field=oxmanu_oxid&facet.field=oxmanu_oxidtitle&facet.field=oxvendor_oxid&facet.field=oxvendor_oxidtitle&facet.field=attrgroup_oxid&facet.field=attrgroup_oxidtitle&facet.field=attrgroup_oxidvalue&facet.field=attrvalue_oxid&facet.field=attrvalue_oxidtitle&facet.field=attr2attrgroup_oxidtitle&qt=dismax&spellcheck.build=false} 
hits=2 status=500 QTime=14

01.09.2010 12:54:47 org.apache.solr.common.SolrException log
SCHWERWIEGEND: java.lang.NullPointerException
at 
org.apache.lucene.util.AttributeSource.cloneAttributes(AttributeSource.java:470)
at 
org.apache.lucene.analysis.synonym.SynonymFilter.incrementToken(SynonymFilter.java:128)
at 
org.apache.lucene.analysis.core.StopFilter.incrementToken(StopFilter.java:260)
at 
org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter.incrementToken(WordDelimiterFilter.java:336)
at 
org.apache.lucene.analysis.core.LowerCaseFilter.incrementToken(LowerCaseFilter.java:62)
at 
org.apache.solr.handler.component.SpellCheckComponent.getTokens(SpellCheckComponent.java:380)
at 
org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:127)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:203)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1323)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:465)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:555)
at 

Re: Proximity search + Highlighting

2010-09-01 Thread Xavier Schepler

On 01/09/2010 12:38, Markus Jelsma wrote:

[...]

yes, you're right.


Re: Proximity search + Highlighting

2010-09-01 Thread Xavier Schepler

On 01/09/2010 13:54, Xavier Schepler wrote:

[...]


but it doesn't help for the other problem


shingles work in analyzer but not real data

2010-09-01 Thread Jeff Rose
Hi,
  We are using SOLR to match query strings with a keyword database, where
some of the keywords are actually more than one word.  For example a keyword
might be "apple pie" and we only want it to match for a query containing
that word pair, but not one only containing "apple".  Here is the relevant
piece of the schema.xml, defining the index and query pipelines:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern=";"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.ShingleFilterFactory"/>
  </analyzer>
</fieldType>

In the analysis tool this schema looks like it works correctly.  Our
multi-word keywords are indexed as a single entry, and then when a search
phrase contains one of these multi-word keywords it is shingled and matched.
 Unfortunately, when we do the same queries on top of the actual index it
responds with zero matches.  I can see in the index histogram that the terms
are correctly indexed from our mysql datasource containing the keywords, but
somehow the shingling doesn't appear to work on this live data.  Does anyone
have experience with shingling that might have some tips for us, or
otherwise advice for debugging the issue?

Thanks,
Jeff


Re: Problems indexing spatial field - undefined subField

2010-09-01 Thread Thomas Joiner
While you have already solved your problem, my guess as to why it didn't
work originally is that you probably didn't have a
<dynamicField name="*_latLon" indexed="true" stored="true" />

What subFieldType does is it registers a dynamicField for you.
 subFieldSuffix requires that you have already defined that dynamicField.
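
A sketch of the pairing (the "double" sub-field type is an assumption,
matching the fix quoted below):

<fieldType name="latLon" class="solr.LatLonType" subFieldSuffix="_latLon"/>
<dynamicField name="*_latLon" type="double" indexed="true" stored="false"/>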

On Tue, Aug 31, 2010 at 8:07 PM, Simon Wistow si...@thegestalt.org wrote:

 On Wed, Sep 01, 2010 at 01:05:47AM +0100, me said:
  I'm trying to index a latLon field.
 
  <fieldType name="latLon" class="solr.LatLonType" subFieldSuffix="_latLon"/>
  <field name="location" type="latLon" indexed="true" stored="true"/>

 Turns out changing it to

 <fieldType name="latLon" class="solr.LatLonType" subFieldType="double"/>

 fixed it.





Re: shingles work in analyzer but not real data

2010-09-01 Thread Robert Muir
On Wed, Sep 1, 2010 at 8:21 AM, Jeff Rose j...@globalorange.nl wrote:

 [...]


Query-time shingling probably isn't working with the query parser you are
using: the default Lucene one first splits on whitespace before sending
anything to the analyzer, e.g. a query of foo bar is processed as
TokenStream(foo) + TokenStream(bar).

So query-time shingling like this doesn't work as you expect, for this
reason.
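
For illustration (hypothetical queries against the field above):

q=apple pie     -> parser output: TokenStream(apple) + TokenStream(pie), no shingle formed
q="apple pie"   -> the whole phrase reaches the analyzer in one piece and can be shingled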


-- 
Robert Muir
rcm...@gmail.com


Re: shingles work in analyzer but not real data

2010-09-01 Thread Markus Jelsma
If your use-case is limited to this, why don't you encapsulate all queries in 
double quotes? 

On Wednesday 01 September 2010 14:21:47 Jeff Rose wrote:
 [...]

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350



Auto Suggest

2010-09-01 Thread Jazz Globe

Hallo

How would one implement a multiple term auto-suggest feature in Solr that is 
filter sensitive?
For example, a user enters :
mp3
  and solr might suggest:
  -   mp3 player
  -   mp3 nano
  -   mp3 sony
and then the user starts the second word :
mp3 n
and that narrows it down to:
  - mp3 nano

I had a quick look at the Terms Component.
I suppose it just returns term totals for the entire index and cannot be used 
with a filter or query?
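
For reference, a plain TermsComponent request looks like the following
(assuming the stock /terms handler; the field name is hypothetical), and as
far as I can tell it works on raw index terms and ignores q and fq:

http://localhost:8983/solr/terms?terms.fl=suggest_field&terms.prefix=mp3&terms.limit=10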

Thanks
Johan

  

Re: Solr Admin Schema Browser and field named keywords

2010-09-01 Thread Shawn Heisey

 On 8/26/2010 5:04 PM, Chris Hostetter wrote:

doubtful.

I suspect it has more to do with the amount of data in your keywords
field and the underlying request to the LukeRequestHandler timing out.

   have you tried using it with a test index where the keywords
field has only a few words in it?


It just occurred to me that there probably isn't enough data in the 
keywords field to cause this.  It is one of four fields copied into the 
catchall field, and is nowhere near as large as the ft_text field that 
is also copied to catchall.  The schema browser has always worked on the 
catchall field.


Actually, on a test index that I just built (with my leading/trailing 
punctuation filter included), I CAN access the keywords field.  
Bizarre.  Ideas?


Thanks,
Shawn



MoreLikethis and fq not giving exact results ?

2010-09-01 Thread Sumit Arora
Hi All,

 I have provided identifications while submitting documents to Solr, e.g. jp_
for job postings, cp_ for career profiles, and it stores ids in the form
jp_1, jp_2, etc. or cp_1, cp_2, etc.

 So when I perform a standard query filtered on cp_, it provides me only the
results belonging to cp_ (or only to jp_).

 But when I enable mlt inside the query it returns results for jp_ as
well, because job_title also exists in job postings (even though jp_ and cp_
already differentiate the two?)

e.g.:

http://192.168.1.4:8983/solr/select/?mlt=true&mlt.fl=job_title%2Ccareer_summary%2Cindustry%2Ccompany%2Cexactly_looking&version=1.2&q=id%3Acp_4&start=0&rows=100&fq=cp_

How can I effectively use FilterQuery and MoreLikeThis?

/Sumit


Re: NullpointerException when combining spellcheck component and synonyms

2010-09-01 Thread Stefan Moises
 doh, looks like I only forgot to add the spellcheck component to my 
edismax request handler... now it works with:


...
<arr name="last-components">
  <str>spellcheck</str>
  <str>elevator</str>
</arr>

What's strange is that spellchecking seemed to work *without* that 
entry, too


Cheers,
Stefan

Am 01.09.2010 13:33, schrieb Stefan Moises:

[...]

Download document from solr

2010-09-01 Thread Matteo Moci

 Hello to All,
I am a newbie with Solr, and I am trying to understand if I can use it 
for my purpose, and I was wondering how Solr lists the result documents: 
do they appear as downloadable files, just like 
http://solr.machine.com/path/file.doc, or do I need to develop another 
layer to take care of downloading?

Even a link to the docs might work...

Thank you,
Matteo



Re: place of log4j.properties file

2010-09-01 Thread joyce chan
Hi

Sorry to reopen this thread.  Do you guys know how to use log4jdbc in solr?

Thanks
JC

2010/3/19 Király Péter pkir...@tesuji.eu

 Thanks David!

 It works. Even with relative path, like
 -Dlog4j.configuration=file:etc/log4j.properties.

 Péter

 - Original Message - From: Smiley, David W. dsmi...@mitre.org
 To: solr-user@lucene.apache.org
 Cc: Eric Pugh ep...@opensourceconnections.com
 Sent: Friday, March 19, 2010 5:43 PM
 Subject: Re: place of log4j.properties file



 I believe that should have been
 -Dlog4j.configuration=file:/c:/foo/log4j.properties
 I've done this sort of thing many times before.

 I've also found it helpful to add -Dlog4j.debug  (no value needed) to debug
 logging.

 http://logging.apache.org/log4j/1.2/manual.html

 ~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/

 On Mar 19, 2010, at 12:27 PM, Király Péter wrote:

  Hi,

 on page 205 of the Solr 1.4 Enterprise Search Server book there is an
 example of how to reference the log4j.properties file from Jetty. I tried
 that and several other methods (like -Dlog4j.properties=path to file), but
 the only working way was to create a WEB-INF/classes directory inside the
 solr.war and put the file there (a tip I found in the list's archive).

 Is it possible that there is no other way?

 Thanks,
 Péter









java.lang.OutOfMemoryError: PermGen space when reopening solr server

2010-09-01 Thread Antonio Calo'

 Hi guys

I'm facing an error in our production environment with our search 
application based on maven with spring + solrj.


When I try to change a class, or try to redeploy/restart an application, 
I catch a java.lang.OutOfMemoryError: PermGen


I've tried to understand the cause of this, and I've also succeeded in 
reproducing this issue on my local development environment by just 
restarting jetty several times (I'm using eclipse + maven plugin).


The logs obtained are these:

   [...]
   1078 [Timer-1] INFO org.apache.solr.core.RequestHandlers - created
   /admin/: org.apache.solr.handler.admin.AdminHandlers
   1078 [Timer-1] INFO org.apache.solr.core.RequestHandlers - created
   /admin/ping: PingRequestHandler
   1078 [Timer-1] INFO org.apache.solr.core.RequestHandlers - created
   /debug/dump: solr.DumpRequestHandler
   32656 [Finalizer] INFO org.apache.solr.core.SolrCore - []  CLOSING
   SolrCore org.apache.solr.core.solrc...@1409c28
   17:43:19 ERROR InvertedIndexEngine:124 open -
   java.lang.OutOfMemoryError: PermGen space
   java.lang.RuntimeException: java.lang.OutOfMemoryError: PermGen space
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
at org.apache.solr.core.SolrCore.init(SolrCore.java:579)
at
   
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
at
   
com.intellisemantic.intellifacet.resource.invertedIndex.InvertedIndexEngine.open(InvertedIndexEngine.java:113)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
   sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
   
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
   
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeCustomInitMethod(AbstractAutowireCapableBeanFactory.java:1536)
at
   
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1477)
at
   
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1409)
   [...]

The exception is always thrown while Solr init is performed after a 
restart (which is why I'm asking for your support ;) )


It seems that while solr is trying to be set up (by [Timer-1]), another 
thread ([Finalizer]) is trying to close it. I can see from the Solr code 
that this exception is always thrown in the same place: SolrCore.java:1068.

Here there is a comment that says:

   // need to close the searcher here??? we shouldn't have to.
  throw new RuntimeException(th);
} finally {
  if (newestSearcher != null) {
newestSearcher.decref();
  }
}

I'm using the solrj lib in a Spring container, so I'm assuming that Spring 
will manage the release of all the singleton classes. Should I do 
something else, like force-closing Solr?


Thanks in advance for your support.

Best regards

Antonio




Re: missing part folder - how to debug?

2010-09-01 Thread Alex Baranau
Hi,

Adding Solr user list.

We used a similar approach to the one in this patch, but with Hadoop Streaming.
Did you determine that indices are really missing? I mean did you find
missing documents in the output indices?

Alex Baranau

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - HBase

2010/8/31 Koji Sekiguchi k...@r.email.ne.jp

  Hello,

 We are using Hadoop to make Solr index. We are using SOLR-1301
 that was first contributed by Andrzej:

 https://issues.apache.org/jira/browse/SOLR-1301

 It works great on testing environment, 4 servers.
 Today, we run it on production environment, 320 servers.
 We run 5120 reducers (16 per server). This results in 5120 indexes,
 i.e. part-X folders should be created. But about 20 part
 folders were missing, and Hadoop didn't produce any error logs.
 How can we investigate/debug this problem?

 Any pointers, experiences would be highly appreciated!

 Thanks,

 Koji

 --
 http://www.rondhuit.com/en/




Need help with field collapsing and out of memory error

2010-09-01 Thread Moazzam Khan
Hi guys,

I have about 20k documents in the Solr index (and there's a lot of
text in each of them). I have field collapsing enabled on a specific
field (AdvisorID).

The thing is, if I have field collapsing enabled in the search request
I don't get the correct count for the total number of records that
matched. It always says that the number of rows I asked to get back
is the total number of records it found.

And, when I run a query with search criteria *:* (to get the number of
total advisors in the index) solr runs out of memory and gives me an
error saying

SEVERE: java.lang.OutOfMemoryError: Java heap space
at java.nio.CharBuffer.wrap(CharBuffer.java:350)
at java.nio.CharBuffer.wrap(CharBuffer.java:373)
at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:138)
at java.lang.StringCoding.decode(StringCoding.java:173)


This is going to be a huge problem when we index 50k
documents later on.

These are the options I am running Solr with :

java -Xms2048M -Xmx2048M -XX:+UseConcMarkSweepGC -XX:PermSize=1024m -XX:MaxPermSize=1024m -jar start.jar


Is there any way I can get the counts and not run out of memory?

Thanks in advance,
Moazzam


RE: how to deal with virtual collection in solr?

2010-09-01 Thread Ma, Xiaohui (NIH/NLM/LHC) [C]
Thank you, Jan. Unfortunately I got the following exception when I use 
http://localhost:8983/solr/aapublic/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/ .

*
Aug 31, 2010 4:54:42 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.NullPointerException
at java.io.StringReader.<init>(StringReader.java:33)
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:197)
at 
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78)
at org.apache.solr.search.QParser.getQuery(QParser.java:131)
at 
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
*
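
For what it's worth, QueryParser.parse failing inside new StringReader is
what a null query string produces, and the shards URL above carries no q
parameter. A sketch of the same request with one added, purely as an
illustration:

http://localhost:8983/solr/aapublic/select?q=*:*&shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic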

-Original Message-
From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com] 
Sent: Tuesday, August 31, 2010 2:15 PM
To: solr-user@lucene.apache.org
Subject: Re: how to deal with virtual collection in solr?

Hi,

If you have multiple cores defined in your solr.xml you need to issue your 
queries to one of the cores. Below it seems as if you are lacking the core name. 
Try instead:


http://localhost:8983/solr/aapublic/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/

And as Lance pointed out, make sure your XML files conform to the Solr XML 
format (http://wiki.apache.org/solr/UpdateXmlMessages).

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 27. aug. 2010, at 15.04, Ma, Xiaohui (NIH/NLM/LHC) [C] wrote:

 Thank you, Jan Høydahl. 
 
 I used 
 http://localhost:8983/solr/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/.
  I got an error Missing solr core name in path. I have aapublic and 
 aaprivate cores. I also got an error if I used 
 http://localhost:8983/solr/aapublic/select?shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/.
  I got a null exception java.lang.NullPointerException. 
 
 My collections are xml files. Please let me know if I can use the following way 
 you suggested.
 curl 
 http://localhost:8983/solr/update/extract?literal.collection=aaprivate&literal.id=doc1&commit=true
  -F fi...@myfile.xml
 
 Thanks so much as always!
 Xiaohui 
 
 
 -Original Message-
 From: Jan Høydahl / Cominvent [mailto:jan@cominvent.com] 
 Sent: Friday, August 27, 2010 7:42 AM
 To: solr-user@lucene.apache.org
 Subject: Re: how to deal with virtual collection in solr?
 
 Hi,
 
 Version 1.4.1 does not support the SolrCloud style sharding. In 1.4.1, please 
 use this style:
 shards=localhost:8983/solr/aaprivate,localhost:8983/solr/aapublic/
 
 
 However, since the schema is the same, I'd opt for one index with a collections 
 field as the filter.
 
 You can add that field to your schema, and then inject it as metadata on the 
 ExtractingRequestHandler call:
 
 curl 
 http://localhost:8983/solr/update/extract?literal.collection=aaprivate&literal.id=doc1&commit=true
  -F fi...@myfile.pdf
 
 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 

Do commits block updates in SOLR 1.4?

2010-09-01 Thread Robert Petersen
I can't seem to find a definitive answer.  I have ten threads doing my
indexing and I block all the threads when one is ready to do a commit so
no adds are done until the commit finishes.  Is this still required in
SOLR 1.4 or could I take it out?  I tried testing this on a separate
small index where I set autocommit in solrconfig and seem to have no
issues just continuously adding documents from multiple threads to it
despite its commit activity.  I'd like to do the same in my big main
index, is it safe?
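
(For reference, by autocommit in solrconfig I mean a block like the
following - the thresholds are just examples:)

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>60000</maxTime>
  </autoCommit>
</updateHandler>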

 

Also, is there any difference in behavior between autocommits and
explicit commits in this regard?

 

 



RE: Auto Suggest

2010-09-01 Thread Robert Petersen
I do this by replacing the spaces with a '%' in a separate search field
which is not parsed nor tokenized and then you can wildcard across the
whole phrase like you want and the spaces don't mess you up.  Just store
the original phrase with spaces in a separate field for returning to the
front end for display.
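
A rough sketch of this setup, with made-up field and type names; one way to
get the un-tokenized value is to do the replacement at analysis time (note
that '%' has to be URL-encoded as %25 in the actual request):

<fieldType name="suggest_phrase" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern=" " replacement="%" replace="all"/>
  </analyzer>
</fieldType>
<field name="suggest" type="suggest_phrase" indexed="true" stored="false"/>
<field name="suggest_display" type="string" indexed="false" stored="true"/>

A user typing mp3 n would then be matched with a wildcard query such as
q=suggest:mp3%25n*.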

-Original Message-
From: Jazz Globe [mailto:jazzgl...@hotmail.com] 
Sent: Wednesday, September 01, 2010 7:33 AM
To: solr-user@lucene.apache.org
Subject: Auto Suggest


[...]


Re: Auto Suggest

2010-09-01 Thread Eric Grobler
Hi Robert,

Interesting approach, how many documents do you have in Solr?
I have about 2 million and I just wonder if it might be a bit slow.

Regards
Johan

On Wed, Sep 1, 2010 at 7:38 PM, Robert Petersen rober...@buy.com wrote:

 [...]





how do I create custom function that uses multiple ValuesSources?

2010-09-01 Thread Gerald

using the NvlValueSourceParser example, I was able to create a custom
function that has two parameters: a ValueSource (a solr field) and a string
literal, i.e. myfunc(mysolrfield, "test")

it works well but is a pretty simple function.

what is the best way to implement a (more complex) custom function that
takes two (or more) ValueSources as parameters?  i.e.:

myfunc2(myValueSource1, myValueSource2, "test") or
myfunc2(div(myValueSource1,3), sum(myValueSource2, 2), "test") or
myfunc2(myValueSource1, myValueSource2, myValueSource3)

I don't have a concrete example right now but will likely get some
application ideas once I figure this out

any thoughts/examples on something like this?


RE: Auto Suggest

2010-09-01 Thread Robert Petersen
We don't have that many, just a hundred thousand, and solr response
times (since the index's docs are small and not complex) are logged as
typically 1 ms if not 0 ms.  It's funny but sometimes it is so fast no
milliseconds have elapsed.  Incredible if you ask me...  :)

Once you get SOLR to consider the whole phrase as just one big term, the
wildcard is very fast.

-Original Message-
From: Eric Grobler [mailto:impalah...@googlemail.com] 
Sent: Wednesday, September 01, 2010 12:35 PM
To: solr-user@lucene.apache.org
Subject: Re: Auto Suggest

Hi Robert,

Interesting approach, how many documents do you have in Solr?
I have about 2 million and I just wonder if it might be a bit slow.

Regards
Johan

On Wed, Sep 1, 2010 at 7:38 PM, Robert Petersen rober...@buy.com
wrote:

 [...]





Re: Alphanumeric wildcard search problem

2010-09-01 Thread Hasnain

Thank you for your suggestions.

Before removing the wordDelimiterFilterFactory, q=R-* returned perfect
results but q=R-1* did not; after removing the
wordDelimiterFilterFactory, q=R-* didn't bring me results either.

the results before removing wordDelimiterFilterFactory using debugQuery=on
were

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">78</int>
    <lst name="params">
      <str name="debugQuery">on</str>
      <str name="fl">mat_nr</str>
      <str name="q">R-1*</str>
      <str name="qt">standard2</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <str name="rawquerystring">R-1*</str>
    <str name="querystring">R-1*</str>
    <str name="parsedquery">
      +DisjunctionMaxQuery((ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 |
      description:r-1*^0.4 | prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
      manufact_mat:r-1*^0.4)~0.6) ()
    </str>
    <str name="parsedquery_toString">
      +(ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 | description:r-1*^0.4 |
      prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
      manufact_mat:r-1*^0.4)~0.6 ()
    </str>
    <lst name="explain"/>
    <str name="QParser">DisMaxQParser</str>
    <null name="altquerystring"/>
    <null name="boostfuncs"/>
    <lst name="timing">
      <double name="time">31.0</double>
      <lst name="prepare">
        <double name="time">15.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">15.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
      <lst name="process">
        <double name="time">16.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">16.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
    </lst>
  </lst>
</response>

and after removing the wordDelimiterFilterFactory:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">78</int>
    <lst name="params">
      <str name="debugQuery">on</str>
      <str name="fl">mat_nr</str>
      <str name="q">R-1*</str>
      <str name="qt">standard2</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <str name="rawquerystring">R-1*</str>
    <str name="querystring">R-1*</str>
    <str name="parsedquery">
      +DisjunctionMaxQuery((ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 |
      description:r-1*^0.4 | prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
      manufact_mat:r-1*^0.4)~0.6) ()
    </str>
    <str name="parsedquery_toString">
      +(ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 | description:r-1*^0.4 |
      prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
      manufact_mat:r-1*^0.4)~0.6 ()
    </str>
    <lst name="explain"/>
    <str name="QParser">DisMaxQParser</str>
    <null name="altquerystring"/>
    <null name="boostfuncs"/>
    <lst name="timing">
      <double name="time">31.0</double>
      <lst name="prepare">
        <double name="time">15.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">15.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
      <lst name="process">
        <double name="time">16.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">16.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
    </lst>
  </lst>
</response>

also at first the wordDelimiterFilterFactory used was this:

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
generateNumberParts="1" catenateWords="1" catenateNumbers="1"
catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>

before removing wordDelimiterFilterFactory, solr admin showed

Index Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
term position

Re: Auto Suggest

2010-09-01 Thread Eric Grobler
Thanks for your feedback Robert,

I will try that and see how Solr performs on my data - I think I will create
a field that contains only important key/product terms from the text.

Regards
Johan

On Wed, Sep 1, 2010 at 9:12 PM, Robert Petersen rober...@buy.com wrote:

 [...]
 



Localsolr with Dismax

2010-09-01 Thread Luke Tebbs
Does anyone have any experience with getting dismax to work with a 
geospatial (localsolr) search?


I have the following configuration -


<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title description^0.5</str>
    <str name="pf">title description^0.5</str>
    <str name="mm">0%</str>
    <str name="tie">0.1</str>
  </lst>
</requestHandler>

<requestHandler name="geo" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">title description^0.5</str>
    <str name="pf">title description^0.5</str>
    <str name="mm">0%</str>
    <str name="tie">0.1</str>
  </lst>
  <arr name="components">
    <str>localsolr</str>
    <str>facet</str>
    <str>mlt</str>
    <str>highlight</str>
    <str>debug</str>
  </arr>
</requestHandler>


All of the location searching works fine, as does the normal search, but 
when using the geo handler the textual search seems to be using the 
standard search handler and only the title field is searched.


I'm a bit stumped on this one, any help would be greatly appreciated.

Luke


Re: Need help with field collapsing and out of memory error

2010-09-01 Thread Jean-Sebastien Vachon
can you tell us what your current settings are regarding the fieldCollapseCache?

I had similar issues with field collapsing and I found out that this cache was 
responsible for 
most of the OOM exceptions.

Reduce or even remove this cache from your configuration and it should help.


On 2010-09-01, at 1:10 PM, Moazzam Khan wrote:

 [...]



Re: java.lang.OutOfMemoryError: PermGen space when reopening solr server

2010-09-01 Thread Luke Tebbs


Have you tried to up the MaxHeapSize?
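
For PermGen specifically, the flag that usually matters is -XX:MaxPermSize
rather than the heap size; e.g. something like the following (values are
only a starting point):

java -XX:MaxPermSize=256m -Xmx1024m -jar start.jar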

I tend to run solr and the development instance in a separate jetty (on 
a separate port) and actually restart the web server for the dev 
application every now and again.
It doesn't take too long if you only have one webapp on jetty - I tend 
to use mvn jetty:run on the CLI rather than launch jetty in eclipse. I 
also use JRebel to reduce the number of restarts needed during dev.


As for a production instance, should you need to redeploy that often?

Luke

Antonio Calo' wrote:

 Hi guys

I'm facing an error in our production environment with our search 
application based on maven with spring + solrj.


When I try to change a class, or try to redeploy/restart an 
application, I catch a java.lang.OutOfMemoryError: PermGen


I've tryed to understand the cause of this and also I've succeded in 
reproducing this issue on my local develop environment by just 
restarting the jetty several time (I'm using eclipse + maven plugin).


The logs obtained are those:

   [...]
   1078 [Timer-1] INFO org.apache.solr.core.RequestHandlers - created
   /admin/: org.apache.solr.handler.admin.AdminHandlers
   1078 [Timer-1] INFO org.apache.solr.core.RequestHandlers - created
   /admin/ping: PingRequestHandler
   1078 [Timer-1] INFO org.apache.solr.core.RequestHandlers - created
   /debug/dump: solr.DumpRequestHandler
   32656 [Finalizer] INFO org.apache.solr.core.SolrCore - []  CLOSING
   SolrCore org.apache.solr.core.solrc...@1409c28
   17:43:19 ERROR InvertedIndexEngine:124 open -
   java.lang.OutOfMemoryError: PermGen space
   java.lang.RuntimeException: java.lang.OutOfMemoryError: PermGen space
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
at org.apache.solr.core.SolrCore.init(SolrCore.java:579)
at
   
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137) 


at
   
com.intellisemantic.intellifacet.resource.invertedIndex.InvertedIndexEngine.open(InvertedIndexEngine.java:113) 


at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
   
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 


at
   
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 


at java.lang.reflect.Method.invoke(Method.java:597)
at
   
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeCustomInitMethod(AbstractAutowireCapableBeanFactory.java:1536) 


at
   
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.invokeInitMethods(AbstractAutowireCapableBeanFactory.java:1477) 


at
   
org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.initializeBean(AbstractAutowireCapableBeanFactory.java:1409) 


   [...]

The exception is always thrown while solr init is performed after a 
restart (this is the reason why I'm asking your support ;) )


It seems that while Solr is being set up (by [Timer-1]), another 
thread ([Finalizer]) is trying to close it. I can see from the Solr 
code that this exception is always thrown in the same place: 
SolrCore.java:1068.

Here there is a comment that says:

      // need to close the searcher here??? we shouldn't have to.
      throw new RuntimeException(th);
    } finally {
      if (newestSearcher != null) {
        newestSearcher.decref();
      }
    }

I'm using the solrj lib in a Spring container, so I'm supposing that 
Spring will manage the release of all the singleton classes. Should I 
do something else, like force-closing Solr?
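
A minimal sketch of what an explicit shutdown could look like, assuming the
application embeds Solr through a CoreContainer (the bean and wiring below are
illustrative, not your actual code):

import org.apache.solr.core.CoreContainer;

public class SolrShutdownBean {
    private final CoreContainer coreContainer;

    public SolrShutdownBean(CoreContainer coreContainer) {
        this.coreContainer = coreContainer;
    }

    // Wired as destroy-method="close" on the Spring bean definition so it
    // runs when the application context is torn down (e.g. on redeploy).
    public void close() {
        coreContainer.shutdown(); // closes all SolrCores and their searchers
    }
}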


Thanks in advance for your support.

Best regards

Antonio





Re: how do I create custom function that uses multiple ValuesSources?

2010-09-01 Thread Gerald

Figured this out about ten minutes after I posted the message, and it was
much simpler than I thought it would be.

I used the SumFloatFunction (which extends MultiFloatFunction) as a starting
point and was able to achieve what I was going for in my test: a simple
string-length function that returns the length of the specified fields after
concatenation.

Very nice being able to create custom functions.

Now on to creating custom functions that handle multiValued data types.
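
For the archives, a minimal sketch of such a function modeled on
SumFloatFunction (the class and function names here are illustrative, not the
actual code referred to above):

import org.apache.solr.search.function.DocValues;
import org.apache.solr.search.function.MultiFloatFunction;
import org.apache.solr.search.function.ValueSource;

public class ConcatLengthFunction extends MultiFloatFunction {
  public ConcatLengthFunction(ValueSource[] sources) {
    super(sources);
  }

  protected String name() {
    return "concatlen"; // the name shown in toString() output
  }

  protected float func(int doc, DocValues[] valsArr) {
    // The length of the concatenation is the sum of the field lengths.
    int len = 0;
    for (DocValues vals : valsArr) {
      String s = vals.strVal(doc);
      if (s != null) {
        len += s.length();
      }
    }
    return len;
  }
}

To make it callable from queries, it would then be registered through a
custom ValueSourceParser in solrconfig.xml.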
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/how-do-I-create-custom-function-that-uses-multiple-ValuesSources-tp1402645p1403070.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: High - Low field value?

2010-09-01 Thread Geert-Jan Brits
StatsComponent is exactly what you're looking for.

http://wiki.apache.org/solr/StatsComponent
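
A hedged example (the field names are illustrative): a request like

http://localhost:8983/solr/select?q=*:*&fq=type:House&rows=0&stats=true&stats.field=Price

returns min, max, mean, and so on for Price over the filtered set, and adding
&stats.facet=region breaks those statistics down per region value.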

Cheers,
Geert-Jan

2010/9/1 kenf_nc ken.fos...@realestate.com


 I want to do range facets on a couple of fields, a Price field in
 particular. But Price is relative to the product type: Books, Automobiles,
 and Houses have vastly different price ranges, and within Houses there may
 be a regional difference (the price range in San Francisco is different
 from that in Columbus, OH, for example).

 If I do a Filter Query on type, so I'm not mixing books with houses, is
 there a quick way in a query to get the high and low values for a given
 field? I would need those to build my range boundaries more efficiently.

 Ideally it would be a function of the query, so regionality could be taken
 into account. It's not a search score or a facet; it's more a function. I
 know query functions exist, but I haven't had to use them yet, and the
 'max' function doesn't look like what I need.  Any suggestions?
 Thanks.
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/High-Low-field-value-tp1402568p1402568.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: High - Low field value?

2010-09-01 Thread kenf_nc

That's exactly what I want.  I was just searching the wiki using the wrong
terms.
Thanks!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/High-Low-field-value-tp1402568p1403164.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need help with field collapsing and out of memory error

2010-09-01 Thread Moazzam Khan
Hi,


If this is how you configure the field collapsing cache, then I don't
have it set up:


<fieldCollapsing>
  <fieldCollapseCache
    class="solr.FastLRUCache"
    size="512"
    initialSize="512"
    autowarmCount="128"/>
</fieldCollapsing>


I didn't add that part to solrconfig.xml.

The way I set up field collapsing was to add this tag:

<searchComponent name="collapse"
  class="org.apache.solr.handler.component.CollapseComponent" />

Then I modified the default request handler (for standard queries) with this:

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
  <arr name="components">
    <str>collapse</str>
    <str>facet</str>
    <str>highlight</str>
    <str>debug</str>
  </arr>
</requestHandler>




On Wed, Sep 1, 2010 at 4:11 PM, Jean-Sebastien Vachon
js.vac...@videotron.ca wrote:
 Can you tell us what your current settings are regarding the
 fieldCollapseCache?

 I had similar issues with field collapsing and I found out that this cache 
 was responsible for
 most of the OOM exceptions.

 Reduce or even remove this cache from your configuration and it should help.
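
 A hedged example (values illustrative): if the cache is kept at all, it
 can be shrunk to something like

 <fieldCollapsing>
   <fieldCollapseCache class="solr.FastLRUCache" size="32"
                       initialSize="32" autowarmCount="0"/>
 </fieldCollapsing>

 or the fieldCollapseCache element can be left out entirely.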


 On 2010-09-01, at 1:10 PM, Moazzam Khan wrote:

 Hi guys,

 I have about 20k documents in the Solr index (and there's a lot of
 text in each of them). I have field collapsing enabled on a specific
 field (AdvisorID).

 The thing is, if I have field collapsing enabled in the search request,
 I don't get the correct count for the total number of records that
 matched. It always reports the number of rows I asked to get back as the
 total number of records found.

 And when I run a query with search criteria *:* (to get the number of
 total advisors in the index), Solr runs out of memory and gives me an
 error saying:

 SEVERE: java.lang.OutOfMemoryError: Java heap space
        at java.nio.CharBuffer.wrap(CharBuffer.java:350)
        at java.nio.CharBuffer.wrap(CharBuffer.java:373)
        at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:138)
        at java.lang.StringCoding.decode(StringCoding.java:173)


 This is going to be a huge problem later on, when we index 50k
 documents.

 These are the options I am running Solr with :

 java -Xms2048M -Xmx2048M -XX:+UseConcMarkSweepGC -XX:PermSize=1024m -XX:MaxPermSize=1024m -jar start.jar


 Is there any way I can get the counts and not run out of memory?

 Thanks in advance,
 Moazzam




Re: Download document from solr

2010-09-01 Thread Erick Erickson
Solr returns an XML packet (well, you can also specify other response
formats, e.g. JSON). Within that XML there'll be some overall response
characteristics (e.g. number of matches) and a list of documents.
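
For example, the response writer is selected per request with the wt
parameter, so the same query can be fetched as JSON instead:

http://localhost:8983/solr/select?q=video&wt=json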

If you do the example setup (http://lucene.apache.org/solr/tutorial.html)
and submit a query, you'll see the XML returned (the default) right in your
browser. If you're using Firefox or Chrome, you might have to install
an XML plugin to see it nicely formatted.

HTH
Erick

On Wed, Sep 1, 2010 at 11:24 AM, Matteo Moci mox...@libero.it wrote:

 Hello to all,
 I am a newbie with Solr, and I am trying to understand if I can use it for
 my purpose. I was wondering how Solr lists the result documents: do they
 appear as downloadable files, just like
 http://solr.machine.com/path/file.doc, or do I need to develop another
 layer to take care of the downloading?
 Even a link to the docs might work...

 Thank you,
 Matteo




Re: Alphanumeric wildcard search problem

2010-09-01 Thread Erick Erickson
Oh dear. Wildcard queries aren't analyzed, so I suspect it's a casing issue.

Try two things:
1) search for r-1*
2) look in your index and be sure the actual terms are there as you expect.
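
If it does turn out to be casing, one common workaround is to lowercase
wildcard input on the client before sending it, since wildcard terms bypass
analysis; a hedged SolrJ fragment (the variable names are illustrative):

import java.util.Locale;
import org.apache.solr.client.solrj.SolrQuery;

// Wildcard terms are not analyzed, so match the case of the indexed terms.
String userInput = "R-1";
SolrQuery query = new SolrQuery(userInput.toLowerCase(Locale.ROOT) + "*");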

HTH
Erick

On Wed, Sep 1, 2010 at 4:35 PM, Hasnain hasn...@hotmail.com wrote:


 Thank you for your suggestions.

 Before removing the WordDelimiterFilterFactory, q=R-* returned perfect
 results but q=R-1* did not; after removing the WordDelimiterFilterFactory,
 q=R-* stopped returning results as well.

 The results before removing the WordDelimiterFilterFactory, with
 debugQuery=on, were:

 <response>
 <lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">78</int>
 <lst name="params">
 <str name="debugQuery">on</str>
 <str name="fl">mat_nr</str>
 <str name="q">R-1*</str>
 <str name="qt">standard2</str>
 </lst>
 </lst>
 <result name="response" numFound="0" start="0"/>
 <lst name="debug">
 <str name="rawquerystring">R-1*</str>
 <str name="querystring">R-1*</str>
 <str name="parsedquery">
 +DisjunctionMaxQuery((ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 |
 description:r-1*^0.4 | prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
 manufact_mat:r-1*^0.4)~0.6) ()
 </str>
 <str name="parsedquery_toString">
 +(ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 | description:r-1*^0.4 |
 prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
 manufact_mat:r-1*^0.4)~0.6 ()
 </str>
 <lst name="explain"/>
 <str name="QParser">DisMaxQParser</str>
 <null name="altquerystring"/>
 <null name="boostfuncs"/>
 <lst name="timing">
 <double name="time">31.0</double>
 <lst name="prepare">
 <double name="time">15.0</double>
 <lst name="org.apache.solr.handler.component.QueryComponent">
 <double name="time">15.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.FacetComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.HighlightComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.StatsComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.DebugComponent">
 <double name="time">0.0</double>
 </lst>
 </lst>
 <lst name="process">
 <double name="time">16.0</double>
 <lst name="org.apache.solr.handler.component.QueryComponent">
 <double name="time">16.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.FacetComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.HighlightComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.StatsComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.DebugComponent">
 <double name="time">0.0</double>
 </lst>
 </lst>
 </lst>
 </lst>
 </response>

 and after removing the WordDelimiterFilterFactory:

 <response>
 <lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">78</int>
 <lst name="params">
 <str name="debugQuery">on</str>
 <str name="fl">mat_nr</str>
 <str name="q">R-1*</str>
 <str name="qt">standard2</str>
 </lst>
 </lst>
 <result name="response" numFound="0" start="0"/>
 <lst name="debug">
 <str name="rawquerystring">R-1*</str>
 <str name="querystring">R-1*</str>
 <str name="parsedquery">
 +DisjunctionMaxQuery((ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 |
 description:r-1*^0.4 | prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
 manufact_mat:r-1*^0.4)~0.6) ()
 </str>
 <str name="parsedquery_toString">
 +(ext_quote_id:r-1*^0.4 | vendor_mat:r-1*^0.4 | description:r-1*^0.4 |
 prod_id:r-1*^0.4 | name:r-1*^2.3 | mat_nr:R-1*^0.4 |
 manufact_mat:r-1*^0.4)~0.6 ()
 </str>
 <lst name="explain"/>
 <str name="QParser">DisMaxQParser</str>
 <null name="altquerystring"/>
 <null name="boostfuncs"/>
 <lst name="timing">
 <double name="time">31.0</double>
 <lst name="prepare">
 <double name="time">15.0</double>
 <lst name="org.apache.solr.handler.component.QueryComponent">
 <double name="time">15.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.FacetComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.HighlightComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.StatsComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.DebugComponent">
 <double name="time">0.0</double>
 </lst>
 </lst>
 <lst name="process">
 <double name="time">16.0</double>
 <lst name="org.apache.solr.handler.component.QueryComponent">
 <double name="time">16.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.FacetComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.HighlightComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.StatsComponent">
 <double name="time">0.0</double>
 </lst>
 <lst name="org.apache.solr.handler.component.DebugComponent">
 <double

In Need of Direction; Phrase-Context Tracking / Injection (Child Indexes) / Dismissal

2010-09-01 Thread Scott Gonyea
Hi,

I'm looking to get some direction on where I should focus my attention, with
regards to the Solr codebase and documentation.  Rather than write a ton of
stuff no one wants to read, I'll just start with a use-case.  For context,
the data originates from Nutch crawls and is indexed into Solr.

Imagine a web page has the following content (4 occurrences of Johnson are
bolded):

--content_--
Lorem ipsum dolor *Johnson* sit amet, consectetur adipiscing elit. Aenean id
urna et justo fringilla dictum *johnson* in at tortor. Nulla eu nulla magna,
nec sodales est. Sed *johnSon* sed elit non lorem sagittis fermentum. Mauris
a arcu et sem sagittis rhoncus vel malesuada *Johnsons* mi. Morbi eget
ligula nisi. Ut fringilla ullamcorper sem.
--_content--

*First*: I would like to have the entire content block indexed within
Solr.  This is done and definitely not an issue.

*Second* (+): during the injection of crawl data into Solr, I would like to
grab every occurrence of a specific word, or phrase, with Johnson being my
example for the above.  I want to take every such phrase (without
collision), as well as its unique context, and inject that into its own,
separate Solr index.  For example, the above content example, having been
indexed in its entirety, would also be the source of 4 additional indexes.
In each index, Johnson would only appear once.  All of the text before and
after Johnson would be BOUND BY any other occurrence of Johnson, e.g.:
--index1_--
Lorem ipsum dolor *Johnson* sit amet, consectetur adipiscing elit. Aenean id
urna et justo fringilla dictum
--_index1-- --index2_--
sit amet, consectetur adipiscing elit. Aenean id urna et justo fringilla
dictum *johnson* in at tortor. Nulla eu nulla magna, nec sodales est. Sed
--_index2-- --index3_--
in at tortor. Nulla eu nulla magna, nec sodales est. Sed *johnSon* sed elit
non lorem sagittis fermentum. Mauris a arcu et sem sagittis rhoncus vel
malesuada
--_index3-- --index4_--
sed elit non lorem sagittis fermentum. Mauris a arcu et sem sagittis rhoncus
vel malesuada *Johnsons* mi. Morbi eget ligula nisi. Ut fringilla
ullamcorper sem.
--_index4--
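
A hedged sketch of that splitting step as it might run in the injection
pipeline, before the snippets are sent to the second index (class and method
names are illustrative):

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContextSplitter {

  // Returns one snippet per occurrence of the phrase; each snippet is
  // bounded by the neighboring occurrences, matched case-insensitively,
  // as in the Johnson example above.
  public static List<String> split(String content, String phrase) {
    Pattern p = Pattern.compile(Pattern.quote(phrase), Pattern.CASE_INSENSITIVE);
    Matcher m = p.matcher(content);
    List<Integer> starts = new ArrayList<Integer>();
    List<Integer> ends = new ArrayList<Integer>();
    while (m.find()) {
      starts.add(m.start());
      ends.add(m.end());
    }
    List<String> snippets = new ArrayList<String>();
    for (int i = 0; i < starts.size(); i++) {
      int from = (i == 0) ? 0 : ends.get(i - 1);
      int to = (i == starts.size() - 1) ? content.length() : starts.get(i + 1);
      snippets.add(content.substring(from, to).trim());
    }
    return snippets;
  }
}

Each snippet would then be indexed as its own document in the secondary
index.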

Q:
How much of this is feasible in present-day Solr and how much of it do I
need to produce in a patch of my own?  Can anyone give me some direction on
where I should look, in approaching this problem (ie, libs / classes /
confs)?  I sincerely appreciate it.

*Third*: I would later like to go through the above child indexes and
dismiss any that appear within a given context.  For example, I may deem
ipsum dolor *Johnson* sit amet as not being useful, and I'd want to delete
any indexes matching that particular phrase-context.  The deletion is
trivial, and with the 2nd item resolved this becomes a fairly non-issue.

Q:
The question, more or less, comes from the fact that my source data is from
a web crawler.  When it is recrawled, I need to repeat the process of
dismissing phrase-contexts that are not relevant to me.  Where is the best
place to perform this work?  I could easily perform queries after indexing
my crawl, but that seems needlessly intensive.  I think the answer will be
wherever I implement #2, but assumptions can be painfully expensive.


Thank you for reading my bloated e-mail.  Again, I'm mostly just looking to
be pointed at various pieces of the Lucene / Solr code-base, and am trawling
for any insight that people might share.

Scott Gonyea


Re: Hardware Specs Question

2010-09-01 Thread Lance Norskog
I was just reading about configuring mass computation grids: hardware
writes on 2 striped disks take 10% longer than writes on a single disk,
because you have to wait for the slower disk to finish. So single
disks without RAID are faster.

I don't know how much SSDs cost, but they will certainly cure the
disk I/O problem.

On Tue, Aug 31, 2010 at 1:35 AM, scott chu (朱炎詹) scott@udngroup.com wrote:
 In our current lab project, we have already built a Chinese newspaper index
 with 18 million documents. The index size is around 51GB, so I am very
 concerned about the memory issue you guys mentioned.

 I also looked up the HathiTrust report on the SolrPerformanceData page:
 http://wiki.apache.org/solr/SolrPerformanceData. They said their main
 bottleneck is disk I/O, even though they have 10 shards spread over 4 servers.

 Can you guys give me some helpful suggestions about hardware specs and
 memory configuration for our project?

 Thanks in advance.

 Scott

 - Original Message - From: Lance Norskog goks...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, August 31, 2010 1:01 PM
 Subject: Re: Hardware Specs Question


 There are synchronization points, which become chokepoints at some
 number of cores. I don't know where they cause Lucene to top out.
 Lucene apps are generally disk-bound, not CPU-bound, but yours will
 be. There are so many variables that it's really not possible to give
 any numbers.

 Lance

 On Mon, Aug 30, 2010 at 8:34 PM, Amit Nithian anith...@gmail.com wrote:

 Lance,

 Makes sense. I have heard about the long GC times on large heaps, but I
 personally haven't experienced a slowdown; then again, that doesn't mean
 anything either :-). Agreed that tuning the Solr caching is the way to go.

 I haven't followed all the Solr/Lucene changes, but from what I remember
 there are synchronization points that could be a bottleneck, where adding
 more cores won't help. Or am I completely missing something?

 Thanks again
 Amit

 On Mon, Aug 30, 2010 at 8:28 PM, scott chu (朱炎詹)
 scott@udngroup.comwrote:

 I am also curious, as Amit is. Can you give an example of the garbage
 collection problem you mentioned?

 - Original Message - From: Lance Norskog goks...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Tuesday, August 31, 2010 9:14 AM
 Subject: Re: Hardware Specs Question



 It generally works best to tune the Solr caches and allocate enough
 RAM to run comfortably. Linux, Windows, et al. have their own cache
 of disk blocks, and they use very good algorithms for managing it.
 Also, they do not make long garbage collection passes.

 On Mon, Aug 30, 2010 at 5:48 PM, Amit Nithian anith...@gmail.com
 wrote:

 Lance,

 Thanks for your help. What do you mean by the OS keeping the index
 in memory better than Solr can? Do you mean that you should use
 another means to keep the index in memory (i.e. a ramdisk)? Is there
 a generally accepted heap-size-to-index-size ratio that you follow?

 Thanks
 Amit

 On Mon, Aug 30, 2010 at 5:00 PM, Lance Norskog goks...@gmail.com
 wrote:

 The price-performance knee for small servers is 32GB RAM, 2-6 SATA
 disks in a RAID, and 8-16 cores. You can buy these servers and half-fill
 them, leaving room for expansion.

 I have not done benchmarks about the max # of processors that can be
 kept busy during indexing or querying, and the total numbers: QPS,
 response time averages and variability, etc.

 If your index file size is 8G and your Java heap is 8G, you will get
 long garbage collection cycles. The operating system is very good at
 keeping your index in memory, better than Solr can.

 Lance

 On Mon, Aug 30, 2010 at 4:52 PM, Amit Nithian anith...@gmail.com
 wrote:
  Hi all,

  I am curious to get some opinions on at what point having more CPU
  cores shows diminishing returns in terms of QPS. Our index size is
  about 8GB and we have 16GB of RAM on a quad-core 4 x 2.4 GHz AMD
  Opteron 2216. Currently I have the heap at 8GB.

  We are looking to get more servers to increase capacity, and because
  the warranty is set to expire on our old servers, I was curious, before
  asking for a certain spec, what others run and at what point having
  more cores ceases to matter. Mainly looking at somewhere between 4 and
  12 cores per server.

  Thanks!
  Amit
 



 --
 Lance Norskog
 goks...@gmail.com





 --
 Lance Norskog
 goks...@gmail.com





 









 --
 Lance Norskog
 goks...@gmail.com



 








-- 
Lance Norskog
goks...@gmail.com


Re: Custom scoring

2010-09-01 Thread Lance Norskog
Check out the function query feature, and the bf= parameter. It may be
that the existing functions meet your needs, or that you can add a few
new functions.
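
A hedged example of the bf= route with the dismax handler (the field name is
illustrative): appending something like

bf=recip(ms(NOW,last_update_dt),3.16e-11,1,1)

to a dismax request folds a recency boost into each document's score without
writing any custom Java.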

It can take a while to understand what you really want to do, so
writing a large piece of code now can be wasteful.

On Mon, Aug 30, 2010 at 2:04 PM, Brad Kellett b...@chomp.com wrote:
 Hi all,

 I'm looking for examples or pointers to some info on implementing custom 
 scoring in solr/lucene. Basically, what we're looking at doing is to augment 
 the score from a dismax query with some custom signals based on data in 
 fields from the row initially matched. There will be several of these 
 features dynamically scored at query-time (due to the nature of the data, 
 pre-computed stuff isn't really what we're looking for).

 I do apologize for the vagueness of this, but a lot of this data is stuff we 
 want to keep under wraps. Essentially, I'm just looking for a place to use 
 some custom java code to be able to manipulate the score for a row matched in 
 a dismax query.

 I've been Googling like a mad man, but haven't really hit on something that 
 seems ideal yet. Custom similarity appears to just allow changing the 
 components of the TF-IDF score, for example. Can someone point me to an 
 example of doing something like this?

 ~Brad



-- 
Lance Norskog
goks...@gmail.com


Re: Distance sorting with spatial filtering

2010-09-01 Thread Lance Norskog
Post your schema.

On Mon, Aug 30, 2010 at 2:04 PM, Scott K s...@skister.com wrote:
 The new spatial filtering (SOLR-1586) works great and is much faster
 than fq={!frange. However, I am having problems sorting by distance.
 If I try
 GET 
 'http://localhost:8983/solr/select/?q=*:*sort=dist(2,latitude,longitude,0,0)+asc'
 I get an error:
 Error 400 can not sort on unindexed field: dist(2,latitude,longitude,0,0)

 I was able to work around this with
 GET 'http://localhost:8983/solr/select/?q=*:* AND _val_:recip(dist(2,
 latitude, longitude, 0,0),1,1,1)&fl=*,score'

 But why isn't sorting by functions working? I get this error with any
 function I try to sort on. This is a nightly trunk build from Aug 25th.
 I see SOLR-1297 was reopened, but that seems to be for edge cases.

 Second question: I am using the LatLonType from the Spatial Filtering
 wiki, http://wiki.apache.org/solr/SpatialSearch
 Are there any distance-sorting functions that use this field, or do I
 need to have three indexed fields (store_lat_lon, latitude, and
 longitude) if I want both filtering and sorting by distance?

 Thanks, Scott




-- 
Lance Norskog
goks...@gmail.com