Re: Filtering results based on a set of values for a field

2011-08-19 Thread Tomas Zerolo
On Thu, Aug 18, 2011 at 02:32:48PM -0400, Erick Erickson wrote:
 Hmmm, I'm still not getting it...
 
 You have one or more lists. These lists change once a month or so. Are
 you trying
 to include or exclude the documents in these lists?

In our specific case, to include *only* the documents whose author
attribute has a value in this list (the user decides at query time
which of those lists to use). But we do expect the problem to become
more general over time...

 And do the authors you 
 want
 to include or exclude change on a per-query basis or would you be all set if 
 you
 just had a filter that applied to all the authors on a particular list?

No. At the moment there are two fixed lists (in the sense that they are
updated roughly monthly). One problem: the document base itself is huge
(around 3.5 million documents). Re-indexing is a painful exercise taking days,
so we tend not to do it too often ;-)

 But I *think* what you want is a SearchComponent that implements your
 Filter. You can see various examples of how to add components to a search
 handler in the solrconfig.xml file.

Thanks a lot for the pointer. Rushing to read up on it.

 WARNING: Haven't done this myself, so I'm partly guessing here.

Hey: I asked for pointers and you're giving me some, so I'm a happy
man now :-)

 Although here's a hint that someone else has used this approach:
 http://www.mail-archive.com/solr-user@lucene.apache.org/msg54240.html

Thanks again

 And you'll want to ensure that the Filter is cached so you don't have to
 compute it more than once.

Yes, I hope that will be the trick that gives us the needed boost. Somehow
we'll have to figure out how to drop the cache when a new version of the
list arrives (without killing everyone in the building).
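A minimal client-side sketch of the filter-query variant of this idea (field name `author` and the example author list are hypothetical): restricting results through fq rather than q lets Solr cache the restriction in its filterCache, so it is computed once per searcher and dropped automatically when the searcher reopens after a commit.

```python
from urllib.parse import urlencode, parse_qs

def author_filter_params(user_query, authors):
    """Build Solr /select parameters that include only the listed authors.

    The author list goes into fq, not q: Solr caches each distinct fq
    in its filterCache, so the (large) author restriction is computed
    once and reused until the searcher reopens, e.g. after the new
    monthly list is indexed and committed.
    """
    fq = "author:(" + " OR ".join('"%s"' % a for a in authors) + ")"
    return urlencode({"q": user_query, "fq": fq})

params = author_filter_params("solr", ["Smith", "Jones"])
```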

I'll be sure to report back.

Regards
-- tomás


RE: Full sentence spellcheck

2011-08-19 Thread Valentin
Actually, that's not my problem; I do specify q.

Any other ideas? It's really driving me crazy...

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3267394.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: get update record from database using DIH

2011-08-19 Thread Gora Mohanty
On Fri, Aug 19, 2011 at 5:32 AM, Alexandre Sompheng asomph...@gmail.com wrote:
 Hi guys, I tried the delta import, and I got logs saying that it found delta
 data to update. But it seems that the index is not updated. Any guess
 why this happens? Did I miss something? I'm on Solr 3.3 with no
 patch.
[...]

Please show us the following:
* The exact URL you loaded for delta-import
* The Solr response which shows the delta documents that it found,
   and the status of the delta-import.
If your index is large, and if you are running an optimise after the
delta-import (the default is to optimise), it can take some time.
Check the status: It will say busy if the optimise is still running.
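For reference, a hedged sketch of the URLs involved (the /dataimport handler path is the common default, but check the DIH handler name in your solrconfig.xml):

```python
from urllib.parse import urlencode

# Assumed handler location; adjust host, core, and handler name to
# match your deployment.
BASE = "http://localhost:8983/solr/dataimport"

# Trigger the delta-import. optimize=false skips the slow post-import
# optimise, which on a large index can make the update appear to hang.
delta_url = BASE + "?" + urlencode({"command": "delta-import",
                                    "optimize": "false"})

# Poll this until the response no longer reports a busy status, then
# inspect the document counts it reports.
status_url = BASE + "?" + urlencode({"command": "status"})
```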

Regards,
Gora


can't use distributed spell check

2011-08-19 Thread Li Li
hi all,
I tested it following the instructions in
http://wiki.apache.org/solr/SpellCheckComponent, but something seems
wrong.
 The sample URL in the wiki is

http://solr:8983/solr/select?q=*:*&spellcheck=true&spellcheck.build=true&spellcheck.q=toyata&qt=spell&shards.qt=spell&shards=solr-shard1:8983/solr,solr-shard2:8983/solr

It doesn't work, and reading the code it seems the parameters should be
qt=/spell and shards.qt=/spell. After I modified the URL, it searches all
the documents but returns no spell suggestions. I debugged it and found
that the method getSuggestions() in AbstractLuceneSpellChecker is called.


solr distributed search don't work

2011-08-19 Thread Li Li
hi all,
 I follow the wiki http://wiki.apache.org/solr/SpellCheckComponent
but there is something wrong.
 The URL given by the wiki is
http://solr:8983/solr/select?q=*:*&spellcheck=true&spellcheck.build=true&spellcheck.q=toyata&qt=spell&shards.qt=spell&shards=solr-shard1:8983/solr,solr-shard2:8983/solr
 but it does not work. I traced the code and found that
qt=spell&shards.qt=spell should be qt=/spell&shards.qt=/spell.
 After modifying the URL, it returns all documents but nothing
about spell check.
 I debugged it and found that
AbstractLuceneSpellChecker.getSuggestions() is called.


Re: Full sentence spellcheck

2011-08-19 Thread Li Li
 This may need something like language models to make suggestions.
 I found an issue: https://issues.apache.org/jira/browse/SOLR-2585
 What's going on with it?

On Thu, Aug 18, 2011 at 11:31 PM, Valentin igorlacro...@gmail.com wrote:
 I'm trying to configure a spellchecker to autocomplete full sentences from my
 query.

 I've already been able to get this result:

 american israel:
 - american something
 - israel something

 But I want:

 american israel:
 - american israel something

 This is my solrconfig.xml:

 <searchComponent name="suggest_full" class="solr.SpellCheckComponent">
   <str name="queryAnalyzerFieldType">suggestTextFull</str>
   <lst name="spellchecker">
     <str name="name">suggest_full</str>
     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
     <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
     <str name="field">text_suggest_full</str>
     <str name="fieldType">suggestTextFull</str>
   </lst>
 </searchComponent>

 <requestHandler name="/suggest_full"
                 class="org.apache.solr.handler.component.SearchHandler">
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <str name="spellcheck">true</str>
     <str name="spellcheck.dictionary">suggest_full</str>
     <str name="spellcheck.count">10</str>
     <str name="spellcheck.onlyMorePopular">true</str>
   </lst>
   <arr name="last-components">
     <str>suggest_full</str>
   </arr>
 </requestHandler>

 And this is my schema.xml:

 <fieldType name="suggestTextFull" class="solr.TextField">
   <analyzer type="index">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>

 ...

 <field name="text_suggest_full" type="suggestTextFull" indexed="true"
        stored="false" multiValued="true"/>

 I've read somewhere that I have to use spellcheck.q because q uses the
 WhitespaceAnalyzer, but when I use spellcheck.q I get a
 java.lang.NullPointerException.

 Any ideas?

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3265257.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr distributed search don't work

2011-08-19 Thread olivier sallou
Hi,
I do not use spell check, but I do use distributed search; using qt=spell is
correct, you should not use qt=/spell.
For shards, I specify them in solrconfig directly, not in the URL, but it
should work the same.
Maybe there's an issue in your spell request handler.


2011/8/19 Li Li fancye...@gmail.com

 hi all,
 I follow the wiki http://wiki.apache.org/solr/SpellCheckComponent
 but there is something wrong.
 The URL given by the wiki is

 http://solr:8983/solr/select?q=*:*&spellcheck=true&spellcheck.build=true&spellcheck.q=toyata&qt=spell&shards.qt=spell&shards=solr-shard1:8983/solr,solr-shard2:8983/solr
 but it does not work. I traced the code and found that
 qt=spell&shards.qt=spell should be qt=/spell&shards.qt=/spell.
 After modifying the URL, it returns all documents but nothing
 about spell check.
 I debugged it and found that
 AbstractLuceneSpellChecker.getSuggestions() is called.



Re: paging size in SOLR

2011-08-19 Thread jame vaalet
1. What does this specify?

<queryResultCache class="solr.LRUCache"
                  size="${queryResultCacheSize:0}"
                  initialSize="${queryResultCacheInitialSize:0}"
                  autowarmCount="${queryResultCacheRows:0}"/>

2. When I say queryResultCacheSize: 512, does it mean 512 queries can be
cached, or that 512 bytes are reserved for caching?

Can someone please give me an answer?



On 14 August 2011 21:41, Erick Erickson erickerick...@gmail.com wrote:

 Yep.

 ResultWindowSize in solrconfig.xml

 Best
 Erick

 On Sun, Aug 14, 2011 at 8:35 AM, jame vaalet jamevaa...@gmail.com wrote:
  thanks erick ... that means it depends upon the memory allocated to the
  JVM.

  going back to the queryCacheResults factor i have got this doubt ..
  say, i have got 10 threads with 10 different queries, and each of them in
  parallel is searching the same index with millions of docs in it
  (multisharded).
  now each of the queries has a large number of results in it, hence they
  all have to be paged..
  which threads' (query) result sets will be cached, so that subsequent
  pages can be retrieved quickly?

  On 14 August 2011 17:40, Erick Erickson erickerick...@gmail.com wrote:

   There isn't an optimum page size that I know of; it'll vary with lots
   of stuff, not the least of which is whatever servlet container limits
   there are.

   But I suspect you can get quite a few (1000s) without
   too much problem, and you can always use the JSON response
   writer to pack in more pages with less overhead.

   You pretty much have to try it and see.

   Best
   Erick

   On Sun, Aug 14, 2011 at 5:42 AM, jame vaalet jamevaa...@gmail.com wrote:
    speaking about page sizes, what is the optimum page size that should
    be retrieved each time??
    i understand it depends upon the data you are fetching back from each
    hit document ... but let's say whenever a document is hit i am fetching
    back 100 bytes worth of data from each of those docs in the indexes
    (along with the solr response statements).
    this will make 100*x bytes worth of data in each page if x is the page
    size ..
    what is the optimum value of this x that solr can return each time
    without going into exceptions 

    On 13 August 2011 19:59, Erick Erickson erickerick...@gmail.com wrote:

     Jame:

     You control the number via settings in solrconfig.xml, so it's
     up to you.

     Jonathan:
     Hmmm, that seems right; after all, the deep paging penalty is really
     about keeping a large sorted array in memory, but at least you only
     pay it once per 10,000, rather than 100 times (assuming page size is
     100)...

     Best
     Erick

     On Wed, Aug 10, 2011 at 10:58 AM, jame vaalet jamevaa...@gmail.com wrote:
      when you say queryResultCache, does it only cache n results for
      the last query, or for more than one query?

      On 10 August 2011 20:14, simon mtnes...@gmail.com wrote:

       Worth remembering there are some performance penalties with deep
       paging, if you use the page-by-page approach. May not be too much
       of a problem if you really are only looking to retrieve 10K docs.

       -Simon

       On Wed, Aug 10, 2011 at 10:32 AM, Erick Erickson
       erickerick...@gmail.com wrote:
        Well, if you really want to you can specify start=0 and rows=1
        and get them all back at once.

        You can do page-by-page by incrementing the start parameter as
        you indicated.

        You can keep from re-executing the search by setting your
        queryResultCache appropriately, but this affects all searches so
        might be an issue.

        Best
        Erick

        On Wed, Aug 10, 2011 at 9:09 AM, jame vaalet
        jamevaa...@gmail.com wrote:
         hi,
         i want to retrieve all the data from solr (say 10,000 ids) and
         my page size is 1000.
         how do i get back the data (pages) one after the other? do i
         have to increment the start value each time by the page size
         from 0 and do the iteration?
         In this case am i querying the index 10 times instead of once,
         or after the first query will the result be cached somewhere for
         the subsequent pages?

         JAME VAALET

--

-JAME
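The page-by-page retrieval discussed in this thread can be sketched as follows (`fetch_page` is a hypothetical callable that issues the Solr query with the given start and rows and returns the page of docs plus numFound):

```python
def fetch_all(fetch_page, page_size=1000):
    """Retrieve every matching doc by stepping start through the result set.

    Each call re-executes the query on the server unless the entry is
    still in queryResultCache, so keep page_size reasonably large.
    """
    docs, start = [], 0
    while True:
        page, num_found = fetch_page(start=start, rows=page_size)
        docs.extend(page)
        start += page_size
        if not page or start >= num_found:
            break
    return docs
```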


Re: solr distributed search don't work

2011-08-19 Thread Li Li
could you please show me your configuration in solrconfig.xml?

On Fri, Aug 19, 2011 at 5:31 PM, olivier sallou
olivier.sal...@gmail.com wrote:
 Hi,
 I do not use spell check, but I do use distributed search; using qt=spell is
 correct, you should not use qt=/spell.
 For shards, I specify them in solrconfig directly, not in the URL, but it
 should work the same.
 Maybe there's an issue in your spell request handler.


 2011/8/19 Li Li fancye...@gmail.com

 hi all,
     I follow the wiki http://wiki.apache.org/solr/SpellCheckComponent
 but there is something wrong.
     The URL given by the wiki is

 http://solr:8983/solr/select?q=*:*&spellcheck=true&spellcheck.build=true&spellcheck.q=toyata&qt=spell&shards.qt=spell&shards=solr-shard1:8983/solr,solr-shard2:8983/solr
     but it does not work. I traced the code and found that
 qt=spell&shards.qt=spell should be qt=/spell&shards.qt=/spell.
     After modifying the URL, it returns all documents but nothing
 about spell check.
     I debugged it and found that
 AbstractLuceneSpellChecker.getSuggestions() is called.




Re: Boost documents based on the number of their fields

2011-08-19 Thread Marc Sturlese
You have different options here. You can give more boost at indexing time to
the documents that have the fields you want set. For this to take effect you
will have to reindex and set omitNorms=false on the fields you are going
to search. The same concept can be applied to boost single fields instead
of the whole document.
Another option would be to use boosting queries at search time, such as:
bq=video:[* TO *]^100 (this gives more boost to documents that have
any value in the video field).

The second one is much easier to play with, as you don't have to reindex every
time you change a value. On the other hand, you pay the performance penalty
of running one extra query.
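A sketch of the second (search-time) option as request parameters; the dismax parser and the example user query are assumptions, since bq is a dismax parameter:

```python
from urllib.parse import urlencode, parse_qs

# bq boosts but does not filter: docs with any value in the video
# field score higher, while docs without one still match.
params = urlencode({
    "q": "ipod",
    "defType": "dismax",
    "bq": "video:[* TO *]^100",
})
```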


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Boost-documents-based-on-the-number-of-their-fields-tp3266875p3267628.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Full sentence spellcheck

2011-08-19 Thread Valentin
I don't think it will help me, sorry. I just want my query not to be
tokenised; I want it to be considered as a full sentence to correct.

But thanks for your answers, I'll keep searching.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3267629.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Copyfields

2011-08-19 Thread Nicholas Fellows
Currently our full index takes around half an hour; it's a big dataset,
several million records of detailed product information. This is actually
very quick compared to another one of my installations. I would be
interested to know which of these methods would reduce indexing time the
most.

N ...

On 18 August 2011 17:20, Jaeger, Jay - DOT jay.jae...@dot.wi.gov wrote:

 I would suggest #3, unless you have some very unusual performance
 requirements.  It has the advantage of isolating your index environment
  requirements from the database.


 -Original Message-
 From: Nicholas Fellows [mailto:n...@djdownload.com]
 Sent: Thursday, August 18, 2011 8:40 AM
 To: solr-user@lucene.apache.org
 Subject: Solr Copyfields

 Hi I have a question regarding
 CopyFields in Solr

 As far as I can tell there are several ways to do the same thing:

 1) create an alias in the SQL query and delta queries
 2) specify multiple fields in the db-data-config.xml, having different
 names for the same column
 3) use the copyField directive in schema.xml

 Is there any difference between these approaches, such as indexing speed,
 query performance, or memory consumption?

 Kind Regards

 Nick

 --
 Nick Fellows
 DJdownload.com
 ---
 10 Greenland Street
 London
 NW10ND
 United Kingdom
 ---
 n...@djdownload.com (E)

 ---
 www.djdownload.com




-- 
Nick Fellows
DJdownload.com
---
10 Greenland Street
London
NW10ND
United Kingdom
---
n...@djdownload.com (E)

---
www.djdownload.com


Re: Full sentence spellcheck

2011-08-19 Thread Li Li
I haven't used suggest yet. But in spell check, if you don't
provide spellcheck.q, it will analyze the q parameter with a query
converter, which tokenizes your query; otherwise it will use the
analyzer of the field to process the parameter.
If you don't want to tokenize the query, you should pass spellcheck.q
and provide your own analyzer, such as a keyword analyzer.
You can achieve this by adding <str name="fieldType">string</str>:

<lst name="spellchecker">
  <str name="classname">solr.FileBasedSpellChecker</str>
  <str name="name">file</str>
  <str name="sourceLocation">spellings.txt</str>
  <str name="characterEncoding">UTF-8</str>
  <str name="spellcheckIndexDir">./spellchecker2</str>
  <str name="fieldType">string</str>
</lst>

The wiki says <str name="queryAnalyzerFieldType">textSpell</str>,
but I read the code of Solr 1.4.1 and the latest lucene/solr 4 trunk,
and both use the following code, so I think the wiki is out of date:

public static final String FIELD_TYPE = "fieldType";

fieldTypeName = (String) config.get(FIELD_TYPE);
if (core.getSchema().getFieldTypes().containsKey(fieldTypeName)) {
  FieldType fieldType = core.getSchema().getFieldTypes().get(fieldTypeName);
  analyzer = fieldType.getQueryAnalyzer();
}

If you use file-based spell check, it's OK; but for index-based, if
you tokenize the field but do not tokenize your query, you can still
get correct results.
On Fri, Aug 19, 2011 at 5:40 PM, Valentin igorlacro...@gmail.com wrote:
 I don't think it wil lhelp me, sorry. I just want my query to not be
 tokenised, I want it to be considered as a full sentence to correct.

 But thanks for your answers, I keep searching.

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3267629.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Content recommendation using solr?

2011-08-19 Thread Arcadius Ahouansou
Thanks Omri, that looks interesting.

What I'm looking for is for movies, something close to jinni.com.
They seem to be using JEE, but I'm not sure about Solr/Lucene.


Thanks.

Arcadius.




On Thu, Aug 18, 2011 at 3:25 PM, Omri Cohen omri...@gmail.com wrote:

 check out OutBrain



Re: Full sentence spellcheck

2011-08-19 Thread Valentin

Li Li wrote:
 If you don't want to tokenize  query, you should pass spellcheck.q
 and provide your own analyzer such as keyword analyzer.

That's already what I do with my suggestTextFull fieldType, added to my
searchComponent, no? I copied my fieldType and my searchComponent in my
first post. The only big difference between your parameters and mine is:

  <str name="characterEncoding">UTF-8</str>

But I don't think it resolves the problem of the NullPointerException when I
use spellcheck.q :/


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3267724.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Full sentence spellcheck

2011-08-19 Thread Li Li
NullPointerException? Do you have the full exception stack trace?

On Fri, Aug 19, 2011 at 6:49 PM, Valentin igorlacro...@gmail.com wrote:

 Li Li wrote:
 If you don't want to tokenize  query, you should pass spellcheck.q
 and provide your own analyzer such as keyword analyzer.

 That's already what I do with my suggestTextFull fieldType, added to my
 searchComponent, no? I copied my fieldType and my searchComponent in my
 first post. The only big difference between your parameters and mine is:

   <str name="characterEncoding">UTF-8</str>

 But I don't think it resolves the problem of the NullPointerException when I
 use spellcheck.q :/


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3267724.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Full sentence spellcheck

2011-08-19 Thread Valentin
My beautiful NullPointer Exception : 


SEVERE: java.lang.NullPointerException
at
org.apache.solr.handler.component.SpellCheckComponent.getTokens(SpellCheckComponent.java:476)
at
org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:131)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3267771.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Full sentence spellcheck

2011-08-19 Thread Li Li
Line 476 of my SpellCheckComponent.getTokens() is: assert analyzer != null;
It seems our code versions don't match. Could you decompile your
SpellCheckComponent.class?


On Fri, Aug 19, 2011 at 7:23 PM, Valentin igorlacro...@gmail.com wrote:
 My beautiful NullPointer Exception :


 SEVERE: java.lang.NullPointerException
        at
 org.apache.solr.handler.component.SpellCheckComponent.getTokens(SpellCheckComponent.java:476)
        at
 org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:131)
        at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
        at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
        at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at 
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
        at
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3267771.html
 Sent from the Solr - User mailing list archive at Nabble.com.



query cache result

2011-08-19 Thread jame vaalet
hi,
I understand that the queryResultCache tag in solrconfig is the one which
determines Solr's cache size in the JVM.

<queryResultCache class="solr.LRUCache"
                  size="${queryResultCacheSize:0}"
                  initialSize="${queryResultCacheInitialSize:0}"
                  autowarmCount="${queryResultCacheRows:0}"/>

Of the different attributes, what is size? Is it the amount of memory
reserved in bytes, the number of doc IDs cached, or the number of
queries it will cache?

Similarly, what are initialSize and autowarmCount measured in?

Can someone please reply...


Re: Full sentence spellcheck

2011-08-19 Thread Li Li
Or is your analyzer null? Any other exceptions or warnings in your log file?

On Fri, Aug 19, 2011 at 7:37 PM, Li Li fancye...@gmail.com wrote:
 Line 476 of my SpellCheckComponent.getTokens() is: assert analyzer != null;
 It seems our code versions don't match. Could you decompile your
 SpellCheckComponent.class?


 On Fri, Aug 19, 2011 at 7:23 PM, Valentin igorlacro...@gmail.com wrote:
 My beautiful NullPointer Exception :


 SEVERE: java.lang.NullPointerException
        at
 org.apache.solr.handler.component.SpellCheckComponent.getTokens(SpellCheckComponent.java:476)
        at
 org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:131)
        at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
        at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
        at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
        at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
        at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at 
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
        at
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)


 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3267771.html
 Sent from the Solr - User mailing list archive at Nabble.com.




Re: Full sentence spellcheck

2011-08-19 Thread Valentin
My analyser is not empty:

<fieldType name="suggestTextFull" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

and I'm sure there are words in it.

I don't know where to find this file:
org.apache.solr.handler.component.SpellCheckComponent.getTokens

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3267833.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Full sentence spellcheck

2011-08-19 Thread Will Oberman
This might be unrelated, but I had the exact same error yesterday
trying to replace the query converter with a custom class I wrote.
It turned out I wasn't properly registering my jar. I'm still testing
with jetty, and lib in example is included too late in the startup
process. I had to rebundle the war with my jar in the WEB-INF lib.


On Aug 19, 2011, at 8:01 AM, Valentin  igorlacro...@gmail.com wrote:


My analyser is not empty:

<fieldType name="suggestTextFull" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

and I'm sure there are words in it.

I don't know where to find this file:
org.apache.solr.handler.component.SpellCheckComponent.getTokens

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3267833.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: paging size in SOLR

2011-08-19 Thread Erick Erickson
1) I don't know; where is it coming from? It looks like you've done a stats
call on a freshly opened server.

2) 512 entries (i.e. results for 512 queries). Each entry is
queryResultWindowSize doc IDs.
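In other words (a back-of-envelope sketch; the window size of 20 is an assumed example value, not taken from the thread):

```python
# size counts cache *entries* (one per distinct query + sort),
# not bytes and not individual doc IDs.
query_result_cache_size = 512    # entries, from size="512"
query_result_window_size = 20    # doc IDs cached per entry (example value)

# Upper bound on the number of doc IDs the cache can hold:
max_cached_doc_ids = query_result_cache_size * query_result_window_size
```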

Best
Erick

On Fri, Aug 19, 2011 at 5:33 AM, jame vaalet jamevaa...@gmail.com wrote:
 1. What does this specify?

 <queryResultCache class="solr.LRUCache"
                   size="${queryResultCacheSize:0}"
                   initialSize="${queryResultCacheInitialSize:0}"
                   autowarmCount="${queryResultCacheRows:0}"/>

 2. When I say queryResultCacheSize: 512, does it mean 512 queries can be
 cached, or that 512 bytes are reserved for caching?

 Can someone please give me an answer?



 On 14 August 2011 21:41, Erick Erickson erickerick...@gmail.com wrote:

 Yep.

 ResultWindowSize in
  solrconfig.xml
 
  Best
  Erick
 
  On Sun, Aug 14, 2011 at 8:35 AM, jame vaalet jamevaa...@gmail.com
 wrote:
   thanks erick ... that means it depends upon the memory allocated to
 the
  JVM
   .
  
   going back queryCacheResults factor i have got this doubt ..
   say, i have got 10 threads with 10 different queries ..and each of
 them
  in
   parallel are searching the same index with millions of docs in it
   (multisharded ) .
   now each of the queries have large number of results in it hence got
 to
  page
   them all..
   which all thread's (query ) result-set will be cached ? so that
  subsequent
   pages can be retrieved quickly ..?
  
   On 14 August 2011 17:40, Erick Erickson erickerick...@gmail.com
 wrote:
  
   There isn't an optimum page size that I know of, it'll vary with
 lots
  of
   stuff, not the least of which is whatever servlet container limits
 there
   are.
  
   But I suspect you can get quite a few (1000s) without
   too much problem, and you can always use the JSON response
   writer to pack in more pages with less overhead.
  
   You pretty much have to try it and see.
  
   Best
   Erick
  
   On Sun, Aug 14, 2011 at 5:42 AM, jame vaalet jamevaa...@gmail.com
  wrote:
speaking about pagesizes, what is the optimum page size that should
 be
retrieved each time ??
i understand it depends upon the data you are fetching back
 fromeach
  hit
document ... but lets say when ever a document is hit am fetching
 back
   100
bytes worth data from each of those docs in indexes (along with
 solr
response statements ) .
this will make 100*x bytes worth data in each page if x is the page
  size
   ..
what is the optimum value of this x that solr can return each time
   without
going into exceptions 
   
On 13 August 2011 19:59, Erick Erickson erickerick...@gmail.com
  wrote:
   
Jame:
   
You control the number via settings in solrconfig.xml, so it's
up to you.
   
Jonathan:
Hmmm, that's seems right, after all the deep paging penalty is
  really
about keeping a large sorted array in memory but at least you
  only
pay it once per 10,000, rather than 100 times (assuming page size
 is
100)...
   
Best
Erick
   
On Wed, Aug 10, 2011 at 10:58 AM, jame vaalet 
 jamevaa...@gmail.com
wrote:
 when you say queryResultCache, does it only cache n number of
  result
   for
the
 last one query or more than one queries?


 On 10 August 2011 20:14, simon mtnes...@gmail.com wrote:

 Worth remembering there are some performance penalties with
 deep
 paging, if you use the page-by-page approach. may not be too
 much
  of
   a
 problem if you really are only looking to retrieve 10K docs.

 -Simon

 On Wed, Aug 10, 2011 at 10:32 AM, Erick Erickson
 erickerick...@gmail.com wrote:
  Well, if you really want to you can specify start=0 and
  rows=1
   and
  get them all back at once.
 
  You can do page-by-page by incrementing the start parameter
 as
   you
  indicated.
 
  You can keep from re-executing the search by setting your
 queryResultCache
  appropriately, but this affects all searches so might be an
  issue.
 
  Best
  Erick
 
  On Wed, Aug 10, 2011 at 9:09 AM, jame vaalet 
  jamevaa...@gmail.com
   
 wrote:
  hi,
  i want to retrieve all the data from solr (say 10,000 ids )
 and
  my
page
 size
  is 1000 .
  how do i get back the data (pages) one after other ?do i
 have
  to
 increment
  the start value each time by the page size from 0 and do
 the
iteration
 ?
  In this case am i querying the index 10 time instead of one
 or
   after
 first
  query the result will be cached somewhere for the subsequent
  pages
   ?
 
 
  JAME VAALET
 
 




 --

 -JAME

   
   
   
   
--
   
-JAME
   
  
  
  
  
   --
  
   -JAME
  
 
 
 
 
  --
 
  -JAME
 




 --

 -JAME
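The page-by-page approach discussed in this thread (incrementing start by the page size until numFound is exhausted) can be sketched as follows. This is an illustrative, self-contained sketch; the class and method names are mine, not part of any Solr API:

```java
import java.util.ArrayList;
import java.util.List;

public class Paging {

    // Compute the sequence of start offsets needed to page through
    // numFound matching documents, pageSize rows per request.
    static List<Integer> startOffsets(int numFound, int pageSize) {
        List<Integer> offsets = new ArrayList<Integer>();
        for (int start = 0; start < numFound; start += pageSize) {
            offsets.add(start);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // 10,000 matching docs fetched 1,000 at a time: 10 requests,
        // with start = 0, 1000, ..., 9000 (each sent as start=N&rows=1000).
        List<Integer> offsets = startOffsets(10000, 1000);
        System.out.println(offsets.size());                  // prints 10
        System.out.println(offsets.get(offsets.size() - 1)); // prints 9000
    }
}
```

Each offset would be sent as the start parameter of a separate request; as noted above, only queries already held in the queryResultCache avoid re-executing the search on later pages.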



Re: query cache result

2011-08-19 Thread Tomás Fernández Löbbe
Hi Jame, the size for the queryResultCache is the number of queries that
will fit into this cache. autowarmCount is the number of queries that are
going to be copied from the old cache to the new cache when a commit occurs
(actually, the queries are going to be executed again against the new
IndexSearcher, as the results for them may have changed on the new index).
initialSize is the initial size of the array; it will grow from
that size up to size. You may want to see this page of the wiki:
http://wiki.apache.org/solr/SolrCaching

Regards,

Tomás
On Fri, Aug 19, 2011 at 8:39 AM, jame vaalet jamevaa...@gmail.com wrote:

 hi,
 i understand that queryResultCache tag in solrconfig is the one which
 determines the cache size of SOLR in jvm.

 <queryResultCache class="solr.LRUCache"
   size="${queryResultCacheSize:0}" initialSize="${queryResultCacheInitialSize:0}"
   autowarmCount="${queryResultCacheRows:0}" />


 out of the different attributes, what is size? Is it the amount of memory
 reserved in bytes, the number of doc IDs cached, or the number of
 queries it will cache?
 
 Similarly, what are initialSize and autowarmCount measured in?

 Can someone please reply...



Re: When are you planning to release SolrCloud feature with ZooKeeper?

2011-08-19 Thread Mark Miller
Whenever 4.0 comes out :) Hard to put a date on that - I believe another 
SolrCloud push is about to start to cover the indexing side.

On Aug 18, 2011, at 11:46 AM, Way Cool wrote:

 Hi, guys,
 
 When are you planning to release the SolrCloud feature with ZooKeeper 
 currently in trunk? The new admin interface looks great. Great job.
 
 Thanks,
 
 YH

- Mark Miller
lucidimagination.com



Re: Full sentence spellcheck

2011-08-19 Thread William Oberman
I was on my phone before, and didn't see the whole thread.  I wanted the same 
thing: to have the spellchecker not tokenize.  See the "Suggester Issues" thread 
for my junky replacement class that doesn't tokenize (as far as I can tell from 
a few minutes of testing).

will

On Aug 19, 2011, at 8:35 AM, Will Oberman wrote:

 This might be unrelated, but I had the exact same error yesterday trying to 
 replace the query converter with a custom class I wrote. It turned out I wasn't 
 properly registering my jar.  I'm still testing with jetty, and the lib dir in 
 example is included too late in the startup process. I had to rebundle the 
 war with my jar in WEB-INF/lib.
 
 On Aug 19, 2011, at 8:01 AM, Valentin  igorlacro...@gmail.com wrote:
 
 My analyser is not empty :
 
 <fieldType name="suggestTextFull" class="solr.TextField">
   <analyzer type="index">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
 </fieldType>
 
 and I'm sure there are words in it
 
 
 I don't know where to find this file
 org.apache.solr.handler.component.SpellCheckComponent.getTokens
 
 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3267833.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Filtering results based on a set of values for a field

2011-08-19 Thread Erick Erickson
Good luck, and let us know what the results are. About dropping the cache: that
shouldn't be a problem, it should just be computed when your
component is called
the first time, so starting the server (or opening a new searcher)
should re-compute
it. Your filters shouldn't be very big, just maxDocs/8 bytes long...

By the way, there exist user caches (see solrconfig.xml) that you can use
for whatever you want, so you could consider stashing your filters in there. The
neat thing is that they get notified whenever the searcher is opened (?) and you
can regenerate the data held there. Might be more convenient than mucking with
filter query caching...

Best
Erick

On Fri, Aug 19, 2011 at 2:58 AM, Tomas Zerolo
tomas.zer...@axelspringer.de wrote:
 On Thu, Aug 18, 2011 at 02:32:48PM -0400, Erick Erickson wrote:
 Hmmm, I'm still not getting it...

 You have one or more lists. These lists change once a month or so. Are
 you trying
 to include or exclude the documents in these lists?

 In our specific case to include *only* the documents having a value of
 an attribute (author) in this list (the user decides at query time
 which of those lists to use). But we do expect the problem to become
 more general over time...

                                                     And do the authors you 
 want
 to include or exclude change on a per-query basis or would you be all set if 
 you
 just had a filter that applied to all the authors on a particular list?

 No. ATM there are two fixed lists (in the sense that they are updated
 like monthly). One problem: the document basis itself is huge (on the
 order of 3.5 million). Re-indexing is a painful exercise taking days,
 so we tend not to do it too often ;-)

 But I *think* what you want is a SearchComponent that implements your
 Filter. You can see various examples of how to add components to a search
 handler in the solrconfig.xml file.

 Thanks a lot for the pointer. Rushing to read on it.

 WARNING: Haven't done this myself, so I'm partly guessing here.

 Hey: I asked for pointers and you're giving me some, so I'm a happy
 man now :-)

 Although here's a hint that someone else has used this approach:
 http://www.mail-archive.com/solr-user@lucene.apache.org/msg54240.html

 Thanks again

 And you'll want to ensure that the Filter is cached so you don't have to 
 compute
 it more than once.

 Yes, I hope that will be the trick giving us the needed boost. Somehow
 we'll have to figure out how to drop the cache when a new version of the
 list arrives (without killing everyone in the building).

 I'll sure report back.

 Regards
 -- tomás
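A quick sanity check on the maxDocs/8 figure Erick mentions above: a cached filter is essentially a bitset with one bit per document in the index, so even the 3.5-million-document index discussed in this thread costs well under half a megabyte per filter. A trivial sketch of the arithmetic:

```java
public class FilterSizeEstimate {

    // A cached filter is a bitset: one bit per document in the index,
    // i.e. maxDocs / 8 bytes.
    static long filterBytes(long maxDocs) {
        return maxDocs / 8;
    }

    public static void main(String[] args) {
        System.out.println(filterBytes(3500000L)); // prints 437500 (about 427 KB)
    }
}
```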



Re: query cache result

2011-08-19 Thread jame vaalet
The wiki says:

  size: "The maximum number of entries in the cache."

  queryResultCache: "This cache stores ordered sets of document IDs — the
  top N results of a query ordered by some criteria."

doesn't it mean number of document ids rather than number of queries ?





2011/8/19 Tomás Fernández Löbbe tomasflo...@gmail.com

 Hi Jame, the size for the queryResultCache is the number of queries that
 will fit into this cache. AutowarmCount is the number of queries that are
 going to be copyed from the old cache to the new cache when a commit
 occurrs
 (actually, the queries are going to be executed again agains the new
 IndexSearcher, as the results for them may have changed on the new Index).
 initial size is the initial size of the array, it will start to grow from
 that size up to size. You may want to see this page of the wiki:
 http://wiki.apache.org/solr/SolrCaching

 Regards,

 Tomás
 On Fri, Aug 19, 2011 at 8:39 AM, jame vaalet jamevaa...@gmail.com wrote:

  hi,
  i understand that queryResultCache tag in solrconfig is the one which
  determines the cache size of SOLR in jvm.
 
  <queryResultCache class="solr.LRUCache"
    size="${queryResultCacheSize:0}" initialSize="${queryResultCacheInitialSize:0}"
    autowarmCount="${queryResultCacheRows:0}" />
 
 
  out of the different attributes what is size? Is it the amount of memory
  reserved in bytes ? or number of doc ids cached ? or is it the number of
  queries it will cache?
 
  similarly wat is initial size and autowarm depicted in?
 
  can some please reply ...
 




-- 

-JAME


Re: Solr 3.3 crashes after ~18 hours?

2011-08-19 Thread alexander sulz

Am 10.08.2011 17:11, schrieb Yonik Seeley:

On Wed, Aug 10, 2011 at 11:00 AM, alexander sulza.s...@digiconcept.net  wrote:

Okay, with this command it hangs.

It doesn't look like a hang from this thread dump.  It doesn't look
like any solr requests are executing at the time the dump was taken.

Did you do this from the command line?
curl "http://localhost:8983/solr/update?commit=true"

Are you saying that the curl command just hung and never returned?

-Yonik
http://www.lucidimagination.com


Also: I managed to get a Thread Dump (attached).

regards

Am 05.08.2011 15:08, schrieb Yonik Seeley:

On Fri, Aug 5, 2011 at 7:33 AM, alexander sulza.s...@digiconcept.net
  wrote:

Usually you get a XML-Response when doing commits or optimize, in this
case
I get nothing
in return, but the site ( http://[...]/solr/update?optimize=true )
DOESN'T
load forever or anything.
It doesn't hang! I just get a blank page / empty response.

Sounds like you are doing it from a browser?
Can you try it from the command line?  It should give back some sort
of response (or hang waiting for a response).

curl "http://localhost:8983/solr/update?commit=true"

-Yonik
http://www.lucidimagination.com



I use the stuff in the example folder, the only changes i made was enable
logging and changing the port to 8985.
I'll try getting a thread dump if it happens again!
So far its looking good with having allocated more memory to it.

Am 04.08.2011 16:08, schrieb Yonik Seeley:

On Thu, Aug 4, 2011 at 8:09 AM, alexander sulza.s...@digiconcept.net
  wrote:

Thank you for the many replies!

Like I said, I couldn't find anything in logs created by solr.
I just had a look at the /var/logs/messages and there wasn't anything
either.

What I mean by crash is that the process is still there and http GET
pings
would return 200
but when i try visiting /solr/admin, I'd get a blank page! The server
ignores any incoming updates or commits,

ignores means what?  The request hangs?  If so, could you get a thread
dump?

Do queries work (like /solr/select?q=*:*) ?


thus throwing no errors, no 503s... It's like the server has a
blackout
and
stares blankly into space.

Are you using a different servlet container than what is shipped with
solr?
If you did start with the solr example server, what jetty
configuration changes have you made?

-Yonik
http://www.lucidimagination.com


Sigh it happened again, but I have a clue: before the crash I was 
deleting some entries but haven't optimized afterwards, then, when I 
tried indexing something, solr crashed again (responsive but just 
blank/empty returns).


I've just tried it again (doing the curl command while solr is in its 
zombie state)
and i get the following reply from curl: curl: (52) Empty reply from 
server


Also, I updated my Java so the HotSpot version is now 20.1-b3



Re: query cache result

2011-08-19 Thread Tomás Fernández Löbbe
From my understanding, seeing the cache as a set of key-value pairs, this
cache has the query as key and the list of IDs resulting from the query as
values. When the exact same query is issued, it will be found as key in this
cache, and Solr will already have the list of IDs that match it.
If you set the size of this cache to 50, that means that Solr will keep in
memory the last 50 queries with their list of resulting document IDs.

The number of IDs per query can be configured with the parameter
queryResultWindowSize
http://wiki.apache.org/solr/SolrCaching#queryResultWindowSize
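Putting the two settings together: a queryResultCache entry holds up to queryResultWindowSize doc IDs, so a rough lower bound on the cache's doc-ID footprint can be computed directly. This is a back-of-the-envelope sketch only, assuming 4-byte int doc IDs; real memory use is higher (query keys, object overhead):

```java
public class QueryResultCacheEstimate {

    // Lower-bound estimate: entries * windowSize doc IDs * 4 bytes per int.
    // Real memory use is higher (query objects, map overhead, etc.).
    static long approxDocIdBytes(int entries, int windowSize) {
        return (long) entries * windowSize * 4;
    }

    public static void main(String[] args) {
        // size=512 with queryResultWindowSize=50:
        System.out.println(approxDocIdBytes(512, 50)); // prints 102400 (~100 KB)
    }
}
```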

On Fri, Aug 19, 2011 at 10:34 AM, jame vaalet jamevaa...@gmail.com wrote:

 The wiki says:
 
   size: "The maximum number of entries in the cache."
 
   queryResultCache: "This cache stores ordered sets of document IDs — the
   top N results of a query ordered by some criteria."

 doesn't it mean number of document ids rather than number of queries ?





 2011/8/19 Tomás Fernández Löbbe tomasflo...@gmail.com

  Hi Jame, the size for the queryResultCache is the number of queries that
  will fit into this cache. AutowarmCount is the number of queries that are
  going to be copyed from the old cache to the new cache when a commit
  occurrs
  (actually, the queries are going to be executed again agains the new
  IndexSearcher, as the results for them may have changed on the new
 Index).
  initial size is the initial size of the array, it will start to grow from
  that size up to size. You may want to see this page of the wiki:
  http://wiki.apache.org/solr/SolrCaching
 
  Regards,
 
  Tomás
  On Fri, Aug 19, 2011 at 8:39 AM, jame vaalet jamevaa...@gmail.com
 wrote:
 
   hi,
   i understand that queryResultCache tag in solrconfig is the one which
   determines the cache size of SOLR in jvm.
  
    <queryResultCache class="solr.LRUCache"
      size="${queryResultCacheSize:0}" initialSize="${queryResultCacheInitialSize:0}"
      autowarmCount="${queryResultCacheRows:0}" />
  
  
   out of the different attributes what is size? Is it the amount of
 memory
   reserved in bytes ? or number of doc ids cached ? or is it the number
 of
   queries it will cache?
  
   similarly wat is initial size and autowarm depicted in?
  
   can some please reply ...
  
 



 --

 -JAME



Re: Solr 3.3 crashes after ~18 hours?

2011-08-19 Thread alexander sulz

Am 19.08.2011 15:48, schrieb alexander sulz:

Am 10.08.2011 17:11, schrieb Yonik Seeley:
On Wed, Aug 10, 2011 at 11:00 AM, alexander 
sulza.s...@digiconcept.net  wrote:

Okay, with this command it hangs.

It doesn't look like a hang from this thread dump.  It doesn't look
like any solr requests are executing at the time the dump was taken.

Did you do this from the command line?
curl "http://localhost:8983/solr/update?commit=true"

Are you saying that the curl command just hung and never returned?

-Yonik
http://www.lucidimagination.com


Also: I managed to get a Thread Dump (attached).

regards

Am 05.08.2011 15:08, schrieb Yonik Seeley:

On Fri, Aug 5, 2011 at 7:33 AM, alexander sulza.s...@digiconcept.net
  wrote:
Usually you get a XML-Response when doing commits or optimize, in 
this

case
I get nothing
in return, but the site ( http://[...]/solr/update?optimize=true )
DOESN'T
load forever or anything.
It doesn't hang! I just get a blank page / empty response.

Sounds like you are doing it from a browser?
Can you try it from the command line?  It should give back some sort
of response (or hang waiting for a response).

curl "http://localhost:8983/solr/update?commit=true"

-Yonik
http://www.lucidimagination.com


I use the stuff in the example folder, the only changes i made was 
enable

logging and changing the port to 8985.
I'll try getting a thread dump if it happens again!
So far its looking good with having allocated more memory to it.

Am 04.08.2011 16:08, schrieb Yonik Seeley:
On Thu, Aug 4, 2011 at 8:09 AM, alexander 
sulza.s...@digiconcept.net

  wrote:

Thank you for the many replies!

Like I said, I couldn't find anything in logs created by solr.
I just had a look at the /var/logs/messages and there wasn't 
anything

either.

What I mean by crash is that the process is still there and http 
GET

pings
would return 200
but when i try visiting /solr/admin, I'd get a blank page! The 
server

ignores any incoming updates or commits,
ignores means what?  The request hangs?  If so, could you get a 
thread

dump?

Do queries work (like /solr/select?q=*:*) ?


thous throwing no errors, no 503's.. It's like the server has a
blackout
and
stares blankly into space.
Are you using a different servlet container than what is shipped 
with

solr?
If you did start with the solr example server, what jetty
configuration changes have you made?

-Yonik
http://www.lucidimagination.com


Sigh it happened again, but I have a clue: before the crash I was 
deleting some entries but haven't optimized afterwards, then, when I 
tried indexing something, solr crashed again (responsive but just 
blank/empty returns).


I've just tried it again (doing the curl command while solr is its 
zombie state)
and i get the following reply from curl: curl: (52) Empty reply from 
server


Also, I updated my Java so the HotSpot version is now 20.1-b3

Using lsof I think I pinned down the problem: too many open files!
I already doubled the limit from 512 to 1024 once, but it seems there are many 
SOCKETS involved,
which are listed as "can't identify protocol" instead of real files.
Over time, the list grows and grows with these entries until... it crashes.
I've read several times that the fix for this problem is to set the limit 
to a ridiculously high number, but
that seems a bit of a crude fix. Why so many open sockets in the 
first place?




Re: Solr 3.3 crashes after ~18 hours?

2011-08-19 Thread Yonik Seeley
On Fri, Aug 19, 2011 at 10:36 AM, alexander sulz a.s...@digiconcept.net wrote:
 using lsof I think I pinned down the problem: too many open files!
 I already doubled from 512 to 1024 once but it seems there are many SOCKETS
 involved,
 which are listed as can't identify protocol, instead of real files.
 over time, the list grows and grows with these entries until.. it crashs.
 So Ive read several times the fix for this problem is to set the limit to a
 ridiculous high number but
 that seems a little bit of a crude fix. Why so many open sockets in the
 first place?

What are you using as a client to talk to solr?
You need to look at both the update side and the query side.
Using persistent connections is the best all-around, but if not, be
sure to close the connections in the client.

-Yonik
http://www.lucidimagination.com


Re: suggester issues

2011-08-19 Thread Kuba Krzemien
As far as I checked, creating a custom query converter is the only way to 
make this work.
Unfortunately I have some problems with running it - after creating a JAR 
with my class (I'm using your source code, obviously besides package and 
class names) and dropping it into the lib dir, I've added <queryConverter 
name="queryConverter" class="mypackage.MySpellingQueryConverter"/> to 
solrconfig.xml.


I get a SEVERE: org.apache.solr.common.SolrException: Error Instantiating 
QueryConverter, mypackage.MySpellingQueryConverter is not a 
org.apache.solr.spelling.QueryConverter.


What am I doing wrong?

--
From: William Oberman ober...@civicscience.com
Sent: Thursday, August 18, 2011 10:35 PM
To: solr-user@lucene.apache.org
Subject: Re: suggester issues


I tried this:
package com.civicscience;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;

import org.apache.lucene.analysis.Token;
import org.apache.solr.spelling.QueryConverter;

/**
 * Converts the query string to a Collection of Lucene tokens.
 **/
public class SpellingQueryConverter extends QueryConverter {

  /**
   * Converts the original query string to a collection of Lucene Tokens.
   * @param original the original query string
   * @return a Collection of Lucene Tokens
   */
  @Override
  public Collection<Token> convert(String original) {
    if (original == null) {
      return Collections.emptyList();
    }
    Collection<Token> result = new ArrayList<Token>();
    Token token = new Token(original, 0, original.length(), "word");
    result.add(token);
    return result;
  }

}

And added it to the classpath, and now it does what I expect.

will


On Aug 18, 2011, at 2:33 PM, Alexei Martchenko wrote:

It can be done, I did that with shingles, but it's not the way it's meant 
to
be. The main problem with the suggester is that we want compound words and we
never get them. I try to get "internet explorer", but when I enter the
second word, "internet e", the suggester never finds "explorer".

2011/8/18 oberman_cs ober...@civicscience.com


I was trying to deal with the exact same issue, with the exact same
results.
Is there really no way to feed a phrase into the suggester 
(spellchecker)

without it splitting the input phrase into words?

--
View this message in context:
http://lucene.472066.n3.nabble.com/suggester-issues-tp3262718p3265803.html
Sent from the Solr - User mailing list archive at Nabble.com.





--

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533




Re: suggester issues

2011-08-19 Thread William Oberman
Hard to say, so I'll list the exact steps I took:
-Downloaded apache-solr-3.3.0 (I like to stick with releases vs. svn)
-Untar and cd
-ant
-Wrote my class below (under a peer directory in apache-solr-3.3.0)
-javac -cp 
../dist/apache-solr-core-3.3.0.jar:../lucene/build/lucene-core-3.3-SNAPSHOT.jar 
com/civicscience/SpellingQueryConverter.java
-jar cf cs.jar com
-Unzipped solr.war (under example)
-Added my cs.jar to lib (under web-inf)
-Rezipped solr.war
-Added: <queryConverter name="queryConverter" 
class="com.civicscience.SpellingQueryConverter"/> to solrconfig.xml
-Restarted jetty

And, that seemed to all work.

will

On Aug 19, 2011, at 10:44 AM, Kuba Krzemien wrote:

 As far as I checked, creating a custom query converter is the only way to make 
 this work.
 Unfortunately I have some problems with running it - after creating a JAR 
 with my class (I'm using your source code, obviously besides package and class 
 names) and dropping it into the lib dir, I've added <queryConverter 
 name="queryConverter" class="mypackage.MySpellingQueryConverter"/> to 
 solrconfig.xml.
 
 I get a SEVERE: org.apache.solr.common.SolrException: Error Instantiating 
 QueryConverter, mypackage.MySpellingQueryConverter is not a 
 org.apache.solr.spelling.QueryConverter.
 
 What am I doing wrong?
 
 --
 From: William Oberman ober...@civicscience.com
 Sent: Thursday, August 18, 2011 10:35 PM
 To: solr-user@lucene.apache.org
 Subject: Re: suggester issues
 
 I tried this:
 package com.civicscience;
 
 import java.util.ArrayList;
 import java.util.Collection;
 import java.util.Collections;
 
 import org.apache.lucene.analysis.Token;
 import org.apache.solr.spelling.QueryConverter;
 
 /**
 * Converts the query string to a Collection of Lucene tokens.
 **/
 public class SpellingQueryConverter extends QueryConverter {
 
  /**
   * Converts the original query string to a collection of Lucene Tokens.
   * @param original the original query string
   * @return a Collection of Lucene Tokens
   */
  @Override
  public Collection<Token> convert(String original) {
    if (original == null) {
      return Collections.emptyList();
    }
    Collection<Token> result = new ArrayList<Token>();
    Token token = new Token(original, 0, original.length(), "word");
    result.add(token);
    return result;
  }
 
 }
 
 And added it to the classpath, and now it does what I expect.
 
 will
 
 
 On Aug 18, 2011, at 2:33 PM, Alexei Martchenko wrote:
 
 It can be done, I did that with shingles, but it's not the way it's meant to
 be. The main problem with the suggester is that we want compound words and we
 never get them. I try to get "internet explorer", but when I enter the
 second word, "internet e", the suggester never finds "explorer".
 
 2011/8/18 oberman_cs ober...@civicscience.com
 
 I was trying to deal with the exact same issue, with the exact same
 results.
 Is there really no way to feed a phrase into the suggester (spellchecker)
 without it splitting the input phrase into words?
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/suggester-issues-tp3262718p3265803.html
 Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 
 -- 
 
 *Alexei Martchenko* | *CEO* | Superdownloads
 ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
 5083.1018/5080.3535/5080.3533



Re: Solr 3.3 crashes after ~18 hours?

2011-08-19 Thread alexander sulz

Am 19.08.2011 16:43, schrieb Yonik Seeley:

On Fri, Aug 19, 2011 at 10:36 AM, alexander sulza.s...@digiconcept.net  wrote:

using lsof I think I pinned down the problem: too many open files!
I already doubled from 512 to 1024 once but it seems there are many SOCKETS
involved,
which are listed as can't identify protocol, instead of real files.
over time, the list grows and grows with these entries until.. it crashs.
So Ive read several times the fix for this problem is to set the limit to a
ridiculous high number but
that seems a little bit of a crude fix. Why so many open sockets in the
first place?

What are you using as a client to talk to solr?
You need to look at both the update side and the query side.
Using persistent connections is the best all-around, but if not, be
sure to close the connections in the client.

-Yonik
http://www.lucidimagination.com
I use PHP to talk to solr, this one to be exact: 
http://code.google.com/p/solr-php-client/, version r22 I guess.

I'll try updating it and see what happens..


Re: hl.useFastVectorHighlighter, fragmentsBuilder and HighlightingParameters

2011-08-19 Thread Alexei Martchenko
Hi Koji, thanks, it's loading right now. Can't say it's really working
though, but I believe those are other issues with FastVectorHighlighter

2011/8/18 Koji Sekiguchi k...@r.email.ne.jp

 (11/08/19 4:14), Alexei Martchenko wrote:

 Hi Koji thanks for the reply.

 My <fragmentsBuilder> is defined directly in <config>. Solr 3.3 warns me
 <highlighting> is a deprecated form. Do you think it is in the wrong
 place?


 Hi Alexei,

 Yes, it is incorrect. What is deprecated is the <highlighting> tag placed just
 under <config> directly.
 After 3.1, it needs to be under the <searchComponent> for HighlightComponent.
 Please consult
 solrconfig.xml in the 3.3 example.


 koji
 --
 Check out Query Log Visualizer
 http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html
 http://www.rondhuit.com/en/




-- 

*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
5083.1018/5080.3535/5080.3533


Re: File based index doesn't work in spellcheck component

2011-08-19 Thread anupamxyz
I am using Nutch to crawl and Solr for searching. The search has been
successfully implemented. Now I want a file-based Suggestion or a "Do you
mean?" feature implemented. It is more or less like a spell checker. For the
same I am making the requisite changes to the solrconfig.xml and the
schema.xml for Solr, but it fails when I am re-indexing it again to get
the new implementation. Please let me know how that can be corrected and
also how I can have the suggestion displayed using JSP in my application.
I can share part of the changed code later if you intend to help me on
this.

Thanks in advance,
Anupam

--
View this message in context: 
http://lucene.472066.n3.nabble.com/File-based-index-doesn-t-work-in-spellcheck-component-tp489070p3268423.html
Sent from the Solr - User mailing list archive at Nabble.com.


How to implement Spell Checker using Solr?

2011-08-19 Thread anupamxyz
I am using Nutch to crawl and Solr for searching. The search has been
successfully implemented. Now I want a file-based Suggestion or a "Do you
mean?" feature implemented. It is more or less like a spell checker. For the
same I am making the requisite changes to the solrconfig.xml and the
schema.xml for Solr, but it fails when I am re-indexing it again to get
the new implementation. Please let me know how that can be corrected and
also how I can have the suggestion displayed using JSP in my application.
I can share part of the changed code later if you intend to help me on
this.

Thanks in advance,
Anupam

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-implement-Spell-Checker-using-Solr-tp3268450p3268450.html
Sent from the Solr - User mailing list archive at Nabble.com.


Requiring multiple matches of a term

2011-08-19 Thread Michael Ryan
Is there a way to specify in a query that a term must match at least X times in 
a document, where X is some value greater than 1?

For example, I want to only get documents that contain the word dog three 
times.  I've thought that using a proximity query with an arbitrarily large 
distance value might do it:
"dog dog dog"~10
And that does seem to return the results I expect.

But when I try for more than three, I start getting unexpected result counts as 
I change the proximity value:
"dog dog dog dog"~10 returns 6403 results
"dog dog dog dog"~20 returns 9291 results
"dog dog dog dog"~30 returns 6395 results

Anyone ever do something like this and know how I can accomplish this?

-Michael
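The at-least-N-occurrences condition itself is simple to state; here is a client-side sketch of the check as a post-filter over fetched document text (illustrative only, with naive whitespace tokenization; this is not how Solr 3.x evaluates queries, and in later Solr versions a termfreq() function query may be able to push the check into the engine):

```java
public class TermCountFilter {

    // Returns true if term occurs at least minOccurrences times in text
    // (case-insensitive, whitespace tokenization; real analysis is richer).
    static boolean matchesAtLeast(String text, String term, int minOccurrences) {
        int count = 0;
        String needle = term.toLowerCase();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.equals(needle)) {
                count++;
            }
        }
        return count >= minOccurrences;
    }

    public static void main(String[] args) {
        System.out.println(matchesAtLeast("dog bites dog and dog wins", "dog", 3)); // true
        System.out.println(matchesAtLeast("my dog barks", "dog", 3));               // false
    }
}
```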


Re: How to implement Spell Checker using Solr?

2011-08-19 Thread Gora Mohanty
On Fri, Aug 19, 2011 at 9:26 PM, anupamxyz cse.anu...@gmail.com wrote:
 I am using Nutch to crawl and Solr for searching. The search has been
 successfully implemented. Now I want a file based Suggestion or a Do you
 mean Feature? implemented. It is more or less like a Spell checker

Um, not quite. At least as per my understanding they are very
different things. Please take a look at:
http://wiki.apache.org/solr/MoreLikeThis
http://wiki.apache.org/solr/SpellCheckComponent

   
  For the
 same I am making the requisite changes to the SolrConfig.xml and the
 Schema.xml for the Solr, but it fails when I am re-indexing it again to get
 the new implementation. Please let me know how that can be corrected and
 also, how can I have the suggestion displayed using Jsp over my application.
 I can share part of the codes changed later if you intend to help me on
 this.

Please share with us what changes you are making, and what does
"it fails" mean, i.e., show us the configuration files, error messages,
etc., maybe through pastebin.com. You might wish to take a look at
http://wiki.apache.org/solr/UsingMailingLists

Regards,
Gora


Please register me

2011-08-19 Thread Anupam
Please register me


Re: How to implement Spell Checker using Solr?

2011-08-19 Thread anupamxyz
Both Nutch and Solr can be used as per the need.
http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ . So the search
is implemented and I am able to search on the values. Now I need the
SpellChecker to be implemented. The changes are exactly as per the ones
listed in http://wiki.apache.org/solr/SpellCheckComponent . I will share the
log details with you by Monday.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-implement-Spell-Checker-using-Solr-tp3268450p3268695.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solr support for multiple points (latitude-longitude) for a document

2011-08-19 Thread Jean Croteau
Hi all,

I was going through Solr 3.3.0 and it seems there's still no support for
performing geospatial queries on documents that have more than one
latitude/longitude pair.  The multiValued attribute is set to false everywhere.

We absolutely need this feature.  I had a look at
https://issues.apache.org/jira/browse/SOLR-2155 in which David Smiley was
trying to implement a workable solution but unless I'm mistaken it was never
committed.

Does anyone know of a way to get a working solution with that fix?

Thanks


Re: Solr support for multiple points (latitude-longitude) for a document

2011-08-19 Thread Smiley, David W.
Hi.
  
Either port it to Solr 3, or use Solr 4 (trunk). 

I know and have used a MetaCarta solution, but that is also based on Solr 4 and 
I don't think they've back-ported it. I have no clue what they charge for it or 
where to get it; I have it as part of their larger solution.

There's also a small little-known set of source files tar'ed up and attached to 
SOLR-773 as solrGeoQuery.tar that I've examined; it attempts to solve this 
problem. It is not particularly fast and I believe there are bugs in +/- 10 
degrees latitude but I haven't actually confirmed it.  That was actually the 
last thing I looked at before embarking on SOLR-2155 (geohashes).

~ David

On Aug 19, 2011, at 1:47 PM, Jean Croteau wrote:

 Hi all,
 
 I was going through Solr 3.3.0 and it seems there's still no support for
 performing geospatial queries on documents that have more than one
 latitude/longitude pair.  The multiValued attribute is set to false everywhere.
 
 We absolutely need this feature.  I had a look at
 https://issues.apache.org/jira/browse/SOLR-2155 in which David Smiley was
 trying to implement a workable solution but unless I'm mistaken it was never
 committed.
 
 Does anyone know of a way to get a working solution with that fix?
 
 Thanks



Solr performance

2011-08-19 Thread Michał Kopacz
Hi

I have one instance of solr running on JBoss with the following schema and
partial config:

Schema:

<schema name="users_szukacz" version="1.4">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"
      omitNorms="true"/>
    <fieldType name="int" class="solr.TrieIntField" omitNorms="true"
      precisionStep="1" positionIncrementGap="0"/>
    <fieldType name="date" class="solr.TrieDateField" omitNorms="true"
      positionIncrementGap="0"/>
    <fieldType name="text_pl" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="user_id" type="int" indexed="true" required="true"/>
    <field name="birth_date" type="date" indexed="true" stored="false"/>
    <field name="city" type="text_pl" indexed="true" stored="false"/>
    <field name="sex" type="text_pl" indexed="true" stored="false"/>
    <field name="show_search" type="int" indexed="true" stored="false"/>
    <field name="confirmed" type="int" indexed="true" stored="false"/>
    <field name="search_text" type="text_pl" indexed="true"/>
  </fields>
  <uniqueKey>user_id</uniqueKey>
  <defaultSearchField>search_text</defaultSearchField>
  <solrQueryParser defaultOperator="AND"/>
</schema>

Config:

<directoryFactory name="DirectoryFactory"
  class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>

<mergeFactor>10</mergeFactor>

<ramBufferSizeMB>1024</ramBufferSizeMB>

<maxBufferedDocs>1000</maxBufferedDocs>
<maxFieldLength>1</maxFieldLength>
<writeLockTimeout>1000</writeLockTimeout>
<commitLockTimeout>1</commitLockTimeout>
<filterCache class="solr.FastLRUCache" size="100" initialSize="100"
  autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512"
  autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="1300" initialSize="1300"
  autowarmCount="0"/>

Index has 41,000,000 documents and is 9 GB in size. For a query like:
1) q=Jarecki+Jan&fq=sex:M&fq=confirmed:1&fq=show_search:3&fl=user_id&start=0&rows=10&wt=json&version=2.2

the server reaches an average of 90 queries/s on 4 threads, which is too low for me.

For a query with a filter on the field city:
2) e.g. fl=user_id&indent=on&start=0&q=Tarkowski+Bartłomiej&wt=json&fq=city:Kwidzyn&fq=sex:M&fq=confirmed:1&fq=show_search:3&version=2.2&rows=10

the server reaches 800 queries/s.

Do you have any advice to speed up the first query? Is this speed the norm?

Server has 32GB RAM and 4 processors Intel Xeon 2.5GHz.



Terms.regex performance issue

2011-08-19 Thread O. Klein
As I want to use it in an autocomplete it has to be fast. Terms.prefix gets
results in around 100 milliseconds, while terms.regex is 10 to 20 times
slower.

Not storing the field made it a bit faster, but not enough. The index is on a
separate core and is only about 5 MB. Are there some tricks to make it work
a lot faster? Or do I have to switch to n-grams or something?





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Terms-regex-performance-issue-tp3268994p3268994.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Date Facet Question

2011-08-19 Thread Chris Hostetter

: when the response comes back the facet names are
: 
: 2010-08-14T01:50:58.813Z
...
: instead of something like
: 
: NOW-11MONTH
...
: where as facet queries if specifying a set of facet queries like
: 
: datetime:[NOW-1YEAR TO NOW]
...
: the labels come back just as specified.  Is there a way to make date
: range queries come back using the query specified and not the parsed
: date?

No.  If dates were the only factor here we could maybe add an option for 
that, but the faceting code is all generalized now to support all numerics, 
so it wouldn't really make sense in general.

It's also not clear how an option like this would work if/when stuff like 
SOLR-2366 gets implemented -- returning the concrete value used as the 
lower bound of the range is unambiguous.

The functional difference between facet.range and facet.query is pretty 
significant, so it's kind of an apples/oranges thing to compare their 
output -- with facet.query you can specify any arbitrary query 
expression your heart desires, and that literal unparsed query string 
is again used as the constraint key in the resulting NamedList because 
it's as unambiguous as we can be given the circumstances.
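[Editorial sketch: since facet.query constraint keys are echoed verbatim, a readable label can often be attached with the `{!key=...}` local param. The snippet below only builds the request query string; the field name and labels are made up, and `{!key}` support depends on the Solr version.]

```python
# Sketch: request two date-range facet.query clauses with readable labels
# supplied via the {!key=...} local param. Field and labels are illustrative.
from urllib.parse import urlencode

params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.query", "{!key=last_year}datetime:[NOW-1YEAR TO NOW]"),
    ("facet.query", "{!key=last_month}datetime:[NOW-1MONTH TO NOW]"),
    ("wt", "json"),
]
query_string = urlencode(params)
print(query_string)
```

With this, the facet counts come back keyed as `last_year` / `last_month` rather than as the raw range expressions.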




-Hoss


Re: A strange Exception in Solr 1.4

2011-08-19 Thread Chris Hostetter

Can you reproduce this error consistently?

Can you try using the CheckIndex tool on your index to verify that it 
hasn't been corrupted in some way?

:2011-08-15 10:31:24,968 ERROR [org.apache.solr.core.SolrCore] -
: java.lang.NullPointerException
: at sun.nio.ch.Util.free(Util.java:199)
: at sun.nio.ch.Util.offerFirstTemporaryDirectBuffer(Util.java:176)
: at sun.nio.ch.IOUtil.read(IOUtil.java:181)
: at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:612)
: at
: 
org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:161)
: at
: 
org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:136)
: at
: 
org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:247)
: at
: org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
: at
: 
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
: at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:80)
: at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:64)
: at
: org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:129)
: at
: org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:160)
: at
: org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:232)
: at
: org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:179)
: at
: org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:975)
: at
: org.apache.lucene.index.DirectoryReader.docFreq(DirectoryReader.java:627)
: at
: org.apache.lucene.index.FilterIndexReader.docFreq(FilterIndexReader.java:194)
: at org.apache.lucene.index.MultiReader.docFreq(MultiReader.java:344)
: at
: org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:308)
: at
: org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:147)
: at
: org.apache.lucene.search.Similarity.idfExplain(Similarity.java:765)
: at
: org.apache.lucene.search.TermQuery$TermWeight.init(TermQuery.java:46)
: at
: org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:146)
: at
: 
org.apache.lucene.search.BooleanQuery$BooleanWeight.init(BooleanQuery.java:184)
: at
: org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:415)
: at org.apache.lucene.search.Query.weight(Query.java:99)
: at org.apache.lucene.search.Searcher.createWeight(Searcher.java:230)
: at org.apache.lucene.search.Searcher.search(Searcher.java:171)
: at
: 
org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
: at
: 
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884)
: at
: org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
: at
: 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
: at
: 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
: at
: 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
: at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
: at
: 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
: at
: 
org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
: at
: org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
: at
: 
com.taobao.terminator.core.realtime.DefaultSearchService.query(DefaultSearchService.java:197)
: at sun.reflect.GeneratedMethodAccessor73.invoke(Unknown Source)
: at
: 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
: at java.lang.reflect.Method.invoke(Method.java:597)
: at
: 
com.taobao.hsf.rpc.tbremoting.provider.ProviderProcessor.handleRequest0(ProviderProcessor.java:222)
: at
: 
com.taobao.hsf.rpc.tbremoting.provider.ProviderProcessor.handleRequest(ProviderProcessor.java:174)
: at
: 
com.taobao.hsf.rpc.tbremoting.provider.ProviderProcessor.handleRequest(ProviderProcessor.java:41)
: at
: 
com.taobao.remoting.impl.DefaultMsgListener$1ProcessorExecuteTask.run(DefaultMsgListener.java:131)
: at
: 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
: at
: 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
: at java.lang.Thread.run(Thread.java:662)
: 
: 
: 
:   Thank u
: 
: 
: 
: 
: 
: allen.Fu
: 

-Hoss


Re: SolrJ and ContentStreams

2011-08-19 Thread Chris Hostetter

: I'm considering to use SolrJ to run queries in a MLT fashion against my Solr
: server. I saw that there is already an open bug filed in Jira
: (https://issues.apache.org/jira/browse/SOLR-1085).

note that that issue is really just about having convenience classes for 
executing MLT style requests and parsing the responses.  Just because 
those convenience methods/classes don't exist yet doesn't mean you can't 
use SolrJ to send MLT requests.  You can instantiate a QueryRequest object 
with the SolrParams you want to specify for MLT, and then extract the data 
you want directly from the QueryResponse.

The only potentially tricky part is sending an arbitrary ContentStream.  
In your example you are using stream.body for a short string -- this 
is easy to do as a param when building a QueryRequest object, but if you 
want to provide a much larger stream of data you can subclass 
QueryRequest to add your own ContentStream (from a File source or 
whatever) to the Collection it will stream to the server.

For that matter, even though its name might fool you, I'm pretty sure 
a ContentStreamUpdateRequest instance with the appropriate URL for your 
MLT handler will do exactly what you want.

https://lucene.apache.org/solr/api/org/apache/solr/client/solrj/request/ContentStreamUpdateRequest.html
https://lucene.apache.org/solr/api/org/apache/solr/client/solrj/request/AbstractUpdateRequest.html#setParam%28java.lang.String,%20java.lang.String%29
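[Editorial sketch: outside of SolrJ, the stream.body idea amounts to plain URL construction. The handler path, base URL, and field names below are assumptions.]

```python
# Sketch: build an MLT request that ships the source text via stream.body
# rather than pointing at an indexed document. Names and paths are assumptions.
from urllib.parse import urlencode

def mlt_url(base, text, fields, rows=5):
    params = [
        ("stream.body", text),
        ("mlt.fl", ",".join(fields)),
        ("mlt.interestingTerms", "list"),
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    return "{0}/mlt?{1}".format(base.rstrip("/"), urlencode(params))

url = mlt_url("http://localhost:8983/solr", "a short piece of text",
              ["title", "text"])
print(url)
```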

-Hoss


Re: Terms.regex performance issue

2011-08-19 Thread Markus Jelsma
TermsComponent uses java.util.regex, which is not particularly fast. If the 
number of terms grows, your CPU is going to overheat. I'd prefer an analyzer 
approach.
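[Editorial sketch of the analyzer approach: indexing edge n-grams turns autocomplete into a cheap term match instead of a regex scan. The type name and gram sizes below are illustrative.]

```xml
<!-- Sketch: edge n-gram analysis for autocomplete; names are illustrative -->
<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Edge n-grams only cover prefix-of-token matching; for the infix-style .*query.* case, solr.NGramFilterFactory would be the analogue, at the cost of a much larger index.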

 As I want to use it in an Autocomplete it has to be fast. Terms.prefix gets
 results in around 100 milliseconds, while terms.regex is 10 to 20 times
 slower.
 
 Not storing the field made it a bit faster but not enough. The index is on
 a seperate core and only about 5Mb big. Are there some tricks to make it
 work a lot faster? Or do I have to switch to ngrams or something?
 
 
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Terms-regex-performance-issue-tp3268994
 p3268994.html Sent from the Solr - User mailing list archive at Nabble.com.


Re: Requiring multiple matches of a term

2011-08-19 Thread Chris Hostetter

FWIW: i think this is a really cool and interesting question.

: Is there a way to specify in a query that a term must match at least X 
: times in a document, where X is some value greater than 1?

at the moment, i think your phrase query approach is really the only 
viable way (although it did get me thinking about how hard it would be 
to implement this at a lower level ... i'll see if i can work out a patch)

: But when I try for more than three, I start getting unexpected result 
: counts as I change the proximity value:

Hmmm... i would think the phrase query approach should work, but it's 
totally possible that there's something odd in the way phrase queries work 
that could cause a problem -- the best way to sanity test something like 
this is to try a really small self-contained example that you can post for 
other people to try.

If you said "2 clauses work, but not 3" i would guess that maybe there is 
a "terms out of order" type issue involved, but "3 works, not 4" smells 
fishy.

-Hoss


Re: Terms.regex performance issue

2011-08-19 Thread Chris Hostetter

: Subject: Terms.regex performance issue
: 
: As I want to use it in an Autocomplete it has to be fast. Terms.prefix gets
: results in around 100 milliseconds, while terms.regex is 10 to 20 times
: slower.

can you elaborate on how you are using terms.regex?  what does your regex 
look like? .. particularly if your use case is autocomplete, terms.prefix 
seems like an odd choice. 

Possible XY Problem?
https://people.apache.org/~hossman/#xyproblem

Have you looked at using the Suggester plugin?

https://wiki.apache.org/solr/Suggester


-Hoss


Re: Solr performance for query without filter

2011-08-19 Thread Chris Hostetter

: Index has 41 000 000 documents and 9 GB size. For query like:
: 1)
: q=Jarecki+Jan&fq=sex:M&fq=confirmed:1&fq=show_search:3&fl=user_id&start=0&rows=10&wt=json&version=2.2
: 
: server reaches avarage *90 query/s* on 4 theards and is very small for me.
: 
: For query with filer on filed city:
: 2) ex.
: fl=user_id&indent=on&start=0&q=Tarkowski+Bartłomiej&wt=json&fq=city:Kwidzyn&fq=sex:M&fq=confirmed:1&fq=show_search:3&version=2.2&rows=10
: 
: server reaches 800 query/s.
: 
: Do you have any advice to speed the search for first query? Is this speed is
: the norm?

"norm" is hard to define, but one key element you left out is how many 
docs are (typically) matched by requests of type #1 vs type #2, and how 
good a job your city filters do in partitioning the total number of 
documents.

I suspect that your city filters are heavily reused (ie: good cache hit 
rates) and do a really good job of cutting down the number of matching 
docs -- (ie: num docs matching fq=sex:M&fq=confirmed:1&fq=show_search:3 is 
probably significantly higher than num docs matching 
fq=sex:M&fq=confirmed:1&fq=show_search:3&fq=city:Kwidzyn).  In which case 
it makes sense that type #1 queries would take a lot longer on average -- 
there are a lot more docs to consider when evaluating the q to find 
matches.
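[Editorial sketch: the argument above can be illustrated with plain Python sets standing in for cached filter DocSets. All numbers are made up, not measured on this index.]

```python
# Sketch: why a selective cached filter shrinks the work done for q.
# Doc ids and filter selectivities are simulated, purely illustrative.
all_docs = set(range(1_000_000))
sex_m = {d for d in all_docs if d % 2 == 0}          # broad filter
confirmed = {d for d in all_docs if d % 3 != 0}      # broad filter
city = {d for d in all_docs if d % 500 == 0}         # selective filter

broad = sex_m & confirmed      # candidates without the city fq
narrow = broad & city          # candidates with the city fq

# The main query must score every remaining candidate, so the
# narrow case leaves far less work per request.
print(len(broad))
print(len(narrow))
```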



-Hoss

Re: Content recommendation using solr?

2011-08-19 Thread Chris Hostetter

: Initially, I was looking at http://wiki.apache.org/solr/MoreLikeThis
: 
: Then, it turned out that most implementations are based on a combination of
:  Mahout, Solr and Hadoop.

I think you'll find that most serious (for some definition) content 
recommendation engines use various ML algorithms (ie: mahout) to crunch 
both the content and the (aggregate) user behavior data to generate 
"people who like this thing also like..." and "people like you also tend 
to like..." type recommendations.

But Solr, with and w/o MLT, can be very handy for "things similar to this 
thing are..." type recommendations, depending on how you use it.  

(I don't think i'm allowed to name names, but i can think of a couple of 
major www sites of which i have first hand knowledge that use MLT and/or 
customized things like MLT to search their Solr/Lucene indexes for things 
similar to the thing you are currently looking at.)


-Hoss


Re: Terms.regex performance issue

2011-08-19 Thread O. Klein
Terms.prefix was just to compare performance.

The use case was terms.regex=.*query.* And as Markus pointed out, this will
probably remain a bottleneck.

I looked at the Suggester, but like many others I have been struggling to
make it useful. It needs a custom queryConverter to give proper suggestions,
but I haven't tried this yet.






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Terms-regex-performance-issue-tp3268994p3269628.html
Sent from the Solr - User mailing list archive at Nabble.com.