Re: Filtering results based on a set of values for a field
On Thu, Aug 18, 2011 at 02:32:48PM -0400, Erick Erickson wrote: Hmmm, I'm still not getting it... You have one or more lists. These lists change once a month or so. Are you trying to include or exclude the documents in these lists? In our specific case to include *only* the documents having a value of an attribute (author) in this list (the user decides at query time which of those lists to use). But we do expect the problem to become more general over time... And do the authors you want to include or exclude change on a per-query basis or would you be all set if you just had a filter that applied to all the authors on a particular list? No. ATM there are two fixed lists (in the sense that they are updated roughly monthly). One problem: the document base itself is huge (on the order of 3.5 million). Re-indexing is a painful exercise taking days, so we tend not to do it too often ;-) But I *think* what you want is a SearchComponent that implements your Filter. You can see various examples of how to add components to a search handler in the solrconfig.xml file. Thanks a lot for the pointer. Rushing to read up on it. WARNING: Haven't done this myself, so I'm partly guessing here. Hey: I asked for pointers and you're giving me some, so I'm a happy man now :-) Although here's a hint that someone else has used this approach: http://www.mail-archive.com/solr-user@lucene.apache.org/msg54240.html Thanks again And you'll want to ensure that the Filter is cached so you don't have to compute it more than once. Yes, I hope that will be the trick giving us the needed boost. Somehow we'll have to figure out how to drop the cache when a new version of the list arrives (without killing everyone in the building). I'll be sure to report back. Regards -- tomás
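For a list that only changes monthly, the simplest alternative to a custom SearchComponent is a plain filter query that ORs the allowed authors together; Solr caches each distinct fq in the filterCache, so it is computed once per searcher. A minimal sketch, assuming a field named author and a hypothetical list of three authors (a very long list may require raising maxBooleanClauses in solrconfig.xml):

  http://localhost:8983/solr/select?q=some+query&fq=author:("smith" OR "jones" OR "garcia")

The custom-component approach discussed above becomes attractive when the list is too large to express as a boolean query.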
RE: Full sentence spellcheck
Actually, that's not my problem, I do specify q. Any other ideas? It really drives me crazy... -- View this message in context: http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3267394.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: get update record from database using DIH
On Fri, Aug 19, 2011 at 5:32 AM, Alexandre Sompheng asomph...@gmail.com wrote: Hi guys, I try the delta import, I got logs saying that it found delta data to update. But it seems that the index is not updated. Any guess why this happens? Did I miss something? I'm on solr 3.3 with no patch. [...] Please show us the following: * The exact URL you loaded for delta-import * The Solr response which shows the delta documents that it found, and the status of the delta-import. If your index is large, and if you are running an optimise after the delta-import (the default is to optimise), it can take some time. Check the status: It will say busy if the optimise is still running. Regards, Gora
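For reference, these are the kinds of requests Gora is asking about. A hedged sketch, assuming the DataImportHandler is registered at /dataimport and that you do not want a full optimize after each delta (host, port and handler name are placeholders):

  # run the delta-import without optimizing the whole index afterwards
  http://localhost:8983/solr/dataimport?command=delta-import&commit=true&optimize=false

  # check progress; the response reports busy or idle plus counts of fetched/processed rows
  http://localhost:8983/solr/dataimport?command=status

If the status shows documents were processed but searches still miss them, check that a commit actually happened (commit=true above) and that a new searcher was opened.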
can't use distributed spell check
hi all, I tested it following the instructions in http://wiki.apache.org/solr/SpellCheckComponent, but something seems wrong. The sample URL in the wiki is http://solr:8983/solr/select?q=*:*&spellcheck=true&spellcheck.build=true&spellcheck.q=toyata&qt=spell&shards.qt=spell&shards=solr-shard1:8983/solr,solr-shard2:8983/solr It doesn't work, and from reading the code it seems the parameters should be qt=/spell&shards.qt=/spell. After I modified the URL, it searches all the documents but returns no spelling suggestions. I debugged it and found that the method getSuggestions() in AbstractLuceneSpellChecker is called.
solr distributed search don't work
hi all, I followed the wiki http://wiki.apache.org/solr/SpellCheckComponent but something is wrong. The URL given by the wiki is http://solr:8983/solr/select?q=*:*&spellcheck=true&spellcheck.build=true&spellcheck.q=toyata&qt=spell&shards.qt=spell&shards=solr-shard1:8983/solr,solr-shard2:8983/solr but it does not work. I traced the code and found that qt=spell&shards.qt=spell should be qt=/spell&shards.qt=/spell. After modifying the URL, it returns all documents but nothing about spell check. I debugged it and found that AbstractLuceneSpellChecker.getSuggestions() is called.
Re: Full sentence spellcheck
this may need something like language models to suggest. I found an issue: https://issues.apache.org/jira/browse/SOLR-2585. What's going on with it? On Thu, Aug 18, 2011 at 11:31 PM, Valentin igorlacro...@gmail.com wrote: I'm trying to configure a spellchecker to autocomplete full sentences from my query. I've already been able to get these results for "american israel": - american something - israel something. But I want, for "american israel": - american israel something. This is my solrconfig.xml:

  <searchComponent name="suggest_full" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">suggestTextFull</str>
    <lst name="spellchecker">
      <str name="name">suggest_full</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
      <str name="field">text_suggest_full</str>
      <str name="fieldType">suggestTextFull</str>
    </lst>
  </searchComponent>

  <requestHandler name="/suggest_full" class="org.apache.solr.handler.component.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">suggest_full</str>
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.onlyMorePopular">true</str>
    </lst>
    <arr name="last-components">
      <str>suggest_full</str>
    </arr>
  </requestHandler>

And this is my schema.xml:

  <fieldType name="suggestTextFull" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
  ...
  <field name="text_suggest_full" type="suggestTextFull" indexed="true" stored="false" multiValued="true"/>

I've read somewhere that I have to use spellcheck.q because q uses the WhitespaceAnalyzer, but when I use spellcheck.q I get a java.lang.NullPointerException. Any ideas? -- View this message in context: http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3265257.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr distributed search don't work
Hi, I do not use spell but I do use distributed search; using qt=spell is correct, you should not use qt=/spell. For shards, I specify them in solrconfig directly, not in the URL, but it should work the same. Maybe there is an issue in your spell request handler. 2011/8/19 Li Li fancye...@gmail.com hi all, I followed the wiki http://wiki.apache.org/solr/SpellCheckComponent but something is wrong. The URL given by the wiki is http://solr:8983/solr/select?q=*:*&spellcheck=true&spellcheck.build=true&spellcheck.q=toyata&qt=spell&shards.qt=spell&shards=solr-shard1:8983/solr,solr-shard2:8983/solr but it does not work. I traced the code and found that qt=spell&shards.qt=spell should be qt=/spell&shards.qt=/spell. After modifying the URL, it returns all documents but nothing about spell check. I debugged it and found that AbstractLuceneSpellChecker.getSuggestions() is called.
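For completeness, putting the shards in solrconfig.xml as Olivier describes usually means adding them to the defaults of the request handler you query. A minimal sketch with placeholder handler and host names:

  <requestHandler name="/distributed" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="shards">solr-shard1:8983/solr,solr-shard2:8983/solr</str>
      <str name="shards.qt">/spell</str>
    </lst>
  </requestHandler>

With this in place the client only needs to hit /distributed and no longer has to repeat the shard list on every URL.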
Re: paging size in SOLR
1 .what does this specify ? queryResultCache class=*solr.LRUCache* size=*${queryResultCacheSize:0}*initialSize =*${queryResultCacheInitialSize:0}* autowarmCount=* ${queryResultCacheRows:0}* / 2. when i say *queryResultCacheSize : 512 *, does it mean 512 queries can be cached or 512 bytes are reserved for caching ? can some please give me an answer ? On 14 August 2011 21:41, Erick Erickson erickerick...@gmail.com wrote: Yep. ResultWindowSize in solrconfig.xml Best Erick On Sun, Aug 14, 2011 at 8:35 AM, jame vaalet jamevaa...@gmail.com wrote: thanks erick ... that means it depends upon the memory allocated to the JVM . going back queryCacheResults factor i have got this doubt .. say, i have got 10 threads with 10 different queries ..and each of them in parallel are searching the same index with millions of docs in it (multisharded ) . now each of the queries have large number of results in it hence got to page them all.. which all thread's (query ) result-set will be cached ? so that subsequent pages can be retrieved quickly ..? On 14 August 2011 17:40, Erick Erickson erickerick...@gmail.com wrote: There isn't an optimum page size that I know of, it'll vary with lots of stuff, not the least of which is whatever servlet container limits there are. But I suspect you can get quite a few (1000s) without too much problem, and you can always use the JSON response writer to pack in more pages with less overhead. You pretty much have to try it and see. Best Erick On Sun, Aug 14, 2011 at 5:42 AM, jame vaalet jamevaa...@gmail.com wrote: speaking about pagesizes, what is the optimum page size that should be retrieved each time ?? i understand it depends upon the data you are fetching back fromeach hit document ... but lets say when ever a document is hit am fetching back 100 bytes worth data from each of those docs in indexes (along with solr response statements ) . this will make 100*x bytes worth data in each page if x is the page size .. what is the optimum value of this x that solr can return each time without going into exceptions On 13 August 2011 19:59, Erick Erickson erickerick...@gmail.com wrote: Jame: You control the number via settings in solrconfig.xml, so it's up to you. Jonathan: Hmmm, that's seems right, after all the deep paging penalty is really about keeping a large sorted array in memory but at least you only pay it once per 10,000, rather than 100 times (assuming page size is 100)... Best Erick On Wed, Aug 10, 2011 at 10:58 AM, jame vaalet jamevaa...@gmail.com wrote: when you say queryResultCache, does it only cache n number of result for the last one query or more than one queries? On 10 August 2011 20:14, simon mtnes...@gmail.com wrote: Worth remembering there are some performance penalties with deep paging, if you use the page-by-page approach. may not be too much of a problem if you really are only looking to retrieve 10K docs. -Simon On Wed, Aug 10, 2011 at 10:32 AM, Erick Erickson erickerick...@gmail.com wrote: Well, if you really want to you can specify start=0 and rows=1 and get them all back at once. You can do page-by-page by incrementing the start parameter as you indicated. You can keep from re-executing the search by setting your queryResultCache appropriately, but this affects all searches so might be an issue. Best Erick On Wed, Aug 10, 2011 at 9:09 AM, jame vaalet jamevaa...@gmail.com wrote: hi, i want to retrieve all the data from solr (say 10,000 ids ) and my page size is 1000 . 
how do i get back the data (pages) one after other ?do i have to increment the start value each time by the page size from 0 and do the iteration ? In this case am i querying the index 10 time instead of one or after first query the result will be cached somewhere for the subsequent pages ? JAME VAALET -- -JAME -- -JAME -- -JAME -- -JAME -- -JAME
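A sketch of the page-by-page loop being discussed, using SolrJ against a Solr 3.x server; the URL, query and page size are placeholders and error handling is omitted. Remember the deep-paging caveat raised above: each request re-sorts up to start+rows documents, so very large start values get progressively more expensive.

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocument;
  import org.apache.solr.common.SolrDocumentList;

  public class PageThroughResults {
      public static void main(String[] args) throws Exception {
          SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
          int pageSize = 1000;   // rows fetched per request
          int start = 0;
          long numFound;
          do {
              SolrQuery q = new SolrQuery("*:*");
              q.setStart(start);
              q.setRows(pageSize);
              QueryResponse rsp = server.query(q);
              SolrDocumentList page = rsp.getResults();
              numFound = page.getNumFound();         // total hits for the query
              for (SolrDocument doc : page) {
                  // process each document, e.g. doc.getFieldValue("id")
              }
              start += pageSize;                     // advance to the next page
          } while (start < numFound);
      }
  }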
Re: solr distributed search don't work
could you please show me your configuration in solrconfig.xml? On Fri, Aug 19, 2011 at 5:31 PM, olivier sallou olivier.sal...@gmail.com wrote: Hi, I do not use spell but I use distributed search, using qt=spell is correct, should not use qt=\spell. For shards, I specify it in solrconfig directly, not in url, but should work the same. Maybe an issue in your spell request handler. 2011/8/19 Li Li fancye...@gmail.com hi all, I follow the wiki http://wiki.apache.org/solr/SpellCheckComponent but there is something wrong. the url given my the wiki is http://solr:8983/solr/select?q=*:*spellcheck=truespellcheck.build=truespellcheck.q=toyataqt=spellshards.qt=spellshards=solr-shard1:8983/solr,solr-shard2:8983/solr but it does not work. I trace the codes and find that qt=spellshards.qt=spell should be qt=/spellshards.qt=/spell After modification of url, It return all documents but nothing about spell check. I debug it and find the AbstractLuceneSpellChecker.getSuggestions() is called.
Re: Boost documents based on the number of their fields
You have different options here. You can give more boost at indexing time to the documents that have the fields you want set. For this to take effect you will have to reindex and set omitNorms=false on the fields you are going to search. The same concept can be applied to boost single fields instead of a whole-document boost. Another option would be to use boosting queries at search time such as: bq=video:[* TO *]^100 (this gives more boost to the documents that have any value in the video field). The second one is much easier to play with, as you don't have to reindex every time you change a value. On the other hand you pay the performance penalty of running one extra query. -- View this message in context: http://lucene.472066.n3.nabble.com/Boost-documents-based-on-the-number-of-their-fields-tp3266875p3267628.html Sent from the Solr - User mailing list archive at Nabble.com.
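To make the second option concrete, here is a hedged example of a dismax request that lifts documents having any value in a hypothetical video field; the field names and the boost factor are placeholders to tune, not recommendations:

  http://localhost:8983/solr/select?defType=dismax&q=user+query&qf=title+description&bq=video:[* TO *]^100

The equivalent index-time route is to send the document with a boost attribute, e.g. <doc boost="2.0"> in the XML update format, which requires re-indexing whenever the boost changes.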
Re: Full sentence spellcheck
I don't think it will help me, sorry. I just want my query to not be tokenised, I want it to be considered as a full sentence to correct. But thanks for your answers, I'll keep searching. -- View this message in context: http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3267629.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Copyfields
currently our full index takes around half an hour - its a big dataset ~ serveral million records of detailed product information - this is actually very quick compared to another one of my installations. I would be interested to know which of these methods would reduce indexing time the most. N ... On 18 August 2011 17:20, Jaeger, Jay - DOT jay.jae...@dot.wi.gov wrote: I would suggest #3, unless you have some very unusual performance requirements. It has the advantage of isolating your index environment requirements from the database. -Original Message- From: Nicholas Fellows [mailto:n...@djdownload.com] Sent: Thursday, August 18, 2011 8:40 AM To: solr-user@lucene.apache.org Subject: Solr Copyfields Hi I have a question regarding CopyFields in Solr As far as i can tell there are several ways to do the same thing 1) create an alias in the SQL Query and Delta Queries 2) specify multiple fields in the db-data-config.xml having different names for the same column 3) use the copyField directive in schema.xml is there any difference to these approaches? like indexing speed , query performance, memory consumption etc? Kind Regards Nick -- Nick Fellows DJdownload.com --- 10 Greenland Street London NW10ND United Kingdom --- n...@djdownload.com (E) --- www.djdownload.com -- Nick Fellows DJdownload.com --- 10 Greenland Street London NW10ND United Kingdom --- n...@djdownload.com (E) --- www.djdownload.com
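For reference, option 3 is a one-line schema.xml change per source field. A sketch with hypothetical field names (the destination field must exist and is typically multiValued):

  <field name="product_text" type="text" indexed="true" stored="false" multiValued="true"/>
  <copyField source="product_name" dest="product_text"/>
  <copyField source="product_description" dest="product_text"/>

copyField happens inside Solr at index time, so it avoids widening the SQL result set, but the copying work itself still happens during indexing; the main practical difference between the three options is where the duplication is paid for (database, data-import config, or schema), so the half-hour full index is unlikely to change dramatically either way.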
Re: Full sentence spellcheck
I haven't used suggest yet. But in spell check, if you don't provide spellcheck.q, it will analyze the q parameter with a query converter that tokenizes your query; otherwise it will use the analyzer of the field to process spellcheck.q. If you don't want to tokenize the query, you should pass spellcheck.q and provide your own analyzer, such as the keyword analyzer. You can achieve this by adding <str name="fieldType">string</str>, for example:

  <lst name="spellchecker">
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="name">file</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="characterEncoding">UTF-8</str>
    <str name="spellcheckIndexDir">./spellchecker2</str>
    <str name="fieldType">string</str>
  </lst>

The wiki says <str name="queryAnalyzerFieldType">textSpell</str>, but I read the code of solr 1.4.1 and the latest lucene/solr 4 trunk and they both use the following code, so I think the wiki is out of date.

  public static final String FIELD_TYPE = "fieldType";
  fieldTypeName = (String) config.get(FIELD_TYPE);
  if (core.getSchema().getFieldTypes().containsKey(fieldTypeName)) {
    FieldType fieldType = core.getSchema().getFieldTypes().get(fieldTypeName);
    analyzer = fieldType.getQueryAnalyzer();
  }

If you use file-based spell check, it's OK. But for index-based, if you tokenize the field but do not tokenize your query, you can still get correct results. On Fri, Aug 19, 2011 at 5:40 PM, Valentin igorlacro...@gmail.com wrote: I don't think it will help me, sorry. I just want my query to not be tokenised, I want it to be considered as a full sentence to correct. But thanks for your answers, I'll keep searching. -- View this message in context: http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3267629.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Content recommendation using solr?
Thanks Omri, that looks interesting. What I'm looking for is for movies and close to jinni.com. They seem to be using JEE, but not sure about Solr/Lucene though. Thanks. Arcadius. On Thu, Aug 18, 2011 at 3:25 PM, Omri Cohen omri...@gmail.com wrote: check out OutBrain
Re: Full sentence spellcheck
Li Li wrote: If you don't want to tokenize query, you should pass spellcheck.q and provide your own analyzer such as keyword analyzer. That's already what I do with my suggestTextFull fieldType, added to my searchComponent, no? I've copied my fieldType and my searchComponent in my first post. The only big difference between your parameters and mine is: <str name="characterEncoding">UTF-8</str>. But I don't think it resolves the problem of the NullPointerException when I use spellcheck.q :/ -- View this message in context: http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3267724.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Full sentence spellcheck
NullPointerException? do you have the full exception print stack? On Fri, Aug 19, 2011 at 6:49 PM, Valentin igorlacro...@gmail.com wrote: Li Li wrote: If you don't want to tokenize query, you should pass spellcheck.q and provide your own analyzer such as keyword analyzer. That's already what I do with my suggestTextFull fieldType, added to my searchComponent, no ? I've copied my fieldType and my searchComponent on my first post. The only big difference between your parameters and mine is: str name=characterEncodingUTF-8/str . But I don't think it resolves the problem of the NullPointerException when i use spellcheck.q :/ -- View this message in context: http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3267724.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Full sentence spellcheck
My beautiful NullPointer Exception : SEVERE: java.lang.NullPointerException at org.apache.solr.handler.component.SpellCheckComponent.getTokens(SpellCheckComponent.java:476) at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:131) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) -- View this message in context: http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3267771.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Full sentence spellcheck
Line 476 of SpellCheckComponent.getTokens of mine is assert analyzer != null; it seems our codes' versions don't match. could you decompile your SpellCheckComponent.class ? On Fri, Aug 19, 2011 at 7:23 PM, Valentin igorlacro...@gmail.com wrote: My beautiful NullPointer Exception : SEVERE: java.lang.NullPointerException at org.apache.solr.handler.component.SpellCheckComponent.getTokens(SpellCheckComponent.java:476) at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:131) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) -- View this message in context: http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3267771.html Sent from the Solr - User mailing list archive at Nabble.com.
query cache result
hi, I understand that the queryResultCache tag in solrconfig is the one which determines the cache size of Solr in the JVM:

  <queryResultCache class="solr.LRUCache" size="${queryResultCacheSize:0}" initialSize="${queryResultCacheInitialSize:0}" autowarmCount="${queryResultCacheRows:0}"/>

Out of the different attributes, what is size? Is it the amount of memory reserved in bytes, the number of doc ids cached, or the number of queries it will cache? Similarly, what are initialSize and autowarmCount expressed in? Can someone please reply...
Re: Full sentence spellcheck
or your analyzer is null? any other exception or warning in your log file? On Fri, Aug 19, 2011 at 7:37 PM, Li Li fancye...@gmail.com wrote: Line 476 of SpellCheckComponent.getTokens of mine is assert analyzer != null; it seems our codes' versions don't match. could you decompile your SpellCheckComponent.class ? On Fri, Aug 19, 2011 at 7:23 PM, Valentin igorlacro...@gmail.com wrote: My beautiful NullPointer Exception : SEVERE: java.lang.NullPointerException at org.apache.solr.handler.component.SpellCheckComponent.getTokens(SpellCheckComponent.java:476) at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:131) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) -- View this message in context: http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3267771.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Full sentence spellcheck
My analyser is not empty:

  <fieldType name="suggestTextFull" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

and I'm sure there are words in it. I don't know where to find the source for org.apache.solr.handler.component.SpellCheckComponent.getTokens -- View this message in context: http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3267833.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Full sentence spellcheck
This might be unrelated, but I had the exact same error yesterday trying to replace the query converter with a custom class I wrote. Ended up, I wasn't properly registering my jar. I'm still testing with jetty, and lib in example is included too late in the startup process. I had to rebundle the war with my jar in the web-inf lib. On Aug 19, 2011, at 8:01 AM, Valentin igorlacro...@gmail.com wrote: My analyser is not empty : /fieldType name=suggestTextFull class=solr.TextField analyzer type=index tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ /analyzer /fieldType/ and i'm sure there is words in it I don't know where to find this file org.apache.solr.handler.component.SpellCheckComponent.getTokens -- View this message in context: http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3267833.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: paging size in SOLR
1 I don't know, where is it coming from? Looks like you've done stats call on a freshly opened server. 2 512 entries (i.e. results for 512 queries). Each entry is queryResultWindowSize doc IDs. Best Erick On Fri, Aug 19, 2011 at 5:33 AM, jame vaalet jamevaa...@gmail.com wrote: 1 .what does this specify ? queryResultCache class=*solr.LRUCache* size=*${queryResultCacheSize:0}*initialSize =*${queryResultCacheInitialSize:0}* autowarmCount=* ${queryResultCacheRows:0}* / 2. when i say *queryResultCacheSize : 512 *, does it mean 512 queries can be cached or 512 bytes are reserved for caching ? can some please give me an answer ? On 14 August 2011 21:41, Erick Erickson erickerick...@gmail.com wrote: Yep. ResultWindowSize in solrconfig.xml Best Erick On Sun, Aug 14, 2011 at 8:35 AM, jame vaalet jamevaa...@gmail.com wrote: thanks erick ... that means it depends upon the memory allocated to the JVM . going back queryCacheResults factor i have got this doubt .. say, i have got 10 threads with 10 different queries ..and each of them in parallel are searching the same index with millions of docs in it (multisharded ) . now each of the queries have large number of results in it hence got to page them all.. which all thread's (query ) result-set will be cached ? so that subsequent pages can be retrieved quickly ..? On 14 August 2011 17:40, Erick Erickson erickerick...@gmail.com wrote: There isn't an optimum page size that I know of, it'll vary with lots of stuff, not the least of which is whatever servlet container limits there are. But I suspect you can get quite a few (1000s) without too much problem, and you can always use the JSON response writer to pack in more pages with less overhead. You pretty much have to try it and see. Best Erick On Sun, Aug 14, 2011 at 5:42 AM, jame vaalet jamevaa...@gmail.com wrote: speaking about pagesizes, what is the optimum page size that should be retrieved each time ?? i understand it depends upon the data you are fetching back fromeach hit document ... but lets say when ever a document is hit am fetching back 100 bytes worth data from each of those docs in indexes (along with solr response statements ) . this will make 100*x bytes worth data in each page if x is the page size .. what is the optimum value of this x that solr can return each time without going into exceptions On 13 August 2011 19:59, Erick Erickson erickerick...@gmail.com wrote: Jame: You control the number via settings in solrconfig.xml, so it's up to you. Jonathan: Hmmm, that's seems right, after all the deep paging penalty is really about keeping a large sorted array in memory but at least you only pay it once per 10,000, rather than 100 times (assuming page size is 100)... Best Erick On Wed, Aug 10, 2011 at 10:58 AM, jame vaalet jamevaa...@gmail.com wrote: when you say queryResultCache, does it only cache n number of result for the last one query or more than one queries? On 10 August 2011 20:14, simon mtnes...@gmail.com wrote: Worth remembering there are some performance penalties with deep paging, if you use the page-by-page approach. may not be too much of a problem if you really are only looking to retrieve 10K docs. -Simon On Wed, Aug 10, 2011 at 10:32 AM, Erick Erickson erickerick...@gmail.com wrote: Well, if you really want to you can specify start=0 and rows=1 and get them all back at once. You can do page-by-page by incrementing the start parameter as you indicated. 
You can keep from re-executing the search by setting your queryResultCache appropriately, but this affects all searches so might be an issue. Best Erick On Wed, Aug 10, 2011 at 9:09 AM, jame vaalet jamevaa...@gmail.com wrote: hi, i want to retrieve all the data from solr (say 10,000 ids ) and my page size is 1000 . how do i get back the data (pages) one after other ?do i have to increment the start value each time by the page size from 0 and do the iteration ? In this case am i querying the index 10 time instead of one or after first query the result will be cached somewhere for the subsequent pages ? JAME VAALET -- -JAME -- -JAME -- -JAME -- -JAME -- -JAME
Re: query cache result
Hi Jame, the size for the queryResultCache is the number of queries that will fit into this cache. autowarmCount is the number of queries that are going to be copied from the old cache to the new cache when a commit occurs (actually, the queries are going to be executed again against the new IndexSearcher, as the results for them may have changed in the new index). initialSize is the initial size of the array; it will start to grow from that size up to size. You may want to see this page of the wiki: http://wiki.apache.org/solr/SolrCaching Regards, Tomás On Fri, Aug 19, 2011 at 8:39 AM, jame vaalet jamevaa...@gmail.com wrote: hi, I understand that the queryResultCache tag in solrconfig is the one which determines the cache size of Solr in the JVM: <queryResultCache class="solr.LRUCache" size="${queryResultCacheSize:0}" initialSize="${queryResultCacheInitialSize:0}" autowarmCount="${queryResultCacheRows:0}"/> Out of the different attributes, what is size? Is it the amount of memory reserved in bytes, the number of doc ids cached, or the number of queries it will cache? Similarly, what are initialSize and autowarmCount expressed in? Can someone please reply...
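To tie the attributes to concrete numbers, a hedged example with literal values instead of the ${...} property placeholders shown above:

  <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
  <queryResultWindowSize>50</queryResultWindowSize>

Here size="512" means up to 512 cached entries, one per query, each holding an ordered list of document ids; queryResultWindowSize controls roughly how many ids are collected per entry, so a request for rows 0-9 can also serve the next few pages from the cache.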
Re: When are you planning to release SolrCloud feature with ZooKeeper?
Whenever 4.0 comes out :) Hard to put a date on that - I believe another SolrCloud push is about to start to cover the indexing side. On Aug 18, 2011, at 11:46 AM, Way Cool wrote: Hi, guys, When are you planning to release the SolrCloud feature with ZooKeeper currently in trunk? The new admin interface looks great. Great job. Thanks, YH - Mark Miller lucidimagination.com
Re: Full sentence spellcheck
I was on my phone before, and didn't see the whole thread. I wanted the same thing, to have spellchecker not tokenize. See the Suggester Issues thread for my junky replacement class that doesn't tokenize (as far as I can tell from a few minutes of testing). will On Aug 19, 2011, at 8:35 AM, Will Oberman wrote: This might be unrelated, but I had the exact same error yesterday trying to replace the query converter with a custom class I wrote. Ended up, I wasn't properly registering my jar. I'm still testing with jetty, and lib in example is included too late in the startup process. I had to rebundle the war with my jar in the web-inf lib. On Aug 19, 2011, at 8:01 AM, Valentin igorlacro...@gmail.com wrote: My analyser is not empty : /fieldType name=suggestTextFull class=solr.TextField analyzer type=index tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.KeywordTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ /analyzer /fieldType/ and i'm sure there is words in it I don't know where to find this file org.apache.solr.handler.component.SpellCheckComponent.getTokens -- View this message in context: http://lucene.472066.n3.nabble.com/Full-sentence-spellcheck-tp3265257p3267833.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Filtering results based on a set of values for a field
Good luck, and let us know what the results are. About dropping the cache.. That shouldn't be a problem, it should just be computed when your component is called the first time, so starting the server (or opening a new searcher) should re-compute it. Your filters shouldn't be very big, just maxDocs/8 bytes long... By the way, there exist user caches (see solrconfig.xml) that you can use for whatever you want, so you could consider stashing your filters in there. The neat thing is that they get notified whenever the searcher is opened (?) and you can regenerate the data held there. MIght be more convenient than mucking with filter query caching... Best Erick On Fri, Aug 19, 2011 at 2:58 AM, Tomas Zerolo tomas.zer...@axelspringer.de wrote: On Thu, Aug 18, 2011 at 02:32:48PM -0400, Erick Erickson wrote: Hmmm, I'm still not getting it... You have one or more lists. These lists change once a month or so. Are you trying to include or exclude the documents in these lists? In our specific case to include *only* the documents having a value of an attribute (author) in this list (the user decides at query time which of those lists to use). But we do expect the problem to become more general over time... And do the authors you want to include or exclude change on a per-query basis or would you be all set if you just had a filter that applied to all the authors on a particular list? No. ATM there are two fixed lists (in the sense that they are updated like monthly. One problem: the document basis itself is huge (in the abouts of 3.5 million). Re-indexing is a painful exercise taking days, so we tend not to do it too often ;-) But I *think* what you want is a SearchComponent that implements your Filter. You can see various examples of how to add components to a seach handler in the solrconfig.xml file. Thanks a lot for the pointer. Rushing to read on it. WARNING: Haven't done this myself, so I'm partly guessing here. Hey: I asked for pointers and you're giving me some, so I'm a happy man now :-) Although here's a hint that someone else has used this approach: http://www.mail-archive.com/solr-user@lucene.apache.org/msg54240.html Thanks again And you'll want to insure that the Filter is cached so you don't have to compute it more than once. Yes, I hope that will be the trick giving us the needed boost. Somehow we'll have to figure out how to drop the cache when a new version of the list arrives (without killing everyone in the building). I'll sure report back. Regards -- tomás
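For reference, a user cache of the kind Erick mentions is declared next to the stock caches in solrconfig.xml. A minimal sketch in which the cache name and the regenerator class are hypothetical (the regenerator, if given, is your own class implementing org.apache.solr.search.CacheRegenerator and is what repopulates entries when a new searcher opens):

  <cache name="authorListFilters"
         class="solr.LRUCache"
         size="16"
         initialSize="4"
         autowarmCount="4"
         regenerator="com.example.AuthorListRegenerator"/>

Your component can then look the cache up at query time via SolrIndexSearcher.getCache("authorListFilters") and stash the computed filter there.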
Re: query cache result
wiki says *size The maximum number of entries in the cache. andqueryResultCache This cache stores ordered sets of document IDs — the top N results of a query ordered by some criteria. * doesn't it mean number of document ids rather than number of queries ? 2011/8/19 Tomás Fernández Löbbe tomasflo...@gmail.com Hi Jame, the size for the queryResultCache is the number of queries that will fit into this cache. AutowarmCount is the number of queries that are going to be copyed from the old cache to the new cache when a commit occurrs (actually, the queries are going to be executed again agains the new IndexSearcher, as the results for them may have changed on the new Index). initial size is the initial size of the array, it will start to grow from that size up to size. You may want to see this page of the wiki: http://wiki.apache.org/solr/SolrCaching Regards, Tomás On Fri, Aug 19, 2011 at 8:39 AM, jame vaalet jamevaa...@gmail.com wrote: hi, i understand that queryResultCache tag in solrconfig is the one which determines the cache size of SOLR in jvm. queryResultCache class=*solr.LRUCache* size=*${queryResultCacheSize:0}*initialSize =*${queryResultCacheInitialSize:0}* autowarmCount=* ${queryResultCacheRows:0}* / out of the different attributes what is size? Is it the amount of memory reserved in bytes ? or number of doc ids cached ? or is it the number of queries it will cache? similarly wat is initial size and autowarm depicted in? can some please reply ... -- -JAME
Re: Solr 3.3 crashes after ~18 hours?
Am 10.08.2011 17:11, schrieb Yonik Seeley: On Wed, Aug 10, 2011 at 11:00 AM, alexander sulza.s...@digiconcept.net wrote: Okay, with this command it hangs. It doesn't look like a hang from this thread dump. It doesn't look like any solr requests are executing at the time the dump was taken. Did you do this from the command line? curl http://localhost:8983/solr/update?commit=true; Are you saying that the curl command just hung and never returned? -Yonik http://www.lucidimagination.com Also: I managed to get a Thread Dump (attached). regards Am 05.08.2011 15:08, schrieb Yonik Seeley: On Fri, Aug 5, 2011 at 7:33 AM, alexander sulza.s...@digiconcept.net wrote: Usually you get a XML-Response when doing commits or optimize, in this case I get nothing in return, but the site ( http://[...]/solr/update?optimize=true ) DOESN'T load forever or anything. It doesn't hang! I just get a blank page / empty response. Sounds like you are doing it from a browser? Can you try it from the command line? It should give back some sort of response (or hang waiting for a response). curl http://localhost:8983/solr/update?commit=true; -Yonik http://www.lucidimagination.com I use the stuff in the example folder, the only changes i made was enable logging and changing the port to 8985. I'll try getting a thread dump if it happens again! So far its looking good with having allocated more memory to it. Am 04.08.2011 16:08, schrieb Yonik Seeley: On Thu, Aug 4, 2011 at 8:09 AM, alexander sulza.s...@digiconcept.net wrote: Thank you for the many replies! Like I said, I couldn't find anything in logs created by solr. I just had a look at the /var/logs/messages and there wasn't anything either. What I mean by crash is that the process is still there and http GET pings would return 200 but when i try visiting /solr/admin, I'd get a blank page! The server ignores any incoming updates or commits, ignores means what? The request hangs? If so, could you get a thread dump? Do queries work (like /solr/select?q=*:*) ? thous throwing no errors, no 503's.. It's like the server has a blackout and stares blankly into space. Are you using a different servlet container than what is shipped with solr? If you did start with the solr example server, what jetty configuration changes have you made? -Yonik http://www.lucidimagination.com Sigh it happened again, but I have a clue: before the crash I was deleting some entries but haven't optimized afterwards, then, when I tried indexing something, solr crashed again (responsive but just blank/empty returns). I've just tried it again (doing the curl command while solr is its zombie state) and i get the following reply from curl: curl: (52) Empty reply from server Also, I updated my Java so the HotSpot version is now 20.1-b3
Re: query cache result
From my understanding, seeing the cache as a set of key-value pairs, this cache has the query as key and the list of IDs resulting from the query as values. When the exact same query is issued, it will be found as key in this cache, and Solr will already have the list of IDs that match it. If you set the size of this cache to 50, that means that Solr will keep in memory the last 50 queries with their list of resulting document IDs. The number of IDs per query can be configured with the parameter queryResultWindowSize http://wiki.apache.org/solr/SolrCaching#queryResultWindowSize On Fri, Aug 19, 2011 at 10:34 AM, jame vaalet jamevaa...@gmail.com wrote: wiki says *size The maximum number of entries in the cache. andqueryResultCache This cache stores ordered sets of document IDs — the top N results of a query ordered by some criteria. * doesn't it mean number of document ids rather than number of queries ? 2011/8/19 Tomás Fernández Löbbe tomasflo...@gmail.com Hi Jame, the size for the queryResultCache is the number of queries that will fit into this cache. AutowarmCount is the number of queries that are going to be copyed from the old cache to the new cache when a commit occurrs (actually, the queries are going to be executed again agains the new IndexSearcher, as the results for them may have changed on the new Index). initial size is the initial size of the array, it will start to grow from that size up to size. You may want to see this page of the wiki: http://wiki.apache.org/solr/SolrCaching Regards, Tomás On Fri, Aug 19, 2011 at 8:39 AM, jame vaalet jamevaa...@gmail.com wrote: hi, i understand that queryResultCache tag in solrconfig is the one which determines the cache size of SOLR in jvm. queryResultCache class=*solr.LRUCache* size=*${queryResultCacheSize:0}*initialSize =*${queryResultCacheInitialSize:0}* autowarmCount=* ${queryResultCacheRows:0}* / out of the different attributes what is size? Is it the amount of memory reserved in bytes ? or number of doc ids cached ? or is it the number of queries it will cache? similarly wat is initial size and autowarm depicted in? can some please reply ... -- -JAME
Re: Solr 3.3 crashes after ~18 hours?
Am 19.08.2011 15:48, schrieb alexander sulz: Am 10.08.2011 17:11, schrieb Yonik Seeley: On Wed, Aug 10, 2011 at 11:00 AM, alexander sulza.s...@digiconcept.net wrote: Okay, with this command it hangs. It doesn't look like a hang from this thread dump. It doesn't look like any solr requests are executing at the time the dump was taken. Did you do this from the command line? curl http://localhost:8983/solr/update?commit=true; Are you saying that the curl command just hung and never returned? -Yonik http://www.lucidimagination.com Also: I managed to get a Thread Dump (attached). regards Am 05.08.2011 15:08, schrieb Yonik Seeley: On Fri, Aug 5, 2011 at 7:33 AM, alexander sulza.s...@digiconcept.net wrote: Usually you get a XML-Response when doing commits or optimize, in this case I get nothing in return, but the site ( http://[...]/solr/update?optimize=true ) DOESN'T load forever or anything. It doesn't hang! I just get a blank page / empty response. Sounds like you are doing it from a browser? Can you try it from the command line? It should give back some sort of response (or hang waiting for a response). curl http://localhost:8983/solr/update?commit=true; -Yonik http://www.lucidimagination.com I use the stuff in the example folder, the only changes i made was enable logging and changing the port to 8985. I'll try getting a thread dump if it happens again! So far its looking good with having allocated more memory to it. Am 04.08.2011 16:08, schrieb Yonik Seeley: On Thu, Aug 4, 2011 at 8:09 AM, alexander sulza.s...@digiconcept.net wrote: Thank you for the many replies! Like I said, I couldn't find anything in logs created by solr. I just had a look at the /var/logs/messages and there wasn't anything either. What I mean by crash is that the process is still there and http GET pings would return 200 but when i try visiting /solr/admin, I'd get a blank page! The server ignores any incoming updates or commits, ignores means what? The request hangs? If so, could you get a thread dump? Do queries work (like /solr/select?q=*:*) ? thous throwing no errors, no 503's.. It's like the server has a blackout and stares blankly into space. Are you using a different servlet container than what is shipped with solr? If you did start with the solr example server, what jetty configuration changes have you made? -Yonik http://www.lucidimagination.com Sigh it happened again, but I have a clue: before the crash I was deleting some entries but haven't optimized afterwards, then, when I tried indexing something, solr crashed again (responsive but just blank/empty returns). I've just tried it again (doing the curl command while solr is its zombie state) and i get the following reply from curl: curl: (52) Empty reply from server Also, I updated my Java so the HotSpot version is now 20.1-b3 using lsof I think I pinned down the problem: too many open files! I already doubled from 512 to 1024 once but it seems there are many SOCKETS involved, which are listed as can't identify protocol, instead of real files. over time, the list grows and grows with these entries until.. it crashs. So Ive read several times the fix for this problem is to set the limit to a ridiculous high number but that seems a little bit of a crude fix. Why so many open sockets in the first place?
Re: Solr 3.3 crashes after ~18 hours?
On Fri, Aug 19, 2011 at 10:36 AM, alexander sulz a.s...@digiconcept.net wrote: using lsof I think I pinned down the problem: too many open files! I already doubled from 512 to 1024 once but it seems there are many SOCKETS involved, which are listed as can't identify protocol, instead of real files. over time, the list grows and grows with these entries until.. it crashs. So Ive read several times the fix for this problem is to set the limit to a ridiculous high number but that seems a little bit of a crude fix. Why so many open sockets in the first place? What are you using as a client to talk to solr? You need to look at both the update side and the query side. Using persistent connections is the best all-around, but if not, be sure to close the connections in the client. -Yonik http://www.lucidimagination.com
Re: suggester issues
As far as I checked, creating a custom query converter is the only way to make this work. Unfortunately I have some problems with running it - after creating a JAR with my class (I'm using your source code, obviously besides package and class names) and dropping it into the lib dir, I've added <queryConverter name="queryConverter" class="mypackage.MySpellingQueryConverter"/> to solrconfig.xml. I get a SEVERE: org.apache.solr.common.SolrException: Error Instantiating QueryConverter, mypackage.MySpellingQueryConverter is not a org.apache.solr.spelling.QueryConverter. What am I doing wrong? -- From: William Oberman ober...@civicscience.com Sent: Thursday, August 18, 2011 10:35 PM To: solr-user@lucene.apache.org Subject: Re: suggester issues I tried this:

  package com.civicscience;

  import java.util.ArrayList;
  import java.util.Collection;
  import java.util.Collections;
  import org.apache.lucene.analysis.Token;
  import org.apache.solr.spelling.QueryConverter;

  /**
   * Converts the query string to a Collection of Lucene tokens.
   **/
  public class SpellingQueryConverter extends QueryConverter {

    /**
     * Converts the original query string to a collection of Lucene Tokens.
     * @param original the original query string
     * @return a Collection of Lucene Tokens
     */
    @Override
    public Collection<Token> convert(String original) {
      if (original == null) {
        return Collections.emptyList();
      }
      Collection<Token> result = new ArrayList<Token>();
      Token token = new Token(original, 0, original.length(), "word");
      result.add(token);
      return result;
    }
  }

And added it to the classpath, and now it does what I expect. will On Aug 18, 2011, at 2:33 PM, Alexei Martchenko wrote: It can be done, I did that with shingles, but it's not the way it's meant to be. The main problem with suggester is that we want compound words and we never get them. I try to get internet explorer but when I enter the second word, internet e, the suggester never finds explorer. 2011/8/18 oberman_cs ober...@civicscience.com I was trying to deal with the exact same issue, with the exact same results. Is there really no way to feed a phrase into the suggester (spellchecker) without it splitting the input phrase into words? -- View this message in context: http://lucene.472066.n3.nabble.com/suggester-issues-tp3262718p3265803.html Sent from the Solr - User mailing list archive at Nabble.com. -- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
Re: suggester issues
Hard to say, so I'll list the exact steps I took: -Downloaded apache-solr-3.3.0 (I like to stick with releases vs. svn) -Untar and cd -ant -Wrote my class below (under a peer directory in apache-solr-3.3.0) -javac -cp ../dist/apache-solr-core-3.3.0.jar:../lucene/build/lucene-core-3.3-SNAPSHOT.jar com/civicscience/SpellingQueryConverter.java -jar cf cs.jar com -Unzipped solr.war (under example) -Added my cs.jar to lib (under web-inf) -Rezipped solr.war -Added: queryConverter name=queryConverter class=com.civicscience.SpellingQueryConverter/ to solrconfig.xml -Restarted jetty And, that seemed to all work. will On Aug 19, 2011, at 10:44 AM, Kuba Krzemien wrote: As far as I checked creating a custom query converter is the only way to make this work. Unfortunately I have some problems with running it - after creating a JAR with my class (Im using your source code, obviously besides package and class names) and throwing it into the lib dir I've added queryConverter name=queryConverter class=mypackage.MySpellingQueryConverter/ to solrconfig.xml. I get a SEVERE: org.apache.solr.common.SolrException: Error Instantiating QueryConverter, mypackage.MySpellingQueryConverter is not a org.apache.solr.spelling.QueryConverter. What am I doing wrong? -- From: William Oberman ober...@civicscience.com Sent: Thursday, August 18, 2011 10:35 PM To: solr-user@lucene.apache.org Subject: Re: suggester issues I tried this: package com.civicscience; import java.util.ArrayList; import java.util.Collection; import java.util.Collections; import org.apache.lucene.analysis.Token; import org.apache.solr.spelling.QueryConverter; /** * Converts the query string to a Collection of Lucene tokens. **/ public class SpellingQueryConverter extends QueryConverter { /** * Converts the original query string to a collection of Lucene Tokens. * @param original the original query string * @return a Collection of Lucene Tokens */ @Override public CollectionToken convert(String original) { if (original == null) { return Collections.emptyList(); } CollectionToken result = new ArrayListToken(); Token token = new Token(original, 0, original.length(), word); result.add(token); return result; } } And added it to the classpath, and now it does what I expect. will On Aug 18, 2011, at 2:33 PM, Alexei Martchenko wrote: It can be done, I did that with shingles, but it's not the way it's meant to be. The main problem with suggester is that we want compound words and we never get them. I try to get internet explorer but when i enter in the second word, internet e the suggester never finds explorer. 2011/8/18 oberman_cs ober...@civicscience.com I was trying to deal with the exact same issue, with the exact same results. Is there really no way to feed a phrase into the suggester (spellchecker) without it splitting the input phrase into words? -- View this message in context: http://lucene.472066.n3.nabble.com/suggester-issues-tp3262718p3265803.html Sent from the Solr - User mailing list archive at Nabble.com. -- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
Re: Solr 3.3 crashes after ~18 hours?
Am 19.08.2011 16:43, schrieb Yonik Seeley: On Fri, Aug 19, 2011 at 10:36 AM, alexander sulza.s...@digiconcept.net wrote: using lsof I think I pinned down the problem: too many open files! I already doubled from 512 to 1024 once but it seems there are many SOCKETS involved, which are listed as can't identify protocol, instead of real files. over time, the list grows and grows with these entries until.. it crashs. So Ive read several times the fix for this problem is to set the limit to a ridiculous high number but that seems a little bit of a crude fix. Why so many open sockets in the first place? What are you using as a client to talk to solr? You need to look at both the update side and the query side. Using persistent connections is the best all-around, but if not, be sure to close the connections in the client. -Yonik http://www.lucidimagination.com I use PHP to talk to solr, this one to be exact http://code.google.com/p/solr-php-client/ version r22 i guess. I'll try updating it and see what happens..
Re: hl.useFastVectorHighlighter, fragmentsBuilder and HighlightingParameters
Hi Koji, thanks, it's loading right now. Can't say it's really working though, but I believe those are other issues with FastVectorHighlighter 2011/8/18 Koji Sekiguchi k...@r.email.ne.jp (11/08/19 4:14), Alexei Martchenko wrote: Hi Koji thanks for the reply. MyfragmentsBuilder is defined directly inconfig. SOLR 3.3 warns me highlighting is a deprecated form do you think it is in the wrong place? Hi Alexei, Yes, it is incorrect. What deprecate is that highlighting tag just under config directly. After 3.1, it needs to be under searchComponent for HighlightComponent. Please consult solrconfig.xml in example 3.3. koji -- Check out Query Log Visualizer http://www.rondhuit-demo.com/**loganalyzer/loganalyzer.htmlhttp://www.rondhuit-demo.com/loganalyzer/loganalyzer.html http://www.rondhuit.com/en/ -- *Alexei Martchenko* | *CEO* | Superdownloads ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
Re: File based index doesn't work in spellcheck component
I am using Nutch to crawl and Solr for searching. The search has been successfully implemented. Now I want a file-based Suggestion or a "Do you mean" feature implemented. It is more or less like a spell checker. For that I am making the requisite changes to the solrconfig.xml and the schema.xml for Solr, but it fails when I re-index again to pick up the new implementation. Please let me know how that can be corrected, and also how I can have the suggestions displayed using JSP in my application. I can share part of the changed code later if you intend to help me on this. Thanks in advance, Anupam -- View this message in context: http://lucene.472066.n3.nabble.com/File-based-index-doesn-t-work-in-spellcheck-component-tp489070p3268423.html Sent from the Solr - User mailing list archive at Nabble.com.
How to implement Spell Checker using Solr?
I am using Nutch to crawl and Solr for searching. The search has been successfully implemented. Now I want a file-based Suggestion or a "Did you mean?" feature implemented. It is more or less like a spell checker. For the same I am making the requisite changes to the SolrConfig.xml and the Schema.xml for Solr, but it fails when I am re-indexing it again to get the new implementation. Please let me know how that can be corrected, and also how I can have the suggestions displayed using JSP in my application. I can share part of the changed code later if you intend to help me on this. Thanks in advance, Anupam -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-implement-Spell-Checker-using-Solr-tp3268450p3268450.html Sent from the Solr - User mailing list archive at Nabble.com.
Requiring multiple matches of a term
Is there a way to specify in a query that a term must match at least X times in a document, where X is some value greater than 1? For example, I want to only get documents that contain the word "dog" three times. I thought that using a proximity query with an arbitrarily large distance value might do it: "dog dog dog"~10 And that does seem to return the results I expect. But when I try for more than three, I start getting unexpected result counts as I change the proximity value:
"dog dog dog dog"~10 returns 6403 results
"dog dog dog dog"~20 returns 9291 results
"dog dog dog dog"~30 returns 6395 results
Has anyone ever done something like this and knows how I can accomplish it? -Michael
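For reference, the phrase-with-slop trick described above expressed through SolrJ -- only a sketch, assuming a local Solr instance and a field named text (both hypothetical here), showing how the quoted, sloppy phrase is passed as the query string:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class MinOccurrenceQuery {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer solr =
        new CommonsHttpSolrServer("http://localhost:8983/solr");

    // A sloppy phrase of the same term repeated three times can only match
    // documents with at least three positions of "dog" inside the slop window.
    SolrQuery q = new SolrQuery("text:\"dog dog dog\"~10");
    q.setRows(0); // only the hit count is interesting here

    QueryResponse rsp = solr.query(q);
    System.out.println("matches: " + rsp.getResults().getNumFound());
  }
}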
Re: How to implement Spell Checker using Solr?
On Fri, Aug 19, 2011 at 9:26 PM, anupamxyz cse.anu...@gmail.com wrote: I am using Nutch to crawl and Solr for searching. The search has been successfully implemented. Now I want a file-based Suggestion or a "Did you mean?" feature implemented. It is more or less like a spell checker. Um, not quite. At least as per my understanding they are very different things. Please take a look at: http://wiki.apache.org/solr/MoreLikeThis http://wiki.apache.org/solr/SpellCheckComponent For the same I am making the requisite changes to the SolrConfig.xml and the Schema.xml for Solr, but it fails when I am re-indexing it again to get the new implementation. Please let me know how that can be corrected, and also how I can have the suggestions displayed using JSP in my application. I can share part of the changed code later if you intend to help me on this. Please share with us what changes you are making, and what "it fails" means, i.e., show us the configuration files, error messages, etc., maybe through pastebin.com. You might wish to take a look at http://wiki.apache.org/solr/UsingMailingLists Regards, Gora
Please register me
Please register me
Re: How to implement Spell Checker using Solr?
Both Nutch and Solr can be used as per the need. http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ . So the search is implemented and I am able to search on the values. Now I need the SpellChecker to be implemented. The changes are exactly as per the ones listed in http://wiki.apache.org/solr/SpellCheckComponent . I will share the log details with you by Monday. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-implement-Spell-Checker-using-Solr-tp3268450p3268695.html Sent from the Solr - User mailing list archive at Nabble.com.
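Once the SpellCheckComponent is configured per that wiki page, the suggestions can be read from application code and then rendered from JSP. A rough SolrJ sketch, assuming a request handler registered at /spell and a misspelled test query; the handler name and parameters are assumptions, not taken from Anupam's configuration:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.response.SpellCheckResponse;

public class SpellcheckClient {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer solr =
        new CommonsHttpSolrServer("http://localhost:8983/solr");

    SolrQuery q = new SolrQuery("toyata");
    q.set("qt", "/spell");        // assumed handler path; SolrJ routes to it because it starts with "/"
    q.set("spellcheck", true);
    q.set("spellcheck.collate", true);

    QueryResponse rsp = solr.query(q);
    SpellCheckResponse spell = rsp.getSpellCheckResponse();
    if (spell != null) {
      // Print each misspelled token and the alternatives Solr suggests for it.
      for (SpellCheckResponse.Suggestion s : spell.getSuggestions()) {
        System.out.println(s.getToken() + " -> " + s.getAlternatives());
      }
    }
  }
}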
Solr support for multiple points (latitude-longitude) for a document
Hi all, I was going through Solr 3.3.0 and it seems there's still no support for performing GeoSpatial queries on documents that have more than one latitude-longitude pair. The multiValued flag is set to false everywhere. We absolutely need this feature. I had a look at https://issues.apache.org/jira/browse/SOLR-2155 in which David Smiley was trying to implement a workable solution, but unless I'm mistaken it was never committed. Does anyone know of a way to get a working solution with that fix? Thanks
Re: Solr support for multiple points (latitude-longitude) for a document
Hi. Either port it to Solr 3, or use Solr 4 (trunk). I know and have used a Metacarta solution but that is also based on Solr 4 and I don't think they've back-ported it. I have no clue what they charge for it or where to get it; I have it as part of their larger solution. There's also a small little-known set of source files tar'ed up and attached to SOLR-773 as solrGeoQuery.tar that I've examined; it attempts to solve this problem. It is not particularly fast and I believe there are bugs in +/- 10 degrees latitude but I haven't actually confirmed it. That was actually the last thing I looked at before embarking on SOLR-2155 (geohashes). ~ David On Aug 19, 2011, at 1:47 PM, Jean Croteau wrote: Hi all, I was going through Solr 3.3.0 and it seems there's still no support for performing GeoSpatial queries on documents that have more than one latitude-longitude. The multi field value is set to false everywhere. We absolutely need this feature. I had a look at https://issues.apache.org/jira/browse/SOLR-2155 in which David Smiley was trying to implement a workable solution but unless I'm mistaken it was never committed. Does anyone know of a way to get a working solution with that fix? Thanks
Solr performance for query without filter
Hi I have one instance of solr running on JBoss with the following schema and partial config:

Schema:

<schema name="users_szukacz" version="1.4">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="int" class="solr.TrieIntField" omitNorms="true" precisionStep="1" positionIncrementGap="0"/>
    <fieldType name="date" class="solr.TrieDateField" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="text_pl" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="user_id" type="int" indexed="true" required="true"/>
    <field name="birth_date" type="date" indexed="true" stored="false"/>
    <field name="city" type="text_pl" indexed="true" stored="false"/>
    <field name="sex" type="text_pl" indexed="true" stored="false"/>
    <field name="show_search" type="int" indexed="true" stored="false"/>
    <field name="confirmed" type="int" indexed="true" stored="false"/>
    <field name="search_text" type="text_pl" indexed="true"/>
  </fields>
  <uniqueKey>user_id</uniqueKey>
  <defaultSearchField>search_text</defaultSearchField>
  <solrQueryParser defaultOperator="AND"/>
</schema>

Config:

<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
<mergeFactor>10</mergeFactor>
<ramBufferSizeMB>1024</ramBufferSizeMB>
<maxBufferedDocs>1000</maxBufferedDocs>
<maxFieldLength>1</maxFieldLength>
<writeLockTimeout>1000</writeLockTimeout>
<commitLockTimeout>1</commitLockTimeout>
<filterCache class="solr.FastLRUCache" size="100" initialSize="100" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="1300" initialSize="1300" autowarmCount="0"/>

The index has 41 000 000 documents and is 9 GB in size. For a query like:
1) q=Jarecki+Jan&fq=sex:M&fq=confirmed:1&fq=show_search:3&fl=user_id&start=0&rows=10&wt=json&version=2.2
the server reaches an average of 90 queries/s on 4 threads, which is too low for me.
For a query with a filter on the field city:
2) e.g. fl=user_id&indent=on&start=0&q=Tarkowski+Bartłomiej&wt=json&fq=city:Kwidzyn&fq=sex:M&fq=confirmed:1&fq=show_search:3&version=2.2&rows=10
the server reaches 800 queries/s.
Do you have any advice to speed up the search for the first query? Is this speed the norm? The server has 32GB RAM and 4 Intel Xeon 2.5GHz processors. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-performance-for-query-without-filter-tp3267785p3267785.html Sent from the Solr - User mailing list archive at Nabble.com.
Terms.regex performance issue
As I want to use it in an autocomplete it has to be fast. Terms.prefix gets results in around 100 milliseconds, while terms.regex is 10 to 20 times slower. Not storing the field made it a bit faster, but not enough. The index is on a separate core and only about 5 MB big. Are there some tricks to make it work a lot faster? Or do I have to switch to ngrams or something? -- View this message in context: http://lucene.472066.n3.nabble.com/Terms-regex-performance-issue-tp3268994p3268994.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Date Facet Question
: when the response comes back the facet names are : : 2010-08-14T01:50:58.813Z ... : instead of something like : : NOW-11MONTH ... : whereas for facet queries, if specifying a set of facet queries like : : datetime:[NOW-1YEAR TO NOW] ... : the labels come back just as specified. Is there a way to make date : range queries come back using the query specified and not the parsed : date? No. If dates were the only factor here we could maybe add an option for that, but the faceting code is all generalized now to support all numerics, so it wouldn't really make sense in general. It's also not clear how an option like this would work if/when stuff like SOLR-2366 gets implemented - returning the concrete value used as the lower bound of the range is unambiguous. The functional difference between facet.range and facet.query is pretty significant, so it's kind of an apples/oranges thing to compare their output -- with facet.query you can specify any arbitrary query expression your heart desires, and that literal unparsed query string is again used as the constraint key in the resulting NamedList because it's as unambiguous as we can be given the circumstances. -Hoss
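To make the distinction concrete in SolrJ terms: since each facet.query constraint is keyed by its literal query string, issuing one facet.query per bucket is one way to keep NOW-relative labels in the response. A minimal sketch, assuming the datetime field from the question and a placeholder local Solr URL:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FacetQueryLabels {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer solr =
        new CommonsHttpSolrServer("http://localhost:8983/solr");

    SolrQuery q = new SolrQuery("*:*");
    q.setRows(0);
    q.setFacet(true);
    // Each facet.query comes back keyed by the literal, unparsed query string,
    // so the response keeps the NOW-relative labels instead of concrete dates.
    q.addFacetQuery("datetime:[NOW-1MONTH TO NOW]");
    q.addFacetQuery("datetime:[NOW-1YEAR TO NOW-1MONTH]");

    QueryResponse rsp = solr.query(q);
    // e.g. {datetime:[NOW-1MONTH TO NOW]=1234, datetime:[NOW-1YEAR TO NOW-1MONTH]=5678}
    System.out.println(rsp.getFacetQuery());
  }
}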
Re: A strange Exception in Solr 1.4
Can you reproduce this error consistently? Can you try using the CheckIndex tool on your index to verify that it hasn't been corrupted in some way? :2011-08-15 10:31:24,968 ERROR [org.apache.solr.core.SolrCore] - : java.lang.NullPointerException : at sun.nio.ch.Util.free(Util.java:199) : at sun.nio.ch.Util.offerFirstTemporaryDirectBuffer(Util.java:176) : at sun.nio.ch.IOUtil.read(IOUtil.java:181) : at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:612) : at : org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:161) : at : org.apache.lucene.store.BufferedIndexInput.readBytes(BufferedIndexInput.java:136) : at : org.apache.lucene.index.CompoundFileReader$CSIndexInput.readInternal(CompoundFileReader.java:247) : at : org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157) : at : org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38) : at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:80) : at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:64) : at : org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:129) : at : org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:160) : at : org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:232) : at : org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:179) : at : org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:975) : at : org.apache.lucene.index.DirectoryReader.docFreq(DirectoryReader.java:627) : at : org.apache.lucene.index.FilterIndexReader.docFreq(FilterIndexReader.java:194) : at org.apache.lucene.index.MultiReader.docFreq(MultiReader.java:344) : at : org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:308) : at : org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:147) : at : org.apache.lucene.search.Similarity.idfExplain(Similarity.java:765) : at : org.apache.lucene.search.TermQuery$TermWeight.init(TermQuery.java:46) : at : org.apache.lucene.search.TermQuery.createWeight(TermQuery.java:146) : at : org.apache.lucene.search.BooleanQuery$BooleanWeight.init(BooleanQuery.java:184) : at : org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:415) : at org.apache.lucene.search.Query.weight(Query.java:99) : at org.apache.lucene.search.Searcher.createWeight(Searcher.java:230) : at org.apache.lucene.search.Searcher.search(Searcher.java:171) : at : org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988) : at : org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:884) : at : org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341) : at : org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182) : at : org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195) : at : org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) : at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) : at : org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139) : at : org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89) : at : org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118) : at : com.taobao.terminator.core.realtime.DefaultSearchService.query(DefaultSearchService.java:197) : at sun.reflect.GeneratedMethodAccessor73.invoke(Unknown Source) : at : 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) : at java.lang.reflect.Method.invoke(Method.java:597) : at : com.taobao.hsf.rpc.tbremoting.provider.ProviderProcessor.handleRequest0(ProviderProcessor.java:222) : at : com.taobao.hsf.rpc.tbremoting.provider.ProviderProcessor.handleRequest(ProviderProcessor.java:174) : at : com.taobao.hsf.rpc.tbremoting.provider.ProviderProcessor.handleRequest(ProviderProcessor.java:41) : at : com.taobao.remoting.impl.DefaultMsgListener$1ProcessorExecuteTask.run(DefaultMsgListener.java:131) : at : java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) : at : java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) : at java.lang.Thread.run(Thread.java:662) : : : : Thank u : : : : : : allen.Fu : -Hoss
Re: SolrJ and ContentStreams
: I'm considering using SolrJ to run queries in an MLT fashion against my Solr : server. I saw that there is already an open bug filed in Jira : (https://issues.apache.org/jira/browse/SOLR-1085). Note that that issue is really just about having convenience classes for executing MLT-style requests and parsing the responses. Just because those convenience methods/classes don't exist yet doesn't mean you can't use SolrJ to send MLT requests. You can instantiate a QueryRequest object with the SolrParams you want to specify for MLT, and then extract the data you want directly from the QueryResponse. The only potentially tricky part is sending an arbitrary ContentStream. In your example you are using stream.body for a short string -- this is easy to do as a param when building a QueryRequest object, but if you want to provide a much larger stream of data you can subclass QueryRequest to add your own ContentStream (from a File source or whatever) to the Collection it will stream to the server. For that matter, even though its name might fool you, I'm pretty sure a ContentStreamUpdateRequest instance with the appropriate URL for your MLT handler will do exactly what you want. https://lucene.apache.org/solr/api/org/apache/solr/client/solrj/request/ContentStreamUpdateRequest.html https://lucene.apache.org/solr/api/org/apache/solr/client/solrj/request/AbstractUpdateRequest.html#setParam%28java.lang.String,%20java.lang.String%29 -Hoss
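A rough sketch of the plain-QueryRequest route described above, assuming an MLT handler registered at /mlt and a hypothetical field name for mlt.fl; the params and URL are illustrative, not taken from the original setup:

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.ModifiableSolrParams;

public class MltViaQueryRequest {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer solr =
        new CommonsHttpSolrServer("http://localhost:8983/solr");

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("qt", "/mlt");                 // assumed MLT handler path
    params.set("stream.body", "some short piece of text to find similar docs for");
    params.set("mlt.fl", "search_text");      // hypothetical field name
    params.set("mlt.interestingTerms", "list");
    params.set("rows", "5");

    // No MLT-specific convenience class yet (SOLR-1085); a plain QueryRequest works.
    QueryRequest req = new QueryRequest(params);
    QueryResponse rsp = req.process(solr);

    // The similar documents come back as the normal result list.
    System.out.println(rsp.getResults());
  }
}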
Re: Terms.regex performance issue
TermsComponent uses java.util.regex, which is not particularly fast. If the number of terms grows, your CPU is going to overheat. I'd prefer an analyzer approach. As I want to use it in an autocomplete it has to be fast. Terms.prefix gets results in around 100 milliseconds, while terms.regex is 10 to 20 times slower. Not storing the field made it a bit faster, but not enough. The index is on a separate core and only about 5 MB big. Are there some tricks to make it work a lot faster? Or do I have to switch to ngrams or something? -- View this message in context: http://lucene.472066.n3.nabble.com/Terms-regex-performance-issue-tp3268994p3268994.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Requiring multiple matches of a term
FWIW: I think this is a really cool and interesting question. : Is there a way to specify in a query that a term must match at least X : times in a document, where X is some value greater than 1? At the moment, I think your phrase query approach is really the only viable way (although it did get me thinking about how hard it would be to implement this at a lower level ... I'll see if I can work out a patch). : But when I try for more than three, I start getting unexpected result : counts as I change the proximity value: Hmmm... I would think the phrase query approach should work, but it's totally possible that there's something odd in the way phrase queries work that could cause a problem -- the best way to sanity-test something like this is to try a really small self-contained example that you can post for other people to try. If you said 2 clauses worked but not 3, I would guess that maybe there is a terms-out-of-order type issue involved, but "3 works, not 4" smells fishy. -Hoss
Re: Terms.regex performance issue
: Subject: Terms.regex performance issue : : As I want to use it in an Autocomplete it has to be fast. Terms.prefix gets : results in around 100 milliseconds, while terms.regex is 10 to 20 times : slower. can you elaborate on how you are using terms.regex? what does your regex look like? .. particularly if your usecase is autocomplete terms.prefix seems like an odd choice. Possible XY Problem? https://people.apache.org/~hossman/#xyproblem Have you looked at using the Suggester plugin? https://wiki.apache.org/solr/Suggester -Hoss
Re: Solr performance for query without filter
: Index has 41 000 000 documents and 9 GB size. For query like: : 1) : q=Jarecki+Jan&fq=sex:M&fq=confirmed:1&fq=show_search:3&fl=user_id&start=0&rows=10&wt=json&version=2.2 : : server reaches an average of 90 queries/s on 4 threads, which is too low for me. : : For a query with a filter on the field city: : 2) e.g. : fl=user_id&indent=on&start=0&q=Tarkowski+Bartłomiej&wt=json&fq=city:Kwidzyn&fq=sex:M&fq=confirmed:1&fq=show_search:3&version=2.2&rows=10 : : server reaches 800 queries/s. : : Do you have any advice to speed up the search for the first query? Is this speed : the norm? "Norm" is hard to define, but one key element you left out is how many docs are (typically) matched by requests of type #1 vs type #2, and how good a job your city filters do in partitioning the total number of documents. I suspect that your city filters are heavily reused (ie: good cache hit rates) and do a really good job of cutting down the number of matching docs -- (ie: the number of docs matching fq=sex:M&fq=confirmed:1&fq=show_search:3 is probably significantly higher than the number matching fq=sex:M&fq=confirmed:1&fq=show_search:3&fq=city:Kwidzyn). In which case it makes sense that type #1 queries would take a lot longer on average -- there are a lot more docs to consider when evaluating the q to find matches. -Hoss
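One way to check that hypothesis is to compare how many documents each filter combination matches. A small SolrJ sketch (field names taken from the question; the server URL is a placeholder):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class FilterSelectivityCheck {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer solr =
        new CommonsHttpSolrServer("http://localhost:8983/solr");

    // Count docs that survive the common filters only, then with the city filter added.
    System.out.println("without city filter: " + count(solr, null));
    System.out.println("with city filter:    " + count(solr, "city:Kwidzyn"));
  }

  private static long count(CommonsHttpSolrServer solr, String extraFq) throws Exception {
    SolrQuery q = new SolrQuery("*:*");
    q.setRows(0); // only numFound is needed
    q.addFilterQuery("sex:M", "confirmed:1", "show_search:3");
    if (extraFq != null) {
      q.addFilterQuery(extraFq);
    }
    return solr.query(q).getResults().getNumFound();
  }
}

If the second count is dramatically smaller, that supports the explanation above: the unfiltered-by-city query simply has far more candidate documents to score.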
Re: Content recommendation using solr?
: Initially, I was looking at http://wiki.apache.org/solr/MoreLikeThis : : Then, it turned out that most implementations are based on a combination of : Mahout, Solr and Hadoop. I think you'll find that most serious (for some definition) content recommendation engines use various ML algorithms (ie: Mahout) to crunch both the content and the (aggregate) user behavior data to generate "people who like this thing also like..." and "people like you also tend to like..." type recommendations. But Solr, with and w/o MLT, can be very handy for "things similar to this thing are..." type recommendations, depending on how you use it. (I don't think I'm allowed to name names, but I can think of a couple of major www sites of which I have first-hand knowledge that use MLT and/or customized things like MLT to search their Solr/Lucene indexes for things similar to the thing you are currently looking at). -Hoss
Re: Terms.regex performance issue
Terms.prefix was just to compare performance. The use case was terms.regex=.*query.* And as Markus pointed out, this will probably remain a bottleneck. I looked at the Suggester, but like many others I have been struggling to make it useful. It needs a custom queryConverter to give proper suggestions, but I haven't tried this yet. -- View this message in context: http://lucene.472066.n3.nabble.com/Terms-regex-performance-issue-tp3268994p3269628.html Sent from the Solr - User mailing list archive at Nabble.com.