Solr 3.1 returning entire highlighted field
Hi,

After upgrading from Solr 1.4.0 to 3.1, our highlighting has gone from highlighting short pieces of text to displaying what appears to be the entire contents of the highlighted field.

The request using solrj is setting the following:

params.setHighlight(true);
params.setHighlightSnippets(3);
params.set("hl.fl", "content_highlight");

From solrconfig: dismax regex spellcheck 100 70 0.5 [-\w ,/\n\"']{20,200}

From schema:

Any pointers anybody can provide would be greatly appreciated.

Jake
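For reference, the same highlighting request can be sketched over Solr's HTTP API (Python here instead of SolrJ; the host, core path, query, and the explicit hl.fragsize value are illustrative assumptions, not taken from the post above):

```python
from urllib.parse import urlencode

# Illustrative highlighting parameters mirroring the SolrJ calls above.
# hl.fragsize is an assumption added for the sketch: a fragsize of 0 tells
# Solr to highlight the entire field, which is exactly the symptom described,
# so it is worth checking what value the 3.1 config actually resolves to.
params = {
    "q": "example query",           # hypothetical query
    "hl": "true",                   # params.setHighlight(true)
    "hl.snippets": 3,               # params.setHighlightSnippets(3)
    "hl.fl": "content_highlight",   # params.set("hl.fl", ...)
    "hl.fragsize": 100,             # explicit fragment size; 0 disables fragmenting
}

# Hypothetical host and core, for illustration only.
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

Sending the request with the parameters spelled out on the URL makes it easy to compare what the 1.4.0 and 3.1 servers each receive.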
RE: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?
[] ASF Mirrors (linked in our release announcements or via the Lucene website)
[X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[] I/we build them from source via an SVN/Git checkout.
[] Other (someone in your company mirrors them internally or via a downstream project)

-----Original Message-----
From: Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Tuesday, January 18, 2011 3:04 PM
To: java-u...@lucene.apache.org; solr-user@lucene.apache.org
Subject: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?
backup command
Hi,

I'm running the official Solr 1.4 release and encountering an exception telling me that a file does not exist when using the Java replication command=backup. It looks very much like SOLR-1475, which was fixed for 1.4. I tried adding a deletionPolicy within solrconfig.xml to keep commit points for 30 minutes, but still receive the error. Our index is about 25G. On occasion I have seen the backup finish, but unfortunately it fails more often.

Does anyone have any pointers?

Thanks for your help,
Jake
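For anyone reproducing this, the Java replication handler's backup is triggered with a plain HTTP request; a minimal sketch of building that request (host and handler path are the common defaults, assumed here rather than taken from the post):

```python
from urllib.parse import urlencode

# The Java-based ReplicationHandler exposes backup as an HTTP command.
# Host, port, and handler path are hypothetical defaults; adjust for
# the actual deployment (and for multicore setups, include the core name).
base = "http://localhost:8983/solr/replication"
url = base + "?" + urlencode({"command": "backup"})
print(url)
```

Watching which commit point the handler tries to snapshot while this runs can help confirm whether the deletionPolicy change is actually taking effect.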
StreamingUpdateSolrServer seems to hang on indexing big batches
Hi,

I swapped our indexing process over to the streaming update server, but now I'm seeing places where our indexing code adds several documents but eventually hangs. It hangs just before the completion message, which comes directly after sending to Solr. I found this issue in JIRA, https://issues.apache.org/jira/browse/SOLR-1711, which may be what I'm seeing. If this is indeed what we're running up against, is there any best practice to work around it?

Thanks,
Jake
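The indexing pattern described (adding documents in batches before a final completion step) can be sketched generically as below. This is only an illustrative batching helper, not a confirmed fix for SOLR-1711; the knobs people usually experiment with are the queueSize and threadCount arguments of StreamingUpdateSolrServer's constructor, or falling back to the non-streaming CommonsHttpSolrServer:

```python
def chunked(docs, size):
    """Yield successive fixed-size batches from a list of documents."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

# Hypothetical documents; in the real code these would be SolrInputDocuments
# sent through StreamingUpdateSolrServer in each batch.
docs = [{"id": str(n)} for n in range(10)]
batches = list(chunked(docs, 4))
print(len(batches))  # prints 3 (batches of 4, 4, and 2)
```

Smaller batches make it easier to see exactly which add the client is blocked on when the hang occurs.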
RE: Corrupted Index
Yes, that would be helpful to include, sorry, the official 1.4.

-----Original Message-----
From: Ryan McKinley [mailto:ryan...@gmail.com]
Sent: Thursday, January 07, 2010 2:15 PM
To: solr-user@lucene.apache.org
Subject: Re: Corrupted Index

what version of solr are you running?

On Jan 7, 2010, at 3:08 PM, Jake Brownell wrote:
> [quoted original message and stack traces snipped; see "Corrupted Index" below]
Corrupted Index
Hi all,

Our application uses solrj to communicate with our solr servers. We started a fresh index yesterday after upping the maxFieldLength setting in solrconfig. Our task indexes content in batches and all appeared to be well until noonish today, when after 40k docs I started seeing errors. I've placed three stack traces below; the first occurred once and was the initial error, the second occurred a few times before the third started occurring on each request. I'd really appreciate any insight into what could have caused this: a missing file and then a corrupt index. If you know we'll have to nuke the entire index and start over, I'd like to know that too. Oddly enough, searches against the index appear to be working.

Thanks!
Jake

#1

January 7, 2010 12:10:06 PM CST Caught error; TaskWrapper block 1
January 7, 2010 12:10:07 PM CST solr-home/core0/data/index/_fsk_1uj.del (No such file or directory)
solr-home/core0/data/index/_fsk_1uj.del (No such file or directory)
request: /core0/update
solr-home/core0/data/index/_fsk_1uj.del (No such file or directory)
solr-home/core0/data/index/_fsk_1uj.del (No such file or directory)
request: /core0/update
January 7, 2010 12:10:07 PM CST solr-home/core0/data/index/_fsk_1uj.del (No such file or directory)
solr-home/core0/data/index/_fsk_1uj.del (No such file or directory)
request: /core0/update
solr-home/core0/data/index/_fsk_1uj.del (No such file or directory)
solr-home/core0/data/index/_fsk_1uj.del (No such file or directory)
request: /core0/update
org.benetech.exception.WrappedException
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(424)
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(243)
org.apache.solr.client.solrj.request.AbstractUpdateRequest#process(105)
org.apache.solr.client.solrj.SolrServer#commit(86)
org.apache.solr.client.solrj.SolrServer#commit(75)
org.bookshare.search.solr.SolrSearchServerWrapper#add(63)
org.bookshare.search.solr.SolrSearchEngine#index(232)
org.bookshare.service.task.SearchEngineIndexingTask#initialInstanceLoad(95)
org.bookshare.service.task.SearchEngineIndexingTask#run(53)
org.bookshare.service.scheduler.TaskWrapper#run(233)
java.util.TimerThread#mainLoop(512)
java.util.TimerThread#run(462)
Caused by:
solr-home/core0/data/index/_fsk_1uj.del (No such file or directory)
solr-home/core0/data/index/_fsk_1uj.del (No such file or directory)
request: /core0/update
org.apache.solr.common.SolrException
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(424)
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(243)
org.apache.solr.client.solrj.request.AbstractUpdateRequest#process(105)
org.apache.solr.client.solrj.SolrServer#commit(86)
org.apache.solr.client.solrj.SolrServer#commit(75)
org.bookshare.search.solr.SolrSearchServerWrapper#add(63)
org.bookshare.search.solr.SolrSearchEngine#index(232)
org.bookshare.service.task.SearchEngineIndexingTask#initialInstanceLoad(95)
org.bookshare.service.task.SearchEngineIndexingTask#run(53)
org.bookshare.service.scheduler.TaskWrapper#run(233)
java.util.TimerThread#mainLoop(512)
java.util.TimerThread#run(462)

#2

January 7, 2010 12:10:10 PM CST Caught error; TaskWrapper block 1
January 7, 2010 12:10:10 PM CST org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _hug: fieldsReader shows 8 but segmentInfo shows 2
org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _hug: fieldsReader shows 8 but segmentInfo shows 2
request: /core0/update
org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _hug: fieldsReader shows 8 but segmentInfo shows 2
org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _hug: fieldsReader shows 8 but segmentInfo shows 2
request: /core0/update
January 7, 2010 12:10:10 PM CST org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _hug: fieldsReader shows 8 but segmentInfo shows 2
org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _hug: fieldsReader shows 8 but segmentInfo shows 2
request: /core0/update
org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _hug: fieldsReader shows 8 but segmentInfo shows 2
org.apache.lucene.index.CorruptIndexException: doc counts differ for segment _hug: fieldsReader shows 8 but segmentInfo shows 2
request: /core0/update
org.benetech.exception.WrappedException
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(424)
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer#request(243)
org.apache.solr.client.solrj.request.AbstractUpdateRequest#process(105)
org.apache.
RE: Is there a way to skip cache for a query
See https://issues.apache.org/jira/browse/SOLR-1363 -- it's currently scheduled for 1.5.

Jake

-----Original Message-----
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Sunday, November 15, 2009 11:17 PM
To: solr-user@lucene.apache.org
Subject: Re: Is there a way to skip cache for a query

I don't think that is supported today. It might be useful, though (e.g. something I'd use with an external monitoring service, so that it doesn't always get fast results from the cache).

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

----- Original Message -----
> From: Bertie Shen
> To: solr-user@lucene.apache.org
> Sent: Sat, November 14, 2009 9:43:25 PM
> Subject: Is there a way to skip cache for a query
>
> Hey,
>
> I do not want to disable cache completely by changing the setting in
> solrconfig.xml. I just want to sometimes skip cache for a query for testing
> purposes. So is there a parameter like skipcache=true to specify in
> select/?q=hot&version=2.2&start=0&rows=10&skipcache=true to skip cache for
> the query [hot]? skipcache could default to false.
>
> Thanks.
NPE when trying to view a specific document via Luke
Hi,

I'm seeing this stack trace when I try to view a specific document, e.g. /admin/luke?id=1, but Luke appears to be working correctly when I just view /admin/luke. Does this look familiar to anyone? Our sysadmin just upgraded us to the 1.4 release; I'm not sure if this occurred before that.

Thanks,
Jake

java.lang.NullPointerException
	at org.apache.lucene.index.TermBuffer.set(TermBuffer.java:95)
	at org.apache.lucene.index.SegmentTermEnum.scanTo(SegmentTermEnum.java:158)
	at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:232)
	at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:179)
	at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:975)
	at org.apache.lucene.index.DirectoryReader.docFreq(DirectoryReader.java:627)
	at org.apache.solr.search.SolrIndexReader.docFreq(SolrIndexReader.java:308)
	at org.apache.solr.handler.admin.LukeRequestHandler.getDocumentFieldsInfo(LukeRequestHandler.java:248)
	at org.apache.solr.handler.admin.LukeRequestHandler.handleRequestBody(LukeRequestHandler.java:124)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
	at com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:76)
	at com.caucho.server.cache.CacheFilterChain.doFilter(CacheFilterChain.java:158)
	at com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:178)
	at com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:241)
	at com.caucho.server.hmux.HmuxRequest.handleRequest(HmuxRequest.java:435)
	at com.caucho.server.port.TcpConnection.run(TcpConnection.java:586)
	at com.caucho.util.ThreadPool$Item.runTasks(ThreadPool.java:690)
	at com.caucho.util.ThreadPool$Item.run(ThreadPool.java:612)
	at java.lang.Thread.run(Thread.java:619)

Date: Fri, 13 Nov 2009 02:19:54 GMT
Server: Apache/2.2.3 (Red Hat)
Cache-Control: no-cache, no-store
Pragma: no-cache
Expires: Sat, 01 Jan 2000 01:00:00 GMT
Content-Type: text/html; charset=UTF-8
Vary: Accept-Encoding,User-Agent
Content-Encoding: gzip
Content-Length: 1066
Connection: close
Field settings for best highlighting performance
Hi,

I've seen the use case for highlighting on http://wiki.apache.org/solr/FieldOptionsByUseCase. I just wanted to confirm that for best performance,

indexed=true
stored=true
termVectors=true
termPositions=true

is the way to go for highlighting in Solr 1.4. Note that I'm not doing anything else with this field; it's just for highlighting.

Congratulations on the release -- I'm particularly excited because it was soon enough to be included in our launch of full-text search integration.

Thanks,
Jake
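The option set above would look like this as a schema.xml field declaration; the field name and type are hypothetical, and termOffsets="true" is an extra option the wiki's highlighting use case lists alongside termPositions, included here for completeness rather than because the post mentions it:

```xml
<!-- Hypothetical highlight-only field; name and type are illustrative.
     termVectors/termPositions (and commonly termOffsets) let the
     highlighter avoid re-analyzing the stored text. -->
<field name="content_highlight" type="text"
       indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```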
RE: Highlighting performance between 1.3 and 1.4rc
Thanks Mark, that did bring the time back down. I'll have to investigate a little more, and weigh the pros of each to determine which best suits our needs.

Jake

-----Original Message-----
From: Mark Miller [mailto:markrmil...@gmail.com]
Sent: Tuesday, November 03, 2009 11:23 PM
To: solr-user@lucene.apache.org
Cc: solr-user@lucene.apache.org
Subject: Re: Highlighting performance between 1.3 and 1.4rc

The 1.4 highlighter is now slower if you have multi-term queries or phrase queries. You can get the old behavior (which is faster) if you pass usePhraseHighlighter=false - but you will not get correct phrase highlighting, and multi-term queries won't highlight - e.g. prefix/wildcard/range.

- Mark

http://www.lucidimagination.com (mobile)

On Nov 3, 2009, at 8:18 PM, Jake Brownell wrote:
> [quoted original message snipped; see "Highlighting performance between 1.3 and 1.4rc" below]
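Mark's suggestion can be passed either through SolrJ params or directly on the request URL; a minimal sketch over the HTTP API (host, query, and highlight field are illustrative assumptions):

```python
from urllib.parse import urlencode

# Disabling the phrase highlighter restores the faster pre-1.4 behavior,
# at the cost of correct phrase highlighting and of highlighting for
# prefix/wildcard/range queries, as described in the reply above.
params = {
    "q": "\"exact title\"",        # hypothetical quoted-title query
    "hl": "true",
    "hl.fl": "content_highlight",  # hypothetical highlight field
    "hl.usePhraseHighlighter": "false",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

Toggling only this one parameter between runs isolates how much of the 1.3-to-1.4 slowdown comes from the phrase highlighter.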
Highlighting performance between 1.3 and 1.4rc
Hi,

The fix MarkM provided yesterday for the problem I reported encountering with the highlighter appears to be working -- I installed the Lucene 2.9.1 rc4 artifacts.

Now I'm running into an oddity regarding performance. Our integration test is running slower than it used to. I've placed some average timings below. I'll try to describe what the test does in the hopes that someone will have some insight.

The indexing time represents the time it takes to load and index/commit ~43 books. The test then does two sets of searches.

A basic search is a dismax search across several fields, including the text of the book. It searches either the exact title (in quotes) or the ISBN. Highlighting is enabled on the field that holds the text of the book.

An advanced search uses a nested dismax (inside a normal Lucene query) to search for either the exact title (in quotes) or the ISBN. The main difference is that the title is only matched against fields related to titles, not authors, the text of the book, etc. Highlighting is enabled against the text of the book.

The indexing time remained fairly constant. I ran with and without highlighting enabled to see how much it was contributing. I am most interested in the jumps in time between 1.3 and 1.4 for the highlighting time.

With highlighting enabled:

solr 1.3
Indexing: 40161ms
Basic: 12407ms
Advanced: 1106ms

solr 1.4 rc
Indexing: 41734ms
Basic: 26346ms
Advanced: 17067ms

Without any highlighting:

solr 1.3
Indexing: 41186ms
Basic: 1024ms
Advanced: 265ms

solr 1.4 rc
Indexing: 40981ms
Basic: 883ms
Advanced: 356ms

FWIW, the integration test uses an embedded Solr server.

I suppose I should also ask if there are any general tips to speed up highlighting?

Thanks,
Jake
highlighting error using 1.4rc
Hi,

I've tried installing the latest (3rd) RC for Solr 1.4 and Lucene 2.9.1. One of our integration tests, which runs against an embedded server, appears to be failing on highlighting. I've included the stack trace and the configuration from solrconfig. I'd appreciate any insights. Please let me know what additional information would be useful.

Caused by: org.apache.solr.client.solrj.SolrServerException: org.apache.solr.client.solrj.SolrServerException: java.lang.ClassCastException: org.apache.lucene.search.spans.SpanOrQuery cannot be cast to org.apache.lucene.search.spans.SpanNearQuery
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:153)
	at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
	at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
	at org.bookshare.search.solr.SolrSearchServerWrapper.query(SolrSearchServerWrapper.java:96)
	... 29 more
Caused by: org.apache.solr.client.solrj.SolrServerException: java.lang.ClassCastException: org.apache.lucene.search.spans.SpanOrQuery cannot be cast to org.apache.lucene.search.spans.SpanNearQuery
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:141)
	... 32 more
Caused by: java.lang.ClassCastException: org.apache.lucene.search.spans.SpanOrQuery cannot be cast to org.apache.lucene.search.spans.SpanNearQuery
	at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.collectSpanQueryFields(WeightedSpanTermExtractor.java:489)
	at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.collectSpanQueryFields(WeightedSpanTermExtractor.java:484)
	at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extractWeightedSpanTerms(WeightedSpanTermExtractor.java:249)
	at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:230)
	at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:158)
	at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:414)
	at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216)
	at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184)
	at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:226)
	at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335)
	at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:203)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
	... 32 more

I see in our solrconfig the following for highlighting: 100 70 0.5 [-\w ,/\n\"']{20,200}

Thanks,
Jake