IndexSchema object
How can we get an instance of the IndexSchema object in a Tokenizer subclass?
Query performance
Hi all, does the following query have any performance impact compared to the second query? +title:lucene +(title:lucene -name:sid) +(title:lucene -name:sid)
Does the semicolon still work as a special character for sorting?
I read somewhere that it is deprecated.
Optimize
Hi all, I am not sure how to call optimize on an existing index. I tried the following URL: http://localhost:9090/solr/update?optimize=true With this request the response took a long time, and the index folder size doubled. When I then queried the same URL again, the index size dropped back to its actual size and I got the response immediately. Is this expected behavior? Is there any other way to call optimize? Thanks, Siddharth
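For reference, the same operation can be triggered from SolrJ; a minimal sketch, assuming the server URL from the question above (SolrServer.optimize() blocks until the optimize finishes, and disk usage can temporarily double while old and new segments coexist, which matches the behavior described):

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class OptimizeExample {
    public static void main(String[] args) throws Exception {
        // URL taken from the question above
        SolrServer server = new CommonsHttpSolrServer("http://localhost:9090/solr");
        // Merges the index down to a single segment; blocks until complete
        server.optimize();
    }
}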
RE: Autocommit blocking adds? AutoCommit Speedup?
Hi all, I am also facing the same issue, where autocommit blocks all other requests. I have around 100,000 documents with an average size of 100K each, and it took more than 20 hours to index. I have currently set the autocommit maxTime to 7 seconds and mergeFactor to 25. Do I need more configuration changes? I also see that memory usage climbs to the peak of the specified heap (6 GB in my case); it looks like Solr spends most of its time in GC. According to my understanding, the fix for SOLR-1155 would be that the commit runs in the background while new documents are queued in memory. But I am afraid of the memory consumed by this queue if a commit takes a long time to complete. Thanks, Siddharth

-Original Message- From: jayson.minard [mailto:jayson.min...@gmail.com] Sent: Saturday, May 09, 2009 10:45 AM To: solr-user@lucene.apache.org Subject: Re: Autocommit blocking adds? AutoCommit Speedup?

First cut of the updated handler is now in: https://issues.apache.org/jira/browse/SOLR-1155 It needs review from those who know Lucene better, and a double check for errors in locking or other areas of the code. Thanks. --j

jayson.minard wrote: Can we move this to patch files within the JIRA issue please? That will make it easier to review and help out, as a patch against current trunk. --j

Jim Murphy wrote: Yonik Seeley-2 wrote: ...your code snippet elided and edited below... Don't take this code as correct (or even compiling), but is this the essence? I moved shared access to the writer inside the read lock and kept the other non-commit bits under the write lock. I'd need to rethink the locking in a more fundamental way, but is this close to the idea?

public void commit(CommitUpdateCommand cmd) throws IOException {
  if (cmd.optimize) {
    optimizeCommands.incrementAndGet();
  } else {
    commitCommands.incrementAndGet();
  }

  Future[] waitSearcher = null;
  if (cmd.waitSearcher) {
    waitSearcher = new Future[1];
  }

  boolean error = true;
  iwCommit.lock();
  try {
    log.info("start " + cmd);
    if (cmd.optimize) {
      closeSearcher();
      openWriter();
      writer.optimize(cmd.maxOptimizeSegments);
    }
  } finally {
    iwCommit.unlock();
  }

  iwAccess.lock();
  try {
    writer.commit();
  } finally {
    iwAccess.unlock();
  }

  iwCommit.lock();
  try {
    callPostCommitCallbacks();
    if (cmd.optimize) {
      callPostOptimizeCallbacks();
    }
    // open a new searcher in the sync block to avoid opening it
    // after a deleteByQuery changed the index, or in between deletes
    // and adds of another commit being done.
    core.getSearcher(true, false, waitSearcher);

    // reset commit tracking
    tracker.didCommit();

    log.info("end_commit_flush");
    error = false;
  } finally {
    iwCommit.unlock();
    addCommands.set(0);
    deleteByIdCommands.set(0);
    deleteByQueryCommands.set(0);
    numErrors.set(error ? 1 : 0);
  }

  // if we are supposed to wait for the searcher to be registered, then we should do it
  // outside of the synchronized block so that other update operations can proceed.
  if (waitSearcher != null && waitSearcher[0] != null) {
    try {
      waitSearcher[0].get();
    } catch (InterruptedException e) {
      SolrException.log(log, e);
    } catch (ExecutionException e) {
      SolrException.log(log, e);
    }
  }
}

-- View this message in context: http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--tp23435224p23457422.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: OutOfMemory on Highlighting
Is it possible to read only maxAnalyzedChars from the stored field instead of reading the complete field into memory? For instance, in my case, is it possible to read only the first 50K characters instead of the complete 1 MB of stored text? That would help minimize memory usage (though it would still take 50K * 500 * 2 = 50 MB for 500 results). I would really appreciate some feedback on this issue... Thanks, Siddharth
RE: OutOfMemory on Highlighting
I am not sure whether lazy loading should help solve this problem. I have set enableLazyFieldLoading to true, but it is not helping. I went through the code and observed that DefaultSolrHighlighter.doHighlighting reads all the documents and fields needed for highlighting (in my case, the 1 MB stored field is read for all documents). I am also confused by the following code in the SolrIndexSearcher.doc() method:

if (!enableLazyFieldLoading || fields == null) {
  d = searcher.getIndexReader().document(i);
} else {
  d = searcher.getIndexReader().document(i, new SetNonLazyFieldSelector(fields));
}

Are we setting the fields as non-lazy even when lazy loading is enabled? Thanks, Siddharth
RE: OutOfMemory on Highlighting
I tried disabling the documentCache, but still the same issue:

<documentCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>

-Original Message- From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] Sent: Monday, April 20, 2009 4:38 PM To: solr-user@lucene.apache.org Subject: Re: OutOfMemory on Highlighting

Gargate, Siddharth wrote: ... But if I enable highlighting and set the no. of rows to just 20, I get OOME.

How about switching documentCache off? Koji
RE: OutOfMemory on Highlighting
Here is the stack trace:

SEVERE: java.lang.OutOfMemoryError: Java heap space
at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:133)
at java.lang.StringCoding.decode(StringCoding.java:173)
at java.lang.String.<init>(String.java:444)
at org.apache.lucene.store.IndexInput.readString(IndexInput.java:125)
at org.apache.lucene.index.FieldsReader.addField(FieldsReader.java:390)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:230)
at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:892)
at org.apache.lucene.index.MultiSegmentReader.document(MultiSegmentReader.java:277)
at org.apache.solr.search.SolrIndexReader.document(SolrIndexReader.java:176)
at org.apache.solr.search.SolrIndexSearcher.doc(SolrIndexSearcher.java:457)
at org.apache.solr.search.SolrIndexSearcher.readDocs(SolrIndexSearcher.java:482)
at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:253)
at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:84)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)
RE: OutOfMemory on Highlighting
Anybody facing the same issue? Following is my configuration:

...
<field name="content" type="text" indexed="true" stored="false" multiValued="true"/>
<field name="teaser" type="text" indexed="false" stored="true"/>
<copyField source="content" dest="teaser" maxChars="100"/>
...

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">500</int>
    <str name="hl">true</str>
    <str name="fl">id,score</str>
    <str name="hl.fl">teaser</str>
    <str name="hl.alternateField">teaser</str>
    <int name="hl.fragsize">200</int>
    <int name="hl.maxAlternateFieldLength">200</int>
    <int name="hl.maxAnalyzedChars">500</int>
  </lst>
</requestHandler>
...

Search works fine if I disable highlighting, and it brings back 500 results. But if I enable highlighting and set the number of rows to just 20, I get OOME.
RE: OutOfMemory on Highlighting
I tried hl.maxAnalyzedChars=500, but still the same issue. I get OOM with a row size of just 20.

-Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Thursday, April 16, 2009 9:56 PM To: solr-user@lucene.apache.org Subject: Re: OutOfMemory on Highlighting

Hi, Have you tried: http://wiki.apache.org/solr/HighlightingParameters#head-2ca22f63cb8d1b2ba3ff0cfc05e85b94898c59cf

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
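For reference, a minimal SolrJ sketch of the parameters discussed in this thread (hedged: hl.maxAnalyzedChars caps how much of each stored field the highlighter analyzes, not how much of the field Solr reads from disk, which would explain why it did not cure the OOM; the server URL is hypothetical):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class HighlightQueryExample {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr"); // hypothetical URL
        SolrQuery q = new SolrQuery("lucene");
        q.setRows(20);                       // row size from the thread
        q.setHighlight(true);
        q.set("hl.fl", "teaser");
        q.set("hl.maxAnalyzedChars", "500"); // value tried above
        System.out.println(server.query(q).getHighlighting());
    }
}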
OutOfMemory on Highlighting
Hi, I am analyzing the memory usage of my Solr setup. I am testing with 500 text documents of 2 MB each. I have defined a field for displaying the teasers and am storing 1 MB of text in it. I am testing with just 128 MB maxHeap (I know I should increase it, but I am deliberately testing the worst-case scenario). If I search for all 500 documents with a row size of 500 and highlighting disabled, it works fine. But if I enable highlighting, I get an OutOfMemoryError. It looks like the stored field for every matched result is read into memory. How can I avoid this memory consumption? Thanks, Siddharth
Memory usage
Hi all, I am testing indexing with 2000 text documents of 2 MB each. These documents contain words built from random characters. I observed that the Tomcat memory usage keeps increasing slowly. I tried removing all the cache configuration, but memory usage still increases. Once memory reaches the specified max heap, commits appear to block until memory is freed. With larger documents, I see some OOMEs. Below are a few properties set in solrconfig.xml:

<mainIndex>
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>128</ramBufferSizeMB>
  <mergeFactor>25</mergeFactor>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>2147483647</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>1</commitLockTimeout>
  <lockType>single</lockType>
  <unlockOnStartup>false</unlockOnStartup>
</mainIndex>

<autoCommit>
  <maxDocs>1</maxDocs>
  <maxTime>7000</maxTime>
</autoCommit>

<useColdSearcher>false</useColdSearcher>
<maxWarmingSearchers>10</maxWarmingSearchers>

Where does the memory get used, and how can I avoid this? Thanks, Siddharth
maxBufferedDocs
I see two entries for the maxBufferedDocs property in solrconfig.xml: one in the indexDefaults tag and the other in the mainIndex tag, where it is commented as deprecated. So is this property still required and used? What if we remove the indexDefaults tag altogether? Thanks, Siddharth
Index time boost
Hi all, can we specify an index-time boost value for a particular field in schema.xml? Thanks, Siddharth
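As far as I know, index-time boosts are supplied per document at add time rather than declared in schema.xml; a minimal SolrJ sketch (hedged: the field names and URL here are hypothetical):

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class IndexTimeBoostExample {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr"); // hypothetical URL
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc1");
        // the third argument is the index-time boost for this field value
        doc.addField("title", "Solr in Action", 2.0f);
        server.add(doc);
        server.commit();
    }
}

The equivalent in the XML update format is the boost attribute on a field element.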
RE: Special character indexing
Hi Shalin, Thanks for the suggestion. I tried the following code (I am not sure about the exact usage):

CommonsHttpSolrServer ess = new CommonsHttpSolrServer("http://localhost:8983/solr");
ess.setRequestWriter(new BinaryRequestWriter());
SolrInputDocument solrdoc = new SolrInputDocument();
solrdoc.addField("id", "Kimi");
solrdoc.addField("name", "03 Kimi Räikkönen");
ess.add(solrdoc);

But I got the following exception on the server:

WARNING: The @Deprecated SolrUpdateServlet does not accept query parameters: wt=javabin
If you are using solrj, make sure to register a request handler to /update rather than use this servlet.
Add: <requestHandler name="/update" class="solr.XmlUpdateRequestHandler"> to your solrconfig.xml
Mar 20, 2009 3:14:48 PM org.apache.solr.common.SolrException log
SEVERE: Error processing legacy update command:com.ctc.wstx.exc.WstxUnexpectedCharException: Illegal character ((CTRL-CHAR, code 1)) at [row,col {unknown-source}]: [1,1]
at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:675)
at com.ctc.wstx.sr.StreamScanner.throwInvalidSpace(StreamScanner.java:660)
at com.ctc.wstx.sr.BasicStreamReader.readSpacePrimary(BasicStreamReader.java:4916)
at com.ctc.wstx.sr.BasicStreamReader.nextFromProlog(BasicStreamReader.java:2003)
at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1069)
at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:148)
at org.apache.solr.handler.XmlUpdateRequestHandler.doLegacyUpdate(XmlUpdateRequestHandler.java:393)
at org.apache.solr.servlet.SolrUpdateServlet.doPost(SolrUpdateServlet.java:78)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1098)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:295)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:723)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

Thanks in advance for the help. Siddharth

-Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Friday, March 20, 2009 10:35 AM To: solr-user@lucene.apache.org Subject: Re: Special character indexing

On Fri, Mar 20, 2009 at 10:17 AM, Gargate, Siddharth sgarg...@ptc.com wrote: I tried with Jetty but the same issue. Just a guess, but looks like the fix for SOLR-973 might have introduced this issue.

I'm not sure how SOLR-973 can cause this issue. Can you try using the BinaryRequestWriter and see if it succeeds? http://wiki.apache.org/solr/Solrj#head-ddc28af4033350481a3cbb27bc1d25bffd801af0 -- Regards, Shalin Shekhar Mangar.
FW: Special character indexing
Thanks Shalin, Adding the BinaryUpdateRequestHandler solved the issue. Thank you very much. Just one query: shouldn't XmlUpdateRequestHandler also work for these characters? I saw another user mentioning the same issue, and for him it was working with DirectXmlRequest.

-Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Friday, March 20, 2009 3:58 PM To: solr-user@lucene.apache.org Subject: Re: Special character indexing

On Fri, Mar 20, 2009 at 3:19 PM, Gargate, Siddharth sgarg...@ptc.com wrote: Hi Shalin, Thanks for the suggestion. I tried the following code ... But got the following exception on the server: WARNING: The @Deprecated SolrUpdateServlet does not accept query parameters: wt=javabin ...

Yes, you need to add the following to your solrconfig.xml:

<requestHandler name="/update/javabin" class="solr.BinaryUpdateRequestHandler" />

-- Regards, Shalin Shekhar Mangar.
RE: Special character indexing
I tried with Jetty, but the same issue occurs. Just a guess, but it looks like the fix for SOLR-973 might have introduced this issue. Thanks, Siddharth

-Original Message- From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] Sent: Friday, March 20, 2009 6:22 AM To: solr-user@lucene.apache.org Subject: Re: Special character indexing

Gargate, Siddharth wrote: Hi all, I am trying to index words containing special characters like 'Räikkönen'. ...

Can you use Jetty and index 'Räikkönen' via CommonsHttpSolrServer? If the problem is gone, something is missing in the config of Tomcat... Koji
Special character indexing
Hi all, I am trying to index words containing special characters like 'Räikkönen'. With EmbeddedSolrServer, indexing works fine, but if I use CommonsHttpSolrServer, garbage values get indexed. I am using Solr 1.4 and have set URIEncoding to UTF-8 in Tomcat. Is this a known issue, or am I doing something wrong? Thanks, Siddharth
Phrase slop / Proximity search
Can I set a phrase slop value on the standard request handler? I want it to be configurable in the solrconfig.xml file. Thanks, Siddharth
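For what it's worth: with the standard (Lucene) query parser, slop is written inline in the query itself, while the dismax handler exposes qs/ps parameters that can be placed in a handler's defaults in solrconfig.xml. A hedged SolrJ sketch of both (the query text and field name are made up):

import org.apache.solr.client.solrj.SolrQuery;

public class PhraseSlopExample {
    public static void main(String[] args) {
        // Standard parser: slop is part of the query syntax
        SolrQuery standard = new SolrQuery("title:\"apache solr\"~2");

        // Dismax parser: slop comes from request parameters, which can also
        // be set as defaults in the handler's configuration in solrconfig.xml
        SolrQuery dismax = new SolrQuery("apache solr");
        dismax.set("defType", "dismax");
        dismax.set("qs", "2"); // slop for explicit phrase queries from the user
        dismax.set("ps", "3"); // slop for the implicit whole-query phrase boost
        System.out.println(standard + "\n" + dismax);
    }
}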
RE: Distributed search
Hi, I am trying distributed search with multicore but am not able to fire a query. I tried http://localhost:8080/solr/select/?shards=localhost:8080/solr/core0,localhost:8080/solr/core1&q=solr and I am getting the following error: Missing solr core name in path. Should I address a particular core to fire the distributed search?

-Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Tuesday, February 17, 2009 9:08 AM To: solr-user@lucene.apache.org Subject: Re: Distributed search

Hi, That should work, yes, though it may not be a wise thing to do performance-wise if the number of CPU cores that Solr server has is lower than the number of Solr cores. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

From: revathy arun revas...@gmail.com To: solr-user@lucene.apache.org Sent: Monday, February 16, 2009 8:18:36 PM Subject: Distributed search

Hi, Can we use multicore to have several indexes per webapp and use distributed search to merge the indexes? For example, if we have 3 cores (core0, core1 and core2) for 3 different languages, can we search across all 3 indexes using the shards parameter, as shards=localhost:8080/solr/core0,localhost:8080/solr/core1,localhost:8080/solr/core2 Regards Sujatha
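In a multicore setup the request itself has to be addressed to one concrete core (which is what the Missing solr core name in path error points at), with the shard list passed as a parameter. A minimal SolrJ sketch, assuming the core names above:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class DistributedSearchExample {
    public static void main(String[] args) throws Exception {
        // Send the request to a real core, not the bare /solr path
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8080/solr/core0");
        SolrQuery q = new SolrQuery("solr");
        q.set("shards", "localhost:8080/solr/core0,localhost:8080/solr/core1");
        System.out.println(server.query(q).getResults().getNumFound());
    }
}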
multicore file path
I am trying out a multicore environment with a single schema and solrconfig file. Below is the folder structure:

Solr/
  conf/
    schema.xml
    solrconfig.xml
  core0/
    data/
  core1/
    data/
  tomcat/

The solr home property is set in Tomcat as -Dsolr.solr.home=../.. and the solr.xml file is:

<solr persistent='true'>
  <cores adminPath='/admin/cores'>
    <core name='core0' instanceDir='core0/' config='../../conf/solrconfig.xml' schema='../../conf/schema.xml'/>
    <core name='core1' instanceDir='core1/' config='../../conf/solrconfig.xml' schema='../../conf/schema.xml'/>
  </cores>
</solr>

Everything is working fine, but when I try to access the schema file from the admin UI I get the following error:

http://localhost:8080/solr/core0/admin/file/?file=../../conf/schema.xml
HTTP Status 403 - Invalid path: ../../conf/schema.xml
description Access to the specified resource (Invalid path: ../../conf/schema.xml) has been forbidden.
Ignoring Whitespace
After parsing HTML documents, Tika adds whitespace (newlines and tabs), and this content gets stored as-is in Solr. When I fetch the teasers, they contain this additional whitespace. Where should I remove it: in Tika, in Solr, or explicitly in my own code? Thanks, Siddharth
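One simple option is to normalize the text on the client before it is sent to Solr; a minimal sketch (assuming the whitespace carries no meaning worth keeping in the teaser):

public class WhitespaceNormalizer {
    // Collapse every run of whitespace (newlines, tabs, spaces) to one space
    public static String normalize(String text) {
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(normalize(" line one\n\n\tline   two \n"));
        // prints: line one line two
    }
}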
RE: OutOfMemory error for large files
Otis, I haven't tried it yet, but what I meant is: if we divide the content into multiple parts, words will be split across two different Solr documents. If the main document contains 'Hello World', these two words might get indexed in two different documents, and searching for 'Hello World' won't give me the required result unless I use OR in the query. Thanks, Siddharth

-Original Message- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Tuesday, February 17, 2009 9:58 AM To: solr-user@lucene.apache.org Subject: Re: OutOfMemory error for large files

Siddharth, At the end of your email you said: One option I see is to break the file in chunks, but with this, I won't be able to search with multiple words if they are distributed in different documents. Unless I'm missing something unusual about your application, I don't think the above is technically correct. Have you tried doing this and have you then tried your searches? Everything should still work, even if you index one document at a time. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

From: Gargate, Siddharth sgarg...@ptc.com To: solr-user@lucene.apache.org Sent: Monday, February 16, 2009 2:00:58 PM Subject: OutOfMemory error for large files

I am trying to index an around 150 MB text file with a 1024 MB max heap, but I get an OutOfMemory error in the SolrJ code:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2882)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:572)
at java.lang.StringBuffer.append(StringBuffer.java:320)
at java.io.StringWriter.write(StringWriter.java:60)
at org.apache.solr.common.util.XML.escape(XML.java:206)
at org.apache.solr.common.util.XML.escapeCharData(XML.java:79)
at org.apache.solr.common.util.XML.writeXML(XML.java:149)
at org.apache.solr.client.solrj.util.ClientUtils.writeXML(ClientUtils.java:115)
at org.apache.solr.client.solrj.request.UpdateRequest.writeXML(UpdateRequest.java:200)
at org.apache.solr.client.solrj.request.UpdateRequest.getXML(UpdateRequest.java:178)
at org.apache.solr.client.solrj.request.UpdateRequest.getContentStreams(UpdateRequest.java:173)
at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:136)
at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:243)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)

I modified the UpdateRequest class to initialize the StringWriter object in UpdateRequest.getXML with an initial size, and cleared the SolrInputDocument that holds the reference to the file text. Then I got an OOM as below:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2786)
at java.lang.StringCoding.safeTrim(StringCoding.java:64)
at java.lang.StringCoding.access$300(StringCoding.java:34)
at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:251)
at java.lang.StringCoding.encode(StringCoding.java:272)
at java.lang.String.getBytes(String.java:947)
at org.apache.solr.common.util.ContentStreamBase$StringStream.getStream(ContentStreamBase.java:142)
at org.apache.solr.common.util.ContentStreamBase$StringStream.getReader(ContentStreamBase.java:154)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:61)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:249)
at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:63)

After I increased the heap up to 1250 MB, I got an OOM as:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:216)
at java.lang.StringBuffer.toString(StringBuffer.java:585)
at com.ctc.wstx.util.TextBuffer.contentsAsString(TextBuffer.java:403)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:821)
at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:276)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131
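On the chunking concern above: one common workaround (a sketch, not something proposed in this thread) is to split the text into overlapping chunks, so that a phrase crossing a chunk boundary still occurs intact in at least one chunk:

import java.util.ArrayList;
import java.util.List;

public class OverlappingChunker {
    // Split text into chunks of chunkSize characters where each chunk repeats
    // the last overlap characters of the previous one; any phrase shorter than
    // overlap appears whole in at least one chunk, so phrase queries still match.
    public static List<String> chunk(String text, int chunkSize, int overlap) {
        List<String> chunks = new ArrayList<String>();
        int step = chunkSize - overlap; // requires chunkSize > overlap
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + chunkSize, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break;
            }
        }
        return chunks;
    }
}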
Store limited text
Hi all, is it possible to store only limited text in a field, say at most 1 MB? The maxFieldLength setting limits only the number of tokens that get indexed; the complete content is still stored. Thanks, Siddharth
Maximum size of document indexed
Hi, I am trying to index a 25 MB Word document, but I am not able to search for all of its keywords. It looks like only a certain number of the initial words get indexed. Is there any limit on the size of a document being indexed, or a word-count limit per field? Thanks, Siddharth
CommonsHttpSolrServer in multithreaded env
Hi all, is it safe to use a single instance of CommonsHttpSolrServer in a multithreaded environment? I have multiple threads accessing a single static CommonsHttpSolrServer object, but sometimes the application gets blocked. The following stack trace is printed for all threads:

indexthread1 Id=47 prio=5 RUNNABLE (in native) Blocked (cnt): 5853; Waited (cnt): 30
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
- locked java.io.bufferedinputstr...@147d387
at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:335)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:183)
at org.apache.solr.client.solrj.request.UpdateRequest.process(UpdateRequest.java:217)
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:85)
at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:74)
at wt.index.SolrIndexDelegate.index(SolrIndexDelegate.java:84)
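CommonsHttpSolrServer is generally treated as thread-safe (internally it uses HttpClient with a MultiThreadedHttpConnectionManager), but that manager's default per-host connection limit is small, so many threads can end up waiting on a few sockets. A hedged sketch of raising the limits (the URL and numbers are arbitrary):

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class SharedServerSetup {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr"); // hypothetical URL
        // Let more threads talk to the same host concurrently
        server.setDefaultMaxConnectionsPerHost(32);
        server.setMaxTotalConnections(128);
        // Fail fast instead of blocking indefinitely on a stuck socket
        server.setConnectionTimeout(5000); // ms
        server.setSoTimeout(30000);        // ms, socket read timeout
    }
}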
Ensuring documents indexed by autocommit
Hi all, I am using CommonsHttpSolrServer to add documents to Solr. Instead of explicitly calling commit for every document, I have configured autocommit in solrconfig.xml. But how do we ensure that an added document has been successfully indexed/committed on the Solr side? Is there any callback mechanism through which a method in my application would get called? I looked at the postCommit listener in solrconfig.xml, but it appears to support only the execution of external executables. Thanks in advance, Siddharth
RE: Ensuring documents indexed by autocommit
Thanks Shalin for the reply. I am working with a remote Solr server. I am using autocommit instead of calling commit because I observed a significant performance improvement with autocommit. I just wanted to make sure that callback functionality is currently not available in Solr. Thanks, Siddharth

-Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Friday, January 09, 2009 3:16 PM To: solr-user@lucene.apache.org Subject: Re: Ensuring documents indexed by autocommit

On Fri, Jan 9, 2009 at 3:03 PM, Gargate, Siddharth sgarg...@ptc.com wrote: Hi all, I am using CommonsHttpSolrServer to add documents to Solr. ... Is there any callback mechanism available where the callback method in my application will get called? ...

Are you using embedded Solr, or is it on a remote machine? A callback would only work in the same JVM anyway. You can always call commit through CommonsHttpSolrServer and then do a query to check whether the document you expect got indexed. Though, if all the add and commit calls were successful (i.e. returned HTTP 200), it is very unlikely that the document won't be indexed. -- Regards, Shalin Shekhar Mangar.
RE: Ensuring documents indexed by autocommit
How do we set the maxDocs or maxTime for commit from the application? Thanks, Siddharth

-Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Friday, January 09, 2009 4:34 PM To: solr-user@lucene.apache.org Subject: Re: Ensuring documents indexed by autocommit

On Fri, Jan 9, 2009 at 4:20 PM, Gargate, Siddharth sgarg...@ptc.com wrote: Thanks Shalin for the reply. I am working with a remote Solr server. ...

You can provide your own implementation of SolrEventListener to do a callback to your application in any way you need. I don't think autoCommit gives a performance advantage over normal commits. Calling commit after each document is not a good idea, since commit is an expensive operation. The only reason you are seeing better performance with autoCommit is that it commits only after 'X' documents or minutes. This is something you can do from your application as well. -- Regards, Shalin Shekhar Mangar.
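A minimal sketch of such a listener (hedged: it runs inside Solr's JVM and is registered with a listener element for the postCommit event in solrconfig.xml, so notifying a remote indexing application still requires your own transport, e.g. an HTTP ping):

import org.apache.solr.common.util.NamedList;
import org.apache.solr.core.SolrEventListener;
import org.apache.solr.search.SolrIndexSearcher;

public class CommitNotifier implements SolrEventListener {
    public void init(NamedList args) {
        // read any listener configuration supplied in solrconfig.xml
    }

    public void postCommit() {
        // called after each commit (including autocommits); notify the
        // indexing application here -- transport not shown
        System.out.println("commit finished");
    }

    public void newSearcher(SolrIndexSearcher newSearcher,
                            SolrIndexSearcher currentSearcher) {
        // not needed for commit notification
    }
}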
RE: Ensuring documents indexed by autocommit
Sorry for the previous question. What I meant was whether we can set that configuration from the code. But what you are suggesting is that I should call commit only after some amount of time or after a certain number of documents, right?
RE: Ensuring documents indexed by autocommit
Thanks again for your inputs. But then I am still stuck on the question of how we ensure that a document got successfully indexed. One option I see is to search for every document sent to Solr. Or do we assume that autocommit always indexes all the documents successfully? Thanks, Siddharth

-Original Message- From: Shalin Shekhar Mangar [mailto:shalinman...@gmail.com] Sent: Friday, January 09, 2009 5:08 PM To: solr-user@lucene.apache.org Subject: Re: Ensuring documents indexed by autocommit

On Fri, Jan 9, 2009 at 5:00 PM, Alexander Ramos Jardim alexander.ramos.jar...@gmail.com wrote: Shalin, just remember that since he is indexing more documents than he has memory available, it is a good thing to have autocommit set.

Yes, sorry, I had assumed that he has enough memory on the Solr server. If not, then autoCommit may improve performance. Thanks for pointing this out, Alexander. -- Regards, Shalin Shekhar Mangar.