to reduce indexing time
Before indexing, this was the memory layout:

System memory: 63.2% (2.21 GB)
JVM memory: 8.3% (81.60 MB of 981.38 MB)

I have indexed 700 documents of total size 12 MB. These are the results I get:

QTime: 8122
System time: 00:00:12.7318648
System memory: 65.4% (2.29 GB)
JVM memory: 15.3% (148.32 MB of 981.38 MB)

After indexing 7,000 documents:

QTime: 51817
System time: 00:01:12.6028320
System memory: 69.4% (2.43 GB)
JVM memory: 26.5% (266.60 MB)

After indexing 70,000 documents of 1200 MB size, these are the results:

QTime: 511447
System time: 00:11:14.0398768
System memory: 82.7% (2.89 GB)
JVM memory: 11.8% (118.46 MB)

Here the JVM usage decreases compared to 7,000 docs. Why is that?

This is solrconfig.xml:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.document.log.dir:}</str>
  </updateLog>
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
  <autoCommit>
    <maxTime>60</maxTime>
    <openSearcher>true</openSearcher>
  </autoCommit>
</updateHandler>

I am indexing through SolrNet, adding each document individually:

var res = solr.Add(doc); // Doc doc = new Doc();

How do I reduce the time for indexing, given that the amount of data indexed is quite small? Will batch indexing reduce the indexing time? If so, do I need to make changes in solrconfig.xml? Also, I want the documents to be searchable within 1 second of indexing. Is it true that if a soft commit is done, then faceting cannot be done on the data?

-- View this message in context: http://lucene.472066.n3.nabble.com/to-reduce-indexing-time-tp4121391.html Sent from the Solr - User mailing list archive at Nabble.com.
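[Editorial note] Batch indexing generally does reduce wall-clock time, because each individual solr.Add(doc) call is a separate HTTP round-trip; sending documents in groups (SolrNet exposes collection-accepting Add/AddRange overloads for this) amortizes that overhead, and no solrconfig.xml change is needed for batching itself. A minimal Java sketch of the client-side chunking logic; the class and the commented-out send() call are hypothetical stand-ins for the actual batch add:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchIndexer {
    // Split docs into batches of at most batchSize, preserving order.
    static <T> List<List<T>> batches(List<T> docs, int batchSize) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            out.add(docs.subList(i, Math.min(i + batchSize, docs.size())));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 700; i++) docs.add(i);
        // 700 docs in batches of 250 -> 3 round-trips instead of 700
        List<List<Integer>> b = batches(docs, 250);
        System.out.println(b.size());        // 3
        System.out.println(b.get(2).size()); // 200
        // for (List<Integer> batch : b) send(batch); // one HTTP request per batch
    }
}
```

The batch size (250 here, matching the follow-up experiment in this thread) trades memory per request against the number of round-trips.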
Re: to reduce indexing time
Now I have batch indexed, with batches of 250 documents. These were the results.

After 7,000 documents:
QTime: 46894
System time: 00:00:55.9384892
JVM memory: 24.8% (249.02 MB)

This shows quite a reduction in timing.

After 70,000 documents:
QTime: 480435
System time: 00:09:29.5206727
System memory: 82.8% (2.90 GB)
JVM memory: 82% (818.06 MB)

Here the memory usage has increased, though the timing has reduced.

After disabling soft commit and the transaction log, for 70,000 contracts:
QTime: 461331
System time: 00:09:09.7930326
JVM memory: 62.4% (623.42 MB)

Here the memory usage is less. What causes this memory usage to change, if the data being indexed is the same?
Re: to reduce indexing time
I will surely read about JVM garbage collection. Thanks a lot, all of you. But is the time required for my indexing good enough? I don't know what the ideal timings are; I think my indexing is taking too long.
Solrcloud: no registered leader found and new searcher error
I have configured SolrCloud as follows: http://lucene.472066.n3.nabble.com/file/n4117724/Untitled.png

Solr.xml:

<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores" zkClientTimeout="${zkClientTimeout:15000}" hostPort="${jetty.port:}" hostContext="solr">
    <core loadOnStartup="true" instanceDir="document\" transient="false" name="document"/>
    <core loadOnStartup="true" instanceDir="contract\" transient="false" name="contract"/>
  </cores>
</solr>

I have added all the required config for SolrCloud, as per http://wiki.apache.org/solr/SolrCloud#Required_Config

I am adding data to the core "document". Now when I try to index using SolrNet (solr.Add(doc)), I get this error:

SEVERE: org.apache.solr.common.SolrException: No registered leader was found, collection:document slice:shard2
  at org.apache.solr.common.cloud.ZkStateReader.getLeaderRetry(ZkStateReader.java:481)

and also this error:

SEVERE: null:java.lang.RuntimeException: SolrCoreState already closed
  at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:84)
  at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:520)

I guess it is because the leader is from the core "contract" and I am trying to index into the core "document"? Is there a way to change the leader, and how? How can I change the state of the shards from "gone" to "active"?

Also, when I try to query q=*:*, this is shown:

org.apache.solr.common.SolrException: Error opening new searcher
  at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1415)

I read that this searcher error comes when the number of commits is exceeded, but I did not issue a commit command, so how can the commits be exceeded?
I read that it also requires some warming settings, so I added this to solrconfig.xml, but I still get the same error:

<query>
  <listener event="newSearcher" class="solr.QuerySenderListener">
    <arr name="queries">
      <lst>
        <str name="q">solr</str>
        <str name="start">0</str>
        <str name="rows">10</str>
      </lst>
      <lst>
        <str name="q">rocks</str>
        <str name="start">0</str>
        <str name="rows">10</str>
      </lst>
    </arr>
  </listener>
  <maxWarmingSearchers>2</maxWarmingSearchers>
</query>

I have just started with SolrCloud; please tell me if I am doing anything wrong in the SolrCloud configuration. Also, I did not find good material on SolrCloud on Windows 7 with Apache Tomcat, so please suggest something for that too. Thanks a lot.
Re: Solrcloud: no registered leader found and new searcher error
How do I get them running?
Re: using extract handler: data not extracted
Yes, all 3 points are right. Let me solve the first one: there is some error at the Tika level during indexing, and for that I need to debug at the Tika level, right? But how do I do that? The Solr admin does not show package-wise logging.
Re: using extract handler: data not extracted
Through the command line (java -jar tika-app-1.4.jar -v C:\Cloud.docx), Apache Tika is able to parse .docx files. So can I use this tika-app-1.4.jar in Solr? How do I do that?
Re: using extract handler: data not extracted
Sorry for the mistake. I'm using Solr 4.2, which has Tika 1.3. So now, java -jar tika-app-1.3.jar -v C:\Coding.pdf parses the PDF document without error or message. Also, java -jar tika-app-1.4.jar -t C:\Cloud.docx shows the entire document. Which means there is no problem in Tika, right?
Re: using extract handler: data not extracted
Sorry for the mistake. I'm using Solr 4.2, which has Tika 1.3. So now, java -jar tika-app-1.3.jar -v C:\Coding.pdf parses the PDF document without error or message. Also, java -jar tika-app-1.3.jar -t C:\Coding.pdf shows the entire document. Which means there is no problem in Tika, right?
Re: using extract handler: data not extracted
I am working on Windows 7.
using extract handler: data not extracted
I need to index rich text documents. This is solrconfig.xml for the extract handler:

<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
  </lst>
</requestHandler>

My schema.xml is:

<field name="doc_id" type="uuid" indexed="true" stored="true" default="NEW" multiValued="false"/>
<field name="id" type="long" indexed="true" stored="true" required="true" multiValued="false"/>
<field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="author" type="title_text" indexed="true" stored="true" multiValued="true"/>
<field name="title" type="title_text" indexed="true" stored="true"/>
<field name="date_modified" type="date" indexed="true" stored="true" multivalued="true"/>
<field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
<dynamicField name="ignored_*" type="text" indexed="true" stored="true" multiValued="true"/>

But after indexing using this curl:

curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F "myfile=@Coding.pdf"

when queried as q=id:12, the output is:

<arr name="ignored_stream_source_info"><str>myfile</str></arr>
<arr name="ignored_stream_content_type"><str>application/octet-stream</str></arr>
<arr name="ignored_stream_size"><str>3336935</str></arr>
<arr name="ignored_stream_name"><str>Coding.pdf</str></arr>
<arr name="ignored_content_type"><str>application/pdf</str></arr>
<str name="contents"/>   (contents not shown)
<long name="_version_">1456831756526157824</long>
<str name="doc_id">8eb229e0-5f25-4d26-bba4-6cb67aab7f81</str>
</doc>

Why is that? Also, why does the date_modified field not appear?
Re: using extract handler: data not extracted
Sorry that my question was not clear. Initially, when I indexed PDF files, the data within the PDF was shown in the contents field, as follows (this is the output for the initially indexed documents):

<str name="contents">
Cloud ctured As tale in size as well as complexity. We need a cloud based system that will solve this problem. Provide interfaces to registeP CSS Client Measurements Benchmarkinse times by varying Number of documents fromnds to millions Nuervers from 1 to 5 Storage and search options as discussed abo
</str>

But for newly indexed documents, the contents field is empty. Coding.pdf is 3 MB in size, but as shown in the output, the contents of this PDF are not extracted; indexing extracts the metadata but not the contents of the file, and the contents field is empty:

<str name="contents"/>

What is the reason for this? Is it because of some missing jar?
Re: using extract handler: data not extracted
I set the log level of the extract handler to finest; now the logs are:

INFO: [document] webapp=/solr path=/update/extract params={commit=true&literal.id=12&debug=true} {add=[12 (1456944038966984704)],commit=} 0 2631
Jan 11, 2014 7:51:57 PM org.apache.solr.servlet.SolrDispatchFilter handleAdminRequest
INFO: [admin] webapp=null path=/admin/cores params={indexInfo=false&wt=json} status=0 QTime=0
Jan 11, 2014 7:51:57 PM org.apache.solr.core.SolrCore execute
INFO: [contract] webapp=/solr path=/admin/system params={wt=json} status=0 QTime=1
Jan 11, 2014 7:51:58 PM org.apache.solr.core.SolrCore execute
INFO: [document] webapp=/solr path=/admin/mbeans params={stats=true&wt=json} status=0 QTime=3

This shows no error. Also, in the curl query I have set debug=true. What is the reason?
Re: using extract handler: data not extracted
How do I set finest for the Tika package?
Re: using extract handler: data not extracted
The logging screen does not show the Tika package. Also, I searched on the net, and it seems this requires the log4j and slf4j jars; is that true? Do I need to do extra configuration for package-level logging?
Re: using extract handler: data not extracted
This is the output I get when indexing through SolrJ; I followed the link you suggested. I tried indexing a .doc file.

<response>
  <lst name="responseHeader">
    <int name="status">400</int>
    <int name="QTime">17</int>
  </lst>
  <lst name="error">
    <str name="msg">org.apache.solr.search.SyntaxError: Cannot parse 'id:C:\solr\document\src\new_index_doc\document_1.doc': Encountered ":" at line 1, column 4. Was expecting one of: EOF, AND ..., OR ..., NOT ..., "+" ..., "-" ..., BAREOPER ..., "(" ..., "*" ..., "^" ..., QUOTED ..., TERM ..., FUZZY_SLOP ..., PREFIXTERM ..., WILDTERM ..., REGEXPTERM ..., "[" ..., "{" ..., LPARAMS ..., NUMBER ...</str>
    <int name="code">400</int>
  </lst>
</response>

Also, when indexing with SolrNet, I get this error:

Caused by: java.lang.LinkageError: loader constraint violation: loader (instance of org/apache/catalina/loader/WebappClassLoader) previously initiated loading for a different type with name org/apache/xmlbeans/XmlCursor

Why this linkage error? Now curl does not work, and neither does SolrJ nor SolrNet.
to index byte array
I am converting .doc and .docx files to byte arrays in C#, and now I need to index these byte arrays of the doc files. Is it possible in Solr to index a byte array of a file?
Re: to index byte array
Indexing .docx files using Tika requires a file system path, but I don't want to give the path. I read in the DIH FAQ that by using a transformer, the output can be converted from bytes to a string.
Re: to index byte array
If you consider a client-server architecture, the documents will be sent in binary format to the server; for Solr, this binary format will be the source to index, so I need to index a byte array. Also, if I store this byte array in a DB and then index it in Solr, will the contents of the document be searchable like normal documents? (Because the contents are in binary format, will Solr match the query?)
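[Editorial note] Solr's /update/extract endpoint consumes a raw content stream rather than a file-system path, so a byte array can be posted directly; Tika then extracts text server-side, and the extracted text (not the raw bytes) is what becomes searchable. A minimal stdlib Java sketch of the client side; the class name is hypothetical and the actual HTTP upload (via SolrNet or SolrJ) is elided:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class ByteArrayUpload {
    // Read a document into a byte array, as the C# client does before sending.
    static byte[] toBytes(Path doc) throws IOException {
        return Files.readAllBytes(doc);
    }

    // Wrap the bytes as a stream; the extract handler reads a content stream,
    // so no file-system path is needed on the Solr side.
    static InputStream asStream(byte[] bytes) {
        return new ByteArrayInputStream(bytes);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("contract", ".docx");
        // .docx files begin with the ZIP magic bytes; a 4-byte stand-in here
        Files.write(tmp, new byte[] {0x50, 0x4b, 0x03, 0x04});
        byte[] payload = toBytes(tmp);
        System.out.println(payload.length); // 4
        // payload (or asStream(payload)) would be posted to /update/extract
    }
}
```

Storing the raw bytes in a DB and indexing them verbatim would not make the text searchable; the bytes must pass through extraction (Tika) first.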
indexing .docx using solrj
I am trying to index a .docx file using SolrJ. I referred to this link: http://wiki.apache.org/solr/ContentStreamUpdateRequestExample

My code is:

import java.io.File;
import java.io.IOException;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;
import org.apache.solr.client.solrj.response.QueryResponse;

public class rich_index {
    public static void main(String[] args) {
        try {
            // Solr Cell can also index MS file types (2003 and 2007 versions).
            String fileName = "C:\\solr\\document\\src\\test1\\contract.docx";
            // this will be the unique id used by Solr to index the file contents
            String solrId = "contract.docx";
            indexFilesSolrCell(fileName, solrId);
        } catch (Exception ex) {
            System.out.println(ex.toString());
        }
    }

    public static void indexFilesSolrCell(String fileName, String solrId)
            throws IOException, SolrServerException {
        String urlString = "http://localhost:8080/solr/document";
        SolrServer solr = new HttpSolrServer(urlString);
        ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
        up.addFile(new File(fileName), "text");
        up.setParam("literal.id", solrId);
        up.setParam("uprefix", "ignored_");
        up.setParam("fmap.content", "contents");
        up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        solr.request(up);
        QueryResponse rsp = solr.query(new SolrQuery("*:*"));
        System.out.println(rsp);
    }
}

These are my logs:

Dec 22, 2013 12:27:58 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [document] webapp=/solr path=/update/extract params={fmap.content=contents&waitSearcher=true&commit=true&uprefix=ignored_&literal.id=contract.docx&wt=javabin&version=2&softCommit=false} {} 0 0
Dec 22, 2013 12:27:58 AM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException:
java.lang.NoClassDefFoundError: org/apache/xml/serialize/BaseMarkupSerializer
  at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
  at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
  at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)

To resolve this, I added xerces.jar to the build path; it has the org/apache/xml/serialize/BaseMarkupSerializer class, but the error is not resolved. What is the problem?

Solrconfig:

<requestHandler name="/update/extract" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="map.Last-Modified">last_modified</str>
    <str name="fmap.content">contents</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
  </lst>
</requestHandler>

Schema:

<fields>
  <field name="doc_id" type="uuid" indexed="true" stored="true" default="NEW" multiValued="false"/>
  <field name="id" type="integer" indexed="true" stored="true" required="true" multiValued="false"/>
  <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
  <field name="author" type="title_text" indexed="true" stored="true" multiValued="true"/>
  <field name="title" type="title_text" indexed="true" stored="true"/>
  <field name="date_modified" type="date"
Re: indexing .docx using solrj
I have added that jar to the build path, but I get the same error. Why is Eclipse not recognising that jar? The logs also show this:

Caused by: java.lang.NoClassDefFoundError: org/apache/xml/serialize/BaseMarkupSerializer
  at org.apache.solr.handler.extraction.ExtractingRequestHandler.newLoader(ExtractingRequestHandler.java:117)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:63)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
  ... 16 more
Caused by: java.lang.ClassNotFoundException: org.apache.xml.serialize.BaseMarkupSerializer
  at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1688)
  at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1533)
  ... 22 more
Re: indexing .docx using solrj
The jar is already there in the lib folder of the Solr home.
Re: program termination in solrj
Before and after running the client, the stats remain the same:

class: org.apache.solr.update.DirectUpdateHandler2
version: 1.0
description: Update handler that efficiently directly updates the on-disk main lucene index
src: $URL: https://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/solr/core/src/java/org/apache/solr/update/DirectUpdateHandler2.java $
stats:
  commits: 0
  autocommits: 0
  soft autocommits: 0
  optimizes: 0
  rollbacks: 0
  expungeDeletes: 0
  docsPending: 0
  adds: 0
  deletesById: 0
  deletesByQuery: 0
  errors: 0
  cumulative_adds: 0
  cumulative_deletesById: 0
  cumulative_deletesByQuery: 0
  cumulative_errors: 0
Re: indexing .docx using solrj
Solr 4.2, Tomcat 7.0, JDK 1.7.0_45. I have created the Solr home in C:\solr, as in the Java options: -Dsolr.solr.home=C:\solr. C:\solr\lib contains the Tika jars; actually, I pasted all the jars from the Solr 4.2 dist and contrib folders into C:\solr\lib. tomcat/lib contains all the jars from the installation.
Re: program termination in solrj
Also, my default search handler has no dismax:

<requestHandler name="/select" class="solr.SearchHandler"/>
<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">20</int>
    <str name="fl">*</str>
    <str name="df">contents</str>
    <str name="version">2.1</str>
  </lst>
</requestHandler>
Re: program termination in solrj
Okay, I made a mistake: I did not refresh the stats. The stats after running the Java program are:

commits: 1
autocommits: 0
soft autocommits: 0
optimizes: 0
rollbacks: 0
expungeDeletes: 0
docsPending: 0
adds: 0
deletesById: 0
deletesByQuery: 0
errors: 0
cumulative_adds: 1
cumulative_deletesById: 0
cumulative_deletesByQuery: 0
cumulative_errors: 0
Re: indexing .docx using solrj
It is working now; I just restarted the computer. But I still don't get the reason for the error. Thank you for your efforts, though.
Re: indexing .docx using solrj
Yes, I copied all the jars from contrib/extraction to solr/lib. It is not finding the POI jar now; as mentioned in my post above, it shows a new error.
Java heap space:out of memory
I just indexed 10 docs of total size 15 MB. For some queries it works fine, but for some queries I get this error:

<response>
  <lst name="error">
    <str name="msg">java.lang.OutOfMemoryError: Java heap space</str>
    <str name="trace">
java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
  at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
  at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
  at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
  at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.OutOfMemoryError: Java heap space
    </str>
    <int name="code">500</int>
  </lst>
</response>

I have directly indexed them into Solr.

My schema.xml is:

<fields>
  <field name="doc_id" type="uuid" indexed="true" stored="true" default="NEW" multiValued="false"/>
  <field name="id" type="integer" indexed="true" stored="true" required="true" multiValued="false"/>
  <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
  <field name="author" type="title_text" indexed="true" stored="true" multiValued="true"/>
  <field name="title" type="title_text" indexed="true" stored="true"/>
  <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
  <copyfield source="id" dest="text"/>
  <dynamicField name="ignored_*" type="text" indexed="false" stored="false" multiValued="true"/>
  <field name="spelltext" type="spell" indexed="true" stored="false" multiValued="true"/>
  <copyField source="contents" dest="spelltext"/>
</fields>

I don't understand why I get this error for such a small number of docs. I haven't studied much about Solr performance details. How do I increase the heap size? I need to index a lot more data still. Thanks in advance.
Re: Java heap space:out of memory
4 GB RAM. I am running on Windows 7, with Tomcat as the web server.
Re: Java heap space:out of memory
Sorry, but I don't know how to check that.
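[Editorial note] Besides the Solr admin dashboard (used later in this thread), the maximum heap can be checked from inside any JVM with Runtime.maxMemory(); a minimal sketch, with the class name hypothetical:

```java
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reports the -Xmx ceiling the JVM will attempt to use
        System.out.printf("max heap:  %d MB%n", rt.maxMemory() / (1024 * 1024));
        // totalMemory()/freeMemory() reflect the currently allocated heap
        System.out.printf("total now: %d MB%n", rt.totalMemory() / (1024 * 1024));
        System.out.printf("free now:  %d MB%n", rt.freeMemory() / (1024 * 1024));
    }
}
```

The JVM-memory bar on the Solr dashboard reports the same numbers for the servlet container's JVM, which is the one that matters here.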
Re: Java heap space:out of memory
Okay, thanks. Here it is: max heap size: 63.56 MB (it is showing 37.2% usage, though). How do I increase that size?
Re: Java heap space:out of memory
I have set JAVA_OPTS with the value: -Xms1024M-Xmx1024M. But the dashboard still shows 64M, though now the usage is only 18%. How could that be? Yesterday it was 87%.
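[Editorial note] For Tomcat started from the startup scripts, heap options are conventionally placed in a setenv file rather than a global JAVA_OPTS environment variable; a Tomcat installed as a Windows service (as resolved later in this thread) ignores both and takes its memory pool sizes from the service configuration UI instead. A sketch of bin/setenv.sh with example values (bin\setenv.bat on Windows uses set instead of export); note the required space between the two flags:

```shell
# bin/setenv.sh -- picked up by catalina.sh at startup (example values)
export CATALINA_OPTS="-Xms512m -Xmx1024m"
```

Setting -Xms and -Xmx to the same value avoids heap resizing during heavy indexing.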
Re: Java heap space:out of memory
Yes, I did put the space, as in the image.
Re: Java heap space:out of memory
You were right: the changes made in JAVA_OPTS didn't increase the heap size. I made the changes in the Tomcat UI instead:

Initial memory pool: 512 MB
Maximum memory pool: 1024 MB

Now the heap size has increased. Thank you all for your suggestions; it really saved my time.
Re: Null pointer exception in spell checker at addchecker method
Yes, it worked, and I got the reason for the error. Thanks a lot.
Null pointer exception in spell checker at addchecker method
I'm trying to use the spell check component. My *schema* is (I have included only the fields necessary for spell check, not the entire schema):

<fields>
  <field name="doc_id" type="uuid" indexed="true" stored="true" default="NEW" multiValued="false"/>
  <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
  <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
  <field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
  <copyField source="id" dest="text"/>
  <dynamicField name="ignored_*" type="text" indexed="false" stored="false" multiValued="true"/>
  <field name="spelltext" type="spell" indexed="true" stored="false" multiValued="true"/>
  <copyField source="contents" dest="spelltext"/>
</fields>

<types>
  <fieldType name="spell" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" splitOnCaseChange="1"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.EnglishMinimalStemFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory"/>
    </analyzer>
  </fieldType>
</types>

My *solrconfig* is:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text</str>
  <lst name="spellchecker">
    <str name="name">direct</str>
    <str name="field">contents</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.8</float>
    <int name="maxEdits">1</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">3</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>
</searchComponent>

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">contents</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">direct</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck">on</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

I get this *error*:

java.lang.NullPointerException
    at org.apache.solr.spelling.*ConjunctionSolrSpellChecker.addChecker*(ConjunctionSolrSpellChecker.java:58)
    at org.apache.solr.handler.component.SpellCheckComponent.getSpellChecker(SpellCheckComponent.java:475)
    at org.apache.solr.handler.component.SpellCheckComponent.prepare(SpellCheckComponent.java:106)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:187)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:242)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)

I know the error is likely in the addChecker method. I read this method, and it is written so that default values are substituted for all null values (e.g. if (queryAnalyzer == null) queryAnalyzer = checker.getQueryAnalyzer();). So I suspect that a null checker value is passed in when /checkers.add(checker);/ is executed.
If I am right, tell me how to resolve this; otherwise, what has gone wrong? Thanks in advance.

--
View this message in context: http://lucene.472066.n3.nabble.com/Null-pointer-exception-in-spell-checker-at-addchecker-method-tp4105489.html
Sent from the Solr - User mailing list archive at Nabble.com.
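A possible reading of the trace, offered as a guess rather than a confirmed diagnosis: the /spell handler requests three dictionaries (direct, default, and wordbreak), but the config above never defines a spellchecker named "default", and the two searchComponents share the name "spellcheck", so the second definition replaces the first. Either point can leave SpellCheckComponent handing a null checker to ConjunctionSolrSpellChecker.addChecker. A sketch of a single merged component, using only the names that appear in the post:

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text</str>
  <lst name="spellchecker">
    <str name="name">direct</str>
    <str name="field">contents</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
  </lst>
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">contents</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
  </lst>
</searchComponent>
```

With this in place, the <str name="spellcheck.dictionary">default</str> line in the /spell handler's defaults would also need to be removed, since no dictionary by that name exists.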
Re: no such field error:smaller big block size details while indexing doc files
I will try using SolrJ, thanks. But when I tried to index a .docx file I got a different error:

SEVERE: null:java.lang.RuntimeException: java.lang.VerifyError: (class: org/apache/poi/extractor/ExtractorFactory, method: createExtractor signature: (Lorg/apache/poi/poifs/filesystem/DirectoryNode;)Lorg/apache/poi/POITextExtractor;) Wrong return type in function
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.VerifyError: (class: org/apache/poi/extractor/ExtractorFactory, method: createExtractor signature: (Lorg/apache/poi/poifs/filesystem/DirectoryNode;)Lorg/apache/poi/POITextExtractor;) Wrong return type in function
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:59)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
    ... 16 more

I read this solution (http://stackoverflow.com/questions/14696371/how-to-extract-the-text-of-a-ppt-file-with-tika), which says that removing certain jars solves such errors, but none of the jars it mentions are on my classpath. Could jars still be the cause of the issue? Thank you.

On Wednesday, October 9, 2013 12:54 PM, sweety shinde sweetyshind...@yahoo.com wrote:

I will try using solrJ.
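An aside on the VerifyError above: before suspecting the jars or Solr itself, it can help to confirm that the .docx is readable at all without going through Tika/POI. A .docx is just a zip archive whose body text lives in word/document.xml, so the Python standard library is enough for a sanity check. This is a sketch, not part of Solr or Tika; docx_text is a made-up helper name.

```python
# Sketch: extract the body text of a .docx without Tika/POI, as a quick
# check that the file itself is readable. A .docx is a zip archive; the
# visible text sits in <w:t> elements inside word/document.xml.
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, in ElementTree's {uri}localname form.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_text(path):
    with zipfile.ZipFile(path) as z:
        body = z.read("word/document.xml")
    root = ET.fromstring(body)
    # Concatenate every text run in document order.
    return "".join(t.text or "" for t in root.iter(W_NS + "t"))
```

If this raises on the same file that Solr Cell rejects, the document is likely corrupt; if it succeeds, the problem is more likely in the POI/Tika jars on the server.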
Re: no such field error:smaller big block size details while indexing doc files
I will try using SolrJ. Now I tried indexing .docx files and I got a different error; the logs are:

SEVERE: null:java.lang.RuntimeException: java.lang.VerifyError: (class: org/apache/poi/extractor/ExtractorFactory, method: createExtractor signature: (Lorg/apache/poi/poifs/filesystem/DirectoryNode;)Lorg/apache/poi/POITextExtractor;) Wrong return type in function
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.VerifyError: (class: org/apache/poi/extractor/ExtractorFactory, method: createExtractor signature: (Lorg/apache/poi/poifs/filesystem/DirectoryNode;)Lorg/apache/poi/POITextExtractor;) Wrong return type in function
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:59)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
    ... 16 more

But can jars cause these errors? I read one solution which said that removing a few jars from the classpath may solve the errors, but those jars are not present in my classpath. (The link to the solution: http://stackoverflow.com/questions/14696371/how-to-extract-the-text-of-a-ppt-file-with-tika) Thank you.

On Wednesday, October 9, 2013 6:05 AM, Erick Erickson [via Lucene] ml-node+s472066n4094231...@n3.nabble.com wrote:

Hmmm, that is odd, the glob dynamicField should pick this up. Not quite sure what's going on.
You can parse the file via Tika yourself and look at what's in there; it's a relatively simple SolrJ program. Here's a sample: http://searchhub.org/2012/02/14/indexing-with-solrj/

Best,
Erick

On Tue, Oct 8, 2013 at 4:15 PM, sweety [hidden email] wrote:

This is my new schema.xml:

<schema name="documents">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
    <field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="comments" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="keywords" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="revision_number" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
    <dynamicField name="ignored_*" type="string" indexed="false" stored="true" multiValued="true"/>
    <dynamicField name="*" type="ignored" multiValued="true"/>
    <copyField source="id" dest="text"/>
    <copyField source="author" dest="text"/>
  </fields>
  <types>
    <fieldType name="ignored" stored="false" indexed="false" class="solr.StrField"/>
    <fieldType name="integer" class="solr.IntField"/>
    <fieldType name="long" class="solr.LongField"/>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="text" class="solr.TextField"/>
  </types>
</schema>
Re: no such field error:smaller big block size details while indexing doc files
This is my new schema.xml:

<schema name="documents">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
    <field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="comments" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="keywords" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="revision_number" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
    <dynamicField name="ignored_*" type="string" indexed="false" stored="true" multiValued="true"/>
    <dynamicField name="*" type="ignored" multiValued="true"/>
    <copyField source="id" dest="text"/>
    <copyField source="author" dest="text"/>
  </fields>
  <types>
    <fieldType name="ignored" stored="false" indexed="false" class="solr.StrField"/>
    <fieldType name="integer" class="solr.IntField"/>
    <fieldType name="long" class="solr.LongField"/>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="text" class="solr.TextField"/>
  </types>
  <uniqueKey>id</uniqueKey>
</schema>

I still get the same error.

From: Erick Erickson [via Lucene] ml-node+s472066n4094013...@n3.nabble.com
To: sweety sweetyshind...@yahoo.com
Sent: Tuesday, October 8, 2013 7:16 AM
Subject: Re: no such field error:smaller big block size details while indexing doc files

Well, one of the attributes parsed out of, probably, the meta-information associated with one of your structured docs is SMALLER_BIG_BLOCK_SIZE_DETAILS, and Solr Cell is faithfully sending that to your index. If you want to throw all these in the bit bucket, try defining a true catch-all field that ignores things, like this.
<dynamicField name="*" type="ignored" multiValued="true"/>

Best,
Erick

On Mon, Oct 7, 2013 at 8:03 AM, sweety [hidden email] wrote:

I'm trying to index .doc, .docx, and .pdf files, using this URL:

curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F "myfile=@complex.doc"

This is the error I get:

Oct 07, 2013 5:02:18 PM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException: java.lang.NoSuchFieldError: SMALLER_BIG_BLOCK_SIZE_DETAILS
no such field error:smaller big block size details while indexing doc files
I'm trying to index .doc, .docx, and .pdf files, using this URL:

curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F "myfile=@complex.doc"

This is the error I get:

Oct 07, 2013 5:02:18 PM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException: java.lang.NoSuchFieldError: SMALLER_BIG_BLOCK_SIZE_DETAILS
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NoSuchFieldError: SMALLER_BIG_BLOCK_SIZE_DETAILS
    at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:93)
    at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:190)
    at org.apache.poi.poifs.filesystem.NPOIFSFileSystem.<init>(NPOIFSFileSystem.java:184)
    at org.apache.tika.parser.microsoft.POIFSContainerDetector.getTopLevelNames(POIFSContainerDetector.java:376)
    at org.apache.tika.parser.microsoft.POIFSContainerDetector.detect(POIFSContainerDetector.java:165)
    at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:113)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
    ... 16 more

Also, using the same type of URL, .txt, .mp3, and .pdf files are indexed successfully.
(curl "http://localhost:8080/solr/document/update/extract?literal.id=12&commit=true" -F "myfile=@abc.txt")

Schema.xml is:

<schema name="documents">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
    <field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="comments" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="keywords" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="revision_number" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
    <dynamicField name="ignored_*" type="string" indexed="false" stored="true" multiValued="true"/>
    <copyField source="id" dest="text"/>
    <copyField source="author" dest="text"/>
  </fields>
  <types>
    <fieldType name="integer" class="solr.IntField"/>
    <fieldType name="long" class="solr.LongField"/>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="text" class="solr.TextField"/>
    <fieldType name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField"/>
  </types>
  <uniqueKey>id</uniqueKey>
</schema>

I'm not able to understand what kind of error this is; please help me.

--
View this message in context: http://lucene.472066.n3.nabble.com/no-such-field-error-smaller-big-block-size-details-while-indexing-doc-files-tp4093883.html
Sent from the Solr - User mailing list archive at Nabble.com.
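A note on both failures in this thread: NoSuchFieldError and VerifyError raised from inside org.apache.poi classes are the classic symptoms of mixed Apache POI jar versions on the classpath (for instance, an old poi jar sitting next to a newer poi-ooxml), which is also what the linked StackOverflow answer is getting at. Scanning the server's lib directories for duplicate POI jars is a reasonable first check. A small standard-library Python sketch; poi_jars is a made-up helper, and the directory to scan depends on the Tomcat/Solr layout:

```python
# Sketch: list POI-related jars under a lib tree so that duplicate or
# mismatched versions (poi, poi-ooxml, poi-scratchpad, ...) stand out.
from pathlib import Path

def poi_jars(libdir):
    # Recursive scan; returns jar file names sorted for easy comparison.
    return sorted(p.name for p in Path(libdir).rglob("poi*.jar"))
```

If the scan over, say, the webapp's WEB-INF/lib and any shared Tomcat lib directories turns up two different version numbers for the same POI artifact, removing the older one and restarting the container would be the first thing to try.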