Re: Solr4.0 causes NoClassDefFoundError while indexing class files and mp4 files.

2012-12-20 Thread Shigeki Kobayashi
Thanks Abe-san!

Your advice is very informative.

Thanks again.


Regards,

Shigeki


2012/12/21 Shinichiro Abe 

> You can place the missing JAR files in the contrib/extraction/lib directory.
>
> For class files: asm-x.x.jar
> For mp4 files: aspectjrt-x.x.jar
>
> FWIW, please see https://issues.apache.org/jira/browse/SOLR-4209
>
> Regards,
> Shinichiro Abe
>
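A NoClassDefFoundError like the one quoted below means the class was not visible to the class loader at run time, so one way to confirm the JARs are in place is to try loading a class from each. The following is only a minimal sketch: the class name `ExtractionJarCheck` is hypothetical, `org.aspectj.lang.Signature` is taken from the stack trace, and `org.objectweb.asm.ClassReader` is an assumed representative class from the ASM JAR. It reports on the classpath it is run with, so it is only meaningful when that classpath includes the same JARs Solr loads.

```java
// Hypothetical helper, not part of Solr or Tika.
final class ExtractionJarCheck {

    // Returns true if the named class can be found by this class's loader.
    // The class is located but not initialized.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ExtractionJarCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // The class was found but one of its own dependencies is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "org.aspectj.lang.Signature",     // from the mp4 stack trace
            "org.objectweb.asm.ClassReader"   // assumption: a class shipped in the ASM JAR
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "MISSING"));
        }
    }
}
```

Run it with the contents of contrib/extraction/lib on the classpath; any line reporting MISSING suggests a JAR is still absent.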
> On 2012/12/21, at 15:08, Shigeki Kobayashi wrote:
>
> > Hi,
> >
> > I use ManifoldCF 1.1dev to crawl files and index them into Solr 4.0.
> >
> > While indexing class files and mp4 files, Solr threw the following
> > NoClassDefFoundError:
> >
> >>> Indexing an mp4 file
> >
> > 2012-12-19 06:16:48,485%P[solr.servlet.SolrDispatchFilter]-[TP-Processor44]-:null:java.lang.RuntimeException: java.lang.NoClassDefFoundError: org/aspectj/lang/Signature
> >     at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:469)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:297)
> >     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> >     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> >     at filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)
> >     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> >     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> >     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> >     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> >     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> >     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
> >     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> >     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
> >     at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:190)
> >     at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:291)
> >     at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:774)
> >     at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:703)
> >     at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:896)
> >     at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
> >     at java.lang.Thread.run(Thread.java:662)
> > Caused by: java.lang.NoClassDefFoundError: org/aspectj/lang/Signature
> >     at org.apache.tika.parser.mp4.MP4Parser.parse(MP4Parser.java:117)
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> >     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> >     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
> >     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
> >     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> >     at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
> >     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
> >     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
> >     ... 18 more
> > Caused by: java.lang.ClassNotFoundException: org.aspectj.lang.Signature
> >     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> >     at java.net.FactoryURLClassLoader.loadClass(URLClass

IOFileUploadException(Too many open files) occurs while indexing using ExtractingRequestHandler

2012-11-29 Thread Shigeki Kobayashi
Hello everyone

I use ManifoldCF (File Crawler) to crawl files and index their contents into
Solr 3.6.
ManifoldCF uses ExtractingRequestHandler to extract content from the files.
For some reason an IOFileUploadException occurs, reporting that there are too
many open files.

Does Solr open many temporary files under /var/tmp/? Are there any cases
in which those files remain open?

Also, after the IOFileUploadException occurs, LockObtainFailedException tends
to occur frequently. Do you think this is related to the IOFileUploadException?
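To see how close a JVM is to its file-descriptor limit, the JDK exposes the open and maximum descriptor counts on Unix-like systems through `com.sun.management.UnixOperatingSystemMXBean`. The sketch below is illustrative only: the class name `OpenFileCheck` and the `usageRatio` helper are hypothetical, and the program reports on the JVM it runs in. To observe Solr itself, the same bean would have to be read inside the Tomcat JVM or over JMX.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

// Hypothetical helper, not part of Solr.
final class OpenFileCheck {

    // Fraction of the descriptor limit currently in use.
    static double usageRatio(long open, long max) {
        if (max <= 0) {
            throw new IllegalArgumentException("max must be positive");
        }
        return (double) open / max;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            long open = unix.getOpenFileDescriptorCount();
            long max = unix.getMaxFileDescriptorCount();
            System.out.printf("open=%d max=%d usage=%.1f%%%n",
                    open, max, 100.0 * usageRatio(open, max));
        } else {
            System.out.println("File descriptor counts are not available on this platform.");
        }
    }
}
```

If the open count climbs steadily toward the maximum during a crawl, that would point to descriptors not being released rather than to a limit that is simply too low.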


2012/11/30 04:11:19 ERROR[solr.servlet.SolrDispatchFilter]-[TP-Processor1962]-:org.apache.commons.fileupload.FileUploadBase$IOFileUploadException: Processing of multipart/form-data request failed. /var/tmp/upload_4f3502de_13b4ac3d1f6__8000_24519177.tmp (Too many open files)
    at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:367)
    at org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
    at org.apache.solr.servlet.MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:344)
    at org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:397)
    at org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:115)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:190)
    at org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:291)
    at org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:774)
    at org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:703)
    at org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:896)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: /var/tmp/upload_4f3502de_13b4ac3d1f6__8000_24519177.tmp (Too many open files)
    at java.io.FileOutputStream.open(Native Method)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:145)
    at org.apache.commons.io.output.DeferredFileOutputStream.thresholdReached(DeferredFileOutputStream.java:181)
    at org.apache.commons.io.output.ThresholdingOutputStream.checkThreshold(ThresholdingOutputStream.java:226)
    at org.apache.commons.io.output.ThresholdingOutputStream.write(ThresholdingOutputStream.java:130)
    at org.apache.commons.fileupload.util.Streams.copy(Streams.java:101)
    at org.apache.commons.fileupload.util.Streams.copy(Streams.java:64)
    at org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:362)
    ... 23 more






2012/11/30 06:11:08 ERROR[solr.servlet.SolrDispatchFilter]-[TP-Processor1940]-:org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/usr/local/solr/data/index/write.lock
    at org.apache.lucene.store.Lock.obtain(Lock.java:84)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1098)
    at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:84)
    at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:101)
    at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:171)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:219)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
    at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:141)
    at org.apache.solr.handler.extraction.Extrac

Re: ExtractingRequestHandler causes Out of Memory Error

2012-09-27 Thread Shigeki Kobayashi
Hi Jan.

Thank you very much for your advice.

So I understand that Solr needs more memory to parse these files.
If parsing a file of size x needs double the memory (2x), how much
heap should I allocate? 8x? 16x?

Regards,


Shigeki
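One rough way to reason about the heap question is to model the buffer growth visible in the stack trace (StringBuilder.append, then expandCapacity, then Arrays.copyOf). The sketch below rests on several assumptions that should be stated loudly: the extracted text is held in a single StringBuilder starting at the default capacity of 16, each character occupies 2 bytes, the buffer grows by the Java 6 rule of (capacity + 1) * 2, and the old and new arrays are both live during the copy. The class name `HeapEstimate` is hypothetical, and everything else on the heap (the upload itself, Tika, the index writer) is ignored, so this is a lower bound rather than a sizing rule.

```java
// Hypothetical estimator, not part of Solr or Tika.
final class HeapEstimate {

    static final int INITIAL_CAPACITY = 16; // default StringBuilder capacity
    static final int BYTES_PER_CHAR = 2;    // UTF-16 char, as in Java 6

    // Peak bytes held by the old and new char[] while the buffer grows
    // far enough to hold totalChars characters.
    static long peakCharArrayBytes(long totalChars) {
        long capacity = INITIAL_CAPACITY;
        long peak = capacity * BYTES_PER_CHAR;
        while (capacity < totalChars) {
            long grown = (capacity + 1) * 2; // assumed Java 6 growth rule
            // During the copy, both the old and the new array are reachable.
            peak = Math.max(peak, (capacity + grown) * BYTES_PER_CHAR);
            capacity = grown;
        }
        return peak;
    }

    public static void main(String[] args) {
        // Assumption: a 240 MB file in a single-byte encoding, one char per byte.
        long chars = 240L * 1024 * 1024;
        long peak = peakCharArrayBytes(chars);
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("estimated peak for text buffer: %d MB%n", peak / (1024 * 1024));
        System.out.printf("max heap of this JVM:           %d MB%n", maxHeap / (1024 * 1024));
    }
}
```

Under these assumptions the 240 MB file peaks at a little under 864 MB for the text buffer alone, which leaves very little of a 1024 MB heap for anything else. That fits the suggestion to raise -Xmx and measure, rather than to rely on a fixed multiple of the file size.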

2012/9/28 Jan Høydahl 

> Please try to increase -Xmx and see how much RAM you need for it to
> succeed.
>
> I believe it is simply a case where this particular file needs double
> memory (480 MB) to parse and you have only allocated 1 GB (which is not
> particularly much). Perhaps the code could be optimized to avoid the
> Arrays.copyOf() call.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> 27. sep. 2012 kl. 11:22 skrev Shigeki Kobayashi <
> shigeki.kobayas...@g.softbank.co.jp>:
>
> > Hi guys,
> >
> >
> > I use ManifoldCF to crawl files on a Windows file server and index them into
> > Solr using ExtractingRequestHandler.
> > Most of the documents are successfully indexed, but some fail with an
> > OutOfMemoryError in Solr, so I need some advice.
> >
> > The files that failed are not especially large: a CSV file of 240 MB and a
> > text file of 170 MB.
> >
> > Here are the environment and machine specs:
> > Solr 3.6 (also Solr4.0Beta)
> > Tomcat 6.0
> > CentOS 5.6
> > java version 1.6.0_23
> > HDD 60GB
> > MEM 2GB
> > JVM Heap: -Xmx1024m -Xms1024m
> >
> > I feel there is enough memory for Solr to extract and index the
> > file content.
> >
> > Here is the Solr log:
> > --
> >
> > [solr.servlet.SolrDispatchFilter]-[http-8080-8]-:java.lang.OutOfMemoryError: Java heap space
> >     at java.util.Arrays.copyOf(Arrays.java:2882)
> >     at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
> >     at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
> >     at java.lang.StringBuilder.append(StringBuilder.java:189)
> >     at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:293)
> >     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
> >     at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
> >     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
> >     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
> >     at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
> >     at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
> >     at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
> >     at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
> >     at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
> >     at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:268)
> >     at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:134)
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> >     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> >     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
> >     at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
> >     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
> >     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
> >     at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
> >     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
> >     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
> >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
> >     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> >     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> >     at filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)
> >     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> >     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> >     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> >
> > -
> >
> > Does anyone have any ideas?
> >
> > Regards,
> >
> > Shigeki
>
>


ExtractingRequestHandler causes Out of Memory Error

2012-09-27 Thread Shigeki Kobayashi
Hi guys,


I use ManifoldCF to crawl files on a Windows file server and index them into
Solr using ExtractingRequestHandler.
Most of the documents are successfully indexed, but some fail with an
OutOfMemoryError in Solr, so I need some advice.

The files that failed are not especially large: a CSV file of 240 MB and a
text file of 170 MB.

Here are the environment and machine specs:
Solr 3.6 (also Solr4.0Beta)
Tomcat 6.0
CentOS 5.6
java version 1.6.0_23
HDD 60GB
MEM 2GB
JVM Heap: -Xmx1024m -Xms1024m

I feel there is enough memory for Solr to extract and index the
file content.

Here is the Solr log:
--
[solr.servlet.SolrDispatchFilter]-[http-8080-8]-:java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2882)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
    at java.lang.StringBuilder.append(StringBuilder.java:189)
    at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:293)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
    at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
    at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
    at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:268)
    at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:134)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)

-

Does anyone have any ideas?

Regards,

Shigeki