plx help for a weird problem
Hi, I am new to Lucene. I want to build a search engine for myself, using JBoss (bundled with Tomcat) as the server. I wrote the following code to do the indexing:

--- code snippet ---
IndexWriter writer = null;
File root = new File(path);
DictionaryMgr dm = DictionaryMgr.getInstance();
HashMap dictionary = dm.getDictionary();
try {
    String[] files = root.list();
    // Append to the index if the directory already has files; otherwise create it.
    if (files != null && files.length > 0) {
        writer = new IndexWriter(path, new ChineseAnalyzer(dictionary), false);
    } else {
        writer = new IndexWriter(path, new ChineseAnalyzer(dictionary), true);
    }
    writer.maxFieldLength = 100;
    writer.addDocument(doc);
}
--- code snippet ---

When I test the above code with JUnit there is no problem, but when the same code runs inside JBoss it fails, which is very weird. I checked the code and found that the error occurs in IndexWriter.java, at the following line:

private org.apache.lucene.index.SegmentInfos segmentInfos = new org.apache.lucene.index.SegmentInfos();

I don't think there is a coding error, so maybe the error comes from JBoss, Tomcat or something else? I have no idea, so please help me.
--- error ---
javax.servlet.ServletException: Servlet execution threw an exception
17:42:49,857 ERROR [STDERR] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:269)
17:42:49,857 ERROR [STDERR] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:193)
17:42:49,857 ERROR [STDERR] at com.sungoal.brim.PermissionMonitorFilter.doFilter(PermissionMonitorFilter.java:106)
17:42:49,857 ERROR [STDERR] at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:213)
17:42:49,857 ERROR [STDERR] at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:193)
17:42:49,867 ERROR [STDERR] at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:256)
17:42:49,867 ERROR [STDERR] at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
17:42:49,867 ERROR [STDERR] at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
17:42:49,867 ERROR [STDERR] at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
17:42:49,867 ERROR [STDERR] at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
17:42:49,867 ERROR [STDERR] at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
17:42:49,867 ERROR [STDERR] at org.apache.catalina.valves.CertificatesValve.invoke(CertificatesValve.java:246)
17:42:49,867 ERROR [STDERR] at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
17:42:49,867 ERROR [STDERR] at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
17:42:49,867 ERROR [STDERR] at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
17:42:49,877 ERROR [STDERR] at org.apache.catalina.core.StandardContext.invoke(StandardContext.java:2415)
17:42:49,877 ERROR [STDERR] at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:180)
17:42:49,877 ERROR [STDERR] at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
17:42:49,877 ERROR [STDERR] at org.apache.catalina.valves.ErrorDispatcherValve.invoke(ErrorDispatcherValve.java:171)
17:42:49,877 ERROR [STDERR] at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
17:42:49,877 ERROR [STDERR] at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:172)
17:42:49,877 ERROR [STDERR] at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
17:42:49,877 ERROR [STDERR] at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:509)
17:42:49,877 ERROR [STDERR] at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
17:42:49,887 ERROR [STDERR] at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
17:42:49,887 ERROR [STDERR] at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
17:42:49,887 ERROR [STDERR] at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:174)
17:42:49,887 ERROR [STDERR] at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
17:42:49,887 ERROR [STDERR] at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
17:42:49,887 ERROR
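The create-or-append decision in the snippet above can be isolated into a small helper, which makes it easier to test outside the container. This is only a sketch in plain Java: `IndexDirs` and `shouldCreate` are hypothetical names, not part of Lucene, and the result is meant to be passed as the third (create) argument of the `IndexWriter` constructor used in the snippet.

```java
import java.io.File;

/** Hypothetical helper (not part of Lucene): decides whether a new index must be created. */
final class IndexDirs {
    private IndexDirs() {}

    /** True when the listing is null (path missing or not a directory) or empty. */
    static boolean shouldCreate(String[] files) {
        return files == null || files.length == 0;
    }

    /** Convenience overload that lists the directory itself. */
    static boolean shouldCreate(File dir) {
        return shouldCreate(dir.list());
    }
}
```

With this in place the two constructor calls collapse into one, for example `new IndexWriter(path, new ChineseAnalyzer(dictionary), IndexDirs.shouldCreate(root))`. Note that any stray non-index file in the directory makes it look like an existing index, so the check is only as good as the assumption that the directory holds nothing but index files.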
Re: plx help for a weird problem
Hello, I can't see any mention of Lucene in the exception trace. Have you seen this: http://ejindex.sourceforge.net/ ? Otis

--- Slide Tao [EMAIL PROTECTED] wrote: hi, I am new to lucene. I want to build a search engine for myself, and use jboss(bundled with tomcat) as server. [...]
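Since the posted trace shows only container frames, the underlying failure is probably wrapped inside the ServletException. A minimal sketch for digging it out follows; `Causes` and `rootCause` are hypothetical names, and the code assumes the wrapped exception is reachable through `getCause()`. With older Servlet API versions it may only be exposed through `ServletException.getRootCause()`, in which case that method would have to be called first.

```java
/** Hypothetical helper: walks getCause() links to the innermost exception. */
final class Causes {
    private Causes() {}

    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Stop when there is no further cause, or when an exception names itself as its cause.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }
}
```

Calling `Causes.rootCause(e).printStackTrace()` in a catch block around the indexing call (or in the filter that sees the exception) should show whether the real problem is inside Lucene, for example a class-loading error, or somewhere else.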
Need advice: what pdf lib to use?
Hello all, I need some advice based on your experience: which PDF parser (written in Java) would you recommend? I have been playing with PDFBox-0.6.7a and I am not too satisfied with it. On certain PDFs (not well formatted, but still readable with Acrobat) it ran into an infinite loop (this I could fix in the code), and on one file it produced an out-of-memory error and killed the JVM :( (this problem I have not identified yet). On top of that, the performance was not great either: it took about 19 hours to index 13000 files (about 3.5 GB). Regards, J.
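One way to keep a single bad file from stalling a long indexing run is to time-box each extraction. The following is only a sketch under assumptions: `TimeBoxed` and `runOrNull` are hypothetical names, the `Callable` stands in for whatever PDF text-extraction call is actually used, and the lambda syntax assumes Java 8 or later. It does not protect against out-of-memory errors, which usually require running the parser in a separate process.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: runs one task with a time limit. */
final class TimeBoxed {
    private TimeBoxed() {}

    /** Returns the task's result, or null if it times out, fails, or is interrupted. */
    static <T> T runOrNull(Callable<T> task, long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "time-boxed");
            t.setDaemon(true); // a stuck task must not keep the JVM alive
            return t;
        });
        Future<T> future = pool.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; a loop that ignores interrupts keeps running in the background.
            future.cancel(true);
            return null;
        } catch (ExecutionException e) {
            return null; // the task itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return null;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A file for which `runOrNull` returns null can be logged and skipped, so the remaining files still get indexed and the problem PDFs can be collected for a bug report.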
Re: Need advice: what pdf lib to use?
Please post any PDFBox issues you notice on the PDFBox SourceForge bug list and, if possible, attach or email any problem PDFs that you encounter. There are some efforts underway to improve the speed of PDFBox; you can monitor the progress at http://sourceforge.net/tracker/index.php?func=detail&aid=1046300&group_id=78314&atid=552832

As for other suggestions, I know some people have used xpdf (open source, but not Java) to extract the text. Other Java solutions:

PDFTextStream (commercial) - Fastest PDF-to-Text Solution for Java http://snowtide.com/home/PDFTextStream/
Etymon PJ (GPL) http://www.etymon.com/

Ben
http://www.pdfbox.org

On Fri, 22 Oct 2004 [EMAIL PROTECTED] wrote: Hello all, I need a piece of advice/experience.. [...]

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: index files version and lucene 1.4
Thanks. In the end my problem seems to come from the Tomcat (5.0) and Lucene 1.4 installation. To summarize:

Through Tomcat, with the same application (Lucene 1.4), I get no hits with the 1.4 index, while I do get hits with the 1.3 index.
Without Tomcat, with the same application (Lucene 1.4), I get hits for both versions of the index files, 1.3 and 1.4.

Does anyone have an idea, please? Arno.

Aviran wrote: Lucene 1.4 changed the file format for indexes. You can access an old index using Lucene 1.4, but you can't access an index that was created with Lucene 1.4 using older versions. I suggest you rebuild your index using Lucene 1.4. Aviran http://aviran.mordos.com

-Original Message- From: arnaud gaudinat [mailto:[EMAIL PROTECTED] Sent: Thursday, October 21, 2004 12:10 PM To: Lucene Users List Subject: index files version and lucene 1.4

Hi, certainly a stupid question! I have just upgraded to 1.4. I have succeeded in accessing my 1.3 index files but not my new 1.4 index files. In fact I get no error, but no hits for the 1.4 index files. Also, I don't know if it's normal, but now I have just 3 files for my index (.cfs, deletable and segments). However, if I use Luke with the 1.4 index files, it works perfectly. Any idea? Regards, Arno.
problems deleting documents / design question
Hi, I'm creating an index from several database tables. Every item within every table has a unique id, which is saved in an id field, and the table name is saved in another field. Together they form a unique identifier within the index. When deleting or updating an item I need to retrieve it. My first idea was

indexreader.delete(new Term(id, id-value));

but this could delete several entries, as id-value may appear in several databases. My second idea was to combine the database name and the id to form a kind of unique identifier, but this does not seem to be the right way, as the problem may occur again with sub-ids within a certain table. So my question is: is it possible to identify the item to be deleted by more than one term? Thanks, Paul
RE: problems deleting documents / design question
Paul, We are doing similar stuff. We actually do create a hash of database name, table name and id to form a unique id. So far I have not had any problems with it. Cheers, Aad
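The combined-identifier approach described above can be sketched as follows. This is an illustration in plain Java, not Aad's actual code: `DocKeys`, `join` and `hash` are hypothetical names, and the choice of SHA-1 is an assumption. Each part is length-prefixed so that different combinations can never produce the same key, and further parts (such as a sub-id) can simply be appended.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: builds one unique key from several identifying parts. */
final class DocKeys {
    private DocKeys() {}

    /** Length-prefixes each part so ("ab", "c") and ("a", "bc") give different keys. */
    static String join(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part.length()).append(':').append(part);
        }
        return sb.toString();
    }

    /** Fixed-length variant: hex-encoded SHA-1 of the joined key. */
    static String hash(String... parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(join(parts).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }
}
```

The resulting key would be stored in its own untokenized field when the document is indexed (the field name, say "uid", is an assumption), and the same key is then used in a single `indexreader.delete(new Term("uid", key))` call, so one term is enough to identify exactly one document.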
Re: index files version and lucene 1.4
I had this problem when I initially upgraded to 1.4: Tomcat was still searching with the old 1.3 jar. Make sure you have fully updated its path variables, include directories, etc. - andy g

On Fri, 22 Oct 2004 16:00:42 +0200, gaudinat [EMAIL PROTECTED] wrote: Thanks, Finally my problem seems to come from TOMCAT (5.0) and lucene 1.4 installation. [...]
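To check which jar a web application is really using, the location a class was loaded from can be printed from inside the running application. The sketch below uses only standard JDK calls; `WhichJar` and `locationOf` are hypothetical names.

```java
import java.security.CodeSource;

/** Hypothetical helper: reports where a class was loaded from. */
final class WhichJar {
    private WhichJar() {}

    /** Returns the code source location as a string, or "unknown" if the JVM does not expose one. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "unknown"; // often the case for core JDK classes
        }
        return source.getLocation().toString();
    }
}
```

Logging `WhichJar.locationOf(org.apache.lucene.index.IndexReader.class)` from a servlet shows the path of the Lucene jar that Tomcat's class loader actually picked up, which makes a leftover 1.3 jar easy to spot.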
Corrupted indexes
Recently, I've been getting a lot of corrupted Lucene indexes. They appear to return search results normally, but there is really no good way to test whether information is missing. The main problem is that when I try to optimize, I get the following exception:

java.io.IOException: read past EOF
at org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:218)
at org.apache.lucene.store.InputStream.readBytes(InputStream.java:61)
at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:356)
at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:323)
at org.apache.lucene.index.SegmentMerger.mergeNorms(SegmentMerger.java:422)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:94)
at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:487)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:366)

This is preventing me from optimizing the indexes, and it also makes me worry that information might be missing. Does anybody know what's going on here, and what might be wrong? Thanks for your time, - andy g