Re: pdfboxhelp
Hi Natarajan,

I kept log4j.properties in the classpath. My new classpath is

.;..;C:\j2sdk1.4.1\lib;C:\j2sdk1.4.1\lib\jndi.jar;C:\j2sdk1.4.1\lib\webclient.jar;C:\j2sdk1.4.1\lib\mail.jar;C:\j2sdk1.4.1\lib\activation.jar;C:\j2sdk1.4.1\lib\xml-apis.jar;D:\JAVAPRO;C:\j2sdk1.4.1\jre\lib\ext\msbase.jar;C:\j2sdk1.4.1\lib\servlet.jar;E:\Program Files\Apache Tomcat 4.0\common\lib\servlet.jar;C:\Program Files\Altova\xmlspy\XMLSpyInterface.jar;C:\j2sdk1.4.1\lib\sax.jar;C:\j2sdk1.4.1\lib\dom.jar;C:\j2sdk1.4.1\lib\xalan.jar;C:\j2sdk1.4.1\lib\xercesImpl.jar;C:\j2sdk1.4.1\lib\xmlParserAPIs.jar;C:\j2sdk1.4.1\lib\parser.jar;C:\j2sdk1.4.1\lib\jaxp.jar;C:\j2sdk1.4.1\lib\xml.jar;C:\j2sdk1.4.1\lib\classes12.zip;C:\struts.jar;F:\apache-ant-1.6.1\lib\ant.jar;C:\j2sdk1.4.1\lib\PDFBox-0.6.6.jar;C:\j2sdk1.4.1\lib\lucene-20030909.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\log4j.jar;C:\j2sdk1.4.1\lib\log4j.properties;

but there is no difference in the output.

----- Original Message -----
From: Natarajan.T [EMAIL PROTECTED]
To: 'Lucene Users List' [EMAIL PROTECTED]
Sent: Monday, August 23, 2004 10:56 AM
Subject: RE: pdfboxhelp

Hi Santhosh,

The attached file must be in your class path.

Natarajan.

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 10:51 AM
To: Lucene Users List
Subject: Fw: pdfboxhelp

Hi Karthik,

Did you find any solution? Should I send the PDF to you?

----- Original Message -----
From: Santosh [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, August 23, 2004 10:23 AM
Subject: Re: pdfboxhelp

Hi Karthik,

I kept log4j in the classpath. I am sending the classpath variable:

CLASSPATH
.;..;C:\j2sdk1.4.1\lib;C:\j2sdk1.4.1\lib\jndi.jar;C:\j2sdk1.4.1\lib\webclient.jar;C:\j2sdk1.4.1\lib\mail.jar;C:\j2sdk1.4.1\lib\activation.jar;C:\j2sdk1.4.1\lib\xml-apis.jar;D:\JAVAPRO;C:\j2sdk1.4.1\jre\lib\ext\msbase.jar;C:\j2sdk1.4.1\lib\servlet.jar;E:\Program Files\Apache Tomcat 4.0\common\lib\servlet.jar;C:\Program Files\Altova\xmlspy\XMLSpyInterface.jar;C:\j2sdk1.4.1\lib\sax.jar;C:\j2sdk1.4.1\lib\dom.jar;C:\j2sdk1.4.1\lib\xalan.jar;C:\j2sdk1.4.1\lib\xercesImpl.jar;C:\j2sdk1.4.1\lib\xmlParserAPIs.jar;C:\j2sdk1.4.1\lib\parser.jar;C:\j2sdk1.4.1\lib\jaxp.jar;C:\j2sdk1.4.1\lib\xml.jar;C:\j2sdk1.4.1\lib\classes12.zip;C:\struts.jar;F:\apache-ant-1.6.1\lib\ant.jar;C:\j2sdk1.4.1\lib\PDFBox-0.6.6.jar;C:\j2sdk1.4.1\lib\lucene-20030909.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\log4j.jar

Please check the error.

----- Original Message -----
From: Karthik N S [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, August 23, 2004 10:26 AM
Subject: RE: pdfboxhelp

Hi Santosh,

I think your PDF setup is using the Log4j package. Try to set the classpath for the log4j.jar path.

Is it just a WARNING or an ERROR you are getting? Send me your configuration and let me help you with it.

Karthik

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 10:11 AM
To: Lucene Users List
Cc: Ben Litchfield
Subject: Re: pdfboxhelp

Hi Karthik,

I have downloaded PDFBox and kept the PDFBox jar file in the classpath, but when I type the following command at the command prompt I get the error:

D:\setups\searchEngine\PDFBox-0.6.6\src>java org.pdfbox.ExtractText C:\test.pdf C:\test.txt
log4j:WARN No appenders could be found for logger (org.pdfbox.pdfparser.PDFParser).
log4j:WARN Please initialize the log4j system properly

Why am I getting this error? Please help.

----- Original Message -----
From: Karthik N S [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, August 23, 2004 9:21 AM
Subject: RE: pdfboxhelp

Hi,

To begin with, try to build the indexes offline (outside the Tomcat container). Once the indexes are complete, feed your search the real path of the offline index folder, start Tomcat, and then use the search.

As you experiment you will get comfortable with the requirements of indexing and search.

Karthik

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Saturday, August 21, 2004 4:55 PM
To: Lucene Users List
Subject: Re: pdfboxhelp

Yes, I did the same. I copied all the classes into the classes folder, but now when I build the index using IndexHTML the PDFs are not added to the index; only text and HTML files are added.

What changes should I make to IndexHTML.java to build the index with PDFs?

----- Original Message -----
From: Karthik N S [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Saturday, August 21, 2004 4:54 PM
Subject: RE: pdfboxhelp

Hi,

If you are using the
Re: pdfboxhelp
I kept the file in the classpath:

.;..;C:\j2sdk1.4.1\lib;C:\j2sdk1.4.1\lib\jndi.jar;C:\j2sdk1.4.1\lib\webclient.jar;C:\j2sdk1.4.1\lib\mail.jar;C:\j2sdk1.4.1\lib\activation.jar;D:\JAVAPRO;E:\Program Files\Apache Tomcat 4.0\common\lib\servlet.jar;C:\j2sdk1.4.1\lib\classes12.zip;C:\struts.jar;C:\j2sdk1.4.1\lib\PDFBox-0.6.6.jar;C:\j2sdk1.4.1\lib\lucene-20030909.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\log4j.jar;C:\j2sdk1.4.1\lib\log4j.properties;D:\setups\searchEngine\PDFBox-0.6.6\external\ant.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\checkstyle-all-2.4.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\junit.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\lucene-1.4-final.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\lucene-demos-1.4-final.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\xercesImpl.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\xml-apis.jar;

but there is no change in the output; it is the same as before:

E:\>java org.pdfbox.ExtractText C:\test.pdf C:\test.txt
log4j:WARN No appenders could be found for logger (org.pdfbox.pdfparser.PDFParser).
log4j:WARN Please initialize the log4j system properly.

What might be the error?

----- Original Message -----
From: Natarajan.T [EMAIL PROTECTED]
To: 'Lucene Users List' [EMAIL PROTECTED]
Sent: Monday, August 23, 2004 10:56 AM
Subject: RE: pdfboxhelp

Hi Santhosh,

The attached file must be in your class path.

Natarajan.

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 10:51 AM
To: Lucene Users List
Subject: Fw: pdfboxhelp

Hi Karthik,

Did you find any solution? Should I send the PDF to you?
RE: memory leek in lucene?
Yes Terence, that's exactly what I do.

Terence Lai [EMAIL PROTECTED]
21.08.2004 01:50
Please respond to Lucene Users List
To: Lucene Users List [EMAIL PROTECTED]
cc:
Subject: RE: memory leek in lucene?

Are you calling ParallelMultiSearcher.search(Query query, Sort sort) to do your search? If so, I am currently having a similar problem.

Terence

When querying Lucene I run into a memory problem: it looks like memory is not given back after the query has been executed. I use ParallelMultiSearcher and call the close method after the results are displayed.

    hits = null;                  // Hits class
    if (ms != null) ms.close();   // ParallelMultiSearcher

It doesn't help. The memory is not freed. On queries like No* I see memory consumption grow by about 20-70 MB per query. Imagine what happens with my web server... I also tried from the command line and got a similar result.

Am I doing something wrong, or am I missing something? Please help. I use 1.4.1 on a Linux box.

Joel
Re: Lucene Search Applet
Thanks Jon, that works by putting the jar file in the archive attribute. Now I'm getting the disablelock error because of the unsigned applet. Do I just comment out the code wherever System.getProperty() appears in the files that you specified and then update the JAR archive? Is it possible you could show me one of the hacked files so that I know what I'm modifying? Does anyone else know if there is another way of doing this without having to hack the source code?

Many thanks.

Simon

----- Original Message -----
From: Jon Schuster [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Saturday, August 21, 2004 2:08 AM
Subject: Re: Lucene Search Applet

I have Lucene working in an applet and I've seen this problem only when the jar file really was not available (typo in the jar name), which is what you'd expect. It's possible that the classpath for your application is not the same as the classpath for the applet; perhaps they're using different VMs or JREs from different locations. Try referencing the Lucene jar file in the archive attribute of the applet tag.

Also, to get Lucene to work from an unsigned applet, I had to modify a few classes that call System.getProperty(), because the properties that were being requested were disallowed for applets. I think the classes were IndexWriter, FSDirectory, and BooleanQuery.

--Jon

On Aug 20, 2004, at 6:57 AM, Simon mcIlwaine wrote:

I'm a new Lucene user and I'm not too familiar with applets either, but I've been doing a bit of testing on Java applet security, and if I'm correct in saying that applets can read anything below their codebase, then my problem is not a security restriction one. The error reads java.lang.NoClassDefFoundError, and the classpath is set, as I have it working in a Swing app. Does someone actually have Lucene working in an applet? Can it be done? Please help.

Thanks
Simon

----- Original Message -----
From: Terry Steichen [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, August 18, 2004 4:17 PM
Subject: Re: Lucene Search Applet

I suspect it has to do with the security restrictions of the applet, 'cause it doesn't appear to be finding your Lucene jar file. Also, regarding the lock files, I believe you can disable the locking stuff just for purposes like yours (read-only index).

Regards,
Terry

----- Original Message -----
From: Simon mcIlwaine
To: Lucene Users List
Sent: Wednesday, August 18, 2004 11:03 AM
Subject: Lucene Search Applet

I'm developing a Lucene CD-ROM based search which will search HTML pages on CD-ROM, using an applet as the UI. I know that there's a problem with lock files and also security restrictions on applets, so I am using the RAMDirectory. I have it working in a Swing application; however, when I put it into an applet it gives me problems. It compiles, but when I go to run the applet I get the error below. Can anyone help?

Thanks in advance.
Simon

Error: java.lang.NoClassDefFoundError: org/apache/lucene/store/Directory
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:1610)
    at java.lang.Class.getConstructor0(Class.java:1922)
    at java.lang.Class.newInstance0(Class.java:278)
    at java.lang.Class.newInstance(Class.java:261)
    at sun.applet.AppletPanel.createApplet(AppletPanel.java:617)
    at sun.applet.AppletPanel.runLoader(AppletPanel.java:546)
    at sun.applet.AppletPanel.run(AppletPanel.java:298)
    at java.lang.Thread.run(Thread.java:534)

Code:

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import java.awt.*;
    import java.awt.event.*;
    import javax.swing.*;
    import java.io.*;

    public class MemorialApp2 extends JApplet implements ActionListener {
        JLabel prompt;
        JTextField input;
        JButton search;
        JPanel panel;
        String indexDir = "C:/Java/lucene/index-list";
        private static RAMDirectory idx;

        public void init() {
            Container cp = getContentPane();
            panel = new JPanel();
            panel.setLayout(new FlowLayout(FlowLayout.CENTER, 4, 4));
            prompt = new JLabel("Keyword search:");
            input = new JTextField("", 20);
            search = new JButton("Search");
            search.addActionListener(this);
            panel.add(prompt);
            panel.add(input);
            panel.add(search);
            cp.add(panel);
        }

        public void actionPerformed(ActionEvent e) {
            if (e.getSource() == search) {
                String surname = input.getText();
                try {
                    findSurname(indexDir, surname);
                }
RE: Lucene with English and Spanish Best Practice?
Thanks for the info Grant.

> As for indexes, do you anticipate adding more fields later in Spanish? Is the content just a translation of the English, or do you have separate content in Spanish? Are your users querying in only one language (cross-lingual) or are the Spanish speakers only querying against Spanish content?

Our fields are pretty much going to be one-for-one between English and Spanish (a translation of current content from English to Spanish). Something like title_en and title_sp, body_en and body_sp, keywords_en and keywords_sp. Our users will be querying cross-lingually. So I see your point: it looks like it would be easier if we added the Spanish fields to our current indexes; then we wouldn't have to filter out duplicate results between the English and Spanish indexes.

> I am doing Arabic and English (and have done Spanish, French, and Japanese in the past), although our cross-lingual system supports any languages that you have resources for.

Did you use Snowball for the Spanish? Or is there a Lucene Spanish Analyzer available (I couldn't find one)? Or do people just use something like a plain old StandardAnalyzer to index and query Spanish content? I'm a little confused about the Snowball project: is it a multi-language stemmer Analyzer for Lucene? We just use the plain old Standard and Whitespace Analyzers now for our English content. Can we use those same Analyzers for Spanish content, or would it be better to use the Snowball project?

thanks,
chad.

-----Original Message-----
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: Saturday, August 21, 2004 2:16 PM
To: [EMAIL PROTECTED]
Subject: Re: Lucene with English and Spanish Best Practice?

I think the Snowball stuff works well, although I have only used the English Porter stemmer implementation.

As for indexes, do you anticipate adding more fields later in Spanish? Is the content just a translation of the English, or do you have separate content in Spanish? Are your users querying in only one language (cross-lingual) or are the Spanish speakers only querying against Spanish content?

I am doing Arabic and English (and have done Spanish, French, and Japanese in the past), although our cross-lingual system supports any languages that you have resources for. We lean towards separate indexes, but mostly because they are based on separate content. The key is that you have to be able to match up the analysis of the query with the analysis of the index. Having a mixed index may make this more difficult. If you have a mixed index, would you filter out Spanish results that had hits from an English query? For instance, what if the query was a term common to both languages (banana, mosquito, etc.), or are you requiring the user to specify which fields they are searching against? I guess we really need to know more about how your users are going to interact with the system.

-Grant

[EMAIL PROTECTED] 8/20/2004 5:27:40 PM

Hello,

I'm interested in any feedback from anyone who has worked through implementing internationalization (I18N) search with Lucene or has ideas for this requirement. Currently, we're using Lucene with straight English and are looking to add Spanish to the mix (with maybe more languages to follow).

This is our current IndexWriter setup utilizing the PerFieldAnalyzerWrapper:

    PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
    analyzer.addAnalyzer(FIELD_TITLE_STARTS_WITH, new WhitespaceAnalyzer());
    analyzer.addAnalyzer(FIELD_CATEGORY, new WhitespaceAnalyzer());
    IndexWriter writer = new IndexWriter(indexDir, analyzer, create);

Would people suggest we switch this over to Snowball so there are English and Spanish Analyzers and IndexWriters? Something like this:

    PerFieldAnalyzerWrapper analyzerEnglish = new PerFieldAnalyzerWrapper(new SnowballAnalyzer("English"));
    analyzerEnglish.addAnalyzer(FIELD_TITLE_STARTS_WITH, new WhitespaceAnalyzer());
    analyzerEnglish.addAnalyzer(FIELD_CATEGORY, new WhitespaceAnalyzer());
    IndexWriter writerEnglish = new IndexWriter(indexDir, analyzerEnglish, create);

    PerFieldAnalyzerWrapper analyzerSpanish = new PerFieldAnalyzerWrapper(new SnowballAnalyzer("Spanish"));
    analyzerSpanish.addAnalyzer(FIELD_TITLE_STARTS_WITH, new WhitespaceAnalyzer());
    analyzerSpanish.addAnalyzer(FIELD_CATEGORY, new WhitespaceAnalyzer());
    IndexWriter writerSpanish = new IndexWriter(indexDir, analyzerSpanish, create);

Are multiple indexes, or mirrors of each index, then usually created for every language? We currently have 4 indexes that are all English. Would we then create 4 more that are Spanish? Then at search time we would determine the language and which set of indexes to search against, English or Spanish.

Another approach could be to add a Spanish field to the existing 4 indexes, since most of the indexes have only one field that will be translated from English to Spanish.

thanks a bunch,
chad.
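For the single-index approach with per-language fields (title_en, title_sp and so on), the naming scheme could be kept in one small helper so that indexing and searching agree on it. This is only a sketch using plain Java: the class and method names are made up for illustration, and it does not touch any Lucene API.

```java
import java.util.Locale;

public class LanguageFields {

    /** Builds a field name such as "title_en" from a base name and a language code. */
    public static String fieldFor(String base, String languageCode) {
        return base + "_" + languageCode.toLowerCase(Locale.ROOT);
    }

    /** Lists the field names to search when a query should cover several languages. */
    public static String[] crossLingual(String base, String... languageCodes) {
        String[] fields = new String[languageCodes.length];
        for (int i = 0; i < languageCodes.length; i++) {
            fields[i] = fieldFor(base, languageCodes[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        // "en" and "sp" follow the suffixes used in the message above
        for (String field : crossLingual("title", "en", "sp")) {
            System.out.println(field);
        }
    }
}
```

The resulting names would then be the keys passed to addAnalyzer when choosing a language-specific analyzer per field.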
Re: pdfboxhelp
Your classpath should point to a directory that contains log4j.properties, not to the file directly; see below.

sv

On Mon, 23 Aug 2004, Santosh wrote:

> Hi natarajan,
>
> I kept log4j.properties in the classpath. My new classpath is
> C:\j2sdk1.4.1\lib\log4j.properties;

should be C:\j2sdk1.4.1\lib\

> but there is no difference in the output.
>
> ----- Original Message -----
> From: Natarajan.T [EMAIL PROTECTED]
> To: 'Lucene Users List' [EMAIL PROTECTED]
> Sent: Monday, August 23, 2004 10:56 AM
> Subject: RE: pdfboxhelp
>
> Hi Santhosh,
>
> The attached file must be in your class path.
>
> Natarajan.
>
> -----Original Message-----
> From: Santosh [mailto:[EMAIL PROTECTED]
> Sent: Monday, August 23, 2004 10:51 AM
> To: Lucene Users List
> Subject: Fw: pdfboxhelp
>
> Hi Karthik,
>
> Did you find any solution? Should I send the PDF to you?
>
> ----- Original Message -----
> From: Santosh [EMAIL PROTECTED]
> To: Lucene Users List [EMAIL PROTECTED]
> Sent: Monday, August 23, 2004 10:23 AM
> Subject: Re: pdfboxhelp
>
> Hi Karthik,
>
> I kept log4j in the classpath. I am sending the classpath variable:
>
> CLASSPATH
> .;..;C:\j2sdk1.4.1\lib;C:\j2sdk1.4.1\lib\jndi.jar;C:\j2sdk1.4.1\lib\webclient.jar;C:\j2sdk1.4.1\lib\mail.jar;C:\j2sdk1.4.1\lib\activation.jar;C:\j2sdk1.4.1\lib\xml-apis.jar;D:\JAVAPRO;C:\j2sdk1.4.1\jre\lib\ext\msbase.jar;C:\j2sdk1.4.1\lib\servlet.jar;E:\Program Files\Apache Tomcat 4.0\common\lib\servlet.jar;C:\Program Files\Altova\xmlspy\XMLSpyInterface.jar;C:\j2sdk1.4.1\lib\sax.jar;C:\j2sdk1.4.1\lib\dom.jar;C:\j2sdk1.4.1\lib\xalan.jar;C:\j2sdk1.4.1\lib\xercesImpl.jar;C:\j2sdk1.4.1\lib\xmlParserAPIs.jar;C:\j2sdk1.4.1\lib\parser.jar;C:\j2sdk1.4.1\lib\jaxp.jar;C:\j2sdk1.4.1\lib\xml.jar;C:\j2sdk1.4.1\lib\classes12.zip;C:\struts.jar;F:\apache-ant-1.6.1\lib\ant.jar;C:\j2sdk1.4.1\lib\PDFBox-0.6.6.jar;C:\j2sdk1.4.1\lib\lucene-20030909.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\log4j.jar
>
> Please check the error.
>
> ----- Original Message -----
> From: Karthik N S [EMAIL PROTECTED]
> To: Lucene Users List [EMAIL PROTECTED]
> Sent: Monday, August 23, 2004 10:26 AM
> Subject: RE: pdfboxhelp
>
> Hi Santosh,
>
> I think your PDF setup is using the Log4j package. Try to set the classpath for the log4j.jar path.
>
> Is it just a WARNING or an ERROR you are getting? Send me your configuration and let me help you with it.
>
> Karthik
>
> -----Original Message-----
> From: Santosh [mailto:[EMAIL PROTECTED]
> Sent: Monday, August 23, 2004 10:11 AM
> To: Lucene Users List
> Cc: Ben Litchfield
> Subject: Re: pdfboxhelp
>
> Hi Karthik,
>
> I have downloaded PDFBox and kept the PDFBox jar file in the classpath, but when I type the following command at the command prompt I get the error:
>
> D:\setups\searchEngine\PDFBox-0.6.6\src>java org.pdfbox.ExtractText C:\test.pdf C:\test.txt
> log4j:WARN No appenders could be found for logger (org.pdfbox.pdfparser.PDFParser).
> log4j:WARN Please initialize the log4j system properly
>
> Why am I getting this error? Please help.
>
> ----- Original Message -----
> From: Karthik N S [EMAIL PROTECTED]
> To: Lucene Users List [EMAIL PROTECTED]
> Sent: Monday, August 23, 2004 9:21 AM
> Subject: RE: pdfboxhelp
>
> Hi,
>
> To begin with, try to build the indexes offline (outside the Tomcat container). Once the indexes are complete, feed your search the real path of the offline index folder, start Tomcat, and then use the search.
>
> As you experiment you will get comfortable with the requirements of indexing and search.
>
> Karthik
>
> -----Original Message-----
> From: Santosh [mailto:[EMAIL PROTECTED]
> Sent: Saturday, August 21, 2004 4:55 PM
> To: Lucene Users List
> Subject: Re: pdfboxhelp
>
> Yes, I did the same. I copied all the classes into the classes folder, but now when I build the index using IndexHTML the PDFs are not added to the index; only text and HTML files are added.
>
> What changes should I make to IndexHTML.java to build the index with PDFs?
>
> ----- Original Message -----
> From: Karthik N S [EMAIL PROTECTED]
> To: Lucene Users List [EMAIL PROTECTED]
> Sent: Saturday, August 21, 2004 4:54 PM
> Subject: RE: pdfboxhelp
>
> Hi,
>
> If you are using the jar file with a web interface for JSP/servlet development:
>
> 1) Place the jar file in webapps/yourapplication/WEB-INF/lib and also correct the classpath for this change.
> 2) Create your own package, put all your Java files in it, and copy them to /WEB-INF/classes/yourpackage.
>
> Then use the same.
>
> Karthik
>
> -----Original Message-----
> From: Santosh
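The point about directories versus files can be checked with nothing but the JDK. The sketch below is an illustration, not part of PDFBox or log4j: it builds a class loader with a single entry and asks whether log4j.properties is visible as a resource. The class name ClasspathResourceCheck and the temporary directory are made up for the demo. A classpath entry is treated either as a directory or as an archive, so an entry that names the properties file itself is not expected to expose it.

```java
import java.io.File;
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

public class ClasspathResourceCheck {

    /** Returns true if a loader whose only entry is `entry` can see `resourceName`. */
    public static boolean visible(File entry, String resourceName) throws IOException {
        URL[] urls = { entry.toURI().toURL() };
        // parent = null: only the given entry is searched, not the real classpath
        try (URLClassLoader loader = new URLClassLoader(urls, null)) {
            return loader.getResource(resourceName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Throwaway directory standing in for C:\j2sdk1.4.1\lib
        Path dir = Files.createTempDirectory("cp-demo");
        Path props = Files.createFile(dir.resolve("log4j.properties"));

        // Entry is the directory that contains the file
        System.out.println("directory entry: " + visible(dir.toFile(), "log4j.properties"));
        // Entry is the file itself, as in the classpath quoted above
        System.out.println("file entry:      " + visible(props.toFile(), "log4j.properties"));
    }
}
```

On the JVMs I am aware of, the first line reports true and the second false, which matches the advice to list C:\j2sdk1.4.1\lib\ rather than the properties file.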
RE: memory leek in lucene?
Iouli, Terence,

Could you create a self-sufficient test case that demonstrates the memory leak? If you can do that, please open a new bug entry in Bugzilla (the link to it is on Lucene's home page), and then attach your test case to it.

Thanks!
Otis

--- [EMAIL PROTECTED] wrote:

Yes Terence, that's exactly what I do.

Terence Lai [EMAIL PROTECTED]
21.08.2004 01:50
Please respond to Lucene Users List
To: Lucene Users List [EMAIL PROTECTED]
cc:
Subject: RE: memory leek in lucene?

Are you calling ParallelMultiSearcher.search(Query query, Sort sort) to do your search? If so, I am currently having a similar problem.

Terence

When querying Lucene I run into a memory problem: it looks like memory is not given back after the query has been executed. I use ParallelMultiSearcher and call the close method after the results are displayed.

    hits = null;                  // Hits class
    if (ms != null) ms.close();   // ParallelMultiSearcher

It doesn't help. The memory is not freed. On queries like No* I see memory consumption grow by about 20-70 MB per query. Imagine what happens with my web server... I also tried from the command line and got a similar result.

Am I doing something wrong, or am I missing something? Please help. I use 1.4.1 on a Linux box.

Joel
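A self-sufficient test case of the kind asked for could start from a small harness that repeats one operation and records heap usage after each round. The sketch below uses only the JDK. HeapGrowthCheck is a made-up name, and the Runnable is a placeholder where the real search-and-close code would go, so nothing here demonstrates a leak in Lucene by itself.

```java
public class HeapGrowthCheck {

    /** Heap currently in use, in bytes, after asking the JVM to collect garbage. */
    public static long usedHeapAfterGc() {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // only a hint; the JVM may ignore it
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Runs the task `rounds` times and returns the used heap measured after each round. */
    public static long[] measure(Runnable task, int rounds) {
        long[] used = new long[rounds];
        for (int i = 0; i < rounds; i++) {
            task.run();
            used[i] = usedHeapAfterGc();
        }
        return used;
    }

    public static void main(String[] args) {
        // Placeholder workload; replace with "open searcher, search, close"
        Runnable search = () -> {
            byte[] scratch = new byte[1 << 20];
            scratch[0] = 1;
        };
        long[] used = measure(search, 5);
        for (int i = 0; i < used.length; i++) {
            System.out.println("round " + i + ": " + (used[i] / 1024) + " KB");
        }
    }
}
```

If the numbers keep climbing round after round with the real search code plugged in, that output plus the source would make a reasonable attachment for the bug entry.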
Re: Lucene for Indian Languages
Hi Srinivasa,

Use StandardAnalyzer for indexing and for parsing queries on Indian-language documents. It will work. Right now we are searching Hindi and Marathi, but without specific stemmers and filters. We are planning to develop a Marathi morphological analyzer.

Thanks,
Satish.

On Sun, 22 Aug 2004, srinivasa raghavan wrote:

Hi all,

Is the Lucene API implemented for Indian contexts? I know that Lucene has stemmers and filters for the German and Russian languages. I would like to know whether there are stemmers and filters available or being developed for Indian languages.

Thanks,
Rahavan.
Re: Lucene Search Applet
Hi Simon,

Does this work? From the FSDirectory API:

If the system property 'disableLuceneLocks' has the String value of "true", lock creation will be disabled.

Otherwise, I think there was a Read-Only Directory hack:

http://www.mail-archive.com/[EMAIL PROTECTED]/msg05148.html

HTH,
sv

On Mon, 23 Aug 2004, Simon mcIlwaine wrote:

> Thanks Jon, that works by putting the jar file in the archive attribute. Now I'm getting the disablelock error because of the unsigned applet. Do I just comment out the code wherever System.getProperty() appears in the files that you specified and then update the JAR archive? Is it possible you could show me one of the hacked files so that I know what I'm modifying? Does anyone else know if there is another way of doing this without having to hack the source code?
>
> Many thanks.
>
> Simon
>
> ----- Original Message -----
> From: Jon Schuster [EMAIL PROTECTED]
> To: Lucene Users List [EMAIL PROTECTED]
> Sent: Saturday, August 21, 2004 2:08 AM
> Subject: Re: Lucene Search Applet
>
> I have Lucene working in an applet and I've seen this problem only when the jar file really was not available (typo in the jar name), which is what you'd expect. It's possible that the classpath for your application is not the same as the classpath for the applet; perhaps they're using different VMs or JREs from different locations. Try referencing the Lucene jar file in the archive attribute of the applet tag.
>
> Also, to get Lucene to work from an unsigned applet, I had to modify a few classes that call System.getProperty(), because the properties that were being requested were disallowed for applets. I think the classes were IndexWriter, FSDirectory, and BooleanQuery.
>
> --Jon
>
> On Aug 20, 2004, at 6:57 AM, Simon mcIlwaine wrote:
>
> I'm a new Lucene user and I'm not too familiar with applets either, but I've been doing a bit of testing on Java applet security, and if I'm correct in saying that applets can read anything below their codebase, then my problem is not a security restriction one. The error reads java.lang.NoClassDefFoundError, and the classpath is set, as I have it working in a Swing app. Does someone actually have Lucene working in an applet? Can it be done? Please help.
>
> Thanks
> Simon
>
> ----- Original Message -----
> From: Terry Steichen [EMAIL PROTECTED]
> To: Lucene Users List [EMAIL PROTECTED]
> Sent: Wednesday, August 18, 2004 4:17 PM
> Subject: Re: Lucene Search Applet
>
> I suspect it has to do with the security restrictions of the applet, 'cause it doesn't appear to be finding your Lucene jar file. Also, regarding the lock files, I believe you can disable the locking stuff just for purposes like yours (read-only index).
>
> Regards,
> Terry
>
> ----- Original Message -----
> From: Simon mcIlwaine
> To: Lucene Users List
> Sent: Wednesday, August 18, 2004 11:03 AM
> Subject: Lucene Search Applet
>
> I'm developing a Lucene CD-ROM based search which will search HTML pages on CD-ROM, using an applet as the UI. I know that there's a problem with lock files and also security restrictions on applets, so I am using the RAMDirectory. I have it working in a Swing application; however, when I put it into an applet it gives me problems. It compiles, but when I go to run the applet I get the error below. Can anyone help?
>
> Thanks in advance.
> Simon
>
> Error: java.lang.NoClassDefFoundError: org/apache/lucene/store/Directory
>     at java.lang.Class.getDeclaredConstructors0(Native Method)
>     at java.lang.Class.privateGetDeclaredConstructors(Class.java:1610)
>     at java.lang.Class.getConstructor0(Class.java:1922)
>     at java.lang.Class.newInstance0(Class.java:278)
>     at java.lang.Class.newInstance(Class.java:261)
>     at sun.applet.AppletPanel.createApplet(AppletPanel.java:617)
>     at sun.applet.AppletPanel.runLoader(AppletPanel.java:546)
>     at sun.applet.AppletPanel.run(AppletPanel.java:298)
>     at java.lang.Thread.run(Thread.java:534)
>
> Code:
>
>     import org.apache.lucene.search.IndexSearcher;
>     import org.apache.lucene.search.Query;
>     import org.apache.lucene.search.TermQuery;
>     import org.apache.lucene.store.RAMDirectory;
>     import org.apache.lucene.store.Directory;
>     import org.apache.lucene.index.Term;
>     import org.apache.lucene.search.Hits;
>     import java.awt.*;
>     import java.awt.event.*;
>     import javax.swing.*;
>     import java.io.*;
>
>     public class MemorialApp2 extends JApplet implements ActionListener {
>         JLabel prompt;
>         JTextField input;
>         JButton search;
>         JPanel panel;
>         String indexDir = "C:/Java/lucene/index-list";
>         private static RAMDirectory idx;
>
>         public void init() {
>             Container cp = getContentPane();
>             panel = new JPanel();
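On the question of what a change around System.getProperty() might look like: one possible shape, shown here as a plain-JDK sketch and not as the actual modification Jon made, is to wrap the read so that a denied property falls back to a default instead of failing. SafeProperty is a hypothetical helper name; only the property name disableLuceneLocks comes from the FSDirectory text quoted above.

```java
public class SafeProperty {

    /** Reads a system property, returning `fallback` when it is unset or access is denied. */
    public static String get(String name, String fallback) {
        try {
            String value = System.getProperty(name);
            return value != null ? value : fallback;
        } catch (SecurityException denied) {
            // Restricted environments may refuse access to some properties
            return fallback;
        }
    }

    public static void main(String[] args) {
        boolean locksDisabled = Boolean.parseBoolean(get("disableLuceneLocks", "false"));
        System.out.println("locks disabled: " + locksDisabled);
    }
}
```

Whether the default chosen for each property is acceptable depends on the class being changed, so each call site would need to be looked at individually.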
spanish stemmer
Hello,

I use the Snowball jar to implement my SpanishAnalyzer. I found that words ending in 'bol' are not stemmed.

For example, in Spanish you can say basquet or basquetbol for basketball, but the SpanishStemmer treats them as different words. The same goes for voley and voleybol.

It is different with futbol (football): we do not say fut for futbol, and 'fut' does not exist in Spanish.

Do you think I am correct? Can you change this?

Ernesto.
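Until the stemmer itself changes, one workaround would be a small normalization step applied to each token before stemming. The sketch below is an assumption about how that could look in plain Java. It is not part of Snowball, the class name is invented, and the word list holds only the two examples from the message. It uses an explicit list rather than a blanket suffix rule precisely because futbol must stay as it is.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class BolSuffixNormalizer {

    // Words whose short form is also in use; futbol is deliberately absent
    private static final Set<String> STRIPPABLE =
            new HashSet<>(Arrays.asList("basquetbol", "voleybol"));

    /** Lower-cases the word and drops a trailing "bol" only for listed words. */
    public static String normalize(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        if (STRIPPABLE.contains(lower)) {
            return lower.substring(0, lower.length() - "bol".length());
        }
        return lower;
    }

    public static void main(String[] args) {
        for (String w : new String[] { "basquetbol", "voleybol", "futbol" }) {
            System.out.println(w + " -> " + normalize(w));
        }
    }
}
```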
NegativeArraySizeException when creating a new IndexSearcher
Hi Doug! Thank you very much for your answer! It solved the problem. I found an 1.3-version next to the 1.4-version and after removing the old one it works for fine now, as you said. Thanks again! Sven Date: Fri, 20 Aug 2004 14:08:57 -0700 From: Doug Cutting [EMAIL PROTECTED] Subject: NegativeArraySizeException when creating a new IndexSearcher Content-Type: text/plain; charset=us-ascii; format=flowed Looks to me like you're using an older version of Lucene on your Linux box. The code is back-compatible, it will read old indexes, but Lucene 1.3 cannot read indexes created by Lucene 1.4, and will fail in the way you describe. Doug Sven wrote: Hi! I have a problem to port a Lucene based knowledgebase from Windows to Linux. On Windows it works fine whereas I get a NegativeArraySizeException on Linux when I try to initialise a new IndexSearcher to search the index. Deleting and rebuilding the index didn't help. I checked permissions, file path and lock_dir but as far as I can say they seem to be all right. As I couldn't find another one with the same problem I guess I've overlooked sth, but I've run out of ideas. I use lucene-1.4-rc2 and tomcat 5.0.18. Can someone help me please with this or has an idea? 
Kind regards, Sven
java.lang.NegativeArraySizeException
at org.apache.lucene.index.TermInfosReader.readIndex(TermInfosReader.java:106)
at org.apache.lucene.index.TermInfosReader.<init>(TermInfosReader.java:82)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:141)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:120)
at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:118)
at org.apache.lucene.store.Lock$With.run(Lock.java:148)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:111)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:99)
at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:75)
at com.sykon.knowledgebase.action.ListQueryResultAction.act(ListQueryResultAction.java:134)
at org.apache.cocoon.components.treeprocessor.sitemap.ActTypeNode.invoke(ActTypeNode.java:159)
at org.apache.cocoon.components.treeprocessor.sitemap.ActionSetNode.call(ActionSetNode.java:121)
at org.apache.cocoon.components.treeprocessor.sitemap.ActSetNode.invoke(ActSetNode.java:98)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:84)
at org.apache.cocoon.components.treeprocessor.sitemap.PreparableMatchNode.invoke(PreparableMatchNode.java:165)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:107)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:162)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:107)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:136)
at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:371)
at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:312)
at org.apache.cocoon.components.treeprocessor.sitemap.MountNode.invoke(MountNode.java:133)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:84)
at org.apache.cocoon.components.treeprocessor.sitemap.PreparableMatchNode.invoke(PreparableMatchNode.java:165)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:107)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:162)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:107)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:136)
at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:371)
at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:312)
at org.apache.cocoon.Cocoon.process(Cocoon.java:656)
at org.apache.cocoon.servlet.CocoonServlet.service(CocoonServlet.java:1112)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:856)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:284)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:204)
at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:742)
at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:506)
at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:443)
at
Re: Lucene Search Applet
Hi Stephane, A bit of a stupid question but how do you mean set the system property disableLuceneLocks=true? Can I do it from a call from FSDirectory API or do I have to actually hack the code? Also if I do use RODirectory how do I go about using it? Do I have to update the Lucene JAR archive file with RODirectory class included as I tried using it and its not recognising the class? Many Thanks Simon - Original Message - From: Stephane James Vaucher [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Monday, August 23, 2004 2:22 PM Subject: Re: Lucene Search Applet Hi Simon, Does this work? From FSDirectory api: If the system property 'disableLuceneLocks' has the String value of true, lock creation will be disabled. Otherwise, I think there was a Read-Only Directory hack: http://www.mail-archive.com/[EMAIL PROTECTED]/msg05148.html HTH, sv On Mon, 23 Aug 2004, Simon mcIlwaine wrote: Thanks Jon that works by putting the jar file in the archive attribute. Now im getting the disablelock error cause of the unsigned applet. Do I just comment out the code anywhere where System.getProperty() appears in the files that you specified and then update the JAR Archive?? Is it possible you could show me one of the hacked files so that I know what I'm modifying? Does anyone else know if there is another way of doing this without having to hack the source code? Many thanks. Simon - Original Message - From: Jon Schuster [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Saturday, August 21, 2004 2:08 AM Subject: Re: Lucene Search Applet I have Lucene working in an applet and I've seen this problem only when the jar file really was not available (typo in the jar name), which is what you'd expect. It's possible that the classpath for your application is not the same as the classpath for the applet; perhaps they're using different VMs or JREs from different locations. Try referencing the Lucene jar file in the archive attribute of the applet tag. 
Also, to get Lucene to work from an unsigned applet, I had to modify a few classes that call System.getProperty(), because the properties that were being requested were disallowed for applets. I think the classes were IndexWriter, FSDirectory, and BooleanQuery. --Jon On Aug 20, 2004, at 6:57 AM, Simon mcIlwaine wrote: Im a new Lucene User and I'm not too familiar with Applets either but I've been doing a bit of testing on java applet security and if im correct in saying that applets can read anything below there codebase then my problem is not a security restriction one. The error is reading java.lang.NoClassDefFoundError and the classpath is set as I have it working in a Swing App. Does someone actually have Lucene working in an Applet? Can it be done?? Please help. Thanks Simon - Original Message - From: Terry Steichen [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Wednesday, August 18, 2004 4:17 PM Subject: Re: Lucene Search Applet I suspect it has to do with the security restrictions of the applet, 'cause it doesn't appear to be finding your Lucene jar file. Also, regarding the lock files, I believe you can disable the locking stuff just for purposes like yours (read-only index). Regards, Terry - Original Message - From: Simon mcIlwaine To: Lucene Users List Sent: Wednesday, August 18, 2004 11:03 AM Subject: Lucene Search Applet Im developing a Lucene CD-ROM based search which will search html pages on CD-ROM, using an applet as the UI. I know that theres a problem with lock files and also security restrictions on applets so I am using the RAMDirectory. I have it working in a Swing application however when I put it into an applet its giving me problems. It compiles but when I go to run the applet I get the error below. Can anyone help? Thanks in advance. 
Simon Error: Java.lang.noClassDefFoundError: org/apache/lucene/store/Directory At: Java.lang.Class.getDeclaredConstructors0(Native Method) At: Java.lang.Class.privateGetDeclaredConstructors(Class.java:1610) At: Java.lang.Class.getConstructor0(Class.java:1922) At: Java.lang.Class.newInstance0(Class.java:278) At: Java.lang.Class.newInstance(Class.java:261) At: sun.applet.AppletPanel.createApplet(AppletPanel.java:617) At: sun.applet.AppletPanel.runloader(AppletPanel.java:546) At: sun.applet.AppletPanel.run(AppletPanel.java:298) At: java.lang.Thread.run(Thread.java:534) Code: import org.apache.lucene.search.IndexSearcher; import org.apache.lucene.search.Query; import
integration of lucene with pdfbox
I have downloaded pdfbox and lucene and kept jar files in the class path, I am able to work with both of them independently but how can I integrate both regards Santosh kumar ---SOFTPRO DISCLAIMER-- Information contained in this E-MAIL and any attachments are confidential being proprietary to SOFTPRO SYSTEMS is 'privileged' and 'confidential'. If you are not an intended or authorised recipient of this E-MAIL or have received it in error, You are notified that any use, copying or dissemination of the information contained in this E-MAIL in any manner whatsoever is strictly prohibited. Please delete it immediately and notify the sender by E-MAIL. In such a case reading, reproducing, printing or further dissemination of this E-MAIL is strictly prohibited and may be unlawful. SOFTPRO SYSYTEMS does not REPRESENT or WARRANT that an attachment hereto is free from computer viruses or other defects. The opinions expressed in this E-MAIL and any ATTACHEMENTS may be those of the author and are not necessarily those of SOFTPRO SYSTEMS.
Re: Lucene Search Applet
Hi, Just used the RODirectory and I'm now getting the following error: java.security.AccessControlException: access denied (java.util.PropertyPermission user.dir read) I'm reckoning that this is what Jon was on about with System.getProperty() within certain files because im using an applet. Is this correct and if so can someone show me one of the hacked files so that I know what I need to modify. Many Thanks Simon
Re: integration of lucene with pdfbox
If you can use lucene on its own then you already know how to add a lucene Document to the index. So you need to be able to take a PDF and get a lucene Document. org.pdfbox.searchengine.lucene.LucenePDFDocument.getDocument() does that for you. Ben On Mon, 23 Aug 2004, Santosh wrote: I have downloaded pdfbox and lucene and kept jar files in the class path, I am able to work with both of them independently but how can I integrate both regards Santosh kumar
Re: Lucene Search Applet
I haven't used it, and I'm a little confused by the code:
/** ...
 * <p>If the system property 'disableLuceneLocks' has the String value of
 * "true", lock creation will be disabled.
 */
public final class FSDirectory extends Directory {
  private static final boolean DISABLE_LOCKS =
      Boolean.getBoolean("disableLuceneLocks") || Constants.JAVA_1_1;
  ...
I don't see a System.getProperty(String). You might have to patch this, if I'm correct. This should stop the Directory from trying to use locks. HTH, sv
Re: Lucene Search Applet
On Aug 23, 2004, at 10:48 AM, Stephane James Vaucher wrote: I haven't used it, and I'm a little confused by the code:
/** ...
 * <p>If the system property 'disableLuceneLocks' has the String value of
 * "true", lock creation will be disabled.
 */
public final class FSDirectory extends Directory {
  private static final boolean DISABLE_LOCKS =
      Boolean.getBoolean("disableLuceneLocks") || Constants.JAVA_1_1;
  ...
I don't see a System.getProperty(String).
:) Check the javadocs for Boolean.getBoolean(). It's by far one of the dumbest and most confusing APIs ever! (Basically, this does a System.getProperty("disableLuceneLocks") and converts it to a boolean.) Erik
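Erik's point about Boolean.getBoolean() can be checked with a small standalone program. This is only a sketch: the class name GetBooleanDemo and the helper lookup() are made up for illustration, and it uses nothing from Lucene, just the JDK. The property name disableLuceneLocks is the one quoted from FSDirectory above.

```java
// Sketch: Boolean.getBoolean(name) looks up a JVM system property called `name`;
// it does NOT parse its argument as the text "true" or "false".
class GetBooleanDemo {

    // Hand-written equivalent of Boolean.getBoolean (hypothetical helper for illustration).
    static boolean lookup(String name) {
        String value = System.getProperty(name); // null when the property is not set
        return Boolean.parseBoolean(value);      // null or anything but "true" gives false
    }

    public static void main(String[] args) {
        // Same effect as starting the JVM with -DdisableLuceneLocks=true
        System.setProperty("disableLuceneLocks", "true");

        System.out.println(Boolean.getBoolean("disableLuceneLocks")); // property is set to "true"
        System.out.println(lookup("disableLuceneLocks"));             // same lookup done by hand
        System.out.println(Boolean.getBoolean("true"));               // looks for a property named "true"
    }
}
```

For a normal application the property would usually be set on the command line with -DdisableLuceneLocks=true. Note that an unsigned applet is generally not allowed to call System.setProperty either, so this shows how the lookup works rather than a fix for the applet case.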
Re: Lucene Search Applet
Thanks Erik for correcting me, I feel a bit stupid: I actually looked at the api to make sure that I wasn't in left field, but I trusted common-sense and stopped at the constructor ;) Should this property be changed in the next major release of lucene to org.apache...disableLuceneLocks? sv
Re: Lucene Search Applet
On Aug 23, 2004, at 11:36 AM, Stephane James Vaucher wrote: Should this property be changed in the next major release of lucene to org.apache...disableLuceneLocks? Yes, that makes sense to put an org.apache.lucene prefix. If that is the case, it should be changed to disableLocks - no point in duplicating lucene. And if there are other changes that are needed to get Lucene to work from an applet along these lines - let us know. Erik
RE: Lucene Search Applet
Hi all, The changes I made to get past the System.getProperty issues are essentially the same in the three files org.apache.lucene.index.IndexWriter, org.apache.lucene.store.FSDirectory, and org.apache.lucene.search.BooleanQuery. Change the static initializations from a form like this:
public static long WRITE_LOCK_TIMEOUT =
    Integer.parseInt(System.getProperty("org.apache.lucene.writeLockTimeout", "1000"));
to a separate declaration and static initializer block like this:
public static long WRITE_LOCK_TIMEOUT;
static {
  try {
    WRITE_LOCK_TIMEOUT =
        Integer.parseInt(System.getProperty("org.apache.lucene.writeLockTimeout", "1000"));
  } catch (Exception e) {
    WRITE_LOCK_TIMEOUT = 1000;
  }
}
As before, the variables are initialized when the class is loaded, but if the System.getProperty fails, the variable still gets initialized to its default value in the catch block. You can use a separate static block for each variable, or put them all into a single static block. You could also add a setter for each variable if you want the ability to set the value separately from the class init. In the FSDirectory class, the variables DISABLE_LOCKS and LOCK_DIR are marked final, which I had to remove to do the initialization as described. I've also attached the three modified files if you want to just copy and paste. --Jon
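The change Jon describes in this message (wrapping each System.getProperty call so that a sandbox refusal falls back to the default) can also be written as a small helper method. This is a sketch under assumptions, not the patch that was attached to the mail: the class SafeProps and the method longProperty are hypothetical names, and it uses Java 7 multi-catch, which is newer than the JDK discussed in this thread. One possible advantage of a helper is that the field can stay final.

```java
// Sketch of a property lookup that survives an applet sandbox refusing the read.
class SafeProps {

    // Hypothetical helper: returns the numeric value of a system property, or
    // `fallback` if the property is unset, unreadable, or not a valid number.
    static long longProperty(String name, long fallback) {
        try {
            return Long.parseLong(System.getProperty(name, Long.toString(fallback)));
        } catch (SecurityException | NumberFormatException e) {
            // SecurityException covers the AccessControlException an unsigned applet gets
            return fallback;
        }
    }

    // The field can remain final because the try/catch lives inside the helper.
    static final long WRITE_LOCK_TIMEOUT =
            longProperty("org.apache.lucene.writeLockTimeout", 1000L);

    public static void main(String[] args) {
        System.out.println(WRITE_LOCK_TIMEOUT);
    }
}
```

Catching only SecurityException and NumberFormatException is a deliberate narrowing of the catch (Exception e) used in the mail; whether that is preferable depends on what else can go wrong in the target environment.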
Re: Lucene for Indian Languages
In fact, CJKAnalyzer also works well with Indian languages. Since CJKAnalyzer treats multi-byte characters as a special case, it works with most Asian multi-byte characters. I introduced CJKAnalyzer for Japanese text search and we also tested it with Hindi and Telugu. All our search test cases passed. Give CJKAnalyzer a try. You will find it a better analyzer than the standard one (for any Asian language). Praveen
- Original Message - From: Satish Kagathare [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Monday, August 23, 2004 9:20 AM Subject: Re: Lucene for Indian Languages
Hi Srinivasa, Use StandardAnalyzer for indexing and for parsing queries on Indian-language docs. It will work. Right now we are searching Hindi and Marathi, but without specific stemmers and filters. We are planning to develop a Marathi morphological analyzer. Thanks, Satish.
On Sun, 22 Aug 2004, srinivasa raghavan wrote: Hi all, Is the Lucene API implemented for Indian contexts? I know that Lucene has stemmers and filters for German and Russian. I would like to know whether stemmers and filters are available or being developed for Indian languages. Thanks, Rahavan.
RE: spanish stemmer
Do you mind sharing how you implemented your SpanishAnalyzer using Snowball? Sorry I can't help with your question. I am trying to implement Snowball Spanish or a Spanish Analyzer in Lucene. thanks, chad.
Re: spanish stemmer
Yes, it is very easy. You need a wrapper that initializes Snowball for Spanish:
analyzer = new SnowballAnalyzer("Spanish", SPANISH_STOP_WORDS);
The complete code is below. Bye, Ernesto.
--
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;

public class SpanishAnalyzer extends Analyzer {
  // One delegate per instance; a static field would be overwritten by every constructor call.
  private final SnowballAnalyzer analyzer;

  // Default Spanish stop words.
  private static final String[] SPANISH_STOP_WORDS = {
    "un", "una", "unas", "unos", "uno", "sobre", "todo", "también", "tras", "otro",
    "algún", "alguno", "alguna", "algunos", "algunas", "ser", "es", "soy", "eres", "somos",
    "sois", "estoy", "esta", "estamos", "estais", "estan", "en", "para", "atras", "porque",
    "por qué", "estado", "estaba", "ante", "antes", "siendo", "ambos", "pero", "por", "poder",
    "puede", "puedo", "podemos", "podeis", "pueden", "fui", "fue", "fuimos", "fueron", "hacer",
    "hago", "hace", "hacemos", "haceis", "hacen", "cada", "fin", "incluso", "primero", "desde",
    "conseguir", "consigo", "consigue", "consigues", "conseguimos", "consiguen", "ir", "voy", "va", "vamos",
    "vais", "van", "vaya", "bueno", "ha", "tener", "tengo", "tiene", "tenemos", "teneis",
    "tienen", "el", "la", "lo", "las", "los", "su", "aqui", "mio", "tuyo",
    "ellos", "ellas", "nos", "nosotros", "vosotros", "vosotras", "si", "dentro", "solo", "solamente",
    "saber", "sabes", "sabe", "sabemos", "sabeis", "saben", "ultimo", "largo", "bastante", "haces",
    "muchos", "aquellos", "aquellas", "sus", "entonces", "tiempo", "verdad", "verdadero", "verdadera", "cierto",
    "ciertos", "cierta", "ciertas", "intentar", "intento", "intenta", "intentas", "intentamos", "intentais", "intentan",
    "dos", "bajo", "arriba", "encima", "usar", "uso", "usas", "usa", "usamos", "usais",
    "usan", "emplear", "empleo", "empleas", "emplean", "empleamos", "empleais", "valor", "muy", "era",
    "eras", "eramos", "eran", "modo", "bien", "cual", "cuando", "donde", "mientras", "quien",
    "con", "entre", "sin", "trabajo", "trabajar", "trabajas", "trabaja", "trabajamos", "trabajais", "trabajan",
    "podria", "podrias", "podriamos", "podrian", "podriais", "yo", "aquel", "mi", "de", "a",
    "e", "i", "o", "u"
  };

  // Uses the default stop word list above.
  public SpanishAnalyzer() {
    analyzer = new SnowballAnalyzer("Spanish", SPANISH_STOP_WORDS);
  }

  // Uses a caller-supplied stop word list.
  public SpanishAnalyzer(String stopWords[]) {
    analyzer = new SnowballAnalyzer("Spanish", stopWords);
  }

  // Delegates tokenizing, stop word removal and stemming to the Snowball analyzer.
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return analyzer.tokenStream(fieldName, reader);
  }
}
Re: spanish stemmer
Ernesto,

http://snowball.tartarus.org/texts/introduction.html might help with your understanding. The link gives basic information on why stemmers are valuable (though not necessarily any insight into how the Spanish version works). Of course, they don't solve every problem, and in some cases they may make things worse. A stemmer is not required to return a whole word.

Hope this helps.

[EMAIL PROTECTED] 8/23/2004 9:29:30 AM

Hello,

I use the Snowball jar to implement my SpanishAnalyzer. I found that words ending in 'bol' are not stemmed. For example, in Spanish you can say either basquet or basquetbol for basketball, but the SpanishStemmer treats them as different words. The same goes for voley and voleybol. It is not the same with futbol (football): we do not say fut for futbol, and 'fut' does not exist in Spanish.

Do you think I am right? Can you change this?

Ernesto.
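Since a stemmer only removes endings it has rules for, one common workaround for pairs such as basquet/basquetbol is to map the variants onto one form before the token reaches the stemmer. The following is a minimal sketch under that assumption; the class name `StemOverrides` and its table are hypothetical and are not part of Snowball or Lucene.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps word variants that the stemmer does not
// conflate onto one canonical form. Not part of Snowball or Lucene.
class StemOverrides {
    private final Map<String, String> overrides = new HashMap<String, String>();

    StemOverrides() {
        // Variants mentioned in this thread; extend the table as needed.
        overrides.put("basquetbol", "basquet");
        overrides.put("voleybol", "voley");
        // "futbol" is deliberately absent: "fut" is not a Spanish word.
    }

    // Returns the registered canonical form, or the lowercased token
    // unchanged when no override exists.
    String normalize(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        String mapped = overrides.get(lower);
        return mapped != null ? mapped : lower;
    }

    public static void main(String[] args) {
        StemOverrides overrides = new StemOverrides();
        System.out.println(overrides.normalize("Basquetbol")); // prints "basquet"
        System.out.println(overrides.normalize("futbol"));     // prints "futbol"
    }
}
```

The same table has to be applied both when indexing and when parsing queries, otherwise the two sides will disagree on the stored form.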
RE: spanish stemmer
Excellent, Ernesto. Was there a reason you used your own stop word list rather than the single-argument constructor SnowballAnalyzer("Spanish")?

thanks,
chad.

-Original Message-
From: Ernesto De Santis [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 2:03 PM
To: Lucene Users List
Subject: Re: spanish stemmer

Yes, it is quite easy. You need a wrapper that initializes the Snowball analyzer for Spanish:

    analyzer = new SnowballAnalyzer("Spanish", SPANISH_STOP_WORDS);

Bye,
Ernesto.
Re: spanish stemmer
Because neither SnowballAnalyzer nor SpanishStemmer has a default stop word set.

The SnowballAnalyzer constructor:

    /** Builds the named analyzer with no stop words. */
    public SnowballAnalyzer(String name) {
        this.name = name;
    }

Note the comment.

Bye,
Ernesto.

- Original Message -
From: Chad Small [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, August 23, 2004 4:57 PM
Subject: RE: spanish stemmer

Excellent, Ernesto. Was there a reason you used your own stop word list rather than the single-argument constructor SnowballAnalyzer("Spanish")?

thanks,
chad.
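To make the effect of "no stop words" concrete, here is a small stand-alone sketch. It is a toy whitespace tokenizer with a stop set, written only for illustration; `StopWordDemo` is a hypothetical name and the code does not use or reproduce Lucene's StopFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy stand-in for a stop filter. It only shows the difference between an
// empty and a non-empty stop word set; it is not Lucene's implementation.
class StopWordDemo {
    // Splits on whitespace, lowercases, and drops tokens found in stopWords.
    static List<String> filter(String text, String[] stopWords) {
        Set<String> stops = new HashSet<String>(Arrays.asList(stopWords));
        List<String> kept = new ArrayList<String>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !stops.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "el camino de la casa";
        // Empty stop set: every token is kept.
        System.out.println(filter(text, new String[0]));
        // prints [el, camino, de, la, casa]

        // Custom stop set: articles and prepositions are dropped.
        System.out.println(filter(text, new String[] {"el", "la", "de"}));
        // prints [camino, casa]
    }
}
```

With an empty list, very frequent words such as el, la and de all end up as index terms, which is why the wrapper passes an explicit Spanish list.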
RE: spanish stemmer
One more question to the group. From what I have gathered, my choices for indexing and querying Spanish content are:

1. StandardAnalyzer (I read that this analyzer could be used for European languages)
2. SnowballAnalyzer("Spanish", SPANISH_STOP_WORDS), with the custom stop words from Ernesto's class

Can I assume that choice 2 would be the better one for Spanish content?

thanks,
chad.

-Original Message-
From: Ernesto De Santis [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 3:31 PM
To: Lucene Users List
Subject: Re: spanish stemmer

Because neither SnowballAnalyzer nor SpanishStemmer has a default stop word set.

The SnowballAnalyzer constructor:

    /** Builds the named analyzer with no stop words. */
    public SnowballAnalyzer(String name) {
        this.name = name;
    }

Note the comment.

Bye,
Ernesto.
Re: spanish stemmer
Hi Chad,

> One more question to the group. From what I have gathered, my choices for indexing and querying Spanish content are:
>
> 1. StandardAnalyzer (I read that this analyzer could be used for European languages)

StandardAnalyzer is not specific to European languages; it is more of a generic analyzer.

> 2. SnowballAnalyzer("Spanish", SPANISH_STOP_WORDS), with the custom stop words from Ernesto's class
>
> Can I assume that choice 2 would be the better one for Spanish content?

Yes, it is much better. For example, with StandardAnalyzer, caminar, caminantes, camino, etc. are different words, and a search returns a hit only on an exact match. With SpanishAnalyzer they are the same word, because all three derive from caminar. If one document in your index contains the word caminante, you can get the hit by searching for any of the other forms.

A stemmer strips word endings according to the rules of the language (Spanish, in our case). caminar, caminantes and camino are all stored as camin (camin does not exist in Spanish). This improves the quality of the hits.

> thanks,
> chad.

Bye,
Ernesto.
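The conflation described above can be sketched with a deliberately tiny suffix stripper. This is only an illustration: the suffix list and the `ToySpanishStemmer` class are made up for this example and are far simpler than the actual Snowball Spanish algorithm.

```java
import java.util.Locale;

// Toy suffix stripper, NOT the Snowball Spanish stemmer. It only covers the
// endings needed to show caminar, caminantes and camino sharing one stem.
class ToySpanishStemmer {
    // Ordered longest first so the longest matching ending is removed.
    private static final String[] SUFFIXES = {"antes", "ante", "ar", "o"};

    // Removes the first matching ending, keeping at least three letters.
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println(stem("caminar"));    // prints "camin"
        System.out.println(stem("caminantes")); // prints "camin"
        System.out.println(stem("camino"));     // prints "camin"
    }
}
```

Because the index stores the stem and the query is stemmed the same way, a search for camino matches a document that contains caminante, even though camin itself is not a Spanish word.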
Re: Lucene for Indian Languages
Hi Satish,

Thank you for the pointers. Actually, I am able to search Indian-language data by storing the content in the index in ISCII encoding. When I search, the search words are also converted to ISCII before they are run against the Lucene index. It works pretty well. I was just wondering whether any stemmers and filters are available.

How are you searching Hindi and Marathi? In which encoding are you storing the data? Can you give me some details?

Thanks,
Raghavan.

--- Satish Kagathare [EMAIL PROTECTED] wrote:

Hi Srinivasa,

Use StandardAnalyzer for indexing and for parsing queries on Indian-language documents. It will work. Right now we are searching Hindi and Marathi, but without specific stemmers and filters. We are planning to develop a Marathi morphological analyzer.

Thanks,
Satish.

On Sun, 22 Aug 2004, srinivasa raghavan wrote:

Hi all,

Is the Lucene API implemented for Indian contexts? I know that Lucene has stemmers and filters for German and Russian. I would like to know whether stemmers and filters are available or being developed for Indian languages.

Thanks,
Raghavan.
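The approach described at the top of this message relies on applying the same conversion to the indexed text and to the query. Below is a minimal sketch of that principle. As an assumption made only for this example, Unicode NFC normalization is swapped in for the ISCII conversion, since an ISCII charset is not guaranteed to be present in every Java runtime, and a plain map stands in for the Lucene index. `NormalizedIndexDemo` is a hypothetical name.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy index keyed by normalized terms. NFC normalization stands in for the
// ISCII conversion described in the message; a map stands in for Lucene.
class NormalizedIndexDemo {
    private final Map<String, Set<Integer>> postings =
            new HashMap<String, Set<Integer>>();

    // The single transformation shared by indexing and searching.
    static String normalize(String term) {
        return Normalizer.normalize(term, Normalizer.Form.NFC);
    }

    // Records that the document contains the term, under its normalized key.
    void add(int docId, String term) {
        String key = normalize(term);
        Set<Integer> docs = postings.get(key);
        if (docs == null) {
            docs = new HashSet<Integer>();
            postings.put(key, docs);
        }
        docs.add(docId);
    }

    // Looks the term up after applying the same normalization.
    Set<Integer> search(String term) {
        Set<Integer> docs = postings.get(normalize(term));
        return docs != null ? docs : new HashSet<Integer>();
    }

    public static void main(String[] args) {
        NormalizedIndexDemo index = new NormalizedIndexDemo();
        // Hindi word written with the precomposed letter U+0958.
        index.add(1, "\u0958\u093F\u0932\u093E");
        // Same word typed as U+0915 followed by the nukta sign U+093C.
        System.out.println(index.search("\u0915\u093C\u093F\u0932\u093E").contains(1));
    }
}
```

The two spellings are different code point sequences, so they would not match as raw strings; they match here only because both sides pass through `normalize` first.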