RE: WebLucene 0.4 released: added full featured demo(dump data php scripts and demo data in Chinese)
Hi,

I am using the downloaded WebLucene. I have started my Tomcat server and tried to search by clicking the search button, but it says the search page cannot be found. I also cannot find that page in the package. Can anyone help? Am I missing anything?

-----Original Message-----
From: Che Dong [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 17, 2003 1:53 AM
To: Lucene Users List
Subject: Re: WebLucene 0.4 released: added full featured demo(dump data php scripts and demo data in Chinese)

sorry, demo address is:
http://www.blogchina.com/weblucene/

Che, Dong

----- Original Message -----
From: "Che Dong" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, December 17, 2003 1:33 AM
Subject: WebLucene 0.4 released: added full featured demo(dump data php scripts and demo data in Chinese)

> http://sourceforge.net/projects/weblucene/
>
> WebLucene:
> Lucene search engine XML interface, providing SAX-based indexing, indexing-sequence-based result sorting and XML output with highlight support.
>
> The key features:
> 1 The bi-gram based CJK support: org/apache/lucene/analysis/cjk/CJKTokenizer
>   The CJKTokenizer supports Chinese, Japanese and Korean together with Western languages.
>
> 2 DocID based result sorting: org/apache/lucene/search/IndexOrderSearcher
>
> 3 xml output: com/chedong/weblucene/search/DOMSearcher
>
> 4 sax based indexing: com/chedong/weblucene/index/SAXIndexer
>
> 5 token based highlighter:
>   reverse StopTokenzier:
>   org/apache/lucene/anlysis/HighlightAnalyzer.java
>   HighlightFilter.java
>   with abstract:
>   com/chedong/weblucene/search/WebluceneHighlighter
>
> 6 A simplified query parser:
>   google like syntax with term limit
>   org/apache/lucene/queryParser/SimpleQueryParser
>   modified from early version of Lucene :)
>
> 7 Added a full-featured demo (including dump script and sample data), running at: http://www.blogchina.com/weblucene/
>
> Regards
>
> Che Dong
> http://www.chedong.com/tech/weblucene.html

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
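For readers unfamiliar with the "bi-gram based CJK support" listed as feature 1 above, here is a minimal sketch of the general idea. It is not the WebLucene CJKTokenizer source: the class `BigramSketch`, its methods, and the choice of Unicode blocks are assumptions made for illustration only. Runs of CJK characters are emitted as overlapping two-character tokens, while Western words are kept whole and lowercased. The sample input in `main` is the word "Lucene" followed by four Chinese ideographs, written as Unicode escapes so the file compiles under any source encoding.

```java
import java.util.ArrayList;
import java.util.List;

class BigramSketch {
    // Hypothetical helper: a rough CJK test, not the real tokenizer's rule.
    static boolean isCjk(char c) {
        Character.UnicodeBlock b = Character.UnicodeBlock.of(c);
        return b == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
            || b == Character.UnicodeBlock.HIRAGANA
            || b == Character.UnicodeBlock.KATAKANA
            || b == Character.UnicodeBlock.HANGUL_SYLLABLES;
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (isCjk(c)) {
                // Collect the whole CJK run, then emit overlapping pairs.
                int start = i;
                while (i < text.length() && isCjk(text.charAt(i))) {
                    i++;
                }
                if (i - start == 1) {
                    tokens.add(text.substring(start, i)); // lone character
                }
                for (int j = start; j + 1 < i; j++) {
                    tokens.add(text.substring(j, j + 2));
                }
            } else if (Character.isLetterOrDigit(c)) {
                // Western word: keep it whole, lowercased.
                int start = i;
                while (i < text.length()
                        && Character.isLetterOrDigit(text.charAt(i))
                        && !isCjk(text.charAt(i))) {
                    i++;
                }
                tokens.add(text.substring(start, i).toLowerCase());
            } else {
                i++; // skip whitespace and punctuation
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "Lucene" plus four ideographs: one word token and three bigrams.
        List<String> tokens = tokenize("Lucene \u641c\u7d22\u5f15\u64ce");
        System.out.println(tokens.size() + " tokens: " + tokens);
    }
}
```

Indexing overlapping pairs means a two-character query matches anywhere inside a longer run without needing a dictionary-based word segmenter, at the cost of a larger index.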
Web Lucene classes.
Hi,

When I downloaded the WebLucene source, I did not see the classes directory that install.txt refers to. Does anyone know how to get the complete set of WebLucene classes, for example as a jar file?

When I type the command:

java IndexRunner

I get the following message:

Exception in thread "main" java.lang.NoClassDefFoundError: IndexRunner

I have set the classpath to the directory containing the WebLucene class files, but it still does not work. Please advise.
RE: Web Lucene Question.
I am using the beta version. When I typed the command:

ant -version

I got the following:

Apache Ant version 1.6beta3 compiled on December 5 2003

I am downloading the previous version to see whether that improves things.

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Sunday, December 14, 2003 12:42 AM
To: Lucene Users List
Subject: Re: Web Lucene Question.

On Saturday, December 13, 2003, at 11:20 AM, Tun Lin wrote:
> Hi,
>
> I have tried to type the following at the Windows command line in the
> weblucene directory:
>
> ant build
>
> Everything seems to work fine except the following error:

Everything works fine but it fails miserably?! :)

> java.lang.InstantiationException: org.apache.tools.ant.Main

This says Ant did not even launch. I'm not sure which "weblucene" you mean here - the built-in demo?

My guess is your Ant installation has issues. What does "ant -version" tell you? How about "ant -diagnostics"? And finally "ant -projecthelp -verbose"

Erik - aka Mr. Ant
Web Lucene Question.
Hi,

I have tried to type the following at the Windows command line in the weblucene directory:

ant build

Everything seems to work fine except the following error:

java.lang.InstantiationException: org.apache.tools.ant.Main
    at java.lang.Class.newInstance0(Class.java:293)
    at java.lang.Class.newInstance(Class.java:261)
    at org.apache.tools.ant.launch.Launcher.run(Launcher.java:214)
    at org.apache.tools.ant.launch.Launcher.main(Launcher.java:90)

I have set the necessary classpath but still get the error mentioned above. Can anyone help?
Chinese input.
Hi,

Has anyone tried searching Chinese text with Lucene? Hope to hear from someone soon. :-)
RE: XMLIndexingDemo.
Or one that supports all XML files in a particular directory?

-----Original Message-----
From: Tun Lin [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 04, 2003 6:27 PM
To: Lucene user list
Subject: XMLIndexingDemo.

Hi,

I have tried the XMLIndexingDemo. It only supports indexing one XML file at a time and deletes the old one. Also, a customerInfo tag can have only 1 . Is there an open-source tool that supports 1 customerInfo tag with many ?
XMLIndexingDemo.
Hi,

I have tried the XMLIndexingDemo. It only supports indexing one XML file at a time and deletes the old one. Also, a customerInfo tag can have only 1 . Is there an open-source tool that supports 1 customerInfo tag with many ?
RE: New Lucene-powered Website (TO Tun Lin)
Hi,

It's OK. Take your time. :-)

-----Original Message-----
From: lhelper [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 03, 2003 9:29 AM
To: Lucene Users List; [EMAIL PROTECTED]
Subject: Re: New Lucene-powered Website (TO Tun Lin)

> Does anyone have install instructions for running luceneweb on Windows? I
> cannot even see the first page when I start Tomcat, though I have
> weblucene in the webapps directory.
>
> Can anyone help? Please.
>

It's a bug in the weblucene tarball. We'll fix it ASAP, and a small sample index will be added to the tarball. Please be patient!

Good luck!
Translation.
Hi,

Can anyone translate this text for me? I cannot understand the instructions. Please help! Thanks.

LUCY 1.1 | readme.txt | Last updated: 18/03/2003

STRUCTURE

Lucy 1.1
-> Lucene 1.2
-> HTMLParser 1.2
-> PdfBox 0.5.6
-> wvWare 0.7.2-3
-> xlhtml 0.4.9
-> antiword 0.33
-> Xpdf 2.01
-> Snowball 0.1
-> NGramJ 01.12.11
-> it.corila.lucy
-> IndexAll.java
-> SearchIndex.java
-> HTMLDocument.java
-> PDFDocument.java
-> ExternalParser.java
-> ItalianStemFilter.java
-> EnglishStemFilter.java
-> ApostropheFilter.java
-> IndexAnalyzer.java
-> SearchAnalyzer.java
-> LanguageCategorizer
-> NgramjCategorizer.java

DESCRIPTION

Lucy can index all files with the extensions txt, html, pdf, doc, ppt and xls contained in a base folder and its subfolders. It allows searches from the DOS command line or through a web interface. It handles texts in Italian and English with language-specific lexical processing.

SUPPORTED OPERATING SYSTEMS

Windows 98 / Windows 2000 / Windows XP

SYSTEM REQUIREMENTS

None, apart from the permissions needed to write files to a folder on the system. To use the search module with the web interface, Apache Tomcat version 3 or 4 is required.

INSTALLATION

Run the automatic installer Lucy1.1.exe, or unpack the file Lucy1.1.zip into a folder (NB: the path must not contain spaces). By default the application uses its own Java virtual machine. You can use another one already installed on the system by changing the value of the MYJAVAPATH variable in the file jvm.bat. In that case the jre folder can be deleted to reduce disk usage by about 40 MBytes.

CONFIGURATION

Edit the values of the variables in the file properties.txt, in the application's base folder:

lucy.path: folder where the application is installed
log.files.dir: folder where the log files will be created
del.temp.files: delete temporary files at the end of indexing (yes/no)
doc.parser: parser to use for .doc files (antiword/wvware)
pdf.parser: parser to use for .pdf files (xpdf/pdfbox)
index.dir: folder where the indexes will be stored
index.name: name of the index to be created
indexing.folder: folder to be indexed

IMPORTANT: all paths must be written using two backslashes (\\) as the directory separator instead of a single one.

HOW TO USE

The three batch files in the application's base folder can be run directly from Windows with a double click.

indicizza.bat creates an index
aggiorna.bat updates an index
cerca.bat searches an index

All required parameters (name and location of the index, path of the folder to index) must be specified beforehand in the file properties.txt. Alternatively, you can run the procedures from the DOS command line, again after editing properties.txt. In this case, with the syntax:

cerca index-path

you can also search other previously created indexes without modifying properties.txt.

NOTES ON USING THE PARSERS

The default values set for the parsers are the ones recommended for the first indexing run. Later you can change them to the alternative values and run an index update. This way, documents that were not indexed because of parsing errors are also processed by the two alternative parsers.

If parsing leads to system errors that force the indexing process to stop, you can resume indexing from where it stopped by using the update procedure, taking care to remove the file write.lock from the folder containing the index files before launching it.

USE AS A WEB APPLICATION FOR TEXT SEARCHES

If the Apache Tomcat servlet and JSP engine is installed on the system, you can search the indexes through an inte
RE: SearchBlox J2EE Search Component Version 1.1 released
Does anyone know of a search engine that supports XML formats?

-----Original Message-----
From: Robert Selvaraj [mailto:[EMAIL PROTECTED]
Sent: Wednesday, December 03, 2003 12:36 AM
To: Lucene Users List
Subject: Re: SearchBlox J2EE Search Component Version 1.1 released

No. The formats supported by SearchBlox are given here:
http://www.searchblox.com/faqs/question.php?qstId=5

Tun Lin wrote:
> Hi,
>
> Does it support xml?
>
> -----Original Message-----
> From: Tate Avery [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, December 02, 2003 11:45 PM
> To: Lucene Users List
> Subject: RE: SearchBlox J2EE Search Component Version 1.1 released
>
> If you buy it, apparently:
> http://www.searchblox.com/buy.html
>
> -----Original Message-----
> From: Tun Lin [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, December 02, 2003 10:43 AM
> To: 'Lucene Users List'; [EMAIL PROTECTED]
> Subject: RE: SearchBlox J2EE Search Component Version 1.1 released
>
> Hi,
>
> Just some feedback.
>
> SearchBlox can only search HTML files. Will SearchBlox support
> PDF, XML and Word documents in the future? It would be perfect if it
> could support all the document types mentioned above.
>
> -----Original Message-----
> From: Robert Selvaraj [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, December 02, 2003 10:42 PM
> To: Lucene Users List; [EMAIL PROTECTED]
> Subject: SearchBlox J2EE Search Component Version 1.1 released
>
> SearchBlox is a J2EE search component that enables you to add search
> functionality to your applications, intranets or portals in a matter of minutes.
> SearchBlox uses Lucene Search API and features integrated HTTP and
> File System crawlers, support for different document formats, support
> for indexing and searching content in 15 languages and customizable
> search results, all controlled from a browser-based Admin Console.
>
> Main features in this update:
> =============================
> - Asian language support. SearchBlox now supports Japanese, Chinese
>   Simplified, Chinese Traditional and Korean language content.
> - Performance enhancements to search
> - Improved Hit Highlighting
>
> SearchBlox is available as a Web Archive (WAR) and is deployable on
> any Servlet 2.3/JSP 1.2 compliant server. SearchBlox Getting-Started
> Guides are available for the following servers:
>
> JBoss - http://www.searchblox.com/gettingstarted_jboss.html
> Jetty - http://www.searchblox.com/gettingstarted_jetty.html
> JRun - http://www.searchblox.com/gettingstarted_jrun.html
> Pramati - http://www.searchblox.com/gettingstarted_pramati.html
> Resin - http://www.searchblox.com/gettingstarted_resin.html
> Tomcat - http://www.searchblox.com/gettingstarted_tomcat.html
> Weblogic - http://www.searchblox.com/gettingstarted_weblogic.html
> Websphere - http://www.searchblox.com/gettingstarted_websphere.html
>
> The SearchBlox FREE Edition is available free of charge and can index
> up to 1000 HTML documents.
>
> The software can be downloaded from http://www.searchblox.com
RE: SearchBlox J2EE Search Component Version 1.1 released
Hi,

Does it support xml?
RE: SearchBlox J2EE Search Component Version 1.1 released
Hi,

Just some feedback.

SearchBlox can only search HTML files. Will SearchBlox support PDF, XML and Word documents in the future? It would be perfect if it could support all the document types mentioned above.
RE: SearchBlox J2EE Search Component Version 1.1 released
Wow. Bravo. This is a fantastic search component. Thank you for providing this information. :-) Three cheers!
RE: New Lucene-powered Website
Hi,

I am very keen on using the new luceneweb. Has anyone managed to run luceneweb successfully on Windows? The instructions in luceneweb seem to cover Unix more than Windows. Does anyone have install instructions for running luceneweb on Windows? I cannot even see the first page when I start Tomcat, though I have weblucene in the webapps directory.

Can anyone help? Please.

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 02, 2003 8:35 PM
To: Lucene Users List
Subject: Re: New Lucene-powered Website

Could you add a Lucene logo somewhere on your search results, as noted here: http://jakarta.apache.org/lucene/docs/powered.html ?

Thanks!
Otis

--- Ulrich Mayring <[EMAIL PROTECTED]> wrote:
> Hello,
>
> we (DENIC) are the world's second largest domain registry (.de-zone
> has almost 6.9 million domains) and are using Lucene to index and
> search our website in a high-traffic scenario. Most of our web pages
> are available in English in addition to our native language German. If
> you want to try our Lucene-based search engine, please start here:
>
> http://www.denic.de/en/special/index.jsp
>
> Use the input field on the page to search our website. Don't use the
> input field at the top right, that is only for searching domains in
> our domain database, it has nothing to do with Lucene.
>
> The indexes for German and English are separate, so you should find
> only English pages from that page.
>
> A somewhat interesting feature is the summarizer: on the results page
> you'll get a short summary of the page. These are not hand-written
> blurbs, rather they are generated automatically from the HTML pages at
> indexing time. I'd be especially interested in improvement suggestions
> in this area.
>
> Naturally, the automatically generated texts don't have the same
> quality as hand-written ones. But they're better than nothing and in
> my eyes more useful than Google-style excerpts. How many times has it
> happened to you that the Google excerpt doesn't really tell you
> anything, because it's totally out of context? Summaries tell you what
> the whole page is about, regardless of the context within which your
> search terms may appear. After reading the summary you should
> (hopefully) be able to decide whether the page contains the info
> you're looking for. Comments welcome!
>
> We're using the snowball stemmers/analyzers for German and English,
> custom stopword lists and the HTML parser from the Sourceforge
> htmlparser project. Apart from that it's vanilla Lucene.
>
> cheers,
>
> Ulrich
RE: WebLucene 0.3 release:support CJK, use sax based indexing, docID based result sorting and xml format output with highlighting support.
Hi Che Dong,

In the install.txt in the package, could you extend the part on preparing the environment to cover Windows setup? I think what you wrote there is for UNIX setup. I still cannot get my system working. Please help. Thanks.

-----Original Message-----
From: Che Dong [mailto:[EMAIL PROTECTED]
Sent: Monday, December 01, 2003 4:21 PM
To: Lucene Users List; [EMAIL PROTECTED]
Subject: Re: WebLucene 0.3 release:support CJK, use sax based indexing, docID based result sorting and xml format output with highlighting support.

build..properties.default

# ---------------------------------------------------------
# WebLucene BUILD PROPERTIES
# ---------------------------------------------------------
jsdk_jar=/usr/local/resin/lib/jsdk23.jar

# Home directory of JavaCC
javacc.home = /usr/java/javacc/bin

# modify following on Windows
# jsdk_jar=c:\\resin\\lib\\jsdk23.jar
# javacc.home = c:\\java\\javacc\\bin

javacc.zip.dir = ${javacc.home}/lib
javacc.zip = ${javacc.zip.dir}/JavaCC.zip

Che, Dong

----- Original Message -----
From: "Tun Lin" <[EMAIL PROTECTED]>
To: "'Lucene Developers List'" <[EMAIL PROTECTED]>; "'Lucene Users List'" <[EMAIL PROTECTED]>
Sent: Monday, December 01, 2003 11:34 AM
Subject: RE: WebLucene 0.3 release:support CJK, use sax based indexing, docID based result sorting and xml format output with highlighting support.

> Hi,
>
> Do you have the install.txt for Windows XP setup of WebLucene? It seems that
> the install.txt is only for UNIX setup.
>
> Thanks.
>
> -----Original Message-----
> From: Che Dong [mailto:[EMAIL PROTECTED]
> Sent: Sunday, November 30, 2003 9:57 PM
> To: Lucene Developers List; Lucene Users List
> Subject: WebLucene 0.3 release:support CJK, use sax based indexing, docID based
> result sorting and xml format output with highlighting support.
>
> http://sourceforge.net/projects/weblucene/
>
> WebLucene:
> Lucene search engine XML interface, providing SAX-based indexing, indexing
> sequence based result sorting and XML output with highlight support. The
> CJKTokenizer supports Chinese, Japanese and Korean together with Western
> languages.
>
> The key features:
> 1 The bi-gram based CJK support: org/apache/lucene/analysis/cjk/CJKTokenizer
>
> 2 docID based result sorting: org/apache/lucene/search/IndexOrderSearcher
>
> 3 xml output: com/chedong/weblucene/search/DOMSearcher
>
> 4 sax based indexing: com/chedong/weblucene/index/SAXIndexer
>
> 5 token based highlighter:
>   reverse StopTokenzier:
>   org/apache/lucene/anlysis/HighlightAnalyzer.java
>   HighlightFilter.java
>   with abstract:
>   com/chedong/weblucene/search/WebluceneHighlighter
>
> 6 A simplified query parser:
>   google like syntax with term limit
>   org/apache/lucene/queryParser/SimpleQueryParser
>   modified from early version of Lucene :)
>
> Regards
>
> Che, Dong
RE: WebLucene 0.3 release:support CJK, use sax based indexing, docID based result sorting and xml format output with highlighting support.
Hi,

Do you have the install.txt for windows XP setup of the WebLucene? It seems that the install.txt is only for UNIX setup.

Thanks.
FileDocument.java
Hi Lucene experts,

Can you help with this? I have included the following code in FileDocument to print out the summary, but I get strange output. After searching, the summary is displayed as below:

ÐÏࡱá>þÿ UWþÿÿÿTÿ ÿ

    FileInputStream is = new FileInputStream(f);
    try {
        Reader reader = new BufferedReader(new InputStreamReader(is));
        char[] buf = new char[512];
        reader.read(buf);
        String a = new String(buf, 0, 510);
        doc.add(Field.Text("contents", reader));
        doc.add(Field.UnIndexed("summary", a));
        // return the document
    } catch (IOException e) {
        e.printStackTrace();
    }
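A possible explanation, offered as a guess rather than a diagnosis: the leading characters of that summary look like the signature bytes of an OLE2 compound file (the container used by binary Word and Excel documents) decoded as text, so the file being summarised was probably not plain text. Separately, `new String(buf, 0, 510)` ignores how many characters `read` actually returned. The sketch below uses only the standard library; the class `SummarySketch` and the methods `readHead` and `looksBinary` are hypothetical names, and the control-character check is a crude heuristic, not a real file-type detector.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class SummarySketch {
    /** Reads at most maxChars characters and returns only what was actually read. */
    static String readHead(Reader reader, int maxChars) {
        char[] buf = new char[maxChars];
        int total = 0;
        try {
            while (total < maxChars) {
                int n = reader.read(buf, total, maxChars - total);
                if (n < 0) {
                    break; // end of input before the buffer filled up
                }
                total += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buf, 0, total);
    }

    /** Heuristic (an assumption): control characters other than tab, LF, CR suggest binary data. */
    static boolean looksBinary(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String head = readHead(new StringReader("A short plain-text file."), 510);
        System.out.println(looksBinary(head) ? "(no summary: binary file)" : head);

        // First bytes of an OLE2 file, decoded one byte per character.
        String oleHead = "\u00d0\u00cf\u0011\u00e0";
        System.out.println(looksBinary(oleHead) ? "(no summary: binary file)" : oleHead);
    }
}
```

Note also that the characters consumed for the summary are no longer available to whatever reads from the same `Reader` afterwards, so in the posted code the "contents" field would miss the start of the file. Opening a second reader for the summary is one way around that.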
RE: Lucene refresh index function (incremental indexing).
I have deleted one of the text files I indexed and ran the following command:

java -Dlog4j.configuration=file:///c:/jarfiles/log4j.properties -Dlog4j.debug=true org.pdfbox.searchengine.lucene.IndexFiles -index c:\\index ..

root=..
java.io.IOException: Lock obtain timed out
    at org.apache.lucene.store.Lock.obtain(Lock.java:97)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:173)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:151)
    at org.pdfbox.searchengine.lucene.IndexFiles.index(IndexFiles.java:158)
    at org.pdfbox.searchengine.lucene.IndexFiles.main(IndexFiles.java:141)

I used the IndexFiles.java in PDFBox and got the error message above, but if I use the IndexFiles in Lucene, it works fine. Can anyone help?

-----Original Message-----
From: Daniel Naber [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 26, 2003 7:39 PM
To: Tun Lin
Subject: Re: Lucene refresh index function (incremental indexing).

On Wednesday, 26 November 2003 04:38, Tun Lin wrote:
> When I integrate with PDFBox, I cannot update, delete or change the
> filename anymore. If I did any of the above, I will get a message:
> Lock obtain timed out.

I think you have to make sure that IndexReader and IndexWriter are not open at the same time, as they will block each other.

Regards
Daniel
RE: log4j.properties
I have integrated Lucene and PDFBox and tried the following command to index files:

java -Dlog4j.configuration=log4j.xml org.pdfbox.searchengine.lucene.IndexFiles -create -index c:\\index ..

But I get the following error message:

log4j:WARN No appenders could be found for logger (org.pdfbox.pdfparser.PDFParser).
log4j:WARN Please initialize the log4j system properly.

Can anyone help?

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 26, 2003 5:19 PM
To: Lucene Users List
Subject: Re: log4j.properties

What does this have to do with Lucene?

On Wednesday, November 26, 2003, at 01:04 AM, Tun Lin wrote:
> I have created the following "log4j.properties" and put it in my
> classpath but it still has that error. Can anyone help?
>
> log4j.rootCategory=stdout
>
> log4j.appender.stdout=org.apache.log4j.ConsoleAppender
> log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
> log4j.appender.stdout.layout.ConversionPattern=%d %c - %m%n
>
log4j.properties
I have created the following "log4j.properties" and put it in my classpath but it still has that error. Can anyone help?

log4j.rootCategory=stdout

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %c - %m%n
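One likely cause, assuming log4j 1.x property syntax: `log4j.rootCategory` takes a level first and then a comma-separated list of appender names. With the value `stdout` on its own, that token is read as the level, so no appender ends up attached to the root logger, which would be consistent with the "No appenders could be found" warning. A sketch of the adjusted file follows; the `INFO` level is an assumption, and any valid level would do.

```properties
# Level first, then the appender name(s) to attach to the root logger
log4j.rootCategory=INFO, stdout

log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %c - %m%n
```

The file also has to be the one log4j actually loads: either named log4j.properties at the root of the classpath, or pointed to explicitly with `-Dlog4j.configuration`.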
Chinese input.
Hi,

How do I analyse Chinese text in Lucene? Do I use an Analyzer? If so, how do I go about using it?
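One approach mentioned on this list for CJK text is bi-gram tokenization, as in the CJKTokenizer announced with WebLucene: a run of Chinese characters is split into overlapping two-character tokens. The sketch below only illustrates that idea with plain Java strings. BigramSketch is a hypothetical name, this is not the CJKTokenizer source, and it assumes the input is a single run of CJK characters with no surrogate pairs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of overlapping bi-gram tokenization.
class BigramSketch {
    // Splits a run of CJK characters into overlapping two-character tokens.
    // A single character is returned as one token; empty input gives no tokens.
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<>();
        if (text.length() == 1) {
            tokens.add(text);
            return tokens;
        }
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(bigrams("中文搜索")); // prints [中文, 文搜, 搜索]
    }
}
```

The same splitting has to be applied to both the indexed text and the query, which in Lucene means using the same analyzer for indexing and searching.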
RE: Lucene refresh index function (incremental indexing).
When I integrate with PDFBox, I cannot update, delete or change the filename anymore. If I do any of the above, I get the message: Lock obtain timed out. Can anyone help?

-----Original Message-----
From: Pleasant, Tracy [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 11:42 PM
To: Lucene Users List; [EMAIL PROTECTED]
Subject: RE: Lucene refresh index function (incremental indexing).

I was able to get PDFBox to work with my JSP webpages. I think you will have to write your own code to handle the PDF files (while still calling the Lucene functions):

doc = LucenePDFDocument.getDocument(file);

-----Original Message-----
From: Tun Lin [mailto:[EMAIL PROTECTED]
Sent: Monday, November 24, 2003 11:07 PM
To: 'Lucene Users List'
Subject: RE: Lucene refresh index function (incremental indexing).

Does it support indexing the contents of pdf files? I have found one project called PDFBox that can be integrated with Lucene to search inside of the pdf files. Currently, Lucene can only search for the pdf filename.

I tried with PDFBox and I got the following message when I typed the command:

java org.apache.lucene.demo.IndexHTML -create -index c:\\index ..

log4j:WARN No appenders could be found for logger (org.pdfbox.pdfparser.PDFParser).
log4j:WARN Please initialize the log4j system properly.

Can anyone advise?

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 5:01 AM
To: Lucene Users List
Subject: Re: Lucene refresh index function (incremental indexing).

Tun Lin wrote:
> These are the steps I took:
>
> 1) I compile all the files in a particular directory using the command:
> java org.apache.lucene.demo.IndexHTML -create -index c:\\index ..
> , putting all the indexed files in c:\\index.
> 2) Everytime, I added an additional file in that directory. I need to
> reindex/recompile that directory to generate the indexes again. As the
> directory gets larger, the indexing takes a longer time.
>
> My question is how do I generate the indexes automatically everytime a
> new document is added in that directory without me recompiling everytime manually?

To update, try removing the '-create' from the command line. The demo code supports incremental updates. It will re-scan the directory and figure out which files have changed, what new files have appeared and which previously existing files have been removed.

Doug
RE: Lucene refresh index function (incremental indexing).
Does it support indexing the contents of pdf files? I have found one project called PDFBox that can be integrated with Lucene to search inside of the pdf files. Currently, Lucene can only search for the pdf filename.

I tried with PDFBox and I got the following message when I typed the command:

java org.apache.lucene.demo.IndexHTML -create -index c:\\index ..

log4j:WARN No appenders could be found for logger (org.pdfbox.pdfparser.PDFParser).
log4j:WARN Please initialize the log4j system properly.

Can anyone advise?

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 5:01 AM
To: Lucene Users List
Subject: Re: Lucene refresh index function (incremental indexing).

Tun Lin wrote:
> These are the steps I took:
>
> 1) I compile all the files in a particular directory using the command:
> java org.apache.lucene.demo.IndexHTML -create -index c:\\index ..
> , putting all the indexed files in c:\\index.
> 2) Everytime, I added an additional file in that directory. I need to
> reindex/recompile that directory to generate the indexes again. As the
> directory gets larger, the indexing takes a longer time.
>
> My question is how do I generate the indexes automatically everytime a
> new document is added in that directory without me recompiling everytime manually?

To update, try removing the '-create' from the command line. The demo code supports incremental updates. It will re-scan the directory and figure out which files have changed, what new files have appeared and which previously existing files have been removed.

Doug
RE: Lucene refresh index function (incremental indexing).
Will the final version 1.3 include an application that does the incremental updates automatically?

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 5:01 AM
To: Lucene Users List
Subject: Re: Lucene refresh index function (incremental indexing).

Tun Lin wrote:
> These are the steps I took:
>
> 1) I compile all the files in a particular directory using the command:
> java org.apache.lucene.demo.IndexHTML -create -index c:\\index ..
> , putting all the indexed files in c:\\index.
> 2) Everytime, I added an additional file in that directory. I need to
> reindex/recompile that directory to generate the indexes again. As the
> directory gets larger, the indexing takes a longer time.
>
> My question is how do I generate the indexes automatically everytime a
> new document is added in that directory without me recompiling everytime manually?

To update, try removing the '-create' from the command line. The demo code supports incremental updates. It will re-scan the directory and figure out which files have changed, what new files have appeared and which previously existing files have been removed.

Doug
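The re-scan described above (work out which files changed, which are new and which have gone) can be sketched without any Lucene calls. The following is an illustration only, not the IndexHTML demo code: SnapshotDiff is a hypothetical class, and it assumes that each scan of the directory produces a map from file path to last-modified time.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: compares two directory scans (path -> last-modified time).
class SnapshotDiff {
    final Set<String> added = new TreeSet<>();
    final Set<String> changed = new TreeSet<>();
    final Set<String> removed = new TreeSet<>();

    static SnapshotDiff diff(Map<String, Long> previous, Map<String, Long> current) {
        SnapshotDiff d = new SnapshotDiff();
        for (Map.Entry<String, Long> e : current.entrySet()) {
            Long old = previous.get(e.getKey());
            if (old == null) {
                d.added.add(e.getKey());          // not seen in the previous scan
            } else if (!old.equals(e.getValue())) {
                d.changed.add(e.getKey());        // same path, different timestamp
            }
        }
        for (String path : previous.keySet()) {
            if (!current.containsKey(path)) {
                d.removed.add(path);              // present before, missing now
            }
        }
        return d;
    }

    public static void main(String[] args) {
        Map<String, Long> before = new HashMap<>();
        before.put("a.html", 100L);
        before.put("b.html", 100L);
        before.put("c.html", 100L);
        Map<String, Long> after = new HashMap<>();
        after.put("a.html", 100L);
        after.put("b.html", 250L);
        after.put("d.html", 300L);
        SnapshotDiff d = diff(before, after);
        // prints added=[d.html] changed=[b.html] removed=[c.html]
        System.out.println("added=" + d.added + " changed=" + d.changed + " removed=" + d.removed);
    }
}
```

An indexer built on this would delete the documents for the removed and changed paths, then add documents for the added and changed paths, which matches the delete-then-add advice given earlier in the thread. Running such a scan on a schedule is one way to get "automatic" updates without rebuilding the whole index.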
RE: Lucene version 1.3.
I am now using 1.3RC2.

-----Original Message-----
From: Scott Smith [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 4:04 AM
To: 'Lucene Users List'; '[EMAIL PROTECTED]'
Subject: RE: Lucene version 1.3.

If you had to be in production in January, would you be using 1.3RC2 or 1.2?

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Monday, November 24, 2003 4:03 AM
To: Lucene Users List; [EMAIL PROTECTED]
Subject: Re: Lucene version 1.3.

Sorry, no firm date. However, 1.3 RC2 is pretty solid, so I suggest you just use that until 1.3 final is out.

Otis

--- Tun Lin <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Anyone knows when the full version of Lucene version 1.3 will be
> released?
>
> Please advise.
>
> Thanks.
RE: Lucene refresh index function (incremental indexing).
Can you elaborate on "you don't compile directory using Lucene"?

These are the steps I took:

1) I compile all the files in a particular directory using the command:
java org.apache.lucene.demo.IndexHTML -create -index c:\\index ..
, putting all the indexed files in c:\\index.
2) Every time I add a file to that directory, I need to reindex/recompile the directory to generate the indexes again. As the directory gets larger, the indexing takes longer.

My question is: how do I generate the indexes automatically every time a new document is added to that directory, without recompiling manually each time? How does Lucene detect new documents to be added to the indexes? I looked at the code, but the indexes are only generated for that directory after I run the command mentioned above. Is there code or a built-in function that allows Lucene to detect and build the indexes on its own?

-----Original Message-----
From: Victor Hadianto [mailto:[EMAIL PROTECTED]
Sent: Monday, November 24, 2003 1:07 PM
To: Lucene Users List
Subject: Re: Lucene refresh index function (incremental indexing).

Ah .. ic. But you don't need to do that even if you can do it. Lucene does incremental indexing. So you would create a new program to add your document manually using IndexWriter, not blatting the index and doing it again.

Seems like you are just trying out Lucene. I suggest having a look at the source code of IndexHTML and you will see that there is no magic there; it just traverses the directory and indexes the HTML files one by one using IndexWriter.

BTW you don't compile directory using Lucene .. :)

/victor

----- Original Message -----
From: "Tun Lin" <[EMAIL PROTECTED]>
To: "'Lucene Users List'" <[EMAIL PROTECTED]>
Sent: Monday, November 24, 2003 3:45 PM
Subject: RE: Lucene refresh index function (incremental indexing).

> Hi,
>
> Thanks for your reply.
>
> What if I add a new document into the directory that I have compiled using the
> following command: java org.apache.lucene.demo.IndexHTML -create -index
> {index-dir} ..
>
> Will it automatically reindex like I did manually to reflect the new document
> being added in that particular directory?
>
> Please advise.
>
> -----Original Message-----
> From: Victor Hadianto [mailto:[EMAIL PROTECTED]
> Sent: Monday, November 24, 2003 12:36 PM
> To: Lucene Users List
> Subject: Re: Lucene refresh index function (incremental indexing).
>
> > I delete the old ones and add them again manually. But how do I reindex the
> > documents automatically without doing it manually?
>
> You don't need to reindex the documents again. Lucene does incremental indexing.
> Just add your document to the index and that's it. You need to create a new
> IndexSearcher to reflect the new changes into the your search result.
>
> /victor
>
> > -----Original Message-----
> > From: Dror Matalon [mailto:[EMAIL PROTECTED]
> > Sent: Sunday, November 23, 2003 4:44 AM
> > To: Lucene Users List
> > Subject: Re: Lucene refresh index function (incremental indexing).
> >
> > Hi,
> >
> > It's not clear what you mean when you say "refresh indexes" or "re-compiling."
> > If you're adding new documents just use the add() method. If you are replacing
> > documents, you need to first delete the old ones and then add them again. Look
> > at the mailing list archive for this, since it's been discussed several times.
> >
> > On Sun, Nov 23, 2003 at 12:22:40AM +0800, Tun Lin wrote:
> > > Hi,
> > >
> > > I am new here.
> > >
> > > May I know how to refresh indexes in Lucene immediately after new
> > > documents have been added without re-compiling again to reindex the
> > > documents in that particular directory?
> > >
> > > Thanks.
> >
> > --
> > Dror Matalon
> > Zapatec Inc
> > 1700 MLK Way
> > Berkeley, CA 94709
> > http://www.fastbuzz.com
> > http://www.zapatec.com
RE: Lucene refresh index function (incremental indexing).
Hi,

Thanks for your reply.

What if I add a new document into the directory that I have compiled using the following command:

java org.apache.lucene.demo.IndexHTML -create -index {index-dir} ..

Will it automatically reindex like I did manually to reflect the new document being added in that particular directory?

Please advise.

-----Original Message-----
From: Victor Hadianto [mailto:[EMAIL PROTECTED]
Sent: Monday, November 24, 2003 12:36 PM
To: Lucene Users List
Subject: Re: Lucene refresh index function (incremental indexing).

> I delete the old ones and add them again manually. But how do I reindex the
> documents automatically without doing it manually?

You don't need to reindex the documents again. Lucene does incremental indexing. Just add your document to the index and that's it. You need to create a new IndexSearcher to reflect the new changes into the your search result.

/victor

> -----Original Message-----
> From: Dror Matalon [mailto:[EMAIL PROTECTED]
> Sent: Sunday, November 23, 2003 4:44 AM
> To: Lucene Users List
> Subject: Re: Lucene refresh index function (incremental indexing).
>
> Hi,
>
> It's not clear what you mean when you say "refresh indexes" or "re-compiling."
> If you're adding new documents just use the add() method. If you are replacing
> documents, you need to first delete the old ones and then add them again. Look
> at the mailing list archive for this, since it's been discussed several times.
>
> On Sun, Nov 23, 2003 at 12:22:40AM +0800, Tun Lin wrote:
> > Hi,
> >
> > I am new here.
> >
> > May I know how to refresh indexes in Lucene immediately after new
> > documents have been added without re-compiling again to reindex the
> > documents in that particular directory?
> >
> > Thanks.
>
> --
> Dror Matalon
> Zapatec Inc
> 1700 MLK Way
> Berkeley, CA 94709
> http://www.fastbuzz.com
> http://www.zapatec.com
Lucene version 1.3.
Hi,

Does anyone know when the final release of Lucene 1.3 will be out?

Please advise.

Thanks.
RE: Lucene refresh index function (incremental indexing).
Hi,

I delete the old ones and add them again manually. But how do I reindex the documents automatically without doing it manually?

Please advise.

-----Original Message-----
From: Dror Matalon [mailto:[EMAIL PROTECTED]
Sent: Sunday, November 23, 2003 4:44 AM
To: Lucene Users List
Subject: Re: Lucene refresh index function (incremental indexing).

Hi,

It's not clear what you mean when you say "refresh indexes" or "re-compiling." If you're adding new documents just use the add() method. If you are replacing documents, you need to first delete the old ones and then add them again. Look at the mailing list archive for this, since it's been discussed several times.

On Sun, Nov 23, 2003 at 12:22:40AM +0800, Tun Lin wrote:
> Hi,
>
> I am new here.
>
> May I know how to refresh indexes in Lucene immediately after new
> documents have been added without re-compiling again to reindex the
> documents in that particular directory?
>
> Thanks.

--
Dror Matalon
Zapatec Inc
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com
Lucene refresh index function (incremental indexing).
Hi, I am new here. May I know how to refresh indexes in Lucene immediately after new documents have been added without re-compiling again to reindex the documents in that particular directory? Thanks.