ANN: Luke v. 0.5 released
Hello fellow Luceners,

I'm pleased to announce that a new release of Luke is now available. You can download it from:

http://www.getopt.org/luke/

This release uses Lucene 1.4-rc4. It also represents a major step forward - many exciting new features have been added. The feature I consider the most important in this release is extensibility: there is a plugin framework, and a sample plugin is provided in the distribution. I encourage you to write more.

Here's a short summary of changes in this release:

* NEW: Added support for Term Vectors.
* NEW: Added a plugin framework - plugins found on the classpath are detected automatically and added to the new Plugins tab. Note, however, that for now plugin autoloading doesn't quite work when using Java WebStart - an alternative mechanism is also provided. Plugins have full access to the application context. Please read the JavaDoc for LukePlugin.java for more information.
* NEW: A sample plugin is provided, based on Mark Harwood's tool for analyzing analyzers.
* NEW: All tables now support resizable columns. Some dialogs are also resizable.
* NEW: Added Reconstruct functionality. With this function users can reconstruct the content of all fields of a document, including unstored ones. It uses a brute-force approach, so it may be slow for larger indexes (> 500,000 docs).
* NEW: Added pseudo-edit functionality. A new document editor dialog lets you modify reconstructed documents, and then add them or replace the original ones.
* FIX: Problems with the MRU list are solved, and a framework for handling preferences has been introduced.
* FIX: The list of available Analyzers is now dynamically populated from the classpath, using the same method as in the AnalyzerTool plugin. This also doesn't work in WebStart, so a fallback to a static list is provided.
* FIX: Restructured the source repository and added an Ant build script. Please note that as a result of the package name changes, the main class is now org.getopt.luke.Luke, and NOT luke.Luke as before.

I felt that all these changes merited a slight change in name, from Lucene Index Browser to Lucene Index Toolbox, as this seems to better reflect the current functionality of the tool.

Any feedback and any patches for enhancements or bugfixes are welcome! If you want to provide a patch, please use diff -bdruN - this will help me to integrate it. Thank you!

--
Best regards,
Andrzej Bialecki

-
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-
FreeBSD developer (http://www.freebsd.org)

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: Clustering question: searching two different indexes
(re-directing to lucene-user list)

Albert,

If I understand your question correctly... You could run a query like the one you gave on both indices, but if one of them contains documents that have only one of those fields (cluster), then there will never be any matches in the second index.

However, why not leave your big index alone, add documents to a new, smaller index, and then merge them periodically?

I may be off with this; it sounds like this is what you want to do, but I'm not certain I understood you fully.

Otis

--- Albert Vila [EMAIL PROTECTED] wrote:

Hi all,

I was wondering if I can search using the MultiSearcher over two different indexes at the same time (with different fields).

I've got one big index, with the code, title, content, language, etc. fields (new documents are added incrementally). Now I have to introduce a clustering field. The problem is that I have to update the whole index each time the clusters change, and I don't have enough time to do it (I want to check for new clusters every 10 minutes, and it takes 25 minutes to reindex the whole index).

A query example could be: language:0 and title:java and cluster:0

Can I leave the big index without any changes, create a new index with only the fields code and cluster, and perform the searches using these two indexes?

I think I cannot do that without changing the code. It would need a post-processing step, matching all codes returned from index 1 with those from index 2.

Does anyone have a solution for this problem? I would appreciate it.
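The post-processing step Albert describes (matching the codes returned from index 1 against the codes returned from index 2) might look roughly like the sketch below. It is plain Java and uses no Lucene API: the class and method names are hypothetical, and the two lists are assumed to hold the values of the code field already collected from the hits of each search.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: joins two result lists on the "code" field.
class CodeJoin {

    /**
     * Returns the codes from mainHits that also appear in clusterHits,
     * keeping the order (for example, the ranking) of mainHits.
     */
    static List<String> intersectByCode(List<String> mainHits, List<String> clusterHits) {
        // A hash set gives constant-time membership checks for the smaller index's codes.
        Set<String> clusterCodes = new HashSet<>(clusterHits);
        List<String> joined = new ArrayList<>();
        for (String code : mainHits) {
            if (clusterCodes.contains(code)) {
                joined.add(code);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Assumed inputs: codes matching "language:0 and title:java" in the big index,
        // and codes matching "cluster:0" in the small index.
        List<String> mainHits = List.of("A1", "B2", "C3");
        List<String> clusterHits = List.of("C3", "A1");
        System.out.println(intersectByCode(mainHits, clusterHits));
    }
}
```

The cost of this join grows with the number of hits in both result sets, so it is only a starting point for small result lists.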
Re: Fix for advanced tokenizers and highlighter problem
[EMAIL PROTECTED] wrote:

I think this version of the highlighter should provide a fix:
http://www.inperspective.com/lucene/hilite2beta.zip

Before I update the version of the highlighter in the sandbox, I'd appreciate feedback from those troubled by the issues with overlapping tokens in token streams (Erik, Dave, Bruce?)

First pass of testing - yes, this does indeed fix the problem.

I've realized I may want to modify my Analyzer now too. I was focusing on the Token position increment instead of the offset. For something like the case where I break HashMap into 3 tokens (Hash, Map, HashMap), I was returning the same start/end offsets for all of them (thus a search on Map ends up with all of HashMap being highlighted). It is probably more correct to return offsets within the original larger token, so that you can see exactly where your term matched.

I'll update my code and then put up a site that demonstrates this.

thx,
Dave

I added my own test analyzer to the JUnit test that introduces synonyms into the token stream at the same position as the trigger token, and the new code works OK for me with that analyzer.

The fix means I needed to change the Formatter interface - it now takes a TokenGroup object instead of a token, because that can be used to represent a single token OR a sequence of overlapping tokens. I don't think most people have needed to create custom Formatter implementations, so I don't think this redefined interface should break too much existing code (if any).

Cheers
Mark
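Dave's idea of returning offsets within the original larger token can be illustrated with the plain-Java sketch below. It is not his Analyzer and does not use the Lucene Token class: the Part type, the class name, and the naive split-before-every-uppercase-letter rule are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, independent of Lucene: sub-token offsets for a camel-case token.
class CamelCaseOffsets {

    /** Stand-in for a token: text plus start (inclusive) and end (exclusive) offsets. */
    static final class Part {
        final String text;
        final int start;
        final int end;

        Part(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    /**
     * Splits a token before each uppercase letter and gives every part its own
     * offsets inside the source text. If there was more than one part, the whole
     * token is appended too, spanning the full range.
     */
    static List<Part> split(String token, int tokenStart) {
        List<Part> parts = new ArrayList<>();
        int partStart = 0;
        for (int i = 1; i <= token.length(); i++) {
            boolean boundary = i == token.length() || Character.isUpperCase(token.charAt(i));
            if (boundary) {
                parts.add(new Part(token.substring(partStart, i), tokenStart + partStart, tokenStart + i));
                partStart = i;
            }
        }
        if (parts.size() > 1) {
            parts.add(new Part(token, tokenStart, tokenStart + token.length()));
        }
        return parts;
    }

    public static void main(String[] args) {
        // "HashMap" assumed to start at character 10 of the source text.
        System.out.println(split("HashMap", 10));
    }
}
```

With offsets like these, a match on Map covers only the last three characters of HashMap, while a match on HashMap still covers the whole word.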
RE: Delete Indexed from Merged Document
Hi Otis,

The link you specified shows how to update an indexed file (deleting the old one and then adding the new one).

But my question, to be more specific, is:

When we have MERGED more than 2 indexed files [using writer.addIndexes(luceneDirs)], how do we delete one of the indexed files from the MERGED index in order to insert a new, updated one?

Please provide a sample code snippet for this.

With regards,
Karthik

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 22, 2004 12:52 PM
To: Lucene Users List
Subject: Re: Delete Indexed from Merged Document

Hello Karthik,

Here is the answer: http://www.jguru.com/faq/view.jsp?EID=492423

Otis

--- Karthik N S [EMAIL PROTECTED] wrote:

Dev Guys,

Apologies, please. How do I DELETE an indexed document from a MERGED index file? Can somebody write me some code snippets on this, please?

With regards,
Karthik
Re: ANN: Luke v. 0.5 released
Hi Andrzej!

Congratulations on the successful release.

RussianAnalyzer works with my indexes, but there are problems with some words. These problem words are found only by the wildcard method. At the same time, AnalyzerTool works with these words without problems.

There is one more small discrepancy on the webpage http://www.getopt.org/luke/

- Remember to put both JARs on your classpath, e.g.: java -classpath luke.jar;lucene.jar org.getopt.luke.Luke
+ Remember to put both JARs on your classpath, e.g.: java -classpath luke.jar:lucene.jar org.getopt.luke.Luke

Regards,
Vladimir.

On Tue, 22 Jun 2004 14:10:50 +0200 Andrzej Bialecki [EMAIL PROTECTED] wrote:

Hello fellow Luceners, I'm pleased to announce that new release of Luke is now available.
[...]
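On the classpath line Vladimir points out: the separator between JARs depends on the platform (";" on Windows, ":" on Unix-like systems), so both forms can be correct depending on where the command is run. A small sketch, assuming only the JAR names and main class mentioned in this thread, shows how the JVM's own java.io.File.pathSeparator can be used to build the right string; the class name ClasspathExample is hypothetical.

```java
import java.io.File;

// Hypothetical helper: builds a classpath string with the current platform's separator.
class ClasspathExample {

    /** Joins JAR names with File.pathSeparator (";" on Windows, ":" on Unix-like systems). */
    static String build(String... jars) {
        return String.join(File.pathSeparator, jars);
    }

    public static void main(String[] args) {
        // Prints the command line for the platform this program runs on.
        System.out.println("java -classpath " + build("luke.jar", "lucene.jar") + " org.getopt.luke.Luke");
    }
}
```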