ANN: Luke v. 0.5 released

2004-06-22 Thread Andrzej Bialecki
Hello fellow Luceners,
I'm pleased to announce that a new release of Luke is now available. You 
can download it from:

http://www.getopt.org/luke/
This release uses Lucene 1.4-rc4.
This release also represents a major step forward - many new exciting 
features have been added. The feature I consider the most important in 
this release is extensibility - there is a plugin framework, and a 
sample plugin is provided in the distribution - I encourage you to write 
more.

Here's a short summary of changes in this release:
* NEW: Added support for Term Vectors.
* NEW: Added a plugin framework - plugins found on the classpath are
  detected automatically and added to the new Plugins tab.
  Note, however, that for now plugin autoloading doesn't quite
  work when using Java WebStart - an alternative mechanism is also
  provided. Plugins have full access to the application context.
  Please read the JavaDoc for LukePlugin.java for more information.
* NEW: A sample plugin is provided, based on Mark Harwood's tool
  for analyzing analyzers.
* NEW: All tables now support resizable columns. Some dialogs are
  also resizable.
* NEW: Added Reconstruct functionality. Using this function, users
  can reconstruct the content of all (including unstored) fields of a
  document. This function uses a brute-force approach, so it may
  be slow for larger indexes (> 500,000 docs).
* NEW: Added pseudo-edit functionality. The new document editor dialog
  lets you modify reconstructed documents, and then add them or
  replace the original ones.
* FIX: Problems with the MRU list solved, and a framework for handling
  preferences introduced.
* FIX: The list of available Analyzers is now dynamically populated
  from the classpath, using the same method as in the AnalyzerTool
  plugin. This also doesn't work in WebStart, so a fallback to a
  static list is provided.
* FIX: Restructured the source repository and added an Ant build script.

Please note that as a result of the package name changes, the main class 
is now org.getopt.luke.Luke, and NOT as before luke.Luke.

I felt that all these changes merited a slight change in name, from 
Lucene Index Browser to Lucene Index Toolbox, as this seems to 
better reflect the current functionality of the tool.

Any feedback, patches for enhancements or bugfixes are welcome! If you 
want to provide a patch, please use diff -bdruN - this will help me to 
integrate it. Thank you!

--
Best regards,
Andrzej Bialecki
-
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-
FreeBSD developer (http://www.freebsd.org)

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Clustering question: searching two diferent indexes

2004-06-22 Thread Otis Gospodnetic
(re-directing to lucene-user list)

Albert,

If I understand your question correctly... You could run a query like
the one you gave on both indices, but if one of them contains documents
that have only one of those fields (cluster), then there will never be
any matches in the second index.

However, why not leave your big index alone, add documents to a new,
smaller index, and then merge them periodically?  I may be off with
this; it sounds like this is what you want to do, but I'm not certain I
understood you fully.

Otis

--- Albert Vila [EMAIL PROTECTED] wrote:
 Hi all,
 
 I was wondering if I can use the MultiSearcher to search two
 different indexes (with different fields) at the same time.
 I've got one big index with the code, title, content, language, etc.
 fields (new documents are added incrementally). Now I have to
 introduce a clustering field. The problem is that I have to update the
 whole index each time the clusters change, and I don't have enough
 time to do it (I want to check for new clusters every 10 minutes, and
 it takes 25 minutes to reindex the whole index).
 A query example could be: language:0 and title:java and cluster:0
 
 Can I leave the big index without any changes, create a new index
 with only the code and cluster fields, and perform the searches
 using these two indexes? I think I cannot do that without changing
 the code. It would need a post-processing step that matches all
 codes returned from index 1 against index 2.
 
 Anyone have a solution for this problem? I would appreciate that.
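
The post-processing step described above could be sketched roughly as follows. This is only an illustration in plain Java with no Lucene calls: it assumes the codes have already been collected from the hits of each index, and the class name, method name, and sample codes are all made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ClusterJoin {
    /**
     * Keeps only those codes from the big-index results that also appear
     * in the cluster-index results, preserving the order of the big-index
     * hits (e.g. their relevance order).
     */
    static List<String> intersectByCode(List<String> bigIndexCodes, Set<String> clusterCodes) {
        List<String> matches = new ArrayList<>();
        for (String code : bigIndexCodes) {
            if (clusterCodes.contains(code)) {
                matches.add(code);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical codes returned by "language:0 and title:java" on the big index.
        List<String> bigIndexCodes = Arrays.asList("A17", "B02", "C33", "D90");
        // Hypothetical codes returned by "cluster:0" on the small index.
        Set<String> clusterCodes = new HashSet<>(Arrays.asList("C33", "A17", "Z01"));
        System.out.println(intersectByCode(bigIndexCodes, clusterCodes));
    }
}
```

Putting the cluster codes into a hash set keeps the matching linear in the number of big-index hits, but both result lists still have to be fetched in full before they can be joined.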





Re: Fix for advanced tokenizers and highlighter problem

2004-06-22 Thread David Spencer
[EMAIL PROTECTED] wrote:
I think this version of the highlighter should provide a fix: http://www.inperspective.com/lucene/hilite2beta.zip
Before I update the version of the highlighter in the sandbox I'd appreciate feedback from those troubled 
with the issues to do with overlapping tokens in token streams (Erik, Dave, Bruce?)
1st pass of testing - yes, this does indeed fix the problem.
I've realized I may want to modify my Analyzer now too.
I was focusing on the Token position increment instead of the offset.
For something like the case where I broke HashMap into 3 tokens
(Hash, Map, HashMap), I was returning the same start/end offsets for
all of them (thus a search on Map ends up with all of HashMap
being highlighted). Probably more correct is to return offsets within
the original larger token so that you can see exactly where your term
matched. I'll update my code and then put up a site that demonstrates this.
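
A minimal sketch of that offset idea, in plain Java rather than against the Lucene Token API: the Tok class, the split method, and the camel-case splitting rule are all assumptions made for illustration, not the actual analyzer code.

```java
import java.util.ArrayList;
import java.util.List;

class SubTokenOffsets {
    /** A plain stand-in for an analyzer token: text plus character offsets. */
    static final class Tok {
        final String text;
        final int start;
        final int end;

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Splits a camel-case word into its parts and also emits the whole word.
     * Each part gets offsets inside the original word (shifted by wordStart)
     * instead of reusing the offsets of the whole word.
     */
    static List<Tok> split(String word, int wordStart) {
        List<Tok> toks = new ArrayList<>();
        int partStart = 0;
        for (int i = 1; i <= word.length(); i++) {
            boolean boundary = i == word.length() || Character.isUpperCase(word.charAt(i));
            if (boundary) {
                toks.add(new Tok(word.substring(partStart, i), wordStart + partStart, wordStart + i));
                partStart = i;
            }
        }
        if (toks.size() > 1) {
            // The full word covers the whole original range.
            toks.add(new Tok(word, wordStart, wordStart + word.length()));
        }
        return toks;
    }

    public static void main(String[] args) {
        // "HashMap" is assumed to start at character 10 of the source text.
        for (Tok t : split("HashMap", 10)) {
            System.out.println(t.text + " [" + t.start + "," + t.end + ")");
        }
    }
}
```

With offsets like these, a match on Map would cover only characters 14 to 17 rather than the whole of HashMap.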

thx,
 Dave

I added my own test analyzer to the JUnit test that introduces synonyms into the
token stream at the same position as the trigger token, and the new code works OK
for me with that analyzer.
The fix means I needed to change the Formatter interface - it now takes a TokenGroup object instead
of a token, because that can be used to represent a single token OR a sequence of overlapping tokens.
I don't think most people have needed to create custom Formatter implementations, so I don't think this
redefined interface should break too much existing code (if any).

Cheers
Mark


RE: Delete Indexed from Merged Document

2004-06-22 Thread Karthik N S
Hi Otis,

The link you specified shows how to update an indexed file (deleting
the old one and then adding the new one).

But my question, to be more specific, is:

When we have MERGED more than two indexes [using
writer.addIndexes(luceneDirs)], how do we delete one of those indexes
from the MERGED index in order to insert a new, updated one?

Please provide a sample code snippet for this.
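
One way to picture the delete-then-re-add approach on a merged index is the toy model below. It is plain Java, not Lucene code: the in-memory list stands in for the merged index, and it assumes every document was given a hypothetical "source" field naming the index it came from before the merge. Without some such key there would be nothing to select the stale documents by.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MergedIndexModel {
    // Each "document" is a map of field name to value. This list stands in
    // for the merged index and is NOT a Lucene data structure.
    private final List<Map<String, String>> docs = new ArrayList<>();

    void add(String source, String content) {
        Map<String, String> doc = new HashMap<>();
        doc.put("source", source);   // hypothetical key field naming the origin index
        doc.put("content", content);
        docs.add(doc);
    }

    /** Removes every document whose "source" field equals the given value. */
    int deleteBySource(String source) {
        int before = docs.size();
        docs.removeIf(doc -> source.equals(doc.get("source")));
        return before - docs.size();
    }

    int size() {
        return docs.size();
    }

    public static void main(String[] args) {
        MergedIndexModel merged = new MergedIndexModel();
        merged.add("indexA", "old text 1");
        merged.add("indexA", "old text 2");
        merged.add("indexB", "other text");

        int removed = merged.deleteBySource("indexA"); // step 1: drop the stale part
        merged.add("indexA", "new text 1");            // step 2: add the updated documents
        System.out.println(removed + " removed, " + merged.size() + " documents now");
    }
}
```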


with regards
Karthik

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 22, 2004 12:52 PM
To: Lucene Users List
Subject: Re: Delete Indexed from Merged Document


Hello Karthik,

Here is the answer: http://www.jguru.com/faq/view.jsp?EID=492423

Otis

--- Karthik N S [EMAIL PROTECTED] wrote:


 Dev guys,

 Apologies, please.

 How do I DELETE an indexed document from a MERGED index file?

 Can somebody write me some code snippets for this, please?

 With Regards
 Karthik




Re: ANN: Luke v. 0.5 released

2004-06-22 Thread Vladimir Yuryev
Hi Andrzej!
Congratulations on the successful release. RussianAnalyzer works with 
my indexes, but there are problems with some words: these problem 
words are found only with the wildcard method. The AnalyzerTool plugin, 
however, handles these words without problems.

There is one more small discrepancy on webpage 
http://www.getopt.org/luke/
- Remember to put both JARs on your classpath, e.g.: java -classpath 
luke.jar; lucene.jar org.getopt.luke.Luke
+ Remember to put both JARs on your classpath, e.g.: java -classpath 
luke.jar:lucene.jar org.getopt.luke.Luke

Regards,
Vladimir.
On Tue, 22 Jun 2004 14:10:50 +0200
 Andrzej Bialecki [EMAIL PROTECTED] wrote: