Re: TestRangeQuery.java

2004-10-20 Thread Vladimir Yuryev
Hi,
If the tests work outside Eclipse, then the Eclipse setup itself needs
to be adjusted so that they run correctly there :-)

Good luck,
Vladimir.
On Wed, 20 Oct 2004 19:10:45 +0530
 Karthik N S [EMAIL PROTECTED] wrote:
Hi
Does anybody have trouble compiling TestRangeQuery.java in the
Eclipse 3.0 IDE?

[ http://cvs.apache.org/viewcvs.cgi/jakarta-lucene/src/test/org/apache/lucene/search ]

There seems to be an error in these lines:
doc.add(new Field("id", "id" + docCount, Field.Store.YES,
    Field.Index.UN_TOKENIZED));
doc.add(new Field("content", content, Field.Store.NO,
    Field.Index.TOKENIZED));

The compiler error occurs with Lucene 1.4.1 on Windows:
Field.Store.YES is not found


Thx in Advance
 WITH WARM REGARDS
 HAVE A NICE DAY
 [ N.S.KARTHIK]

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: MultiSearcher to Indexing.

2004-08-13 Thread Vladimir Yuryev
Hi Joel,
The parallel approach needs a lot of memory, whereas MultiSearcher
needs slightly less.
Under heavy load, Tomcat fails with a system error.
If your experience has been different, please tell me.

Regards,
Vladimir.
On Fri, 13 Aug 2004 12:22:34 +0200
 [EMAIL PROTECTED] wrote:
Hi Vladimir,
Can you please explain what the benefit of this approach is, and why
_pickles_?
If I understand correctly, the question was how to run a query in
parallel on multiple indexes. Isn't ParallelMultiSearcher meant for
this?

Regards,
Joel

  
   

Natarajan,
MultiSearcher works well, but this approach has its pickles (pitfalls).
Here is an example (it is not a complete sample):
public Query combine(Query[] queries) throws IOException {
    // Nothing to merge when there is only one query.
    if (queries.length < 2) {
        return queries[0];
    }
    // combined[0] accumulates the result; combined[1] is the query being merged in.
    Query[] combined = new Query[2];
    combined[0] = new BooleanQuery();
    BooleanQuery.setMaxClauseCount(1);
    for (int i = 0; i < queries.length; i++) {
        combined[1] = queries[i];
        if (queries[i] instanceof BooleanQuery ||
            queries[i] instanceof MultiTermQuery ||
            queries[i] instanceof PrefixQuery ||
            queries[i] instanceof RangeQuery) {
            combined[0] = Query.mergeBooleanQueries(combined);
        } else if (queries[i] instanceof PhraseQuery) {
            // Add each term of the phrase as a required clause.
            Term[] queryTerms = ((PhraseQuery) queries[i]).getTerms();
            for (int j = 0; j < queryTerms.length; j++) {
                TermQuery q = new TermQuery(queryTerms[j]);
                ((BooleanQuery) combined[0]).add(q, true, false);
            }
        } else {
            ((BooleanQuery) combined[0]).add(queries[i], true, false);
        }
    }
    return combined[0];
}

...
Searcher[] searchers = new IndexSearcher[indexName.length];
for (int i = 0; i < indexName.length; i++) {
    searchers[i] = new IndexSearcher(indexName[i]);
}
MultiSearcher multiSearcher = new MultiSearcher(searchers);
QueryParser qp = new QueryParser(FIELD_CONTENTS, analyzer);
query = QueryParser.parse(queryString, FIELD_CONTENTS, analyzer);
hits = multiSearcher.search(query);
// Rewrite the query against each index, then merge the expanded queries.
IndexReader[] reader = new IndexReader[indexName.length];
Query[] expandedQueries = new Query[indexName.length];
for (int i = 0; i < indexName.length; i++) {
    reader[i] = IndexReader.open(indexName[i]);
    expandedQueries[i] = query.rewrite(reader[i]);
}
query = combine(expandedQueries);
...

Best regards,
Vladimir.



Re: MultiSearcher to Indexing.

2004-08-13 Thread Vladimir Yuryev
Thanks.
Vladimir.
On Fri, 13 Aug 2004 14:03:50 +0200
 [EMAIL PROTECTED] wrote:
Well, actually we use a nice piece of hardware with a lot of memory and
2 CPUs under Linux.

As a front end we use a ColdFusion application. It seems to be OK, but
we have not tested it under heavy load yet. I will let you know if
something goes wrong.

Regards,
J.

  
   

Re: MultiSearcher to Indexing.

2004-08-12 Thread Vladimir Yuryev
Natarajan,
MultiSearcher works well, but this approach has its pickles (pitfalls).
Here is an example (it is not a complete sample):
public Query combine(Query[] queries) throws IOException {
    // Nothing to merge when there is only one query.
    if (queries.length < 2) {
        return queries[0];
    }
    // combined[0] accumulates the result; combined[1] is the query being merged in.
    Query[] combined = new Query[2];
    combined[0] = new BooleanQuery();
    BooleanQuery.setMaxClauseCount(1);
    for (int i = 0; i < queries.length; i++) {
        combined[1] = queries[i];
        if (queries[i] instanceof BooleanQuery ||
            queries[i] instanceof MultiTermQuery ||
            queries[i] instanceof PrefixQuery ||
            queries[i] instanceof RangeQuery) {
            combined[0] = Query.mergeBooleanQueries(combined);
        } else if (queries[i] instanceof PhraseQuery) {
            // Add each term of the phrase as a required clause.
            Term[] queryTerms = ((PhraseQuery) queries[i]).getTerms();
            for (int j = 0; j < queryTerms.length; j++) {
                TermQuery q = new TermQuery(queryTerms[j]);
                ((BooleanQuery) combined[0]).add(q, true, false);
            }
        } else {
            ((BooleanQuery) combined[0]).add(queries[i], true, false);
        }
    }
    return combined[0];
}

...
Searcher[] searchers = new IndexSearcher[indexName.length];
for (int i = 0; i < indexName.length; i++) {
    searchers[i] = new IndexSearcher(indexName[i]);
}
MultiSearcher multiSearcher = new MultiSearcher(searchers);
QueryParser qp = new QueryParser(FIELD_CONTENTS, analyzer);
query = QueryParser.parse(queryString, FIELD_CONTENTS, analyzer);
hits = multiSearcher.search(query);
// Rewrite the query against each index, then merge the expanded queries.
IndexReader[] reader = new IndexReader[indexName.length];
Query[] expandedQueries = new Query[indexName.length];
for (int i = 0; i < indexName.length; i++) {
    reader[i] = IndexReader.open(indexName[i]);
    expandedQueries[i] = query.rewrite(reader[i]);
}
query = combine(expandedQueries);
...
Best regards,
Vladimir.


On Thu, 12 Aug 2004 20:51:13 +0530
 Natarajan.T [EMAIL PROTECTED] wrote:
Thanks for your response.
OK, I understand the concept. If you have any sample code, please
send it to me.
If you have any ideas about a parallel searcher, please share them
with me.
-Original Message-
From: Terence Lai [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 12, 2004 8:40 PM
To: Lucene Users List
Subject: RE: MultiSearcher to Indexing.

This is how I do it:
IndexSearcher[] is = new IndexSearcher[2];
is[0] = new IndexSearcher(IndexDir1); // first index folder
is[1] = new IndexSearcher(IndexDir2); // second index folder
MultiSearcher searcher = new MultiSearcher(is);
searcher.search(query);
I think that MultiSearcher only does a sequential search.
Alternatively, you can use ParallelMultiSearcher, which allows you to
do the search in parallel.
Hope this helps,
Terence

FYI
 
I have index files in different folders. In this case, how can I
search them using MultiSearcher?
 
Thanks,
Natarajan.
 
 
 
 
 





Re: continous index update

2004-07-28 Thread Vladimir Yuryev
Hi!
I run the automatic index update from a cron job.
Regards,
Vladimir.
On Wed, 28 Jul 2004 15:05:46 +0530
 jitender ahuja [EMAIL PROTECTED] wrote:
Hi all,
I am trying to implement automatic index updates based on a background
thread, but it fails with errors when deleting the existing index if
(and only if) the server is accessing the index at the same time or
has accessed it earlier. It makes no difference if a different request
is made, i.e. for a different index directory or a different job.
Can anyone tell me how the old index can be updated in such a
continuous-update scenario? I believe the earlier contents must be
deleted in order to put the new contents in place.

Regards,
Jitender
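
One common way around the deletion problem described above is not to delete the live index at all: build the new index in a fresh directory and then switch searches over to it. The sketch below only illustrates that idea with plain directories. It is not Lucene API; the class name, `buildIndex`, and the marker file are hypothetical placeholders, and closing old searchers is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only: swaps a "live" index directory instead of deleting it in place.
class IndexSwapSketch {
    // Directory that searches should currently read from (null until the first rebuild).
    private final AtomicReference<Path> live = new AtomicReference<>();

    Path liveDirectory() {
        return live.get();
    }

    // Builds a new index in a fresh directory, publishes it, and returns the
    // previous directory so the caller can retire it once no search uses it.
    Path rebuild(Path parent) {
        try {
            Path fresh = Files.createTempDirectory(parent, "index-");
            buildIndex(fresh);
            return live.getAndSet(fresh);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical placeholder for the real indexing job.
    private void buildIndex(Path dir) throws IOException {
        Files.write(dir.resolve("marker.txt"), "rebuilt".getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        IndexSwapSketch swap = new IndexSwapSketch();
        Path parent = Paths.get(System.getProperty("java.io.tmpdir"));
        swap.rebuild(parent);
        Path old = swap.rebuild(parent);
        System.out.println("searches now use " + swap.liveDirectory()
                + ", old directory to retire: " + old);
    }
}
```

The old directory is only handed back, never deleted here, because on Windows files that are still open cannot be removed; when it is safe to delete depends on the application.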



Re: continous index update

2004-07-28 Thread Vladimir Yuryev
Jitender
Use task manager.
Regards,
Vladimir.
On Wed, 28 Jul 2004 16:13:51 +0530
 jitender ahuja [EMAIL PROTECTED] wrote:
Hi,
I am working on the Windows platform, and I don't think it would work
there.
If it can, please tell me.

Regards,


Re: ANN: Luke v. 0.5 released

2004-06-25 Thread Vladimir Yuryev
On Thu, 24 Jun 2004 12:34:35 +0200
 Andrzej Bialecki [EMAIL PROTECTED] wrote:
Vladimir Yuryev wrote:
Hi Andrzej!
I am sorry for my English :-(
I will gladly describe the test, and I will try to state its
conditions in detail.

   I don't quite understand what you are saying... Do you suspect 
there is a bug in Luke somewhere on the Search tab? If that's the 
case, please provide an example.

1. The search was run on an index in the Cp1251 encoding.
2. Search conditions:
   Analyzer to use for query parsing:
   org.apache.lucene.analysis.ru.RussianAnalyzer
   Default field is: contents

   2.1. Enter search expression here: (in the windows-1251 encoding)
        Result: No Results
   2.2. Enter search expression here: * (in the windows-1251 encoding)
        Result: 1 doc(s), url:
        http://www.agnuz.info/result.php?year=2004&mounth1=March&day=26&files=v02.txt&print=news

Time to refresh my russian... :-) Ok, the problem seems to be in the 
RussianAnalyzer - it uses RussianLetterTokenizer, which filters out 
anything which is a non-letter - I'm afraid it filters out also the 
wildcard at the end. Not only that, it then passes the tokens through 
a RussianStemmer, which further mutilates the tokens.
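
To illustrate the effect described above, here is a minimal sketch of a letter-only tokenizer. It is not the RussianLetterTokenizer source, just a stand-in built on `Character.isLetter`; the class name is hypothetical and the sample text is Latin only to keep the source ASCII. A trailing `*` never reaches the query because it is not a letter.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for a tokenizer that keeps letters and drops everything else.
class LetterOnlyTokenizerSketch {
    // Splits text into lower-cased runs of letters; any other character
    // (digits, punctuation, the wildcard '*') acts as a separator and is dropped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The wildcard is lost before any query is built.
        System.out.println(tokenize("Pontif* has"));
    }
}
```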

Please try the Parsed query view on the Search tab to see what is 
the result of your query, or paste your query into the text area on 
the AnalyzerTool plugin (Plugins), and see what tokens you get 
using RussianAnalyzer.

I just did it, and the result for * was  - clearly 
not what you wanted.

--
Best regards,
Andrzej Bialecki
-
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-
FreeBSD developer (http://www.freebsd.org)
Hi Andrzej!
Well.
At the address
http://www.agnuz.info/result.php?year=2004&mounth1=March&day=26&files=v02.txt&print=news
there is the full text in which I searched for the phrase "...Pontiff
has expressed importance...", in Russian.

Please try the Parsed query view on the Search tab to see what is the result of your query
On the Search tab the phrase was not found. The problem was (for some
reason?!) in the second and third words. Searching for the separate
words (simple terms) revealed a problem with these last two words.
So, with "Analyzer to use for query parsing":
org.apache.lucene.analysis.ru.RussianAnalyzer and
"Enter search expression here": [texts in the Cp1251 encoding] -

1. Enter search expression here:   .
   Parsed query view: contents:  .
   - No Results

2. Enter search expression here:
   Parsed query view: contents:
   - 2 doc(s)
   URLs:
   http://www.agnuz.info/result.php?year=2004&mounth1=March&day=26&files=v01.txt&print=news
   http://www.agnuz.info/result.php?year=2004&mounth1=March&day=26&files=v02.txt&print=news

3. Enter search expression here:
   Parsed query view: contents:
   - No Results

4. Enter search expression here:
   Parsed query view: contents:
   - No Results

5. Enter search expression here:   .
   Parsed query view: contents: contents: contents:.
   - 2 doc(s), the same documents as in point 2.

.., or paste your query into the text area on the AnalyzerTool plugin (Plugins), and see what tokens you get using RussianAnalyzer.
On the Plugins tab, in the "Text to be analyzed" field, I tested the
same three words as a phrase. As a result of the analysis, the "Tokens
found" field showed three stems. The "hilite" action gave positive
results for all three words. (So it seems the problem is not in the
filters?) :-)

Best regards,
Vladimir.


Re: ANN: Luke v. 0.5 released

2004-06-22 Thread Vladimir Yuryev
Hi Andrzej!
Congratulations on the successful release. RussianAnalyzer works with
my indexes, but there are problems with some words: these words are
found only by a wildcard search. The AnalyzerTool, however, handles
these words without problems.

There is one more small discrepancy on the web page
http://www.getopt.org/luke/
- Remember to put both JARs on your classpath, e.g.:
  java -classpath luke.jar;lucene.jar org.getopt.luke.Luke
+ Remember to put both JARs on your classpath, e.g.:
  java -classpath luke.jar:lucene.jar org.getopt.luke.Luke
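
The separator between classpath entries depends on the platform: `;` on Windows and `:` on Unix-like systems, so both forms of the command are right somewhere. A small sketch, with a purely illustrative class name, that builds the string with `java.io.File.pathSeparator` so it is correct wherever it runs:

```java
import java.io.File;

// Illustrative helper, not part of Luke or Lucene.
class ClasspathSketch {
    // Joins JAR names with the platform's classpath separator
    // (";" on Windows, ":" on Unix-like systems).
    static String join(String... jars) {
        return String.join(File.pathSeparator, jars);
    }

    public static void main(String[] args) {
        System.out.println("java -classpath " + join("luke.jar", "lucene.jar")
                + " org.getopt.luke.Luke");
    }
}
```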

Regards,
Vladimir.
On Tue, 22 Jun 2004 14:10:50 +0200
 Andrzej Bialecki [EMAIL PROTECTED] wrote:
Hello fellow Luceners,
I'm pleased to announce that new release of Luke is now available. 
You can download it from:

http://www.getopt.org/luke/
This release uses Lucene 1.4-rc4.
This release also represents a major step forward - many new exciting 
features have been added. The feature I consider the most important 
in this release is extensibility - there is a plugin framework, and a 
sample plugin is provided in the distribution - I encourage you to 
write more.

Here's a short summary of changes in this release:
* NEW: Added support for Term Vectors.
* NEW: Added a plugin framework - plugins found on classpath are
	detected automatically and added to the new Plugins tab.
	Note however that for now plugins autoloading doesn't quite
	work when using Java WebStart - an alternative mechanism is also
	provided. Plugins have full access to the application context.
	Please read JavaDoc for LukePlugin.java for more information.
* NEW: A sample plugin is provided, based on Mark Harwood's tool
	for analyzing analyzers.
* NEW: all tables support resizable columns now. Some dialogs are
	also resizable.
* NEW: Added Reconstruct functionality. Using this function users
	can reconstruct the content of all (also unstored) fields of a
	document. This function uses a brute-force approach, so it may
	be slow for larger indexes (> 500,000 docs).
* NEW: Added pseudo-edit functionality. New document editor dialog
	allows to modify reconstructed documents, and add or replace the
	original ones.
* FIX: problems with MRU list solved, and a framework for handling
	preferences introduced.
* FIX: the list of available Analyzers is now dynamically populated
	from the classpath, using the same method as in the AnalyzerTool
	plugin. This also doesn't work in WebStart, so a fallback to a
	static list is provided.
* FIX: restructured source repository and added Ant build script.

Please note that as a result of the package name changes, the main 
class is now org.getopt.luke.Luke, and NOT as before luke.Luke.

I felt that all these changes merited a slight change in name, from 
Lucene Index Browser to Lucene Index Toolbox, as this seems to 
better reflect the current functionality of the tool.

Any feedback, patches for enhancements or bugfixes are welcome! If you
want to provide a patch, please use diff -bdruN - this will help me
to integrate it. Thank you!

--
Best regards,
Andrzej Bialecki
-
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-
FreeBSD developer (http://www.freebsd.org)



Re: Lucene search integration with Portal Servers

2004-06-18 Thread Vladimir Yuryev
Hi!
For example:
http://www.lutece.paris.fr/en/jsp/site/Portal.jsp
Regards,
Vladimir.
On Fri, 18 Jun 2004 14:32:18 -0700
 Hetan Shah [EMAIL PROTECTED] wrote:
Hi All,
Has anyone tried or have any sample of working integration solution 
for
LUCENE with any J2EE portal servers? Also I am curious to know what 
are
the best or mostly used practices to link the search results with the
documents/files on the system.

Thanks all,
-H



Re: Analyzers

2004-06-10 Thread Vladimir Yuryev
Hi!
Well. It would be even better if there were some kind of InterAnalyzer
that took the national encoding as a parameter and could inherit the
properties of the core analyzers.

Regards,
Vladimir.
Don,
You should get Snowball Analyzers:
http://jakarta.apache.org/lucene/docs/lucene-sandbox/
Lucene core includes Russian and German Analyzers, but in the long 
run
they, too, will most likely get moved out of the core.

Otis
--- Don Vaillancourt [EMAIL PROTECTED] wrote:
I have recently downloaded the latest version of Lucene 1.3 and was
wondering where some of the classes are.

For example, all of the analyzers except for Standard are missing from
the binary. Are these documented but incomplete classes which will be
available later? Some articles that I have read seem to have tested
these analyzers.

=
http://www.simpy.com/ - social bookmarking and personal search engine


Re: Writing a stemmer

2004-06-06 Thread Vladimir Yuryev
On Sat, 05 Jun 2004 21:15:23 +0200
 Andrzej Bialecki [EMAIL PROTECTED] wrote:
Vladimir Yuryev wrote:
Hi, Andrzej!
How did you test the Polish texts, and with which stemmer?
Thanks,
Vladimir.
No reason to be too modest, Leo.. I tested your stemmer on English, 
Swedish and Polish texts (including F-measure vs. training set size 
plots), and it works exceptionally well indeed. Highly recommended!
Well, I have several corpora of Polish language, which together 
amount to roughly 90,000 words (nouns and verbs) having at least 4 
inflected forms. This set is randomized (i.e. lines of words + forms 
are in random order). I've split this into two parts - one of a fixed 
size, as a test set, and one of variable size as a training set. Then 
I compile stemmer tables using variable number of training examples, 
and using different settings (trie, multi-trie, different 
optimizations, etc..). Then for each output table I test the 
precision/recall of correct base forms (lemmatization), and of 
ability to create unique stems (stemming). Finally, I select the 
best table, which gives reasonably good results vs. table size. To 
put it in plain terms, e.g. for tables roughly 300kB in size (created 
from training set of 3000 unique words + their forms) in best cases I 
get ~90% of correct stems, and ~70% of correct lemmas. Which is a 
_very_ good result!

--
Best regards,
Andrzej Bialecki
Thanks for the detailed description of the test of the Polish texts. 
It was very important for me.
Vladimir.
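
The evaluation Andrzej describes above (score a stemmer against a held-out test set of words with known base forms, then weigh precision against recall) can be sketched roughly as follows. This is only an illustration under simple assumptions: the class name, the toy suffix-stripping stemmer, and the English sample words are hypothetical, and the real tests used Polish corpora and compiled stemmer tables.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative scoring helpers for a stemmer or lemmatizer.
class StemmerEvalSketch {
    // Fraction of test words for which the stemmer returns the expected base form.
    static double accuracy(Map<String, String> expected, Function<String, String> stemmer) {
        if (expected.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (Map.Entry<String, String> e : expected.entrySet()) {
            if (e.getValue().equals(stemmer.apply(e.getKey()))) {
                correct++;
            }
        }
        return (double) correct / expected.size();
    }

    // Balanced F-measure: the harmonic mean of precision and recall.
    static double fMeasure(double precision, double recall) {
        if (precision + recall == 0.0) {
            return 0.0;
        }
        return 2 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        // Held-out test set: inflected form -> expected base form.
        Map<String, String> testSet = new LinkedHashMap<>();
        testSet.put("cats", "cat");
        testSet.put("dogs", "dog");
        testSet.put("mice", "mouse");
        testSet.put("run", "run");
        // Toy stemmer that only strips a trailing "s".
        Function<String, String> toy =
                w -> w.endsWith("s") ? w.substring(0, w.length() - 1) : w;
        System.out.println("accuracy = " + accuracy(testSet, toy));
    }
}
```

Repeating the measurement for tables built from training sets of different sizes gives the kind of score-versus-training-size curve mentioned in the thread.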



Re: Writing a stemmer

2004-06-05 Thread Vladimir Yuryev
Hi, Andrzej!
How did you test the Polish texts, and with which stemmer?
Thanks,
Vladimir.
No reason to be too modest, Leo.. I tested your stemmer on English, 
Swedish and Polish texts (including F-measure vs. training set size 
plots), and it works exceptionally well indeed. Highly recommended!

--
Best regards,
Andrzej Bialecki
-
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-
FreeBSD developer (http://www.freebsd.org)


Re: ParallelMultiSearcher

2004-05-10 Thread Vladimir Yuryev
Hi, Erik!

Thanks for your reply.

Vladimir.

What requirements does ParallelMultiSearcher place on the JVM? What
memory and system process settings are required? If anybody knows, an
example for any Unix system would be welcome.
ParallelMultiSearcher simply spins a separate thread for each index and 
waits for the results from all threads before returning results.

Depending on your hardware, you may or may not receive performance 
benefits over using plain MultiSearcher.  You would likely need each 
index on a separate disk so that you would benefit from parallel I/O.

Beyond standard multi-threaded Java concerns, there is nothing special 
about ParallelMultiSearcher, and tuning would be dependent on your 
environment.

	Erik
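
The thread-per-index pattern Erik describes can be sketched with plain `java.util.concurrent`. This is not the ParallelMultiSearcher source: the class name is hypothetical and `searchOne` stands in for whatever searches a single index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Illustrative fan-out/join: one thread per index, wait for all before returning.
class ParallelSearchSketch {
    static <I, R> List<R> searchAll(List<I> indexes, Function<I, R> searchOne) {
        List<R> results = new ArrayList<>();
        if (indexes.isEmpty()) {
            return results;
        }
        ExecutorService pool = Executors.newFixedThreadPool(indexes.size());
        try {
            // Start one search per index.
            List<Future<R>> futures = new ArrayList<>();
            for (I index : indexes) {
                Callable<R> task = () -> searchOne.apply(index);
                futures.add(pool.submit(task));
            }
            // Collect results in index order; get() blocks until that search is done.
            for (Future<R> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while searching", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a search failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in "search": the length of each index name.
        System.out.println(searchAll(Arrays.asList("index1", "index22"), String::length));
    }
}
```

Memory use grows with the number of concurrent searches, and, as noted above, the threads only help if the indexes can actually be read in parallel.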




ParallelMultiSearcher

2004-05-07 Thread Vladimir Yuryev
Hello all,

What requirements does ParallelMultiSearcher place on the JVM? What
memory and system process settings are required? If anybody knows, an
example for any Unix system would be welcome.

Does anyone know anything about it?
Thanks,
Vladimir


DEFAULT_OPERATOR_AND

2004-04-29 Thread Vladimir Yuryev
Hi!

I have lucene1.4-rc3-dev.
TestQueryParser works with RussianAnalyzer(RussianCharsets.CP1251) and
Russian terms.
...
  public Query getQueryDOA(String query, Analyzer a)
      throws Exception {
    if (a == null)
      a = new RussianAnalyzer(RussianCharsets.CP1251);
    // a = new SimpleAnalyzer();
    QueryParser qp = new QueryParser(field, a);
    qp.setOperator(QueryParser.DEFAULT_OPERATOR_AND);
    return qp.parse(query);
  }
...

In reality, QueryParser still behaves as with
QueryParser.DEFAULT_OPERATOR_OR after QueryParser.DEFAULT_OPERATOR_AND
has been set.
For example:
1. Query (after setting DEFAULT_OPERATOR_AND): term1 term2 term3
Result: term1 OR term2 OR term3
2. Query: +term1 +term2 +term3
Result: term1 AND term2 AND term3
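
The two queries above show that marking each term with `+` gives the AND behaviour. As a stopgap, the query string could be rewritten that way before parsing. The sketch below is naive and only an assumption about how one might do it: it handles whitespace-separated terms without phrases, parentheses, or explicit operators, and the helper name is hypothetical, not part of QueryParser.

```java
// Illustrative pre-processing step, not part of Lucene.
class RequireAllTermsSketch {
    // Prefixes every whitespace-separated term with '+' so that all terms are required.
    static String requireAll(String query) {
        StringBuilder out = new StringBuilder();
        for (String term : query.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            // Terms that already carry a required/prohibited marker are left alone.
            if (term.startsWith("+") || term.startsWith("-")) {
                out.append(term);
            } else {
                out.append('+').append(term);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(requireAll("term1 term2 term3"));
    }
}
```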

Please help me solve this problem.

Thanks,
Vladimir.


Bug Luke

2004-04-29 Thread Vladimir Yuryev
Hi!

Search does not work correctly with RussianAnalyzer, which extracts
stems. It finds only words that coincide with the stem.
A wildcard search, for example, gives a different result.

Thanks,
Vladimir.


Re: Patchs for RussianAnalyzer

2004-03-30 Thread Vladimir Yuryev
Erik,
Please look at my second letter, the one without an attachment. It has
the patch texts in the body of the letter.
Vladimir.

On Mon, 29 Mar 2004 12:06:45 -0500
 Erik Hatcher [EMAIL PROTECTED] wrote:
Vladimir,

I have just taken a look at your submitted patches.  I have no 
objections to making Cp1251 the default charset used in the no-arg 
constructor to RussianAnalyzer, but all of your other changes are 
formatting along with the addition of some other constructors.

Could you please provide a functionality-only diff for your patches, 
preferably in a single file attached to a Bugzilla issue?

Thanks,
Erik


Re: Patchs for RussianAnalyzer

2004-03-30 Thread Vladimir Yuryev
Erik,
I have filed bug #28050.
Vladimir
On Tue, 30 Mar 2004 06:19:04 -0500
 Erik Hatcher [EMAIL PROTECTED] wrote:
On Mar 30, 2004, at 3:38 AM, Vladimir Yuryev wrote:
Erik,
Please look at my second letter, the one without an attachment. It has
the patch texts in the body of the letter.
Vladimir.
I don't have that e-mail you refer to.  Please use the standard 
Jakarta Bugzilla issue tracking system, though.  You can place an 
attachment to an issue after you create it - e-mail ends up mangling 
in-line patches.

What I'm after is a clean patch that *only* changes the functionality 
you desire, not code formatting also.  We can clean up code 
formatting in another pass if needed - or I can just do that on my 
end after reviewing the functionality-only patch.

	Erik



Re: What happened with build.xml in CVS?

2004-03-29 Thread Vladimir Yuryev
Thanks Rob, works now.
Vladimir
On Mon, 29 Mar 2004 10:34:44 +0100
 Rob Oxspring [EMAIL PROTECTED] wrote:
Looks like Erik's commits 2 days back have upped the dependency from
Ant 1.5 to 1.6.  Previously only selected tasks were allowed outside
of targets and tstamp doesn't look like one of them.

Rob

Vladimir Yuryev wrote:
Hi !

I have made latest update from lucene CVS, in which build.xml has 
problems:

Buildfile: /home/vyuryev/workspace/jakarta-lucene/build.xml
BUILD FAILED: 
file:/home/vyuryev/workspace/jakarta-lucene/build.xml:11: 
Unexpected element tstamp
Total time: 297 milliseconds

Best Regards,
Vladimir Yuryev


Re: What happened with build.xml in CVS?

2004-03-29 Thread Vladimir Yuryev
Thanks, Erik.
Ant 1.6.1 works with build.xml v.1.58 without problems.
Vladimir.
On Mon, 29 Mar 2004 08:32:56 -0500
 Erik Hatcher [EMAIL PROTECTED] wrote:
Cool... my sinister plan of subversively getting the world to upgrade 
to Ant 1.6 is working!  :)

	Erik

On Mar 29, 2004, at 4:34 AM, Rob Oxspring wrote:

Looks like Erik's commits 2 days back have upped the dependency from
Ant 1.5 to 1.6.  Previously only selected tasks were allowed outside
of targets and tstamp doesn't look like one of them.

Rob

Vladimir Yuryev wrote:
Hi !
I have made latest update from lucene CVS, in which build.xml has 
problems:
Buildfile: /home/vyuryev/workspace/jakarta-lucene/build.xml
BUILD FAILED: 
file:/home/vyuryev/workspace/jakarta-lucene/build.xml:11: Unexpected 
element tstamp
Total time: 297 milliseconds
Best Regards,
Vladimir Yuryev


What happened with build.xml in CVS?

2004-03-28 Thread Vladimir Yuryev
Hi !

I have made latest update from lucene CVS, in which build.xml has 
problems:

Buildfile: /home/vyuryev/workspace/jakarta-lucene/build.xml
BUILD FAILED: 
file:/home/vyuryev/workspace/jakarta-lucene/build.xml:11: Unexpected 
element tstamp
Total time: 297 milliseconds

Best Regards,
Vladimir Yuryev


Patchs for RussianAnalyzer

2004-03-17 Thread Vladimir Yuryev
Dear developers!

I am writing to you as a Lucene user who works with RussianAnalyzer.
There is one problem that is specific to this analyzer: the parameter
for the Russian encoding (as you know, having a whole set of code
tables for one language always causes wonder). In Eastern Europe,
people who use applications in Russian mostly use the windows-1251
encoding, since MS Windows is the most widespread client platform.
I propose updating the no-argument constructor so that it defaults to
Cp1251.

See attached file: RussianAnalyzerPatchs.tgz
RussianAnalyzer.java.path
RussianLetterTokenizer.java.patch
RussianLowerCaseFilter.java.patch
RussianStemFilter.java.patch
TestRussianAnalyzer.java.path
Such a change would remove confusion (for beginners in Lucene or
beginners in Russian) and would make it easier to use the analyzers
when switching languages in multilingual search.

Regards,
Vladimir Yuryev.


RussianAnalyzerPatchs.tgz
Description: GNU Zip compressed data

Re: Highlighting problem

2004-03-03 Thread Vladimir Yuryev
Hi!
Mark Harwood has made a file, HighlightExtractorTest, that shows how
the highlighter works.
In addition, by replacing the <B> tags with, for example,
<B style="color:black; background-color:#66">, you get a yellow
highlight. If you map each found word to its own color, the result
looks like Google's and similar sites.
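
As a rough illustration of the tag replacement described above, the sketch below post-processes already highlighted text. It assumes the highlighter wraps each hit in plain `<B>`...`</B>` tags; the class name is hypothetical, and the CSS color `yellow` is a placeholder chosen for the example, not the value from the original message.

```java
// Illustrative post-processing of highlighter output, not part of the highlight package.
class HighlightStyleSketch {
    // Replaces every plain <B> opening tag with one that carries an inline style.
    static String restyle(String highlighted, String css) {
        return highlighted.replace("<B>", "<B style=\"" + css + "\">");
    }

    public static void main(String[] args) {
        String fragment = "the <B>quick</B> brown fox";
        System.out.println(restyle(fragment, "color:black; background-color:yellow"));
    }
}
```

Giving each query term its own color would need one such replacement per term, for example with a different marker tag for each term.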
Best regards,
Vladimir.

On Tue, 2 Mar 2004 18:19:28 + (GMT)
 Clandes Tino [EMAIL PROTECTED] wrote:
Hi all, 
I have incorporated highlighting package
(http://home.clara.net/markharwood/lucene/highlight.htm)
but I am worried about the following issue.

If I want to display the best segments of the body field's content,
with the query terms highlighted, I have to define the body field as
Stored.
So the complete process would look like this:
Index-related work:
1. parse the uploaded document into a temp ASCII file
2. read the ASCII file and append its content to a String
3. make the Field as Text(String name, String value)

Search-related work:
1. Retrieve the String value of the "body" field from the hit
(again, the only way to do this, as I understand it, is to declare the
"body" field as Stored)
2. pass the String value to the Highlighter methods.
Besides that, I have read in the Lucene FAQ that "body" fields are not
good candidates to be declared as Stored. Index size is one obvious
reason, but I am wondering how it affects Lucene search performance in
general.
Does anybody have an idea how to get highlight functionality for an
Unstored field?
Regards and thanx in advance
Milan 



	
	
		