need info for database based Lucene but not flat file

2004-04-26 Thread Yukun Song
As is known, Lucene currently uses flat files to store its index
information.

Does anyone have ideas or resources for combining a database (like MySQL or
PostgreSQL) with Lucene, instead of the current flat index file format?

Regards,

Yukun Song



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: QueryParser Behavior and Token.setPositionIncrement

2004-04-26 Thread Norton, James
Thanks for the reply.  I had reached the same conclusion as you regarding
the analyzer for queries (no multiple tokens per position), but I would
still regard the behaviour of QueryParser as incorrect.

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Monday, April 26, 2004 3:45 PM
To: Lucene Users List
Subject: Re: QueryParser Behavior and Token.setPositionIncrement


QueryParser is a mixed blessing.  It has plenty of quirks.  As you've 
proven to yourself, it ignores position increments and constructs a 
PhraseQuery of all the tokens in order, regardless of position 
increment.

Another oddity is that PhraseQuery doesn't deal with position 
increments either - each term added to it is considered in a successive 
position.

My best suggestion so far is to use a different analyzer for indexing 
than you do for querying.  There is really no need to add multiple 
tokens per position at query time anyway, if all the relevant ones were 
added at indexing time.  So use something that emits a more 
straight-forward incrementing position set of tokens during querying.

Erik


On Apr 26, 2004, at 2:34 PM, Norton, James wrote:

>> I am attempting to use Token.setPositionIncrement() to provide 
>> alternate forms of tokens and I have encountered strange
>> behavior with QueryParser.  It seems to be constructing phrase 
>> queries with the alternate tokens.  I don't know why the
>> query would be parsed as a phrase.
>>
>> For example, consider an Analyzer that adds lowercase tokens to the 
>> token stream as alternate forms (position increment = 0).
>> Parsing the query "Bush" (quotes added for emphasis and not part of 
>> query) results in a query of text:"Bush bush" ("text" is
>> the default field).  Whereas parsing the query "bush" results in the 
>> query text:bush.  Notice the lack of quotes in the second
>> case, which has no alternate form appended because the token is 
>> already lowercase.  Is this a bug or is there some
>> explanation of which I am not aware?
>>
>> The following two classes provide test code verifying this behaviour.
>>
>>
>>
>> /**
>>  * A test analyzer employing a TestLowerCaseFilter to demonstrate 
>> problems with
>>  * QueryParser when dealing with multiple tokens at the same position.
>>  */
>> public class TestAnalyzer extends Analyzer {
>>  /**
>>   * Constructs a {@link StandardTokenizer} filtered by a {@link
>>   * StandardFilter} and a {@link TestLowerCaseFilter}.
>>   */
>>  public final TokenStream tokenStream(String fieldName, Reader 
>> reader) {
>>  TokenStream result = new StandardTokenizer(reader);
>>  result = new StandardFilter(result);
>>  result = new TestLowerCaseFilter(result, Locale.getDefault());
>>  return result;
>>  }
>>  
>>  public static void main(String[] args) {
>>  TestAnalyzer analyzer = new TestAnalyzer();
>>  try {
>>  Query lowerCaseQuery = QueryParser.parse("bush", "text", 
>> analyzer);
>>  Query upperCaseQuery = QueryParser.parse("Bush", "text", 
>> analyzer);
>>  
>>  System.out.println("lower case: " + lowerCaseQuery.toString());
>>  System.out.println("upper case: " + upperCaseQuery.toString());
>>  } catch (ParseException e) {
>>  // TODO Auto-generated catch block
>>  e.printStackTrace();
>>  }
>>  
>>  }
>> }
>>
>> /**
>>  *
>>  * A {@link Filter} that adds alternate forms (lower case) for upper case
>>  * tokens to a {@link TokenStream}.
>>  */
>> public class TestLowerCaseFilter extends TokenFilter {
>>  private Locale locale;
>>  private Token alternateToken;
>>
>>  public TestLowerCaseFilter(TokenStream stream, Locale locale) {
>>  super(stream);
>>  this.locale = locale;
>>  this.alternateToken = null;
>>  }
>>
>>  /* (non-Javadoc)
>>   * @see org.apache.lucene.analysis.TokenStream#next()
>>   */
>>  public Token next() throws IOException {
>>  
>>  Token rval = null;
>>  if (alternateToken != null) {
>>  rval = alternateToken;
>>  alternateToken = null;
>>  } else {
>>  Token nextToken = input.next();
>>  if (nextToken == null) {
>>  return null;
>>  }
>>  String text = nextToken.termText();
>>  String lc = text.toLowerCase(locale);
>>  rval = nextToken;
>>  if (!lc.equals(text)) {
>>
>>  alternateToken =
>>  new Token(
>>  lc,
>>  nextToken.startOffset(),
>>  nextToken.endOffset());
>>  alternateToken.setPositionIncrement(0);
>>  }
>>  }
>>  return rval;
>>  }
>> }

Re: Adding duplicate Fields to Documents

2004-04-26 Thread Incze Lajos
On Mon, Apr 26, 2004 at 04:36:22PM -0400, Gerard Sychay wrote:
> As two people have already stated, fields that are not tokenised are
> stored separately.  Then a single document with two fields of the same
> name can be retrieved by searching for either of the fields.
> 
> However, retrieving all of the field values seems to be impossible. 
> That is, given ("field_name", "keyword1") and ("field_name",
> "keyword2"), using doc.get("field_name") always returns "keyword2", the
> last value added.  Of course, I can't really think of a scenario where
> this would be a problem..
> 
> Thanks for the help!

Use this method on your document:

Field[]   getFields(String name)
  Returns an array of Fields with the given name.
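To make the difference concrete, here is a minimal sketch (Lucene 1.3-era
API; the class name and field values below are invented for illustration):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class GetFieldsDemo {
    public static void main(String[] args) {
        Document doc = new Document();
        doc.add(Field.Keyword("field_name", "keyword1"));
        doc.add(Field.Keyword("field_name", "keyword2"));

        // get() returns a single value (the quoted message above
        // observed it was the last one added).
        System.out.println(doc.get("field_name"));

        // getFields() returns every value stored under that name.
        Field[] fields = doc.getFields("field_name");
        for (int i = 0; i < fields.length; i++) {
            System.out.println(fields[i].stringValue());
        }
    }
}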

incze

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



about paralmultisearch java.lang.NoSuchMethodError:

2004-04-26 Thread xuemei li
Hello all,

I am doing some work using Lucene 1.4's ParallelMultiSearcher.  The
relevant part of the code looks like this:

Searcher searcher1 = new IndexSearcher("/home/abc/lucetestbed1/index");
Searcher searcher2 = new IndexSearcher("/home/abc/lucetestbed/index");
ParallelMultiSearcher searcher = new ParallelMultiSearcher(
    new Searcher[] {searcher1, searcher2});

If I replace the ParallelMultiSearcher line with

MultiSearcher searcher = new MultiSearcher(new Searcher[]
    {searcher1, searcher2});

it works well.  But if I use ParallelMultiSearcher, the code compiles
correctly yet fails at runtime with the following error:
Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.lucene.search.ParallelMultiSearcher.getStarts()[I
at
org.apache.lucene.search.ParallelMultiSearcher.<init>(ParallelMultiSearcher.java:38)
at SearchFiles.main(SearchFiles.java:92)


Does anyone know how to handle this error?  I get the same error if I use
a RemoteSearchable to search.

Thanks,

Li
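For what it's worth, a NoSuchMethodError that shows up only at run time,
when the same code compiles cleanly, usually means an older Lucene jar is
being picked up on the runtime classpath.  A quick diagnostic sketch
(class name invented):

public class WhichJar {
    public static void main(String[] args) {
        // Prints the jar ParallelMultiSearcher is actually loaded from;
        // if it is not the Lucene 1.4 jar, that would explain the error.
        System.out.println(
            org.apache.lucene.search.ParallelMultiSearcher.class
                .getProtectionDomain().getCodeSource().getLocation());
    }
}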





-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Adding duplicate Fields to Documents

2004-04-26 Thread Gerard Sychay
As two people have already stated, fields that are not tokenised are
stored separately.  Then a single document with two fields of the same
name can be retrieved by searching for either of the fields.

However, retrieving all of the field values seems to be impossible. 
That is, given ("field_name", "keyword1") and ("field_name",
"keyword2"), using doc.get("field_name") always returns "keyword2", the
last value added.  Of course, I can't really think of a scenario where
this would be a problem.
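
A throwaway test of both observations (matching on either keyword, and
doc.get() returning a single value) might look like this; it is only a
sketch against the Lucene 1.3-era API, with invented names:

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.RAMDirectory;

public class DuplicateKeywordTest {
    public static void main(String[] args) throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new SimpleAnalyzer(), true);

        Document doc = new Document();
        doc.add(Field.Keyword("field_name", "keyword1"));
        doc.add(Field.Keyword("field_name", "keyword2"));
        writer.addDocument(doc);
        writer.close();

        IndexSearcher searcher = new IndexSearcher(dir);
        // The document should match on either keyword.
        Hits h1 = searcher.search(new TermQuery(new Term("field_name", "keyword1")));
        Hits h2 = searcher.search(new TermQuery(new Term("field_name", "keyword2")));
        System.out.println("keyword1 hits: " + h1.length());
        System.out.println("keyword2 hits: " + h2.length());

        // But doc.get() on a retrieved document yields one value only.
        System.out.println(h1.doc(0).get("field_name"));
        searcher.close();
    }
}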

Thanks for the help!

>>> Gerard Sychay 04/26/04 01:57PM >>>
Luke is a good idea.  I'll also just write a simple test program and
play around with it (something I probably should have done before
posting) and then post my findings here.

>>> Stephane James Vaucher <[EMAIL PROTECTED]> 04/24/04 02:02PM
>>>
From my experience (that is, little experience ;)), fields that are not
tokenised are stored separately.  Someone more qualified can surely give
you more details.

You can look at your index with Luke, it might be insightful.
sv

On Thu, 22 Apr 2004, Gerard Sychay wrote:

> Hello,
>
> I am wondering what happens when you add two Fields with same names
to
> a Document.  The API states that "if the fields are indexed, their
text
> is treated as though appended."  This much makes sense.  But what
about
> the following two cases:
>
> - Adding two fields with same name that are indexed, not tokenized
> (keywords)?  E.g. given ("field_name", "keyword1") and
("field_name",
> "keyword2"), would the final keyword field be ("field_name",
> "keyword1keyword2")?  Seems weird..
>
> - Adding two fields with same name that are stored, but not indexed
and
> not tokenized (e.g. database keys)?  Are they appended (which would
mess
> up the database key when retrieved from the Hit)?
>


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: ArrayIndexOutOfBoundsException

2004-04-26 Thread Paul
This looks very much like the problem I had (see subject/thread
'Optimize Crash').

If anyone replies to this, could you please read my post as well - this
is still unresolved and I'd like to be able to take some action on it.

Cheers,
Paul.


James Dunn wrote:
> Hello all,
>
> I have a web site whose search is driven by Lucene
> 1.3.  I've been doing some load testing using JMeter
> and occasionally I will see the exception below when
> the search page is under heavy load.
>
> Has anyone seen similar errors during load testing?
>
> I've seen some posts with similar exceptions and the
> general consensus is that this error means that the
> index is corrupt.  I'm not sure my index is corrupt
> however.  I can run all the queries I use for load
> testing under normal load and I don't appear to get
> this error.
>
> Is there any way to verify that a Lucene index is
> corrupt or not?
>
> Thanks,
>
> Jim
>
> java.lang.ArrayIndexOutOfBoundsException: 53 >= 52
> at java.util.Vector.elementAt(Vector.java:431)
> at
> org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:135)
> at
> org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:103)
> at
> org.apache.lucene.index.SegmentReader.document(SegmentReader.java:275)
> at
> org.apache.lucene.index.SegmentsReader.document(SegmentsReader.java:112)
> at
> org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:107)
> at
> org.apache.lucene.search.MultiSearcher.doc(MultiSearcher.java:100)
> at
> org.apache.lucene.search.MultiSearcher.doc(MultiSearcher.java:100)
> at
> org.apache.lucene.search.Hits.doc(Hits.java:130)
>
>
>
>
>
> __
> Do you Yahoo!?
> Yahoo! Photos: High-quality 4x6 digital prints for 25¢
> http://photos.yahoo.com/ph/print_splash
>
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]

-- 
The trouble with the rat-race is that even if you win, you're still a rat.
-- Lily Tomlin

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: QueryParser Behavior and Token.setPositionIncrement

2004-04-26 Thread Erik Hatcher
QueryParser is a mixed blessing.  It has plenty of quirks.  As you've 
proven to yourself, it ignores position increments and constructs a 
PhraseQuery of all the tokens in order, regardless of position 
increment.

Another oddity is that PhraseQuery doesn't deal with position 
increments either - each term added to it is considered in a successive 
position.

My best suggestion so far is to use a different analyzer for indexing 
than you do for querying.  There is really no need to add multiple 
tokens per position at query time anyway, if all the relevant ones were 
added at indexing time.  So use something that emits a more 
straight-forward incrementing position set of tokens during querying.
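
For concreteness, here is one way to wire that up.  This is only a sketch:
it reuses the TestAnalyzer from the message quoted below and assumes the
Lucene 1.3-era static QueryParser.parse():

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.RAMDirectory;

public class SplitAnalyzerDemo {
    public static void main(String[] args) throws Exception {
        // Index time: TestAnalyzer adds lowercase alternates at
        // position increment 0.
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new TestAnalyzer(), true);
        // ... add documents here ...
        writer.close();

        // Query time: StandardAnalyzer emits one token per position, so
        // QueryParser builds a plain TermQuery instead of an accidental
        // PhraseQuery.
        Query q = QueryParser.parse("Bush", "text", new StandardAnalyzer());
        System.out.println(q);  // expected: text:bush
    }
}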

	Erik

On Apr 26, 2004, at 2:34 PM, Norton, James wrote:

I am attempting to use Token.setPositionIncrement() to provide 
alternate forms of tokens and I have encountered strange
behavior with QueryParser.  It seems to be constructing phrase 
queries with the alternate tokens.  I don't know why the
query would be parsed as a phrase.

For example, consider an Analyzer that adds lowercase tokens to the 
token stream as alternate forms (position increment = 0).
Parsing the query "Bush" (quotes added for emphasis and not part of 
query) results in a query of text:"Bush bush" ("text" is
the default field).  Whereas parsing the query "bush" results in the 
query text:bush.  Notice the lack of quotes in the second
case, which has no alternate form appended because the token is 
already lowercase.  Is this a bug or is there some
explanation of which I am not aware?

The following two classes provide test code verifying this behaviour.



/**
 * A test analyzer employing a TestLowerCaseFilter to demonstrate problems with
 * QueryParser when dealing with multiple tokens at the same position.
 */
public class TestAnalyzer extends Analyzer {
	/**
	 * Constructs a {@link StandardTokenizer} filtered by a {@link
	 * StandardFilter} and a {@link TestLowerCaseFilter}.
	 */
	public final TokenStream tokenStream(String fieldName, Reader reader) {
		TokenStream result = new StandardTokenizer(reader);
		result = new StandardFilter(result);
		result = new TestLowerCaseFilter(result, Locale.getDefault());
		return result;
	}
	
	public static void main(String[] args) {
		TestAnalyzer analyzer = new TestAnalyzer();
		try {
			Query lowerCaseQuery = QueryParser.parse("bush", "text", analyzer);
			Query upperCaseQuery = QueryParser.parse("Bush", "text", analyzer);
			
			System.out.println("lower case: " + lowerCaseQuery.toString());
			System.out.println("upper case: " + upperCaseQuery.toString());
		} catch (ParseException e) {
			// TODO Auto-generated catch block
			e.printStackTrace();
		}
		
	}
}

/**
 *
 * A {@link Filter} that adds alternate forms (lower case) for upper case
 * tokens to a {@link TokenStream}.
 */
public class TestLowerCaseFilter extends TokenFilter {
	private Locale locale;
	private Token alternateToken;

	public TestLowerCaseFilter(TokenStream stream, Locale locale) {
		super(stream);
		this.locale = locale;
		this.alternateToken = null;
	}

	/* (non-Javadoc)
	 * @see org.apache.lucene.analysis.TokenStream#next()
	 */
	public Token next() throws IOException {
		Token rval = null;
		if (alternateToken != null) {
			rval = alternateToken;
			alternateToken = null;
		} else {
			Token nextToken = input.next();
			if (nextToken == null) {
				return null;
			}
			String text = nextToken.termText();
			String lc = text.toLowerCase(locale);
			rval = nextToken;
			if (!lc.equals(text)) {
				alternateToken =
					new Token(
						lc,
						nextToken.startOffset(),
						nextToken.endOffset());
				alternateToken.setPositionIncrement(0);
			}
		}
		return rval;
	}
}


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


ArrayIndexOutOfBoundsException

2004-04-26 Thread James Dunn
Hello all,

I have a web site whose search is driven by Lucene
1.3.  I've been doing some load testing using JMeter
and occasionally I will see the exception below when
the search page is under heavy load.

Has anyone seen similar errors during load testing?

I've seen some posts with similar exceptions and the
general consensus is that this error means that the
index is corrupt.  I'm not sure my index is corrupt
however.  I can run all the queries I use for load
testing under normal load and I don't appear to get
this error.

Is there any way to verify that a Lucene index is
corrupt or not? 

Thanks,

Jim

java.lang.ArrayIndexOutOfBoundsException: 53 >= 52
at java.util.Vector.elementAt(Vector.java:431)
at
org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:135)
at
org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:103)
at
org.apache.lucene.index.SegmentReader.document(SegmentReader.java:275)
at
org.apache.lucene.index.SegmentsReader.document(SegmentsReader.java:112)
at
org.apache.lucene.search.IndexSearcher.doc(IndexSearcher.java:107)
at
org.apache.lucene.search.MultiSearcher.doc(MultiSearcher.java:100)
at
org.apache.lucene.search.MultiSearcher.doc(MultiSearcher.java:100)
at
org.apache.lucene.search.Hits.doc(Hits.java:130)





__
Do you Yahoo!?
Yahoo! Photos: High-quality 4x6 digital prints for 25¢
http://photos.yahoo.com/ph/print_splash

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: SQLDirectory implementation

2004-04-26 Thread Doug Cutting
Anthony Vito wrote:
I noticed some talk on SQLDirectory a month or so ago (I just joined the
list :)).  I have a JDBC implementation that stores the "files" in a couple
of tables and stores the data for the files as blocks (BLOBs) of a certain
size (16k by default).  It also has an LRU cache for the blocks, which
makes the performance quite acceptable.

I actually prefixed all the file names with MySQL, even though it's pure
JDBC and should work with any driver or database.  I'll go clean that up
this weekend and put up a site with the code and the API docs.  I'd be
interested to see what people have to say, and to see the results of any
better tests people have cooked up.

Did you ever post this code?  It would be a great contribution to Lucene.

Thanks,

Doug
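
To make the design concrete, here is a rough sketch of the block plus LRU
cache idea; it is purely illustrative (the class, table, and column names
are invented, and this is not Anthony's actual code):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

public class BlockStore {
    private static final int BLOCK_SIZE = 16 * 1024;  // 16k, as in the post
    private static final int CACHE_BLOCKS = 256;

    private final Connection conn;

    // An access-ordered LinkedHashMap makes a simple LRU cache.
    private final Map cache = new LinkedHashMap(CACHE_BLOCKS, 0.75f, true) {
        protected boolean removeEldestEntry(Map.Entry eldest) {
            return size() > CACHE_BLOCKS;
        }
    };

    public BlockStore(Connection conn) {
        this.conn = conn;
    }

    // Fetches block blockNo of the named "file", going through the cache.
    public synchronized byte[] readBlock(String fileName, long blockNo)
            throws SQLException {
        String key = fileName + "#" + blockNo;
        byte[] block = (byte[]) cache.get(key);
        if (block == null) {
            PreparedStatement ps = conn.prepareStatement(
                "SELECT data FROM lucene_blocks WHERE name = ? AND block = ?");
            try {
                ps.setString(1, fileName);
                ps.setLong(2, blockNo);
                ResultSet rs = ps.executeQuery();
                if (rs.next()) {
                    block = rs.getBytes(1);  // at most BLOCK_SIZE bytes
                    cache.put(key, block);
                }
                rs.close();
            } finally {
                ps.close();
            }
        }
        return block;
    }
}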

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Not deleting temp files after updating/optimising.

2004-04-26 Thread Doug Cutting
Win32 seems to sometimes not permit one to delete a file immediately 
after it has been closed.  Because of this, Lucene keeps a list of files 
that need to be deleted in the 'deletable' file.  Are your files listed 
in this file?  If so, Lucene will again try to delete these files the 
next time the index is modified.  Perhaps we should add a public method 
which tries to delete such files...

Doug

Paul Williams wrote:
Hi,

I am currently using Java 1.4.2_03 with Lucene 1.3 Final.  I am using the
setUseCompoundFile(true) option, as I have a lot of fields in the database
schema and that can cause the dreaded 'too many open files' error on
Windows-based systems.  (I optimise after every 1000 documents, with a
merge factor of 100.)

But I am having a problem with updating a reasonably sized index.  It
seems to leave temporary files which can treble the size of the index on
the hard disk.  (You can delete these files afterwards with no ill
effects.)

I have checked to make sure no other processes are locking these files
(virus checker/backup software etc.).

See the directory listing below for the index files.

Has anyone else had a similar problem?  And hopefully come up with a
solution!
Regards,

Paul

15/04/2004  11:05                  .
15/04/2004  11:05                  ..
15/04/2004  10:55               4  deletable
15/04/2004  11:05               0  dir.txt
15/04/2004  10:55              76  segments
11/04/2004  09:22     516,150,272  _11au.fdt
11/04/2004  09:22          12,288  _11au.fdx
11/04/2004  09:19           1,169  _11au.fnm
11/04/2004  15:00   3,969,914,880  _12nq.fdt
11/04/2004  15:00         283,648  _12nq.fdx
11/04/2004  14:39           1,169  _12nq.fnm
14/04/2004  17:14   5,689,259,942  _15ar.cfs
15/04/2004  10:17           5,462  _15ar.del
14/04/2004  17:19       8,619,569  _15e0.cfs
15/04/2004  10:12              21  _15e0.del
15/04/2004  07:58         563,730  _15es.cfs
15/04/2004  08:25         394,474  _15fg.cfs
15/04/2004  10:18         663,616  _15g5.cfs
15/04/2004  10:55           9,385  _15g9.cfs
10/04/2004  16:06     939,758,592  _vas.fdt
10/04/2004  16:06          25,600  _vas.fdx
10/04/2004  16:01           1,169  _vas.fnm
10/04/2004  16:23     656,837,632  _vuu.fdt
10/04/2004  16:23          23,552  _vuu.fdx
10/04/2004  16:20           1,169  _vuu.fnm
10/04/2004  17:39      98,271,232  _y6h.fdt
10/04/2004  17:38           2,048  _y6h.fdx
10/04/2004  17:38           1,169  _y6h.fnm
10/04/2004  17:42      76,907,520  _y9k.cfs
10/04/2004  17:42           1,060  _y9k.f1
10/04/2004  17:42           1,060  _y9k.f10
10/04/2004  17:42           1,060  _y9k.f100
10/04/2004  17:42           1,060  _y9k.f101
10/04/2004  17:42           1,060  _y9k.f102
10/04/2004  17:42           1,060  _y9k.f103
10/04/2004  17:42           1,060  _y9k.f104
10/04/2004  17:42           1,060  _y9k.f105
10/04/2004  17:42           1,060  _y9k.f106
10/04/2004  17:42           1,060  _y9k.f107
10/04/2004  17:42           1,060  _y9k.f108
10/04/2004  17:42           1,060  _y9k.f109
10/04/2004  17:42           1,060  _y9k.f11
10/04/2004  17:42           1,060  _y9k.f110
10/04/2004  17:42           1,060  _y9k.f111
10/04/2004  17:42           1,060  _y9k.f112
10/04/2004  17:42           1,060  _y9k.f113
10/04/2004  17:42           1,060  _y9k.f114
10/04/2004  17:42           1,060  _y9k.f115
10/04/2004  17:42           1,060  _y9k.f116
10/04/2004  17:42           1,060  _y9k.f117
10/04/2004  17:42           1,060  _y9k.f118
10/04/2004  17:42           1,060  _y9k.f119
10/04/2004  17:42           1,060  _y9k.f12
10/04/2004  17:42           1,060  _y9k.f120
10/04/2004  17:42           1,060  _y9k.f13
10/04/2004  17:42           1,060  _y9k.f14
10/04/2004  17:42           1,060  _y9k.f15
10/04/2004  17:42           1,060  _y9k.f16
10/04/2004  17:42           1,060  _y9k.f17
10/04/2004  17:42           1,060  _y9k.f18
10/04/2004  17:42           1,060  _y9k.f19
10/04/2004  17:42           1,060  _y9k.f2
10/04/2004  17:42           1,060  _y9k.f20
10/04/2004  17:42           1,060  _y9k.f21
10/04/2004  17:42           1,060  _y9k.f22
10/04/2004  17:42           1,060  _y9k.f23
10/04/2004  17:42           1,060  _y9k.f24
10/04/2004  17:42           1,060  _y9k.f25
10/04/2004  17:42           1,060  _y9k.f26
10/04/2004  17:42           1,060  _y9k.f27
10/04/2004  17:42           1,060  _y9k.f28
10/04/2004  17:42           1,060  _y9k.f29
10/04/2004  17:42           1,060  _y9k.f3
10/04/2004  17:42           1,060  _y9k.f30
10/04/2004  17:42           1,060  _y9k.f31
10/04/2004  17:42           1,060  _y9k.f32
10/04/2004  17:42           1,060  _y9k.f33
10/04/2004  17:42           1,060  _y9k.f34
10/04/2004  17:42           1,060  _
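
For reference, here is a sketch of the writer setup Paul describes.  The
index path is hypothetical, and this assumes the Lucene 1.3 IndexWriter
API (with its public mergeFactor field):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class UpdateIndexSketch {
    public static void main(String[] args) throws Exception {
        // Open the existing index for updating (create == false).
        IndexWriter writer = new IndexWriter("/path/to/index",
                new StandardAnalyzer(), false);
        writer.setUseCompoundFile(true);  // one .cfs file per segment
        writer.mergeFactor = 100;

        // ... writer.addDocument(doc) for each new document,
        //     optimising after every 1000 or so ...

        writer.optimize();  // merges segments; obsolete files should be
                            // queued in the 'deletable' file
        writer.close();
    }
}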

QueryParser Behavior and Token.setPositionIncrement

2004-04-26 Thread Norton, James
> I am attempting to use Token.setPositionIncrement() to provide alternate forms of 
> tokens and I have encountered strange
> behavior with QueryParser.  It seems to be constructing phrase queries with the 
> alternate tokens.  I don't know why the
> query would be parsed as a phrase.
> 
> For example, consider an Analyzer that adds lowercase tokens to the token stream as 
> alternate forms (position increment = 0).
> Parsing the query "Bush" (quotes added for emphasis and not part of query) results 
> in a query of text:"Bush bush" ("text" is
> the default field).  Whereas parsing the query "bush" results in the query 
> text:bush.  Notice the lack of quotes in the second
> case, which has no alternate form appended because the token is already lowercase.  
> Is this a bug or is there some 
> explanation of which I am not aware?
> 
> The following two classes provide test code verifying this behaviour.
> 
> 
> 
> /**
>  * A test analyzer employing a TestLowerCaseFilter to demonstrate problems with
>  * QueryParser when dealing with multiple tokens at the same position.
>  */
> public class TestAnalyzer extends Analyzer {
>   /**
>* Constructs a {@link StandardTokenizer} filtered by a {@link
>* StandardFilter} and a {@link TestLowerCaseFilter}.
>*/
>   public final TokenStream tokenStream(String fieldName, Reader reader) {
>   TokenStream result = new StandardTokenizer(reader);
>   result = new StandardFilter(result);
>   result = new TestLowerCaseFilter(result, Locale.getDefault());
>   return result;
>   }
>   
>   public static void main(String[] args) {
>   TestAnalyzer analyzer = new TestAnalyzer();
>   try {
>   Query lowerCaseQuery = QueryParser.parse("bush", "text", 
> analyzer);
>   Query upperCaseQuery = QueryParser.parse("Bush", "text", 
> analyzer);
>   
>   System.out.println("lower case: " + lowerCaseQuery.toString());
>   System.out.println("upper case: " + upperCaseQuery.toString());
>   } catch (ParseException e) {
>   // TODO Auto-generated catch block
>   e.printStackTrace();
>   }
>   
>   }
> }
> 
> /**
>  *
>  * A {@link Filter} that adds alternate forms (lower case) for upper case
>  * tokens to a {@link TokenStream}.
>  */
> public class TestLowerCaseFilter extends TokenFilter {
>   private Locale locale;
>   private Token alternateToken;
> 
>   public TestLowerCaseFilter(TokenStream stream, Locale locale) {
>   super(stream);
>   this.locale = locale;
>   this.alternateToken = null;
>   }
> 
>   /* (non-Javadoc)
>* @see org.apache.lucene.analysis.TokenStream#next()
>*/
>   public Token next() throws IOException {
>   
>   Token rval = null;
>   if (alternateToken != null) {
>   rval = alternateToken;
>   alternateToken = null;
>   } else {
>   Token nextToken = input.next();
>   if (nextToken == null) {
>   return null;
>   }
>   String text = nextToken.termText();
>   String lc = text.toLowerCase(locale);
>   rval = nextToken;
>   if (!lc.equals(text)) {
> 
>   alternateToken =
>   new Token(
>   lc,
>   nextToken.startOffset(),
>   nextToken.endOffset());
>   alternateToken.setPositionIncrement(0);
>   }
>   }
>   return rval;
>   }
> 
> }  


Re: Adding duplicate Fields to Documents

2004-04-26 Thread Gerard Sychay
Luke is a good idea.  I'll also just write a simple test program and
play around with it (something I probably should have done before
posting) and then post my findings here.

>>> Stephane James Vaucher <[EMAIL PROTECTED]> 04/24/04 02:02PM
>>>
From my experience (that is, little experience ;)), fields that are not
tokenised are stored separately.  Someone more qualified can surely give
you more details.

You can look at your index with Luke, it might be insightful.
sv

On Thu, 22 Apr 2004, Gerard Sychay wrote:

> Hello,
>
> I am wondering what happens when you add two Fields with same names
to
> a Document.  The API states that "if the fields are indexed, their
text
> is treated as though appended."  This much makes sense.  But what
about
> the following two cases:
>
> - Adding two fields with same name that are indexed, not tokenized
> (keywords)?  E.g. given ("field_name", "keyword1") and
("field_name",
> "keyword2"), would the final keyword field be ("field_name",
> "keyword1keyword2")?  Seems weird..
>
> - Adding two fields with same name that are stored, but not indexed
and
> not tokenized (e.g. database keys)?  Are they appended (which would
mess
> up the database key when retrieved from the Hit)?
>


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [Jakarta Lucene Wiki] Updated: NewFrontPage

2004-04-26 Thread Erik Hatcher
Make the switch ... I don't see any reason to save the old page.  We  
can always change it, it's a wiki after all!

	Erik

On Apr 26, 2004, at 10:56 AM, Stephane James Vaucher wrote:

I don't know what you think of the NewFrontPage, but if you like it, I
could do a switch, renaming old FrontPage to OldFrontPage and the new  
one
to FrontPage.

Also, if anyone knows how to do this, it would be appreciated. I  
haven't
figured out yet how to rename/destroy pages (are there permissions in
MoinMoin?). Amongst other things, the doc says there should be an
action=DeletePage.

cheers,
sv
On Mon, 26 Apr 2004 [EMAIL PROTECTED] wrote:

   Date: 2004-04-26T07:38:29
   Editor: StephaneVaucher <[EMAIL PROTECTED]>
   Wiki: Jakarta Lucene Wiki
   Page: NewFrontPage
   URL: http://wiki.apache.org/jakarta-lucene/NewFrontPage
   Added link to PoweredBy page

Change Log:

-- 

@@ -17,6 +17,7 @@
  || IntroductionToLucene || Articles and Tutorials introducing Lucene ||
  || OnTheRoad || Information on presentations and courses ||
  || InformationRetrieval || Articles and Tutorials on information retrieval ||
+ || PoweredBy || Link to projects using Lucene ||
  || ["LuceneFAQ"]|| The Lucene FAQ ||
  || HowTo|| Lucene HOWTO's : small tutorials and code snippets ||
  || ["Resources"]|| Contains useful links ||

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: [Jakarta Lucene Wiki] Updated: NewFrontPage

2004-04-26 Thread Stephane James Vaucher
I don't know what you think of the NewFrontPage, but if you like it, I 
could do a switch, renaming old FrontPage to OldFrontPage and the new one 
to FrontPage.

Also, if anyone knows how to do this, it would be appreciated. I haven't 
figured out yet how to rename/destroy pages (are there permissions in 
MoinMoin?). Amongst other things, the doc says there should be an 
action=DeletePage.

cheers,
sv

On Mon, 26 Apr 2004 [EMAIL PROTECTED] wrote:

>Date: 2004-04-26T07:38:29
>Editor: StephaneVaucher <[EMAIL PROTECTED]>
>Wiki: Jakarta Lucene Wiki
>Page: NewFrontPage
>URL: http://wiki.apache.org/jakarta-lucene/NewFrontPage
> 
>Added link to PoweredBy page
> 
> Change Log:
> 
> --
> @@ -17,6 +17,7 @@
>   || IntroductionToLucene || Articles and Tutorials introducing Lucene ||
>   || OnTheRoad || Information on presentations and courses ||
>   || InformationRetrieval || Articles and Tutorials on information retrieval ||
> + || PoweredBy || Link to projects using Lucene ||
>   || ["LuceneFAQ"]|| The Lucene FAQ ||
>   || HowTo|| Lucene HOWTO's : small tutorials and code snippets ||
>   || ["Resources"]|| Contains useful links ||
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Segments file get deleted?!

2004-04-26 Thread Stephane James Vaucher
I would have to agree with Nader's diagnosis; can you give us details on
your update process?

Please include OS, and if there are some non-java processes (e.g. doing 
copies).

cheers,
sv

On Mon, 26 Apr 2004, Nader S. Henein wrote:

> Can you give us a bit of background?  We've been using Lucene since the
> first stable release 2 years ago, and I've never had segments disappear on
> me.  First of all, can you provide some background on your setup?  Secondly,
> when you say "a certain period of time", how much time are we talking about
> here, and does that interval coincide with your indexing schedule?  You may
> have the create flag on the Indexer set to true, so that it simply recreates
> the index at every update and deletes whatever was there; of course, if there
> are no files to index at any point, it will just give you a blank index.
> 
> 
> Nader Henein
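
The create flag Nader mentions is the third argument to the IndexWriter
constructor; a minimal illustration (the index path is hypothetical):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

public class CreateFlagDemo {
    public static void main(String[] args) throws Exception {
        // true: wipe any existing index and start fresh.  Running this on
        // every update would make existing segments "disappear".
        IndexWriter fresh = new IndexWriter("/path/to/index",
                new StandardAnalyzer(), true);
        fresh.close();

        // false: open the existing index and append to it.
        IndexWriter append = new IndexWriter("/path/to/index",
                new StandardAnalyzer(), false);
        append.close();
    }
}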
> 
> -Original Message-
> From: Surya Kiran [mailto:[EMAIL PROTECTED] 
> Sent: Monday, April 26, 2004 7:48 AM
> To: [EMAIL PROTECTED]
> Subject: Segments file get deleted?!
> 
> 
> Hi all, we have implemented our portal search using Lucene.  It works fine,
> but after a certain period of time the Lucene "segments" file gets deleted.
> Eventually all searches fail.  Can anyone guess where the error could be?
> 
> Thanks a lot.
> 
> Regards
> Surya.
> 
> 
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: ValueListHandler pattern with Lucene

2004-04-26 Thread lucene
On Monday 12 April 2004 20:54, [EMAIL PROTECTED] wrote:
> On Sunday 11 April 2004 17:46, Erik Hatcher wrote:
> > In other words, you need to invent your own "pattern" here?!  :)
>
> I just experimented a bit and came up with the ValueListSupplier which
> replaces the ValueList in the VLH. Seems to work so far... :-) Comments are
> greatly appreciated!

FYI http://www.nitwit.de/vlh2/

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]