[jira] Created: (LUCENE-503) Contrib: ThaiAnalyzer to enable Thai full-text search in Lucene
Contrib: ThaiAnalyzer to enable Thai full-text search in Lucene
---------------------------------------------------------------

Key: LUCENE-503
URL: http://issues.apache.org/jira/browse/LUCENE-503
Project: Lucene - Java
Type: New Feature
Components: Analysis
Versions: 1.4
Reporter: Samphan Raruenrom

Thai text doesn't have spaces between words. Usually, a dictionary-based algorithm is used to break a string into words. For Lucene to be usable for Thai, an Analyzer that knows how to break Thai words is needed. I've implemented such an Analyzer, ThaiAnalyzer, using the ICU4j DictionaryBasedBreakIterator for word breaking. I'll upload the code later. I'm normally a C++ programmer and very new to Java, so please review the code for any problems. One possible problem is that it requires ICU4j; I don't know whether this is OK.
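For context, here is a minimal sketch of the kind of dictionary-based word breaking the analyzer relies on, using java.text.BreakIterator with the Thai locale as a stand-in for ICU4J's DictionaryBasedBreakIterator (the actual ThaiAnalyzer attachment may differ, and the Thai dictionary is only present in some JDK/ICU builds):

    import java.text.BreakIterator;
    import java.util.Locale;

    public class ThaiBreakDemo {
      public static void main(String[] args) {
        // Thai for "Thai language", written with no spaces between the two words.
        String text = "\u0e20\u0e32\u0e29\u0e32\u0e44\u0e17\u0e22";
        // On JDK/ICU builds that ship a Thai dictionary, the word instance for
        // the Thai locale performs dictionary-based breaking.
        BreakIterator breaker = BreakIterator.getWordInstance(new Locale("th", "TH"));
        breaker.setText(text);
        int start = breaker.first();
        for (int end = breaker.next(); end != BreakIterator.DONE; start = end, end = breaker.next()) {
          System.out.println(text.substring(start, end)); // one line per detected word
        }
      }
    }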
FuzzyQuery / PriorityQueue BUG
Hi,

FuzzyQuery produces a java.lang.NegativeArraySizeException in PriorityQueue.initialize if I use Integer.MAX_VALUE as BooleanQuery.MaxClauseCount. This is because it adds 1 to MaxClauseCount and tries to allocate an array of that size (I think it overflows to MIN_VALUE). Usually nobody needs that many clauses, but I think this should be caught somehow. Perhaps an error like "your MaxClauseCount is too large" could do it, so the user knows where to find the problem.

Greets,
Joerg
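A minimal demonstration of the overflow being described, assuming (as the report says) that PriorityQueue allocates maxSize + 1 slots:

    public class OverflowDemo {
      public static void main(String[] args) {
        int maxSize = Integer.MAX_VALUE;
        int length = maxSize + 1;           // int overflow: wraps to Integer.MIN_VALUE
        Object[] heap = new Object[length]; // throws java.lang.NegativeArraySizeException
      }
    }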
Re: FuzzyQuery / PriorityQueue BUG
Jörg, could you please add this to JIRA so that things don't get lost? If you have a patch and/or a test case showing the problem, it would be great if you could append it to JIRA as well.

thanks,
Bernhard

Jörg Henß wrote:
> FuzzyQuery produces a java.lang.NegativeArraySizeException in PriorityQueue.initialize
> if I use Integer.MAX_VALUE as BooleanQuery.MaxClauseCount.
[jira] Created: (LUCENE-504) FuzzyQuery produces a java.lang.NegativeArraySizeException in PriorityQueue.initialize if I use Integer.MAX_VALUE as BooleanQuery.MaxClauseCount
FuzzyQuery produces a java.lang.NegativeArraySizeException in PriorityQueue.initialize if I use Integer.MAX_VALUE as BooleanQuery.MaxClauseCount
------------------------------------------------------------------------------------------------------------------------------------------------

Key: LUCENE-504
URL: http://issues.apache.org/jira/browse/LUCENE-504
Project: Lucene - Java
Type: Bug
Components: Search
Versions: 1.9
Reporter: Joerg Henss
Priority: Minor

PriorityQueue throws a java.lang.NegativeArraySizeException when initialized with Integer.MAX_VALUE, because the integer overflows. I think this could be a general problem with PriorityQueue. The error occurred when I set BooleanQuery.MaxClauseCount to Integer.MAX_VALUE and used a FuzzyQuery for searching.
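One possible guard along the lines Joerg suggests, sketched against a simplified stand-in for Lucene's PriorityQueue (the field names and the maxSize + 1 allocation are taken from the report; the real class differs):

    abstract class BoundedPriorityQueue {
      private Object[] heap;
      private int size;
      private int maxSize;

      protected final void initialize(int maxSize) {
        // Fail fast with a readable error instead of letting maxSize + 1
        // wrap around to a negative array size.
        if (maxSize >= Integer.MAX_VALUE) {
          throw new IllegalArgumentException("maxSize is too large: " + maxSize);
        }
        this.size = 0;
        this.maxSize = maxSize;
        this.heap = new Object[maxSize + 1]; // Lucene's heap is 1-based, hence the +1
      }
    }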
[jira] Updated: (LUCENE-504) FuzzyQuery produces a java.lang.NegativeArraySizeException in PriorityQueue.initialize if I use Integer.MAX_VALUE as BooleanQuery.MaxClauseCount
[ http://issues.apache.org/jira/browse/LUCENE-504?page=all ]

Joerg Henss updated LUCENE-504:

Attachment: TestFuzzyQueryError.java

Simple test showing the error.
Re: version issue ?
Andi Vajda wrote:
> It would seem to me that the source code snapshot that is made to release 'official' source
> and binary tarballs on the Lucene website should correspond to a precise svn version

It does correspond to a precise SVN version, but what we prefer is a tag. The tag for 1.9-final is: http://svn.apache.org/repos/asf/lucene/java/tags/lucene_1_9_final/ Tags should never be revised. If you're paranoid, you could also note the revision of this tag, which is 381437.

> and that that version's common-build.xml should reflect the release number, i.e. 1.9 or
> 1.9-final in its version property.

Here's where things are murky. I don't like it if what folks build themselves is easily confused with a released build. So the version in common-build.xml should not be 1.9 or 1.9-final but something else. But what? 1.9-dev? 1.9.1-dev? 1.9.1-rc1-dev? Classically we've used the -dev suffix to distinguish non-release builds. Can you suggest something better?

> If the 1.9 branch were to be modified for making, say, a 1.9.1 release, the first change
> should be in common-build.xml for the version to say something like 1.9.1-rc1-dev.

Right.

> What is the use case here?

The use case is folks downloading the source, tweaking things, and building things themselves. If they're not going to tweak things, there's no need to build things themselves. This is not like C code, where folks frequently need to build releases on different platforms. The predominant reason folks build Lucene is that they want to tweak it. And once it is tweaked, it is no longer a released version.

> It is no problem for me to patch common-build.xml for the 1.9-final release of PyLucene
> from Java Lucene, made from the lucene_1_9 branch, and then revert back to the trunk,
> without the patch, for the future ongoing 2.0 dev builds and PyLucene releases.

There's no need to patch: you can simply run 'ant -Dversion=XXX'.

Doug
Re: Index compatibility question
On Wednesday, 01 March 2006 16:21, DM Smith wrote:
> I find that 1.9 reads my 1.4.3-built indexes just fine. But not the other way around.

That's exactly how it is supposed to be: a newer Lucene can read indexes created by an older one, but not vice versa.

Regards
Daniel

--
http://www.danielnaber.de
[jira] Updated: (LUCENE-505) MultiReader.norm() takes up too much memory: norms byte[] should be made into an Object
[ http://issues.apache.org/jira/browse/LUCENE-505?page=all ]

Steven Tamm updated LUCENE-505:

Attachment: NormFactors.patch

Sorry, I didn't remove whitespace in the previous patch. This one's easier to read. svn diff --diff-cmd diff -x "-b -u" works better than svn diff --diff-cmd diff -x -b -x -u.

MultiReader.norm() takes up too much memory: norms byte[] should be made into an Object
----------------------------------------------------------------------------------------

Key: LUCENE-505
URL: http://issues.apache.org/jira/browse/LUCENE-505
Project: Lucene - Java
Type: Improvement
Components: Index
Versions: 1.9
Environment: Patch is against Lucene 1.9 trunk (as of Mar 1 06)
Reporter: Steven Tamm
Attachments: NormFactors.patch, NormFactors.patch

MultiReader.norms() is very inefficient: it has to construct a byte array that's as long as all the documents in every segment. This doubles the memory requirement for scoring MultiReaders vs. SegmentReaders. Although this array is cached, it's still a baseline of memory that is unnecessary.

The problem is that the normalization factors are passed around as a byte[]. If that were instead replaced with an Object, you could perform a whole host of optimizations:

a. When reading, you wouldn't have to construct a fakeNorms array of all 1.0fs. You could instead return a singleton object that would just return 1.0f.
b. MultiReader could use an object that delegates to the NormFactors of the subreaders.
c. You could write an implementation that uses mmap to access the norm factors. Or, if the index isn't long-lived, you could use an implementation that reads directly from disk.

The patch provided here replaces the use of byte[] with a new abstract class called NormFactors. NormFactors has two methods on it:

    public abstract byte getByte(int doc) throws IOException; // Returns the byte[doc]
    public float getFactor(int doc) throws IOException;       // Calls Similarity.decodeNorm(getByte(doc))

There are four implementations of this abstract class:

1. NormFactors.EmptyNormFactors - replaces the fakeNorms with a singleton that only returns 1.0.
2. NormFactors.ByteNormFactors - converts a byte[] to a NormFactors for backwards compatibility in constructors.
3. MultiNormFactors - multiplexes the NormFactors in MultiReader to prevent the need to construct the gigantic norms array.
4. SegmentReader.Norm - the same class, but now extends NormFactors to provide the same access.

In addition, many of the Query and Scorer classes were changed to pass around NormFactors instead of byte[], and to call getFactor() instead of indexing into the byte[]. I have kept IndexReader.norms(String) around for backwards compatibility, but marked it as deprecated. I believe that the use of ByteNormFactors in IndexReader.getNormFactors() will keep backward compatibility with other IndexReader implementations, but I don't know how to test that.
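To make the proposed abstraction concrete, here is a sketch of NormFactors as described in the issue text (the method names come from the description; the attached NormFactors.patch is authoritative, and details such as the EmptyNormFactors singleton may differ):

    import java.io.IOException;

    import org.apache.lucene.search.Similarity;

    public abstract class NormFactors {
      /** The raw norm byte for a document, i.e. what norms(field)[doc] holds today. */
      public abstract byte getByte(int doc) throws IOException;

      /** The decoded factor: Similarity.decodeNorm(getByte(doc)). */
      public float getFactor(int doc) throws IOException {
        return Similarity.decodeNorm(getByte(doc));
      }

      /** Singleton that replaces the fakeNorms array of all 1.0f values. */
      public static final NormFactors EMPTY = new NormFactors() {
        public byte getByte(int doc) {
          return Similarity.encodeNorm(1.0f); // every document gets factor 1.0
        }
      };
    }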
[jira] Commented: (LUCENE-505) MultiReader.norm() takes up too much memory: norms byte[] should be made into an Object
[ http://issues.apache.org/jira/browse/LUCENE-505?page=comments#action_12368389 ]

Steven Tamm commented on LUCENE-505:

> I also worry about performance with this change. Have you benchmarked this while searching
> large indexes?

Yes, see below.

> For example, in TermScorer.score(HitCollector, int), Lucene's innermost loop, you change two
> array accesses into a call to an interface. That could make a substantial difference. Small
> changes to that method can cause significant performance changes.

Specifically, I have changed two byte array references (one of which is static) to a method call on an abstract class. I'm using JDK 1.5.0_06. HotSpot inlines both calls, and performance was about the same with a 1M-doc index (we have a low term/doc ratio, so we have about 8.5M terms). HPROF doesn't even see the call to Similarity.decodeNorm. If I were using JDK 1.3 I'd probably agree with you, but HotSpot is very good at figuring this stuff out and auto-inlining the calls.

As for the numbers: an average request returning 5000 hits from our 0.5GB index averaged ~485ms on my box before; it's now at ~480ms (50 runs each). Most of that is overhead, granted. The increase in performance may be obscured by my other change in TermScorer (LUCENE-502). I'm not sure of the history of TermScorer, but it seems heavily optimized for a large number of terms per document; we have a low number of terms per document, so performance suffers greatly. Performance was dramatically improved by not unnecessarily caching things. TermScorer seems heavily optimized for a non-modern VM (inlining next() into score(), caching the result of Math.sqrt for each term being queried, having a doc/freq cache that provides no benefit unless iterating backwards, etc.). The total of the TermScorer changes brought the average down from ~580ms.

Since we use a lot of large indexes and don't keep them in memory all that often, our performance increases dramatically due to the reduction in GC overhead. As we move to not actually storing the norms array in memory but instead reading it from disk, this change will have an even higher benefit. I'm in the process of preparing a set of patches that will help people who don't have long-lived indexes, and this is just one part.
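The MultiNormFactors idea from the description can be sketched as follows: delegate each lookup to the owning segment's NormFactors rather than concatenating all segments' norms into one large byte[]. This builds on the NormFactors sketch above; the field names and readerIndex helper are assumptions (MultiReader uses the same starts-array scheme for doc ids):

    import java.io.IOException;

    class MultiNormFactors extends NormFactors {
      private final NormFactors[] subFactors; // one per subreader
      private final int[] starts;             // first doc id of each subreader; starts[0] == 0

      MultiNormFactors(NormFactors[] subFactors, int[] starts) {
        this.subFactors = subFactors;
        this.starts = starts;
      }

      public byte getByte(int doc) throws IOException {
        int i = readerIndex(doc); // which subreader owns this doc id
        return subFactors[i].getByte(doc - starts[i]); // rebase to that segment's doc space
      }

      // Largest i such that starts[i] <= doc, by binary search.
      private int readerIndex(int doc) {
        int lo = 0, hi = subFactors.length - 1;
        while (lo < hi) {
          int mid = (lo + hi + 1) >>> 1;
          if (starts[mid] <= doc) {
            lo = mid;
          } else {
            hi = mid - 1;
          }
        }
        return lo;
      }
    }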
[jira] Created: (LUCENE-506) Optimize Memory Use for Short-Lived Indexes (Do not load TermInfoIndex if you know the queries ahead of time)
Optimize Memory Use for Short-Lived Indexes (Do not load TermInfoIndex if you know the queries ahead of time)
-------------------------------------------------------------------------------------------------------------

Key: LUCENE-506
URL: http://issues.apache.org/jira/browse/LUCENE-506
Project: Lucene - Java
Type: Improvement
Components: Index
Versions: 2.0
Environment: Patch against Lucene 1.9 trunk as of Mar 1 06
Reporter: Steven Tamm

Summary: Provide a way to avoid loading the TermInfoIndex into memory if you know all the terms you are ever going to query.

In our search environment, we have a large number of indexes (many thousands), any of which may be queried by any number of hosts. These indexes may be very large (~1M documents), but since we have a low term/doc ratio, we have 7-11M terms. With an index interval of 128, that means ~70-90K indexed terms. On loading the index, Lucene instantiates a Term, a TermInfo, a String, and a char[] for each of them. When the index is long-lived, this makes some sense, because you can quickly search the list of terms using binary search. However, since we throw away the indexes very often, a lot of garbage is created per query. Here's an example where we load a large index 10 times; this corresponds to 7MB of garbage per query:

             percent          live           alloc'ed     stack  class
    rank   self  accum     bytes   objs     bytes    objs trace  name
       1  4.48%  4.48%   4678736 128946  23393680 644730 387749  char[]
       3  3.95% 12.61%   4126272 128946  20631360 644730 387751  org.apache.lucene.index.TermInfo
       6  2.96% 22.71%   3094704 128946  15473520 644730 387748  java.lang.String
       8  1.98% 26.97%   2063136 128946  10315680 644730 387750  org.apache.lucene.index.Term

This adds up after a while. Since we know exactly which Terms we're going to search for before even opening the index, there's no need to allocate this much memory. Upon opening the index, we can instead go through the TII in sequential order, retrieve only the entries for those terms from the main term dictionary, and reduce the storage requirements dramatically. This reduces the amount of garbage generated by querying by about 60% if you only make one query per index, with a 77% increase in throughput.

This is accomplished by factoring the index-loading aspects of TermInfosReader out into a new file, SegmentTermInfosReader. TermInfosReader becomes a base class that provides access to terms. A new class, PrefetchedTermInfosReader, will, upon startup, sort the passed-in terms and retrieve the IndexEntries for those terms. IndexReader and SegmentReader are modified with new constructor methods that take a Collection of Terms corresponding to the total set of terms that will ever be searched in the life of the index.

In order to support the skipping behavior, some changes need to be made to SegmentTermEnum: specifically, we need to be able to go back one entry in order to retrieve the previous TermInfo and IndexPointer. This is because, unlike the normal case, with the index we want to return the value right before the intended term (so that we can be positioned behind the desired term in the main dictionary). For example, if we're looking for "apple" in the index, and the two adjacent values are "abba" and "argon", we want to return "abba" instead of "argon". That way we won't miss any terms in the real index. This code is confusing; it should probably be moved to a subclass of TermBuffer, but that would require more code. Not wanting to modify TermBuffer, to keep it small, also led to the odd NPE catch in SegmentTermEnum.java.

Sticklers for contracts may want to rename SegmentTermEnum.skipTo() to a different name because it implements a different contract, but it would be useful for anyone trying to skip around in the TII, so I figured it was the right thing to do.
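To illustrate how the proposed API would be used, here is a hypothetical sketch (the IndexReader.open(Directory, Collection) overload exists only in the attached patch, not in stock Lucene, and the index path is made up):

    import java.util.ArrayList;
    import java.util.Collection;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class PrefetchDemo {
      public static void main(String[] args) throws Exception {
        Directory dir = FSDirectory.getDirectory("/path/to/index", false);

        // The complete set of terms that will ever be queried against this reader.
        Collection terms = new ArrayList();
        terms.add(new Term("body", "apple"));
        terms.add(new Term("body", "argon"));

        // Proposed overload: skip loading the full TermInfoIndex and prefetch
        // only the dictionary entries for the terms above.
        IndexReader reader = IndexReader.open(dir, terms);
        // ... run the known queries, then discard the short-lived reader ...
        reader.close();
      }
    }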
[jira] Updated: (LUCENE-506) Optimize Memory Use for Short-Lived Indexes (Do not load TermInfoIndex if you know the queries ahead of time)
[ http://issues.apache.org/jira/browse/LUCENE-506?page=all ]

Steven Tamm updated LUCENE-506:

Attachment: Prefetching.patch

This also includes two additional test cases. The public exposure to the prefetching is controlled solely by IndexReader.open(Directory, Collection). If you try to query a term that wasn't included,

This also includes some wildcard handling. I'm not sure it's absolutely necessary for WildcardTermEnum or FuzzyTermEnum; you can probably remove the entire if (entry == null) block of PrefetchedTermInfosReader.seekEnum. But this provides more flexibility.
[jira] Created: (LUCENE-508) SegmentTermEnum.next() doesn't maintain prevBuffer at end
SegmentTermEnum.next() doesn't maintain prevBuffer at end
---------------------------------------------------------

Key: LUCENE-508
URL: http://issues.apache.org/jira/browse/LUCENE-508
Project: Lucene - Java
Type: Bug
Components: Index
Versions: 1.9, 2.0
Environment: Lucene Trunk
Reporter: Steven Tamm

When you're iterating a SegmentTermEnum and you go past the end, you end up in a state where the next term is null and prevBuffer holds the penultimate term, not the last term. This patch fixes it. (It's also required for my prefetching issue, LUCENE-506.)

Index: java/org/apache/lucene/index/SegmentTermEnum.java
===================================================================
--- java/org/apache/lucene/index/SegmentTermEnum.java (revision 382121)
+++ java/org/apache/lucene/index/SegmentTermEnum.java (working copy)
@@ -109,6 +109,7 @@
   /** Increments the enumeration to the next element. True if one exists.*/
   public final boolean next() throws IOException {
     if (position++ >= size - 1) {
+      prevBuffer.set(termBuffer);
       termBuffer.reset();
       return false;
     }