Re: lucene indexing and merge process

2007-10-19 Thread Michael McCandless

It seems like there are (at least) two angles here for getting better
performance from FieldCache:

  1) Be incremental: with reopen() we should only have to update a
 subset of the array in the FieldCache, according to the changed
 segments.  This is what Hoss is working on and Mark was referring
 to and I think it's very important!

  2) Parsing is slow (?): I'm guessing one of the reasons that John
 added the _X.udt file was because it's much faster to load an
 array of already-parsed ints than to ask FieldCache to populate
 itself.
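
A minimal sketch of the incremental idea in #1, assuming the cache is keyed by segment name. PerSegmentCache, refresh() and the loader function are hypothetical names for illustration; they are not part of Lucene or of the work Hoss and Mark are doing:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-segment cache; segment names and the loader are stand-ins. */
final class PerSegmentCache {
    private final Map<String, int[]> bySegment = new HashMap<>();

    /** Brings the cache in line with the current segments; returns how many were loaded. */
    int refresh(List<String> currentSegments, Function<String, int[]> loader) {
        // Drop arrays for segments that were merged away or deleted.
        bySegment.keySet().retainAll(currentSegments);
        int loaded = 0;
        for (String segment : currentSegments) {
            if (!bySegment.containsKey(segment)) {
                // Only new segments pay the loading cost.
                bySegment.put(segment, loader.apply(segment));
                loaded++;
            }
        }
        return loaded;
    }

    int[] get(String segment) {
        return bySegment.get(segment);
    }
}
```

After a reopen, only segments that were not seen before go through the loader; unchanged segments keep their arrays.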

Even if we do #1, I think #2 could be a big win (in addition)?  John
do you have any numbers of how much faster it is to load the array of
ints from the _X.udt file vs having FieldCache populate itself?
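
For a sense of what #2 means in code, here is a rough sketch of a side file holding already-parsed ints. The thread does not describe the actual _X.udt layout, so the format below (a count followed by one int per docid) and the UidFile class are assumptions:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

/** Hypothetical per-segment side file: a count followed by one int per docid. */
final class UidFile {
    private UidFile() {}

    /** Writes uids[docid] for every docid of one segment. */
    static void write(File file, int[] uids) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeInt(uids.length);
            for (int uid : uids) {
                out.writeInt(uid);
            }
        }
    }

    /** Reads the array back in one sequential scan; nothing is parsed from text. */
    static int[] read(File file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            int[] uids = new int[in.readInt()];
            for (int i = 0; i < uids.length; i++) {
                uids[i] = in.readInt();
            }
            return uids;
        }
    }
}
```

Reading is a single sequential pass with no term lookups or string parsing, which is presumably where any savings would come from; actual numbers would still need to be measured.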

Also on the original question of can we open up SegmentReader,
FieldsWriter, etc., I think that's a good idea?  At least we can make
things protected instead of private/final?

Mike

Ning Li [EMAIL PROTECTED] wrote:
 I see what you mean by 2) now. What Mark said should work for you when
 it's done.
 
 Cheers,
 Ning
 
 On 10/18/07, John Wang [EMAIL PROTECTED] wrote:
  Hi Ning:
  That is essentially what field cache does. Doing this for each docid in
  the result set will be slow if the result set is large. But loading it in
  memory when opening index can also be slow if the index is large and updates
  often.
 
  Thanks
 
  -John
 
  On 10/18/07, Ning Li [EMAIL PROTECTED] wrote:
  
   Make all documents have a term, say ID:UID, and for each document,
   store its UID in the term's payload. You can read off this posting
   list to create your array. Will this work for you, John?
  
   Cheers,
   Ning
  
  
   On 10/18/07, Erik Hatcher [EMAIL PROTECTED] wrote:
Forwarding this to java-dev per request.  Seems like the best place
to discuss this topic.
   
Erik
   
   
Begin forwarded message:
   
 From: John Wang [EMAIL PROTECTED]
 Date: October 17, 2007 5:43:29 PM EDT
 To: [EMAIL PROTECTED]
 Subject: lucene indexing and merge process

 Hi Erik:

 We are revamping our search system here at LinkedIn. And we are
 using Lucene.

 One issue we ran across is that we store a UID in Lucene which
 we map to the DB storage. So given a docid, to look up its UID, we
 have the following solutions:

 1) Index it as a Stored field and get it from reader.document (very
 slow if recall is large)
 2) Load/Warmup the FieldCache (for large corpus, loading up the
 indexreader can be slow)
 3) construct it using the FieldCache and persist it on disk
 every time the index changes. (not suitable for real time indexing,
 e.g. this process will degrade as # of documents get large)

 None of the above solutions turns out to be adequate for our
 requirements.

  What we ended up doing is modifying the Lucene code by changing the
 SegmentReader, DocumentWriter, and FieldsWriter classes to take
 advantage of the Lucene segment/merge process. E.g.:

  For each segment, we store a .udt file, which is an int[]
 array (by changing the FieldsWriter class).

  And SegmentReader will load the .udt file into an array.

  And merge happens seamlessly.

  Because of the tight encapsulation around these classes, e.g.
 private and final methods, it is very difficult to extend Lucene
 while avoiding branching into our own version. Is there a way we can
 open up and make these classes extensible? We'd be happy to
 contribute what we have done.

  I guess to tackle the problem from a different angle: is there
 a way to incorporate FieldCache into the segments (it is strictly
 in memory now), and build disk versions while indexing?


  Hope I am making sense.

 I did not send this out to the mailing list because I wasn't
 sure if this is a dev question or a user question. Feel free to
 either forward it to the right mailing list or let me know and I
 can forward it.


 Thanks

 -John

   
   
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
   
   
  



[jira] Updated: (LUCENE-1020) Basic tool for checking & repairing an index

2007-10-19 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-1020:
---

Attachment: LUCENE-1020.take2.patch

Attached patch: another rev of this tool, with a few minor additions.

I plan to commit in a day or two.


 Basic tool for checking & repairing an index
 

 Key: LUCENE-1020
 URL: https://issues.apache.org/jira/browse/LUCENE-1020
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Affects Versions: 2.3
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Attachments: LUCENE-1020.patch, LUCENE-1020.take2.patch


 This has been requested a number of times on the mailing lists.  Most
 recently here:
   http://www.gossamer-threads.com/lists/lucene/java-user/53474
 I think we should provide a basic tool out of the box.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





[jira] Commented: (LUCENE-743) IndexReader.reopen()

2007-10-19 Thread Michael Busch (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536353
 ] 

Michael Busch commented on LUCENE-743:
--

Hi Mike,

I'm not sure if I fully understand your comment. Consider the following (quite 
constructed) example:

{code:java}
IndexReader reader1 = IndexReader.open(index1);  // optimized index, reader1 is a SegmentReader
IndexReader multiReader1 = new MultiReader(new IndexReader[] {reader1, IndexReader.open(index2)});
... // modify index2
IndexReader multiReader2 = multiReader1.reopen();
// only index2 changed, so multiReader2 uses reader1 and has to increment the refcount of reader1
... // modify index1
IndexReader reader2 = reader1.reopen();
// reader2 is a new instance that shares resources with reader1
... // modify index1
IndexReader reader3 = reader2.reopen();
// reader3 is a new instance that shares resources with reader1 and reader2
{code}

Now the user closes the readers in this order:
# multiReader1.close();
# multiReader2.close();
# reader2.close();
# reader3.close();

reader1 should be marked as closed after 2., right? Because 
multiReader1.close() and multiReader2.close() have to decrement reader1's 
refcount. But the underlying files have to remain open until after 4., because 
reader2 and reader3 use reader1's resources.

So don't we need 2 refcount values in reader1? One that tells us when the 
reader itself can be marked as closed, and one that tells when the resources 
can be closed? Then multiReader1 and multiReader2 would decrement the first 
refCount, whereas reader2 and reader3 both have to know reader1, so that they 
can decrement the second refcount.

I hope I'm just completely confused now and someone tells me that the whole 
thing is much simpler :-)
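
The two-counter idea could be sketched roughly like this. SharedReader and its method names are made up for illustration; this is not the code in any of the attached patches:

```java
/** Hypothetical holder illustrating the two counters; not Lucene's SegmentReader. */
final class SharedReader {
    // Handles on the reader itself: the opener plus any MultiReader using it.
    private int readerRefs = 1;
    // Claims on the underlying files: this reader plus reopened readers sharing them.
    private int resourceRefs = 1;
    private boolean filesOpen = true;

    synchronized void incReaderRef() {
        if (readerRefs == 0) throw new IllegalStateException("reader is closed");
        readerRefs++;
    }

    synchronized void decReaderRef() {
        if (readerRefs == 0) throw new IllegalStateException("reader is closed");
        if (--readerRefs == 0) {
            // The reader is now closed, so it gives up its own claim on the files.
            decResourceRef();
        }
    }

    synchronized void incResourceRef() {
        if (!filesOpen) throw new IllegalStateException("files are closed");
        resourceRefs++;
    }

    synchronized void decResourceRef() {
        if (--resourceRefs == 0) {
            filesOpen = false; // a real reader would close its files here
        }
    }

    synchronized boolean isClosed() { return readerRefs == 0; }

    synchronized boolean filesOpen() { return filesOpen; }
}
```

In the example above, multiReader2 would call incReaderRef() on reader1, while reader2 and reader3 would each call incResourceRef(). Closing both MultiReaders marks reader1 as closed, but its files stay open until reader2 and reader3 have called decResourceRef().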



 IndexReader.reopen()
 

 Key: LUCENE-743
 URL: https://issues.apache.org/jira/browse/LUCENE-743
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Otis Gospodnetic
Assignee: Michael Busch
Priority: Minor
 Fix For: 2.3

 Attachments: IndexReaderUtils.java, lucene-743-take2.patch, 
 lucene-743.patch, lucene-743.patch, lucene-743.patch, MyMultiReader.java, 
 MySegmentReader.java, varient-no-isCloneSupported.BROKEN.patch


 This is Robert Engels' implementation of IndexReader.reopen() functionality, 
 as a set of 3 new classes (this was easier for him to implement, but should 
 probably be folded into the core, if this looks good).




Re: Per-document Payloads (was: Re: lucene indexing and merge process)

2007-10-19 Thread John Wang
Hi Michael:
 Thanks for the info.

 I haven't played with payloads. Can you give me an example or point me
to how it is used to solve this problem?

Thanks

-John

On 10/19/07, Michael Busch [EMAIL PROTECTED] wrote:

 John Wang wrote:
 
   I can try to get some numbers for loading an int[] array vs
  FieldCache.getInts().

 I've had a similar performance problem when I used the FieldCache. The
 loading performance is apparently so slow because each value is stored
 as a term in the dictionary. For loading the cache it is necessary to
 iterate over all terms for the field in the dictionary. And for each
 term its posting list is opened to check which documents have that value.

 If you store unique docIds, then there are no two documents that share
 the same value. That means that each value gets its own entry in the
 dictionary, and to load each value it is necessary to perform two random
 I/O seeks (one for term lookup + one to open the posting list).

 In my app it took several minutes for a big index to fill the cache like
 that.

 To speed things up I did essentially what Ning suggested. Now I store
 the values as payloads in the posting list of an artificial term. To
 fill my cache it's only necessary to perform a couple of I/O seeks for
 opening the posting list of the specific term, then it is just a
 sequential scan to load all values. With this approach the time for
 filling the cache went down from minutes to seconds!

 Now this approach is already much better than the current field cache
 implementation, but it still can be improved. In fact, we already have a
 mechanism for doing that: the norms. Norms are stored with a fixed size,
 which means both random access and sequential scan are optimal. Norms
 are also cached in memory, and filling that cache is much faster
 compared to the current FieldCache approach.

 I was therefore thinking about adding per-document payloads to Lucene
 (we can also call it document-metadata). The API could look like this:

 Document d = new Document();
 byte[] uidValue = ...
 d.addMetadata("uid", uidValue);

 And on the retrieval side all values could either be loaded into the
 field cache, or, if the index is too big, a new API can be used:

 IndexReader reader = IndexReader.open(...);
 DocumentMetadataIterator it = reader.metadataIterator("uid");

 where DocumentMetadataIterator is an interface similar to TermDocs:

 interface DocumentMetadataIterator {
   void seek(String name);
   boolean next();
   boolean skipTo(int doc);

   int doc();
   byte[] getMetadata();
 }

 The next question would be how to store the per-doc payloads (PDP). If
 all values have the same length (as the unique docIds), then we should
 store them as efficiently as possible, like the norms. However, we still
 want to offer the flexibility of having variable-length values. For this
 case we could use a new data structure similar to our posting list.

 PDPList               --> FixedLengthPDPList | VariableLengthPDPList, SkipList
 FixedLengthPDPList    --> Payload^SegSize
 VariableLengthPDPList --> DocDelta, PayloadLength?, Payload
 Payload               --> Byte^PayloadLength
 PayloadLength         --> VInt
 SkipList              --> see frq.file

 Because we don't have global field semantics Lucene should automatically
 pick the right data structure. This could work like this: When the
 DocumentsWriter writes a segment it checks whether all values of a PDP
 have the same length. If yes, it stores them as FixedLengthPDPList, if
 not, then as VariableLengthPDPList.
 When the SegmentMerger merges two or more segments it checks if all
 segments have a FixedLengthPDPList with the same length for a PDP. If
 not, it writes a VariableLengthPDPList to the new segment.

 I think this would be a nice new feature for Lucene. We could then have
 user-defined and Lucene-specific PDPs. For example, norms would be in
 the latter category (this way we would get rid of the special code for
 norms, as they could be handled as PDPs). It would also be easy to add
 new features in the future, like splitting the norms into two values: a
 norm and a boost value.

 OK, lots of thoughts; I'm sure I'll get lots of comments too ... ;)

 - Michael

 
  Thanks
 
  -John
 
  On 10/19/07, Michael McCandless [EMAIL PROTECTED] wrote:
 
  It seems like there are (at least) two angles here for getting better
  performance from FieldCache:
 
1) Be incremental: with reopen() we should only have to update a
   subset of the array in the FieldCache, according to the changed
   segments.  This is what Hoss is working on and Mark was referring
   to and I think it's very important!
 
2) Parsing is slow (?): I'm guessing one of the reasons that John
   added the _X.udt file was because it's much faster to load an
   array of already-parsed ints than to ask FieldCache to populate
   itself.
 
  Even if we do #1, I think #2 could be a big win (in addition)?  John
  do you have any numbers of 

Per-document Payloads (was: Re: lucene indexing and merge process)

2007-10-19 Thread Michael Busch
John Wang wrote:
 
  I can try to get some numbers for loading an int[] array vs
 FieldCache.getInts().

I've had a similar performance problem when I used the FieldCache. The
loading performance is apparently so slow because each value is stored
as a term in the dictionary. For loading the cache it is necessary to
iterate over all terms for the field in the dictionary. And for each
term its posting list is opened to check which documents have that value.

If you store unique docIds, then there are no two documents that share
the same value. That means that each value gets its own entry in the
dictionary, and to load each value it is necessary to perform two random
I/O seeks (one for term lookup + one to open the posting list).

In my app it took several minutes for a big index to fill the cache like
that.

To speed things up I did essentially what Ning suggested. Now I store
the values as payloads in the posting list of an artificial term. To
fill my cache it's only necessary to perform a couple of I/O seeks for
opening the posting list of the specific term, then it is just a
sequential scan to load all values. With this approach the time for
filling the cache went down from minutes to seconds!

Now this approach is already much better than the current field cache
implementation, but it still can be improved. In fact, we already have a
mechanism for doing that: the norms. Norms are stored with a fixed size,
which means both random access and sequential scan are optimal. Norms
are also cached in memory, and filling that cache is much faster
compared to the current FieldCache approach.

I was therefore thinking about adding per-document payloads to Lucene
(we can also call it document-metadata). The API could look like this:

Document d = new Document();
byte[] uidValue = ...
d.addMetadata("uid", uidValue);

And on the retrieval side all values could either be loaded into the
field cache, or, if the index is too big, a new API can be used:

IndexReader reader = IndexReader.open(...);
DocumentMetadataIterator it = reader.metadataIterator("uid");

where DocumentMetadataIterator is an interface similar to TermDocs:

interface DocumentMetadataIterator {
  void seek(String name);
  boolean next();
  boolean skipTo(int doc);

  int doc();
  byte[] getMetadata();
}
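
To illustrate the intended semantics, here is a small in-memory sketch of the proposed interface. ArrayMetadataIterator and its put() method are hypothetical, and skipTo() is assumed to behave like TermDocs.skipTo() (advance to the first entry whose doc is >= the target):

```java
import java.util.HashMap;
import java.util.Map;

/** The interface as proposed in this message. */
interface DocumentMetadataIterator {
    void seek(String name);
    boolean next();
    boolean skipTo(int doc);
    int doc();
    byte[] getMetadata();
}

/** Hypothetical array-backed implementation, for illustration only. */
final class ArrayMetadataIterator implements DocumentMetadataIterator {
    private final Map<String, int[]> docsByName = new HashMap<>();
    private final Map<String, byte[][]> valuesByName = new HashMap<>();
    private int[] docs = new int[0];
    private byte[][] values = new byte[0][];
    private int pos = -1;

    /** docs must be sorted ascending; values[i] belongs to docs[i]. */
    void put(String name, int[] docs, byte[][] values) {
        docsByName.put(name, docs);
        valuesByName.put(name, values);
    }

    /** Positions the iterator before the first entry of the named metadata. */
    public void seek(String name) {
        int[] d = docsByName.get(name);
        docs = (d == null) ? new int[0] : d;
        values = (d == null) ? new byte[0][] : valuesByName.get(name);
        pos = -1;
    }

    public boolean next() {
        if (pos < docs.length) {
            pos++;
        }
        return pos < docs.length;
    }

    /** Advances to the first entry with doc >= target, always moving at least once. */
    public boolean skipTo(int target) {
        while (next()) {
            if (docs[pos] >= target) {
                return true;
            }
        }
        return false;
    }

    public int doc() { return docs[pos]; }

    public byte[] getMetadata() { return values[pos]; }
}
```

A real implementation would read from the index rather than from arrays and could use a skip list in skipTo() instead of the linear scan shown here.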

The next question would be how to store the per-doc payloads (PDP). If
all values have the same length (as the unique docIds), then we should
store them as efficiently as possible, like the norms. However, we still
want to offer the flexibility of having variable-length values. For this
case we could use a new data structure similar to our posting list.

PDPList               --> FixedLengthPDPList | VariableLengthPDPList, SkipList
FixedLengthPDPList    --> Payload^SegSize
VariableLengthPDPList --> DocDelta, PayloadLength?, Payload
Payload               --> Byte^PayloadLength
PayloadLength         --> VInt
SkipList              --> see frq.file

Because we don't have global field semantics Lucene should automatically
pick the right data structure. This could work like this: When the
DocumentsWriter writes a segment it checks whether all values of a PDP
have the same length. If yes, it stores them as FixedLengthPDPList, if
not, then as VariableLengthPDPList.
When the SegmentMerger merges two or more segments it checks if all
segments have a FixedLengthPDPList with the same length for a PDP. If
not, it writes a VariableLengthPDPList to the new segment.
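
A rough sketch of the writer-side decision described above. PdpWriter is a hypothetical name, PayloadLength is always written (the optional "?" case is left out), docs without a value are represented as null, and the skip list is omitted:

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical encoder that picks the fixed or variable form per segment. */
final class PdpWriter {
    private PdpWriter() {}

    /** True if every doc has a value and all values have the same length. */
    static boolean allSameLength(byte[][] values) {
        for (byte[] v : values) {
            if (v == null || v.length != values[0].length) {
                return false;
            }
        }
        return true;
    }

    /** Lucene-style VInt: 7 bits per byte, high bit set on all but the last byte. */
    static void writeVInt(ByteArrayOutputStream out, int i) {
        while ((i & ~0x7F) != 0) {
            out.write((i & 0x7F) | 0x80);
            i >>>= 7;
        }
        out.write(i);
    }

    /** FixedLengthPDPList: the payloads back to back, one per doc. */
    static byte[] encodeFixed(byte[][] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] v : values) {
            out.write(v, 0, v.length);
        }
        return out.toByteArray();
    }

    /** VariableLengthPDPList: DocDelta, PayloadLength, Payload for each doc with a value. */
    static byte[] encodeVariable(byte[][] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int lastDoc = 0;
        for (int doc = 0; doc < values.length; doc++) {
            if (values[doc] == null) {
                continue; // no payload for this doc
            }
            writeVInt(out, doc - lastDoc);
            writeVInt(out, values[doc].length);
            out.write(values[doc], 0, values[doc].length);
            lastDoc = doc;
        }
        return out.toByteArray();
    }

    static byte[] encode(byte[][] values) {
        return allSameLength(values) ? encodeFixed(values) : encodeVariable(values);
    }
}
```

With the fixed form, the payload for doc n starts at offset n * length, which is what makes random access as cheap as it is for norms. A real format would also need to record per segment which form was written; that flag is not shown here.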

I think this would be a nice new feature for Lucene. We could then have
user-defined and Lucene-specific PDPs. For example, norms would be in
the latter category (this way we would get rid of the special code for
norms, as they could be handled as PDPs). It would also be easy to add
new features in the future, like splitting the norms into two values: a
norm and a boost value.

OK, lots of thoughts; I'm sure I'll get lots of comments too ... ;)

- Michael

 
 Thanks
 
 -John
 
 On 10/19/07, Michael McCandless [EMAIL PROTECTED] wrote:

 It seems like there are (at least) two angles here for getting better
 performance from FieldCache:

   1) Be incremental: with reopen() we should only have to update a
  subset of the array in the FieldCache, according to the changed
  segments.  This is what Hoss is working on and Mark was referring
  to and I think it's very important!

   2) Parsing is slow (?): I'm guessing one of the reasons that John
  added the _X.udt file was because it's much faster to load an
  array of already-parsed ints than to ask FieldCache to populate
  itself.

 Even if we do #1, I think #2 could be a big win (in addition)?  John
 do you have any numbers of how much faster it is to load the array of
 ints from the _X.udt file vs having FieldCache populate itself?

 Also on the original question of can we open up SegmentReader,
 FieldsWriter, etc., I think that's a good idea?  At least we can make
 things protected instead of private/final?

 Mike

 Ning Li [EMAIL 

[jira] Updated: (LUCENE-997) Add search timeout support to Lucene

2007-10-19 Thread Sean Timm (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Timm updated LUCENE-997:
-

Attachment: timeout.patch

Two issues are addressed in this latest patch:

1) Timeout support was not added to: public TopFieldDocs search(Weight weight, 
Filter filter, final int nDocs, Sort sort)

2) getCounter() in TimerThread was replaced by getMilliseconds()


 Add search timeout support to Lucene
 

 Key: LUCENE-997
 URL: https://issues.apache.org/jira/browse/LUCENE-997
 Project: Lucene - Java
  Issue Type: New Feature
Reporter: Sean Timm
Priority: Minor
 Attachments: LuceneTimeoutTest.java, timeout.patch, timeout.patch


 This patch is based on Nutch-308. 
 This patch adds support for a maximum search time limit. After this time is 
 exceeded, the search thread is stopped, partial results (if any) are returned 
 and the total number of results is estimated.
 This patch tries to minimize the overhead related to time-keeping by using a 
 version of safe unsynchronized timer.
 This was also discussed in an e-mail thread.
 http://www.nabble.com/search-timeout-tf3410206.html#a9501029
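
A minimal sketch of the approach described above, not the code in timeout.patch. The TimerThread and getMilliseconds() names follow the comment on this issue; TimeLimitedCounter, its constructor arguments and the use of LongSupplier as the clock are assumptions for illustration:

```java
import java.util.function.LongSupplier;

/** Coarse clock: one thread bumps a volatile counter, readers never lock. */
final class TimerThread extends Thread {
    private final long resolutionMillis;
    private volatile long elapsedMillis;

    TimerThread(long resolutionMillis) {
        super("search-timer");
        this.resolutionMillis = resolutionMillis;
        setDaemon(true);
    }

    @Override
    public void run() {
        while (!isInterrupted()) {
            try {
                Thread.sleep(resolutionMillis);
            } catch (InterruptedException e) {
                return;
            }
            // Single writer, so the non-atomic read-modify-write loses no updates.
            elapsedMillis += resolutionMillis;
        }
    }

    long getMilliseconds() {
        return elapsedMillis;
    }
}

/** Hypothetical collector that stops accepting hits once the time budget is spent. */
final class TimeLimitedCounter {
    private final LongSupplier clock;
    private final long deadline;
    private int collected;

    TimeLimitedCounter(LongSupplier clock, long timeAllowedMillis) {
        this.clock = clock;
        this.deadline = clock.getAsLong() + timeAllowedMillis;
    }

    /** Returns false after the deadline; the caller stops and keeps the partial results. */
    boolean collect(int doc) {
        if (clock.getAsLong() > deadline) {
            return false;
        }
        collected++;
        return true;
    }

    int collected() {
        return collected;
    }
}
```

In a search, the clock would be the getMilliseconds() method of a started TimerThread, so each hit costs one volatile read instead of a system call. The timeout is then only as precise as the timer's resolution.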




Re: Remove TermEnum.skipTo(Term target)

2007-10-19 Thread Doug Cutting

Karl Wettin wrote:
So what's up with this method? Did anyone ever figure out what it is 
used for?


I found the origin of it.

It was added in 2004:
  http://svn.apache.org/viewvc?view=rev&revision=150206

This was to fix issue:
  http://issues.apache.org/bugzilla/show_bug.cgi?id=18927

But the code was originally written in 2001 (!) by Dmitry Serebrennikov. 
 The original patch is at:


http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200110.mbox/[EMAIL 
PROTECTED]

Doug




Re: lucene indexing and merge process

2007-10-19 Thread John Wang
Hi Mike:

 This is an excellent analysis.

 To do 2), we tried computing the field cache at indexing time to avoid
parsing at search time. But what we found was that this degrades
indexing (because it computes the entire field cache, not per segment), which
was not acceptable to our project either.

 I can try to get some numbers for loading an int[] array vs
FieldCache.getInts().

Thanks

-John

On 10/19/07, Michael McCandless [EMAIL PROTECTED] wrote:


 It seems like there are (at least) two angles here for getting better
 performance from FieldCache:

   1) Be incremental: with reopen() we should only have to update a
  subset of the array in the FieldCache, according to the changed
  segments.  This is what Hoss is working on and Mark was referring
  to and I think it's very important!

   2) Parsing is slow (?): I'm guessing one of the reasons that John
  added the _X.udt file was because it's much faster to load an
  array of already-parsed ints than to ask FieldCache to populate
  itself.

 Even if we do #1, I think #2 could be a big win (in addition)?  John
 do you have any numbers of how much faster it is to load the array of
 ints from the _X.udt file vs having FieldCache populate itself?

 Also on the original question of can we open up SegmentReader,
 FieldsWriter, etc., I think that's a good idea?  At least we can make
 things protected instead of private/final?

 Mike

 Ning Li [EMAIL PROTECTED] wrote:
  I see what you mean by 2) now. What Mark said should work for you when
  it's done.
 
  Cheers,
  Ning
 
  On 10/18/07, John Wang [EMAIL PROTECTED] wrote:
   Hi Ning:
   That is essentially what field cache does. Doing this for each docid in
   the result set will be slow if the result set is large. But loading it in
   memory when opening index can also be slow if the index is large and updates
   often.
  
   Thanks
  
   -John
  
   On 10/18/07, Ning Li [EMAIL PROTECTED] wrote:
   
Make all documents have a term, say ID:UID, and for each document,
store its UID in the term's payload. You can read off this posting
list to create your array. Will this work for you, John?
   
Cheers,
Ning
   
   
On 10/18/07, Erik Hatcher [EMAIL PROTECTED] wrote:
 Forwarding this to java-dev per request.  Seems like the best place
 to discuss this topic.

 Erik


 Begin forwarded message:

  From: John Wang [EMAIL PROTECTED]
  Date: October 17, 2007 5:43:29 PM EDT
  To: [EMAIL PROTECTED]
  Subject: lucene indexing and merge process
 
  Hi Erik:
 
  We are revamping our search system here at LinkedIn. And we are
  using Lucene.
 
  One issue we ran across is that we store a UID in Lucene which
  we map to the DB storage. So given a docid, to look up its UID, we
  have the following solutions:
 
  1) Index it as a Stored field and get it from reader.document (very
  slow if recall is large)
  2) Load/Warmup the FieldCache (for large corpus, loading up the
  indexreader can be slow)
  3) construct it using the FieldCache and persist it on disk
  every time the index changes. (not suitable for real time indexing,
  e.g. this process will degrade as # of documents get large)
 
  None of the above solutions turns out to be adequate for our
  requirements.
 
   What we ended up doing is modifying the Lucene code by changing the
  SegmentReader, DocumentWriter, and FieldsWriter classes to take
  advantage of the Lucene segment/merge process. E.g.:
 
   For each segment, we store a .udt file, which is an int[]
  array (by changing the FieldsWriter class).
 
   And SegmentReader will load the .udt file into an array.
 
   And merge happens seamlessly.
 
   Because of the tight encapsulation around these classes, e.g.
  private and final methods, it is very difficult to extend Lucene
  while avoiding branching into our own version. Is there a way we can
  open up and make these classes extensible? We'd be happy to
  contribute what we have done.
 
   I guess to tackle the problem from a different angle: is there
  a way to incorporate FieldCache into the segments (it is strictly
  in memory now), and build disk versions while indexing?
 
 
   Hope I am making sense.
 
  I did not send this out to the mailing list because I wasn't
  sure if this is a dev question or a user question. Feel free to
  either forward it to the right mailing list or let me know and I
  can forward it.
 
 
  Thanks
 
  -John
 





   
   
 

[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: (was: CachedTokenStream.java)

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: Encoder.java, Formatter.java, Highlighter.java, 
 Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, 
 HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, 
 HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, 
 QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, 
 SimpleFormatter.java, spanhighlighter.patch, spanhighlighter10.patch, 
 spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
 spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
 spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
 spanhighlighter_patch_4.zip, SpanHighlighterTest.java, 
 SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, 
 WeightedSpanTerm.java


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: (was: SpanHighlighterTest.java)

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: spanhighlighter.patch, spanhighlighter10.patch, 
 spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
 spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
 spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
 spanhighlighter_patch_4.zip


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: (was: SpanScorer.java)

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: spanhighlighter.patch, spanhighlighter10.patch, 
 spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
 spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
 spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
 spanhighlighter_patch_4.zip, SpanHighlighterTest.java


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: (was: WeightedSpanTerm.java)

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: spanhighlighter.patch, spanhighlighter10.patch, 
 spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
 spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
 spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
 spanhighlighter_patch_4.zip, SpanHighlighterTest.java


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: (was: SpanHighlighterTest.java)

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: spanhighlighter.patch, spanhighlighter10.patch, 
 spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
 spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
 spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
 spanhighlighter_patch_4.zip, SpanHighlighterTest.java


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: (was: QuerySpansExtractor.java)

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: spanhighlighter.patch, spanhighlighter10.patch, 
 spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
 spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
 spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
 spanhighlighter_patch_4.zip, SpanHighlighterTest.java


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: (was: MemoryIndex.java)

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: spanhighlighter.patch, spanhighlighter10.patch, 
 spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
 spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
 spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
 spanhighlighter_patch_4.zip, SpanHighlighterTest.java


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: (was: SpanScorer.java)

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: spanhighlighter.patch, spanhighlighter10.patch, 
 spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
 spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
 spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
 spanhighlighter_patch_4.zip, SpanHighlighterTest.java


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: (was: SimpleFormatter.java)

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: spanhighlighter.patch, spanhighlighter10.patch, 
 spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
 spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
 spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
 spanhighlighter_patch_4.zip, SpanHighlighterTest.java


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: (was: QuerySpansExtractor.java)

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: spanhighlighter.patch, spanhighlighter10.patch, 
 spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
 spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
 spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
 spanhighlighter_patch_4.zip, SpanHighlighterTest.java


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: (was: HighlighterTest.java)

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: spanhighlighter.patch, spanhighlighter10.patch, 
 spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
 spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
 spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
 spanhighlighter_patch_4.zip, SpanHighlighterTest.java


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: (was: QuerySpansExtractor.java)

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: spanhighlighter.patch, spanhighlighter10.patch, 
 spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
 spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
 spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
 spanhighlighter_patch_4.zip, SpanHighlighterTest.java


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: (was: Highlighter.java)

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: spanhighlighter.patch, spanhighlighter10.patch, 
 spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
 spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
 spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
 spanhighlighter_patch_4.zip, SpanHighlighterTest.java


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: (was: Highlighter.java)

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: spanhighlighter.patch, spanhighlighter10.patch, 
 spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
 spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
 spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
 spanhighlighter_patch_4.zip, SpanHighlighterTest.java


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: (was: Highlighter.java)

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: spanhighlighter.patch, spanhighlighter10.patch, 
 spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
 spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
 spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
 spanhighlighter_patch_4.zip, SpanHighlighterTest.java


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: (was: HighlighterTest.java)

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: spanhighlighter.patch, spanhighlighter10.patch, 
 spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
 spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
 spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
 spanhighlighter_patch_4.zip, SpanHighlighterTest.java


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: (was: HighlighterTest.java)

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: spanhighlighter.patch, spanhighlighter10.patch, 
 spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
 spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
 spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
 spanhighlighter_patch_4.zip, SpanHighlighterTest.java


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: (was: Highlighter.java)

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: spanhighlighter.patch, spanhighlighter10.patch, 
 spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
 spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
 spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
 spanhighlighter_patch_4.zip, SpanHighlighterTest.java


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: (was: Formatter.java)

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: spanhighlighter.patch, spanhighlighter10.patch, 
 spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
 spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
 spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
 spanhighlighter_patch_4.zip, SpanHighlighterTest.java


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: (was: Encoder.java)

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: Formatter.java, Highlighter.java, Highlighter.java, 
 Highlighter.java, Highlighter.java, Highlighter.java, HighlighterTest.java, 
 HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, 
 MemoryIndex.java, QuerySpansExtractor.java, QuerySpansExtractor.java, 
 QuerySpansExtractor.java, QuerySpansExtractor.java, SimpleFormatter.java, 
 spanhighlighter.patch, spanhighlighter10.patch, spanhighlighter11.patch, 
 spanhighlighter12.patch, spanhighlighter2.patch, spanhighlighter3.patch, 
 spanhighlighter5.patch, spanhighlighter6.patch, spanhighlighter7.patch, 
 spanhighlighter8.patch, spanhighlighter9.patch, spanhighlighter_patch_4.zip, 
 SpanHighlighterTest.java, SpanHighlighterTest.java, SpanScorer.java, 
 SpanScorer.java, WeightedSpanTerm.java


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: (was: DefaultEncoder.java)

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: Encoder.java, Formatter.java, Highlighter.java, 
 Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, 
 HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, 
 HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, 
 QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, 
 SimpleFormatter.java, spanhighlighter.patch, spanhighlighter10.patch, 
 spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
 spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
 spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
 spanhighlighter_patch_4.zip, SpanHighlighterTest.java, 
 SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, 
 WeightedSpanTerm.java


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: spanhighlighter12.patch

Nice little addition courtesy of Michael Goddard:

...In our Lucene work, we took the approach of indexing all fields into a 
single field, FULLTEXT, which is the default field for queries.  Our query 
syntax is such that a user can combine clauses against named fields with 
clauses with no field specified.  When we go to highlight such queries, if a 
given clause is against this FULLTEXT field but we're highlighting text in the 
TITLE field, we'd still like for matching terms to be highlighted...

Thanks for the patch, Michael.

There is a new constructor that allows you to specify a default field. Terms 
from this field will be highlighted regardless of the specific field you are 
highlighting.
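
The default-field rule can be sketched as follows. This is a
self-contained illustration, not the code in spanhighlighter12.patch:
DefaultFieldSketch, FieldTerm and termsToHighlight are hypothetical names,
and the actual constructor signature in the patch is not reproduced here.
The assumption is simply that a query term qualifies for highlighting when
it was written against either the field being highlighted or the default
field.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not the API added by the patch.
public class DefaultFieldSketch {

    /** Stand-in for a query term: the field it targets plus its text. */
    static final class FieldTerm {
        final String field;
        final String text;

        FieldTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }
    }

    /**
     * Returns the term texts eligible for highlighting in fieldToHighlight:
     * terms written against that field, plus terms written against
     * defaultField. Passing null as defaultField disables the extra match.
     */
    static Set<String> termsToHighlight(List<FieldTerm> queryTerms,
                                        String fieldToHighlight,
                                        String defaultField) {
        Set<String> result = new LinkedHashSet<>();
        for (FieldTerm t : queryTerms) {
            if (t.field.equals(fieldToHighlight) || t.field.equals(defaultField)) {
                result.add(t.text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<FieldTerm> terms = Arrays.asList(
                new FieldTerm("FULLTEXT", "lucene"),
                new FieldTerm("TITLE", "index"),
                new FieldTerm("AUTHOR", "smith"));
        // Highlighting TITLE with FULLTEXT as the default field.
        System.out.println(termsToHighlight(terms, "TITLE", "FULLTEXT")); // prints [lucene, index]
    }
}
```

With no default field, only "index" would qualify when highlighting TITLE,
which is the behaviour Michael's use case runs into.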

The only file to worry about in that huge mess of files listed above is 
spanhighlighter12.patch.

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: Encoder.java, Formatter.java, Highlighter.java, 
 Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, 
 HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, 
 HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, 
 QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, 
 SimpleFormatter.java, spanhighlighter.patch, spanhighlighter10.patch, 
 spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
 spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
 spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
 spanhighlighter_patch_4.zip, SpanHighlighterTest.java, 
 SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, 
 WeightedSpanTerm.java


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-550) InstantiatedIndex - faster but memory consuming index

2007-10-19 Thread Karl Wettin (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wettin updated LUCENE-550:
---

Attachment: LUCENE-550_20071019_no_core_changes.txt

In this patch:

 * IndexReader.terms(Term) optimization: given that the term exists, the 
initial seek is now a jit-call away, rather than using binary search.
 * A handful of minor optimizations
 * IndexReader.version() mimics Segment-ditto
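
One way to read the first bullet, sketched below: keep a direct lookup from each term to its position in the sorted term list, and fall back to binary search only when the requested term does not exist. This is an illustration under that assumption and not the code in the attached patch; the class name and the use of plain strings for terms are made up, and the terms are assumed to be distinct.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch; not taken from the LUCENE-550 patch. */
final class TermSeekSketch {
    private final String[] sortedTerms;
    private final Map<String, Integer> positionByTerm = new HashMap<>();

    /** Assumes the given terms are distinct. */
    TermSeekSketch(String[] terms) {
        sortedTerms = terms.clone();
        Arrays.sort(sortedTerms);
        for (int i = 0; i < sortedTerms.length; i++) {
            positionByTerm.put(sortedTerms[i], i);
        }
    }

    /** Index of the first term >= target, or the number of terms if there is none. */
    int seek(String target) {
        Integer exact = positionByTerm.get(target);
        if (exact != null) {
            return exact; // fast path: the term exists, no search needed
        }
        // Slow path: the term is absent, so binarySearch returns -(insertionPoint) - 1.
        int result = Arrays.binarySearch(sortedTerms, target);
        return -(result + 1);
    }

    public static void main(String[] args) {
        TermSeekSketch index = new TermSeekSketch(new String[] {"delta", "alpha", "charlie"});
        System.out.println(index.seek("charlie")); // existing term, direct lookup
        System.out.println(index.seek("bravo"));   // missing term, binary search fallback
    }
}
```

The trade-off is the extra map held in memory, which fits the "faster but memory consuming" theme of this issue.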

 

 InstantiatedIndex - faster but memory consuming index
 -

 Key: LUCENE-550
 URL: https://issues.apache.org/jira/browse/LUCENE-550
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Store
Affects Versions: 2.0.0
Reporter: Karl Wettin
Assignee: Grant Ingersoll
 Attachments: HitCollectionBench.jpg, lucene-550.jpg, 
 LUCENE-550_20070804_no_core_changes.txt, 
 LUCENE-550_20070808_no_core_changes.txt, 
 LUCENE-550_20070817_no_core_changes.txt, 
 LUCENE-550_20070928_no_core_changes.txt, 
 LUCENE-550_20071008_no_core_changes.txt, 
 LUCENE-550_20071017_no_core_changes.txt, 
 LUCENE-550_20071019_no_core_changes.txt, test-reports.zip, trunk.diff.bz2, 
 trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, 
 trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, trunk.diff.bz2, 
 trunk.diff.bz2, trunk.diff.bz2


 A non-file-centric, all-in-memory index. Consumes some 2x the memory of a 
 RAMDirectory (in a term-saturated index) but is between 3x and 60x faster, 
 depending on application and how one counts. The average query is about 8x 
 faster. 
 IndexWriter and IndexModifier have been realized in InterfaceIndexWriter and 
 InterfaceIndexModifier. 
 InstantiatedIndex is wrapped in a new top-layer index facade (class Index) 
 that comes with factory methods for writers, readers and searchers for uniform 
 index handling. There are decorators with notification handling that can be 
 used for automatically synchronizing searchers on updates, etc. 
 Index also comes with FS/RAMDirectory implementations.




[jira] Resolved: (LUCENE-1031) Fixes a handful of misspellings/mistakes in changes.txt

2007-10-19 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-1031.


   Resolution: Fixed
Fix Version/s: 2.3
Lucene Fields: [New, Patch Available]  (was: [Patch Available, New])

Fixed -- thanks Mark!

 Fixes a handful of misspellings/mistakes in changes.txt
 ---

 Key: LUCENE-1031
 URL: https://issues.apache.org/jira/browse/LUCENE-1031
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Affects Versions: 2.3
Reporter: Mark Miller
Assignee: Michael McCandless
Priority: Trivial
 Fix For: 2.3

 Attachments: changes.txt.patch


 There are a handful of misspellings/mistakes in changes.txt. This patch fixes 
 them. Avoided the one or two British to English conversions <g>




[jira] Commented: (LUCENE-1031) Fixes a handful of misspellings/mistakes in changes.txt

2007-10-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12536205
 ] 

Michael McCandless commented on LUCENE-1031:


Sheesh, we committers really can't spell!!

Thanks Mark!

I'll commit.

 Fixes a handful of misspellings/mistakes in changes.txt
 ---

 Key: LUCENE-1031
 URL: https://issues.apache.org/jira/browse/LUCENE-1031
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Affects Versions: 2.3
Reporter: Mark Miller
Priority: Trivial
 Attachments: changes.txt.patch


 There are a handful of misspellings/mistakes in changes.txt. This patch fixes 
 them. Avoided the one or two British to English conversions <g>




Re: Remove TermEnum.skipTo(Term target)

2007-10-19 Thread Karl Wettin

Wolfgang Hoschek wrote at Wed, 04 May 2005 20:59:24 GMT:
I was considering an efficient impl of TermEnum.skipTo(Term target) for
the MemoryIndex. But then I realized that nothing anywhere in Lucene
calls that method. It's effectively dead code; a remainder of a
previous ice age - nothing would break if it would be removed. I'd
suggest doing so unless I'm missing something.


So what's up with this method? Did anyone ever figure out what it is
used for?
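
For reference, the contract of skipTo as I understand it is: advance to the first term beyond the current one that is greater than or equal to the target, and return whether such a term exists. The sketch below shows that behavior over a plain sorted list of strings. It is a stand-in for illustration only; the class is hypothetical and it does not use the real TermEnum or Term classes.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a term enumeration over a sorted list of strings. */
final class SkipToSketch {
    private final List<String> sortedTerms;
    private int pos = -1; // positioned before the first term

    SkipToSketch(List<String> sortedTerms) {
        this.sortedTerms = sortedTerms;
    }

    /** Moves to the next term; returns false once the enumeration is exhausted. */
    boolean next() {
        if (pos + 1 >= sortedTerms.size()) {
            pos = sortedTerms.size();
            return false;
        }
        pos++;
        return true;
    }

    /** Current term, or null if positioned before the first or after the last term. */
    String term() {
        return (pos >= 0 && pos < sortedTerms.size()) ? sortedTerms.get(pos) : null;
    }

    /** Advances at least one term, stopping at the first term >= target. */
    boolean skipTo(String target) {
        do {
            if (!next()) {
                return false;
            }
        } while (target.compareTo(term()) > 0);
        return true;
    }

    public static void main(String[] args) {
        SkipToSketch terms = new SkipToSketch(Arrays.asList("apple", "banana", "cherry"));
        System.out.println(terms.skipTo("b") + " " + terms.term());
        System.out.println(terms.skipTo("zebra") + " " + terms.term());
    }
}
```

Written this way it is just a loop over next(), which is why an implementation such as MemoryIndex could only gain from overriding it if something actually called it.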


--
karl




Re: [jira] Commented: (LUCENE-1029) Illegal character replacements in ISOLatin1AccentFilter

2007-10-19 Thread Mark Miller



If you are to compare with stemmers, consider that these create unique tokens 
that do not interfere with semantic meanings.
  
Not starting anything here again, but it took me so darn long to find 
something that Porter stems in a way that kills the semantic meaning that I 
had to share. That damn algorithm is amazing... I was coming to the conclusion 
that it was absolutely perfect on the English language... until, after a 
couple of days of searching, I found that "international" goes to "intern". 
Eureka! Though a hollow victory at best. That algorithm is pretty amazing...





[jira] Updated: (LUCENE-1031) Fixes a handful of misspellings/mistakes in changes.txt

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1031:


Attachment: changes.txt.patch

 Fixes a handful of misspellings/mistakes in changes.txt
 ---

 Key: LUCENE-1031
 URL: https://issues.apache.org/jira/browse/LUCENE-1031
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Affects Versions: 2.3
Reporter: Mark Miller
Priority: Trivial
 Attachments: changes.txt.patch


 There are a handful of misspellings/mistakes in changes.txt. This patch fixes 
 them. Avoided the one or two British to English conversions <g>




[jira] Created: (LUCENE-1031) Fixes a handful of misspellings/mistakes in changes.txt

2007-10-19 Thread Mark Miller (JIRA)
Fixes a handful of misspellings/mistakes in changes.txt
---

 Key: LUCENE-1031
 URL: https://issues.apache.org/jira/browse/LUCENE-1031
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Affects Versions: 2.3
Reporter: Mark Miller
Priority: Trivial
 Attachments: changes.txt.patch

There are a handful of misspellings/mistakes in changes.txt. This patch fixes 
them. Avoided the one or two British to English conversions <g>




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: (was: HighlighterTest.java)

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: spanhighlighter.patch, spanhighlighter10.patch, 
 spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
 spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
 spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
 spanhighlighter_patch_4.zip, SpanHighlighterTest.java


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.




[jira] Updated: (LUCENE-794) Extend contrib Highlighter to properly support phrase queries and span queries

2007-10-19 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-794:
---

Attachment: (was: CachedTokenStream.java)

 Extend contrib Highlighter to properly support phrase queries and span queries
 --

 Key: LUCENE-794
 URL: https://issues.apache.org/jira/browse/LUCENE-794
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Other
Reporter: Mark Miller
Priority: Minor
 Attachments: Encoder.java, Formatter.java, Highlighter.java, 
 Highlighter.java, Highlighter.java, Highlighter.java, Highlighter.java, 
 HighlighterTest.java, HighlighterTest.java, HighlighterTest.java, 
 HighlighterTest.java, MemoryIndex.java, QuerySpansExtractor.java, 
 QuerySpansExtractor.java, QuerySpansExtractor.java, QuerySpansExtractor.java, 
 SimpleFormatter.java, spanhighlighter.patch, spanhighlighter10.patch, 
 spanhighlighter11.patch, spanhighlighter12.patch, spanhighlighter2.patch, 
 spanhighlighter3.patch, spanhighlighter5.patch, spanhighlighter6.patch, 
 spanhighlighter7.patch, spanhighlighter8.patch, spanhighlighter9.patch, 
 spanhighlighter_patch_4.zip, SpanHighlighterTest.java, 
 SpanHighlighterTest.java, SpanScorer.java, SpanScorer.java, 
 WeightedSpanTerm.java


 This patch adds a new Scorer class (SpanQueryScorer) to the Highlighter 
 package that scores just like QueryScorer, but scores a 0 for Terms that did 
 not cause the Query hit. This gives 'actual' hit highlighting for the range 
 of SpanQuerys and PhraseQuery. There is also a new Fragmenter that attempts 
 to fragment without breaking up Spans.
 See http://issues.apache.org/jira/browse/LUCENE-403 for some background.
 There is a dependency on MemoryIndex.
