[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-09 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621130#action_12621130
 ] 

Michael McCandless commented on LUCENE-1333:


DM, one pattern that makes me nervous is this, from QueryTermVector.java:

{code}
  for (Token next = stream.next(new Token()); next != null;
       next = stream.next(next)) {
{code}

I don't think you should be "recycling" that next and passing it back in the 
next time you call stream.next, because a TokenStream is not *required* to use 
the Token you passed in, so you are suddenly, potentially, asking it to 
re-use a token it had previously returned, which it may not expect. It likely 
won't matter, but I think this is still safer:

{code}
  final Token result = new Token();
  for (Token next = stream.next(result); next != null;
       next = stream.next(result)) {
{code}
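To make the safer pattern concrete without depending on Lucene, here is a self-contained sketch. `SimpleToken` and `WhitespaceStream` are hypothetical stand-ins for `Token` and `TokenStream` that model only the term buffer; they are not Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Lucene's Token: only a term buffer is modeled.
class SimpleToken {
    char[] termBuffer = new char[8];
    int termLength;

    void setTermBuffer(String s) {
        if (termBuffer.length < s.length()) {
            termBuffer = new char[s.length()];
        }
        s.getChars(0, s.length(), termBuffer, 0);
        termLength = s.length();
    }

    String term() {
        return new String(termBuffer, 0, termLength);
    }
}

// Hypothetical stand-in for a TokenStream that splits on whitespace and
// fills in whatever token the caller passes to next(SimpleToken).
class WhitespaceStream {
    private final String[] words;
    private int pos = 0;

    WhitespaceStream(String text) {
        String trimmed = text.trim();
        this.words = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    SimpleToken next(SimpleToken reusableToken) {
        if (pos >= words.length) {
            return null; // end of stream
        }
        reusableToken.setTermBuffer(words[pos++]);
        return reusableToken;
    }
}

class ReuseLoop {
    static List<String> collectTerms(String text) {
        WhitespaceStream stream = new WhitespaceStream(text);
        // One token allocated up front and always passed back in, regardless
        // of which instance next() returned on the previous call.
        final SimpleToken reusableToken = new SimpleToken();
        List<String> terms = new ArrayList<>();
        for (SimpleToken token = stream.next(reusableToken); token != null;
             token = stream.next(reusableToken)) {
            terms.add(token.term()); // copy out: the token is overwritten next time
        }
        return terms;
    }
}
```

Because the caller always passes `reusableToken`, a stream that returns some other instance (for example, a cached clone) is never handed that instance back.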

> Token implementation needs improvements
> ---
>
> Key: LUCENE-1333
> URL: https://issues.apache.org/jira/browse/LUCENE-1333
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis
>Affects Versions: 2.3.1
> Environment: All
>Reporter: DM Smith
>Priority: Minor
> Fix For: 2.4
>
> Attachments: LUCENE-1333-analysis.patch, LUCENE-1333-analyzers.patch, 
> LUCENE-1333-core.patch, LUCENE-1333-highlighter.patch, 
> LUCENE-1333-instantiated.patch, LUCENE-1333-lucli.patch, 
> LUCENE-1333-memory.patch, LUCENE-1333-miscellaneous.patch, 
> LUCENE-1333-queries.patch, LUCENE-1333-snowball.patch, 
> LUCENE-1333-wikipedia.patch, LUCENE-1333-wordnet.patch, 
> LUCENE-1333-xml-query-parser.patch, LUCENE-1333.patch, LUCENE-1333.patch, 
> LUCENE-1333.patch, LUCENE-1333a.txt
>
>
> This was discussed in the thread (not sure which place is best to reference 
> so here are two):
> http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200805.mbox/[EMAIL 
> PROTECTED]
> or to see it all at once:
> http://www.gossamer-threads.com/lists/lucene/java-dev/62851
> Issues:
> 1. JavaDoc is insufficient, leading one to read the code to figure out how to 
> use the class.
> 2. Deprecations are incomplete. The constructors that take String as an 
> argument and the methods that take and/or return String should *all* be 
> deprecated.
> 3. The allocation policy is too aggressive. With large tokens the resulting 
> buffer can be over-allocated. A less aggressive algorithm would be better. In 
> the thread, the Python example is good as it is computationally simple.
> 4. The parts of the code that currently use Token's deprecated methods can be 
> upgraded now rather than waiting for 3.0. As it stands, filter chains that 
> alternate between char[] and String are sub-optimal. Currently, it is used in 
> core by Query classes. The rest are in contrib, mostly in analyzers.
> 5. Some internal optimizations can be done with regard to char[] allocation.
> 6. TokenStream has next() and next(Token). next() should be deprecated so 
> that reuse is maximized, and descendant classes should be rewritten to 
> override next(Token).
> 7. Tokens are often stored as a String in a Term. It would be good to add 
> constructors that take a Token. This would simplify the use of the two 
> together.
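Item 3 refers to "the Python example" from the thread without reproducing it. Assuming it means CPython's list over-allocation rule (grow to roughly newSize + newSize/8 plus a small constant), a less aggressive char[] growth policy might look like the sketch below. `TermBufferGrowth` and its methods are hypothetical, not Lucene API.

```java
// Hypothetical sketch of a gentler term-buffer growth policy, modeled on
// CPython's list over-allocation formula. Not Lucene's actual code.
class TermBufferGrowth {
    static int oversize(int minSize) {
        if (minSize < 0) {
            throw new IllegalArgumentException("negative size: " + minSize);
        }
        // Grow by ~12.5% plus 3 or 6 extra slots, instead of doubling.
        return minSize + (minSize >> 3) + (minSize < 9 ? 3 : 6);
    }

    static char[] grow(char[] buffer, int minSize) {
        if (buffer.length >= minSize) {
            return buffer; // already big enough; reuse as-is
        }
        char[] bigger = new char[oversize(minSize)];
        System.arraycopy(buffer, 0, bigger, 0, buffer.length); // keep existing chars
        return bigger;
    }
}
```

With this rule a 1000-char term gets a 1131-slot buffer; the arithmetic is just a shift, a comparison and two additions, which matches the "computationally simple" property the issue asks for.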

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-09 Thread DM Smith (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621132#action_12621132
 ] 

DM Smith commented on LUCENE-1333:
--

I'll give my analysis here. Feel free to make the change or kick it back to me 
to make it, if you think your pattern is best. (If I do it, it will be after 
this weekend.)

I've tried to be consistent here. The prior pattern was inconsistent and often 
was:
{code}
  Token token = null;
  while ((token = input.next()) != null) {
{code}

There were other variants, including "forever loops". As you noticed, I replace

There are two basic implementations of Token next(Token):
1) Producer: These create tokens from input. Their pattern is to take their 
argument, call clear() on it, and then set startOffset, endOffset and type 
appropriately. Their assumption is that they have to start with a pristine 
token and that, other than its space, nothing about the token that is 
passed in can be reused.

2) Consumer: These "filter" their argument. Their only assumption is that, 
earlier in the call chain, a producer created the token that they need 
to reuse. In this case, they typically preserve startOffset and endOffset, 
because those represent the position of the token in the input. They may 
refine type, flags and payload, but otherwise have to preserve them. Most 
typically, they set the termBuffer. There are a few types of consumers. 
Here are some:
a) Transformational Filters: They take their argument and transform its 
termBuffer.
b) Splitting Filters: They take their argument and split the token into 
several. Sometimes they return the original; other times, just the parts. 
When creating these tokens, calling clone() on the prototype preserves 
flags, payloads, start and end offsets and type. These clones are sometimes 
stored in a buffer, but sometimes are computed incrementally with each call to 
next(Token). With the latter, they typically cache a clone of the passed-in 
token. I think that, when possible, incremental computation is preferable, 
though at the cost of a less obvious implementation.
c) Caching Filter: If their buffer is empty, they repeatedly call result = 
input.next(token), clone each result, and cache the clones in some collection. 
Once full, they return their buffer's content. If the caching filter is 
resettable, it must return clones of its content. Otherwise, downstream 
consumers may change their arguments, disastrously.

Callers of Token next(Token) have the responsibility of never calling it with a 
null token. (I think producers probably should check and create a token 
if it is null, but I don't think that is what they do now.)

The upshot of all of this: producers don't care which token they reuse. Whether 
it came from the original loop or from the result of the last call to token = 
stream.next(token), both are equally good. The token pre-existed and needs to 
be fully reset. Consumers presume that the token was produced (or at least 
appropriately re-initialized and filled in) by a producer.

Your form of the loop is very advisable in a few places, most typically with a 
loop within a loop, where the inner loop runs over all the tokens in a stream. In 
this case, the final Token would be created outside the outer loop. Using your 
pattern would encourage maximal reuse. Using mine, the programmer would 
have to figure out when it was appropriate to do one or the other.

The other value to your pattern is that next(Token) is always called with a 
non-null Token.

I think that calling the token "result" is not the best. It is a bit confusing 
as it is not the result of calling next(Token). Perhaps, to make reuse acutely 
obvious:
{code}
  final Token reusableToken = new Token();
  for (Token token = stream.next(reusableToken); token != null;
       token = stream.next(reusableToken)) {
{code}
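The nested-loop case, with the reusable token created once outside the outer loop, might look like this. `SimpleToken`, `WordStream` and `NestedReuse` are hypothetical; `next(SimpleToken)` stands in for `TokenStream.next(Token)`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal token: only the term is modeled.
class SimpleToken {
    String term;
}

// Hypothetical whitespace-splitting stream that fills the token it is given.
class WordStream {
    private final String[] words;
    private int pos;

    WordStream(String text) {
        String t = text.trim();
        words = t.isEmpty() ? new String[0] : t.split("\\s+");
    }

    SimpleToken next(SimpleToken reusableToken) {
        if (pos >= words.length) return null;
        reusableToken.term = words[pos++];
        return reusableToken;
    }
}

class NestedReuse {
    // Counts term frequencies across several documents with one token instance.
    static Map<String, Integer> countTerms(List<String> docs) {
        final SimpleToken reusableToken = new SimpleToken(); // outside the outer loop
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            WordStream stream = new WordStream(doc);
            for (SimpleToken token = stream.next(reusableToken); token != null;
                 token = stream.next(reusableToken)) {
                counts.merge(token.term, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```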




[jira] Updated: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-08-09 Thread Eks Dev (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eks Dev updated LUCENE-1219:


Attachment: LUCENE-1219.extended.patch

bq. couldn't you just call document.getFieldable(name), and then call 
binaryValue(byte[] result) on that Fieldable, and then get the length from it 
(getBinaryLength()) too? (Trying to minimize API changes).

Sure, good tip, I think this could work. There is no need for this 
byte[]->Fieldable->byte[] loop; it confuses. I have attached a patch that uses 
this approach, but I created getBinaryValue(byte[]) instead of 
binaryValue(byte[]), as we have binaryValue() as a deprecated method (that would be 
confusing as well). Not really tested, but it looks simple enough.
Just thinking aloud: this is one nice feature, but I have permanently had the 
feeling that I do not understand these Field structures, roles and 
responsibilities :)  
The Field/Fieldable/AbstractField hierarchy is really ripe for a good 
refactoring. This bigamy between index and search use cases makes things not 
really easy to follow. Hoss is right: we need some way to divorce RetrievedField 
from FieldToBeIndexed; they are definitely not the same, just very similar.

> support array/offset/ length setters for Field with binary data
> ---
>
> Key: LUCENE-1219
> URL: https://issues.apache.org/jira/browse/LUCENE-1219
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Index
>Reporter: Eks Dev
>Assignee: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-1219.extended.patch, LUCENE-1219.extended.patch, 
> LUCENE-1219.patch, LUCENE-1219.patch, LUCENE-1219.patch, LUCENE-1219.patch, 
> LUCENE-1219.take2.patch, LUCENE-1219.take3.patch
>
>
> Currently the Field/Fieldable interface supports only compact, zero-based byte 
> arrays. This forces end users to create new objects and copy content into them 
> before passing them to Lucene, as such fields are often of variable size. 
> Depending on the use case, avoiding this copy can bring a far from negligible 
> performance improvement.
> This approach extends the Fieldable interface with 3 new methods: 
> getOffset(), getLength() and getBinaryValue() (which only returns a reference 
> to the array).
>




[jira] Updated: (LUCENE-1001) Add Payload retrieval to Spans

2008-08-09 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1001:


Attachment: LUCENE-1001.patch

Anyone still have a use case for this issue?

Here is a patch that I think fixes the orderedspans problem - need to test 
further, but that may be the last piece on those parts.

Beyond that, I think that a span uses only one clause to determine if a payload 
is available for the whole span - it seems to me we have to ask every clause.
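Asking every clause could be sketched as follows, using a hypothetical `ClauseSpans` interface rather than Lucene's real `Spans`. This sketch treats the span as having a payload if any clause has one; whether it should instead require every clause is a design choice the comment leaves open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical minimal view of one clause's spans; not Lucene's Spans interface.
interface ClauseSpans {
    boolean isPayloadAvailable();
    Collection<byte[]> getPayload();
}

// Helper: a clause with a fixed set of payloads at its current position.
class FixedClause implements ClauseSpans {
    private final List<byte[]> payloads;
    FixedClause(byte[]... payloads) { this.payloads = Arrays.asList(payloads); }
    @Override public boolean isPayloadAvailable() { return !payloads.isEmpty(); }
    @Override public Collection<byte[]> getPayload() { return payloads; }
}

// A composite span that consults every clause instead of just one.
class CompositeSpans {
    private final List<ClauseSpans> clauses;
    CompositeSpans(List<ClauseSpans> clauses) { this.clauses = clauses; }

    boolean isPayloadAvailable() {
        for (ClauseSpans clause : clauses) {
            if (clause.isPayloadAvailable()) return true; // any clause is enough here
        }
        return false;
    }

    // Gathers payloads from all clauses; no ordering is implied.
    List<byte[]> getPayload() {
        List<byte[]> all = new ArrayList<>();
        for (ClauseSpans clause : clauses) {
            if (clause.isPayloadAvailable()) all.addAll(clause.getPayload());
        }
        return all;
    }
}
```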

As far as the ordering of returned payloads goes, I don't see how they can be 
ordered by the user without having some info in the payload itself. I mean, it's 
just going to be a collection of byte arrays, right? How could you order them? 
It seems that at most you can say those payloads came from the given span and use 
them all.

The more I look at spans, the less I understand them, I think. It's like 
repeating certain words over and over.

> Add Payload retrieval to Spans
> --
>
> Key: LUCENE-1001
> URL: https://issues.apache.org/jira/browse/LUCENE-1001
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Search
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 2.4
>
> Attachments: LUCENE-1001.patch, LUCENE-1001.patch, LUCENE-1001.patch
>
>
> It would be nice to have access to payloads when doing SpanQuerys.
> See http://www.gossamer-threads.com/lists/lucene/java-dev/52270 and 
> http://www.gossamer-threads.com/lists/lucene/java-dev/51134
> The current API, added to Spans.java, is below. I will try to post a patch as 
> soon as I can figure out how to make it work for unordered spans (I believe I 
> have all the other cases working).
> {noformat}
>   /**
>    * Returns the payload data for the current span.
>    * This is invalid until {@link #next()} is called for
>    * the first time.
>    * This method must not be called more than once after each call
>    * of {@link #next()}. However, payloads are loaded lazily,
>    * so if the payload data for the current position is not needed,
>    * this method may not be called at all for performance reasons.
>    *
>    * WARNING: The status of the Payloads feature is experimental.
>    * The APIs introduced here might change in the future and will not be
>    * supported anymore in such a case.
>    *
>    * @return a List of byte arrays containing the data of this payload
>    * @throws IOException
>    */
>   // TODO: Remove warning after API has been finalized
>   List/**/ getPayload() throws IOException;
>
>   /**
>    * Checks if a payload can be loaded at this position.
>    *
>    * Payloads can only be loaded once per call to
>    * {@link #next()}.
>    *
>    * WARNING: The status of the Payloads feature is experimental.
>    * The APIs introduced here might change in the future and will not be
>    * supported anymore in such a case.
>    *
>    * @return true if there is a payload available at this position that can be loaded
>    */
>   // TODO: Remove warning after API has been finalized
>   public boolean isPayloadAvailable();
> {noformat}




[jira] Updated: (LUCENE-1001) Add Payload retrieval to Spans

2008-08-09 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1001:


Attachment: LUCENE-1001.patch

Without the absolute paths in the patch this time (get it together, Eclipse).





[jira] Updated: (LUCENE-1001) Add Payload retrieval to Spans

2008-08-09 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1001:


Attachment: (was: LUCENE-1001.patch)





[jira] Updated: (LUCENE-777) SpanWithinQuery - A SpanNotQuery that allows a specified number of intersections

2008-08-09 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-777:
---

Attachment: LUCENE-777.patch

Fixes hashCode; the single-doc wrapping problem seems to have been in my head; 
adds a test method for the new query; fixes formatting.

> SpanWithinQuery - A SpanNotQuery that allows a specified number of 
> intersections
> 
>
> Key: LUCENE-777
> URL: https://issues.apache.org/jira/browse/LUCENE-777
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Search
>Reporter: Mark Miller
>Priority: Minor
> Attachments: LUCENE-777.patch, SpanWithinQuery.java
>
>
> A SpanNotQuery that allows a specified number of intersections.




[jira] Commented: (LUCENE-1350) Filters which are "consumers" should not reset the payload or flags and should better reuse the token

2008-08-09 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621175#action_12621175
 ] 

Doron Cohen commented on LUCENE-1350:
-

DM, thanks for taking care of this large change!
From Mike's comments on LUCENE-1333, it seems LUCENE-1333 will 
be committed and this one will be canceled, so I feel kind of bad about 
the time you put into the last patch here.

> Filters which are "consumers" should not reset the payload or flags and 
> should better reuse the token
> -
>
> Key: LUCENE-1350
> URL: https://issues.apache.org/jira/browse/LUCENE-1350
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Analysis, contrib/*
>Reporter: Doron Cohen
>Assignee: Doron Cohen
> Fix For: 2.3.3
>
> Attachments: LUCENE-1350.patch, LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no 
> payloads.
> A workaround for this is to apply stemming first and only then run whatever 
> logic creates the payload, but this is not always convenient.
> Other "consumer" filters have a similar problem.
> These filters can - and should - reuse the token, by implementing 
> next(Token), effectively also fixing the unwanted resetting.




[jira] Commented: (LUCENE-1333) Token implementation needs improvements

2008-08-09 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12621181#action_12621181
 ] 

Doron Cohen commented on LUCENE-1333:
-

This 'final' pattern is indeed more clear about reuse.

But I would still like to clarify what a TokenStream can assume. I think a 
TokenStream cannot assume anything about the token it gets as input and, 
once it has returned a token, cannot assume anything about how that token 
is used. So why should it not expect to be passed the token it just returned?






[jira] Closed: (LUCENE-998) BooleanQuery.setMaxClauseCount(int) is static

2008-08-09 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller closed LUCENE-998.
--

Resolution: Won't Fix

Why not? It can always be reopened, and it doesn't look like a reasonable 
change in the near term.

> BooleanQuery.setMaxClauseCount(int) is static
> -
>
> Key: LUCENE-998
> URL: https://issues.apache.org/jira/browse/LUCENE-998
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: Search
>Affects Versions: 2.1
>Reporter: Tim Lebedkov
>
> BooleanQuery.setMaxClauseCount(int) is static. It does not allow searching 
> multiple indices from different threads using different settings. This 
> setting should probably be moved into the IndexSearcher.




[jira] Resolved: (LUCENE-1170) query with AND and OR not retrieving correct results

2008-08-09 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved LUCENE-1170.
-

Resolution: Won't Fix

I think it's clear that the standard query parser does not operate with known or 
desired precedence rules. Try the Precedence query parser or enhance the 
current one, but I would say this is expected behavior at this point.

> query with AND and OR not retrieving correct results
> 
>
> Key: LUCENE-1170
> URL: https://issues.apache.org/jira/browse/LUCENE-1170
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: QueryParser
>Affects Versions: 2.3
> Environment: linux and windows
>Reporter: Graham Maloon
>
> I was working with Lucene 1.4 and have now upgraded to 2.3.0, but there is 
> still a problem that I am experiencing with the QueryParser.
>  
> I am passing the following queries:
>  
> "big brother" - works fine
> "big brother" AND dubai - works fine
> "big brother" AND football - works fine
> "big brother" AND dubai OR football - returns extra documents which contain 
> "big brother" but do not contain either dubai or football.
> "big brother" AND (dubai OR football) gives the same as the one above  
>  
> Am I doing something wrong?




[jira] Updated: (LUCENE-1001) Add Payload retrieval to Spans

2008-08-09 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated LUCENE-1001:


Attachment: LUCENE-1001.patch

Fixes the unorderedspan ispostionavailable issue for good measure.

I think we have to give the payloads back unsorted: there are probably cases 
where you could just use all the payloads for a span rather than per term, so 
we might as well not incur a penalty there. If you need them per term, you can put 
the position into the payload (pretty simple) and then just sort them yourself.
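Putting the position into the payload and sorting afterwards can be sketched as follows. The 4-byte big-endian position prefix and the `PositionedPayloads` helper are assumptions for illustration; nothing in Lucene defines this payload format.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical payload layout: 4-byte big-endian position followed by the data.
class PositionedPayloads {
    static byte[] encode(int position, byte[] data) {
        return ByteBuffer.allocate(4 + data.length).putInt(position).put(data).array();
    }

    static int position(byte[] payload) {
        return ByteBuffer.wrap(payload).getInt(); // reads the first 4 bytes
    }

    static byte[] data(byte[] payload) {
        return Arrays.copyOfRange(payload, 4, payload.length);
    }

    // Sorts an unordered collection of payloads (as a span might return them)
    // back into term-position order.
    static List<byte[]> sortByPosition(List<byte[]> payloads) {
        List<byte[]> sorted = new ArrayList<>(payloads);
        sorted.sort(Comparator.comparingInt(PositionedPayloads::position));
        return sorted;
    }
}
```

The cost of ordering is paid only by callers who need it, which matches the argument for returning payloads unsorted.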

> Add Payload retrieval to Spans
> --
>
> Key: LUCENE-1001
> URL: https://issues.apache.org/jira/browse/LUCENE-1001
> Project: Lucene - Java
>  Issue Type: New Feature
>  Components: Search
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 2.4
>
> Attachments: LUCENE-1001.patch, LUCENE-1001.patch, LUCENE-1001.patch, 
> LUCENE-1001.patch
>
