date:20100114

[jira] Reopened: (LUCENE-2114) Improve org.apache.lucene.search.Filter Documentation and Tests to reflect per segment readers

2010-01-14 Thread Michael McCandless (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reopened LUCENE-2114:



Need to backport to 2.9.x, 3.0.x

> Improve org.apache.lucene.search.Filter Documentation and Tests to reflect 
> per segment readers
> --
>
> Key: LUCENE-2114
> URL: https://issues.apache.org/jira/browse/LUCENE-2114
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9, 2.9.1, 3.0
>Reporter: Simon Willnauer
> Fix For: 3.1
>
> Attachments: LUCENE-2114.patch
>
>
> Filter Javadoc does not mention that the Reader passed to getDocIDSet(Reader) 
> could be on a per-segment basis.
> This caused confusion on the users-list -- see 
> http://lucene.markmail.org/message/6knz2mkqbpxjz5po?q=date:200912+list:org.apache.lucene.java-user&page=1
> We should improve the javadoc and also add a testcase that reflects filtering 
> on a per-segment basis.
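
For illustration only, here is a minimal sketch of what "per-segment" means for a filter. It does not use the Lucene API: SegmentSketch, PerSegmentFilterSketch, docBase and the field values are all hypothetical names made up for this example. The point it tries to show is that a filter handed a per-segment reader should answer in segment-local document IDs, and that the caller rebases them.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical stand-in for a per-segment reader; not a Lucene class. */
final class SegmentSketch {
  final int docBase;      // index-wide ID of this segment's first document
  final String[] values;  // one field value per document, indexed by segment-local ID

  SegmentSketch(int docBase, String... values) {
    this.docBase = docBase;
    this.values = values;
  }
}

final class PerSegmentFilterSketch {
  /** Returns matches as segment-local IDs in [0, segment.values.length). */
  static BitSet matchingDocs(SegmentSketch segment, String wanted) {
    BitSet bits = new BitSet(segment.values.length);
    for (int localId = 0; localId < segment.values.length; localId++) {
      if (wanted.equals(segment.values[localId])) {
        bits.set(localId);
      }
    }
    return bits;
  }

  /** The caller, not the filter, rebases local IDs to index-wide IDs. */
  static List<Integer> search(List<SegmentSketch> segments, String wanted) {
    List<Integer> hits = new ArrayList<>();
    for (SegmentSketch segment : segments) {
      BitSet local = matchingDocs(segment, wanted);
      for (int id = local.nextSetBit(0); id >= 0; id = local.nextSetBit(id + 1)) {
        hits.add(segment.docBase + id);
      }
    }
    return hits;
  }
}
```

A filter that assumed index-wide IDs here would set the wrong bits for every segment after the first, which is the kind of confusion the issue describes.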

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org

Re: Dynamic array reallocation algorithms

2010-01-14 Thread Michael McCandless

On Wed, Jan 13, 2010 at 9:12 PM, Marvin Humphrey  wrote:
> On Wed, Jan 13, 2010 at 11:46:50AM -0500, Michael McCandless wrote:
>> If forced to pick, in general, I tend to prefer burning CPU not RAM,
>> because the CPU is often a one-time burn, whereas RAM ties up storage
>> for indefinite amounts of time.
>
> With our dependence on indexes being RAM-resident for optimum performance, I'd
> also favor being conservative with RAM.

OK let's tentatively stick with 1/8th asymptotic growth...

>> I think this function should still aim to handle the smallish values,
>> ie, we shouldn't require every caller to have to handle the small
>> values themselves.  Callers that want to override the small cases can
>> still do so...
>
> The more "helpful" the behavior of getNextSize(), the harder it is to
> understand what happens when you partially override it.
>
> But I guess it's not that big a deal one way or the other.  There aren't that
> many places in Lucene where you might call getNextSize().  There are more such
> places in Lucy because we have to roll our own string and array classes, and
> we need finer-grained control over what happens there -- so maybe that
> explains why I'm not excited about trying to cram all that logic into a shared
> routine.
>
> Putting more logic into getNextSize() would be less of a problem if Lucene's
> implementation was less convoluted.  It's only one line and one comment, but
> it's deceptively difficult to grok.

> Looks like some Perl golfer wrote it.  ;)

Yes!  I'm laughing over here... I'll verbosify the comment and try to
fold in some of the ideas from this thread.  I'll work up a patch.

Mike
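
To make the "1/8th asymptotic growth" idea concrete, here is a rough sketch. It is not the getNextSize() under discussion: ArrayGrowthSketch, nextSize(), grow() and the MIN_EXTRA floor of 3 are assumptions chosen for illustration. It over-allocates by one eighth of the requested size, with a small fixed floor so that tiny arrays do not reallocate on every append.

```java
import java.util.Arrays;

final class ArrayGrowthSketch {
  /** Smallest over-allocation ever added; a hypothetical constant. */
  static final int MIN_EXTRA = 3;

  /** Returns a capacity >= minSize that over-allocates by about 1/8th. */
  static int nextSize(int minSize) {
    if (minSize < 0) {
      throw new IllegalArgumentException("negative size: " + minSize);
    }
    int extra = Math.max(minSize >> 3, MIN_EXTRA);
    long candidate = (long) minSize + extra;  // long arithmetic avoids int overflow
    return candidate > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) candidate;
  }

  /** Returns the array itself if it is big enough, otherwise a larger copy. */
  static int[] grow(int[] array, int minSize) {
    if (array.length >= minSize) {
      return array;
    }
    return Arrays.copyOf(array, nextSize(minSize));
  }
}
```

Callers that want different behavior for small sizes could still compute their own capacity and skip nextSize() entirely, which is the override case mentioned above.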


Re: Compound File Default

2010-01-14 Thread Grant Ingersoll

On Jan 12, 2010, at 1:32 PM, Marvin Humphrey wrote:

> 
> But beyond that, Lucene adopted the compound file format default for a reason,
> right?  What's changed about the environment that justifies overturning that
> decision?  

The history, as I recall, is it used to be off in 1.x.  Then, b/c some people 
(not a lot) were hitting file handle limits, we turned on CFS by default.  The 
perf. penalty is pretty significant and I think it is far easier to tell 
someone how to up their file handle limit or turn it on, than it is to 
overcome the sense that Lucene is slower than it really is.

-Grant

[jira] Created: (LUCENE-2209) add @experimental javadocs tag

2010-01-14 Thread Robert Muir (JIRA)

add @experimental javadocs tag
--

 Key: LUCENE-2209
 URL: https://issues.apache.org/jira/browse/LUCENE-2209
 Project: Lucene - Java
  Issue Type: Task
  Components: Javadocs
Reporter: Robert Muir


There are a lot of things marked experimental, api subject to change, etc. in 
lucene.

this patch simply adds a @description tag to common-build.xml so that we can 
use it, for more consistency.



[jira] Updated: (LUCENE-2209) add @experimental javadocs tag

2010-01-14 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2209:


Description: 
There are a lot of things marked experimental, api subject to change, etc. in 
lucene.

this patch simply adds a @experimental tag to common-build.xml so that we can 
use it, for more consistency.


  was:
There are a lot of things marked experimental, api subject to change, etc. in 
lucene.

this patch simply adds a @description tag to common-build.xml so that we can 
use it, for more consistency.



> add @experimental javadocs tag
> --
>
> Key: LUCENE-2209
> URL: https://issues.apache.org/jira/browse/LUCENE-2209
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Javadocs
>Reporter: Robert Muir
> Attachments: LUCENE-2209.patch
>
>
> There are a lot of things marked experimental, api subject to change, etc. in 
> lucene.
> this patch simply adds a @experimental tag to common-build.xml so that we can 
> use it, for more consistency.


[jira] Updated: (LUCENE-2209) add @experimental javadocs tag

2010-01-14 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2209:


Attachment: LUCENE-2209.patch

i only searched on the word 'experimental' and replaced those with 
@experimental. I did not also search on 'expert' ... are these considered the 
same?

> add @experimental javadocs tag
> --
>
> Key: LUCENE-2209
> URL: https://issues.apache.org/jira/browse/LUCENE-2209
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Javadocs
>Reporter: Robert Muir
> Attachments: LUCENE-2209.patch
>
>
> There are a lot of things marked experimental, api subject to change, etc. in 
> lucene.
> this patch simply adds a @description tag to common-build.xml so that we can 
> use it, for more consistency.


[jira] Commented: (LUCENE-2209) add @experimental javadocs tag

2010-01-14 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800222#action_12800222
 ] 

Michael McCandless commented on LUCENE-2209:


This is great!

Can we also have @internal mean "public only because Lucene needs to access it 
across packages"?  Eg, most things under oal.util...

> add @experimental javadocs tag
> --
>
> Key: LUCENE-2209
> URL: https://issues.apache.org/jira/browse/LUCENE-2209
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Javadocs
>Reporter: Robert Muir
> Attachments: LUCENE-2209.patch
>
>
> There are a lot of things marked experimental, api subject to change, etc. in 
> lucene.
> this patch simply adds a @experimental tag to common-build.xml so that we can 
> use it, for more consistency.


[jira] Commented: (LUCENE-2209) add @experimental javadocs tag

2010-01-14 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800229#action_12800229
 ] 

Robert Muir commented on LUCENE-2209:
-

Mike, for this one I would need a list :)

Also we need to decide if expert is its own thing or just equivalent to 
@experimental; if it is the same we can mark it as such.

Finally, on wording and formatting changes: I tested, and we can make the text 
red etc. if we want (like some did). Currently it is only bold, as that is the 
default. It's a little ugly since we have to escape the HTML for it to work in 
build.xml, but not too bad.


> add @experimental javadocs tag
> --
>
> Key: LUCENE-2209
> URL: https://issues.apache.org/jira/browse/LUCENE-2209
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Javadocs
>Reporter: Robert Muir
> Attachments: LUCENE-2209.patch
>
>
> There are a lot of things marked experimental, api subject to change, etc. in 
> lucene.
> this patch simply adds a @experimental tag to common-build.xml so that we can 
> use it, for more consistency.


Re: Lucene 2.9.0 Near Real Time Indexing and lock timeouts

2010-01-14 Thread jchang


With only 10 concurrent consumers, I do get lock problems.  However, I am
calling commit() at the end of each addition.  Could I expect better
concurrency without timeouts if I did not commit as often?

-- 
View this message in context: 
http://old.nabble.com/Lucene-2.9.0-Near-Real-Time-Indexing-and-lock-timeouts-tp27136743p27164797.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.



[jira] Created: (LUCENE-2210) trectopicsreader doesn't properly read descriptions or narratives

2010-01-14 Thread Robert Muir (JIRA)

trectopicsreader doesn't properly read descriptions or narratives
-

 Key: LUCENE-2210
 URL: https://issues.apache.org/jira/browse/LUCENE-2210
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Robert Muir
Priority: Minor
 Fix For: 3.1
 Attachments: LUCENE-2210.patch

TrecTopicsReader does not read these fields correctly, as demonstrated by the 
test case.



[jira] Updated: (LUCENE-2210) trectopicsreader doesn't properly read descriptions or narratives

2010-01-14 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2210:


Attachment: LUCENE-2210.patch

> trectopicsreader doesn't properly read descriptions or narratives
> -
>
> Key: LUCENE-2210
> URL: https://issues.apache.org/jira/browse/LUCENE-2210
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/benchmark
>Reporter: Robert Muir
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2210.patch
>
>
> TrecTopicsReader does not read these fields correctly, as demonstrated by the 
> test case.


[jira] Assigned: (LUCENE-2210) trectopicsreader doesn't properly read descriptions or narratives

2010-01-14 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir reassigned LUCENE-2210:
---

Assignee: Robert Muir

> trectopicsreader doesn't properly read descriptions or narratives
> -
>
> Key: LUCENE-2210
> URL: https://issues.apache.org/jira/browse/LUCENE-2210
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/benchmark
>Reporter: Robert Muir
>Assignee: Robert Muir
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2210.patch
>
>
> TrecTopicsReader does not read these fields correctly, as demonstrated by the 
> test case.


[jira] Commented: (LUCENE-2210) trectopicsreader doesn't properly read descriptions or narratives

2010-01-14 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800256#action_12800256
 ] 

Robert Muir commented on LUCENE-2210:
-

I will commit the fix shortly unless anyone objects

> trectopicsreader doesn't properly read descriptions or narratives
> -
>
> Key: LUCENE-2210
> URL: https://issues.apache.org/jira/browse/LUCENE-2210
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/benchmark
>Reporter: Robert Muir
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2210.patch
>
>
> TrecTopicsReader does not read these fields correctly, as demonstrated by the 
> test case.


Re: Lucene 2.9.0 Near Real Time Indexing and lock timeouts

2010-01-14 Thread Michael McCandless

Calling commit after every addition will drastically slow down your
indexing throughput, and concurrency (commit is internally
synchronized), but should not create lock timeouts, unless you are
also opening a new IndexWriter for every addition?

Mike
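
As a rough sketch of the alternative to committing after every addition: share one writer among all consumers and commit once per batch. WriterSketch and BatchingCommitter below are hypothetical names and do not mirror IndexWriter's real API; the batch size is an arbitrary tuning knob.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical minimal view of an index writer; not Lucene's IndexWriter API. */
interface WriterSketch {
  void addDocument(String doc);
  void commit();
}

/** Wraps one shared writer and commits once every batchSize additions. */
final class BatchingCommitter {
  private final WriterSketch writer;
  private final int batchSize;
  private final AtomicInteger pending = new AtomicInteger();

  BatchingCommitter(WriterSketch writer, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    this.writer = writer;
    this.batchSize = batchSize;
  }

  /** Safe to call from many threads as long as the wrapped writer is thread-safe. */
  void add(String doc) {
    writer.addDocument(doc);
    // incrementAndGet() hands out distinct counts, so only one caller per batch commits.
    if (pending.incrementAndGet() % batchSize == 0) {
      writer.commit();
    }
  }

  /** Commits whatever is left over, e.g. at shutdown. */
  void flush() {
    writer.commit();
  }
}
```

The trade-off is durability: documents added since the last commit are not yet committed, so the batch size bounds how much work could be lost.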

On Thu, Jan 14, 2010 at 12:15 PM, jchang  wrote:
>
> With only 10 concurrent consumers, I do get lock problems.  However, I am
> calling commit() at the end of each addition.  Could I expect better
> concurrency without timeouts if I did not commit as often?
>
> --
> View this message in context: 
> http://old.nabble.com/Lucene-2.9.0-Near-Real-Time-Indexing-and-lock-timeouts-tp27136743p27164797.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.


[jira] Commented: (LUCENE-2209) add @experimental javadocs tag

2010-01-14 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800283#action_12800283
 ] 

Michael McCandless commented on LUCENE-2209:


We can start by making @internal expand to a consistent warning in the javadoc 
(maybe start from oal.util.cache.DBLRUCache's warning?)?  I'll pull together a 
proposed set of classes/methods that we should add @internal to.

I think @expert is a different concept than @experimental, though, it may not 
warrant its own tag because merely putting "Expert:" in front of the javadocs 
seems OK?

I would actually prefer bold, not red -- I think the red is overkill ;)  I like 
the current wording.

> add @experimental javadocs tag
> --
>
> Key: LUCENE-2209
> URL: https://issues.apache.org/jira/browse/LUCENE-2209
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Javadocs
>Reporter: Robert Muir
> Attachments: LUCENE-2209.patch
>
>
> There are a lot of things marked experimental, api subject to change, etc. in 
> lucene.
> this patch simply adds a @experimental tag to common-build.xml so that we can 
> use it, for more consistency.


[jira] Commented: (LUCENE-2209) add @experimental javadocs tag

2010-01-14 Thread Hoss Man (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800304#action_12800304
 ] 

Hoss Man commented on LUCENE-2209:
--

small suggestion...

@todo is a pretty widespread and long-used custom javadoc tag, so most people 
don't worry about it ... but for any other custom tags that projects use, it's 
strongly suggested that they always have a "." in their name.  The Javadoc 
compatibility contract is that future versions of javadoc won't add tags that 
have periods in their name, so it's the way to avoid collisions (you should 
actually see a warning from javadoc about using a tag without a "." in its name 
when you declare these).

So i would suggest @lucene.internal, @lucene.expert, @lucene.experimental, 
etc...
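
For illustration, this is roughly what the dotted tags would look like in source. TaggedCacheSketch is a made-up class, and where each tag belongs is only an assumption for the example. The tags still have to be declared to javadoc (its -tag option, or the nested <tag> element of Ant's javadoc task) before they render as anything.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical cache, used only to show where a dotted custom tag goes.
 *
 * @lucene.experimental
 */
final class TaggedCacheSketch {
  private final Map<String, String> entries = new HashMap<>();

  /**
   * Stores a value under the given key.
   *
   * @lucene.internal
   */
  void put(String key, String value) {
    entries.put(key, value);
  }

  /** Returns the cached value, or null when the key is absent. */
  String get(String key) {
    return entries.get(key);
  }
}
```

The compiler ignores tags it does not know, so sources tagged this way build the same whether or not the javadoc configuration declares them.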

> add @experimental javadocs tag
> --
>
> Key: LUCENE-2209
> URL: https://issues.apache.org/jira/browse/LUCENE-2209
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Javadocs
>Reporter: Robert Muir
> Attachments: LUCENE-2209.patch
>
>
> There are a lot of things marked experimental, api subject to change, etc. in 
> lucene.
> this patch simply adds a @experimental tag to common-build.xml so that we can 
> use it, for more consistency.


[jira] Commented: (LUCENE-2209) add @experimental javadocs tag

2010-01-14 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800310#action_12800310
 ] 

Robert Muir commented on LUCENE-2209:
-

Hoss, i see your point but i think this is just a silly warning (as you 
mentioned, it applies to @todo also!)

the idea for doing this @internal, etc came from ICU, whose code becomes part 
of the JDK itself. I cannot see this becoming a problem.

> add @experimental javadocs tag
> --
>
> Key: LUCENE-2209
> URL: https://issues.apache.org/jira/browse/LUCENE-2209
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Javadocs
>Reporter: Robert Muir
> Attachments: LUCENE-2209.patch
>
>
> There are a lot of things marked experimental, api subject to change, etc. in 
> lucene.
> this patch simply adds a @experimental tag to common-build.xml so that we can 
> use it, for more consistency.


[jira] Commented: (LUCENE-2209) add @experimental javadocs tag

2010-01-14 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800313#action_12800313
 ] 

Robert Muir commented on LUCENE-2209:
-

p.s. hossman i only commented because you are out to get me :) 

we should try to minimize the warnings, we also have an unused one 
@uml.something that javadoc warns about.

> add @experimental javadocs tag
> --
>
> Key: LUCENE-2209
> URL: https://issues.apache.org/jira/browse/LUCENE-2209
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Javadocs
>Reporter: Robert Muir
> Attachments: LUCENE-2209.patch
>
>
> There are a lot of things marked experimental, api subject to change, etc. in 
> lucene.
> this patch simply adds a @experimental tag to common-build.xml so that we can 
> use it, for more consistency.


[jira] Resolved: (LUCENE-2210) trectopicsreader doesn't properly read descriptions or narratives

2010-01-14 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-2210.
-

Resolution: Fixed

Committed revision 899369.

> trectopicsreader doesn't properly read descriptions or narratives
> -
>
> Key: LUCENE-2210
> URL: https://issues.apache.org/jira/browse/LUCENE-2210
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/benchmark
>Reporter: Robert Muir
>Assignee: Robert Muir
>Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2210.patch
>
>
> TrecTopicsReader does not read these fields correctly, as demonstrated by the 
> test case.


[jira] Resolved: (LUCENE-2114) Improve org.apache.lucene.search.Filter Documentation and Tests to reflect per segment readers

2010-01-14 Thread Michael McCandless (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-2114.


   Resolution: Fixed
Fix Version/s: 3.0.1
   2.9.2

> Improve org.apache.lucene.search.Filter Documentation and Tests to reflect 
> per segment readers
> --
>
> Key: LUCENE-2114
> URL: https://issues.apache.org/jira/browse/LUCENE-2114
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Search
>Affects Versions: 2.9, 2.9.1, 3.0
>Reporter: Simon Willnauer
> Fix For: 2.9.2, 3.0.1, 3.1
>
> Attachments: LUCENE-2114.patch
>
>
> Filter Javadoc does not mention that the Reader passed to getDocIDSet(Reader) 
> could be on a per-segment basis.
> This caused confusion on the users-list -- see 
> http://lucene.markmail.org/message/6knz2mkqbpxjz5po?q=date:200912+list:org.apache.lucene.java-user&page=1
> We should improve the javadoc and also add a testcase that reflects filtering 
> on a per-segment basis.


Re: svn commit: r898055 - in /lucene/java/trunk/contrib/benchmark: build.xml src/java/org/apache/lucene/benchmark/byTask/feeds/LongToEnglishContentSource.java src/java/org/apache/lucene/benchmark/by

2010-01-14 Thread Robert Muir

i am going to move English from src/test to src/java now that it is
being used in non-test code.

it is making javadocs cry, etc.

On Mon, Jan 11, 2010 at 3:29 PM,   wrote:
> Author: gsingers
> Date: Mon Jan 11 20:29:40 2010
> New Revision: 898055
>
> URL: http://svn.apache.org/viewvc?rev=898055&view=rev
> Log:
> Add support for LongToEnglish doc/query maker
>
> Added:
>    
> lucene/java/trunk/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/LongToEnglishContentSource.java
>    (with props)
>    
> lucene/java/trunk/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/LongToEnglishQueryMaker.java
>    (with props)
> Modified:
>    lucene/java/trunk/contrib/benchmark/build.xml
>
> Modified: lucene/java/trunk/contrib/benchmark/build.xml
> URL: 
> http://svn.apache.org/viewvc/lucene/java/trunk/contrib/benchmark/build.xml?rev=898055&r1=898054&r2=898055&view=diff
> ==
> --- lucene/java/trunk/contrib/benchmark/build.xml (original)
> +++ lucene/java/trunk/contrib/benchmark/build.xml Mon Jan 11 20:29:40 2010
> @@ -104,6 +104,7 @@
>     
>         
>         
> +      
>          path="${common.dir}/build/contrib/highlighter/classes/java"/>
>         
>          path="${common.dir}/build/contrib/fast-vector-highlighter/classes/java"/>
> @@ -120,7 +121,7 @@
>     
>     
>
> -     +          description="Run compound penalty perf test (optional: 
> -Dtask.alg=your-algorithm-file -Dtask.mem=java-max-mem)">
>         Working Directory: ${working.dir}
>          maxmemory="${task.mem}" fork="true">
>
> Added: 
> lucene/java/trunk/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/LongToEnglishContentSource.java
> URL: 
> http://svn.apache.org/viewvc/lucene/java/trunk/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/LongToEnglishContentSource.java?rev=898055&view=auto
> ==
> --- 
> lucene/java/trunk/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/LongToEnglishContentSource.java
>  (added)
> +++ 
> lucene/java/trunk/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/LongToEnglishContentSource.java
>  Mon Jan 11 20:29:40 2010
> @@ -0,0 +1,37 @@
> +package org.apache.lucene.benchmark.byTask.feeds;
> +
> +import org.apache.lucene.util.English;
> +
> +import java.io.IOException;
> +import java.util.Date;
> +
> +
> +/**
> + *
> + *
> + **/
> +public class LongToEnglishContentSource extends ContentSource{
> +  private long counter = Long.MIN_VALUE + 10;
> +
> +  public void close() throws IOException {
> +
> +  }
> +  //TODO: reduce/clean up synchonization
> +  public synchronized DocData getNextDocData(DocData docData) throws 
> NoMoreDataException, IOException {
> +    docData.clear();
> +    docData.setBody(English.longToEnglish(counter));
> +    docData.setName("doc_" + String.valueOf(counter));
> +    docData.setTitle("title_" + String.valueOf(counter));
> +    docData.setDate(new Date());
> +    if (counter == Long.MAX_VALUE){
> +      counter = Long.MIN_VALUE + 10;//loop around
> +    }
> +    counter++;
> +    return docData;
> +  }
> +
> +  @Override
> +  public void resetInputs() throws IOException {
> +    counter = Long.MIN_VALUE + 10;
> +  }
> +}
>
> Propchange: 
> lucene/java/trunk/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/LongToEnglishContentSource.java
> --
>    svn:eol-style = native
>
> Added: 
> lucene/java/trunk/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/LongToEnglishQueryMaker.java
> URL: 
> http://svn.apache.org/viewvc/lucene/java/trunk/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/LongToEnglishQueryMaker.java?rev=898055&view=auto
> ==
> --- 
> lucene/java/trunk/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/LongToEnglishQueryMaker.java
>  (added)
> +++ 
> lucene/java/trunk/contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask/feeds/LongToEnglishQueryMaker.java
>  Mon Jan 11 20:29:40 2010
> @@ -0,0 +1,49 @@
> +package org.apache.lucene.benchmark.byTask.feeds;
> +
> +import org.apache.lucene.analysis.Analyzer;
> +import org.apache.lucene.analysis.standard.StandardAnalyzer;
> +import org.apache.lucene.benchmark.byTask.tasks.NewAnalyzerTask;
> +import org.apache.lucene.benchmark.byTask.utils.Config;
> +import org.apache.lucene.queryParser.QueryParser;
> +import org.apache.lucene.search.Query;
> +import org.apache.lucene.util.English;
> +import org.apache.lucene.util.Version;
> +
> +
> +/**
> + *
> + *
> + **/
> +public class LongToEnglishQueryMaker implements QueryMaker {
> +  long counter = Long.MIN_VALUE + 10;
> +  protected QueryParser parser;
> +
> +  public Query makeQuery(int size) throws Except

[jira] Updated: (LUCENE-2209) add @experimental javadocs tag

2010-01-14 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2209:


Attachment: LUCENE-2209.patch

i changed this to @lucene.experimental, and added @lucene.internal (only used 
by double-barreled cache for now). I simplified its wording so maybe it needs 
more.

i also removed unused tags: @uml.property (completely unused) and @todo (only 
used in one place, replaced with TODO:)

> add @experimental javadocs tag
> --
>
> Key: LUCENE-2209
> URL: https://issues.apache.org/jira/browse/LUCENE-2209
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Javadocs
>Reporter: Robert Muir
> Attachments: LUCENE-2209.patch, LUCENE-2209.patch
>
>
> There are a lot of things marked experimental, api subject to change, etc. in 
> lucene.
> this patch simply adds a @experimental tag to common-build.xml so that we can 
> use it, for more consistency.


Hudson build is back to normal: Lucene-trunk #1062

2010-01-14 Thread Apache Hudson Server

See 




[jira] Commented: (LUCENE-2195) Speedup CharArraySet if set is empty

2010-01-14 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2195?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800405#action_12800405
 ] 

Simon Willnauer commented on LUCENE-2195:
-

any comments on the latest patch?

> Speedup CharArraySet if set is empty
> 
>
> Key: LUCENE-2195
> URL: https://issues.apache.org/jira/browse/LUCENE-2195
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Analysis
>Reporter: Simon Willnauer
> Fix For: 3.1
>
> Attachments: LUCENE-2195.patch, LUCENE-2195.patch, LUCENE-2195.patch
>
>
> CharArraySet#contains(...) always creates a HashCode of the String, Char[] or 
> CharSequence even if the set is empty. 
> contains should return false if set is empty
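
A minimal sketch of the proposed fast path, for illustration only. EmptyAwareCharSet is a hypothetical, fixed-capacity toy and not the CharArraySet implementation or the attached patch; the hashCalls counter exists only so the effect can be observed.

```java
/** Toy open-addressing set of char sequences; not Lucene's CharArraySet. */
final class EmptyAwareCharSet {
  private final char[][] slots = new char[32][];  // fixed capacity keeps the sketch short
  private int count;
  int hashCalls;  // instrumentation so the fast path is observable

  private int hash(CharSequence text) {
    hashCalls++;
    int h = 0;
    for (int i = 0; i < text.length(); i++) {
      h = 31 * h + text.charAt(i);
    }
    return h & (slots.length - 1);
  }

  private static boolean same(char[] stored, CharSequence text) {
    if (stored.length != text.length()) {
      return false;
    }
    for (int i = 0; i < stored.length; i++) {
      if (stored[i] != text.charAt(i)) {
        return false;
      }
    }
    return true;
  }

  boolean add(CharSequence text) {
    // Always leave one slot free so the probe loops below terminate.
    if (count == slots.length - 1) {
      throw new IllegalStateException("sketch set is full");
    }
    int slot = hash(text);
    while (slots[slot] != null) {
      if (same(slots[slot], text)) {
        return false;
      }
      slot = (slot + 1) & (slots.length - 1);
    }
    slots[slot] = text.toString().toCharArray();
    count++;
    return true;
  }

  boolean contains(CharSequence text) {
    if (count == 0) {
      return false;  // empty set: answer without computing a hash
    }
    int slot = hash(text);
    while (slots[slot] != null) {
      if (same(slots[slot], text)) {
        return true;
      }
      slot = (slot + 1) & (slots.length - 1);
    }
    return false;
  }
}
```

The check costs one integer comparison per lookup, which would matter mostly for analyzers configured with an empty stop-word set.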


[jira] Created: (LUCENE-2211) Advances BaseTokenStreamTestCase that uses a fake attribute to check, if clearAttributes() was called correctly - found bugs in contrib/analyzers

2010-01-14 Thread Uwe Schindler (JIRA)

Advances BaseTokenStreamTestCase that uses a fake attribute to check, if 
clearAttributes() was called correctly - found bugs in contrib/analyzers
-

 Key: LUCENE-2211
 URL: https://issues.apache.org/jira/browse/LUCENE-2211
 Project: Lucene - Java
  Issue Type: Bug
  Components: Analysis, contrib/analyzers
Affects Versions: 3.0
Reporter: Uwe Schindler
 Fix For: 3.1


Robert had the idea to use a fake attribute inside BaseTokenStreamTestCase that 
records whether its clear() method was called. If it was not called after 
incrementToken(), assertTokenStreamContents fails. The patch also uses the 
attribute in TeeSinkTokenFilter, because a lot of copying, captureState() and 
restoreState() is used there. With this attribute you can nicely track whether 
save/restore and clearAttributes() are implemented correctly. It also verifies 
that the attribute was cleared *before* a captureState() (as the captured state 
will also contain the clear call): if you consume tokens in a filter, capture 
the consumed tokens and insert them later, the captured states must also have 
been cleared beforehand.

Some tests in contrib/analyzers fail this additional assertion. They are not 
fixed in the attached patch.
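
The idea of a fake attribute that only records whether clear() ran can be sketched like this. It is a simplified model under stated assumptions: SketchStream, ListStream, CheckClearAttribute and firstUnclearedToken are hypothetical names made up for illustration and do not reproduce Lucene's TokenStream API or the code in the patch.

```java
import java.util.Iterator;
import java.util.List;

final class ClearCheckDemo {
    /** Fake attribute: carries no token data, only remembers whether clear() ran. */
    static final class CheckClearAttribute {
        private boolean cleared;

        void clear() { cleared = true; }

        /** Returns the flag and resets it so the next token is checked afresh. */
        boolean getAndReset() {
            boolean wasCleared = cleared;
            cleared = false;
            return wasCleared;
        }
    }

    /** Minimal stand-in for a token stream (hypothetical, not Lucene's class). */
    abstract static class SketchStream {
        final CheckClearAttribute check = new CheckClearAttribute();
        String term;

        void clearAttributes() {
            term = null;
            check.clear();
        }

        abstract boolean incrementToken();
    }

    /** Emits the given terms; can be told to "forget" clearAttributes(). */
    static final class ListStream extends SketchStream {
        private final Iterator<String> terms;
        private final boolean callsClear;

        ListStream(List<String> terms, boolean callsClear) {
            this.terms = terms.iterator();
            this.callsClear = callsClear;
        }

        @Override
        boolean incrementToken() {
            if (!terms.hasNext()) return false;
            if (callsClear) clearAttributes();
            term = terms.next();
            return true;
        }
    }

    /** Index of the first token produced without a clear, or -1 if all were fine. */
    static int firstUnclearedToken(SketchStream stream) {
        stream.check.getAndReset(); // start from a known state
        int index = 0;
        while (stream.incrementToken()) {
            if (!stream.check.getAndReset()) return index;
            index++;
        }
        return -1;
    }
}
```

A test helper built this way fails on the first token of any stream that skips the clear, without needing to know which real attribute would have carried stale data.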

[jira] Updated: (LUCENE-2211) Advances BaseTokenStreamTestCase that uses a fake attribute to check, if clearAttributes() was called correctly - found bugs in contrib/analyzers

2010-01-14 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2211:
--

Attachment: LUCENE-2211.patch

> Advances BaseTokenStreamTestCase that uses a fake attribute to check, if 
> clearAttributes() was called correctly - found bugs in contrib/analyzers
> -
>
> Key: LUCENE-2211
> URL: https://issues.apache.org/jira/browse/LUCENE-2211
> Project: Lucene - Java
> Issue Type: Bug
> Components: Analysis, contrib/analyzers
> Affects Versions: 3.0
> Reporter: Uwe Schindler
> Fix For: 3.1
>
> Attachments: LUCENE-2211.patch

[jira] Updated: (LUCENE-2211) Advances BaseTokenStreamTestCase that uses a fake attribute to check, if clearAttributes() was called correctly - found bugs in contrib/analyzers

2010-01-14 Thread Uwe Schindler (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2211:
--

Affects Version/s: 2.9, 2.9.1
Fix Version/s: 3.0.1, 2.9.2

The n-gram problems are serious, so this should also be backported.

As a side effect, we get the non-generic Java 1.4 version for Solr 1.5 for free.

> Advances BaseTokenStreamTestCase that uses a fake attribute to check, if 
> clearAttributes() was called correctly - found bugs in contrib/analyzers
> -
>
> Key: LUCENE-2211
> URL: https://issues.apache.org/jira/browse/LUCENE-2211
> Project: Lucene - Java
> Issue Type: Bug
> Components: Analysis, contrib/analyzers
> Affects Versions: 2.9, 2.9.1, 3.0
> Reporter: Uwe Schindler
> Fix For: 2.9.2, 3.0.1, 3.1
>
> Attachments: LUCENE-2211.patch

[jira] Updated: (LUCENE-2211) Advances BaseTokenStreamTestCase that uses a fake attribute to check, if clearAttributes() was called correctly - found bugs in contrib/analyzers

2010-01-14 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2211:


Attachment: LUCENE-2211.patch

Uwe's patch, with the fixes for contrib.

What was broken: compounds, the n-gram filter, and the edge n-gram filter.

> Advances BaseTokenStreamTestCase that uses a fake attribute to check, if 
> clearAttributes() was called correctly - found bugs in contrib/analyzers
> -
>
> Key: LUCENE-2211
> URL: https://issues.apache.org/jira/browse/LUCENE-2211
> Project: Lucene - Java
> Issue Type: Bug
> Components: Analysis, contrib/analyzers
> Affects Versions: 2.9, 2.9.1, 3.0
> Reporter: Uwe Schindler
> Fix For: 2.9.2, 3.0.1, 3.1
>
> Attachments: LUCENE-2211.patch, LUCENE-2211.patch

[jira] Commented: (LUCENE-2211) Advances BaseTokenStreamTestCase that uses a fake attribute to check, if clearAttributes() was called correctly - found bugs in contrib/analyzers

2010-01-14 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800464#action_12800464
 ] 

Robert Muir commented on LUCENE-2211:
-

Before committing any fix I want to review and add tests for any token streams 
that do not yet use BaseTokenStreamTestCase, just to be sure there are no 
others with this problem.

It may seem trivial, but if this clearing does not take place properly, then 
things like the position increment with StopFilter can grow to very large 
values, overflow, and cause IndexWriter to throw an exception: 
http://www.lucidimagination.com/search/document/f649a19901d33c75/illegalargumentexception_when_indexwriter_adddocument
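
To see how a missing clear can make a position increment run away, here is a small simulation. It is a simplified model, assuming a stop filter that adds the increments of skipped tokens onto whatever increment value is currently stored, and a producer that should reset that value to 1 for every token. The class and method names are hypothetical and the real filter may differ in detail.

```java
final class PositionIncrementOverflowDemo {
    /**
     * Feeds alternating "stop word, kept word" pairs through the model.
     * Returns the 1-based number of the kept token at which the increment
     * stops being positive (int overflow), or -1 if that never happens.
     */
    static int keptTokensUntilOverflow(boolean producerClears, int pairs) {
        int posIncr = 1; // the shared attribute value
        for (int kept = 1; kept <= pairs; kept++) {
            // A stop word arrives; a correct producer resets the increment first.
            if (producerClears) posIncr = 1;
            int skipped = posIncr; // the filter drops it and remembers its increment

            // The next, kept word arrives.
            if (producerClears) posIncr = 1;
            posIncr = posIncr + skipped; // the filter adds the skipped positions

            if (posIncr <= 0) return kept; // wrapped around
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("with clear:    " + keptTokensUntilOverflow(true, 100));
        System.out.println("without clear: " + keptTokensUntilOverflow(false, 100));
    }
}
```

In this model the stale value doubles with every removed stop word, so the int wraps after 31 kept tokens, while a producer that clears keeps the increment at 2.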



> Advances BaseTokenStreamTestCase that uses a fake attribute to check, if 
> clearAttributes() was called correctly - found bugs in contrib/analyzers
> -
>
> Key: LUCENE-2211
> URL: https://issues.apache.org/jira/browse/LUCENE-2211
> Project: Lucene - Java
> Issue Type: Bug
> Components: Analysis, contrib/analyzers
> Affects Versions: 2.9, 2.9.1, 3.0
> Reporter: Uwe Schindler
> Fix For: 2.9.2, 3.0.1, 3.1
>
> Attachments: LUCENE-2211.patch, LUCENE-2211.patch

[jira] Commented: (LUCENE-2209) add @experimental javadocs tag

2010-01-14 Thread Hoss Man (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800517#action_12800517
 ] 

Hoss Man commented on LUCENE-2209:
--

bq. p.s. hossman i only commented because you are out to get me

I'm deeply hurt that you think I am out to get you -- It's just that there are 
just some things i feel very passionate about.

It just so happens that undermining everything you do, and contradicting 
everything you say, are the two things i'm most passionate about in the whole 
wide world ... but that doesn't mean i'm out to get you.



> add @experimental javadocs tag
> --
>
> Key: LUCENE-2209
> URL: https://issues.apache.org/jira/browse/LUCENE-2209
> Project: Lucene - Java
> Issue Type: Task
> Components: Javadocs
> Reporter: Robert Muir
> Attachments: LUCENE-2209.patch, LUCENE-2209.patch
>
>
> There are a lot of things marked experimental, api subject to change, etc. in 
> lucene.
> this patch simply adds a @experimental tag to common-build.xml so that we can 
> use it, for more consistency.

[jira] Updated: (LUCENE-2211) Advances BaseTokenStreamTestCase that uses a fake attribute to check, if clearAttributes() was called correctly - found bugs in contrib/analyzers

2010-01-14 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2211:


Attachment: LUCENE-2211.patch

Hello Uwe, I did not get time to review all token streams, but I converted a 
ShingleMatrix test to assertTokenStreamContents and found a clearAttributes() 
bug in it too, so that is also fixed in this patch. More token streams with 
problems might remain.

> Advances BaseTokenStreamTestCase that uses a fake attribute to check, if 
> clearAttributes() was called correctly - found bugs in contrib/analyzers
> -
>
> Key: LUCENE-2211
> URL: https://issues.apache.org/jira/browse/LUCENE-2211
> Project: Lucene - Java
> Issue Type: Bug
> Components: Analysis, contrib/analyzers
> Affects Versions: 2.9, 2.9.1, 3.0
> Reporter: Uwe Schindler
> Fix For: 2.9.2, 3.0.1, 3.1
>
> Attachments: LUCENE-2211.patch, LUCENE-2211.patch, LUCENE-2211.patch

[jira] Updated: (LUCENE-2211) Advances BaseTokenStreamTestCase that uses a fake attribute to check, if clearAttributes() was called correctly - found bugs in contrib/analyzers

2010-01-14 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2211:


Attachment: LUCENE-2211.patch

Hi Uwe, PrefixAwareTokenFilter did not call clearAttributes() either. I tried 
all the others I could find, and I think the rest are OK.

> Advances BaseTokenStreamTestCase that uses a fake attribute to check, if 
> clearAttributes() was called correctly - found bugs in contrib/analyzers
> -
>
> Key: LUCENE-2211
> URL: https://issues.apache.org/jira/browse/LUCENE-2211
> Project: Lucene - Java
> Issue Type: Bug
> Components: Analysis, contrib/analyzers
> Affects Versions: 2.9, 2.9.1, 3.0
> Reporter: Uwe Schindler
> Fix For: 2.9.2, 3.0.1, 3.1
>
> Attachments: LUCENE-2211.patch, LUCENE-2211.patch, LUCENE-2211.patch, 
> LUCENE-2211.patch

[jira] Created: (LUCENE-2212) add a test for PorterStemFilter

2010-01-14 Thread Robert Muir (JIRA)

add a test for PorterStemFilter
---

Key: LUCENE-2212
URL: https://issues.apache.org/jira/browse/LUCENE-2212
Project: Lucene - Java
Issue Type: Test
Components: Analysis
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Minor
Fix For: 3.1
Attachments: LUCENE-2212.patch

There are no tests for PorterStemFilter, yet svn history reveals some (very 
minor) cleanups, etc.
The only thing executing its code in tests is a test or two in the SmartChinese 
tests.

This patch runs the StemFilter against Martin Porter's test data set for this 
stemmer, checking for expected output.

The zip file adds 100KB to src/test; if this is too large I can change the test 
to download the data instead.
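
The shape of such a data-driven check can be sketched as below. This is only an illustration under stated assumptions: StemmerDataCheck and mismatches are hypothetical names, the stemmer is passed in as a plain function rather than a token filter, and the word lists in the usage note are toy data, not Martin Porter's data set.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

final class StemmerDataCheck {
    /**
     * Runs every input word through the stemmer and compares the result with
     * the expected stem at the same position. Returns one message per mismatch.
     */
    static List<String> mismatches(List<String> words,
                                   List<String> expectedStems,
                                   UnaryOperator<String> stemmer) {
        if (words.size() != expectedStems.size()) {
            throw new IllegalArgumentException("input and expected lists differ in length");
        }
        List<String> failures = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            String actual = stemmer.apply(words.get(i));
            if (!actual.equals(expectedStems.get(i))) {
                failures.add(words.get(i) + ": expected " + expectedStems.get(i)
                        + " but got " + actual);
            }
        }
        return failures;
    }
}
```

For example, with a stand-in "stemmer" that only strips a trailing "s", the inputs [cats, dogs] checked against [cat, dog] yield no mismatches. A real test would read the two word lists from the data files and plug in the actual stemmer.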


[jira] Updated: (LUCENE-2212) add a test for PorterStemFilter

2010-01-14 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2212:


Attachment: LUCENE-2212.patch

the test itself

> add a test for PorterStemFilter
> ---
>
> Key: LUCENE-2212
> URL: https://issues.apache.org/jira/browse/LUCENE-2212
> Project: Lucene - Java
> Issue Type: Test
> Components: Analysis
> Reporter: Robert Muir
> Assignee: Robert Muir
> Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2212.patch

[jira] Updated: (LUCENE-2212) add a test for PorterStemFilter

2010-01-14 Thread Robert Muir (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2212:


Attachment: porterTestData.zip

The test data (100KB zipped), for the src/test/org/apache/lucene/analysis folder.

Note: on his website, Martin Porter says: "The Porter stemmer should be regarded 
as 'frozen', that is, strictly defined, and not amenable to further modification."

So the only benefit of downloading this data instead of adding it to svn would 
be to save space.
See http://tartarus.org/~martin/PorterStemmer/

> add a test for PorterStemFilter
> ---
>
> Key: LUCENE-2212
> URL: https://issues.apache.org/jira/browse/LUCENE-2212
> Project: Lucene - Java
> Issue Type: Test
> Components: Analysis
> Reporter: Robert Muir
> Assignee: Robert Muir
> Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2212.patch, porterTestData.zip
