[jira] [Commented] (LUCENE-3894) Make BaseTokenStreamTestCase a bit more evil

2012-03-20 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233996#comment-13233996
 ] 

Robert Muir commented on LUCENE-3894:
-

I think we have bugs in some tokenizers. It's currently very hard to reproduce 
them, and we get no random seed :(

I think the issue is maxWordLength=20. I don't think this is long enough to 
catch bugs in tokenizers;
for example, we should exceed whatever buffer size they use.

So I think we need to refactor this logic so that the multithreaded tests take 
maxWordLength, and ensure
this parameter is always respected.

This way, tests for things like tokenizers can bump this up to things like 
CharTokenizer.IO_BUFFER_SIZE*2
or whatever makes sense to them, to ensure we really test them well.
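
As a rough illustration of why the word length should exceed the buffer size, 
here is a minimal, self-contained sketch. It does not use Lucene: the class 
LongWordSketch, its IO_BUFFER_SIZE constant, and the toy tokenizer are all 
hypothetical stand-ins, not the real CharTokenizer or BaseTokenStreamTestCase. 
It generates words up to IO_BUFFER_SIZE*2 long from a fixed seed, so that some 
words are likely to straddle a buffer refill, and compares the result against 
a plain split.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

public class LongWordSketch {
    // Hypothetical stand-in for a tokenizer's internal buffer size.
    public static final int IO_BUFFER_SIZE = 64;

    // Builds space-separated random words, each 1..maxWordLength letters long.
    public static String randomText(Random random, int words, int maxWordLength) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words; i++) {
            if (i > 0) sb.append(' ');
            int len = 1 + random.nextInt(maxWordLength);
            for (int j = 0; j < len; j++) {
                sb.append((char) ('a' + random.nextInt(26)));
            }
        }
        return sb.toString();
    }

    // Toy whitespace tokenizer that reads through a fixed-size buffer and
    // carries a partial word across buffer refills.
    public static List<String> tokenize(Reader in) throws IOException {
        List<String> tokens = new ArrayList<>();
        char[] buffer = new char[IO_BUFFER_SIZE];
        StringBuilder current = new StringBuilder();
        int n;
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            for (int i = 0; i < n; i++) {
                if (buffer[i] == ' ') {
                    if (current.length() > 0) {
                        tokens.add(current.toString());
                        current.setLength(0);
                    }
                } else {
                    current.append(buffer[i]);
                }
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        long seed = 42L; // fixed seed, reported on failure so the run can be repeated
        String text = randomText(new Random(seed), 200, IO_BUFFER_SIZE * 2);
        List<String> expected = Arrays.asList(text.split(" "));
        List<String> actual = tokenize(new StringReader(text));
        if (!expected.equals(actual)) {
            throw new AssertionError("token mismatch, seed=" + seed);
        }
        System.out.println("ok, tokens=" + actual.size());
    }
}
```

With maxWordLength=20 and a 64-char buffer, no word could ever span more than 
one refill, which is the kind of gap described above.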

I don't like the fact that only my stupid trivial test (testHugeDoc) found the 
IO-311 bug. What if we
didn't have that silly test? 

I'll add a patch.

> Make BaseTokenStreamTestCase a bit more evil
> 
>
> Key: LUCENE-3894
> URL: https://issues.apache.org/jira/browse/LUCENE-3894
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3894.patch, LUCENE-3894.patch, LUCENE-3894.patch
>
>
> Throw an exception from the Reader while tokenizing, stop after not consuming 
> all tokens, sometimes spoon-feed chars from the reader...
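
The "spoon-feed chars from the reader" idea in the description can be sketched 
with a small wrapper. This is only an illustration under stated assumptions: 
SpoonFeedReader and drain are hypothetical names, not the classes used in the 
attached patches. The wrapper returns at most a few chars per read call, which 
is legal for a Reader but exposes consumers that assume a read always fills 
the requested length.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Random;

public class SpoonFeedReader extends FilterReader {
    private final Random random;

    public SpoonFeedReader(Reader in, Random random) {
        super(in);
        this.random = random;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len <= 0) {
            return in.read(cbuf, off, len);
        }
        // Hand back at most 1..3 chars per call, regardless of how many were requested.
        int limited = Math.min(len, 1 + random.nextInt(3));
        return in.read(cbuf, off, limited);
    }

    // A correct consumer: loops until end of stream instead of trusting one read.
    public static String drain(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[16];
        int n;
        while ((n = reader.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = "the quick brown fox";
        Reader reader = new SpoonFeedReader(new StringReader(text), new Random(1L));
        System.out.println(text.equals(drain(reader)));
    }
}
```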

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3894) Make BaseTokenStreamTestCase a bit more evil

2012-03-20 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233945#comment-13233945
 ] 

Michael McCandless commented on LUCENE-3894:


Thanks Rob!




[jira] [Commented] (LUCENE-3894) Make BaseTokenStreamTestCase a bit more evil

2012-03-20 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233893#comment-13233893
 ] 

Robert Muir commented on LUCENE-3894:
-

That's it! But this 'new read method' is not really new; it's from commons-io! 
We should open a bug over there...




[jira] [Commented] (LUCENE-3894) Make BaseTokenStreamTestCase a bit more evil

2012-03-20 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233886#comment-13233886
 ] 

Michael McCandless commented on LUCENE-3894:


I think that new read method needs to use the incoming offset (i.e., pass 
location + offset, not location, as the 2nd arg to input.read)?  Does 
testHugeDoc then pass?
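
For illustration, the fix suggested above might look like the following 
sketch. It is modeled only on the description in this thread, not copied from 
the commons-io source or the patch; ReadOffsetSketch and the variable names 
are assumptions. The point is that the write position passed to input.read 
must be offset + location, where location is the number of chars read so far.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReadOffsetSketch {
    // Reads up to length chars into buffer starting at offset, looping until
    // the request is satisfied or the reader is exhausted.
    // Returns the number of chars actually read.
    public static int read(Reader input, char[] buffer, int offset, int length)
            throws IOException {
        int remaining = length;
        while (remaining > 0) {
            int location = length - remaining; // chars read so far
            // The write position must include the caller's offset; passing only
            // location would overwrite the start of the buffer instead.
            int count = input.read(buffer, offset + location, remaining);
            if (count == -1) {
                break; // end of stream
            }
            remaining -= count;
        }
        return length - remaining;
    }

    public static void main(String[] args) throws IOException {
        char[] buffer = "........".toCharArray();
        int n = read(new StringReader("abcd"), buffer, 4, 4);
        // Prints "4 ....abcd"; ignoring the offset would give "abcd...." instead.
        System.out.println(n + " " + new String(buffer));
    }
}
```

A caller that always passes offset 0 would never notice the difference, which 
may be why only a test that refills a partially used buffer tripped over it.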
