[jira] [Commented] (LUCENE-3894) Make BaseTokenStreamTestCase a bit more evil
[ https://issues.apache.org/jira/browse/LUCENE-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233996#comment-13233996 ]

Robert Muir commented on LUCENE-3894:
-------------------------------------

I think we have bugs in some tokenizers. It's currently very hard to reproduce, and we get no random seed :(

I think the issue is maxWordLength=20. That is not long enough to catch bugs in tokenizers; we should, for example, exceed whatever buffer size they use.

So I think we need to refactor this logic so that the multithreaded tests take maxWordLength, and ensure this parameter is always respected. This way, tests for things like tokenizers can bump this up to something like CharTokenizer.IO_BUFFER_SIZE*2, or whatever makes sense to them, to ensure we really test them well.

I don't like the fact that only my stupid trivial test (testHugeDoc) found the IO-311 bug. What if we didn't have that silly test?

I'll add a patch.

> Make BaseTokenStreamTestCase a bit more evil
> --------------------------------------------
>
>                 Key: LUCENE-3894
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3894
>             Project: Lucene - Java
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3894.patch, LUCENE-3894.patch, LUCENE-3894.patch
>
>
> Throw an exception from the Reader while tokenizing, stop after not consuming
> all tokens, sometimes spoon-feed chars from the reader...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
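
The testing idea in the comment above (words longer than the tokenizer's buffer, fed from a reader that hands out only a few chars at a time, with a known seed) could be sketched roughly as follows. This is only an illustration, not the patch attached to the issue and not Lucene's BaseTokenStreamTestCase: `LongWordSketch`, `SpoonFeedReader`, the toy `tokenize` method, and the `IO_BUFFER_SIZE` value of 4096 are all hypothetical names and numbers chosen for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class LongWordSketch {
    // Hypothetical stand-in for a tokenizer's internal buffer size (assumed value).
    static final int IO_BUFFER_SIZE = 4096;

    /** Wraps a Reader and returns at most three chars per read call. */
    static final class SpoonFeedReader extends Reader {
        private final Reader in;
        private final Random random;

        SpoonFeedReader(Reader in, Random random) {
            this.in = in;
            this.random = random;
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            int chunk = Math.min(len, 1 + random.nextInt(3));
            return in.read(cbuf, off, chunk);
        }

        @Override
        public void close() throws IOException {
            in.close();
        }
    }

    /** Builds a lowercase ASCII word of exactly the requested length. */
    static String randomWord(Random random, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }

    /** Toy whitespace tokenizer that refills a fixed-size buffer. */
    static List<String> tokenize(Reader reader) throws IOException {
        char[] buffer = new char[IO_BUFFER_SIZE];
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int n;
        while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = buffer[i];
                if (Character.isWhitespace(c)) {
                    if (current.length() > 0) {
                        tokens.add(current.toString());
                        current.setLength(0);
                    }
                } else {
                    current.append(c);
                }
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    /** Tokenizes doc through a spoon-feeding reader seeded for reproducibility. */
    static List<String> tokenizeSpoonFed(String doc, long seed) {
        Random random = new Random(seed);
        try (Reader reader = new SpoonFeedReader(new StringReader(doc), random)) {
            return tokenize(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A test built this way would generate a word of length `IO_BUFFER_SIZE * 2`, embed it between shorter words, and check that it comes back as a single intact token. Passing the seed explicitly means a failing run can be repeated with the same input.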
[jira] [Commented] (LUCENE-3894) Make BaseTokenStreamTestCase a bit more evil
[ https://issues.apache.org/jira/browse/LUCENE-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233945#comment-13233945 ]

Michael McCandless commented on LUCENE-3894:
--------------------------------------------

Thanks Rob!
[jira] [Commented] (LUCENE-3894) Make BaseTokenStreamTestCase a bit more evil
[ https://issues.apache.org/jira/browse/LUCENE-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233893#comment-13233893 ]

Robert Muir commented on LUCENE-3894:
-------------------------------------

That's it!

But this 'new read method' is not really new, it's from commons-io! We should open a bug over there...
[jira] [Commented] (LUCENE-3894) Make BaseTokenStreamTestCase a bit more evil
[ https://issues.apache.org/jira/browse/LUCENE-3894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13233886#comment-13233886 ]

Michael McCandless commented on LUCENE-3894:
--------------------------------------------

I think that new read method needs to use the incoming offset (i.e., pass location + offset, not location, as the 2nd arg to input.read)?

Does testHugeDoc then pass?
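
To make the suggested fix concrete, here is a minimal sketch of a read-until-full helper that honors the incoming offset. It is not copied from the patch or from commons-io; the class name `OffsetReadSketch` and the `demo` helper are made up for illustration, and only the variable names `location`, `offset`, and `input` follow the comment above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class OffsetReadSketch {
    /**
     * Reads up to length chars into buffer, starting at offset, looping
     * until the request is satisfied or the reader hits end of stream.
     * Returns the number of chars actually read.
     */
    static int read(Reader input, char[] buffer, int offset, int length) throws IOException {
        int remaining = length;
        while (remaining > 0) {
            int location = length - remaining; // chars already read by this call
            // Write at offset + location. Passing only location would ignore
            // the caller's offset and overwrite the start of the buffer.
            int count = input.read(buffer, offset + location, remaining);
            if (count == -1) {
                break; // end of stream
            }
            remaining -= count;
        }
        return length - remaining;
    }

    /** Reads "abcd" into the middle of a pre-filled buffer and reports the result. */
    static String demo() {
        char[] buffer = "0123456789".toCharArray();
        try {
            int n = read(new StringReader("abcd"), buffer, 3, 4);
            return n + ":" + new String(buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the offset applied, `demo()` leaves the first three chars of the buffer untouched and returns `4:012abcd789`. A version that passed `location` alone would write the same four chars at index 0 instead, which is the kind of corruption a caller only sees when it reads into a nonzero offset.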