Thanks Stuart!
The webrev has been updated based on your suggestions.
http://cr.openjdk.java.net/~sherman/8072582_8139414/webrev
-Sherman
On 6/14/16, 1:22 PM, Stuart Marks wrote:
Hi Sherman,
The fix looks good.
It would be helpful if the test for 8072582 generated the string
instead of using a literal that's more than 1K long. The exact length
is significant because Scanner's default buffer size is 1024, so the
delimiter has to straddle the buffer boundary.
The 8139414 test generates its string, which is nicer. In this case
the test is taken from the bug report, but in my opinion the addition
of the "boundary" variable (which is the string ";") makes things more
obscure. I'd suggest inlining it.
For both test cases it might be helpful to have a little utility that
appends n copies of a char to a StringBuilder.
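A minimal sketch of such a utility, together with a generated input
for the 8072582 case. The class name, the helper name append(), the
fill counts, and the delimiter pattern "(#,#)|(,)" are assumptions for
illustration and are not taken from the webrev. The only fact used
from this thread is the default buffer size of 1024.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class BoundaryInput {
    // Scanner's default buffer size, as noted above.
    static final int BUFFER_SIZE = 1024;

    // Hypothetical helper: appends n copies of c to sb and returns sb.
    public static StringBuilder append(StringBuilder sb, char c, int n) {
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb;
    }

    // Builds an input whose "#,#" delimiter starts two chars before the
    // boundary, so "#," falls inside the first 1024 chars and the
    // final '#' falls just outside them.
    public static String straddlingInput() {
        StringBuilder sb = new StringBuilder();
        append(sb, 'a', BUFFER_SIZE - 2).append("#,#");
        append(sb, 'b', 10);
        return sb.toString();
    }

    // Collects all tokens the Scanner produces for the given delimiter.
    public static List<String> tokens(String input, String delimiter) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiter);
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // With correct delimiting, no token should contain '#'.
        for (String token : tokens(straddlingInput(), "(#,#)|(,)")) {
            System.out.println(token.length() + " chars, contains '#': "
                    + token.contains("#"));
        }
    }
}
```

Generating the string this way keeps the significant offset visible in
the test source instead of hiding it inside a long literal.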
Thanks,
s'marks
On 6/8/16 1:57 PM, Xueming Shen wrote:
Hi,
Please help review the change for
JDK-8139414: java.util.Scanner hasNext() returns true, next() throws
NoSuchElementException
JDK-8072582: Scanner delimits incorrectly when delimiter spans a
buffer boundary
issue: https://bugs.openjdk.java.net/browse/JDK-8139414
https://bugs.openjdk.java.net/browse/JDK-8072582
webrev: http://cr.openjdk.java.net/~sherman/8072582_8139414/webrev
In both cases the delimiter pattern is a kind of "alternation" regex
construct that can match the existing characters at the end of the
internal buffer as delimiters, and can also extend to match more
delimiters if more input is available.
In JDK-8139414, hasNext() uses hasTokenInBuffer() to find the
delimiters "-;". It does not look beyond the buffer boundary to check
whether more characters follow, such as a "-", that could also be part
of the delimiters. So hasNext() returns true, on the assumption that
there is a token because there are more characters after "-;". But
getCompleteTokenInBuffer() (used by the next() implementation) does
have the logic to check beyond the boundary, even when the delimiter
pattern already has a match. It matches "-;-" as the delimiters and
then finds no "next" token (null) after that.
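The symptom can be checked with a small harness along these lines.
This is only a sketch: the class and method names, the input layout,
and the delimiter pattern "-;(-)?" are assumptions chosen to mirror
the "-;" versus "-;-" description above, not the test from the bug
report.

```java
import java.util.NoSuchElementException;
import java.util.Scanner;

public class HasNextConsistency {
    // Returns true if every hasNext() == true is followed by a
    // successful next(); returns false if next() throws
    // NoSuchElementException, which is the JDK-8139414 symptom.
    public static boolean hasNextAgreesWithNext(String input,
                                                String delimiter) {
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiter);
            while (scanner.hasNext()) {
                try {
                    scanner.next();
                } catch (NoSuchElementException e) {
                    return false;
                }
            }
        }
        return true;
    }

    // Hypothetical input: "-;" ends exactly at the 1024-char boundary
    // and a trailing "-" lies just beyond it, with nothing after.
    public static String boundaryInput() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1022; i++) {
            sb.append('a');
        }
        return sb.append("-;-").toString();
    }

    public static void main(String[] args) {
        System.out.println(
                hasNextAgreesWithNext(boundaryInput(), "-;(-)?"));
    }
}
```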
JDK-8072582 is similar. This time getCompleteTokenInBuffer() does not
use the "lookingAt() and beyond" logic for the second delimiters (the
ones that follow the token), which causes a problem when the delimiter
pattern produces a different match result (a different beginning
position) within the boundary than beyond it.
The proposed fix is to always check whether there is more input when
delimiters match at the internal buffer boundary.
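To illustrate why a match at the boundary cannot be trusted, here is a
small sketch using only java.util.regex. The pattern "(#,#)|(,)" and
the sample strings are assumptions for illustration. Matcher.hitEnd()
is one public way to detect that more input could change the result;
whether the patch uses exactly this check is also an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryMatch {
    // Assumed alternation-style delimiter pattern.
    static final Pattern DELIMS = Pattern.compile("(#,#)|(,)");

    // Start index of the first delimiter match in the buffered text,
    // or -1 if there is none.
    public static int delimiterStart(CharSequence buffered) {
        Matcher m = DELIMS.matcher(buffered);
        return m.find() ? m.start() : -1;
    }

    // True if the search ran into the end of the buffered text, which
    // means more input could change the match.
    public static boolean moreInputCouldChangeMatch(CharSequence buffered) {
        Matcher m = DELIMS.matcher(buffered);
        m.find();
        return m.hitEnd();
    }

    public static void main(String[] args) {
        String withinBoundary = "aa#,";    // buffer ends inside "#,#"
        String beyondBoundary = "aa#,#bb"; // same text with more input
        System.out.println(delimiterStart(withinBoundary)); // 3
        System.out.println(delimiterStart(beyondBoundary)); // 2
        System.out.println(moreInputCouldChangeMatch(withinBoundary)); // true
    }
}
```

With only the buffered text, the delimiter appears to start at index 3
and the token would wrongly include the '#'. Once more input is
available, the delimiter starts at index 2.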
Thanks,
Sherman