This looks very nice, just a minor spec comment.. > On 9 Sep 2015, at 01:53, Stuart Marks <stuart.ma...@oracle.com> wrote: > > > > On 9/4/15 1:35 AM, Paul Sandoz wrote: >> Hi Stuart, >> >> This is looking very good. >> >> Just some comments on the tricky aspect related to late-binding of the >> Stream to the scanner state: >> >> 2652 * <p>This scanner's state should not be modified during execution >> of the returned >> 2653 * stream's pipeline. Subsequent calls to any methods on this >> scanner other than >> 2654 * {@link #close} and {@link #ioException} may return undefined >> results or may cause >> 2655 * undefined effects on the returned stream. The returned stream's >> source >> 2656 * {@code Spliterator} is <em>fail-fast</em> and will, on a >> best-effort >> 2657 * basis, throw a {@link java.util.ConcurrentModificationException} >> if such >> 2658 * modification is detected. >> >> “Subsequent” is a little vague here, does it mean before, during or after >> stream pipeline execution? > >> Before: >> >> Most methods on scanner may be called before stream pipeline execution, they >> just adjust the scanner state from which to start from. If close is called >> it should result in an ISE on pipeline execution. >> >> During: >> >> Either a CME on a best-effort basis if scanner state is modified by a >> behavioural parameter, or an ISE if close is called. >> >> After: >> >> The scanner is in an indeterminate state. For further operations it needs to >> be reset, and if the scanner was closed during execution an ISE will result >> on further operations. >> >> We should try and succinctly talk about all three cases in the specification. > > Mmm, yes, can't get anything vague past you. :-) "Subsequent" should mean > anytime after initiation of the pipeline execution. But it would be better to > be a bit more explicit. Note that effects go both ways; during pipeline > execution, calls to scanner methods might cause the stream to throw CME, and > stream operations might cause scanner methods to return unspecified results. > > I think the following covers all of the before, during, and after cases. > > << Scanning starts upon initiation of the terminal stream operation, using > the current state of this scanner. Subsequent calls to any methods on this > scanner other than {@link #close} and {@link #ioException} may return > undefined results or may cause undefined effects on the returned stream. The > returned stream's source {@code Spliterator} is <em>fail-fast</em> and will, > on a best-effort basis, throw a {@link > java.util.ConcurrentModificationException} if any such calls are detected > during pipeline execution. >> > > The reset() method will reset the delimiter, locale, and radix, but it can't > reset the position in the input, so the scanner effectively cannot be reused > after any stream operation has been initiated. > > I'll add the following: > > << After stream execution completes, this scanner is left in an indeterminate > state and cannot be reused. >>
I think this note is good, but the webrev/specdiff uses the term ‘pipeline execution’. I think ‘stream execution’ is less likely to cause confusion. -Chris. > The closed behavior is separate from CME, so I'll add the following to the > text that covers the closing behavior. > > << IllegalStateException is thrown if the scanner has been closed when this > method is called, or if this scanner is closed during pipeline execution. >> > > All of the above apply to both the tokens() method and the main findAll() > method. > >> You might need to double check FindSpliterator for the before and during >> cases as i don’t think findPatternInBuffer checks if the scanner is closed, >> i think it needs an ensureOpen call in tryAdvance. > > Good catch! I've added an ensureOpen() call here and I've also add tests to > cover this case. > > Updated webrev: > > http://cr.openjdk.java.net/~smarks/reviews/8072722/webrev.3/ > <http://cr.openjdk.java.net/~smarks/reviews/8072722/webrev.3/> > > Updated specdiff: > > http://cr.openjdk.java.net/~smarks/reviews/8072722/specdiff.3/overview-summary.html > > <http://cr.openjdk.java.net/~smarks/reviews/8072722/specdiff.3/overview-summary.html> > > s'marks > > > >> Paul. >> >> On 4 Sep 2015, at 08:17, Stuart Marks <stuart.ma...@oracle.com> wrote: >> >>> Please review this update to the Scanner enhancement I proposed a while >>> back. [1] >>> >>> I've updated based on some discussions with Paul Sandoz. The updates since >>> the previous posting are 1) coordination of spec wording from Matcher; 2) >>> addition of ConcurrentModificationException; 3) updating tests to use the >>> streams testing framework; 4) some javadoc cleanups. >>> >>> Bug: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8072722 >>> >>> Webrev: >>> >>> http://cr.openjdk.java.net/~smarks/reviews/8072722/webrev.2/ >>> >>> Specdiff: >>> >>> >>> http://cr.openjdk.java.net/~smarks/reviews/8072722/specdiff.2/overview-summary.html >>> >>> >>> For convenience, I've appended below the description from my earlier post. >>> [1] >>> >>> s'marks >>> >>> [1] >>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-August/034821.html >>> >>> >>> ------- >>> >>> >>> Scanner is essentially a regular expression matcher that matches over >>> arbitrary input (e.g., from a file) instead of a fixed string like Matcher. >>> Scanner will read and buffer additional input as necessary when looking for >>> matches. >>> >>> This change proposes to add two streams methods: >>> >>> 1. tokens(), which returns a stream of tokens delimited by the Scanner's >>> delimiter. Scanner's default delimiter is whitespace, so the following will >>> collect a list of whitespace-separated words from a file: >>> >>> try (Scanner sc = new Scanner(Paths.get(FILENAME))) { >>> List<String> words = sc.tokens().collect(toList()); >>> } >>> >>> 2. findAll(pattern), which returns a stream of match results generated by >>> searching the input for the given pattern (either a Pattern or a String). >>> For example, the following will extract from a file all words that are >>> surrounded by "_" characters, such as _foo_ : >>> >>> try (Scanner sc = new Scanner(Paths.get(FILENAME))) { >>> return sc.findAll("_([\\w]+)_") >>> .map(mr -> mr.group(1)) >>> .collect(toList()); >>> } >>> >>> Implementation notes. A Scanner is essentially already an iterator of >>> tokens, so tokens() pretty much just wraps "this" into a stream. The >>> findAll() methods are a wrapper around repeated calls to >>> findWithinHorizon(pattern, 0) with a bit of refactoring to avoid converting >>> the MatchResult to a String prematurely. >>> >>> The tests are pretty straightforward, with some additional cleanups, such >>> as using try-with-resources. >>> >>> -------