This looks very nice, just a minor spec comment..

> On 9 Sep 2015, at 01:53, Stuart Marks <stuart.ma...@oracle.com> wrote:
> 
> 
> 
> On 9/4/15 1:35 AM, Paul Sandoz wrote:
>> Hi Stuart,
>> 
>> This is looking very good.
>> 
>> Just some comments on the tricky aspect related to late-binding of the 
>> Stream to the scanner state:
>> 
>> 2652      * <p>This scanner's state should not be modified during execution 
>> of the returned
>> 2653      * stream's pipeline. Subsequent calls to any methods on this 
>> scanner other than
>> 2654      * {@link #close} and {@link #ioException} may return undefined 
>> results or may cause
>> 2655      * undefined effects on the returned stream. The returned stream's 
>> source
>> 2656      * {@code Spliterator} is <em>fail-fast</em> and will, on a 
>> best-effort
>> 2657      * basis, throw a {@link java.util.ConcurrentModificationException} 
>> if such
>> 2658      * modification is detected.
>> 
>> “Subsequent” is a little vague here, does it mean before, during or after 
>> stream pipeline execution?
> 
>> Before:
>> 
>> Most methods on scanner may be called before stream pipeline execution, they 
>> just adjust the scanner state from which to start from. If close is called 
>> it should result in an ISE on pipeline execution.
>> 
>> During:
>> 
>> Either a CME on a best-effort basis if scanner state is modified by a 
>> behavioural parameter, or an ISE if close is called.
>> 
>> After:
>> 
>> The scanner is in an indeterminate state. For further operations it needs to 
>> be reset, and if the scanner was closed during execution an ISE will result 
>> on further operations.
>> 
>> We should try and succinctly talk about all three cases in the specification.
> 
> Mmm, yes, can't get anything vague past you. :-) "Subsequent" should mean 
> anytime after initiation of the pipeline execution. But it would be better to 
> be a bit more explicit. Note that effects go both ways; during pipeline 
> execution, calls to scanner methods might cause the stream to throw CME, and 
> stream operations might cause scanner methods to return unspecified results.
> 
> I think the following covers all of the before, during, and after cases.
> 
> << Scanning starts upon initiation of the terminal stream operation, using 
> the current state of this scanner. Subsequent calls to any methods on this 
> scanner other than {@link #close} and {@link #ioException} may return 
> undefined results or may cause undefined effects on the returned stream. The 
> returned stream's source {@code Spliterator} is <em>fail-fast</em> and will, 
> on a best-effort basis, throw a {@link 
> java.util.ConcurrentModificationException} if any such calls are detected 
> during pipeline execution. >>
> 
> The reset() method will reset the delimiter, locale, and radix, but it can't 
> reset the position in the input, so the scanner effectively cannot be reused 
> after any stream operation has been initiated.
> 
> I'll add the following:
> 
> << After stream execution completes, this scanner is left in an indeterminate 
> state and cannot be reused. >>

I think this note is good, but the webrev/specdiff uses the term ‘pipeline 
execution’. I think ‘stream execution’ is less likely to cause confusion.

-Chris.


> The closed behavior is separate from CME, so I'll add the following to the 
> text that covers the closing behavior.
> 
> << IllegalStateException is thrown if the scanner has been closed when this 
> method is called, or if this scanner is closed during pipeline execution. >>
> 
> All of the above apply to both the tokens() method and the main findAll() 
> method.
> 
>> You might need to double check FindSpliterator for the before and during 
>> cases as i don’t think findPatternInBuffer checks if the scanner is closed, 
>> i think it needs an ensureOpen call in tryAdvance.
> 
> Good catch! I've added an ensureOpen() call here and I've also add tests to 
> cover this case.
> 
> Updated webrev:
> 
>  http://cr.openjdk.java.net/~smarks/reviews/8072722/webrev.3/ 
> <http://cr.openjdk.java.net/~smarks/reviews/8072722/webrev.3/>
> 
> Updated specdiff:
> 
> http://cr.openjdk.java.net/~smarks/reviews/8072722/specdiff.3/overview-summary.html
>  
> <http://cr.openjdk.java.net/~smarks/reviews/8072722/specdiff.3/overview-summary.html>
> 
> s'marks
> 
> 
> 
>> Paul.
>> 
>> On 4 Sep 2015, at 08:17, Stuart Marks <stuart.ma...@oracle.com> wrote:
>> 
>>> Please review this update to the Scanner enhancement I proposed a while 
>>> back. [1]
>>> 
>>> I've updated based on some discussions with Paul Sandoz. The updates since 
>>> the previous posting are 1) coordination of spec wording from Matcher; 2) 
>>> addition of ConcurrentModificationException; 3) updating tests to use the 
>>> streams testing framework; 4) some javadoc cleanups.
>>> 
>>> Bug:
>>> 
>>>     https://bugs.openjdk.java.net/browse/JDK-8072722
>>> 
>>> Webrev:
>>> 
>>>     http://cr.openjdk.java.net/~smarks/reviews/8072722/webrev.2/
>>> 
>>> Specdiff:
>>> 
>>>     
>>> http://cr.openjdk.java.net/~smarks/reviews/8072722/specdiff.2/overview-summary.html
>>> 
>>> 
>>> For convenience, I've appended below the description from my earlier post. 
>>> [1]
>>> 
>>> s'marks
>>> 
>>> [1] 
>>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-August/034821.html
>>> 
>>> 
>>> -------
>>> 
>>> 
>>> Scanner is essentially a regular expression matcher that matches over 
>>> arbitrary input (e.g., from a file) instead of a fixed string like Matcher. 
>>> Scanner will read and buffer additional input as necessary when looking for 
>>> matches.
>>> 
>>> This change proposes to add two streams methods:
>>> 
>>> 1. tokens(), which returns a stream of tokens delimited by the Scanner's 
>>> delimiter. Scanner's default delimiter is whitespace, so the following will 
>>> collect a list of whitespace-separated words from a file:
>>> 
>>>    try (Scanner sc = new Scanner(Paths.get(FILENAME))) {
>>>        List<String> words = sc.tokens().collect(toList());
>>>    }
>>> 
>>> 2. findAll(pattern), which returns a stream of match results generated by 
>>> searching the input for the given pattern (either a Pattern or a String). 
>>> For example, the following will extract from a file all words that are 
>>> surrounded by "_" characters, such as _foo_ :
>>> 
>>>    try (Scanner sc = new Scanner(Paths.get(FILENAME))) {
>>>        return sc.findAll("_([\\w]+)_")
>>>                 .map(mr -> mr.group(1))
>>>                 .collect(toList());
>>>    }
>>> 
>>> Implementation notes. A Scanner is essentially already an iterator of 
>>> tokens, so tokens() pretty much just wraps "this" into a stream. The 
>>> findAll() methods are a wrapper around repeated calls to 
>>> findWithinHorizon(pattern, 0) with a bit of refactoring to avoid converting 
>>> the MatchResult to a String prematurely.
>>> 
>>> The tests are pretty straightforward, with some additional cleanups, such 
>>> as using try-with-resources.
>>> 
>>> -------

Reply via email to