Hi all,

Please review this API enhancement that adds streams support to java.util.Scanner.

Scanner is essentially a regular expression matcher that operates on arbitrary input (e.g., from a file) rather than on a fixed string, as Matcher does. When looking for a match, Scanner reads and buffers additional input as necessary.
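
To make the comparison concrete, here is a small sketch (not part of the webrev) that searches for the same pattern two ways: with a Matcher over a String that is already in memory, and with a Scanner over a Reader, using the existing findWithinHorizon method. The class name ScannerVsMatcher and the sample input are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScannerVsMatcher {
    static final Pattern WORD = Pattern.compile("_(\\w+)_");

    // Matcher: the whole input must already be available as a CharSequence.
    static List<String> withMatcher(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    // Scanner: reads from any Readable and buffers more input as needed.
    // A horizon of 0 means the search is not bounded.
    static List<String> withScanner(Readable source) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(source)) {
            while (sc.findWithinHorizon(WORD, 0) != null) {
                out.add(sc.match().group(1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "see _foo_ and _bar_ here";
        System.out.println(withMatcher(text));                   // [foo, bar]
        System.out.println(withScanner(new StringReader(text))); // [foo, bar]
    }
}
```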

This change proposes to add two stream-returning methods:

1. tokens(), which returns a stream of the tokens separated by the Scanner's delimiter. The default delimiter is whitespace, so the following collects the whitespace-separated words of a file into a list:

    try (Scanner sc = new Scanner(Paths.get(FILENAME))) {
        List<String> words = sc.tokens().collect(toList());
    }

2. findAll(pattern), which returns a stream of MatchResult objects produced by searching the input for the given pattern (either a Pattern or a String). For example, the following extracts from a file all words that are surrounded by "_" characters, such as _foo_:

    try (Scanner sc = new Scanner(Paths.get(FILENAME))) {
        return sc.findAll("_([\\w]+)_")
                 .map(mr -> mr.group(1))
                 .collect(toList());
    }
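
For anyone who wants to try both methods without a file handy, here is a self-contained sketch that runs the same two pipelines over a String source. It assumes the methods land as tokens() and findAll(String) as described above, so it needs a JDK that includes this change; the class and helper names are illustrative only.

```java
import java.util.List;
import java.util.Scanner;
import java.util.stream.Collectors;

public class ScannerStreamsDemo {
    // Tokens separated by Scanner's default (whitespace) delimiter.
    static List<String> words(String text) {
        try (Scanner sc = new Scanner(text)) {
            return sc.tokens().collect(Collectors.toList());
        }
    }

    // Group 1 of every match, i.e. the text between the underscores.
    static List<String> underscored(String text) {
        try (Scanner sc = new Scanner(text)) {
            return sc.findAll("_(\\w+)_")
                     .map(mr -> mr.group(1))
                     .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) {
        System.out.println(words("  the quick\n brown\tfox ")); // [the, quick, brown, fox]
        System.out.println(underscored("a _foo_ b _bar_ c"));    // [foo, bar]
    }
}
```

In both helpers the terminal collect() runs inside the try block, so the stream is fully consumed before the Scanner is closed.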

Implementation notes. Scanner already implements Iterator<String>, so tokens() essentially just wraps "this" in a stream. The findAll() methods wrap repeated calls to findWithinHorizon(pattern, 0), with a bit of refactoring so that the MatchResult isn't converted to a String prematurely.
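
The two ideas in the notes above can be approximated from outside Scanner with public API only. This is a rough sketch under that assumption, not the code in the webrev: the class ScannerStreamSketch and its static helpers are hypothetical, and they use Scanner.match() to obtain each MatchResult rather than the internal refactoring described above.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Scanner;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public class ScannerStreamSketch {
    // Scanner implements Iterator<String>, so a token stream is that
    // iterator wrapped in a sequential, ordered stream of unknown size.
    static Stream<String> tokens(Scanner sc) {
        Spliterator<String> sp = Spliterators.spliteratorUnknownSize(
                sc, Spliterator.ORDERED | Spliterator.NONNULL);
        return StreamSupport.stream(sp, false);
    }

    // Repeated findWithinHorizon(pattern, 0); each element is the
    // MatchResult from match(), not the matched String.
    static Stream<MatchResult> findAll(Scanner sc, Pattern pattern) {
        Iterator<MatchResult> it = new Iterator<MatchResult>() {
            MatchResult pending; // next result, or null if not yet searched

            @Override
            public boolean hasNext() {
                if (pending == null && sc.findWithinHorizon(pattern, 0) != null) {
                    pending = sc.match();
                }
                return pending != null;
            }

            @Override
            public MatchResult next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                MatchResult r = pending;
                pending = null;
                return r;
            }
        };
        Spliterator<MatchResult> sp = Spliterators.spliteratorUnknownSize(
                it, Spliterator.ORDERED | Spliterator.NONNULL);
        return StreamSupport.stream(sp, false);
    }
}
```

For example, findAll(new Scanner("x _a_ y _b_"), Pattern.compile("_(\\w+)_")) yields match results whose group(1) values are "a" and "b". Note that this sketch does not special-case patterns that can match the empty string; with such a pattern, findWithinHorizon could keep returning an empty match at the same position.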

The tests are pretty straightforward, with some additional cleanups, such as using try-with-resources.

Bug:

        https://bugs.openjdk.java.net/browse/JDK-8072722

Webrev:

        http://cr.openjdk.java.net/~smarks/reviews/8072722/webrev.0/

Specdiff:

        http://cr.openjdk.java.net/~smarks/reviews/8072722/specdiff.0/overview-summary.html

Thanks,

s'marks
