Re: Raw String Literal Library Support

2018-03-21 Thread Jim Laskey
One more change set.

trimIndent -> stripIndent
trimMarkers -> stripMarkers



> On Mar 20, 2018, at 10:35 AM, Jim Laskey  wrote:
> 
> Summary.
> 
> A. Line support.
> 
> - Supporting a mix of line terminators `\n|\r\n|\r` is already a well 
> established pattern in language parsers, in the JDK (ex. see  
> java.nio.file.FileChannelLinesSpliterator) and RegEx (ex. see `\R`). The 
> performance difference between checking one vs the three is negligible.
> 
> - Yes, Stream<String> stream = 
> Pattern.compile("\n|\r\n|\r").splitAsStream(string); is very useful 
> (Spliterators rule), but it is cumbersome for what is expected to be a 
> common use case. Only so-so streamy. :-)
> 
> - BufferedReader.lines() vs. String.lines() is a tricky discussion. It comes 
> down to whether the new line is a terminator or a separator.  In the i/o 
> case, it seems terminator is the right answer. A well-formed text file will 
> have a new line at the end of every line.  However, I think you’ll find when 
> people work with multi-line strings they think of new line as a separator. 
> Hence, the common use of split("\n") and "".split("\n").length == 1. 
> Indentation, the position of the closing delimiter and margin trimming make 
> that last line very fluid.
> 
> What clinches the deal is that  
> string.lines().collect(joining("\n")).equals(string). I’ll ensure both 
> versions of lines() have the difference well javadocumented.
> 
> - The current Spliterator implementation makes 
> String.lines().toArray(String[]::new) an order of magnitude faster than 
> split(`\n|\r\n|\r`). That’s why I implemented it for margin management. 
> Faster still if no collection/array is constructed.
> 
> BTW: split(`\R`) is 2x-3x faster than split(`\n|\r\n|\r`). Nice.
> 
> B. Additions to basic trim methods.
> 
> - Revamped to become strip, stripLeading, stripTrailing using 
> Character.isWhitespace(codepoint) as the test (optimized using ch == ' ' || 
> ch == '\t' || Character.isWhitespace(ch)).
> 
> - No strong feeling about it, but String.trim() could be recommended for 
> deprecation.
> 
> C. Margin management.
> 
> - String.trimMarkers() as a default to String.trimMarkers("|", "|") is 
> reasonable.  Will put it in the CSR for broader discussion.
> 
> - Re use of patterns. I think the Stream<String> lines() method will make it 
> easy enough to create custom trim margin lambdas.
> 
> D. Escape management.
> 
> - Good
> 
> Cheers,
> 
> — Jim
> 
> 
> 
> 
>> On Mar 13, 2018, at 10:47 AM, Jim Laskey  wrote:
>> 
>> With the announcement of JEP 326 Raw String Literals, we would like to open 
>> up a discussion with regards to RSL library support. Below are several 
>> implemented String methods that are believed to be appropriate. Please 
>> comment on those mentioned below including recommending alternate names or 
>> signatures. Additional methods can be considered if warranted, but as 
>> always, the bar for inclusion in String is high.
>> 
>> You should keep a couple things in mind when reviewing these methods.
>> 
>> Methods should be applicable to all strings, not just Raw String Literals.
>> 
>> The number of additional methods should be minimized, not adding every 
>> possible method.
>> 
>> Don't put any emphasis on performance. That is a separate discussion.
>> 
>> Cheers,
>> 
>> -- Jim
>> 
>> A. Line support.
>> 
>> public Stream<String> lines()
>> Returns a stream of substrings extracted from this string partitioned by 
>> line terminators. Internally, the stream is implemented using a 
>> Spliterator that extracts one line at a time. The line terminators recognized 
>> are \n, \r\n and \r. This method provides versatility for the developer 
>> working with multi-line strings.
>>Example:
>> 
>>   String string = "abc\ndef\nghi";
>>   Stream<String> stream = string.lines();
>>   List<String> list = stream.collect(Collectors.toList());
>> 
>>Result:
>> 
>>[abc, def, ghi]
>> 
>> 
>>Example:
>> 
>>   String string = "abc\ndef\nghi";
>>   String[] array = string.lines().toArray(String[]::new);
>> 
>>Result:
>> 
>>[Ljava.lang.String;@33e5ccce // [abc, def, ghi]
>> 
>> 
>>Example:
>> 
>>   String string = "abc\ndef\r\nghi\rjkl";
>>   String platformString =
>>   string.lines().collect(joining(System.lineSeparator()));
>> 
>>Result:
>> 
>>abc
>>def
>>ghi
>>jkl
>> 
>> 
>>Example:
>> 
>>   String string = " abc  \n   def  \n ghi   ";
>>   String trimmedString =
>>string.lines().map(s -> s.trim()).collect(joining("\n"));
>> 
>>Result:
>> 
>>abc
>>def
>>ghi
>> 
>> 
>>Example:
>> 
>>   String table = `First Name  Surname  Phone
>>                   Al          Albert   555-
>>                   Bob         Roberts  555-
>>                   Cal         Calvin   555-
>>                  `;
>> 
>>   // Extract headers
>>   String firstLine = 

Re: Raw String Literal Library Support

2018-03-20 Thread Jim Laskey
Summary.

A. Line support.

- Supporting a mix of line terminators `\n|\r\n|\r` is already a well 
established pattern in language parsers, in the JDK (ex. see  
java.nio.file.FileChannelLinesSpliterator) and RegEx (ex. see `\R`). The 
performance difference between checking one vs the three is negligible.

- Yes, Stream<String> stream = 
Pattern.compile("\n|\r\n|\r").splitAsStream(string); is very useful 
(Spliterators rule), but it is cumbersome for what is expected to be a common 
use case. Only so-so streamy. :-)

- BufferedReader.lines() vs. String.lines() is a tricky discussion. It comes down 
to whether the new line is a terminator or a separator.  In the i/o case, it 
seems terminator is the right answer. A well-formed text file will have a new 
line at the end of every line.  However, I think you’ll find when people work 
with multi-line strings they think of new line as a separator. Hence, the 
common use of split("\n") and "".split("\n").length == 1. Indentation, the 
position of the closing delimiter and margin trimming make that last line very 
fluid.

What clinches the deal is that  
string.lines().collect(joining("\n")).equals(string). I’ll ensure both versions 
of lines() have the difference well javadocumented.

- The current Spliterator implementation makes 
String.lines().toArray(String[]::new) an order of magnitude faster than 
split(`\n|\r\n|\r`). That’s why I implemented it for margin management. Faster 
still if no collection/array is constructed.

BTW: split(`\R`) is 2x-3x faster than split(`\n|\r\n|\r`). Nice.
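
For reference, a minimal sketch of the two split patterns mentioned above, using only Pattern.splitAsStream from the existing JDK. The class and method names (LineSplitDemo, splitWith) are made up for illustration, and the sketch says nothing about relative speed. Note that `\R` also matches a few other line breaks (such as U+2028), so the two patterns agree on this input but are not identical in general.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class LineSplitDemo {
    // The explicit alternation quoted in the thread: LF, CRLF or CR.
    static final Pattern ALTERNATION = Pattern.compile("\n|\r\n|\r");
    // The regex linebreak matcher; it treats CRLF as a single break.
    static final Pattern LINEBREAK = Pattern.compile("\\R");

    // Hypothetical helper: split the text with the given pattern into a list.
    static List<String> splitWith(Pattern pattern, String text) {
        return pattern.splitAsStream(text).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "abc\ndef\r\nghi\rjkl";
        System.out.println(splitWith(ALTERNATION, text));
        System.out.println(splitWith(LINEBREAK, text));
    }
}
```

Both calls should report the same four lines for this mixed-terminator input.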

B. Additions to basic trim methods.

- Revamped to become strip, stripLeading, stripTrailing using 
Character.isWhitespace(codepoint) as the test (optimized using ch == ' ' || ch 
== '\t' || Character.isWhitespace(ch)).

- No strong feeling about it, but String.trim() could be recommended for 
deprecation.
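
To make the difference between the two whitespace tests concrete, here is a small sketch. It assumes a JDK where strip, stripLeading and stripTrailing exist on String (they shipped in JDK 11); the class name and the variants helper are invented for this illustration.

```java
public class StripVersusTrim {
    // Hypothetical helper: returns {trim, strip, stripLeading, stripTrailing} of the input.
    static String[] variants(String s) {
        return new String[] { s.trim(), s.strip(), s.stripLeading(), s.stripTrailing() };
    }

    public static void main(String[] args) {
        String emSpaced = "\u2003abc\u2003"; // EM SPACE (U+2003) on both sides
        String controls = "\u0001abc\u0001"; // control character U+0001 on both sides
        for (String s : new String[] { emSpaced, controls }) {
            // Print the length of each variant so invisible characters show up.
            for (String v : variants(s)) {
                System.out.print(v.length() + " ");
            }
            System.out.println();
        }
    }
}
```

trim() leaves the EM SPACE alone because it is above U+0020, while strip() removes it; the reverse holds for U+0001, which trim() removes and strip() keeps.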

C. Margin management.

- String.trimMarkers() as a default to String.trimMarkers("|", "|") is 
reasonable.  Will put it in the CSR for broader discussion.

- Re use of patterns. I think the Stream<String> lines() method will make it 
easy enough to create custom trim margin lambdas.
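
As an illustration of a custom margin lambda built on lines(), here is a rough sketch. It assumes a JDK where String.lines() and stripLeading() are available (JDK 11 or later); stripMargin is a hypothetical helper, not one of the proposed methods, and it only handles a leading marker.

```java
import java.util.stream.Collectors;

public class MarginDemo {
    // Hypothetical helper: for each line, skip leading whitespace and, if the
    // marker follows, drop the marker too. Lines without the marker are kept as-is.
    static String stripMargin(String text, String marker) {
        return text.lines()
                   .map(line -> {
                       String stripped = line.stripLeading();
                       return stripped.startsWith(marker)
                               ? stripped.substring(marker.length())
                               : line;
                   })
                   .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        String text = "    |abc\n    |  def\n    |ghi";
        System.out.println(stripMargin(text, "|"));
    }
}
```

Whitespace after the marker is preserved, so relative indentation inside the block survives.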

D. Escape management.

- Good

Cheers,

— Jim




> On Mar 13, 2018, at 10:47 AM, Jim Laskey  wrote:
> 
> With the announcement of JEP 326 Raw String Literals, we would like to open 
> up a discussion with regards to RSL library support. Below are several 
> implemented String methods that are believed to be appropriate. Please 
> comment on those mentioned below including recommending alternate names or 
> signatures. Additional methods can be considered if warranted, but as always, 
> the bar for inclusion in String is high.
> 
> You should keep a couple things in mind when reviewing these methods.
> 
> Methods should be applicable to all strings, not just Raw String Literals.
> 
> The number of additional methods should be minimized, not adding every 
> possible method.
> 
> Don't put any emphasis on performance. That is a separate discussion.
> 
> Cheers,
> 
> -- Jim
> 
> A. Line support.
> 
> public Stream<String> lines()
> Returns a stream of substrings extracted from this string partitioned by line 
> terminators. Internally, the stream is implemented using a Spliterator that 
> extracts one line at a time. The line terminators recognized are \n, \r\n and 
> \r. This method provides versatility for the developer working with 
> multi-line strings.
> Example:
> 
>    String string = "abc\ndef\nghi";
>    Stream<String> stream = string.lines();
>    List<String> list = stream.collect(Collectors.toList());
> 
> Result:
> 
> [abc, def, ghi]
> 
> 
> Example:
> 
>String string = "abc\ndef\nghi";
>String[] array = string.lines().toArray(String[]::new);
> 
> Result:
> 
> [Ljava.lang.String;@33e5ccce // [abc, def, ghi]
> 
> 
> Example:
> 
>String string = "abc\ndef\r\nghi\rjkl";
>String platformString =
>string.lines().collect(joining(System.lineSeparator()));
> 
> Result:
> 
> abc
> def
> ghi
> jkl
> 
> 
> Example:
> 
>String string = " abc  \n   def  \n ghi   ";
>String trimmedString =
> string.lines().map(s -> s.trim()).collect(joining("\n"));
> 
> Result:
> 
> abc
> def
> ghi
> 
> 
> Example:
> 
>    String table = `First Name  Surname  Phone
>                    Al          Albert   555-
>                    Bob         Roberts  555-
>                    Cal         Calvin   555-
>                   `;
> 
>    // Extract headers
>    String firstLine = table.lines().findFirst().orElse("");
>    List<String> headings = List.of(firstLine.trim().split(`\s{2,}`));
> 
>    // Build stream of maps
>    Stream<Map<String, String>> stream =
>table.lines().skip(1)
> .map(line -> line.trim())
> .filter(line -> !line.isEmpty())
>

Re: Raw String Literal Library Support

2018-03-16 Thread Michael Hixson
On Fri, Mar 16, 2018 at 8:58 AM, Stephen Colebourne
 wrote:
> On 14 March 2018 at 23:05, Michael Hixson  wrote:
>> For example, does ``.lines() produce an empty stream?
>
> I believe `` is a compile error.
> (A mistake IMO, but necessary if you have unlimited delimiters)

Ah, oops.  I meant to ask about calling lines() on the empty string.  "".lines()

Looking at the implementation from a week ago [1], I think it
disagrees with BufferedReader about what lines are - specifically when
it comes to the empty string and any string ending with a line
separator.  That seems not good.  But that behavior isn't specified in
Jim's description or examples so I'm wondering if that's intentional.
(Or I could be reading the code wrong).
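
One way to check this kind of question is to run both APIs over the edge cases side by side. The sketch below assumes a JDK where String.lines() exists (JDK 11 or later), so it shows the behaviour as shipped, which is not necessarily what the prototype linked below did at the time. The class and method names are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LinesAgreement {
    // Lines as seen by String.lines().
    static List<String> viaString(String s) {
        return s.lines().collect(Collectors.toList());
    }

    // Lines as seen by BufferedReader over the same text.
    static List<String> viaReader(String s) {
        try (BufferedReader reader = new BufferedReader(new StringReader(s))) {
            return reader.lines().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The separator view: what split("\n") reports for the same text.
    static List<String> viaSplit(String s) {
        return Arrays.asList(s.split("\n"));
    }

    public static void main(String[] args) {
        // The empty string and strings ending in a line separator are the cases in question.
        for (String s : List.of("", "abc", "abc\n", "abc\n\n", "\n")) {
            System.out.println(viaString(s) + " " + viaReader(s) + " " + viaSplit(s));
        }
    }
}
```

On such a JDK the first two columns should match for every input, while split("\n") differs for the empty string and for trailing newlines.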

[1] 
http://hg.openjdk.java.net/amber/amber/file/5a2e574f43fb/src/java.base/share/classes/java/lang/StringLatin1.java#l560

-Michael

>
> Stephen


Re: Raw String Literal Library Support

2018-03-16 Thread Stephen Colebourne
On 14 March 2018 at 23:05, Michael Hixson  wrote:
> For example, does ``.lines() produce an empty stream?

I believe `` is a compile error.
(A mistake IMO, but necessary if you have unlimited delimiters)

Stephen


Re: Raw String Literal Library Support

2018-03-15 Thread Alan Bateman

On 13/03/2018 13:47, Jim Laskey wrote:

:

> We are recommending that trimLeft and trimRight use UWS, leave trim alone to 
> avoid breaking the world and then possibly introduce trimWhitespace that uses 
> UWS.

Right, it would be too risky to change trim(), as it goes all the way back to 
JDK 1.0.


If you introduce a method named "trimWhitespace" then I think it would 
be surprising if it were not aligned with "isWhitespace". I also share 
Stuart's concerns about the handling of control characters in legacy 
trim (or TWS in the proposal). Can you expand a bit on why UWS was 
recommended?


-Alan



Re: Raw String Literal Library Support

2018-03-15 Thread Stephen Colebourne
On 14 March 2018 at 23:55, Stuart Marks  wrote:
> So, how about we define trimLeft, trimRight, and trimWhitespace
> all in terms of Character.isWhitespace?

This seems like a reasonable approach. I'd expect tab to be trimmed for example.

Commons-Lang is a good source to consider when looking at naming.
https://commons.apache.org/proper/commons-lang/javadocs/api-release/org/apache/commons/lang3/StringUtils.html

Rather than re-using "trim", commons-lang uses "strip". So you have
strip(), stripLeft() and stripRight().

If you want to stick with "trim", how about trimAll() instead of
trimWhitespace(). Shorter and more obvious I think. Otherwise, I think
you'd need trimWhitespaceLeft() and trimWhitespaceRight() to match.

In line with these whitespace methods, I'd like to see isBlank()
added, could also be named isWhitespace(). The existing isEmpty()
method is fine, but a lot of the time user input validation routines
want to base their decision on "empty once trimmed". str.isBlank()
would be the same as str.trimAll().isEmpty() but without the object
creation.
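
A minimal sketch of what such a check could look like, assuming it is defined in terms of Character.isWhitespace as discussed in this thread. The static isBlank helper here is hypothetical and written for illustration; it walks code points directly, so no trimmed copy of the string is created.

```java
public class BlankCheck {
    // Hypothetical helper: true when the string is empty or every code point
    // is whitespace according to Character.isWhitespace(int).
    static boolean isBlank(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isWhitespace(cp)) {
                return false;
            }
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isBlank(""));
        System.out.println(isBlank(" \t\n"));
        System.out.println(isBlank(" a "));
        System.out.println(isBlank("\u00A0")); // NO-BREAK SPACE is not whitespace under this test
    }
}
```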

Finally, a constant for EMPTY has always been missing from
java.lang.String. It would be great to add it.

Stephen


Re: Raw String Literal Library Support

2018-03-14 Thread Stuart Marks

Hi Jim,

Some comments (really, mainly just quibbles) about string trimming. First,

* String.trim trims characters <= \u0020 from each end of a string. I agree that 
String.trim should be preserved unchanged for compatibility purposes.


* The trimLeft, trimRight, and trimWhitespace (which trims both ends) methods 
make sense. These three should all use the same definition of whitespace.


* My issue concerns what definition of whitespace they use.

What you outlined in the quoted section below doesn't line up with the 
definitions in the API spec.


The existing methods Character.isSpaceChar(codepoint) and 
Character.isWhitespace(codepoint) are well-defined but somewhat different 
notions of whitespace.


**

The Character.isSpaceChar method returns true if the code point is a member of 
any of these categories:


SPACE_SEPARATOR
LINE_SEPARATOR
PARAGRAPH_SEPARATOR

In JDK 10, which conforms to Unicode 8.0.0, the SPACE_SEPARATOR category 
includes the following characters:


U+0020 SPACE
U+00A0 NO-BREAK SPACE
U+1680 OGHAM SPACE MARK
U+2000 EN QUAD
U+2001 EM QUAD
U+2002 EN SPACE
U+2003 EM SPACE
U+2004 THREE-PER-EM SPACE
U+2005 FOUR-PER-EM SPACE
U+2006 SIX-PER-EM SPACE
U+2007 FIGURE SPACE
U+2008 PUNCTUATION SPACE
U+2009 THIN SPACE
U+200A HAIR SPACE
U+202F NARROW NO-BREAK SPACE
U+205F MEDIUM MATHEMATICAL SPACE
U+3000 IDEOGRAPHIC SPACE

The LINE_SEPARATOR category contains this one character:

U+2028 LINE SEPARATOR

And the PARAGRAPH_SEPARATOR category contains just this one character:

U+2029 PARAGRAPH SEPARATOR

**

Meanwhile, the Character.isWhitespace method returns true if the code point is 
in one of these categories:


SPACE_SEPARATOR, excluding
    U+00A0 NO-BREAK SPACE
    U+2007 FIGURE SPACE
    U+202F NARROW NO-BREAK SPACE
LINE_SEPARATOR
PARAGRAPH_SEPARATOR

or if it is one of these characters:

U+0009 HORIZONTAL TABULATION.
U+000A LINE FEED.
U+000B VERTICAL TABULATION.
U+000C FORM FEED.
U+000D CARRIAGE RETURN.
U+001C FILE SEPARATOR.
U+001D GROUP SEPARATOR.
U+001E RECORD SEPARATOR.
U+001F UNIT SEPARATOR.

**

You mentioned several different definitions of whitespace:

 - trim's whitespace (TWS): chars <= U+0020
 - Character's whitespace (CWS): I'm not sure what you meant by this
 - union whitespace (UWS): union of TWS and CWS

I don't think we should be creating a new definition of whitespace, such as UWS, 
if at all possible. TWS is strange in that it contains a bunch of control 
characters that aren't necessarily whitespace, and it omits Unicode whitespace. 
Character.isSpaceChar includes various no-break spaces, which I don't think 
should be trimmed away, and it also omits various ASCII white space characters, 
which I think most programmers would find surprising.


Finally, Character.isWhitespace includes the ASCII whitespace characters and 
Unicode space separators, but excludes no-break spaces. This makes the most 
sense to me. So, how about we define trimLeft, trimRight, and trimWhitespace all 
in terms of Character.isWhitespace?
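
The three definitions can be compared directly for a handful of code points. This is only a sketch: the class name, the trimWs helper (standing in for trim's "<= U+0020" test) and the classify helper are invented for illustration, while Character.isWhitespace and Character.isSpaceChar are the existing JDK methods.

```java
public class WhitespaceKinds {
    // Hypothetical stand-in for the test String.trim() applies to each char.
    static boolean trimWs(int cp) {
        return cp <= ' ';
    }

    // Hypothetical helper: one line per code point showing all three answers.
    static String classify(int cp) {
        return String.format("U+%04X trim=%b isWhitespace=%b isSpaceChar=%b",
                cp, trimWs(cp), Character.isWhitespace(cp), Character.isSpaceChar(cp));
    }

    public static void main(String[] args) {
        // A control character, tab, space, no-break space, em space, line separator.
        int[] samples = { 0x0001, 0x0009, 0x0020, 0x00A0, 0x2003, 0x2028 };
        for (int cp : samples) {
            System.out.println(classify(cp));
        }
    }
}
```

Only U+0020 is whitespace under all three tests; tab fails isSpaceChar, the no-break space fails both trim and isWhitespace, and U+0001 passes trim alone.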


s'marks




On 3/13/18 6:47 AM, Jim Laskey wrote:

B. Additions to basic trim methods. In addition to margin methods trimIndent 
and trimMarkers described below in Section C, it would be worth introducing 
trimLeft and trimRight to augment the longstanding trim method. A key question 
is how trimLeft and trimRight should detect whitespace, because different 
definitions of whitespace exist in the library.

trim itself uses the simple test less than or equal to the space character, a 
fast test but not Unicode friendly.

Character.isWhitespace(codepoint) returns true if codepoint is one of the 
following:

SPACE_SEPARATOR.
LINE_SEPARATOR.
PARAGRAPH_SEPARATOR.
'\t', U+0009 HORIZONTAL TABULATION.
'\n', U+000A LINE FEED.
'\u000B', U+000B VERTICAL TABULATION.
'\f', U+000C FORM FEED.
'\r', U+000D CARRIAGE RETURN.
'\u001C', U+001C FILE SEPARATOR.
'\u001D', U+001D GROUP SEPARATOR.
'\u001E', U+001E RECORD SEPARATOR.
'\u001F', U+001F UNIT SEPARATOR.
' ',  U+0020 SPACE.
(Note: that non-breaking space (\u00A0) is excluded)

Character.isSpaceChar(codepoint) returns true if codepoint is one of the following:

SPACE_SEPARATOR.
LINE_SEPARATOR.
PARAGRAPH_SEPARATOR.
' ',  U+0020 SPACE.
'\u00A0', U+00A0 NON-BREAKING SPACE.
That sets up several kinds of whitespace; trim's whitespace (TWS), Character 
whitespace (CWS) and the union of the two (UWS). TWS is a fast test. CWS is a 
slow test. UWS is fast for Latin1 and slow-ish for UTF-16.

We are recommending that trimLeft and trimRight use UWS, leave trim alone to 
avoid breaking the world and then possibly introduce trimWhitespace that uses 
UWS.

public String trim()
Removes characters less than or equal to space from the beginning and end of the 
string. No change, except spec clarification and 

Re: Raw String Literal Library Support

2018-03-14 Thread Michael Hixson
Hi Jim,

Does string.lines() agree with new BufferedReader(new
StringReader(string)).lines() on what the lines are for all inputs?
For example, does ``.lines() produce an empty stream?

-Michael

On Tue, Mar 13, 2018 at 6:47 AM, Jim Laskey  wrote:
> With the announcement of JEP 326 Raw String Literals, we would like to open 
> up a discussion with regards to RSL library support. Below are several 
> implemented String methods that are believed to be appropriate. Please 
> comment on those mentioned below including recommending alternate names or 
> signatures. Additional methods can be considered if warranted, but as always, 
> the bar for inclusion in String is high.
>
> You should keep a couple things in mind when reviewing these methods.
>
> Methods should be applicable to all strings, not just Raw String Literals.
>
> The number of additional methods should be minimized, not adding every 
> possible method.
>
> Don't put any emphasis on performance. That is a separate discussion.
>
> Cheers,
>
> -- Jim
>
> A. Line support.
>
> public Stream<String> lines()
> Returns a stream of substrings extracted from this string partitioned by line 
> terminators. Internally, the stream is implemented using a Spliterator that 
> extracts one line at a time. The line terminators recognized are \n, \r\n and 
> \r. This method provides versatility for the developer working with 
> multi-line strings.
>  Example:
>
>     String string = "abc\ndef\nghi";
>     Stream<String> stream = string.lines();
>     List<String> list = stream.collect(Collectors.toList());
>
>  Result:
>
>  [abc, def, ghi]
>
>
>  Example:
>
> String string = "abc\ndef\nghi";
> String[] array = string.lines().toArray(String[]::new);
>
>  Result:
>
>  [Ljava.lang.String;@33e5ccce // [abc, def, ghi]
>
>
>  Example:
>
> String string = "abc\ndef\r\nghi\rjkl";
> String platformString =
> string.lines().collect(joining(System.lineSeparator()));
>
>  Result:
>
>  abc
>  def
>  ghi
>  jkl
>
>
>  Example:
>
> String string = " abc  \n   def  \n ghi   ";
> String trimmedString =
>  string.lines().map(s -> s.trim()).collect(joining("\n"));
>
>  Result:
>
>  abc
>  def
>  ghi
>
>
>  Example:
>
>     String table = `First Name  Surname  Phone
>                     Al          Albert   555-
>                     Bob         Roberts  555-
>                     Cal         Calvin   555-
>                    `;
>
>     // Extract headers
>     String firstLine = table.lines().findFirst().orElse("");
>     List<String> headings = List.of(firstLine.trim().split(`\s{2,}`));
>
>     // Build stream of maps
>     Stream<Map<String, String>> stream =
>         table.lines().skip(1)
>              .map(line -> line.trim())
>              .filter(line -> !line.isEmpty())
>              .map(line -> line.split(`\s{2,}`))
>              .map(columns -> {
>                  List<String> values = List.of(columns);
>                  return IntStream.range(0, headings.size()).boxed()
>                                  .collect(toMap(headings::get, values::get));
>              });
>
> // print all "First Name"
> stream.map(row -> row.get("First Name"))
>   .forEach(name -> System.out.println(name));
>
>  Result:
>
>  Al
>  Bob
>  Cal
> B. Additions to basic trim methods. In addition to margin methods trimIndent 
> and trimMarkers described below in Section C, it would be worth introducing 
> trimLeft and trimRight to augment the longstanding trim method. A key 
> question is how trimLeft and trimRight should detect whitespace, because 
> different definitions of whitespace exist in the library.
>
> trim itself uses the simple test less than or equal to the space character, a 
> fast test but not Unicode friendly.
>
> Character.isWhitespace(codepoint) returns true if codepoint is one of the 
> following:
>
>SPACE_SEPARATOR.
>LINE_SEPARATOR.
>PARAGRAPH_SEPARATOR.
>'\t', U+0009 HORIZONTAL TABULATION.
>'\n', U+000A LINE FEED.
>'\u000B', U+000B VERTICAL TABULATION.
>'\f', U+000C FORM FEED.
>'\r', U+000D CARRIAGE RETURN.
>'\u001C', U+001C FILE SEPARATOR.
>'\u001D', U+001D GROUP SEPARATOR.
>'\u001E', U+001E RECORD SEPARATOR.
>'\u001F', U+001F UNIT SEPARATOR.
>' ',  U+0020 SPACE.
> (Note: that non-breaking space (\u00A0) is excluded)
>
> Character.isSpaceChar(codepoint) returns true if codepoint is one of the 
> following:
>
>SPACE_SEPARATOR.
>LINE_SEPARATOR.
>PARAGRAPH_SEPARATOR.
>' ',  U+0020 SPACE.
>'\u00A0', U+00A0 NON-BREAKING SPACE.
> That sets up several kinds of whitespace; trim's whitespace (TWS), Character 
> whitespace (CWS) and the union of the two (UWS). TWS is a fast test. 

Re: Raw String Literal Library Support

2018-03-14 Thread Remi Forax
doh,
sorry for the tangential comment, it was the only comment i had, all other 
methods are fine.

Rémi

- Original Message -
> From: "Brian Goetz" <brian.go...@oracle.com>
> To: "Peter Levart" <peter.lev...@gmail.com>, "Xueming Shen" 
> <xueming.s...@oracle.com>, "core-libs-dev"
> <core-libs-dev@openjdk.java.net>
> Sent: Wednesday, March 14, 2018 16:26:30
> Subject: Re: Raw String Literal Library Support

> Perhaps we can "split" this discussion on splitting into a separate
> thread.  What's happened here is what always happens, which is:
> 
>  - Jim spent a lot of time and effort writing a comprehensive and clear
> proposal;
>  - Someone made a tangential comment on one aspect of it;
>  - Flood of deep-dive responses on that aspect;
>  - Everyone chimes in designing their favorite method not proposed;
>  - No one ever comes back to the substance of the proposal.
> 
> Hitting the reset button...
> 
> 
> On 3/14/2018 9:11 AM, Peter Levart wrote:
>> I think that:
>>
>> String delim = ...;
>> String r =
>> s.splits(Pattern.quote(delim)).collect(Collectors.joining(delim));
>>
>> ... should always produce a result such that r.equals(s);
>>
>>
>> Otherwise, is it wise to add methods that take a regex as a String? It
>> is rarely needed for a regex parameter to be dynamic. Usually a
>> constant is specified. Are there any plans for Java to support Pattern
>> constants? With constant dynamic support they would be trivial to
>> implement in bytecode. If there are any such plans, then the methods
>> should perhaps take a Pattern instead.
>>
>> syntax suggestion:
>>
>> '~' is a unary operator for bit-wise negation of integer values. It
>> could be overloaded for String(s) such that the following two were
>> equivalent:
>>
>> ~ string
>> Pattern.compile(string)
>>
>> Now if 'string' above is a constant, '~ string' could be a constant
>> too. Combined with raw string literals, Pattern constants could be
>> very compact.
>>
>>
>> What do you think?
>>
>> Regards, Peter
>>
>> On 03/14/2018 02:35 AM, Xueming Shen wrote:
>>> On 3/13/18, 5:12 PM, Jonathan Bluett-Duncan wrote:
>>>> Paul,
>>>>
>>>> AFAICT, one sort of behaviour which String.split() allows which
>>>> Pattern.splitAsStream() and the proposed String.splits() don't is
>>>> allowing a negative limit, e.g. string.split(regex, -1).
>>>>
>>>> Over at http://errorprone.info/bugpattern/StringSplitter, they argue
>>>> that a limit of -1 has less surprising behaviour than the default of 0,
>>>> because e.g. "".split(":") produces [""] (array with an empty string),
>>>> whereas ":".split(":") produces [] (empty array), which IMO is not
>>>> consistent.
>>>>
>>>> This compares with ":".split(":", -1) and "".split(":", -1) which
>>>> produce ["", ""] (array with two empty strings, each representing ends
>>>> of `:`) and [""] (array with an empty string) respectively - more
>>>> consistent IMO.
>>>>
>>>> Should String.splits(`\n|\r\n?`) follow the behaviour of
>>>> String.split(..., 0) or String.split(..., -1)?  I'd personally argue
>>>> for the latter.
>>>
>>> While these look really confusing, ":".split(":", n) and
>>> "".split(":", n) are really two different scenarios. One is for a
>>> matched delimiter and the other is a split with no matched delimiter,
>>> in which the spec specifies clearly that it returns the original
>>> string, in this case the empty string "". Arguably these two don't
>>> have to be "consistent".
>>>
>>> Personally I think the returned list/array from string.split(regex,
>>> -1) might be kind of "surprising" for the end user, in that it has a
>>> "trailing" empty string, which appears to be useless in most use
>>> scenarios, so you probably have to handle it specially.
>>>
>>> -Sherman
>>>
>>>
>>>
>>>>
>>>> Cheers,
>>>> Jonathan
>>>>
>>>> On 13 March 2018 at 23:22, Paul Sandoz<paul.san...@oracle.com>  wrote:
>>>>
>>>>>
>>>>>> On Mar 13, 2018, at 3:49 PM, John Rose<john.r.r...@oracle.com>
>>>>>> wrote:
>>>>>>
>>>>>> On Mar 13, 2018, at 6:47 AM, Jim Laskey<james.las...@oracle.com>
>>>>>> wrote:
>>>>>>> …
>>>>>>> A. Line support.
>>>>>>>
>>>>>>> public Stream<String> lines()
>>>>>>>
>>>>>> Suggest factoring this as:
>>>>>>
>>>>>> public Stream<String> splits(String regex) { }
>>>>> +1
>>>>>
>>>>> This is a natural companion to the existing array returning method
>>>>> (as it
>>>>> was the case on Pattern when we added splitAsStream), where one can
>>>>> use a
>>>>> limit() operation to achieve the same effect as the limit parameter
>>>>> on the
>>>>> array returning method.
>>>>>
>>>>>
>>>>>> public Stream<String> lines() { return splits(`\n|\r\n?`); }
>>>>>>
>>>>> See also Files/BufferedReader.lines. (Without going into details
>>>>> Files.lines has some interesting optimizations.)
>>>>>
>>>>> Paul.
>>>


Re: Raw String Literal Library Support

2018-03-14 Thread Brian Goetz
Perhaps we can "split" this discussion on splitting into a separate 
thread.  What's happened here is what always happens, which is:


 - Jim spent a lot of time and effort writing a comprehensive and clear 
proposal;

 - Someone made a tangential comment on one aspect of it;
 - Flood of deep-dive responses on that aspect;
 - Everyone chimes in designing their favorite method not proposed;
 - No one ever comes back to the substance of the proposal.

Hitting the reset button...


On 3/14/2018 9:11 AM, Peter Levart wrote:

I think that:

String delim = ...;
String r = 
s.splits(Pattern.quote(delim)).collect(Collectors.joining(delim));


... should always produce a result such that r.equals(s);


Otherwise, is it wise to add methods that take a regex as a String? It 
is rarely needed for a regex parameter to be dynamic. Usually a 
constant is specified. Are there any plans for Java to support Pattern 
constants? With constant dynamic support they would be trivial to 
implement in bytecode. If there are any such plans, then the methods 
should perhaps take a Pattern instead.


syntax suggestion:

'~' is a unary operator for bit-wise negation of integer values. It 
could be overloaded for String(s) such that the following two were 
equivalent:


~ string
Pattern.compile(string)

Now if 'string' above is a constant, '~ string' could be a constant 
too. Combined with raw string literals, Pattern constants could be 
very compact.



What do you think?

Regards, Peter

On 03/14/2018 02:35 AM, Xueming Shen wrote:

On 3/13/18, 5:12 PM, Jonathan Bluett-Duncan wrote:

Paul,

AFAICT, one sort of behaviour which String.split() allows which
Pattern.splitAsStream() and the proposed String.splits() don't is 
allowing a negative limit, e.g. string.split(regex, -1).

Over at http://errorprone.info/bugpattern/StringSplitter, they argue 
that a limit of -1 has less surprising behaviour than the default of 0, 
because e.g. "".split(":") produces [""] (array with an empty string), 
whereas ":".split(":") produces [] (empty array), which IMO is not 
consistent.

This compares with ":".split(":", -1) and "".split(":", -1) which 
produce ["", ""] (array with two empty strings, each representing ends 
of `:`) and [""] (array with an empty string) respectively - more 
consistent IMO.

Should String.splits(`\n|\r\n?`) follow the behaviour of 
String.split(..., 0) or String.split(..., -1)?  I'd personally argue for 
the latter.


While these look really confusing, ":".split(":", n) and 
"".split(":", n) are really two different scenarios. One is for a 
matched delimiter and the other is a split with no matched delimiter, 
in which the spec specifies clearly that it returns the original 
string, in this case the empty string "". Arguably these two don't 
have to be "consistent".

Personally I think the returned list/array from string.split(regex, 
-1) might be kind of "surprising" for the end user, in that it has a 
"trailing" empty string, which appears to be useless in most use 
scenarios, so you probably have to handle it specially.


-Sherman





Cheers,
Jonathan

On 13 March 2018 at 23:22, Paul Sandoz  wrote:



On Mar 13, 2018, at 3:49 PM, John Rose  
wrote:


On Mar 13, 2018, at 6:47 AM, Jim Laskey  
wrote:

…
A. Line support.

public Stream<String> lines()


Suggest factoring this as:

public Stream<String> splits(String regex) { }

+1

This is a natural companion to the existing array returning method 
(as it
was the case on Pattern when we added splitAsStream), where one can 
use a
limit() operation to achieve the same effect as the limit parameter 
on the

array returning method.



public Stream<String> lines() { return splits(`\n|\r\n?`); }


See also Files/BufferedReader.lines. (Without going into details
Files.lines has some interesting optimizations.)

Paul.








Re: Raw String Literal Library Support

2018-03-14 Thread John Rose
On Mar 14, 2018, at 6:11 AM, Peter Levart  wrote:
> 
> Pattern.compile(string)
> 
> Now if 'string' above is a constant, '~ string' could be a constant too. 
> Combined with raw string literals, Pattern constants could be very compact.
> 
> 
> What do you think?

There's no need to introduce syntax in order to gain constant folding.

It's enough to ensure that Pattern.compile(constant) reduces to
the ldc of a dynamic constant.  We are experimenting on such
ideas here:
  http://hg.openjdk.java.net/amber/amber/shortlog/condy-folding 


(This is very vaguely similar to constexpr in C++, but less static.
It's early days, but enough to show that syntax isn't necessary.)

— John

Re: Raw String Literal Library Support

2018-03-14 Thread Peter Levart

I think that:

String delim = ...;
String r = 
s.splits(Pattern.quote(delim)).collect(Collectors.joining(delim));


... should always produce a result such that r.equals(s);
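
To see where that round-trip property can break with today's APIs, here is a small sketch. Since splits() is only a proposal, it uses the existing Pattern.splitAsStream as a stand-in and compares it with String.split using a negative limit; the class and method names are invented for illustration.

```java
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class SplitRoundTrip {
    // Stream-based split followed by a join. Pattern.splitAsStream discards
    // trailing empty strings, so a trailing delimiter is lost.
    static String viaStream(String s, String delim) {
        return Pattern.compile(Pattern.quote(delim))
                      .splitAsStream(s)
                      .collect(Collectors.joining(delim));
    }

    // Array-based split with a negative limit keeps trailing empty strings,
    // so the join reproduces the input.
    static String viaNegativeLimit(String s, String delim) {
        return String.join(delim, s.split(Pattern.quote(delim), -1));
    }

    public static void main(String[] args) {
        String s = "a:b:";
        System.out.println(viaStream(s, ":").equals(s));
        System.out.println(viaNegativeLimit(s, ":").equals(s));
    }
}
```

So whether the property holds depends on which of the two existing behaviours a stream-returning split would follow.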


Otherwise, is it wise to add methods that take a regex as a String? It 
is rarely needed for a regex parameter to be dynamic. Usually a constant 
is specified. Are there any plans for Java to support Pattern constants? 
With constant dynamic support they would be trivial to implement in 
bytecode. If there are any such plans, then the methods should perhaps 
take a Pattern instead.


syntax suggestion:

'~' is a unary operator for bit-wise negation of integer values. It 
could be overloaded for String(s) such that the following two were 
equivalent:


~ string
Pattern.compile(string)

Now if 'string' above is a constant, '~ string' could be a constant too. 
Combined with raw string literals, Pattern constants could be very compact.



What do you think?

Regards, Peter

On 03/14/2018 02:35 AM, Xueming Shen wrote:

On 3/13/18, 5:12 PM, Jonathan Bluett-Duncan wrote:

Paul,

AFAICT, one sort of behaviour which String.split() allows which
Pattern.splitAsStream() and the proposed String.splits() don't is allowing
a negative limit, e.g. String.split(string, -1).

Over at http://errorprone.info/bugpattern/StringSplitter, they argue that a
limit of -1 has less surprising behaviour than the default of 0, because
e.g. "".split(":") produces [] (empty array), whereas ":".split(":")
produces [""] (array with an empty string), which IMO is not consistent.

This compares with ":".split(":", -1) and "".split(":", -1) which produce
["", ""] (array with two empty strings, each representing ends of `:`) and
[] (empty array) respectively - more consistent IMO.

Should String.splits(`\n|\r\n?`) follow the behaviour of String.split(...,
0) or String.split(..., -1)?  I'd personally argue for the latter.


These look really confusing, but ":".split(":", n) and "".split(":", n) are
really two different scenarios. One is a split with a matched delimiter and the
other is a split with no matched delimiter, for which the spec clearly specifies
that it returns the original string, in this case the empty string "".
Arguably these two don't have to be "consistent".


Personally I think the list/array returned from string.split(regex, -1) might be
kind of "surprising" for the end user, in that it has a "trailing" empty string,
which appears to be useless in most use cases and which you probably have to
handle specially.


-Sherman





Cheers,
Jonathan

On 13 March 2018 at 23:22, Paul Sandoz  wrote:




On Mar 13, 2018, at 3:49 PM, John Rose  wrote:

On Mar 13, 2018, at 6:47 AM, Jim Laskey  wrote:

…
A. Line support.

public Stream<String> lines()


Suggest factoring this as:

public Stream<String> splits(String regex) { }

+1

This is a natural companion to the existing array returning method (as it
was the case on Pattern when we added splitAsStream), where one can use a
limit() operation to achieve the same effect as the limit parameter on the
array returning method.



public Stream<String> lines() { return splits(`\n|\r\n?`); }


See also Files/BufferedReader.lines. (Without going into details
Files.lines has some interesting optimizations.)

Paul.






Re: Raw String Literal Library Support

2018-03-13 Thread Xueming Shen

On 3/13/18, 5:12 PM, Jonathan Bluett-Duncan wrote:

Paul,

AFAICT, one sort of behaviour which String.split() allows which
Pattern.splitAsStream() and the proposed String.splits() don't is allowing
a negative limit, e.g. String.split(string, -1).

Over at http://errorprone.info/bugpattern/StringSplitter, they argue that a
limit of -1 has less surprising behaviour than the default of 0, because
e.g. "".split(":") produces [] (empty array), whereas ":".split(":")
produces [""] (array with an empty string), which IMO is not consistent.

This compares with ":".split(":", -1) and "".split(":", -1) which produce
["", ""] (array with two empty strings, each representing ends of `:`) and
[] (empty array) respectively - more consistent IMO.

Should String.splits(`\n|\r\n?`) follow the behaviour of String.split(...,
0) or String.split(..., -1)?  I'd personally argue for the latter.


These look really confusing, but ":".split(":", n) and "".split(":", n) are
really two different scenarios. One is a split with a matched delimiter and the
other is a split with no matched delimiter, for which the spec clearly specifies
that it returns the original string, in this case the empty string "".
Arguably these two don't have to be "consistent".


Personally I think the list/array returned from string.split(regex, -1) might be
kind of "surprising" for the end user, in that it has a "trailing" empty string,
which appears to be useless in most use cases and which you probably have to
handle specially.


-Sherman





Cheers,
Jonathan

On 13 March 2018 at 23:22, Paul Sandoz  wrote:




On Mar 13, 2018, at 3:49 PM, John Rose  wrote:

On Mar 13, 2018, at 6:47 AM, Jim Laskey  wrote:

…
A. Line support.

public Stream<String> lines()


Suggest factoring this as:

public Stream<String> splits(String regex) { }

+1

This is a natural companion to the existing array returning method (as it
was the case on Pattern when we added splitAsStream), where one can use a
limit() operation to achieve the same effect as the limit parameter on the
array returning method.



public Stream<String> lines() { return splits(`\n|\r\n?`); }


See also Files/BufferedReader.lines. (Without going into details
Files.lines has some interesting optimizations.)

Paul.




Re: Raw String Literal Library Support

2018-03-13 Thread Jonathan Bluett-Duncan
Sorry, I should really run things in an IDE before posting code examples
and results!

For examples ":".split(":", 0) and "".split(":", 0), they actually produce
[] and [""] respectively (which I still argue is inconsistent and undesired
for the proposed String.splits()).

For examples ":".split(":", -1) and "".split(":", -1), they actually
produce ["", ""] and [""] respectively, which I like better.
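For anyone who wants to check the corrected results above, a small sketch follows. The class name SplitLimits and the helper show are illustrative only; the helper quotes each element so that an empty array and an array holding one empty string print differently.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class SplitLimits {
    // Renders input.split(":", limit) with every element in double quotes,
    // so [] and [""] are distinguishable in the output.
    static String show(String input, int limit) {
        return Arrays.stream(input.split(":", limit))
                     .map(piece -> "\"" + piece + "\"")
                     .collect(Collectors.joining(", ", "[", "]"));
    }

    public static void main(String[] args) {
        System.out.println("\":\".split(\":\", 0)  = " + show(":", 0));   // []
        System.out.println("\"\".split(\":\", 0)   = " + show("", 0));    // [""]
        System.out.println("\":\".split(\":\", -1) = " + show(":", -1));  // ["", ""]
        System.out.println("\"\".split(\":\", -1)  = " + show("", -1));   // [""]
    }
}
```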

Cheers,
Jonathan

On 14 March 2018 at 00:12, Jonathan Bluett-Duncan 
wrote:

> Paul,
>
> AFAICT, one sort of behaviour which String.split() allows which
> Pattern.splitAsStream() and the proposed String.splits() don't is allowing
> a negative limit, e.g. String.split(string, -1).
>
> Over at http://errorprone.info/bugpattern/StringSplitter, they argue that
> a limit of -1 has less surprising behaviour than the default of 0, because
> e.g. "".split(":") produces [] (empty array), whereas ":".split(":")
> produces [""] (array with an empty string), which IMO is not consistent.
>
> This compares with ":".split(":", -1) and "".split(":", -1) which produce
> ["", ""] (array with two empty strings, each representing ends of `:`) and
> [] (empty array) respectively - more consistent IMO.
>
> Should String.splits(`\n|\r\n?`) follow the behaviour of String.split(...,
> 0) or String.split(..., -1)?  I'd personally argue for the latter.
>
> Cheers,
> Jonathan
>
> On 13 March 2018 at 23:22, Paul Sandoz  wrote:
>
>>
>>
>> > On Mar 13, 2018, at 3:49 PM, John Rose  wrote:
>> >
>> > On Mar 13, 2018, at 6:47 AM, Jim Laskey 
>> wrote:
>> >>
>> >> …
>> >> A. Line support.
>> >>
>> >> public Stream<String> lines()
>> >>
>> >
>> > Suggest factoring this as:
>> >
>> > public Stream<String> splits(String regex) { }
>>
>> +1
>>
>> This is a natural companion to the existing array returning method (as it
>> was the case on Pattern when we added splitAsStream), where one can use a
>> limit() operation to achieve the same effect as the limit parameter on the
>> array returning method.
>>
>>
>> > public Stream<String> lines() { return splits(`\n|\r\n?`); }
>> >
>>
>> See also Files/BufferedReader.lines. (Without going into details
>> Files.lines has some interesting optimizations.)
>>
>> Paul.
>
>
>


Re: Raw String Literal Library Support

2018-03-13 Thread Jonathan Bluett-Duncan
Paul,

AFAICT, one sort of behaviour which String.split() allows which
Pattern.splitAsStream() and the proposed String.splits() don't is allowing
a negative limit, e.g. String.split(string, -1).

Over at http://errorprone.info/bugpattern/StringSplitter, they argue that a
limit of -1 has less surprising behaviour than the default of 0, because
e.g. "".split(":") produces [] (empty array), whereas ":".split(":")
produces [""] (array with an empty string), which IMO is not consistent.

This compares with ":".split(":", -1) and "".split(":", -1) which produce
["", ""] (array with two empty strings, each representing ends of `:`) and
[] (empty array) respectively - more consistent IMO.

Should String.splits(`\n|\r\n?`) follow the behaviour of String.split(...,
0) or String.split(..., -1)?  I'd personally argue for the latter.

Cheers,
Jonathan

On 13 March 2018 at 23:22, Paul Sandoz  wrote:

>
>
> > On Mar 13, 2018, at 3:49 PM, John Rose  wrote:
> >
> > On Mar 13, 2018, at 6:47 AM, Jim Laskey  wrote:
> >>
> >> …
> >> A. Line support.
> >>
> >> public Stream<String> lines()
> >>
> >
> > Suggest factoring this as:
> >
> > public Stream<String> splits(String regex) { }
>
> +1
>
> This is a natural companion to the existing array returning method (as it
> was the case on Pattern when we added splitAsStream), where one can use a
> limit() operation to achieve the same effect as the limit parameter on the
> array returning method.
>
>
> > public Stream<String> lines() { return splits(`\n|\r\n?`); }
> >
>
> See also Files/BufferedReader.lines. (Without going into details
> Files.lines has some interesting optimizations.)
>
> Paul.


Re: Raw String Literal Library Support

2018-03-13 Thread Paul Sandoz


> On Mar 13, 2018, at 3:49 PM, John Rose  wrote:
> 
> On Mar 13, 2018, at 6:47 AM, Jim Laskey  wrote:
>> 
>> …
>> A. Line support.
>> 
>> public Stream<String> lines()
>> 
> 
> Suggest factoring this as:
> 
> public Stream<String> splits(String regex) { }

+1

This is a natural companion to the existing array returning method (as it was 
the case on Pattern when we added splitAsStream), where one can use a limit() 
operation to achieve the same effect as the limit parameter on the array 
returning method.


> public Stream<String> lines() { return splits(`\n|\r\n?`); }
> 

See also Files/BufferedReader.lines. (Without going into details Files.lines 
has some interesting optimizations.)

Paul.

Re: Raw String Literal Library Support

2018-03-13 Thread John Rose
On Mar 13, 2018, at 6:47 AM, Jim Laskey  wrote:
> 
> …
> A. Line support.
> 
> public Stream<String> lines()
> 

Suggest factoring this as:

 public Stream<String> splits(String regex) { }
 public Stream<String> lines() { return splits(`\n|\r\n?`); }

The reason is that "splits" is useful with several other patterns.
For raw strings, splits(`\n`) is a more efficient way to get the same
result (because they normalize CR NL? to NL).  There's also a
nifty unicode-oriented pattern splits(`\R`) which matches a larger
set of line terminations.  And of course splits(":") or splits(`\s`) will
be old friends.  A new friend might be paragraph splitting splits(`\n\n`).

Splitting is old, as Remi points out, but the new thing is supplying the
stream-style fluent notation starting from a (potentially) large string
constant.
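As a rough illustration of the factoring suggested above: String.splits() is only a proposal in this thread, so the sketch below models it as a hypothetical static helper built on the existing Pattern.splitAsStream. The class name Splits and both helper names are assumptions, and ordinary string literals stand in for raw string literals.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class Splits {
    // Hypothetical stand-in for the proposed String.splits(regex).
    static Stream<String> splits(String s, String regex) {
        return Pattern.compile(regex).splitAsStream(s);
    }

    // lines() expressed in terms of splits(), using the pattern from the
    // suggestion: LF, or CR optionally followed by LF.
    static Stream<String> lines(String s) {
        return splits(s, "\n|\r\n?");
    }

    public static void main(String[] args) {
        List<String> lines = lines("abc\ndef\r\nghi\rjkl").collect(Collectors.toList());
        System.out.println(lines);                       // [abc, def, ghi, jkl]

        // Paragraph splitting on a blank line.
        List<String> paragraphs =
            splits("p1 line1\np1 line2\n\np2", "\n\n").collect(Collectors.toList());
        System.out.println(paragraphs.size());           // 2

        // The "old friend" case.
        System.out.println(splits("a:b:c", ":").collect(Collectors.toList())); // [a, b, c]
    }
}
```

Note that Pattern.splitAsStream drops trailing empty strings, so this stand-in follows the limit 0 behaviour of String.split.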

> B. Additions to basic trim methods. In addition to margin methods trimIndent 
> and trimMarkers described below in Section C, it would be worth introducing 
> trimLeft and trimRight to augment the longstanding trim method. A key 
> question is how trimLeft and trimRight should detect whitespace, because 
> different definitions of whitespace exist in the library. 
> ...
> That sets up several kinds of whitespace; trim's whitespace (TWS), Character 
> whitespace (CWS) and the union of the two (UWS). TWS is a fast test. CWS is a 
> slow test. UWS is fast for Latin1 and slow-ish for UTF-16. 

For the record, even though we are not talking performance much,
CWS is not significantly slower than UWS.  You can use a 64-bit int
constant for a bitmask and check for an arbitrary subset of the first
64 ASCII code points in one or two machine instructions.
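The bitmask idea can be sketched as follows. This is only an illustration of the technique, not the JDK implementation; the class name AsciiWhitespaceMask and its members are assumptions. The mask covers the Character.isWhitespace code points below 64, and anything at or above 64 falls back to the library test.

```java
public class AsciiWhitespaceMask {
    // One bit per code point in 0..63. A set bit means "whitespace" in the
    // Character.isWhitespace sense: TAB, LF, VT, FF, CR, FS, GS, RS, US, SPACE.
    static final long MASK =
          (1L << '\t') | (1L << '\n') | (1L << 0x0B) | (1L << '\f') | (1L << '\r')
        | (1L << 0x1C) | (1L << 0x1D) | (1L << 0x1E) | (1L << 0x1F) | (1L << ' ');

    static boolean isWhitespace(int codePoint) {
        if (codePoint >= 0 && codePoint < 64) {
            // Fast path: a shift and a mask instead of a table lookup.
            return ((MASK >>> codePoint) & 1L) != 0;
        }
        // Slow path for everything outside the first 64 code points.
        return Character.isWhitespace(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(isWhitespace(' '));    // true
        System.out.println(isWhitespace('\t'));   // true
        System.out.println(isWhitespace('a'));    // false
        System.out.println(isWhitespace(0x2003)); // true (EM SPACE, via the slow path)
    }
}
```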

> We are recommending that trimLeft and trimRight use UWS, leave trim alone to 
> avoid breaking the world and then possibly introduce trimWhitespace that uses 
> UWS.

Putting aside the performance question, I have to ask if compatibility
with TWS is at all important.  (Don't know the answer, suspect not.)
> …
> C. Margin management. With introduction of multi-line Raw String Literals, 
> developers will have to deal with the extraneous spacing introduced by 
> indenting and formatting string bodies. 
> 
> Note that for all the methods in this group, if the first line is empty then 
> it is removed and if the last is empty then it is removed. This removal 
> provides a means for developers that use delimiters on separate lines to 
> bracket string bodies. Also note, that all line separators are replaced with 
> \n.

(As a bonus, margin management gives a story for escaping leading and trailing
backticks.  If your string is a single line, surround it with pipe characters,
as in `|asdf|`.  If your string is multiple lines, surround it with blank lines,
which is easy to do.  Either pipes or newlines will protect backticks from
merging into quotes.)

There's a sort of beauty contest going on here between indents and
markers.  I often prefer markers, but I see how indents will often win
the contest.  I'll pre-emptively disagree with anyone who observes
that we only need one of the two.

> public String trimMarkers(String leftMarker, String rightMarker)

I like this function and anticipate using it.  (I use similar things in
shell script here-files.)  Thanks for including end-of-line markers
in the mix.  This allows lines with significant *trailing* whitespace
to protect that whitespace as well as *leading* whitespace.

Suggestion:  Give users a gentle nudge toward the pipe character by
making it a default argument so trimMarkers() => trimMarkers("|","|").

Suggestion:  Allow the markers to be regular expressions.
(So `\|` would be the default.)

> 
> D. Escape management. Since Raw String Literals do not interpret Unicode 
> escapes (\u) or escape sequences (\n, \b, etc), we need to provide a 
> scheme for developers who just want multi-line strings but still have escape 
> sequences interpreted.

This all looks good.

Thanks,

— John

Re: Raw String Literal Library Support

2018-03-13 Thread John Rose
On Mar 13, 2018, at 11:30 AM, Volker Simonis  wrote:
> 
> Would it make sense to have a version of "lines(LINE_TERM lt)" which
> takes a single, concrete form of line terminator?

(Regular expressions for the win!)

Re: Raw String Literal Library Support

2018-03-13 Thread John Rose
On Mar 13, 2018, at 11:42 AM, Remi Forax  wrote:
> 
> it already exists :)
>  Stream<String> stream = Pattern.compile("\n|\r\n|\r").splitAsStream(string);

You want ` instead of " there!

Somebody added support recently for `\R`, which is a more unicode-flavored
version of your pattern (or just `\n`).  Last time I looked it was missing;
kudos to whoever added it in.

There should be a fluent streamy syntax for splitting a string, string.splits(pat).
Java's incomplete embrace of fluent syntax is old news, *but* there is something
new here:  String expression size.

The raw strings are much larger than classic strings, and so they seem to need
some notational assistance that doesn't always require them to be enclosed in
round parens and mixed with other arguments.  Having more fluent methods on
String seems like a good move here.

This goes beyond raw strings, and parsing is hard, but maybe there's room for
richer versions of String.lines or String.splits, which can deliver both the
surrounding whitespace and one or more fields, for each line (or each paragraph
or whatever):

public Stream<MatchResult> matchResults(String regex) {
  return Pattern.compile(regex).matcher(this).results();
}

The point is that a MatchResult delivers both the whole substring and any groups
embedded in it as part of the match.  Plus indexes, which is nice sometimes.
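A runnable approximation of that idea, written as a static helper because the instance method on String is only a suggestion here. Matcher.results() exists as of Java 9; the class name MatchResults, the helper describe, and the key=value sample format are assumptions made for illustration.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MatchResults {
    // Static stand-in for the suggested String.matchResults(regex).
    static Stream<MatchResult> matchResults(String s, String regex) {
        return Pattern.compile(regex).matcher(s).results();
    }

    // One match per "key=value" line. Group 1 is the leading indentation,
    // group 2 the key, group 3 the value; start() gives the match index.
    static List<String> describe(String text) {
        return matchResults(text, "(?m)^([ \\t]*)(\\w+)=(\\w+)$")
            .map(m -> m.start() + ":" + m.group(2) + "->" + m.group(3))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(describe("  a=1\nbb=22")); // [0:a->1, 6:bb->22]
    }
}
```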

— John



Re: Raw String Literal Library Support

2018-03-13 Thread Mark Derricutt
On 14 Mar 2018, at 7:42, Remi Forax wrote:

> it already exists :)
>   Stream<String> stream = Pattern.compile("\n|\r\n|\r").splitAsStream(string);
>
> Rémi

One that worked, or had support for grey-space would be a nice addition when 
working with RSL tho, unless grey-space is automatically handled at the 
compiler level ( don't believe I saw that mentioned anywhere ).

Mark



---
"The ease with which a change can be implemented has no relevance at all to 
whether it is the right change for the (Java) Platform for all time."  
Mark Reinhold.

Mark Derricutt
http://www.theoryinpractice.net
http://www.chaliceofblood.net
http://plus.google.com/+MarkDerricutt
http://twitter.com/talios
http://facebook.com/mderricutt


Re: Raw String Literal Library Support

2018-03-13 Thread Remi Forax
Hi Jim,

- Mail original -
> De: "Jim Laskey" <james.las...@oracle.com>
> À: "core-libs-dev" <core-libs-dev@openjdk.java.net>
> Envoyé: Mardi 13 Mars 2018 14:47:29
> Objet: Raw String Literal Library Support

> With the announcement of JEP 326 Raw String Literals, we would like to open up a
> discussion with regards to RSL library support. Below are several implemented
> String methods that are believed to be appropriate. Please comment on those
> mentioned below including recommending alternate names or signatures.
> Additional methods can be considered if warranted, but as always, the bar for
> inclusion in String is high.
> 
> You should keep a couple things in mind when reviewing these methods.
> 
> Methods should be applicable to all strings, not just Raw String Literals.
> 
> The number of additional methods should be minimized, not adding every possible
> method.
> 
> Don't put any emphasis on performance. That is a separate discussion.
> 
> Cheers,
> 
> -- Jim
> 
> A. Line support.
> 
> public Stream<String> lines()
> Returns a stream of substrings extracted from this string partitioned by line
> terminators. Internally, the stream is implemented using a Spliterator that
> extracts one line at a time. The line terminators recognized are \n, \r\n and
> \r. This method provides versatility for the developer working with multi-line
> strings.

it already exists :)
  Stream<String> stream = Pattern.compile("\n|\r\n|\r").splitAsStream(string);

Rémi


Re: Raw String Literal Library Support

2018-03-13 Thread Volker Simonis
On Tue, Mar 13, 2018 at 2:47 PM, Jim Laskey  wrote:
> With the announcement of JEP 326 Raw String Literals, we would like to open 
> up a discussion with regards to RSL library support. Below are several 
> implemented String methods that are believed to be appropriate. Please 
> comment on those mentioned below including recommending alternate names or 
> signatures. Additional methods can be considered if warranted, but as always, 
> the bar for inclusion in String is high.
>
> You should keep a couple things in mind when reviewing these methods.
>
> Methods should be applicable to all strings, not just Raw String Literals.
>
> The number of additional methods should be minimized, not adding every 
> possible method.
>
> Don't put any emphasis on performance. That is a separate discussion.
>
> Cheers,
>
> -- Jim
>
> A. Line support.
>
> public Stream<String> lines()
> Returns a stream of substrings extracted from this string partitioned by line 
> terminators. Internally, the stream is implemented using a Spliterator that 
> extracts one line at a time. The line terminators recognized are \n, \r\n and 
> \r. This method provides versatility for the developer working with 
> multi-line strings.

So "lines()" will support any mix of  "\n", "\r\n" and "\r" inside a
single string as line terminator?

Will "\n", "\r\n" and "\r" be parsed from left to right with one
character look-ahead? I.e.
\n = 1 newline
\n\r = 2 newlines (i.e. an empty line)
\n\r\n = 2 newlines (i.e. an empty line) because "\r\n" counts as a
single new line
\n\r\n\r = 3 newlines (i.e. two empty lines)
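The left-to-right reading above can be checked with a small sketch. It counts terminators with a regex in which "\r\n" is tried before a lone "\r", which is one way to get the one-character look-ahead behaviour; whether the proposed lines() works exactly this way is an assumption, and the class name TerminatorCount is illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TerminatorCount {
    // "\r\n" is listed first so that a CR followed by LF counts as a single
    // terminator; a lone CR or LF is matched by the other alternatives.
    private static final Pattern LINE_TERMINATOR = Pattern.compile("\r\n|\n|\r");

    static int countTerminators(String s) {
        Matcher matcher = LINE_TERMINATOR.matcher(s);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countTerminators("\n"));       // 1
        System.out.println(countTerminators("\n\r"));     // 2
        System.out.println(countTerminators("\n\r\n"));   // 2
        System.out.println(countTerminators("\n\r\n\r")); // 3
    }
}
```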

Would it make sense to have a version of "lines(LINE_TERM lt)" which
takes a single, concrete form of line terminator?

>  Example:
>
> String string = "abc\ndef\nghi";
> Stream<String> stream = string.lines();
> List<String> list = stream.collect(Collectors.toList());
>
>  Result:
>
>  [abc, def, ghi]
>
>
>  Example:
>
> String string = "abc\ndef\nghi";
> String[] array = string.lines().toArray(String[]::new);
>
>  Result:
>
>  [Ljava.lang.String;@33e5ccce // [abc, def, ghi]
>
>
>  Example:
>
> String string = "abc\ndef\r\nghi\rjkl";
> String platformString =
> string.lines().collect(joining(System.lineSeparator()));
>
>  Result:
>
>  abc
>  def
>  ghi
>  jkl
>
>
>  Example:
>
> String string = " abc  \n   def  \n ghi   ";
> String trimmedString =
>  string.lines().map(s -> s.trim()).collect(joining("\n"));
>
>  Result:
>
>  abc
>  def
>  ghi
>
>
>  Example:
>
> String table = `First Name  SurnamePhone
> Al  Albert 555-
> Bob Roberts555-
> Cal Calvin 555-
>`;
>
> // Extract headers
> String firstLine = table.lines().findFirst().orElse("");
> List<String> headings = List.of(firstLine.trim().split(`\s{2,}`));
>
> // Build stream of maps
> Stream<Map<String, String>> stream =
>     table.lines().skip(1)
>          .map(line -> line.trim())
>          .filter(line -> !line.isEmpty())
>          .map(line -> line.split(`\s{2,}`))
>          .map(columns -> {
>              List<String> values = List.of(columns);
>              return IntStream.range(0, headings.size()).boxed()
>                              .collect(toMap(headings::get, values::get));
>          });
>
> // print all "First Name"
> stream.map(row -> row.get("First Name"))
>       .forEach(name -> System.out.println(name));
>
>  Result:
>
>  Al
>  Bob
>  Cal
> B. Additions to basic trim methods. In addition to margin methods trimIndent 
> and trimMarkers described below in Section C, it would be worth introducing 
> trimLeft and trimRight to augment the longstanding trim method. A key 
> question is how trimLeft and trimRight should detect whitespace, because 
> different definitions of whitespace exist in the library.
>
> trim itself uses the simple test less than or equal to the space character, a 
> fast test but not Unicode friendly.
>
> Character.isWhitespace(codepoint) returns true if codepoint is one of the 
> following:
>
>SPACE_SEPARATOR.
>LINE_SEPARATOR.
>PARAGRAPH_SEPARATOR.
>'\t', U+0009 HORIZONTAL TABULATION.
>'\n', U+000A LINE FEED.
>'\u000B', U+000B VERTICAL TABULATION.
>'\f', U+000C FORM FEED.
>'\r', U+000D CARRIAGE RETURN.
>'\u001C', U+001C FILE SEPARATOR.
>'\u001D', U+001D GROUP SEPARATOR.
>'\u001E', U+001E RECORD SEPARATOR.
>'\u001F', U+001F UNIT SEPARATOR.
>' ',  U+0020 SPACE.
> (Note: that non-breaking space (\u00A0) is excluded)
>
> Character.isSpaceChar(codepoint) returns true if codepoint one of the 
> 

Raw String Literal Library Support

2018-03-13 Thread Jim Laskey
With the announcement of JEP 326 Raw String Literals, we would like to open up 
a discussion with regards to RSL library support. Below are several implemented 
String methods that are believed to be appropriate. Please comment on those 
mentioned below including recommending alternate names or signatures. 
Additional methods can be considered if warranted, but as always, the bar for 
inclusion in String is high.

You should keep a couple things in mind when reviewing these methods.

Methods should be applicable to all strings, not just Raw String Literals.

The number of additional methods should be minimized, not adding every possible 
method.

Don't put any emphasis on performance. That is a separate discussion.

Cheers,

-- Jim

A. Line support.

public Stream<String> lines()
Returns a stream of substrings extracted from this string partitioned by line 
terminators. Internally, the stream is implemented using a Spliterator that 
extracts one line at a time. The line terminators recognized are \n, \r\n and 
\r. This method provides versatility for the developer working with multi-line 
strings.
 Example:

String string = "abc\ndef\nghi";
Stream<String> stream = string.lines();
List<String> list = stream.collect(Collectors.toList());

 Result:

 [abc, def, ghi]


 Example:

String string = "abc\ndef\nghi";
String[] array = string.lines().toArray(String[]::new);

 Result:

 [Ljava.lang.String;@33e5ccce // [abc, def, ghi]


 Example:

String string = "abc\ndef\r\nghi\rjkl";
String platformString =
string.lines().collect(joining(System.lineSeparator()));

 Result:

 abc
 def
 ghi
 jkl


 Example:

String string = " abc  \n   def  \n ghi   ";
String trimmedString =
 string.lines().map(s -> s.trim()).collect(joining("\n"));

 Result:

 abc
 def
 ghi


 Example:

String table = `First Name  SurnamePhone
Al  Albert 555-
Bob Roberts555-
Cal Calvin 555-
   `;

// Extract headers
String firstLine = table.lines().findFirst().orElse("");
List<String> headings = List.of(firstLine.trim().split(`\s{2,}`));

// Build stream of maps
Stream<Map<String, String>> stream =
    table.lines().skip(1)
         .map(line -> line.trim())
         .filter(line -> !line.isEmpty())
         .map(line -> line.split(`\s{2,}`))
         .map(columns -> {
             List<String> values = List.of(columns);
             return IntStream.range(0, headings.size()).boxed()
                             .collect(toMap(headings::get, values::get));
         });

// print all "First Name"
stream.map(row -> row.get("First Name"))
      .forEach(name -> System.out.println(name));

 Result:

 Al
 Bob
 Cal
B. Additions to basic trim methods. In addition to margin methods trimIndent 
and trimMarkers described below in Section C, it would be worth introducing 
trimLeft and trimRight to augment the longstanding trim method. A key question 
is how trimLeft and trimRight should detect whitespace, because different 
definitions of whitespace exist in the library. 

trim itself uses the simple test less than or equal to the space character, a 
fast test but not Unicode friendly. 

Character.isWhitespace(codepoint) returns true if codepoint is one of the 
following:

   SPACE_SEPARATOR.
   LINE_SEPARATOR.
   PARAGRAPH_SEPARATOR.
   '\t', U+0009 HORIZONTAL TABULATION.
   '\n', U+000A LINE FEED.
   '\u000B', U+000B VERTICAL TABULATION.
   '\f', U+000C FORM FEED.
   '\r', U+000D CARRIAGE RETURN.
   '\u001C', U+001C FILE SEPARATOR.
   '\u001D', U+001D GROUP SEPARATOR.
   '\u001E', U+001E RECORD SEPARATOR.
   '\u001F', U+001F UNIT SEPARATOR.
   ' ',  U+0020 SPACE.
(Note: that non-breaking space (\u00A0) is excluded) 

Character.isSpaceChar(codepoint) returns true if codepoint is one of the following:

   SPACE_SEPARATOR.
   LINE_SEPARATOR.
   PARAGRAPH_SEPARATOR.
   ' ',  U+0020 SPACE.
   '\u00A0', U+00A0 NON-BREAKING SPACE.
That sets up several kinds of whitespace; trim's whitespace (TWS), Character 
whitespace (CWS) and the union of the two (UWS). TWS is a fast test. CWS is a 
slow test. UWS is fast for Latin1 and slow-ish for UTF-16. 

We are recommending that trimLeft and trimRight use UWS, leave trim alone to 
avoid breaking the world and then possibly introduce trimWhitespace that uses 
UWS.
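To see how the three definitions differ on concrete code points, here is a small sketch. The predicate names are made up for illustration and are not proposed API; TWS is modelled as "less than or equal to space", CWS as Character.isWhitespace, and UWS as the union of the two.

```java
public class WhitespaceKinds {
    // TWS: the test used by String.trim().
    static boolean isTrimWhitespace(int codePoint) {
        return codePoint <= ' ';
    }

    // CWS: the Unicode-aware test from Character.
    static boolean isCharacterWhitespace(int codePoint) {
        return Character.isWhitespace(codePoint);
    }

    // UWS: accepted by either of the two tests above.
    static boolean isUnionWhitespace(int codePoint) {
        return codePoint <= ' ' || Character.isWhitespace(codePoint);
    }

    public static void main(String[] args) {
        // U+0001 control char, U+0020 space, U+00A0 no-break space, U+2003 em space.
        int[] samples = { 0x0001, 0x0020, 0x00A0, 0x2003 };
        for (int cp : samples) {
            System.out.printf("U+%04X  TWS=%b  CWS=%b  UWS=%b%n",
                cp, isTrimWhitespace(cp), isCharacterWhitespace(cp), isUnionWhitespace(cp));
        }
    }
}
```

U+0001 is whitespace only for TWS, U+2003 only for CWS, and U+00A0 for neither, so the union still leaves the non-breaking space alone.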

public String trim() 
Removes characters less than or equal to space from the beginning and end of the 
string. No change except spec clarification and links to the new trim methods.
Examples:
"".trim();  // ""
"   ".trim();   // ""
"  abc