Re: String reboot - (1a) incidental whitespace

2019-04-22 Thread Guy Steele
Good points.  Like I said, I think we are in agreement!  —Guy

> On Apr 22, 2019, at 8:00 PM, Alex Buckley  wrote:
> 
> On 4/22/2019 12:16 PM, Guy Steele wrote:
>>> On Apr 22, 2019, at 3:04 PM, Alex Buckley 
>>> wrote:
>>> 
>>> Nope, I don't think multi-line string literals are an attractive
>>> nuisance in any way. We should NOT deem it incorrect to refactor a
>>> sequence of concatenations into a single multi-line string
>>> literal.
>> 
>> I didn’t say (or mean to imply that).  I think it’s a great thing to
>> refactor concatenations into a single multi-line string literal WHEN
>> IT IS DONE CORRECTLY.
>> 
>> However, if you blindly pull out the concatenations and thereby
>> introduce newlines into the string when they were not there before
>> and doing so violates some contract downstream, THAT IS AN INCORRECT
>> TRANSFORMATION.
> 
> Literally, yes, it's an incorrect transformation for the caller to perform if 
> it violates the contract offered by the callee.
> 
>> We certainly agree that it would be a good thing if everything that
>> might be downstream were in fact reasonably tolerant of newlines.
> 
> Yes.
> 
>> BUT IF YOU DON’T KNOW FOR SURE THAT WHAT IS DOWNSTREAM IS TOLERANT OF
>> NEWLINES, AND YOU BLINDLY TRANSFORM A STRING CONCATENATION INTO A
>> MULTI-LINE STRING LITERAL THAT INCLUDES NEWLINES WHERE THERE WERE
>> NONE BEFORE, THAT IS A BAD THING.
> 
> If the callee's contract says "No newlines in the string argument to 
> Customer::setName", then the caller would be doing a bad thing.
> 
> But the reason this topic is interesting(ish) is because we're dealing with 
> something that the vast majority of callees never thought to specify.
> 
> (Well, maybe not "never". I browsed the Java SE API Specification to find a 
> method that takes a String, and randomly clicked on something in JNDI -- 
> https://docs.oracle.com/en/java/javase/12/docs/api/java.naming/javax/naming/Name.html#add(java.lang.String)
>  -- which happens to be strict about the string passed to it, so perhaps 
> someone is about to get an InvalidNameException when they try to lay out a 
> long LDAP query string over multiple lines.)
> 
>> And a feature that makes it too easy to accidentally do a bad thing
>> _might_ be considered an attractive nuisance, AS OPPOSED TO MY
>> SCREAMING ALL-CAPS, WHICH ARE A REPULSIVE NUISANCE.
> 
> I can get 90% of the way to saying "OK, multi-line string literals _might_ be 
> considered an attractive nuisance", but I can't get 100% of the way there 
> because it's such a callee-centric view to take when the purpose of the 
> feature is to simplify the life of the caller. If you crack open the door to 
> give callees a hearing, you'll get requests to statically reject multi-line 
> string literals (such as via a java.* annotation that programmatically 
> indicates "not multi-line safe", or a java.lang.MultilineString type that's a 
> sibling of String) and we don't want to go anywhere near there.
> 
> (I recall a library that took Runnable or somesuch, and fell over when the 
> argument was a lambda expression; the library expected an anonymous inner 
> class instance in order to do some peculiar introspection, which failed on 
> the opaque object reifying a lambda expression. The library developer _might_ 
> have considered lambda expressions an attractive nuisance for a few minutes, 
> but who would have sympathy?)
> 
> Alex



Re: String reboot - (1a) incidental whitespace

2019-04-22 Thread Alex Buckley

On 4/22/2019 12:16 PM, Guy Steele wrote:
>> On Apr 22, 2019, at 3:04 PM, Alex Buckley wrote:
>>
>> Nope, I don't think multi-line string literals are an attractive
>> nuisance in any way. We should NOT deem it incorrect to refactor a
>> sequence of concatenations into a single multi-line string
>> literal.
>
> I didn’t say (or mean to imply that).  I think it’s a great thing to
> refactor concatenations into a single multi-line string literal WHEN
> IT IS DONE CORRECTLY.
>
> However, if you blindly pull out the concatenations and thereby
> introduce newlines into the string when they were not there before
> and doing so violates some contract downstream, THAT IS AN INCORRECT
> TRANSFORMATION.

Literally, yes, it's an incorrect transformation for the caller to 
perform if it violates the contract offered by the callee.

> We certainly agree that it would be a good thing if everything that
> might be downstream were in fact reasonably tolerant of newlines.

Yes.

> BUT IF YOU DON’T KNOW FOR SURE THAT WHAT IS DOWNSTREAM IS TOLERANT OF
> NEWLINES, AND YOU BLINDLY TRANSFORM A STRING CONCATENATION INTO A
> MULTI-LINE STRING LITERAL THAT INCLUDES NEWLINES WHERE THERE WERE
> NONE BEFORE, THAT IS A BAD THING.

If the callee's contract says "No newlines in the string argument to 
Customer::setName", then the caller would be doing a bad thing.

But the reason this topic is interesting(ish) is because we're dealing 
with something that the vast majority of callees never thought to specify.

(Well, maybe not "never". I browsed the Java SE API Specification to 
find a method that takes a String, and randomly clicked on something in 
JNDI -- 
https://docs.oracle.com/en/java/javase/12/docs/api/java.naming/javax/naming/Name.html#add(java.lang.String) 
-- which happens to be strict about the string passed to it, so perhaps 
someone is about to get an InvalidNameException when they try to lay out 
a long LDAP query string over multiple lines.)

> And a feature that makes it too easy to accidentally do a bad thing
> _might_ be considered an attractive nuisance, AS OPPOSED TO MY
> SCREAMING ALL-CAPS, WHICH ARE A REPULSIVE NUISANCE.

I can get 90% of the way to saying "OK, multi-line string literals 
_might_ be considered an attractive nuisance", but I can't get 100% of 
the way there because it's such a callee-centric view to take when the 
purpose of the feature is to simplify the life of the caller. If you 
crack open the door to give callees a hearing, you'll get requests to 
statically reject multi-line string literals (such as via a java.* 
annotation that programmatically indicates "not multi-line safe", or a 
java.lang.MultilineString type that's a sibling of String) and we don't 
want to go anywhere near there.

(I recall a library that took Runnable or somesuch, and fell over when 
the argument was a lambda expression; the library expected an anonymous 
inner class instance in order to do some peculiar introspection, which 
failed on the opaque object reifying a lambda expression. The library 
developer _might_ have considered lambda expressions an attractive 
nuisance for a few minutes, but who would have sympathy?)


Alex


Re: String reboot - (1a) incidental whitespace

2019-04-22 Thread Guy Steele
I think we actually are in “violent agreement” here, Alex, but just to be sure, 
see comments below.
 
> On Apr 22, 2019, at 3:04 PM, Alex Buckley  wrote:
> 
> Nope, I don't think multi-line string literals are an attractive nuisance in 
> any way. We should NOT deem it incorrect to refactor a sequence of 
> concatenations into a single multi-line string literal.

I didn’t say (or mean to imply that).  I think it’s a great thing to refactor 
concatenations into a single multi-line string literal WHEN IT IS DONE 
CORRECTLY.

However, if you blindly pull out the concatenations and thereby introduce 
newlines into the string when they were not there before and doing so violates 
some contract downstream, THAT IS AN INCORRECT TRANSFORMATION.

We certainly agree that it would be a good thing if everything that might be 
downstream were in fact reasonably tolerant of newlines.

BUT IF YOU DON’T KNOW FOR SURE THAT WHAT IS DOWNSTREAM IS TOLERANT OF NEWLINES, 
AND YOU BLINDLY TRANSFORM A STRING CONCATENATION INTO A MULTI-LINE STRING 
LITERAL THAT INCLUDES NEWLINES WHERE THERE WERE NONE BEFORE, THAT IS A BAD 
THING.
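
A minimal sketch of the "contract downstream" point, for illustration only. The `Customer` class, its `setName` method, and the no-line-terminator rule are all hypothetical (the name is borrowed from an example used elsewhere in this thread); no real library is being described.

```java
class ContractDemo {
    // Hypothetical callee whose assumed contract forbids line terminators.
    static class Customer {
        private String name = "";

        void setName(String name) {
            if (name.indexOf('\n') >= 0 || name.indexOf('\r') >= 0) {
                throw new IllegalArgumentException("name must not contain line terminators");
            }
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    // Returns true when the callee accepts the argument, false when it rejects it.
    static boolean accepts(String candidate) {
        try {
            new Customer().setName(candidate);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String concatenated = "Ada " + "Lovelace"; // no newline introduced
        String blindlyRefactored = "Ada\nLovelace"; // newline where there was none before
        System.out.println(accepts(concatenated));
        System.out.println(accepts(blindlyRefactored));
    }
}
```

The first call is accepted and the second is rejected, which is the sense in which the blind transformation is incorrect for this particular callee.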

And a feature that makes it too easy to accidentally do a bad thing _might_ be 
considered an attractive nuisance, AS OPPOSED TO MY SCREAMING ALL-CAPS, WHICH 
ARE A REPULSIVE NUISANCE.

:-)

> Developers are chomping at the bit to do it, and if we cast doubt on the 
> ability then we're wasting everyone's time. We should deem it correct, and 
> 99% of the time no-one will care that newline characters exist in the string. 
> The rare library that subtly misbehaves or (and this is the better option) 
> actually blows up when seeing newlines will feel great pressure to become 
> more liberal in what it accepts, and that is a good thing.

And it would probably also be a good thing to have a way to say that a newline 
in the string literal should not be part of the string content.  C programmers 
are certainly quite used to sticking a backslash in front of a newline to mean 
“not really a newline here”.
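
As a rough illustration of the C-style convention mentioned here, the sketch below removes every backslash that is immediately followed by a line feed, together with that line feed. The helper name `joinContinuations` is hypothetical, and this models the idea on an already-built string; it is not a proposed language rule.

```java
class LineContinuation {
    // Hypothetical helper: a backslash directly before '\n' means
    // "not really a newline here", so both characters are dropped.
    static String joinContinuations(String s) {
        return s.replace("\\\n", "");
    }

    public static void main(String[] args) {
        // Each physical line ends with a backslash, then a line feed.
        String raw = "SELECT * \\\nFROM my_table \\\nWHERE a = b";
        System.out.println(joinContinuations(raw));
    }
}
```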



Re: String reboot - (1a) incidental whitespace

2019-04-22 Thread Alex Buckley
Nope, I don't think multi-line string literals are an attractive 
nuisance in any way. We should NOT deem it incorrect to refactor a 
sequence of concatenations into a single multi-line string literal. 
Developers are chomping at the bit to do it, and if we cast doubt on the 
ability then we're wasting everyone's time. We should deem it correct, 
and 99% of the time no-one will care that newline characters exist in 
the string. The rare library that subtly misbehaves or (and this is the 
better option) actually blows up when seeing newlines will feel great 
pressure to become more liberal in what it accepts, and that is a good 
thing.


Alex

On 4/19/2019 7:42 PM, Guy Steele wrote:

So is your point that multiline string literals may be an “attractive nuisance” 
in that they may make it too convenient for inattentive programmers to perform 
_incorrect_ refactoring?



On Apr 19, 2019, at 8:16 PM, Alex Buckley  wrote:


On 4/10/2019 8:22 AM, Jim Laskey wrote:
Line terminators:  When strings span lines, they do so using the line
terminators present in the source file, which may vary depending on what
operating system the file was authored.  Should this be an aspect of
multi-line-ness, or should we normalize these to a standard line
terminator?  It seems a little weird to treat string literals quite so
literally; the choice of line terminator is surely an incidental one.  I
think we're all comfortable saying "these should be normalized", but it's
worth bringing this up because it is merely one way in which incidental
artifacts of how the string is embedded in the source program force us
to interpret what the user meant.
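
One way to picture the normalization being discussed, as a sketch rather than the compiler's actual behavior: map CR LF and a lone CR to LF, so the resulting string does not depend on the operating system the source file was authored on. The class and method names are made up for this example.

```java
class LineTerminators {
    // Maps "\r\n" and a lone "\r" to "\n"; an existing "\n" is left untouched.
    static String normalize(String s) {
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String windows = "line 1\r\nline 2\r\n";
        String unix = "line 1\nline 2\n";
        System.out.println(normalize(windows).equals(unix));
    }
}
```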


No-one has commented on this, but it's important because some libraries are 
going to be surprised by the presence of line terminators, of any kind, in 
strings denoted by multi-line string literals.

To be clear, I agree with normalizing line terminators. And, I understand that 
any string could have contained line terminators thanks to escape sequences in 
traditional string literals. But, it was not common to see a \n except where 
multi-line-ness was expected or harmless. Going forward, who can guarantee that 
refactoring the argument of `prepareStatement` from a sequence of 
concatenations:

  try (PreparedStatement s = connection.prepareStatement(
      "SELECT * "
    + "FROM my_table "
    + "WHERE a = b "
  )) {
      ...
  }

to a multi-line string literal:

  try (PreparedStatement s = connection.prepareStatement(
      """SELECT *
         FROM my_table
         WHERE a = b"""
  )) {
      ...
  }

is behaviorally compatible for `prepareStatement`? It had no reason to expect 
\n in its string argument before.

(Hat tip: 
https://blog.jooq.org/2015/12/29/please-java-do-finally-support-multiline-strings/)
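
The difference between the two forms can be made visible without any database. In this sketch the multi-line variant is written with explicit \n escapes, on the assumption that this is the string a multi-line literal with the same layout would denote once incidental whitespace is removed; the class and method names are illustrative.

```java
class RefactoringDemo {
    // The original argument: a sequence of concatenations, no line terminators.
    static String concatenated() {
        return "SELECT * "
             + "FROM my_table "
             + "WHERE a = b ";
    }

    // Assumed value of the refactored multi-line literal, spelled with
    // escapes so the example does not depend on multi-line literal syntax.
    static String multiLine() {
        return "SELECT *\nFROM my_table\nWHERE a = b";
    }

    public static void main(String[] args) {
        System.out.println(concatenated().contains("\n"));
        System.out.println(multiLine().contains("\n"));
        System.out.println(concatenated().equals(multiLine()));
    }
}
```

The two strings are not equal, so whether the refactoring is behaviorally compatible depends entirely on how the callee treats the line terminators it now receives.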

Maybe `prepareStatement` will work fine. But someone somewhere is going to take a program 
with a sequence of 2000 concatenations and turn them into a huge multi-line string 
literal, and the inserted line terminators are going to cause memory pressure, and GC is 
going to take a little longer, and eventually this bug will be filed: "My system 
runs 5% slower because the source code changed a teeny tiny bit."

In reality, a few libraries will need fixing, and that will happen quickly 
because developers are very keen to use multi-line string literals. But it's 
fair to point out that while everyone is worrying about whitespace on the left 
of the literal, the line terminators to the right are a novel artifact too.

Alex




A library for implementing equals and hashCode

2019-04-22 Thread Liam Miller-Cushon
Please consider this proposal for a library to help implement equals and
hashCode.

The doc includes a discussion of the motivation for adding such an API to
the JDK, a map of the design space, and some thoughts on the subset of that
space which might be most interesting:

http://cr.openjdk.java.net/~cushon/amber/equivalence.html


Re: Alignment algorithm (was: Wrapping up the first two courses)

2019-04-22 Thread Brian Goetz
> 
> Why didn't they write as single quote string in the first place?  Having a 
> """ removing incidentals works in our favour for examples like;
> 
>   in quotes vs """  "in quotes"  """

I don’t think we know (or much care) they did or didn’t.  My point is, if our 
justification for stripping has to do with the 2D embedding of a ML string in 
the source code — which I think is where we are -- there is no 2D embedding 
here.  So stripping should have nothing to say about this case.






Re: Alignment algorithm (was: Wrapping up the first two courses)

2019-04-22 Thread Jim Laskey


> On Apr 22, 2019, at 1:04 PM, Brian Goetz  wrote:
> 
> 
> 
> On 4/22/2019 11:26 AM, Jim Laskey wrote:
>> 
>> 2. long count = lines().count();
>>    if (count == 1) {
>>        return strip();
>>    }
>> 
>> Single line strings (no line terminators) are simply stripped.
>> 
>> """  single line  """ ==> "single line"
> 
> I think we should reconsider this one.  The interpretation we settled on is: 
> we're willing to treat a multi-line string as being a sequence of lines, not 
> just of characters, and we're willing to strip incidental whitespace that 
> arises from accidents of how the string is embedded in the program.  But a 
> single-line string doesn't have any of that; I think it should be left alone, 
> regardless of quotes.  

Why didn't they write as single quote string in the first place?  Having a """ 
removing incidentals works in our favour for examples like;

in quotes vs """  "in quotes"  """

> 
>>
>> 3.  int outdent = lines().skip(1) ...
>> 
>> Ignoring first line, determine least number of leading whitespaces for all
>> non-blank lines.
>> 
>> String s = """
>> line 1..
>> line 2.
>> """;  ==> 16
> 
> I think we should reconsider whether a non-blank first line means that we 
> should consider any indentation on the first line too.  This has the 
> likely-beneficial side-effect that having a non-blank character immediately 
> following the """ effectively means "no stripping." 

Not sure this is a workable perspective. Opting out this way forces the user 
into some weird configurations that they have to unmangle to get a useful 
result.

String s = """opt-out
  line 2
   """;

Result:
opt-out
.line 2
..




Alignment algorithm (was: Wrapping up the first two courses)

2019-04-22 Thread Brian Goetz



On 4/22/2019 11:26 AM, Jim Laskey wrote:
Current "strip incidentals" algorithm as captured in String::align 
(string-tapas branch)


    public String align(int n) {
        if (isEmpty()) {
            return "";
        }
        long count = lines().count();
        if (count == 1) {
            return strip();
        }
        int outdent = lines().skip(1)
                             .filter(not(String::isBlank))
                             .mapToInt(String::indexOfNonWhitespace)
                             .min()
                             .orElse(0);
        String last = lines().skip(count - 1).findFirst().orElse("");
        boolean lastIsBlank = last.isBlank();
        if (lastIsBlank) {
            outdent = Integer.min(outdent, last.length());
        }
        return indentStream(lines(1, 1), n - outdent)
                   .map(s -> s.stripTrailing())
                   .collect(Collectors.joining("\n", "", lastIsBlank ? "\n" : ""));
    }


2. long count = lines().count();
   if (count == 1) {
       return strip();
   }

Single line strings (no line terminators) are simply stripped.

    """  single line  """ ==> "single line"


I think we should reconsider this one.  The interpretation we settled on 
is: we're willing to treat a multi-line string as being a sequence of 
lines, not just of characters, and we're willing to strip incidental 
whitespace that arises from accidents of how the string is embedded in 
the program.  But a single-line string doesn't have any of that; I think 
it should be left alone, regardless of quotes.




3.  int outdent = lines().skip(1) ...

Ignoring first line, determine least number of leading whitespaces for all
non-blank lines.

    String s = """
line 1..
line 2.
""";  ==> 16


I think we should reconsider whether a non-blank first line means that 
we should consider any indentation on the first line too.  This has the 
likely-beneficial side-effect that having a non-blank character 
immediately following the """ effectively means "no stripping."


Considering the indentation of the _last_ blank line gives the user more 
control while not requiring the user to distort indentation for common 
cases.  So +1 to "CDI".
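
A small sketch of what closing delimiter influence ("CDI") means for the outdent calculation. It assumes line terminators are already normalized to \n, and the class and method names are hypothetical; it is not the code on the string-tapas branch.

```java
class ClosingDelimiterInfluence {
    // Number of leading whitespace characters on a line.
    static int leadingWhitespace(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return i;
    }

    // Smallest indentation over the non-blank lines after the first one.
    // With useClosingLine set, a blank last line (the one holding the
    // closing delimiter) can lower the result further.
    static int outdent(String s, boolean useClosingLine) {
        String[] lines = s.split("\n", -1);
        int min = Integer.MAX_VALUE;
        for (int i = 1; i < lines.length; i++) {
            if (!lines[i].isBlank()) {
                min = Math.min(min, leadingWhitespace(lines[i]));
            }
        }
        if (min == Integer.MAX_VALUE) {
            min = 0;
        }
        String last = lines[lines.length - 1];
        if (useClosingLine && last.isBlank()) {
            min = Math.min(min, last.length());
        }
        return min;
    }

    public static void main(String[] args) {
        // Content indented by 16 columns, closing delimiter at column 12.
        String body = "\n" + " ".repeat(16) + "line 1\n"
                           + " ".repeat(16) + "line 2\n"
                           + " ".repeat(12);
        System.out.println(outdent(body, false)); // content only
        System.out.println(outdent(body, true));  // closing delimiter considered
    }
}
```

For this input the content-only result is 16 and the result with the closing line considered is 12, so moving the closing delimiter left keeps four columns of indentation in the string.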




Options;

2. a) Single line strings would be just stripLeading, but should be
consistent with multi-line and stripTrailing.

    """  single line  """ ==> "single line  "
    b) We could do nothing for single line.

    """  single line  """ ==> "  single line  "


I vote (b).

5. a) If we omit close delimiter influence, only the content influences the
indentation.  Loss of control by the user.


I think CDI is fine.

6. a) Could strip all leading/trailing blank lines, but awkward to recover the
LOI. Not recommending.


Agreed.

9. a) Always add a last \n. Loss of control by the user.


The current behavior pairs nicely with CDI.



Re: Wrapping up the first two courses

2019-04-22 Thread Jim Laskey
Current "strip incidentals" algorithm as captured in String::align 
(string-tapas branch)

public String align(int n) {
    if (isEmpty()) {
        return "";
    }
    long count = lines().count();
    if (count == 1) {
        return strip();
    }
    int outdent = lines().skip(1)
                         .filter(not(String::isBlank))
                         .mapToInt(String::indexOfNonWhitespace)
                         .min()
                         .orElse(0);
    String last = lines().skip(count - 1).findFirst().orElse("");
    boolean lastIsBlank = last.isBlank();
    if (lastIsBlank) {
        outdent = Integer.min(outdent, last.length());
    }
    return indentStream(lines(1, 1), n - outdent)
               .map(s -> s.stripTrailing())
               .collect(Collectors.joining("\n", "", lastIsBlank ? "\n" : ""));
}


1. if (isEmpty()) {
       return "";
   }
   
Empty strings returned as empty.

"" ==> ""

2. long count = lines().count();
   if (count == 1) {
       return strip();
   }

Single line strings (no line terminators) are simply stripped.

"""  single line  """ ==> "single line"

3.  int outdent = lines().skip(1) ...

Ignoring first line, determine least number of leading whitespaces for all
non-blank lines.

String s = """
line 1..
line 2.
""";  ==> 16

4. boolean lastIsBlank = last.isBlank();

Detect if last line is blank.

5. if (lastIsBlank) {
       outdent = Integer.min(outdent, last.length());
   }

If last line is blank, then check if it has least number of leading whitespaces.

String s = """
line 1
line 2
""";  ==> 12

* Breaking down the return statement

6. Stream<String> stream1 = lines(1, 1);

Break string into a stream of lines, stripping line terminators, stripping
first line if blank and stripping last line if blank.

line 1..
line 2.

7. Stream<String> stream2 = indentStream(stream1, n - outdent);

Remove indentation from each line in stream. It's possible that whitespace
gets added if n is larger than outdent.

line 1..
line 2.

8. Stream<String> stream3 = stream2.map(s -> s.stripTrailing());

Remove incidental trailing whitespace.

line 1
line 2

9. return stream3.collect(Collectors.joining("\n", "", lastIsBlank ? "\n" : ""));

Join lines with \n and add a \n at the end if the last line was blank.

line 1\n
line 2\n
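
The nine steps above rely on helpers that are not public API (`indexOfNonWhitespace`, `indentStream`, `lines(1, 1)`). The sketch below re-creates them with ordinary Java 11 methods so the walk-through can be run. It is an approximation under that assumption, takes the string as a parameter instead of being an instance method, and the names `AlignSketch` and `shift` are made up for this example.

```java
import java.util.List;
import java.util.stream.Collectors;

class AlignSketch {
    // Index of the first non-whitespace character, or the length if none.
    static int indexOfNonWhitespace(String line) {
        int i = 0;
        while (i < line.length() && Character.isWhitespace(line.charAt(i))) {
            i++;
        }
        return i;
    }

    // Stand-in for indentStream on one line: a positive delta adds spaces,
    // a negative delta removes at most that many leading whitespace chars.
    static String shift(String line, int delta) {
        if (delta >= 0) {
            return " ".repeat(delta) + line;
        }
        return line.substring(Math.min(-delta, indexOfNonWhitespace(line)));
    }

    static String align(String s, int n) {
        if (s.isEmpty()) {                                    // step 1
            return "";
        }
        List<String> lines = s.lines().collect(Collectors.toList());
        if (lines.size() == 1) {                              // step 2
            return s.strip();
        }
        int outdent = lines.stream().skip(1)                  // step 3
                .filter(line -> !line.isBlank())
                .mapToInt(AlignSketch::indexOfNonWhitespace)
                .min()
                .orElse(0);
        String last = lines.get(lines.size() - 1);
        boolean lastIsBlank = last.isBlank();                 // step 4
        if (lastIsBlank) {                                    // step 5
            outdent = Math.min(outdent, last.length());
        }
        // Step 6: drop the first line if blank and the last line if blank.
        int from = lines.get(0).isBlank() ? 1 : 0;
        int to = lastIsBlank ? lines.size() - 1 : lines.size();
        final int delta = n - outdent;
        return lines.subList(from, to).stream()
                .map(line -> shift(line, delta))              // step 7
                .map(String::stripTrailing)                   // step 8
                .collect(Collectors.joining("\n", "", lastIsBlank ? "\n" : "")); // step 9
    }

    public static void main(String[] args) {
        String body = "\n" + " ".repeat(16) + "line 1  \n"
                           + " ".repeat(16) + "line 2 \n"
                           + " ".repeat(12);
        System.out.print(align(body, 0));
    }
}
```

With content at column 16 and the closing line at column 12, the outdent is 12, so each content line keeps four leading spaces and loses its trailing whitespace.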


Options;

2. a) Single line strings would be just stripLeading, but should be consistent
with multi-line and stripTrailing.

"""  single line  """ ==> "single line  "

b) We could do nothing for single line.

"""  single line  """ ==> "  single line  "

3. a) If we include open delimiter influence, the equivalent library method
can not duplicate the influence unless the user supplied the first line 
indentation.

5. a) If we omit close delimiter influence, only the content influences the
indentation.  Loss of control by the user.

String s = """
line 1
line 2
   """;

==> 

line 1\n
line 2\n

6. a) Could strip all leading/trailing blank lines, but awkward to recover the
LOI. Not recommending.

String s = """


line 1
line 2


""";

==> 

 String s = "\n".repeat(3) + """
line 1
line 2
""" + "\n".repeat(3);
   
8. a) Not stripping trailing space might leave debris the user didn't expect,
but still a choice.

9. a) Always add a last \n. Loss of control by the user.
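
For the single-line choices under option 2, the three candidate treatments can be compared side by side. This sketch simply maps each one to an existing `String` method (Java 11 or later), following the example results shown for 2a and 2b; the class and method names are illustrative.

```java
class SingleLineOptions {
    // Step 2 as currently implemented: strip both ends.
    static String current(String s) {
        return s.strip();
    }

    // Option 2a as shown in its example: only leading whitespace is removed.
    static String optionA(String s) {
        return s.stripLeading();
    }

    // Option 2b: do nothing for single-line strings.
    static String optionB(String s) {
        return s;
    }

    public static void main(String[] args) {
        String s = "  single line  ";
        System.out.println("[" + current(s) + "]");
        System.out.println("[" + optionA(s) + "]");
        System.out.println("[" + optionB(s) + "]");
    }
}
```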








> On Apr 22, 2019, at 10:15 AM, Brian Goetz  wrote:
> 
>> The main thing Brian is waiting for, though, is not lots of new ideas,
>> but rather a consensus that (a) we can treat leading whitespace outside
>> of a given rectangle as syntax-not-payload (thus stripped), and (b) that
>> we should provide a way for programmers to opt out of the stripping
>> (making all space into syntax-and-payload).  It feels to me like we
>> have arrived there and are driving around the parking lot, checking
>> out all the parking spots, worrying that we will miss the best one.
> 
> Glad to hear it :)
> 
> So, I posit, we have consensus over the following things: 
> 
>  - Multi-line strings are a useful feature on their own
>  - Using “fat” delimiters for multi-line strings is practical and intuitive
>  - Multi-line string literals share the same escape language as single-line 
> string literals
>  - Newlines in MLSLs should be normalized to \n
>  - There exists a reasonable alignment algorithm, which users can learn 
> easily enough, and can be captured as a library m

Draft JEP on records and sealed types

2019-04-22 Thread Brian Goetz
For review.

https://bugs.openjdk.java.net/browse/JDK-8222777 





Wrapping up the first two courses

2019-04-22 Thread Brian Goetz
> The main thing Brian is waiting for, though, is not lots of new ideas,
> but rather a consensus that (a) we can treat leading whitespace outside
> of a given rectangle as syntax-not-payload (thus stripped), and (b) that
> we should provide a way for programmers to opt out of the stripping
> (making all space into syntax-and-payload).  It feels to me like we
> have arrived there and are driving around the parking lot, checking
> out all the parking spots, worrying that we will miss the best one.

Glad to hear it :)

So, I posit, we have consensus over the following things: 

 - Multi-line strings are a useful feature on their own
 - Using “fat” delimiters for multi-line strings is practical and intuitive
 - Multi-line string literals share the same escape language as single-line 
string literals
 - Newlines in MLSLs should be normalized to \n
 - There exists a reasonable alignment algorithm, which users can learn easily 
enough, and can be captured as a library method on String (some finer points to 
be hammered out)
 - To the extent the language performs alignment, it should be consistent with 
what the library-based version does, so that users can opt out and opt back in 
again
 - In the common case, a MLSL will be a combination of some intended and some 
incidental indentation, and it is reasonable for the default to be that the 
language attempts to normalize away the incidental indentation
 - There needs to be an opt-out, for the cases where alignment is not the 
default the user wants

(A useful way to frame the discussion we had regarding linguistic alignment is: 
whether a string literal is “one dimensional” or “two dimensional.”  The 1D 
interpretation says a string literal is just a sequence of characters between 
two delimiters; the 2D interpretation says that it has an inherent line 
structure that could be manipulated directly.)
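
The two interpretations can be seen on an ordinary string. In this sketch (Java 11 or later, names made up for illustration) the 1D view is the character sequence and the 2D view is the list of lines that `String::lines` exposes.

```java
import java.util.List;
import java.util.stream.Collectors;

class Dimensions {
    // 1D: just a count of characters between the delimiters.
    static int oneDimensional(String s) {
        return s.length();
    }

    // 2D: the same content seen as a sequence of lines.
    static List<String> twoDimensional(String s) {
        return s.lines().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String s = "line 1\n  line 2\n";
        System.out.println(oneDimensional(s));
        System.out.println(twoDimensional(s));
    }
}
```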

What I like about this proposal — much more than with the previous round — is 
that the two flavors of string literal (thin and fat) are clearly projections 
of the same feature, and their differences pertain solely to their essential 
difference — multi-line-ness.

I will leave it to Jim to summarize the current state of the alignment 
algorithm, and any open questions (e.g., closing delimiter influence, treatment 
of single-line strings, etc) that may still be lingering, but these are not 
blockers to placing our order for the first two courses.  

I am still having a hard time getting comfortable with Guy’s proposal to use 
more “envelope” here — I think others have expressed similar discomfort.  If I 
had to put my finger on it, it is that being able to cut and paste in and out 
is such a big part of what is currently missing, and there is insufficient 
trust that there would be ubiquitous IDE support in all the various ways that 
people edit Java code.  But given that this is framed as “let’s carve out some 
extra envelope space”, we can keep discussing this even as we move forward.  

We still need to make some decisions on syntax; the main one that is currently 
relevant being opt-out. (For any syntax issues, please create another thread.)  
Jim hinted at this earlier: use an escape sequence that is stripped out of the 
string but means “no alignment.”  Something like:

 String s = """\-
 Leave me just the way
 you found me"""

Obviously there is room to argue over the specific escape sequence, so let’s 
put this in the “open questions” bucket.
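
A sketch of how such an opt-out marker might be interpreted, assuming the marker is the two characters `\-` at the very start of the content and that it is removed from the result. Both assumptions, and the names `OptOutSketch` and `interpret`, are hypothetical; the aligner is passed in as a parameter because the alignment algorithm itself is still being settled.

```java
import java.util.function.UnaryOperator;

class OptOutSketch {
    // The two characters backslash and hyphen (assumed marker).
    static final String OPT_OUT = "\\-";

    // If the content starts with the marker, strip the marker and leave the
    // rest untouched; otherwise hand the content to the supplied aligner.
    static String interpret(String body, UnaryOperator<String> aligner) {
        if (body.startsWith(OPT_OUT)) {
            return body.substring(OPT_OUT.length());
        }
        return aligner.apply(body);
    }

    public static void main(String[] args) {
        // String::strip is only a stand-in for a real alignment method.
        System.out.println("[" + interpret("\\-  keep me  ", String::strip) + "]");
        System.out.println("[" + interpret("  align me  ", String::strip) + "]");
    }
}
```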
   
There was another proposal, which was to use a prefix character:

String s = a"…" // opt into alignment
String s = r"…" // raw string

I’d like to put this one to bed quickly, because I see it as having a number of 
issues.  

Having a set of prefix characters is one of those features that starts off weak 
and scales badly from there :). With only two prefixes, as suggested above, it 
has a feel of overgeneralization, but with a large number of candidate 
prefixes, it gets worse, because invariably as such a feature gets more 
complicated, there are interactions.  One need look only at a Perl regex that 
uses multiple modifiers:

/foo*/egimosx

to realize that what started as a simple feature (I think initially just `g`) 
had grown out of control.  

More importantly, of the two prefixes suggested, one doesn’t really make sense. 
 And that is: while the notion of “raw” string is attractive, one of the things 
that tripped us up the first time around is the belief that “raw” is a binary 
thing.  In reality, raw-ness comes in degrees — how hard you have to work to 
break out of the “string of uninterpreted characters” mode.  (Note: please 
let’s not start a discussion on raw strings; we’re wrapping up our orders for 
the first courses now.  I raise this only to put to bed a syntax choice 
predicated on the assumption that raw-ness is a binary characteristic.)

If we’re pursuing align-by-default, we should consider a different name for the 
align() method; the name was origin