Re: [Issue 8660] New: Unclear semantics of array literals of char type, vs string literals

2012-09-14 Thread monarch_dodra

On Friday, 14 September 2012 at 11:28:04 UTC, Don wrote:
--- Comment #0 from Don clugd...@yahoo.com.au 2012-09-14 04:28:17 PDT ---
Array literals of char type have completely different semantics from string
literals. In module scope:

char[] x = ['a'];  // OK -- array literals can have an implicit .dup
char[] y = "b";    // illegal
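
For anyone who wants to check the typing difference described above, here is a minimal sketch. It tests the declarations inside a function literal rather than at module scope (that is an assumption of this sketch, made so the checks can be written as static asserts), and it only looks at types, not at where the data ends up.

```d
// The two kinds of literal have different types.
static assert(is(typeof("b") == string));      // immutable(char)[]
static assert(is(typeof(['a']) == char[]));    // mutable element type

// The array literal is accepted as an initializer for a mutable char[].
static assert( __traits(compiles, { char[] x = ['a']; }));

// The string literal does not convert implicitly to char[].
static assert(!__traits(compiles, { char[] y = "b"; }));

// An explicit copy makes the initialization legal.
static assert( __traits(compiles, { char[] y = "b".dup; }));

void main() {}
```

If any of the static asserts fails, compilation stops at that line, so a clean build is the whole test.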

A second difference is that string literals have a trailing \0. It's important
for compatibility with C, but is barely mentioned in the spec. The spec does
not state if the trailing \0 is still present after operations like
concatenation.
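
A small sketch of the C-compatibility point, for readers who have not run into it. The helper names literalLength and concatLength are made up for this example. Since the spec does not promise a terminator after concatenation, the second helper asks for one explicitly with std.string.toStringz instead of relying on whatever the concatenation leaves behind.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// A literal converts implicitly to const(char)* and can be handed to C as is.
size_t literalLength()
{
    const(char)* p = "hello";
    return strlen(p);
}

// For a run-time concatenation, request a terminated copy explicitly.
size_t concatLength(string a, string b)
{
    string s = a ~ b;
    return strlen(toStringz(s));
}

void main()
{
    assert(literalLength() == 5);
    assert(concatLength("ab", "cd") == 4);
}
```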


I think this is the normal behavior actually. When you write
char[] x = ['a'];, you are not actually newing (or dup-ing) any data. You
are just letting x point to a stack allocated array of chars. So the
assignment is legal (but kind of unsafe actually, if you ever leak x).


On the other hand, you can't bind y to an array of immutable 
chars, as that would subvert the type system.


This, on the other hand, is legal:
char[] y = "b".dup;

I do not know how to initialize a char[] on the stack though (apart from
writing ['h', 'e', 'l', ... ]). If UTF-8 also gets involved, then I don't
know of any workaround.
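
One possible workaround, offered as a sketch and not as the recommended idiom: a fixed-size char array can be initialized from a string literal of the same length, and slicing it gives a mutable char[] over stack storage. The function name stackDemo is invented for the example. The length in the type has to match the number of UTF-8 code units in the literal, which is the part that gets awkward once non-ASCII text is involved.

```d
// Builds a mutable buffer on the stack, edits it through a slice,
// and returns an immutable heap copy of the result.
string stackDemo()
{
    // Fixed-size array on the stack, initialized from a 5-code-unit literal.
    char[5] buf = "hello";

    // Mutable slice over the stack storage; must not outlive buf.
    char[] s = buf[];
    s[0] = 'j';

    // idup copies to the heap, so the result is safe to return.
    return buf[].idup;
}

void main()
{
    assert(stackDemo() == "jello");
}
```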


I think a good solution would be to request the m prefix for literals,
which would initialize them as mutable:

x = m"some mutable string";

A second difference is that string literals have a trailing \0. It's important
for compatibility with C, but is barely mentioned in the spec. The spec does
not state if the trailing \0 is still present after operations like
concatenation.

CTFE can use either, but it has to choose one. This leads to odd effects:

string foo(bool b) {
    string c = ['a'];
    string d = "a";
    if (b)
        return c ~ c;
    else
        return c ~ d;
}

char[] x = foo(true);   // ok
char[] y = foo(false);  // rejected!

This is really bizarre because at run time, there is no difference between
foo(true) and foo(false). They both return a slice of something allocated on
the heap. I think x = foo(true) should be rejected as well, it has an implicit
cast from immutable to mutable.
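
To make the "no difference" point concrete, here is a sketch that forces compile-time evaluation of both branches and compares the values. It does not try to reproduce the accept/reject difference itself, since that depends on the compiler version; it only shows that the two results are equal strings of the same type, and that an explicit .dup is accepted for either one.

```d
string foo(bool b)
{
    string c = ['a'];   // array literal converted to immutable
    string d = "a";     // string literal
    if (b)
        return c ~ c;
    else
        return c ~ d;
}

// static assert forces CTFE; both branches yield the same value and type.
static assert(foo(true) == "aa");
static assert(foo(false) == "aa");
static assert(is(typeof(foo(true)) == string));

void main()
{
    // Explicit copies work regardless of which literal kind was used.
    char[] x = foo(true).dup;
    char[] y = foo(false).dup;
    assert(x == y);

    // The copies are independent of each other.
    x[0] = 'b';
    assert(y == "aa");
}
```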


Good point. For anybody reading though, the actual code example should be

enum char[] x = foo(true);   // ok
enum char[] y = foo(false);  // rejected!

I think the best way to clean up this mess would be to convert char[] array
literals into string literals whenever possible. This would mean that string
literals may occasionally be of *mutable* type! This would mean that whenever
they are assigned to a mutable variable, an implicit .dup gets added (just as
happens now with array literals). The trailing zero would not be duped.

ie:
A string literal of mutable type should behave the way a char[] array literal
behaves now.
A char[] array literal of immutable type should behave the way a string literal
does now.


I think this would work with my m suggestion


Re: [Issue 8660] New: Unclear semantics of array literals of char type, vs string literals

2012-09-14 Thread Don Clugston

On 14/09/12 14:50, monarch_dodra wrote:

On Friday, 14 September 2012 at 11:28:04 UTC, Don wrote:

--- Comment #0 from Don clugd...@yahoo.com.au 2012-09-14 04:28:17 PDT ---
Array literals of char type have completely different semantics from string
literals. In module scope:

char[] x = ['a'];  // OK -- array literals can have an implicit .dup
char[] y = "b";    // illegal

A second difference is that string literals have a trailing \0. It's important
for compatibility with C, but is barely mentioned in the spec. The spec does
not state if the trailing \0 is still present after operations like
concatenation.


I think this is the normal behavior actually. When you write char[] x =
['a'];, you are not actually newing (or dup-ing) any data. You are
just letting x point to a stack allocated array of chars.


I don't think you've looked at the compiler source code...
The dup is in e2ir.c:4820.
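
A way to observe this from user code, as a sketch: it assumes the behaviour the language spec describes for function-scope array literals assigned to a dynamic array, namely that each evaluation allocates a fresh array on the GC heap. The function name make is invented for the example.

```d
// Returns a char[] initialized from an array literal. If the data lived on
// the stack, returning it would be a dangling slice; with a heap allocation
// per evaluation, every call hands back independent storage.
char[] make()
{
    char[] x = ['a', 'b'];
    return x;
}

void main()
{
    char[] a = make();
    char[] b = make();

    // Two calls, two distinct allocations.
    assert(a.ptr !is b.ptr);

    // Writing through one slice does not affect the other.
    a[0] = 'z';
    assert(b == "ab");
}
```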


So the
assignment is legal (but kind of unsafe actually, if you ever leak x).


Yes it's legal. In my view it is a design mistake in the language.
The issue now is how to minimize the damage from it.



On the other hand, you can't bind y to an array of immutable chars, as
that would subvert the type system.

This, on the other hand, is legal:
char[] y = "b".dup;

I do not know how to initialize a char[] on the stack though (apart
from writing ['h', 'e', 'l', ... ]). If UTF-8 also gets involved, then I
don't know of any workaround.

I think a good solution would be to request the m prefix for literals,
which would initialize them as mutable:
x = m"some mutable string";


A second difference is that string literals have a trailing \0. It's important
for compatibility with C, but is barely mentioned in the spec. The spec does
not state if the trailing \0 is still present after operations like
concatenation.

CTFE can use either, but it has to choose one. This leads to odd effects:

string foo(bool b) {
    string c = ['a'];
    string d = "a";
    if (b)
        return c ~ c;
    else
        return c ~ d;
}

char[] x = foo(true);   // ok
char[] y = foo(false);  // rejected!

This is really bizarre because at run time, there is no difference between
foo(true) and foo(false). They both return a slice of something allocated on
the heap. I think x = foo(true) should be rejected as well, it has an implicit
cast from immutable to mutable.


Good point. For anybody reading though, the actual code example should be
enum char[] x = foo(true);   // ok
enum char[] y = foo(false);  // rejected!


No it should not.
The code example was correct. These are static variables.




I think the best way to clean up this mess would be to convert char[] array
literals into string literals whenever possible. This would mean that string
literals may occasionally be of *mutable* type! This would mean that whenever
they are assigned to a mutable variable, an implicit .dup gets added (just as
happens now with array literals). The trailing zero would not be duped.
ie:
A string literal of mutable type should behave the way a char[] array literal
behaves now.
A char[] array literal of immutable type should behave the way a string literal
does now.


I think this would work with my m suggestion


Not necessary. This is only a question about what happens with the 
compiler internals.


Re: [Issue 8660] New: Unclear semantics of array literals of char type, vs string literals

2012-09-14 Thread monarch_dodra

On Friday, 14 September 2012 at 15:00:29 UTC, Don Clugston wrote:

On 14/09/12 14:50, monarch_dodra wrote:

On Friday, 14 September 2012 at 11:28:04 UTC, Don wrote:
--- Comment #0 from Don clugd...@yahoo.com.au 2012-09-14 04:28:17 PDT ---
Array literals of char type have completely different semantics from string
literals. In module scope:

char[] x = ['a'];  // OK -- array literals can have an implicit .dup
char[] y = "b";    // illegal

A second difference is that string literals have a trailing \0. It's important
for compatibility with C, but is barely mentioned in the spec. The spec does
not state if the trailing \0 is still present after operations like
concatenation.


I think this is the normal behavior actually. When you write
char[] x = ['a'];, you are not actually newing (or dup-ing) any data. You
are just letting x point to a stack allocated array of chars.


I don't think you've looked at the compiler source code...
The dup is in e2ir.c:4820.


So the assignment is legal (but kind of unsafe actually, if you ever
leak x).


Yes it's legal. In my view it is a design mistake in the language.
The issue now is how to minimize the damage from it.


Thank you for taking the time to educate me. I still have a bit 
of trouble with static vs dynamic array initializations: Things 
don't work quite as in C++, which is confusing me. I'll need to 
study a bit harder how array initializations work. Good news is 
I'm learning.


I think ALL my comments were wrong.

In that case, you are right, since:
char[] x = "a".dup;
is legal.

Good point. For anybody reading though, the actual code example should be

enum char[] x = foo(true);   // ok
enum char[] y = foo(false);  // rejected!


No it should not.
The code example was correct. These are static variables.


I hadn't thought of static variables: I placed your code in a 
main, and both produced a compilation error. The enums reproduced 
the issue for me however.



I think this would work with my m suggestion


Not necessary. This is only a question about what happens with 
the compiler internals.


Yes.