Re: Programming in D book is about 88% translated

2014-03-10 Thread Lucifer

On Thursday, 6 March 2014 at 02:57:51 UTC, Puming wrote:
Hi, I am translating this book on 
http://git.oschina.net/lucifer2031/Programming-in-D-in-Chinese.


My email is 786325...@qq.com. Can we talk about this?


Thanks Ali,

I've sent you an email.

On Wednesday, 5 March 2014 at 19:05:51 UTC, Ali Çehreli wrote:

On 03/04/2014 09:58 PM, Puming wrote:

 I'd like to translate your book into Chinese, can we talk
about this?

Of course. :) You can email me at acehr...@yahoo.com or start 
translating from the sources:


 https://code.google.com/p/ddili/source/checkout

Here are the build instructions:

 https://code.google.com/p/ddili/source/browse/trunk/README

Somebody has created a git clone of that svn repo but I 
haven't gotten to use that yet. Sorry...


Ali


Re: Bounty for -minimal compiler flag

2014-03-10 Thread Rel

So? Is anyone working on these features?


Re: Article: Increasing the D Compiler Speed by Over 75%

2014-03-10 Thread Walter Bright

On 8/3/2013 1:54 PM, Andrej Mitrovic wrote:

On 8/3/13, Walter Bright newshou...@digitalmars.com wrote:

/delexe

  http://www.digitalmars.com/ctg/ctgLinkSwitches.html#delexecutable


Note that this switch doesn't actually work. We've talked about this
somewhere in an Optlink-related bugzilla issue.



I don't recall seeing it in bugzilla. If it isn't there, please add it.


Re: Emacs users: flycheck-dmd-dub

2014-03-10 Thread w0rp
Cool, I should get something similar for Vim. I keep finding 
myself in the situation where the linting (I think through DMD) 
with syntastic doesn't know where my source files are a lot of 
the time, so I get a lot of problems with it not knowing where to 
find imports.


Mono-D v1.7 - Struct init member completion parser refactorings

2014-03-10 Thread Alexander Bothe

Hi everyone,

just wanted to drop a small sign of life of Mono-D.

http://mono-d.alexanderbothe.com/mono-d-v1-7-struct-init-member-completion-massive-parse-improvements/


Cheers,
Alex


Re: Programming in D book is about 88% translated

2014-03-10 Thread Puming

Hi Lucifer,

Seems like you've got a team doing this :-)

Hope we can collaborate on this translation.

On Monday, 10 March 2014 at 10:12:47 UTC, Lucifer wrote:

On Thursday, 6 March 2014 at 02:57:51 UTC, Puming wrote:
Hi, I am translating this book on 
http://git.oschina.net/lucifer2031/Programming-in-D-in-Chinese.


My email is 786325...@qq.com. Can we talk about this?


Thanks Ali,

I've sent you an email.

On Wednesday, 5 March 2014 at 19:05:51 UTC, Ali Çehreli wrote:

On 03/04/2014 09:58 PM, Puming wrote:

 I'd like to translate your book into Chinese, can we talk
about this?

Of course. :) You can email me at acehr...@yahoo.com or start 
translating from the sources:


https://code.google.com/p/ddili/source/checkout

Here are the build instructions:

https://code.google.com/p/ddili/source/browse/trunk/README

Somebody has created a git clone of that svn repo but I 
haven't gotten to use that yet. Sorry...


Ali




Re: Mono-D v1.7 - Struct init member completion parser refactorings

2014-03-10 Thread Puming

Hi Alexander,

Thanks for the great work. I'm always using Mono-D.

On Monday, 10 March 2014 at 20:37:31 UTC, Alexander Bothe wrote:

Hi everyone,

just wanted to drop a small sign of life of Mono-D.

http://mono-d.alexanderbothe.com/mono-d-v1-7-struct-init-member-completion-massive-parse-improvements/


Cheers,
Alex


Re: DIP 57: static foreach

2014-03-10 Thread Kenji Hara
2014-03-10 6:31 GMT+09:00 Timon Gehr timon.g...@gmx.ch:

 http://wiki.dlang.org/DIP57

 Thoughts?


From the Semantics section:

 For static foreach statements, break and continue are supported and
treated like for foreach statements over tuples.

This is a questionable sentence. In a foreach over a tuple, break
and continue have no effect on the unrolling.

void main()
{
    import std.typetuple, std.stdio;

    foreach (i; TypeTuple!(1, 2, 3))
    {
        static if (i == 2) continue;
        else static if (i == 3) break;

        pragma(msg, "CT: i = ", i); // prints 1, 2, and 3 in CT
        writeln("RT: i = ", i); // prints only 1 in RT
    }
}

So, I think that static foreach *cannot* support break and continue in the
same way as foreach with tuples.

Kenji Hara


Re: Major performance problem with std.array.front()

2014-03-10 Thread Nick Sabalausky

On 3/10/2014 12:23 AM, Walter Bright wrote:

On 3/9/2014 9:19 PM, Nick Sabalausky wrote:

On 3/9/2014 6:31 PM, Walter Bright wrote:

On 3/9/2014 6:08 AM, Marc Schütz schue...@gmx.net wrote:

Also, `byCodeUnit` and `byCodePoint` would probably be better names
than `raw`
and `decode`, to match the already existing `byGrapheme` in std.uni.


I'd vastly prefer 'byChar', 'byWchar', 'byDchar' for each of string,
wstring, dstring, and InputRange!char, etc.


'byCodePoint' and 'byDchar' are the same. However, 'byCodeUnit' is
completely
different from anything else:

string  str;
wstring wstr;
dstring dstr;

(str|wstr|dstr).byChar  // Always range of char
(str|wstr|dstr).byWchar // Always range of wchar
(str|wstr|dstr).byDchar // Always range of dchar

str.representation  // Range of ubyte
wstr.representation // Range of ushort
dstr.representation // Range of uint

str.byCodeUnit  // Range of char
wstr.byCodeUnit // Range of wchar
dstr.byCodeUnit // Range of dchar


I don't see much point to the latter 3.



Do you mean:

1. You don't see the point to iterating by code unit?
2. You don't see the point to 'byCodeUnit' if we have 'representation'?
3. You don't see the point to 'byCodeUnit' if we have 
'byChar/byWchar/byDchar'?

4. You don't see the point to having 'byCodeUnit' work on UTF-32 dstrings?

Responses:

1. Iterating by code unit: Useful for tweaking performance anytime 
decoding is unnecessary. For example, parsing a grammar where the bulk 
of the keywords and operators are ASCII. (Occasional uses of Unicode, 
like unicode whitespace, can of course be handled easily enough by the 
lexer FSM).


2. 'byCodeUnit' if we have 'representation': This one I have trouble 
answering since I'm still unclear on the purpose of 'representation' (I 
wasn't even aware of it until a few days ago.) I've been assuming 
there's some specific use-case I've overlooked where it's useful to 
iterate by code unit *while* treating the code units as if they weren't 
UTF-8/16/32 at all. But since 'representation' is called *on* a 
string/wstring/dstring, they should already be UTF-8/16/32 anyway, not 
some other encoding that would necessitate using integer types. Or maybe 
it's just for working around problems with the auto-verification being 
too eager (I've run into those)? I admit I don't quite get 'representation'.


3. 'byCodeUnit' if we have 'byChar/byWchar/byDchar': To avoid a static 
if chain every time you want to use code units inside generic code. 
Also, so in non-generic code you can change your data type without 
updating instances of 'by*char'.


4. Having 'byCodeUnit' work on UTF-32 dstrings: So generic code working 
on code units doesn't have to special-case UTF-32.
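
To make the three views in this post concrete, here is a minimal sketch. It assumes a later Phobos release than the one under discussion: `std.string.representation` already existed at the time, while `std.utf.byCodeUnit` and `std.utf.byDchar` are used here as they appear in later Phobos and may differ in detail from what the thread was proposing.

```d
import std.range.primitives : ElementType, walkLength;
import std.string : representation;
import std.traits : Unqual;
import std.utf : byCodeUnit, byDchar;

void main()
{
    string  str  = "caf\u00E9";   // U+00E9 takes two UTF-8 code units
    wstring wstr = "caf\u00E9"w;
    dstring dstr = "caf\u00E9"d;

    // representation: the same data, typed as plain integers.
    static assert(is(typeof(str.representation())  == immutable(ubyte)[]));
    static assert(is(typeof(wstr.representation()) == immutable(ushort)[]));
    static assert(is(typeof(dstr.representation()) == immutable(uint)[]));

    // byCodeUnit: the element type follows the string's own encoding,
    // so generic code needs no static-if chain over char/wchar/dchar.
    static assert(is(Unqual!(ElementType!(typeof(str.byCodeUnit())))  == char));
    static assert(is(Unqual!(ElementType!(typeof(wstr.byCodeUnit()))) == wchar));
    static assert(is(Unqual!(ElementType!(typeof(dstr.byCodeUnit()))) == dchar));

    // Code units versus code points for the same text.
    assert(str.byCodeUnit.length == 5);
    assert(wstr.byCodeUnit.length == 4);
    assert(str.byDchar.walkLength == 4);
}
```

The point of the sketch is response 3 above: `byCodeUnit` names the operation, while `byChar`/`byWchar`/`byDchar` name the target encoding.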





Re: ddox-generated Phobos documentation is available for review

2014-03-10 Thread Philpax
Fantastic! The organization makes it easy to find the right tool 
for the job.


This is probably nitpicking, but in std.algorithm and other 
modules ( http://dlang.org/library/std/algorithm.html ) there are 
multiple overloads of the same function (splitter, reverse, etc); 
it'd be nice if these could be organized into their own 
sub-categories, so there's no unnecessary visual redundancy.


There's also the library list which displays all modules; do the 
internal modules (druntime, etc) need to be exposed? It might be 
nicer for the end-user for these to be hidden, or kept in their 
own category.


Otherwise, very nice! :)


Re: ddox-generated Phobos documentation is available for review

2014-03-10 Thread Peter Alexander
On Monday, 10 March 2014 at 03:44:54 UTC, Andrei Alexandrescu 
wrote:

http://dlang.org/library


Looking good!

The module list currently shows deeply nested modules (e.g. 
std.c.stdio) before less nested ones (std.stdio). I think it 
should be the other way round, otherwise you have all the std.c.* 
modules listed first.


Re: ddox-generated Phobos documentation is available for review

2014-03-10 Thread Dmitry Olshansky

10-Mar-2014 07:44, Andrei Alexandrescu пишет:

Consider it alpha quality. Please don't announce yet before we put it in
good shape.

https://github.com/D-Programming-Language/dlang.org/pull/516

http://dlang.org/library

http://dlang.org/library-prerelease

I needed to change quite a bit about the makefile. It was building
everything over and over again, and it's _slow_.

Some functions are not ready, compare e.g.

http://dlang.org/library/std/algorithm/balancedParens.html

with

http://dlang.org/library/std/algorithm/any.html


Andrei



The front page shouldn't contain std.internal.* stuff and we probably 
need to adjust DDocs so that all modules have proper blurb text.


--
Dmitry Olshansky


Re: ddox-generated Phobos documentation is available for review

2014-03-10 Thread Nicolas Sicard
On Monday, 10 March 2014 at 03:44:54 UTC, Andrei Alexandrescu 
wrote:
Consider it alpha quality. Please don't announce yet before we 
put it in good shape.


https://github.com/D-Programming-Language/dlang.org/pull/516

http://dlang.org/library

http://dlang.org/library-prerelease

I needed to change quite a bit about the makefile. It was 
building everything over and over again, and it's _slow_.


Some functions are not ready, compare e.g.

http://dlang.org/library/std/algorithm/balancedParens.html

with

http://dlang.org/library/std/algorithm/any.html


Andrei


For me it's a real improvement! One thing: symbol names (modules, 
functions, etc.) shouldn't be hyphenated, especially in tables.


Nicolas


Re: Major performance problem with std.array.front()

2014-03-10 Thread Walter Bright

On 3/10/2014 12:09 AM, Nick Sabalausky wrote:

On 3/10/2014 12:23 AM, Walter Bright wrote:

On 3/9/2014 9:19 PM, Nick Sabalausky wrote:

On 3/9/2014 6:31 PM, Walter Bright wrote:

On 3/9/2014 6:08 AM, Marc Schütz schue...@gmx.net wrote:

Also, `byCodeUnit` and `byCodePoint` would probably be better names
than `raw`
and `decode`, to match the already existing `byGrapheme` in std.uni.


I'd vastly prefer 'byChar', 'byWchar', 'byDchar' for each of string,
wstring, dstring, and InputRange!char, etc.


'byCodePoint' and 'byDchar' are the same. However, 'byCodeUnit' is
completely
different from anything else:

string  str;
wstring wstr;
dstring dstr;

(str|wstr|dstr).byChar  // Always range of char
(str|wstr|dstr).byWchar // Always range of wchar
(str|wstr|dstr).byDchar // Always range of dchar

str.representation  // Range of ubyte
wstr.representation // Range of ushort
dstr.representation // Range of uint

str.byCodeUnit  // Range of char
wstr.byCodeUnit // Range of wchar
dstr.byCodeUnit // Range of dchar


I don't see much point to the latter 3.



Do you mean:

1. You don't see the point to iterating by code unit?
2. You don't see the point to 'byCodeUnit' if we have 'representation'?
3. You don't see the point to 'byCodeUnit' if we have 'byChar/byWchar/byDchar'?
4. You don't see the point to having 'byCodeUnit' work on UTF-32 dstrings?


(3)


3. 'byCodeUnit' if we have 'byChar/byWchar/byDchar': To avoid a static if
chain every time you want to use code units inside generic code. Also, so in
non-generic code you can change your data type without updating instances of
'by*char'.


Just not sure I see a use for that.



Re: Major performance problem with std.array.front()

2014-03-10 Thread ponce

On Sunday, 9 March 2014 at 21:14:30 UTC, Nick Sabalausky wrote:
With all due respect, D string type is exclusively for UTF-8 strings.
If it is not valid UTF-8, it should never have been a D string in the
first place. In the other cases, ubyte[] is there.


This is an arbitrary self-imposed limitation caused by the choice in how
strings are handled in Phobos.


Yea, I've had problems before - completely unnecessary problems 
that were *not* helpful or indicative of latent bugs - which 
were a direct result of Phobos being overly pedantic and eager 
about UTF validation. And yet the implicit UTF validation has 
never actually *helped* me in any way.




self-imposed limitation

For greater good.

I find this article very telling about why strings should be 
converted to UTF-8 as often as possible.

http://www.utf8everywhere.org/

I agree 100% with its content. It's impossibly hard to have 
sane handling of encodings on Windows (even more so in a team) 
without following the drastic rules the article lays out.


This happens to be what Phobos gently mandates. UTF validation is 
certainly the lesser evil compared to the mess that everything 
becomes without it. How is mandating valid UTF-8 being overly 
pedantic? This is the sanest behaviour. Just use sanitizeUTF8 
(http://vibed.org/api/vibe.utils.string/sanitizeUTF8) or 
equivalent.




Re: ddox-generated Phobos documentation is available for review

2014-03-10 Thread John Colvin
On Monday, 10 March 2014 at 03:44:54 UTC, Andrei Alexandrescu 
wrote:
Consider it alpha quality. Please don't announce yet before we 
put it in good shape.


https://github.com/D-Programming-Language/dlang.org/pull/516

http://dlang.org/library

http://dlang.org/library-prerelease

I needed to change quite a bit about the makefile. It was 
building everything over and over again, and it's _slow_.


Some functions are not ready, compare e.g.

http://dlang.org/library/std/algorithm/balancedParens.html

with

http://dlang.org/library/std/algorithm/any.html


Andrei


Nice, but those duplicates have got to go!


Re: Major performance problem with std.array.front()

2014-03-10 Thread Andrea Fontana

I'm not sure I understood the point of this (long) thread.
The main problem is that decode() is called also if not needed?

Well, in this case that's not a problem only for string. I found
this problem also when I was writing other ranges. For example
when I read binary data from db stream. Front represent a single
row, and I decode it every time also if not needed.

On Friday, 7 March 2014 at 02:37:11 UTC, Walter Bright wrote:
In "Lots of low hanging fruit in Phobos" the issue came up 
about the automatic encoding and decoding of char ranges.


Throughout D's history, there are regular and repeated 
proposals to redesign D's view of char[] to pretend it is not 
UTF-8, but UTF-32. I.e. so D will automatically generate code 
to decode and encode on every attempt to index char[].


I have strongly objected to these proposals on the grounds that:

1. It is a MAJOR performance problem to do this.

2. Very, very few manipulations of strings ever actually need 
decoded values.


3. D is a systems/native programming language, and 
systems/native programming languages must not hide the 
underlying representation (I make similar arguments about 
proposals to make ints issue errors on overflow, etc.).


4. Users should choose when decode/encode happens, not the 
language.


and I have been successful at heading these off. But one 
slipped by me. See this in std.array:


  @property dchar front(T)(T[] a) @safe pure if (isNarrowString!(T[]))
  {
    assert(a.length, "Attempting to fetch the front of an empty array of " ~
                     T.stringof);
    size_t i = 0;
    return decode(a, i);
  }

What that means is that if I implement an algorithm that 
accepts, as input, an InputRange of char's, it will ALWAYS try 
to decode it. This means that even:


   from.copy(to)

will decode 'from', and then re-encode it for 'to'. And it will 
do it SILENTLY. The user won't notice, and he'll just assume 
that D performance sux. Even if he does notice, his options to 
make his code run faster are poor.


If the user wants decoding, it should be explicit, as in:

from.decode.copy(encode!to)

The USER should decide where and when the decoding goes. 
'decode' should be just another algorithm.


(Yes, I know that std.algorithm.copy() has some specializations 
to take care of this. But these specializations would have to 
be written for EVERY algorithm, which is thoroughly 
unreasonable. Furthermore, copy()'s specializations only apply 
if BOTH source and destination are arrays. If just one is, the 
decode/encode penalty applies.)


Is there any hope of fixing this?
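
The behaviour described in the quoted post can be observed directly. The following is a minimal sketch, assuming a Phobos in which `front` auto-decodes narrow strings as described above; the sample text is illustrative only.

```d
import std.range.primitives : front, popFront, walkLength;
import std.string : representation;

void main()
{
    string s = "\u00E9t\u00E9";   // 3 code points, 5 UTF-8 code units

    // front on a narrow string decodes: its type is dchar, not char.
    static assert(is(typeof(s.front) == dchar));
    assert(s.front == '\u00E9');

    // Indexing and .length still see raw code units.
    static assert(is(typeof(s[0]) == immutable(char)));
    assert(s.length == 5);

    // The range view counts decoded code points instead.
    assert(s.walkLength == 3);

    // popFront skips one whole encoded code point (two code units here).
    auto t = s;
    t.popFront();
    assert(t.length == 3);

    // Opting out of decoding explicitly, via the integer view.
    assert(s.representation.front == 0xC3);
}
```

Any algorithm written only against `front`/`popFront` therefore pays for the decode step on `char[]` input, which is the cost the post objects to.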


Re: Major performance problem with std.array.front()

2014-03-10 Thread Nick Sabalausky

On 3/10/2014 6:21 AM, ponce wrote:

On Sunday, 9 March 2014 at 21:14:30 UTC, Nick Sabalausky wrote:


Yea, I've had problems before - completely unnecessary problems that
were *not* helpful or indicative of latent bugs - which were a direct
result of Phobos being overly pedantic and eager about UTF validation.
And yet the implicit UTF validation has never actually *helped* me in
any way.




self-imposed limitation

For greater good.

I find this article very telling about why strings should be converted
to UTF-8 as often as possible.
http://www.utf8everywhere.org/

I agree 100% with its content. It's impossibly hard to have sane
handling of encodings on Windows (even more so in a team) without following
the drastic rules the article lays out.



I may have missed it, but I don't see where it says anything about 
validation or immediate sanitation of invalid sequences. It's mostly 
"UTF-16 sucks and so does Windows" (not that I'm necessarily disagreeing 
with it). (ot: Kinda wish they hadn't used such a hard to read font...)




Re: Major performance problem with std.array.front()

2014-03-10 Thread ponce

On Monday, 10 March 2014 at 11:04:43 UTC, Nick Sabalausky wrote:


I may have missed it, but I don't see where it says anything 
about validation or immediate sanitation of invalid sequences. 
It's mostly "UTF-16 sucks and so does Windows" (not that I'm 
necessarily disagreeing with it). (ot: Kinda wish they hadn't 
used such a hard to read font...)


I should have highlighted it, their recommendations for proper 
encoding handling on Windows are in section 5 (How to do text on 
Windows).


One of them is "std::strings and char*, anywhere in the program, 
are considered UTF-8 (if not said otherwise)".


I find it interesting that D tends to enforce this lesson 
learned from mixed-encoding codebases.




Re: Major performance problem with std.array.front()

2014-03-10 Thread Nick Sabalausky

On 3/9/2014 11:27 AM, Vladimir Panteleev wrote:

On Sunday, 9 March 2014 at 08:32:09 UTC, monarch_dodra wrote:

On topic, I think D's implicit default decode to dchar is *infinity*
times better than C++'s char-based strings. While imperfect in terms
of grapheme, it was still a design decision made of win.


Care to argument?



It's simple: Breaking things on all non-English languages is worse than 
breaking things on non-western[1] languages. It's still breakage, and that 
*is* bad, but there's no question which breakage is significantly larger.


[1] (And yes, I realize "western" is a gross over-simplification here. 
Point is one working language vs several working languages.)




Re: ddox-generated Phobos documentation is available for review

2014-03-10 Thread Jonas Drewsen

Very nice!

std.algorithm, std.net.curl etc. have their functions/classes 
split in categories.


I haven't used ddox myself but would it be possible to modify it 
to read a category variable in the documentation for a function 
and then use that to group things in the resulting html file?


Or would that need modifications to dmd itself.

/Jonas


Re: DIP 57: static foreach

2014-03-10 Thread Dicebot

On Sunday, 9 March 2014 at 21:31:40 UTC, Timon Gehr wrote:

http://wiki.dlang.org/DIP57/

Thoughts?


1)

Additionally, CTFE is invoked on all expressions occurring in 
the ForeachAggregate


I think it can be phrased more universally: "ForeachTypeList 
symbols must be evaluated as compile-time entities; if that is not 
possible, an implementation-defined compilation error happens."


2) Saying that it does not introduce a new scope is not entirely 
true as symbols from ForeachTypeList should not be available 
outside of static foreach. You mention it later in the same block 
but it is important concept to define as we currently don't have 
such pseudo-scopes (do we?)


3)
The body of the static foreach statement or static foreach 
declaration is duplicated once for each iteration which the 
corresponding foreach statement with an empty body would 
perform when executed in CTFE


I don't understand the reason behind limiting static foreach to 
CTFE semantics. Simply evaluating and pasting the body for each 
iteration should be enough. It is much closer to mixin template 
instances in that regard.


This will also remove necessity to rely on shadowing rules to 
re-define ForeachTypeList symbols as at the time of pasting the 
body those won't exist anymore.


4)

 Declarations introduced in the body itself are inserted into 
this enclosing scope


Isn't the term "enclosing" used only for scope-to-scope relations, or 
is it applicable to any language construct? (I don't know)


5)

For static foreach statements, break and continue are supported 
and treated like for foreach statements over tuples.


It is impossible as far as I understand existing semantics. 
Currently placed continue/break refer to created scope and don't 
stop iteration over remaining template argument list members. 
This is not applicable to generic foreach.


6)

In the "Iterating over members of a scope" example there is a strange 
Python-like colon after `static if` condition. Typo? :)


7)

In "Relation to tuple foreach", stating equivalency is not 
correct. It is more of a subset, and not even a strict one, as 
semantics will differ in some corner cases. For example, 
iterating over expression list will create a local copy right now 
if `ref` is not used. I'd really want this to not be the case for 
static foreach.


Overall, the provided examples seem to match my expectations, but the 
semantics description could be more structured and detailed.


Re: DIP 57: static foreach

2014-03-10 Thread Dicebot

On Sunday, 9 March 2014 at 21:53:45 UTC, Adam D. Ruppe wrote:

On Sunday, 9 March 2014 at 21:47:17 UTC, bearophile wrote:
suggest to add to DIP57 one more thing: that the introduction 
of static foreach should come with a warning against the usage 
of not-static foreach on tuples (and eventually this warning 
should become a deprecation message).



I don't agree because foreach on a tuple is just plain foreach. 
That it unrolls is just an implementation detail that doesn't 
change much else. I think considering it to be a separate kind 
of loop is like considering foreach over arrays, ranges, and 
opApply items separate loops. Those are just different 
implementation details of the same user concept.


Can't agree. You can't call it an implementation detail if it is a 
property that leaks into user code and can be relied upon. I 
sometimes hear statements akin to "tuple is like a container" and 
"tuple foreach is just like foreach", but it is a very idealistic 
view that simply does not match the current state of D, despite all 
the behavior hacks that try to make it look so.


So right now it _is_ a separate and distinctive kind of loop. At 
the same time it is a very specialized tool and deprecating it 
does not sound like a practical approach for reducing language 
complexity. Probably some years later if we eventually find out 
no one uses it anymore.


Re: Major performance problem with std.array.front()

2014-03-10 Thread Dicebot
On Sunday, 9 March 2014 at 17:27:20 UTC, Andrei Alexandrescu 
wrote:

On 3/9/14, 6:47 AM, Marc Schütz schue...@gmx.net wrote:

On Friday, 7 March 2014 at 15:03:24 UTC, Dicebot wrote:
2) It is regression back to C++ days of 
no-one-cares-about-Unicode
pain. Thinking about strings as character arrays is so 
natural and
convenient that if language/Phobos won't punish you for that, 
it will

be extremely widespread.


Not with Nick Sabalausky's suggestion to remove the 
implementation of
front from char arrays. This way, everyone will be forced to 
decide

whether they want code units or code points or something else.


Such as giving up on that crappy language that keeps on 
breaking their code.


Andrei



That was more about "if you are that crazy to even consider such 
breakage, this is closer to my personal perfection" than an actual 
proposal ;)


Proposal for fixing dchar ranges

2014-03-10 Thread Steven Schveighoffer
I proposed this inside the long "major performance problem with
std.array.front" thread; I've also proposed it before, a long time ago.


But it seems to be getting no attention buried in that thread, not even
negative attention :)


An idea to fix all the problems I see with char[] being treated
specially by phobos: introduce an actual string type, with char[] as
backing, that is a dchar range, that actually dictates the rules we want.
Then, make the compiler use this type for literals.


e.g.:

struct string {
   immutable(char)[] representation;
   this(immutable(char)[] data) { representation = data; }
   ... // dchar range primitives
}

Then, a char[] array is simply an array of char.

points:

1. No more issues with foreach(c; "cassé"), it iterates via dchar
2. No more issues with "cassé"[4], it is a static compiler error.
3. No more awkward ASCII manipulation using ubyte[].
4. No more phobos schizophrenia saying char[] is not an array.
5. No more special casing char[] array templates to fool the compiler.
6. Any other special rules we come up with can be dictated by the library,  
and not ignored by the compiler.


Note, std.algorithm.copy(string1, mutablestring) will still decode/encode,
but it's more explicit. It's EXPLICITLY a dchar range. Using
std.algorithm.copy(string1.representation, mutablestring.representation)
will avoid the issue.


I imagine only code that is currently UTF ignorant will break, and that  
code is easily 'fixed' by adding the 'representation' qualifier.


-Steve
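
The proposal above can be sketched in library code. This is only an illustration under stated assumptions: the wrapper is called `Str` (a hypothetical name, chosen so it does not clash with the built-in `string` alias), it implements just the input-range primitives, and it says nothing about how the compiler would type literals.

```d
import std.algorithm.comparison : equal;
import std.range.primitives : ElementType, isInputRange;
import std.utf : decode, stride;

/// Hypothetical wrapper type; `Str` is an illustrative name, not a Phobos type.
struct Str
{
    immutable(char)[] representation;

    // Input-range primitives that yield decoded code points (dchar).
    @property bool empty() { return representation.length == 0; }

    @property dchar front()
    {
        size_t i = 0;
        return decode(representation, i);
    }

    void popFront()
    {
        // Drop exactly one encoded code point.
        representation = representation[stride(representation, 0) .. $];
    }

    // Point 2 of the proposal: indexing is rejected at compile time.
    @disable char opIndex(size_t);
}

static assert(isInputRange!Str);
static assert(is(ElementType!Str == dchar));

void main()
{
    auto s = Str("cass\u00E9");

    // Iteration is explicitly by code point.
    assert(equal(s, "cass\u00E9"d));

    // Raw code units stay reachable through the explicit member.
    assert(s.representation.length == 6);

    // Indexing the wrapper does not compile.
    static assert(!__traits(compiles, s[4]));
}
```

With a type like this, the decoding cost is visible in the type name rather than hidden in how Phobos treats `char[]`.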


Re: ddox-generated Phobos documentation is available for review

2014-03-10 Thread Steven Schveighoffer
On Sun, 09 Mar 2014 23:44:43 -0400, Andrei Alexandrescu  
seewebsiteforem...@erdani.org wrote:


Consider it alpha quality. Please don't announce yet before we put it in  
good shape.




I LOVE this. Been waiting for it for a long time. The cross-links  
themselves are worth the wait.


Just look at how organized std.datetime has become!

Now, one nitpick -- I would like to see leaf links expand locally instead  
of opening a new page. Perhaps you can click on the link, and it opens a  
new page, but have a + button to expand in-line if desired. Essentially,  
the disruption of going to a new page when looking at the details of a  
function, I feel is too much.


And look at that, disqus comments!

-Steve


Re: Major performance problem with std.array.front()

2014-03-10 Thread Dicebot

On Friday, 7 March 2014 at 19:43:57 UTC, Walter Bright wrote:

On 3/7/2014 7:03 AM, Dicebot wrote:
1) It is a huge breakage and you have been refusing to do one 
even for more

important problems. What is about this sudden change of mind?


1. Performance Performance Performance


Not important enough. D has always been a "safe by default, fast 
when asked to" language, not the other way around. There is no 
fundamental performance problem here, only lack of knowledge 
about Phobos.


2. The current behavior is surprising (it sure surprised me, I 
didn't notice it until I looked at the assembler to figure out 
why the performance sucked)


That may imply that better documentation is needed. You were only 
surprised because of wrong initial assumption about what `char[]` 
type means.



3. Weirdnesses like ElementEncodingType


ElementEncodingType is extremely annoying but I think it is just 
a side effect of the bigger problem of how string algorithms are 
handled currently. It does not need to be that way.


4. Strange behavior differences between char[], char*, and 
InputRange!char types


Again, there is nothing strange about it. `char[]` is a special 
type with special semantics that is defined in documentation and 
consistently following  that definition in all but raw array 
indexing/slicing (which is what I find unfortunate but also 
beyond fixing feasibility).


5. Funky anomalous issues with writing OutputRange!char (the 
put(T) must take a dchar)


Bad but not worth even a small breaking change.

2) lack of convenient .raw property which will effectively do 
cast(ubyte[])


I've done the cast as a workaround, but when working with 
generic code it turns out the ubyte type becomes viral - you 
have to use it everywhere. So all over the place you're having 
casts between ubyte <=> char in unexpected places. You also 
wind up with ugly ubyte <=> dchar casts, with the commensurate 
risk that you goofed and have a truncation bug.


Of course it is viral. Because you never ever want to have 
char[] at all if you don't work with Unicode (or work with it on 
raw byte level). And in that case it is your responsibility to do 
manual decoding when appropriate. Trying to squeeze out that 
performance often means going to a low level with all the associated 
risks; there is nothing special about char[] here. It is not a 
common use case.


Essentially, the auto-decode makes trivial code look better, 
but if you're writing a more comprehensive string processing 
program, and care about performance, it makes a regular ugly 
mess of things.


And this is how it should be. Again, I am all for creating 
language that favors performance-critical power programming needs 
over common/casual needs but it is not what D is and you have 
been making such choices consistently over quite a long time now 
(array literals that allocate, I will never forgive that). 
Suddenly changing your mind only because you have encountered 
this specific issue personally as opposed to just reports does 
not fit a language author role. It does not really matter if any 
new approach itself is good or bad - being unpredictable is a 
reputation damage D simply can't afford.


Re: Major performance problem with std.array.front()

2014-03-10 Thread Abdulhaq

On Monday, 10 March 2014 at 10:52:02 UTC, Andrea Fontana wrote:

I'm not sure I understood the point of this (long) thread.
The main problem is that decode() is called also if not needed?



I'd like to offer up one D 'user' perspective, it's just a single 
data point but perhaps useful. I write applications that process 
Arabic, and I'm thinking about converting one of those apps from 
python to D, for performance reasons.


My app deals with unicode arabic text that is 'out there', and 
the UnicodeTM support for Arabic is not that well thought out, so 
the data is often (always) inconsistent in terms of sequencing 
diacritics etc. Even the code page can vary. Therefore my code 
has to cater to various ways that other developers have sequenced 
the code points.


So, my needs as a 'user' are:
* I want to encode all incoming data immediately into unicode, 
usually UTF8, if it isn't already.
* I want to iterate over code points. I don't care about the raw 
data.
* When I get the length of my string it should be the number of 
code points.

* When I index my string it should return the nth code point.
* When I manipulate my strings I want to work with code points
... you get the drift.

If I want to access the raw data, which I don't, then I'm very 
happy to cast to ubyte etc.


If encode/decode is a performance issue then perhaps there could 
be a cache for recently used strings where the code point 
representation is held.


BTW to answer a question in the thread, yes the data is 
left-to-right and visualised right-to-left.
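
The code-point-oriented needs listed above map onto existing range functions. A minimal sketch follows, assuming auto-decoding Phobos as discussed in this thread; the Arabic sample word is illustrative only.

```d
import std.conv : to;
import std.range : drop;
import std.range.primitives : front, walkLength;

void main()
{
    // Four Arabic letters, written as escapes so the code unit count is clear.
    string s = "\u0633\u0644\u0627\u0645";

    assert(s.length == 8);       // UTF-8 code units, two per letter here
    assert(s.walkLength == 4);   // number of code points, O(n)

    // "nth code point" on UTF-8 means walking from the start, also O(n).
    assert(s.drop(2).front == '\u0627');

    // Converting once to UTF-32 gives O(1) length and indexing by code point.
    dstring d = s.to!dstring;
    assert(d.length == 4);
    assert(d[2] == '\u0627');
}
```

Note that a code point is not the same as a user-perceived character: a letter followed by a combining diacritic still counts as two code points in both views.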






Re: Major performance problem with std.array.front()

2014-03-10 Thread Andrea Fontana
In Italian we need unicode too. We have several accented letters 
and often programming languages don't handle utf-8 and other 
encodings so well...


In D I never had any problem with this, and I work a lot on text 
processing.


So my question: is there any problem I'm missing in D with 
Unicode support, or is it just a performance problem in algorithms?


If the problem is performance in algorithms that use .front() but 
don't care to understand its data, why don't we add a .rawFront() 
property, implemented only where it makes sense, plus a fallback 
like:


auto rawFront(R)(R range) if ( ... isrange ... && 
!__traits(compiles, range.rawFront))  { return range.front; }


In this way on copy() or other algorithms we can use rawFront() 
and it's backward compatible with other ranges too.
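
The idea above could be sketched roughly as follows. This is only an illustration: rawFront is a hypothetical name that does not exist in Phobos, and instead of the __traits(compiles, ...) check it uses two overloads, one for narrow strings and one fallback for every other input range.

```d
import std.range : front, isInputRange;
import std.traits : isNarrowString;

/// Hypothetical helper: first code unit of a narrow string, no decoding.
auto rawFront(R)(R range) if (isNarrowString!R)
{
    return range[0];
}

/// Fallback for every other input range: defer to the ordinary front.
auto rawFront(R)(R range) if (isInputRange!R && !isNarrowString!R)
{
    return range.front;
}

void main()
{
    string s = "\u00E9t\u00E9";       // "été"
    assert(s.front == '\u00E9');      // decoded code point (dchar)
    assert(s.rawFront == 0xC3);       // first UTF-8 code unit of U+00E9
    assert([10, 20].rawFront == 10);  // non-string ranges are unaffected
}
```

A range that defines its own rawFront member would take precedence over these free functions when called as range.rawFront, since member lookup wins over UFCS.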


But I guess I'm missing the point :)


On Monday, 10 March 2014 at 13:48:44 UTC, Abdulhaq wrote:

On Monday, 10 March 2014 at 10:52:02 UTC, Andrea Fontana wrote:

I'm not sure I understood the point of this (long) thread.
The main problem is that decode() is called also if not needed?



I'd like to offer up one D 'user' perspective, it's just a 
single data point but perhaps useful. I write applications that 
process Arabic, and I'm thinking about converting one of those 
apps from python to D, for performance reasons.


My app deals with Unicode Arabic text that is 'out there', and 
the Unicode™ support for Arabic is not that well thought out, 
so the data is often (always) inconsistent in terms of 
sequencing diacritics etc. Even the code page can vary. 
Therefore my code has to cater to various ways that other 
developers have sequenced the code points.


So, my needs as a 'user' are:
* I want to encode all incoming data immediately into Unicode, 
usually UTF-8, if it isn't already.
* I want to iterate over code points. I don't care about the 
raw data.
* When I get the length of my string it should be the number of 
code points.
* When I index my string it should return the nth code point.
* When I manipulate my strings I want to work with code points.
... you get the drift.

If I want to access the raw data, which I don't, then I'm very 
happy to cast to ubyte etc.


If encode/decode is a performance issue then perhaps there 
could be a cache for recently used strings where the code point 
representation is held.


BTW to answer a question in the thread, yes the data is 
left-to-right and visualised right-to-left.




Re: ddox-generated Phobos documentation is available for review

2014-03-10 Thread Dicebot
On Monday, 10 March 2014 at 03:44:54 UTC, Andrei Alexandrescu 
wrote:
Consider it alpha quality. Please don't announce yet before we 
put it in good shape.


https://github.com/D-Programming-Language/dlang.org/pull/516

http://dlang.org/library

http://dlang.org/library-prerelease

I needed to change quite a bit about the makefile. It was 
building everything over and over again, and it's _slow_.


Some functions are not ready, compare e.g.

http://dlang.org/library/std/algorithm/balancedParens.html

with

http://dlang.org/library/std/algorithm/any.html


Andrei


I still don't like disqus :)

Documentation in general may probably benefit from some styling 
tweaks - for example, std.algorithm looks funny when manually 
crafted tables turn into usual generated function list. But 
overall it looks solid.


Re: Major performance problem with std.array.front()

2014-03-10 Thread dennis luehring

Am 07.03.2014 03:37, schrieb Walter Bright:

In "Lots of low hanging fruit in Phobos" the issue came up about the automatic
encoding and decoding of char ranges.


After reading many of the attached posts, the question is: what
could D's future policy for introducing breaking changes be? It's
not a solution to say it's not possible because of too many breaking
changes; that will become more and more of a problem for D's evolution,
much like C++.



Re: ddox-generated Phobos documentation is available for review

2014-03-10 Thread Mike
Thank you to everyone who worked on this.  It's quite an 
improvement.


Problem:
http://dlang.org/library/std/compiler/vendor.html is a 404

Recommendation:
I really liked the immediate link to the source file on github in 
the old layout.  If possible please add it to the new layout.


Mike




Re: DIP 57: static foreach

2014-03-10 Thread Andrej Mitrovic
On 3/10/14, Kenji Hara k.hara...@gmail.com wrote:
 This is a questionable sentence. On the foreach with tuple iteration, break
 and continue have no effect for the unrolling.

Whatever is implemented, we need to make sure the current code is
possible. In std.conv.to:

-
switch(value)
{
foreach (I, member; NoDuplicates!(EnumMembers!S))
{
case member:
return to!T(enumRep!(immutable(T), S, I));
}
default:
}
-
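
A smaller, self-contained version of the same pattern may help readers who have not seen it. This is only a sketch, not the std.conv code: the enum Color and the function nameOf are made up for illustration. A foreach over a compile-time tuple is unrolled, and each unrolled copy of the body contributes one case label to the enclosing switch.

```d
enum Color { red, green, blue }

/// Returns the member name of `c`; the cases are generated at compile time.
string nameOf(Color c)
{
    switch (c)
    {
        // Tuple foreach: the body is unrolled once per member name,
        // so each iteration adds one case label to the switch.
        foreach (name; __traits(allMembers, Color))
        {
            case __traits(getMember, Color, name):
                return name;
        }
        default:
            assert(0, "not a named Color member");
    }
}

void main()
{
    assert(nameOf(Color.green) == "green");
}
```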


Re: Formal review of std.lexer

2014-03-10 Thread Dicebot

Reminder about benchmarks.

By the way, is the generated lexer usable at CTFE? Imaginary use case:
easier DSL implementation.


Re: ddox-generated Phobos documentation is available for review

2014-03-10 Thread Vladimir Panteleev

On Monday, 10 March 2014 at 14:08:07 UTC, Mike wrote:
Thank you to everyone who worked on this.  It's quite an 
improvement.


Problem:
http://dlang.org/library/std/compiler/vendor.html is a 404

Recommendation:
I really liked the immediate link to the source file on github 
in the old layout.  If possible please add it to the new layout.


Since (IIRC) DDox parses JSON layout, I think it is capable of 
generating exact links to the file:line of each symbol. That 
would be neat, as it allows quickly seeing the implementation if 
the documentation is not sufficient.


Re: Major performance problem with std.array.front()

2014-03-10 Thread Dicebot

On Monday, 10 March 2014 at 14:05:39 UTC, dennis luehring wrote:

Am 07.03.2014 03:37, schrieb Walter Bright:
In "Lots of low hanging fruit in Phobos" the issue came up about the automatic
encoding and decoding of char ranges.


After reading many of the attached posts, the question is: what
could D's future policy for introducing breaking changes be? It's
not a solution to say it's not possible because of too many 
breaking changes; that will become more and more of a problem for 
D's evolution, much like C++.


Historically, two approaches have been practiced:

1) argue a lot and then do nothing
2) suddenly change something and tell users it was necessary

I also think that this is a much more important issue than this 
whole thread, but it does not seem to attract any real attention 
when mentioned.


Re: Major performance problem with std.array.front()

2014-03-10 Thread Abdulhaq

On Monday, 10 March 2014 at 14:05:39 UTC, dennis luehring wrote:

Am 07.03.2014 03:37, schrieb Walter Bright:
In "Lots of low hanging fruit in Phobos" the issue came up about the automatic
encoding and decoding of char ranges.


After reading many of the attached posts, the question is: what
could D's future policy for introducing breaking changes be? It's
not a solution to say it's not possible because of too many 
breaking changes; that will become more and more of a problem for 
D's evolution, much like C++.


I'm a newbie here but I've been waiting for D to mature for a 
long time. D IMHO has to stabilise now because:
* D needs a bigger community so that the big fish who have 
learnt the ins and outs don't get bored and move on due to lack 
of kudos etc.
* To get the bigger community D needs more _working_ libraries 
for major toolkits (GUI etc.).
* Libraries will cease to work if there is significant change in 
D, and then can stay broken because there isn't the inertial mass 
of other developers to maintain them after the initial developer has 
moved on. You can see that this has happened a LOT.
* Anyway the D that I read about in TDPL is already very exciting 
for programmers like myself, we just want that thanks.


Breaking changes can go into D3, if and whenever that is. Keep 
breaking D2 now and it risks just being forevermore a playpen for 
computer scientist types.


Anyway who cares what I think but I think it reflects a lot of 
people's opinions too.





Re: Formal review of std.lexer

2014-03-10 Thread Dicebot
On Wednesday, 26 February 2014 at 18:07:37 UTC, Jacob Carlborg 
wrote:

On 2014-02-26 00:25, Dicebot wrote:

Don't know if it makes sense to introduce random package 
categorization.
I'd love to see more hierarchy in Phobos too but we'd first 
need to agree to package separation principles then.


Then that's what we need to do. I don't want any more top level 
modules. There are already too many.


As much as I hate to say it, but such hierarchy is worth a DIP. 
Once it is formalized, I can proceed with it in review queue as 
if it was a new module proposal.


Re: Major performance problem with std.array.front()

2014-03-10 Thread Vladimir Panteleev

On Monday, 10 March 2014 at 14:11:13 UTC, Dicebot wrote:

Historically, two approaches have been practiced:

1) argue a lot and then do nothing
2) suddenly change something and tell users it was necessary


These are one and the same, just from the two opposing points of 
view.


I also think that this is a much more important issue than this 
whole thread, but it does not seem to attract any real attention 
when mentioned.


You mean the whole policy on breaking changes?


Re: Major performance problem with std.array.front()

2014-03-10 Thread Abdulhaq




Historically, two approaches have been practiced:

1) argue a lot and then do nothing


This happens (I think) because Andrei and Walter really value 
your and other experts' opinions, but nevertheless have to 
preserve the general way things work to preserve the long term 
future of D. They have to be open to persuasion but it would have 
to be very compelling to get them to change basics now - it seems 
to me.


D is at that difficult 90% stage that we all know about where the 
boring difficult stuff is left to do. People like to discuss 
interesting new stuff which at the time seems oh-so-important.




Re: Major performance problem with std.array.front()

2014-03-10 Thread Dicebot
On Monday, 10 March 2014 at 14:27:02 UTC, Vladimir Panteleev 
wrote:

On Monday, 10 March 2014 at 14:11:13 UTC, Dicebot wrote:

Historically, two approaches have been practiced:

1) argue a lot and then do nothing
2) suddenly change something and tell users it was necessary


These are one and the same, just from the two opposing points 
of view.


/sarcasm :)



I also think that this is a much more important issue than this 
whole thread, but it does not seem to attract any real 
attention when mentioned.


You mean the whole policy on breaking changes?


Yes. I gave up on this idea at some point, as there 
seemed to be consensus that no breaking changes would even be 
considered for D2 and that those that come from fixing bugs are not 
worth the fuss. This is exactly why I was so shocked that Walter 
even started this thread. If breaking changes are actually 
considered (rare or not), then it is absolutely critical to 
define the process for them and put a link to its description on 
the dlang.org front page.


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Steven Schveighoffer
On Mon, 10 Mar 2014 09:35:44 -0400, Steven Schveighoffer  
schvei...@yahoo.com wrote:



Then, a char[] array is simply an array of char[].


An array of char even.

-Steve


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Dicebot
On Monday, 10 March 2014 at 13:35:33 UTC, Steven Schveighoffer 
wrote:
I proposed this inside the long "Major performance problem with 
std.array.front()" thread; I've also proposed it before, a long time ago.


But it seems to be getting no attention buried in that thread, not 
even negative attention :)


An idea to fix all the problems I see with char[] being 
treated specially by Phobos: introduce an actual string type, 
with char[] as backing, that is a dchar range, that actually 
dictates the rules we want. Then, make the compiler use this 
type for literals.


e.g.:

struct string {
   immutable(char)[] representation;
   // takes immutable data: char[] does not convert implicitly
   this(immutable(char)[] data) { representation = data; }
   ... // dchar range primitives
}

Then, a char[] array is simply an array of char[].

points:

1. No more issues with foreach(c; "cassé"), it iterates via 
dchar
2. No more issues with "cassé"[4], it is a static compiler 
error.
3. No more awkward ASCII manipulation using ubyte[].
4. No more phobos schizophrenia saying char[] is not an array.
5. No more special casing char[] array templates to fool the 
compiler.
6. Any other special rules we come up with can be dictated by 
the library, and not ignored by the compiler.


Note, std.algorithm.copy(string1, mutablestring) will still 
decode/encode, but it's more explicit. It's EXPLICITLY a dchar 
range. Use std.algorithm.copy(string1.representation, 
mutablestring.representation) will avoid the issues.


I imagine only code that is currently UTF ignorant will break, 
and that code is easily 'fixed' by adding the 'representation' 
qualifier.


-Steve


It will break any code that slices stored char[] strings directly, 
which may or may not be breaking UTF depending on how indices are 
calculated. It also adds one more runtime dependency to the language, 
but there are so many that it probably does not matter.


Maybe in D3...

2014-03-10 Thread Vladimir Panteleev
From time to time, there are discussions concerning ideas which 
would impact the language, as it is now, too drastically to be 
implemented (it would break too much code or require a 
significant reengineering effort). These discussions get lost, 
which is regrettable since some of the discussions sometimes 
produce genuinely great ideas.


Although there is no D3 on the horizon, I think it would be nice 
to keep track of these ideas anyway.


http://wiki.dlang.org/Language_issues


Re: Proposal for fixing dchar ranges

2014-03-10 Thread H. S. Teoh
On Mon, Mar 10, 2014 at 09:35:44AM -0400, Steven Schveighoffer wrote:
[...]
 An idea to fix all the problems I see with char[] being treated
 specially by Phobos: introduce an actual string type, with char[] as
 backing, that is a dchar range, that actually dictates the rules we
 want. Then, make the compiler use this type for literals.
 
 e.g.:
 
 struct string {
    immutable(char)[] representation;
    // takes immutable data: char[] does not convert implicitly
    this(immutable(char)[] data) { representation = data; }
    ... // dchar range primitives
 }
 
 Then, a char[] array is simply an array of char[].
 
 points:
 
 1. No more issues with foreach(c; "cassé"), it iterates via dchar
 2. No more issues with "cassé"[4], it is a static compiler error.
 3. No more awkward ASCII manipulation using ubyte[].
 4. No more phobos schizophrenia saying char[] is not an array.
 5. No more special casing char[] array templates to fool the compiler.
 6. Any other special rules we come up with can be dictated by the
 library, and not ignored by the compiler.

I like this idea. Special-casing char[] in templates was a bad idea. It
makes Phobos code needlessly complex, and the inconsistent treatment of
char[] sometimes as an array of char and sometimes not causes silly
issues like foreach defaulting to char but range iteration defaulting to
dchar. Enclosing it in a struct means we can enforce string rules
separately from the fact that it's a char array.
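
The inconsistency mentioned here can be seen in a few lines. A minimal sketch, assuming the current Phobos behaviour in which front and walkLength decode narrow strings; the sample word is written with an escape so the example does not depend on source file encoding:

```d
import std.range : front, walkLength;

void main()
{
    string s = "cass\u00E9"; // "cassé": 5 code points, 6 UTF-8 code units

    size_t units;
    foreach (c; s)           // element type is inferred as a one-byte code unit
    {
        static assert(c.sizeof == 1);
        ++units;
    }

    // The range primitives decode, so the same data is seen as dchar.
    static assert(is(typeof(s.front) == dchar));

    assert(units == 6);
    assert(s.walkLength == 5);
}
```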


 Note, std.algorithm.copy(string1, mutablestring) will still
 decode/encode, but it's more explicit. It's EXPLICITLY a dchar
 range. Use std.algorithm.copy(string1.representation,
 mutablestring.representation) will avoid the issues.
 
 I imagine only code that is currently UTF ignorant will break, and
 that code is easily 'fixed' by adding the 'representation'
 qualifier.
[...]

The only concern I have is the current use of char[] and const(char)[]
as mutable strings, and the current implicit conversion from string to
const(char)[]. We would need similar wrappers for char[] and
const(char)[], and string and mutablestring must be implicitly
convertible to conststring, otherwise a LOT of existing code will break
in a major way. Plus, these wrappers should also expose the same dchar
range API with .representation giving a way to get at the raw code
units.


T

-- 
It is the quality rather than the quantity that matters. -- Lucius Annaeus 
Seneca


Re: ddox-generated Phobos documentation is available for review

2014-03-10 Thread Andrei Alexandrescu

On 3/10/14, 1:35 AM, Nicolas Sicard wrote:

For me it's a real improvement! One thing: symbol names (modules,
functions, etc.) shouldn't be hyphenated, specially in tables.


All: how does one turn off css hyphenation?

Andrei


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Steven Schveighoffer

On Mon, 10 Mar 2014 10:48:26 -0400, Dicebot pub...@dicebot.lv wrote:


On Monday, 10 March 2014 at 13:35:33 UTC, Steven Schveighoffer wrote:
I proposed this inside the long "Major performance problem with  
std.array.front()" thread; I've also proposed it before, a long time ago.


But it seems to be getting no attention buried in that thread, not even  
negative attention :)


An idea to fix all the problems I see with char[] being treated  
specially by Phobos: introduce an actual string type, with char[] as  
backing, that is a dchar range, that actually dictates the rules we  
want. Then, make the compiler use this type for literals.


e.g.:

struct string {
   immutable(char)[] representation;
   // takes immutable data: char[] does not convert implicitly
   this(immutable(char)[] data) { representation = data; }
   ... // dchar range primitives
}

Then, a char[] array is simply an array of char[].

points:

1. No more issues with foreach(c; "cassé"), it iterates via dchar
2. No more issues with "cassé"[4], it is a static compiler error.
3. No more awkward ASCII manipulation using ubyte[].
4. No more phobos schizophrenia saying char[] is not an array.
5. No more special casing char[] array templates to fool the compiler.
6. Any other special rules we come up with can be dictated by the  
library, and not ignored by the compiler.


Note, std.algorithm.copy(string1, mutablestring) will still  
decode/encode, but it's more explicit. It's EXPLICITLY a dchar range.  
Use std.algorithm.copy(string1.representation,  
mutablestring.representation) will avoid the issues.


I imagine only code that is currently UTF ignorant will break, and that  
code is easily 'fixed' by adding the 'representation' qualifier.




It will break any code that slices stored char[] strings directly which  
may or may not be breaking UTF depending on how indices are calculated.


That is already broken. What I'm looking to do is remove the cruft and  
WTF factor of the current state of affairs (an array that's not an  
array).


Originally (in that long ago proposal) I had proposed to check for and  
disallow invalid slicing during runtime. In fact, it could be added if  
desired with the type defined by the library.


Also adding one more runtime dependency into language but there are so  
many that it probably does not matter.


alias string = immutable(char)[];

There isn't much extra dependency one must add to revert to the original  
behavior. In fact, one nice thing about this proposal is the compiler  
changes can be done and tested before any real meddling with the string  
type is done.


-Steve


Re: ddox-generated Phobos documentation is available for review

2014-03-10 Thread Andrei Alexandrescu

On 3/10/14, 7:00 AM, Dicebot wrote:

I still don't like disqus :)


Are there better such systems available?


Documentation in general may probably benefit from some styling tweaks -
for example, std.algorithm looks funny when manually crafted tables turn
into usual generated function list. But overall it looks solid.


Yah, we need a solid community effort on this all. Please file issues 
appropriately, and hopefully fix others directly.


Folks, this is the long tail. Please help us improve our documentation.


Andrei



Re: Proposal for fixing dchar ranges

2014-03-10 Thread Steven Schveighoffer
On Mon, 10 Mar 2014 10:54:50 -0400, H. S. Teoh hst...@quickfur.ath.cx  
wrote:




The only concern I have is the current use of char[] and const(char)[]
as mutable strings, and the current implicit conversion from string to
const(char)[]. We would need similar wrappers for char[] and
const(char)[], and string and mutablestring must be implicitly
convertible to conststring, otherwise a LOT of existing code will break
in a major way.


I agree that is a limitation of the proposal. It's more of a language-wide  
problem that one cannot make a struct that can be tail-const-ified.


One idea to begin with is to weakly bind to immutable(char)[] using alias  
this. That way, existing code devolves to current behavior. Then you pick  
off the primitives you want by defining them in the struct itself.
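
A rough sketch of what that first step might look like, under stated assumptions: the type is called String here only to avoid clashing with the built-in alias, and the set of primitives is the bare minimum. Anything the struct does not define (length, indexing, slicing, implicit conversion) falls through alias this to the plain immutable(char)[], while the range primitives defined on the struct decode.

```d
import std.utf : decode, stride;

/// Hypothetical wrapper; the name `String` and its members are illustrative.
struct String
{
    immutable(char)[] representation;

    // Anything not defined below falls back to the plain array.
    alias representation this;

    // dchar range primitives defined on the struct itself.
    @property bool empty() { return representation.length == 0; }

    @property dchar front()
    {
        size_t index = 0;
        return decode(representation, index);
    }

    void popFront()
    {
        representation = representation[stride(representation, 0) .. $];
    }
}

void main()
{
    auto s = String("\u00E9!");    // "é!"
    assert(s.length == 3);         // via alias this: UTF-8 code units
    assert(s.front == '\u00E9');   // struct primitive: decoded code point
    s.popFront();
    assert(s.front == '!');
    string plain = s;              // implicit conversion via alias this
    assert(plain == "!");
}
```

In this form existing code keeps compiling; the proposal's stricter rules (for example rejecting indexing) would then be introduced by defining or disabling individual members on the struct.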



Plus, these wrappers should also expose the same dchar
range API with .representation giving a way to get at the raw code
units.


It already does that, representation is a public member.

-Steve


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Steven Schveighoffer

On Mon, 10 Mar 2014 11:11:50 -0400, Boyd gaboonvi...@gmx.net wrote:

I personally love this idea, though I think it probably introduces too  
many silent breaking changes for it to be universally acceptable by D  
users.


What silent breaking changes?

-Steve


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Dicebot
On Monday, 10 March 2014 at 15:01:54 UTC, Steven Schveighoffer 
wrote:
That is already broken. What I'm looking to do is remove the 
cruft and WTF factor of the current state of affairs (an 
array that's not an array).


Originally (in that long ago proposal) I had proposed to check 
for and disallow invalid slicing during runtime. In fact, it 
could be added if desired with the type defined by the library.


Broken as in "you are not supposed to do it in user code"? Yes. 
Broken as in "does the wrong thing"? No. If your index is 
properly calculated, it is no different from casting to ubyte[] 
and then slicing. I am pretty sure even Phobos does it here and 
there.
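
To make the distinction concrete, here is a small sketch of both cases, assuming std.string.indexOf (which reports code unit indices) and std.utf.validate; the sample text is illustrative:

```d
import std.string : indexOf;
import std.utf : UTFException, validate;

void main()
{
    string s = "caf\u00E9 au lait"; // U+00E9 occupies two UTF-8 code units

    // indexOf reports a code unit index, so slicing with it stays valid.
    auto space = s.indexOf(' ');
    assert(space == 5);
    string head = s[0 .. space];
    validate(head);                 // does not throw: well-formed UTF-8
    assert(head == "caf\u00E9");

    // A hand-picked index can land inside the two-unit sequence.
    bool threw = false;
    try
    {
        validate(s[0 .. 4]);
    }
    catch (UTFException e)
    {
        threw = true;
    }
    assert(threw);
}
```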


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Boyd
I personally love this idea, though I think it probably 
introduces too many silent breaking changes for it to be 
universally acceptable by D users.


Perhaps naming it 'String', and deprecating 'string' would make 
it more acceptable?



On Monday, 10 March 2014 at 13:35:33 UTC, Steven Schveighoffer 
wrote:
I proposed this inside the long "Major performance problem with 
std.array.front()" thread; I've also proposed it before, a long time ago.


But it seems to be getting no attention buried in that thread, not 
even negative attention :)


An idea to fix all the problems I see with char[] being 
treated specially by Phobos: introduce an actual string type, 
with char[] as backing, that is a dchar range, that actually 
dictates the rules we want. Then, make the compiler use this 
type for literals.


e.g.:

struct string {
   immutable(char)[] representation;
   // takes immutable data: char[] does not convert implicitly
   this(immutable(char)[] data) { representation = data; }
   ... // dchar range primitives
}

Then, a char[] array is simply an array of char[].

points:

1. No more issues with foreach(c; "cassé"), it iterates via 
dchar
2. No more issues with "cassé"[4], it is a static compiler 
error.
3. No more awkward ASCII manipulation using ubyte[].
4. No more phobos schizophrenia saying char[] is not an array.
5. No more special casing char[] array templates to fool the 
compiler.
6. Any other special rules we come up with can be dictated by 
the library, and not ignored by the compiler.


Note, std.algorithm.copy(string1, mutablestring) will still 
decode/encode, but it's more explicit. It's EXPLICITLY a dchar 
range. Use std.algorithm.copy(string1.representation, 
mutablestring.representation) will avoid the issues.


I imagine only code that is currently UTF ignorant will break, 
and that code is easily 'fixed' by adding the 'representation' 
qualifier.


-Steve


Re: Maybe in D3...

2014-03-10 Thread Dicebot
On Monday, 10 March 2014 at 14:50:27 UTC, Vladimir Panteleev 
wrote:
From time to time, there are discussions concerning ideas which 
would impact the language, as it is now, too drastically to be 
implemented (it would break too much code or require a 
significant reengineering effort). These discussions get lost, 
which is regrettable since some of the discussions sometimes 
produce genuinely great ideas.


Although there is no D3 on the horizon, I think it would be 
nice to keep track of these ideas anyway.


http://wiki.dlang.org/Language_issues


I remember someone already creating such a page but can't remember 
the title :(
The main problem with it is that, with D3 not being a realistic option, 
there is not much motivation for maintaining it. Some ideas are 
great, but by the time they are in demand, the collective consciousness 
is likely to have produced even greater ideas :)


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Steven Schveighoffer

On Mon, 10 Mar 2014 11:20:49 -0400, Boyd gaboonvi...@gmx.net wrote:


Utf8 aware slicing for strings would be an issue.


I'm not proposing to add this.

-Steve


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Boyd

Utf8 aware slicing for strings would be an issue.

--
On Monday, 10 March 2014 at 15:13:26 UTC, Steven Schveighoffer 
wrote:
On Mon, 10 Mar 2014 11:11:50 -0400, Boyd gaboonvi...@gmx.net 
wrote:


I personally love this idea, though I think it probably 
introduces too many silent breaking changes for it to be 
universally acceptable by D users.


What silent breaking changes?

-Steve


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Boyd
Ok, then you just destroyed my sole hypothetical objection to 
this.

---
On Monday, 10 March 2014 at 15:22:41 UTC, Steven Schveighoffer 
wrote:
On Mon, 10 Mar 2014 11:20:49 -0400, Boyd gaboonvi...@gmx.net 
wrote:



Utf8 aware slicing for strings would be an issue.


I'm not proposing to add this.

-Steve


Re: DIP 57: static foreach

2014-03-10 Thread Timon Gehr

On 03/10/2014 02:08 PM, Dicebot wrote:

On Sunday, 9 March 2014 at 21:31:40 UTC, Timon Gehr wrote:

http://wiki.dlang.org/DIP57/

Thoughts?


1)


Additionally, CTFE is invoked on all expressions occurring in the
ForeachAggregate


I think it can be phrased more universally: "ForeachTypeList symbols must
be evaluated as compile-time entities; if that is not possible, an
implementation-defined compilation error happens."
...


I don't see how this is more universal.


2) Saying that it does not introduce a new scope is not entirely true as
symbols from ForeachTypeList should not be available outside of static
foreach. You mention it later in the same block but it is important
concept to define as we currently don't have such pseudo-scopes

 ...

The description only says that the usual scope for foreach statements is 
not introduced.



(do we?)
...


Nope.


3)

The body of the static foreach statement or static foreach declaration
is duplicated once for each iteration which the corresponding foreach
statement with an empty body would perform when executed in CTFE


I don't understand the reason behind limiting static foreach to CTFE
semantics. Simply evaluating and pasting the body for each iteration
should be enough. It is much closer to mixin template instances in that
regard.
...


I don't understand how the DIP is 'limiting static foreach to CTFE 
semantics' and/or why this is a bad thing or how your suggestion is 
different.



This will also remove necessity to rely on shadowing rules to re-define
ForeachTypeList symbols as at the time of pasting the body those won't
exist anymore.
...


I have no idea what this means.


4)


 Declarations introduced in the body itself are inserted into this
enclosing scope


Isn't enclosing term used only for scope-to-scope relations or it is
applicable to any language construct? (I don't know)
...


There is no formal language spec. What is meant is the scope `hosting' 
the static foreach construct.



5)


For static foreach statements, break and continue are supported and
treated like for foreach statements over tuples.


It is impossible as far as I understand existing semantics. Currently
placed continue/break refer to created scope and don't stop iteration
over remaining template argument list members. This is not applicable to
generic foreach.
...


This is not 'impossible', it is trivial to implement. Is your point that 
you would prefer break and continue to affect static foreach expansion?



6)

In Iterating over members of a scope example there is a strange
Python-like colon after `static if` condition. Typo? :)
...


Nope. This is a language feature. See:
http://dlang.org/version.html
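
For readers unfamiliar with the colon form, a minimal sketch follows; the names useFastPath and answer are made up for illustration. The condition applies to every declaration that follows it, up to the end of the enclosing scope.

```d
enum useFastPath = true;

// Colon form: the condition applies to all declarations that follow,
// until the end of the enclosing scope (here, the rest of the module).
static if (useFastPath):

int answer() { return 42; }

void main()
{
    assert(answer() == 42);
}
```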


7)

In Relation to tuple foreach stating equivalency is not correct.


I have removed the section.


It is more of subset and even not a strict one as semantics will differ in
some corner cases.


I think as described they would not need to.


For example, iterating over expression list will
create a local copy right now if `ref` is not used. I'd really want this
to not be the case for static foreach.
...


I think the description is actually not detailed enough to warrant this 
critique. (In particular, it is not clear what 'ref' should do.)


I.e., I think currently the following code is ambiguous:

int y,z;
static foreach(x;Seq!(y,z)) x = 2;
// what is the value of y and z now?


Overall provided examples seem to match my expectations but semantics
description can be more structured and detailed.


Agreed. I will do another iteration when I can find the time. Maybe I 
will have to re-specify the behaviour of foreach though.


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Steven Schveighoffer

On Mon, 10 Mar 2014 11:11:23 -0400, Dicebot pub...@dicebot.lv wrote:


On Monday, 10 March 2014 at 15:01:54 UTC, Steven Schveighoffer wrote:
That is already broken. What I'm looking to do is remove the cruft and  
WTF factor of the current state of affairs (an array that's not an  
array).


Originally (in that long ago proposal) I had proposed to check for and  
disallow invalid slicing during runtime. In fact, it could be added if  
desired with the type defined by the library.


Broken as in "you are not supposed to do it in user code"? Yes. Broken  
as in "does the wrong thing"? No. If your index is properly calculated,  
it is no different from casting to ubyte[] and then slicing. I am pretty  
sure even Phobos does it here and there.


If the idea to ensure the user cannot slice a code point was added, you  
would still be able to slice via str.representation[a..b], or even  
str.ptr[a..b] if you were so sure of the length you didn't want it to be  
checked ;)


The idea behind the proposal is to make it fully backwards compatible with  
existing code, except for randomly accessing a char, and probably .length.  
Slicing would still work as it does now, but could be adjusted later.


It will break existing code. To fix those breaks, you would need to use  
the char[] array directly via the representation member, or rethink your  
code to be UTF-correct. Basically, instead of pretending an array isn't an  
array, create a new mostly-compatible type that behaves as we want it to  
behave in all circumstances, not just when you use phobos algorithms.


The breaks may be trivial to work around, and might seem annoying.  
However, they may be actual UTF bugs that make your code more correct when  
you fix them.


The biggest problem right now is the lack of the ability to implicitly  
cast to tail-const with a custom struct. We can keep an alias-this link  
for those cases until we can fix that in the compiler.


-Steve


Re: DIP 57: static foreach

2014-03-10 Thread Timon Gehr

On 03/10/2014 07:40 AM, Kenji Hara wrote:

2014-03-10 6:31 GMT+09:00 Timon Gehr timon.g...@gmx.ch:

http://wiki.dlang.org/DIP57/

Thoughts?


 From the Semantics section:

  For static foreach statements, break and continue are supported and
treated like for foreach statements over tuples.

This is a questionable sentence. On the foreach with tuple iteration,
break and continue have no effect for the unrolling.
...


That's what is meant, and indeed this is visible in the examples section.


void main()
{
 import std.typetuple, std.stdio;

 foreach (i; TypeTuple!(1, 2, 3))
 {
 static if (i == 2) continue;
 else static if (i == 3) break;

 pragma(msg, "CT: i = ", i); // prints 1, 2, and 3 in CT
 writeln("RT: i = ", i); // prints only 1 in RT
 }
}

So, I think that static foreach *cannot* support break and continue in the
same way as foreach with tuples.

Kenji Hara



Yes it can. What is your suggestion? Influencing the unrolling?


Re: ddox-generated Phobos documentation is available for review

2014-03-10 Thread ralex
On Monday, 10 March 2014 at 14:56:13 UTC, Andrei Alexandrescu 
wrote:

On 3/10/14, 1:35 AM, Nicolas Sicard wrote:
For me it's a real improvement! One thing: symbol names 
(modules,

functions, etc.) shouldn't be hyphenated, specially in tables.


All: how does one turn off css hyphenation?

Andrei


word-wrap: break-word;
-webkit-hyphens: none;
-moz-hyphens: none;
-ms-hyphens: none;
hyphens: none;


should do the trick..


Re: Maybe in D3...

2014-03-10 Thread John Colvin

On Monday, 10 March 2014 at 15:16:13 UTC, Dicebot wrote:
On Monday, 10 March 2014 at 14:50:27 UTC, Vladimir Panteleev 
wrote:
From time to time, there are discussions concerning ideas 
which would impact the language, as it is now, too drastically 
to be implemented (it would break too much code or require a 
significant reengineering effort). These discussions get lost, 
which is regrettable since some of the discussions sometimes 
produce genuinely great ideas.


Although there is no D3 on the horizon, I think it would be 
nice to keep track of these ideas anyway.


http://wiki.dlang.org/Language_issues


I remember someone already creating such page but can't 
remember the title :(
The main problem with it is that, with D3 not being a realistic 
option, there is not much motivation to maintain it. Some 
ideas are great, but by the time they are in demand, the 
collective consciousness is likely to produce even greater ideas :)


Keeping track of the ideas is still worthwhile though, if only to 
bring people up to speed who haven't been part of the whole 
conversation.


Re: ddox-generated Phobos documentation is available for review

2014-03-10 Thread Brad Anderson
On Monday, 10 March 2014 at 14:56:13 UTC, Andrei Alexandrescu 
wrote:

On 3/10/14, 1:35 AM, Nicolas Sicard wrote:
For me it's a real improvement! One thing: symbol names 
(modules,

functions, etc.) shouldn't be hyphenated, specially in tables.


All: how does one turn off css hyphenation?

Andrei


class="donthyphenate"


Re: ddox-generated Phobos documentation is available for review

2014-03-10 Thread Brad Anderson
On Monday, 10 March 2014 at 14:11:06 UTC, Vladimir Panteleev 
wrote:

On Monday, 10 March 2014 at 14:08:07 UTC, Mike wrote:
Thank you to everyone who worked on this.  It's quite an 
improvement.


Problem:
http://dlang.org/library/std/compiler/vendor.html is a 404

Recommendation:
I really liked the immediate link to the source file on github 
in the old layout.  If possible please add it to the new 
layout.


Since (IIRC) DDox parses JSON layout, I think it is capable of 
generating exact links to the file:line of each symbol. That 
would be neat, as it allows quickly seeing the implementation 
if the documentation is not sufficient.


I wanted to do just this so I considered adding a predefined 
macro to ddoc to get line numbers like I did to get filenames (I 
needed SRCFILENAME to add the Improve This Page button) but the 
line numbers would pretty quickly lose sync between master and 
the documentation so that would also require integrating the 
release tag into the documentation somehow so I gave up on that 
idea.


Re: ddox-generated Phobos documentation is available for review

2014-03-10 Thread Vladimir Panteleev

On Monday, 10 March 2014 at 16:54:37 UTC, Brad Anderson wrote:
On Monday, 10 March 2014 at 14:11:06 UTC, Vladimir Panteleev 
wrote:

On Monday, 10 March 2014 at 14:08:07 UTC, Mike wrote:
Thank you to everyone who worked on this.  It's quite an 
improvement.


Problem:
http://dlang.org/library/std/compiler/vendor.html is a 404

Recommendation:
I really liked the immediate link to the source file on 
github in the old layout.  If possible please add it to the 
new layout.


Since (IIRC) DDox parses JSON layout, I think it is capable of 
generating exact links to the file:line of each symbol. That 
would be neat, as it allows quickly seeing the implementation 
if the documentation is not sufficient.


I wanted to do just this so I considered adding a predefined 
macro to ddoc to get line numbers like I did to get filenames 
(I needed SRCFILENAME to add the Improve This Page button) but 
the line numbers would pretty quickly lose sync between master 
and the documentation so that would also require integrating 
the release tag into the documentation somehow so I gave up on 
that idea.


So... don't link to master?

The dmd repo has a VERSION file. Can that be used to link to the 
respective tag instead?


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Brad Anderson
On Monday, 10 March 2014 at 13:35:33 UTC, Steven Schveighoffer 
wrote:
I proposed this inside the long "major performance problem with 
std.array.front" thread; I've also proposed it before, a long time ago.


But seems to be getting no attention buried in that thread, not 
even negative attention :)


An idea to fix the whole problems I see with char[] being 
treated specially by phobos: introduce an actual string type, 
with char[] as backing, that is a dchar range, that actually 
dictates the rules we want. Then, make the compiler use this 
type for literals.


e.g.:

struct string {
   immutable(char)[] representation;
   this(char[] data) { representation = data;}
   ... // dchar range primitives
}

Then, a char[] array is simply an array of char[].

points:

1. No more issues with foreach(c; "cassé"), it iterates via 
dchar
2. No more issues with "cassé"[4], it is a static compiler 
error.

3. No more awkward ASCII manipulation using ubyte[].
4. No more phobos schizophrenia saying char[] is not an array.
5. No more special casing char[] array templates to fool the 
compiler.
6. Any other special rules we come up with can be dictated by 
the library, and not ignored by the compiler.


Note, std.algorithm.copy(string1, mutablestring) will still 
decode/encode, but it's more explicit. It's EXPLICITLY a dchar 
range. Use std.algorithm.copy(string1.representation, 
mutablestring.representation) will avoid the issues.


I imagine only code that is currently UTF ignorant will break, 
and that code is easily 'fixed' by adding the 'representation' 
qualifier.


-Steve


Generally I think it's a good idea. Going a bit further you could 
also enable Short String Optimization but you'd have to 
encapsulate the backing array.


It seems like this would be an even bigger breaking change than 
Walter's proposal though (right or wrong, slicing strings is very 
common).


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Steven Schveighoffer

On Mon, 10 Mar 2014 13:06:08 -0400, Brad Anderson e...@gnuk.net wrote:

It seems like this would be an even bigger breaking change than Walter's  
proposal though (right or wrong, slicing strings is very common).


You're the second person to mention that, I was not planning on disabling  
string slicing. Just random access to individual chars, and probably  
.length.


-Steve


Re: Proposal for fixing dchar ranges

2014-03-10 Thread John Colvin
On Monday, 10 March 2014 at 17:54:49 UTC, Steven Schveighoffer 
wrote:
On Mon, 10 Mar 2014 13:06:08 -0400, Brad Anderson 
e...@gnuk.net wrote:


It seems like this would be an even bigger breaking change 
than Walter's proposal though (right or wrong, slicing strings 
is very common).


You're the second person to mention that, I was not planning on 
disabling string slicing. Just random access to individual 
chars, and probably .length.


-Steve


How is slicing any better than indexing?


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Steven Schveighoffer
On Mon, 10 Mar 2014 14:01:45 -0400, John Colvin  
john.loughran.col...@gmail.com wrote:



On Monday, 10 March 2014 at 17:54:49 UTC, Steven Schveighoffer wrote:

On Mon, 10 Mar 2014 13:06:08 -0400, Brad Anderson e...@gnuk.net wrote:

It seems like this would be an even bigger breaking change than  
Walter's proposal though (right or wrong, slicing strings is very  
common).


You're the second person to mention that, I was not planning on  
disabling string slicing. Just random access to individual chars, and  
probably .length.


-Steve


How is slicing any better than indexing?


Because one can slice out a multi-code-unit code point, one cannot access  
it via index. Strings would be horribly crippled without slicing. Without  
indexing, they are fine.


A possibility is to allow index, but actually decode the code point at  
that index (error on invalid index). That might actually be the correct  
mechanism.


-Steve
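
A small sketch of the distinction drawn above, using today's built-in `string` and the example word from the proposal. Indexing yields a single `char`, which cannot hold a two-code-unit character, while a slice over the same position can:

```d
void main()
{
    string s = "cassé";              // 'é' is U+00E9, two UTF-8 code units
    assert(s.length == 6);           // length counts code units, not characters

    char unit = s[4];                // indexing gives one code unit
    assert(unit == 0xC3);            // only the first half of 'é'

    string slice = s[4 .. 6];        // slicing can cover the whole code point
    assert(slice == "é");
    assert(s[0 .. 4] == "cass");
}
```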


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Steven Schveighoffer
On Mon, 10 Mar 2014 13:59:53 -0400, John Colvin  
john.loughran.col...@gmail.com wrote:



On Monday, 10 March 2014 at 13:35:33 UTC, Steven Schveighoffer wrote:
I proposed this inside the long "major performance problem with  
std.array.front" thread; I've also proposed it before, a long time ago.


But seems to be getting no attention buried in that thread, not even  
negative attention :)


An idea to fix the whole problems I see with char[] being treated  
specially by phobos: introduce an actual string type, with char[] as  
backing, that is a dchar range, that actually dictates the rules we  
want. Then, make the compiler use this type for literals.


e.g.:

struct string {
   immutable(char)[] representation;
   this(char[] data) { representation = data;}
   ... // dchar range primitives
}

Then, a char[] array is simply an array of char[].

points:

1. No more issues with foreach(c; "cassé"), it iterates via dchar
2. No more issues with "cassé"[4], it is a static compiler error.
3. No more awkward ASCII manipulation using ubyte[].
4. No more phobos schizophrenia saying char[] is not an array.
5. No more special casing char[] array templates to fool the compiler.
6. Any other special rules we come up with can be dictated by the  
library, and not ignored by the compiler.


Note, std.algorithm.copy(string1, mutablestring) will still  
decode/encode, but it's more explicit. It's EXPLICITLY a dchar range.  
Use std.algorithm.copy(string1.representation,  
mutablestring.representation) will avoid the issues.


I imagine only code that is currently UTF ignorant will break, and that  
code is easily 'fixed' by adding the 'representation' qualifier.


-Steve


I know warnings are disliked, but couldn't we make the slicing and  
indexing work as currently but issue a warning*? It's not ideal but it  
does mean we get backwards compatibility.


As I mentioned elsewhere (but repeating here for viewers), I was not  
planning on disabling slicing.


Indexing is rarely a feature one needs or should use, especially with  
encoded strings.


-Steve


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Steven Schveighoffer
On Mon, 10 Mar 2014 14:30:07 -0400, Walter Bright  
newshou...@digitalmars.com wrote:



On 3/10/2014 6:35 AM, Steven Schveighoffer wrote:
An idea to fix the whole problems I see with char[] being treated  
specially by
phobos: introduce an actual string type, with char[] as backing, that  
is a dchar
range, that actually dictates the rules we want. Then, make the  
compiler use

this type for literals.


Proposals to make a string class for D have come up many times. I have a  
kneejerk dislike for it. It's a really strong feature for D to have  
strings be an array type, and I'll go to great lengths to keep it that  
way.


I wholly agree, they should be an array type. But what they are now is  
worse.


-Steve


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Walter Bright

On 3/10/2014 6:35 AM, Steven Schveighoffer wrote:

An idea to fix the whole problems I see with char[] being treated specially by
phobos: introduce an actual string type, with char[] as backing, that is a dchar
range, that actually dictates the rules we want. Then, make the compiler use
this type for literals.


Proposals to make a string class for D have come up many times. I have a 
kneejerk dislike for it. It's a really strong feature for D to have strings be 
an array type, and I'll go to great lengths to keep it that way.


Re: Major performance problem with std.array.front()

2014-03-10 Thread Johannes Pfau
Am Mon, 10 Mar 2014 14:05:03 +
schrieb Andrea Fontana nos...@example.com:

 In italian we need unicode too. We have several accented letters 
 and often programming languages don't handle utf-8 and other 
 encoding so well...
 
 In D I never had any problem with this, and I work a lot on text 
 processing.
 
 So my question: is there any problem I'm missing in D with 
 unicode support or is just a performance problem on algorithms?

The only real problem apart from potential performance issues I've seen
mentioned in this thread is that indexing/slicing is done with code
units. I think this:

auto index = countUntil(...);
auto slice = str[0 .. index];

is really the only problem with the current implementation.


If we could start from scratch I'd say we keep operating on code points
by default but don't make strings arrays of char/wchar/dchar. Instead
they should be special types which do all operations (especially
indexing, slicing) on code points. This would be as safe as the current
implementation, always consistent but probably even slower in some
cases. Then offer some nice way to get the raw data for algorithms
which can deal with it.
However, I think it's too late to make these changes. 
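
To make the countUntil problem concrete, here is a hedged sketch with a made-up input string. It relies on the fact that Phobos algorithms see a `string` as a range of code points, so `countUntil` counts code points, while the slice operator counts code units; `std.string.indexOf` is shown as the variant that returns a code-unit index:

```d
import std.algorithm : countUntil;
import std.string : indexOf;
import std.utf : UTFException, validate;

void main()
{
    string str = "éa";                    // 'é' takes two code units

    auto byPoints = str.countUntil('a');  // counts decoded code points
    assert(byPoints == 1);

    auto byUnits = str.indexOf('a');      // index in code units
    assert(byUnits == 2);

    // Using the code-point count as a slice bound cuts 'é' in half.
    bool malformed = false;
    try { validate(str[0 .. byPoints]); }
    catch (UTFException) { malformed = true; }
    assert(malformed);

    // The code-unit index gives a well-formed prefix.
    assert(str[0 .. byUnits] == "é");
}
```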


Re: Major performance problem with std.array.front()

2014-03-10 Thread Marc Schütz

On Monday, 10 March 2014 at 13:18:50 UTC, Dicebot wrote:
On Sunday, 9 March 2014 at 17:27:20 UTC, Andrei Alexandrescu 
wrote:

On 3/9/14, 6:47 AM, Marc Schütz schue...@gmx.net wrote:

On Friday, 7 March 2014 at 15:03:24 UTC, Dicebot wrote:
2) It is regression back to C++ days of 
no-one-cares-about-Unicode
pain. Thinking about strings as character arrays is so 
natural and
convenient that if language/Phobos won't punish you for 
that, it will

be extremely widespread.


Not with Nick Sabalausky's suggestion to remove the 
implementation of
front from char arrays. This way, everyone will be forced to 
decide

whether they want code units or code points or something else.


Such as giving up on that crappy language that keeps on 
breaking their code.


Andrei



That was more of an "if you are crazy enough to even consider 
such breakage, this is closer to my personal idea of perfection" 
than an actual proposal ;)


BTW, I don't believe it would be that bad, because there's a 
straight-forward path of deprecation:


First, std.range.front for narrow strings (and dchar, for 
consistency) can be marked as deprecated. The deprecation message 
can say: "Please specify .byCodePoint()/.byCodeUnit()", guiding 
the users towards a better style (assuming one agrees that 
explicit is indeed better than implicit in this case).


After some time, the functionality can be moved into a 
compatibility module, with the deprecated functions still there, 
but now additionally telling the user about the quick fix of 
importing that module.


The deprecation period can be very long, and even if the 
functions should never be removed, at least everyone writing new 
code would do so in the new style.
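
As a sketch of what the explicit style could look like: the names in the message above are suggestions, but later Phobos releases do provide `std.utf.byCodeUnit` and `std.utf.byDchar`, which are used here as stand-ins. Treat the mapping between the suggested names and these functions as an assumption:

```d
import std.range : walkLength;
import std.utf : byCodeUnit, byDchar;

void main()
{
    string s = "cassé";

    // Explicitly iterate raw UTF-8 code units: no decoding.
    assert(s.byCodeUnit.walkLength == 6);

    // Explicitly iterate decoded code points.
    assert(s.byDchar.walkLength == 5);
}
```

With both spellings available, the caller states which view of the string an algorithm should see instead of relying on the implicit default.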


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Steven Schveighoffer
On Mon, 10 Mar 2014 14:30:07 -0400, Walter Bright  
newshou...@digitalmars.com wrote:



On 3/10/2014 6:35 AM, Steven Schveighoffer wrote:
An idea to fix the whole problems I see with char[] being treated  
specially by
phobos: introduce an actual string type, with char[] as backing, that  
is a dchar
range, that actually dictates the rules we want. Then, make the  
compiler use

this type for literals.


Proposals to make a string class for D have come up many times. I have a  
kneejerk dislike for it. It's a really strong feature for D to have  
strings be an array type, and I'll go to great lengths to keep it that  
way.


BTW, this escaped my view the first time reading your post, but I am NOT  
proposing a string *class*. In fact, I'm not proposing we change anything  
technical about strings, the code generated should be basically identical.  
What I'm proposing is to encapsulate what you can and can't do with a  
string in the type itself, instead of making the standard library flip  
over backwards to treat it as something else when the compiler treats it  
as a simple array of char.


-Steve


Re: Major performance problem with std.array.front()

2014-03-10 Thread Marc Schütz

On Monday, 10 March 2014 at 13:48:44 UTC, Abdulhaq wrote:
My app deals with unicode arabic text that is 'out there', and 
the Unicode™ support for Arabic is not that well thought out, 
so the data is often (always) inconsistent in terms of 
sequencing diacritics etc. Even the code page can vary. 
Therefore my code has to cater to various ways that other 
developers have sequenced the code points.


So, my needs as a 'user' are:
* I want to encode all incoming data immediately into unicode, 
usually UTF8, if it isn't already.
* I want to iterate over code points. I don't care about the 
raw data.
* When I get the length of my string it should be the number of 
code points.

* When I index my string it should return the nth code point.
* When I manipulate my strings I want to work with code points
... you get the drift.


Are you sure that code points is what you want? AFAIK there are 
lots of diacritics in Arabic, and I believe they are not 
precomposed with their carrying letters...


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Johannes Pfau
Am Mon, 10 Mar 2014 11:30:07 -0700
schrieb Walter Bright newshou...@digitalmars.com:

 On 3/10/2014 6:35 AM, Steven Schveighoffer wrote:
  An idea to fix the whole problems I see with char[] being treated
  specially by phobos: introduce an actual string type, with char[]
  as backing, that is a dchar range, that actually dictates the rules
  we want. Then, make the compiler use this type for literals.
 
 Proposals to make a string class for D have come up many times. I
 have a kneejerk dislike for it. It's a really strong feature for D to
 have strings be an array type, and I'll go to great lengths to keep
 it that way.

Question: which type T doesn't have slicing, has an ElementType of
dchar, has typeof(T[0]).sizeof == 4, ElementEncodingType!T == char and
still satisfies isArray?

It's a string. Would you call that 'an array type'?

writeln(isArray!string);   //true
writeln(hasSlicing!string); //false
writeln(ElementType!string.stringof); //dchar
writeln(ElementEncodingType!string.stringof); //char

I wouldn't call that an array. Part of the problem is that you want
string to be arrays (fixed size elements, direct indexing) and Andrei
doesn't want them to be arrays (operating on code points = not fixed
size = not arrays).


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Johannes Pfau
Am Mon, 10 Mar 2014 13:55:00 -0400
schrieb Steven Schveighoffer schvei...@yahoo.com:

 On Mon, 10 Mar 2014 13:06:08 -0400, Brad Anderson e...@gnuk.net
 wrote:
 
  It seems like this would be an even bigger breaking change than
  Walter's proposal though (right or wrong, slicing strings is very
  common).
 
 You're the second person to mention that, I was not planning on
 disabling string slicing. Just random access to individual chars, and
 probably .length.
 
 -Steve

Unfortunately slicing by code units is probably the most important
safety issue with the current implementation: As was mentioned in the
other thread:

size_t index = str.countUntil('a');
auto slice = str[0..index];

This can be a safety and security issue. (I realize that this would
break lots of code so I'm not sure if we should/can fix it. But I think
this was the most important problem mentioned in the other thread.)


Re: ddox-generated Phobos documentation is available for review

2014-03-10 Thread Sönke Ludwig

Am 10.03.2014 15:11, schrieb Vladimir Panteleev:

On Monday, 10 March 2014 at 14:08:07 UTC, Mike wrote:

Thank you to everyone who worked on this.  It's quite an improvement.

Problem:
http://dlang.org/library/std/compiler/vendor.html is a 404

Recommendation:
I really liked the immediate link to the source file on github in the
old layout.  If possible please add it to the new layout.


Since (IIRC) DDox parses JSON layout, I think it is capable of
generating exact links to the file:line of each symbol. That would be
neat, as it allows quickly seeing the implementation if the
documentation is not sufficient.


It's actually already there - at the top of each page, there is a View 
source code button that goes to the proper file/line and to the proper 
branch/tag. I've used the same style as the already existing buttons, 
but those are indeed not very noticeable on the right side of the page.


Any suggestions for a better place/style without visually cluttering up 
the actual documentation?


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Walter Bright

On 3/10/2014 11:54 AM, Steven Schveighoffer wrote:

BTW, this escaped my view the first time reading your post, but I am NOT
proposing a string *class*.


Right, but here I used the term class to be more generic as in being a user 
defined type, i.e. struct or class. I should have been more clear.


Re: Duals or ranges and reactive D

2014-03-10 Thread Szymon Gatner

On Saturday, 8 March 2014 at 12:01:10 UTC, Timon Gehr wrote:

On 02/27/2014 01:41 PM, Szymon Gatner wrote:
C#'s IObservable/IObserver made me think how would a dual 
[1][2] of
a range concept look in D. Since D has no equivalent 
IEnumerable (as
it is not needed thanks to templates) it is only about 
IEnumerator /

IObserver part which relates to a D range.

Ranges/enumerators are models of 'pull' style interface 
whereas their
duals represent models of 'push' style enabling reactive 
programming [3]
techniques which are really nicely solving issues of 
asynchronous /

event - based programming.

I suppose OutputRange is similar in concept, although it has
'OnCompleted' / 'OnError' missing.

What do you think? Rx along with LINQ is a really clean 
solution to the
problem of asynchronous ranges of values. I think it would be 
very nice

to have in D too.

[1] 
http://csl.stanford.edu/~christos/pldi2010.fit/meijer.duality.pdf
[2] 
http://josemigueltorres.net/index.php/ienumerableiobservable-duality/

[3]
https://channel9.msdn.com/Shows/Going+Deep/Expert-to-Expert-Brian-Beckman-and-Erik-Meijer-Inside-the-NET-Reactive-Framework-Rx



In case you are interested, I have thrown together a small 
proof of concept implementation: 
http://dpaste.dzfl.pl/9d8386768da0


Wow, that is not what I'd call small ;) I will definitely take a look.

Is it something you already had written or something new? How do 
you feel about the concept?
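
For readers unfamiliar with the duality being discussed, here is a rough sketch of a push-style counterpart to an input range. The interface shape follows the C# IObserver methods mentioned in the original post; `Observer`, `Collector` and `pushAll` are hypothetical names invented for this illustration and are not taken from the linked proof of concept:

```d
import std.range : ElementType, isInputRange;

// Push-style dual of an input range (hypothetical interface).
interface Observer(T)
{
    void onNext(T value);        // dual of front/popFront
    void onError(Exception e);   // dual of an exception thrown while pulling
    void onCompleted();          // dual of empty becoming true
}

// Simple observer that records what it is given.
final class Collector(T) : Observer!T
{
    T[] items;
    bool completed;
    Exception error;

    void onNext(T value) { items ~= value; }
    void onError(Exception e) { error = e; }
    void onCompleted() { completed = true; }
}

// Adapter: pulls from a range and pushes each element into an observer.
void pushAll(R, S)(R source, S sink)
    if (isInputRange!R && is(S : Observer!(ElementType!R)))
{
    try
    {
        foreach (item; source)
            sink.onNext(item);
        sink.onCompleted();
    }
    catch (Exception e)
    {
        sink.onError(e);
    }
}

void main()
{
    auto c = new Collector!int;
    pushAll([1, 2, 3], c);
    assert(c.items == [1, 2, 3]);
    assert(c.completed && c.error is null);
}
```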


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Artem Tarasov

On Monday, 10 March 2014 at 18:50:28 UTC, Johannes Pfau wrote:


Question: which type T doesn't have slicing, has an ElementType 
of
dchar, has typeof(T[0]).sizeof == 4, ElementEncodingType!T == 
char and

still satisfies isArray?


In addition, hasLength!T == false, which totally freaked me out 
when I first discovered that.
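
The trait results mentioned in this subthread can be checked at compile time. A small sketch, assuming a Phobos version where narrow strings are still auto-decoded by the range primitives:

```d
import std.range : ElementEncodingType, ElementType, hasLength, hasSlicing,
    isRandomAccessRange;
import std.traits : isArray, Unqual;

// The language sees an array...
static assert(isArray!string);

// ...but the range traits deliberately deny the array-like capabilities.
static assert(!hasSlicing!string);
static assert(!hasLength!string);
static assert(!isRandomAccessRange!string);

// Range element type is the decoded code point, storage type is char.
static assert(is(ElementType!string == dchar));
static assert(is(Unqual!(ElementEncodingType!string) == char));

void main() {}
```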




Re: Proposal for fixing dchar ranges

2014-03-10 Thread John Colvin
On Monday, 10 March 2014 at 18:09:51 UTC, Steven Schveighoffer 
wrote:
On Mon, 10 Mar 2014 14:01:45 -0400, John Colvin 
john.loughran.col...@gmail.com wrote:


On Monday, 10 March 2014 at 17:54:49 UTC, Steven Schveighoffer 
wrote:
On Mon, 10 Mar 2014 13:06:08 -0400, Brad Anderson 
e...@gnuk.net wrote:


It seems like this would be an even bigger breaking change 
than Walter's proposal though (right or wrong, slicing 
strings is very common).


You're the second person to mention that, I was not planning 
on disabling string slicing. Just random access to individual 
chars, and probably .length.


-Steve


How is slicing any better than indexing?


Because one can slice out a multi-code-unit code point, one 
cannot access it via index. Strings would be horribly crippled 
without slicing. Without indexing, they are fine.


A possibility is to allow index, but actually decode the code 
point at that index (error on invalid index). That might 
actually be the correct mechanism.


-Steve


In order to be correct, both require exactly the same knowledge: 
The beginning of a code point, followed by the end of a code 
point. In the indexing case they just happen to be the same 
code-point and happen to be one code unit from each other. I 
don't see how one is any more or less error-prone or 
fundamentally wrong than the other.


I do understand that slicing is more important however.


Re: ddox-generated Phobos documentation is available for review

2014-03-10 Thread w0rp
The documentation is looking very good, good work to all 
involved. There are a few bugs here and there. Appender's docs 
were missing, some runtime modules are in there which should 
maybe be hidden. Still, this is a massive improvement, and I love 
it.


Re: Proposal for fixing dchar ranges

2014-03-10 Thread H. S. Teoh
On Mon, Mar 10, 2014 at 07:49:04PM +0100, Johannes Pfau wrote:
 Am Mon, 10 Mar 2014 11:30:07 -0700
 schrieb Walter Bright newshou...@digitalmars.com:
 
  On 3/10/2014 6:35 AM, Steven Schveighoffer wrote:
   An idea to fix the whole problems I see with char[] being treated
   specially by phobos: introduce an actual string type, with char[]
   as backing, that is a dchar range, that actually dictates the
   rules we want. Then, make the compiler use this type for literals.
  
  Proposals to make a string class for D have come up many times. I
  have a kneejerk dislike for it. It's a really strong feature for D
  to have strings be an array type, and I'll go to great lengths to
  keep it that way.

I'm on the fence about this one. The nice thing about strings being an
array type, is that it is a familiar concept to C coders, and it allows
array slicing for extracting substrings, etc., which fits nicely with
the C view of strings as character arrays. As a C coder myself, I like
it this way too. But the bad thing about strings being an array type, is
that it's a holdover from C, and it allows slicing for extracting
substrings -- malformed substrings by permitting slicing a multibyte
(multiword) character.

Basically, the nice aspects of strings being arrays only apply when
you're dealing with ASCII (or mostly-ASCII) strings. These very same
nice aspects turn into problems when dealing with anything non-ASCII.
The only way the user can get it right using only array operations, is
if they understand the whole of Unicode in their head and are willing to
reinvent Unicode algorithms every time they slice a string or do some
operation on it. Since D purportedly supports Unicode by default, it
shouldn't be this way. D should *actually* support Unicode all the way
-- use proper Unicode algorithms for substring extraction, collation,
line-breaking, normalization, etc.. Being a systems language, of course,
means that D should allow you to get under the hood and do things
directly with the raw string representation -- but this shouldn't be the
*default* modus operandi.  The default should be a properly-encapsulated
string type with Unicode algorithms to operate on it (with the option of
reaching into the raw representation where necessary).


 Question: which type T doesn't have slicing, has an ElementType of
 dchar, has typeof(T[0]).sizeof == 4, ElementEncodingType!T == char and
 still satisfies isArray?
 
 It's a string. Would you call that 'an array type'?
 
   writeln(isArray!string);   //true
   writeln(hasSlicing!string); //false
   writeln(ElementType!string.stringof); //dchar
   writeln(ElementEncodingType!string.stringof); //char
 
 I wouldn't call that an array. Part of the problem is that you want
 string to be arrays (fixed size elements, direct indexing) and Andrei
 doesn't want them to be arrays (operating on code points = not fixed
 size = not arrays).

Exactly. What we have right now is a frankensteinian hybrid that's
neither fully an array, nor fully a Unicode string type. If we call the
current messy AA implementation split between compiler, aaA.d, and
object.di a design problem, then I'd call the current state of D strings
a design problem too. This underlying inconsistency is ultimately what
leads to the poor performance of strings in std.algorithm.

It's precisely because of this that I've given up on using std.algorithm
for strings altogether -- std.regex is far better: more flexible, more
expressive, and more performant, and specifically designed to operate on
strings. Nowadays I only use std.algorithm for non-string ranges
(because then the behaviour is actually consistent!!).


T

-- 
MS Windows: 64-bit overhaul of 32-bit extensions and a graphical shell for a 
16-bit patch to an 8-bit operating system originally coded for a 4-bit 
microprocessor, written by a 2-bit company that can't stand 1-bit of 
competition.


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Steven Schveighoffer
On Mon, 10 Mar 2014 15:30:00 -0400, John Colvin  
john.loughran.col...@gmail.com wrote:



On Monday, 10 March 2014 at 18:09:51 UTC, Steven Schveighoffer wrote:


Because one can slice out a multi-code-unit code point, one cannot  
access it via index. Strings would be horribly crippled without  
slicing. Without indexing, they are fine.


A possibility is to allow index, but actually decode the code point at  
that index (error on invalid index). That might actually be the correct  
mechanism.




In order to be correct, both require exactly the same knowledge: The  
beginning of a code point, followed by the end of a code point. In the  
indexing case they just happen to be the same code-point and happen to  
be one code unit from each other. I don't see how one is any more or  
less error-prone or fundamentally wrong than the other.


Using indexing, you simply cannot get the single code unit that represents  
a multi-code-unit code point. It doesn't fit in a char. It's guaranteed to  
fail, whereas slicing will give you access to all the data in the  
string.


Now, with indexing actually decoding a code point, one can alias a[i] to  
a[i..$].front(), which means decode the first code point you come to at  
index i. This means indexing is slow(er), and returns a dchar. I think as  
a first step, that might be too much to add silently. I'd rather break it  
first, then add it back later.


-Steve
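
A sketch of the decoding index described above, written as a free function because built-in strings cannot be given a new index operator. `codePointAt` is a hypothetical name; the body is just the `a[i..$].front` rewrite from the message, and it assumes the auto-decoding `front` for strings from `std.range`:

```d
import std.range : front;
import std.utf : UTFException;

// Hypothetical helper: decode the code point that starts at code unit i.
dchar codePointAt(string s, size_t i)
{
    return s[i .. $].front; // front decodes for narrow strings
}

void main()
{
    string s = "cassé";
    assert(codePointAt(s, 0) == 'c');
    assert(codePointAt(s, 4) == 'é');   // returns a dchar, not a char

    // Index 5 is the second code unit of 'é', so decoding there fails.
    bool rejected = false;
    try { dchar ignored = codePointAt(s, 5); }
    catch (UTFException) { rejected = true; }
    assert(rejected);
}
```

The cost is the one noted in the message: each access decodes, so it is slower than plain array indexing.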


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Steven Schveighoffer
On Mon, 10 Mar 2014 14:54:22 -0400, Johannes Pfau nos...@example.com  
wrote:



Am Mon, 10 Mar 2014 13:55:00 -0400
schrieb Steven Schveighoffer schvei...@yahoo.com:


On Mon, 10 Mar 2014 13:06:08 -0400, Brad Anderson e...@gnuk.net
wrote:

 It seems like this would be an even bigger breaking change than
 Walter's proposal though (right or wrong, slicing strings is very
 common).

You're the second person to mention that, I was not planning on
disabling string slicing. Just random access to individual chars, and
probably .length.

-Steve


Unfortunately slicing by code units is probably the most important
safety issue with the current implementation: As was mentioned in the
other thread:

size_t index = str.countUntil('a');
auto slice = str[0..index];

This can be a safety and security issue. (I realize that this would
break lots of code so I'm not sure if we should/can fix it. But I think
this was the most important problem mentioned in the other thread.)


Slicing can never be a code point based operation. It would be too slow  
(read linear complexity). What needs to be broken is the expectation that  
an index is the number of code points or characters in a string. Think of  
an index as a position that has no real meaning except they are ordered in  
the stream. Like a set of ordered numbers, not necessarily consecutive.  
The index 4 may not exist, while 5 does.


At this point, my proposal does not fix that particular problem, but I  
don't think there's any way to fix that problem except to train the user  
who wrote it not to do that. However, it does not leave us in a worse  
position.
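
To make the pitfall concrete, here is a small sketch, assuming current Phobos behaviour: countUntil on a string counts decoded code points, while std.string.indexOf returns a code unit offset that is safe to slice with. The helper prefixBefore is hypothetical and assumes the character actually occurs in the string.

```d
import std.algorithm.searching : countUntil;
import std.exception : assertThrown;
import std.string : indexOf;
import std.utf : UTFException, validate;

// Hypothetical helper: prefix of s up to the first occurrence of c,
// cut at a code unit offset. Assumes c occurs in s (indexOf returns -1 if not).
string prefixBefore(string s, char c)
{
    return s[0 .. cast(size_t) s.indexOf(c)];
}

void main()
{
    string str = "\u00E9a"; // U+00E9 (two code units) followed by 'a'

    auto count = str.countUntil('a'); // number of code points before 'a'
    auto offset = str.indexOf('a');   // code unit offset of 'a'
    assert(count == 1);
    assert(offset == 2);

    // Slicing with the code point count cuts U+00E9 in half.
    assertThrown!UTFException(validate(str[0 .. count]));

    // Slicing with the code unit offset keeps the string well-formed.
    assert(prefixBefore(str, 'a') == "\u00E9");
}
```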


-Steve


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Steven Schveighoffer
On Mon, 10 Mar 2014 16:06:25 -0400, Steven Schveighoffer  
schvei...@yahoo.com wrote:



Think of an index as a position that has no real meaning except they are  
ordered in the stream. Like a set of ordered numbers, not necessarily  
consecutive. The index 4 may not exist, while 5 does.


I said that wrong, of course it has meaning. What I mean is that if you  
have two positions, the ordering will indicate where the  
characters/graphemes/code points occur in the stream, but their value will  
not be indicative of how far they are apart in terms of  
characters/graphemes/code points.


In other words, if I have two characters, at position p1 and p2, then

p1 > p2 => p1 comes later in the string than p2
p1 == p2 => p1 and p2 refer to the same character
p1 - p2 => not defined to any particular value.
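
A short sketch of those three rules with code unit offsets, assuming current Phobos (indexOf returns code unit offsets, walkLength on a string counts code points). The helper codePointsBetween is hypothetical and assumes both positions fall on code point boundaries.

```d
import std.range : walkLength;
import std.string : indexOf;

// Hypothetical helper: number of code points between two positions lo <= hi.
size_t codePointsBetween(string s, size_t lo, size_t hi)
{
    return s[lo .. hi].walkLength; // linear: has to decode the slice
}

void main()
{
    string s = "a\u00E9\u00FCz"; // 1 + 2 + 2 + 1 code units

    auto p1 = s.indexOf('z');
    auto p2 = s.indexOf('a');

    assert(p1 > p2);      // the ordering is meaningful: 'z' comes later
    assert(p1 - p2 == 5); // the difference is a code unit distance only
    assert(codePointsBetween(s, p2, p1) == 3); // not the code point count
}
```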

-Steve


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Brad Anderson
On Monday, 10 March 2014 at 17:54:49 UTC, Steven Schveighoffer 
wrote:
On Mon, 10 Mar 2014 13:06:08 -0400, Brad Anderson 
e...@gnuk.net wrote:


It seems like this would be an even bigger breaking change 
than Walter's proposal though (right or wrong, slicing strings 
is very common).


You're the second person to mention that, I was not planning on 
disabling string slicing. Just random access to individual 
chars, and probably .length.


-Steve


Sorry, I misunderstood. That sounds reasonable.


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Walter Bright

On 3/10/2014 1:36 PM, Steven Schveighoffer wrote:

What strings are already is a user-defined type,


No, they are not.


but with horrible enforcement.


With no enforcement, and that is by design.

Keep in mind that D is a systems programming language, and that means unfettered 
access to strings.


Re: Proposal for fixing dchar ranges

2014-03-10 Thread John Colvin
On Monday, 10 March 2014 at 20:00:07 UTC, Steven Schveighoffer 
wrote:
On Mon, 10 Mar 2014 15:30:00 -0400, John Colvin 
john.loughran.col...@gmail.com wrote:


On Monday, 10 March 2014 at 18:09:51 UTC, Steven Schveighoffer 
wrote:


Because one can slice out a multi-code-unit code point, one 
cannot access it via index. Strings would be horribly 
crippled without slicing. Without indexing, they are fine.


A possibility is to allow index, but actually decode the code 
point at that index (error on invalid index). That might 
actually be the correct mechanism.




In order to be correct, both require exactly the same 
knowledge: The beginning of a code point, followed by the end 
of a code point. In the indexing case they just happen to be 
the same code-point and happen to be one code unit from each 
other. I don't see how one is any more or less error-prone or 
fundamentally wrong than the other.


Using indexing, you simply cannot get the single code unit that 
represents a multi-code-unit code point. It doesn't fit in a 
char. It's guaranteed to fail, whereas slicing will give you 
access to all the data in the string.




I think I understand your motivation now. Indexing never provides 
anything that slicing doesn't do more generally.


Now, with indexing actually decoding a code point, one can 
alias a[i] to a[i..$].front(), which means decode the first 
code point you come to at index i. This means indexing is 
slow(er), and returns a dchar. I think as a first step, that 
might be too much to add silently. I'd rather break it first, 
then add it back later.


-Steve


Of course that i has to be at the beginning of a code-point. 
Doesn't seem like that useful a feature and potentially very 
confusing for people who naively expect normal indexing.


Re: Maybe in D3...

2014-03-10 Thread Chris Williams
On Monday, 10 March 2014 at 14:50:27 UTC, Vladimir Panteleev 
wrote:
From time to time, there are discussions concerning ideas which 
would impact the language, as it is now, too drastically to be 
implemented (it would break too much code or require a 
significant reengineering effort). These discussions get lost, 
which is regrettable since some of the discussions sometimes 
produce genuinely great ideas.


Although there is no D3 on the horizon, I think it would be 
nice to keep track of these ideas anyway.


http://wiki.dlang.org/Language_issues


I imagine that someone else could write it better than I, but 
having to explicitly break out of safe, pure, nothrow, etc. 
should be the default, rather than the reverse.


Of course, no option has to be language breaking. If two 
alternate implementations are incompatible, you can add a 
version/feature flag to the compiler and deprecate the older 
versions over time. Releasing a new version of the compiler which 
breaks everything ever made is bad, but if people have 12 months 
and a working compiler for both versions, moving over to the new 
expectations isn't unreasonable.


Re: Proposal for fixing dchar ranges

2014-03-10 Thread John Colvin

On Monday, 10 March 2014 at 19:48:34 UTC, H. S. Teoh wrote:

On Mon, Mar 10, 2014 at 07:49:04PM +0100, Johannes Pfau wrote:

Am Mon, 10 Mar 2014 11:30:07 -0700
schrieb Walter Bright newshou...@digitalmars.com:

 On 3/10/2014 6:35 AM, Steven Schveighoffer wrote:
  An idea to fix the whole problems I see with char[] being treated
  specially by phobos: introduce an actual string type, with char[]
  as backing, that is a dchar range, that actually dictates the
  rules we want. Then, make the compiler use this type for literals.
 
 Proposals to make a string class for D have come up many times. I
 have a kneejerk dislike for it. It's a really strong feature for D
 to have strings be an array type, and I'll go to great lengths to
 keep it that way.


I'm on the fence about this one. The nice thing about strings being an
array type, is that it is a familiar concept to C coders, and it allows
array slicing for extracting substrings, etc., which fits nicely with
the C view of strings as character arrays. As a C coder myself, I like
it this way too. But the bad thing about strings being an array type, is
that it's a holdover from C, and it allows slicing for extracting
substrings -- malformed substrings by permitting slicing a multibyte
(multiword) character.

Basically, the nice aspects of strings being arrays only apply when
you're dealing with ASCII (or mostly-ASCII) strings. These very same
nice aspects turn into problems when dealing with anything non-ASCII.
The only way the user can get it right using only array operations, is
if they understand the whole of Unicode in their head and are willing to
reinvent Unicode algorithms every time they slice a string or do some
operation on it. Since D purportedly supports Unicode by default, it
shouldn't be this way. D should *actually* support Unicode all the way
-- use proper Unicode algorithms for substring extraction, collation,
line-breaking, normalization, etc.. Being a systems language, of course,
means that D should allow you to get under the hood and do things
directly with the raw string representation -- but this shouldn't be the
*default* modus operandi.  The default should be a properly-encapsulated
string type with Unicode algorithms to operate on it (with the option of
reaching into the raw representation where necessary).




You started off on the fence, but you seem pretty convinced by 
the end!


Re: Major performance problem with std.array.front()

2014-03-10 Thread Abdulhaq

On Monday, 10 March 2014 at 18:54:26 UTC, Marc Schütz wrote:

On Monday, 10 March 2014 at 13:48:44 UTC, Abdulhaq wrote:
My app deals with unicode arabic text that is 'out there', and 
the UnicodeTM support for Arabic is not that well thought out, 
so the data is often (always) inconsistent in terms of 
sequencing diacritics etc. Even the code page can vary. 
Therefore my code has to cater to various ways that other 
developers have sequenced the code points.


So, my needs as a 'user' are:
* I want to encode all incoming data immediately into unicode, 
usually UTF8, if it isn't already.
* I want to iterate over code points. I don't care about the 
raw data.
* When I get the length of my string it should be the number 
of code points.

* When I index my string it should return the nth code point.
* When I manipulate my strings I want to work with code points
... you get the drift.


Are you sure that code points is what you want? AFAIK there are 
lots of diacritics in Arabic, and I believe they are not 
precomposed with their carrying letters...


I checked the terminology before posting so I'm pretty sure. 
Arabic has a code page for the logical characters, one code point 
for each letter of the alphabet and others for various diacritics.


Because of the 'shaping' each logical character has various 
glyphs, found on other code pages.


Text editing programs tend to store typed Arabic as the user 
entered it, and because there can be more than one diacritic per 
alphabetic letter the sequence varies as to how the user 
sequenced them.


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Steven Schveighoffer
On Mon, 10 Mar 2014 16:52:27 -0400, Walter Bright  
newshou...@digitalmars.com wrote:



On 3/10/2014 1:36 PM, Steven Schveighoffer wrote:

What strings are already is a user-defined type,


No, they are not.


The functionality added via phobos can hardly be considered extraneous.  
One would not use strings without the library.



but with horrible enforcement.


With no enforcement, and that is by design.


The enforcement is opt-in. That is, you have to use phobos' templates in  
order to use them properly:


auto getIt(R)(R r, size_t idx)
{
   if(idx < r.length)
      return r[idx];
   assert(0); // idx out of range
}

The above compiles fine for strings. However, it does not compile fine if  
you do:


auto getIt(R)(R r, size_t idx) if(hasLength!R && isRandomAccessRange!R)

Any other range will fail to compile for the more strict version and the  
simple implementation without template constraints. In other words, the  
compiler doesn't believe the same thing phobos does. Shooting one's foot  
is quite easy.
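
A runnable sketch of that mismatch, assuming current Phobos trait behaviour for narrow strings. The name getItStrict is made up here so that both variants can sit in one module.

```d
import std.range.primitives : hasLength, isRandomAccessRange;

// Unconstrained version: only the compiler's view of the type matters.
auto getIt(R)(R r, size_t idx)
{
    if (idx < r.length)
        return r[idx];
    assert(0, "index out of range");
}

// Constrained version: Phobos' view of the type decides.
auto getItStrict(R)(R r, size_t idx)
if (hasLength!R && isRandomAccessRange!R)
{
    return getIt(r, idx);
}

void main()
{
    // The compiler happily indexes a string by code unit.
    assert(getIt("abc", 1) == 'b');

    // Phobos says a string has no length and is not random access.
    static assert(!hasLength!string);
    static assert(!isRandomAccessRange!string);
    static assert(!__traits(compiles, getItStrict("abc", 1)));

    // For an ordinary array, both views agree.
    assert(getItStrict([10, 20, 30], 1) == 20);
}
```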


Keep in mind that D is a systems programming language, and that means  
unfettered access to strings.


Access is fine, with clear intentions. And we do not have unfettered  
access. I cannot sort a mutable string of ASCII characters without first  
converting it to ubyte[].
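
For reference, a sketch of the workaround being described, assuming current Phobos: std.string.representation reinterprets a char[] as ubyte[] over the same memory, and sort accepts that. The helper sortAscii is hypothetical and assumes the input really is ASCII only.

```d
import std.algorithm.sorting : sort;
import std.string : representation;

// Hypothetical helper: sorts an ASCII-only buffer in place.
char[] sortAscii(char[] s)
{
    sort(s.representation); // ubyte[] view of the same memory
    return s;
}

void main()
{
    char[] buf = "dcba".dup;

    // Sorting the char[] directly is rejected: Phobos does not treat it
    // as a random-access range.
    static assert(!__traits(compiles, sort(buf)));

    assert(sortAscii(buf) == "abcd");
}
```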


What in my proposal makes you think you don't have unfettered access? The  
underlying immutable(char)[] representation is accessible. In fact, you  
would have more access, since phobos functions would then work with a  
char[] like it's a proper array.


-Steve


Re: Major performance problem with std.array.front()

2014-03-10 Thread Abdulhaq

On Monday, 10 March 2014 at 18:54:26 UTC, Marc Schütz wrote:

On Monday, 10 March 2014 at 13:48:44 UTC, Abdulhaq wrote:
My app deals with unicode arabic text that is 'out there', and 
the UnicodeTM support for Arabic is not that well thought out, 
so the data is often (always) inconsistent in terms of 
sequencing diacritics etc. Even the code page can vary. 
Therefore my code has to cater to various ways that other 
developers have sequenced the code points.


So, my needs as a 'user' are:
* I want to encode all incoming data immediately into unicode, 
usually UTF8, if it isn't already.
* I want to iterate over code points. I don't care about the 
raw data.
* When I get the length of my string it should be the number 
of code points.

* When I index my string it should return the nth code point.
* When I manipulate my strings I want to work with code points
... you get the drift.


Are you sure that code points is what you want? AFAIK there are 
lots of diacritics in Arabic, and I believe they are not 
precomposed with their carrying letters...


Adding to my other comment I don't expect a string type to 
understand arabic and merge the diacritics for me. In fact there 
are other symbols (code points) that can also be present, for 
instance instructions on how Quranic text is to be read. These 
issues have not been standardised and I would say are not well 
understood generally.


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Steven Schveighoffer
On Mon, 10 Mar 2014 16:54:34 -0400, John Colvin   
john.loughran.col...@gmail.com wrote:
Of course that i has to be at the beginning of a code-point. Doesn't  
seem like that useful a feature and potentially very confusing for  
people who naively expect normal indexing.


What it would do is remove the confusion of is(typeof(r.front) !=   
typeof(r[0]))


Naivety is to be expected when you have made your C-derived language's  
default string type an encoded UTF8 array called char[]. It doesn't  
magically make D programs UTF aware.


I would suggest that a lofty goal is for the string type to be completely  
safe, and efficient, and only allow raw access via the .representation  
member. But I don't think, given the current code base,
that we can achieve that in one proposal. It has to be gradual. This is a  
first step.


-Steve


Re: Major performance problem with std.array.front()

2014-03-10 Thread Nick Sabalausky

On 3/7/2014 8:40 AM, Michel Fortin wrote:

On 2014-03-07 03:59:55 +, bearophile bearophileh...@lycos.com said:


Walter Bright:


I understand this all too well. (Note that we currently have a
different silent problem: unnoticed large performance problems.)


On the other hand your change could introduce Unicode-related bugs in
future code (that the current Phobos avoids) (and here I am not
talking about code breakage).


The way Phobos works isn't any more correct than dealing with code
units. Many graphemes span on multiple code points -- because of
combined diacritics or character variant modifiers -- and decoding at
the code-point level is thus often insufficient for correctness.



Well, it is *more* correct, as many western languages are more likely in 
current Phobos to just work in most cases. It's just that things still 
aren't completely correct overall.



 From my experience, I'd suggest these basic operations for a string
range instead of the regular range interface:

.empty
.frontCodeUnit
.frontCodePoint
.frontGrapheme
.popFrontCodeUnit
.popFrontCodePoint
.popFrontGrapheme
.codeUnitLength (aka length)
.codePointLength (for dchar[] only)
.codePointLengthLinear
.graphemeLengthLinear

Someone should be able to mix all the three 'front' and 'pop' function
variants above in any code dealing with a string type. In my XML parser
for instance I regularly use frontCodeUnit to avoid the decoding penalty
when matching the next character with an ASCII one such as '' or ''.
An API like the one above forces you to be aware of the level you're
working on, making bugs and inefficiencies stand out (as long as you're
familiar with each representation).
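
As a rough sketch of what working at each of the three levels looks like, the following uses range adaptors from more recent Phobos releases (std.utf.byCodeUnit, std.utf.byDchar, std.uni.byGrapheme), which may not match what was available when this thread was written. The helper names are hypothetical and simply echo the operations listed above.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

// Hypothetical helpers named after the operations suggested above.
size_t codeUnitLength(string s)        { return s.byCodeUnit.length; }
size_t codePointLengthLinear(string s) { return s.byDchar.walkLength; }
size_t graphemeLengthLinear(string s)  { return s.byGrapheme.walkLength; }

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT (two UTF-8 code units).
    string s = "casse\u0301";

    assert(codeUnitLength(s) == 7);        // c a s s e CC 81
    assert(codePointLengthLinear(s) == 6); // c a s s e U+0301
    assert(graphemeLengthLinear(s) == 5);  // c a s s e+accent

    // Peeking at the first code unit needs no decoding at all.
    assert(s.byCodeUnit.front == 'c');
}
```

Each call site names its level explicitly, which is the property the suggested API is after.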

If someone wants to use a generic array/range algorithm with a string,
my opinion is that he should have to wrap it in a range type that maps
front and popFront to one of the above variant. Having to do that should
make it obvious that there's an inefficiency there, as you're using an
algorithm that wasn't tailored to work with strings and that more
decoding than strictly necessary is being done.



I actually like this suggestion quite a bit.




Re: Proposal for fixing dchar ranges

2014-03-10 Thread John Colvin
On Monday, 10 March 2014 at 13:35:33 UTC, Steven Schveighoffer 
wrote:
I proposed this inside the long major performance problem with 
std.array.front, I've also proposed it before, a long time ago.


But seems to be getting no attention buried in that thread, not 
even negative attention :)


An idea to fix the whole problems I see with char[] being 
treated specially by phobos: introduce an actual string type, 
with char[] as backing, that is a dchar range, that actually 
dictates the rules we want. Then, make the compiler use this 
type for literals.


e.g.:

struct string {
   immutable(char)[] representation;
   this(immutable(char)[] data) { representation = data; }
   // ... dchar range primitives
}

Then, a char[] array is simply an array of char.

points:

1. No more issues with foreach(c; "cassé"), it iterates via 
dchar
2. No more issues with "cassé"[4], it is a static compiler 
error.

3. No more awkward ASCII manipulation using ubyte[].
4. No more phobos schizophrenia saying char[] is not an array.
5. No more special casing char[] array templates to fool the 
compiler.
6. Any other special rules we come up with can be dictated by 
the library, and not ignored by the compiler.


Note, std.algorithm.copy(string1, mutablestring) will still 
decode/encode, but it's more explicit. It's EXPLICITLY a dchar 
range. Use std.algorithm.copy(string1.representation, 
mutablestring.representation) will avoid the issues.


I imagine only code that is currently UTF ignorant will break, 
and that code is easily 'fixed' by adding the 'representation' 
qualifier.


-Steve


just to check I understand this fully:

in this new scheme, what would this do?

auto s = "cassé".representation;
foreach(i, c; s) write(i, ':', c, ' ');
writeln(s);

Currently - without the .representation - I get

0:c 1:a 2:s 3:s 4:e 5:̠6:`
cassé

or, to spell it out a bit more:
0:c 1:a 2:s 3:s 4:e 5:xCC 6:x81
cassé


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Chris Williams
On Monday, 10 March 2014 at 18:13:14 UTC, Steven Schveighoffer 
wrote:
Indexing is rarely a feature one needs or should use, 
especially with encoded strings.


If I was writing something like a chat or terminal window, I 
would want to be able to jump to chunks of text based on some 
sort of buffer length, then search for actual character 
boundaries. Similarly, if I was indexing text, I don't care what 
the underlying data is, just whether any particular set of n bytes 
has been seen together in some document. For the latter case, 
I don't need to be able to interpret the data as text while 
indexing, but once I perform an actual search and want to jump 
the user to that line in the file, being able to take a byte 
offset that I had stored in the index and convert that to a 
textual position would be good.


I do think that D should have something like

alias String8 = UTF!char;
alias String16 = UTF!wchar;
alias String32 = UTF!dchar;

And that those sit on top of an underlying immutable(xchar)[] 
buffer, providing variants of things like foreach and length 
based on code-point or grapheme boundaries. But I don't think 
there's any value in reinterpreting string. Not being a struct 
or an object, it doesn't have the extensibility to be useful for 
all the variations of access that working with Unicode and the 
underlying bytes warrants.


Re: Proposal for fixing dchar ranges

2014-03-10 Thread Walter Bright

On 3/10/2014 2:09 PM, Steven Schveighoffer wrote:

What in my proposal makes you think you don't have unfettered access? The
underlying immutable(char)[] representation is accessible. In fact, you would
have more access, since phobos functions would then work with a char[] like it's
a proper array.


You divide the D world into two camps - those that use 'struct string', and 
those that use immutable(char)[] strings.


 I imagine only code that is currently UTF ignorant will break,

This also makes it a non-starter.

