Re: Why I won't fix phobos bugs anymore

2016-06-02 Thread Pie? via Digitalmars-d

On Friday, 3 June 2016 at 05:18:49 UTC, Patrick Schluter wrote:
On Thursday, 2 June 2016 at 20:20:58 UTC, Andrei Alexandrescu 
wrote:

On 06/02/2016 03:41 PM, Basile B. wrote:
Yesterday I took the decision not to propose any more PRs for 
Phobos bugfixes, even though most of the time they're easy.

1)

It can take up to 2 or 3 weeks until a "phobos bugfix" gets 
merged. Even a straightforward one.

2)

Once a PR gets the label "@andrei", it basically means "it's 
dead".


Also meant to add: email should help. I am currently nursing 
3-4 emails that are phobos-related among some 50 other 
important and urgent emails. There's some Midas effect - every 
day I wake up thinking "I'll halve my inbox today" and by the 
evening I still have 50 emails. It's maddening. But please 
don't take offense. Do email and I'll get to your work. Thanks!


That would be Sisyphus; Midas was the king who starved because 
everything he touched turned to gold.


Kinda silly, right? He could have just as easily paid someone to 
feed him with all the gold he made, right?


Also, Sisyphus must not have been too crafty! If he had spent all 
that time digging out the hill instead, it would have been lower, 
and he wouldn't have had to carry the boulder for eternity... just 
give it a nudge and it would roll down. Then he could use all that 
dirt to build a barricade to keep the boulder from rolling 
away (unless it could magically go through the dirt).







Re: Why I won't fix phobos bugs anymore

2016-06-02 Thread Patrick Schluter via Digitalmars-d
On Thursday, 2 June 2016 at 20:20:58 UTC, Andrei Alexandrescu 
wrote:

On 06/02/2016 03:41 PM, Basile B. wrote:
Yesterday I took the decision not to propose any more PRs for 
Phobos bugfixes, even though most of the time they're easy.

1)

It can take up to 2 or 3 weeks until a "phobos bugfix" gets 
merged. Even a straightforward one.

2)

Once a PR gets the label "@andrei", it basically means "it's 
dead".


Also meant to add: email should help. I am currently nursing 
3-4 emails that are phobos-related among some 50 other 
important and urgent emails. There's some Midas effect - every 
day I wake up thinking "I'll halve my inbox today" and by the 
evening I still have 50 emails. It's maddening. But please 
don't take offense. Do email and I'll get to your work. Thanks!


That would be Sisyphus; Midas was the king who starved because 
everything he touched turned to gold.


Re: Phobos needs a (part-time) maintainer

2016-06-02 Thread Basile B. via Digitalmars-d

On Thursday, 2 June 2016 at 21:04:46 UTC, qznc wrote:

On Thursday, 2 June 2016 at 20:59:52 UTC, Basile B. wrote:
Eventually I'll come back to bugfixing if they take Jake, but not 
you, Seb.

For one reason or another I don't like you, wilzbach.


You are frustrated. I get that.

Don't make this personal for others, please. Maybe you should 
ignore this thread for today?


My POV is that it's easy to fix some phobos bugs. But getting the 
easy fixes merged is a PITA. PRs that fix a bug are hard to get 
merged. Why? I don't know. Sometimes we have to act like a bully 
to get a PR merged, and that's not normal.


Personally, I give up.


Re: Free the DMD backend

2016-06-02 Thread Eugene Wissner via Digitalmars-d

On Thursday, 2 June 2016 at 18:16:33 UTC, Basile B. wrote:

It's also that LDC is at front end 2.070 and GDC 2.067 ;););)



GDC is actively maintained, and it would have the latest features 
if more developers came, which is what would happen if it were the 
reference compiler.


Re: D's Auto Decoding and You

2016-06-02 Thread jmh530 via Digitalmars-d-announce
On Thursday, 2 June 2016 at 21:33:02 UTC, Andrei Alexandrescu 
wrote:


Should I assume some normalization occurred on the way?



I'm just looking over std.uni's section on normalization and 
realizing that I have basically no idea what it is or what's going 
on. The Wikipedia page on Unicode equivalence is a bit clearer.


I'm definitely nowhere near qualified to have an opinion on these 
issues.


Re: [OT] Things I like about Andrei

2016-06-02 Thread rikki cattermole via Digitalmars-d

On 03/06/2016 2:17 PM, Adam D. Ruppe wrote:

A lot of us, myself included, have been very critical of Andrei lately,
but I want to list some of the excellent work he has done over the years:

First, early D was very different to D of today. Andrei changed that,
for the better. He's a genius of innovation with templates and good at
getting to the bottom of generic code.

The Range concept is excellent, the logical extension of iterators just
as slices are to pointers, and std.algorithm is generally brilliant.

Many of the patterns we take for granted in D, from templates in general
to conversion and literals on top of them, to ranges and algorithms,
were principally designed and implemented by Andrei.

std.experimental.allocator is very well done, and Design by Introspection
is not just a smart insight into the generic programming problem, but
actually explored and explained in such a way that we can hook onto it.

His talks and writing are amusing and informative, and his dedication
unquestionable.

Andrei Alexandrescu is a very good, innovative programmer and writer who
invents and explains things that others can't even consider.

We're lucky to have him with us!


Wooow, go Andrei!


Re: [OT] Things I like about Andrei

2016-06-02 Thread Pie? via Digitalmars-d

He's also very good looking!! That makes a difference! ;)



Re: D's Auto Decoding and You

2016-06-02 Thread jmh530 via Digitalmars-d-announce

On Thursday, 2 June 2016 at 21:31:39 UTC, Jack Stouffer wrote:

On Thursday, 2 June 2016 at 21:21:50 UTC, jmh530 wrote:
I was a little confused by something in the main autodecoding 
thread, so I read your article again. Unfortunately, I don't 
think my confusion is resolved. I was trying one of your 
examples (full code I used below). You claim it works, but I 
keep getting assertion failures. I'm just running it with rdmd 
on Windows 7.



import std.algorithm : canFind;

void main()
{
    string s = "cassé";

    assert(s.canFind!(x => x == 'é'));
}


Your browser is turning the é in the string into two code 
points via normalization whereas it should be one. Try using 
\u00E9 instead.


That doesn't cause an assert to fail, but when I do  
writeln('\u00E9') I get é. So there might still be something 
wonky going on. I looked up \u00E9 online and I don't think 
there's an error with that.
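
A quick way to check is to dump the code points of the literal. A 
minimal sketch (assuming rdmd and nothing beyond std.stdio and 
std.uni):

import std.stdio, std.uni;

void main()
{
    string s = "cassé";
    // A precomposed é shows up as U+00E9; a decomposed one as
    // U+0065 followed by combining U+0301.
    foreach (dchar c; s)
        writefln("U+%04X", cast(uint) c);
    // normalize!NFC folds a decomposed é back into U+00E9,
    // so this holds either way.
    assert(s.normalize!NFC == "cass\u00E9");
}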


Re: [OT] Things I like about Andrei

2016-06-02 Thread docandrew via Digitalmars-d

On Friday, 3 June 2016 at 02:17:51 UTC, Adam D. Ruppe wrote:
A lot of us, myself included, have been very critical of Andrei 
lately, but I want to list some of the excellent work he has done 
over the years:


First, early D was very different to D of today. Andrei changed 
that, for the better. He's a genius of innovation with 
templates and good at getting to the bottom of generic code.


The Range concept is excellent, the logical extension of 
iterators just as slices are to pointers, and std.algorithm is 
generally brilliant.


Many of the patterns we take for granted in D, from templates 
in general to conversion and literals on top of them, to ranges 
and algorithms, were principally designed and implemented by 
Andrei.


std.experimental.allocator is very well done, and Design by 
Introspection is not just a smart insight into the generic 
programming problem, but actually explored and explained in 
such a way that we can hook onto it.


His talks and writing are amusing and informative, and his 
dedication unquestionable.


Andrei Alexandrescu is a very good, innovative programmer and 
writer who invents and explains things that others can't even 
consider.


We're lucky to have him with us!


+1.

I've been following D since the (dead tree) Dr. Dobbs article I 
found in a supermarket a decade ago, and it's been amazing to 
watch it grow since his participation. Even concepts that Walter 
used to swear off, like templates, have become not just bearable 
but a legitimately kick-ass feature thanks to Andrei's help. D 
owes him a lot!


-Jon


Re: Lifetime tracking

2016-06-02 Thread docandrew via Digitalmars-d

On Friday, 3 June 2016 at 00:40:09 UTC, Stefan Koch wrote:

On Friday, 3 June 2016 at 00:31:31 UTC, Walter Bright wrote:

If they cover the cases that matter, it's good. Rust has the 
type system annotations you want, but Rust has a reputation 
for being difficult to write code for.


I think we can incorporate typesafe borrowing without making it 
difficult to write.


+1, a big problem with Rust is just that the syntax is really 
ugly to those coming from D/C/C++/Java. An idea I had was using 
plain English attributes in function signatures to denote 
ownership.


e.g.

void myfunc(sees Myobj arg1, copies Myobj arg2, borrows Myobj arg3,
            owns Myobj arg4)
{
    // "sees arg1" - read-only reference (basically const now, but
    // cannot be cast away)

    // "copies arg2" - read/write copy of the argument. It works the
    // same way value types work now, and will be freed after function
    // exit (unless it is returned).

    // "borrows arg3" is a by-reference pass; may have the benefit of
    // enabling optimization for small functions since it eliminates a
    // copy (maybe save a stack push and allow register re-use?). Will
    // not be freed after the function exits (ownership returns to the
    // calling function). The reference can be locked for
    // multi-threaded apps.

    // "owns arg4" - frees after function exit (unless it is returned).
}

At a glance it's obvious who owns what, what's read-only, etc.

Also, a nice bonus is that "const" can become a more rigid 
guarantee - as in Rust,  there can exist multiple const 
references to an object, but only one mutable reference. 
Immutable or const by default is probably a bridge too far from 
what we're used to.


There are still a lot of corner-cases that I'd have to think 
through, i.e. calling class methods through a const/"sees" 
reference (would have to be "pure" calls only), good syntax for 
ownership changes mid-function (maybe use "sees" "copies" 
"borrows" and "owns" as operators?), passing to C functions, 
mangling, etc.


Anyhow, just some brainstorming to stir discussion. It looks 
pleasant to me, but I'm not sure if you can call it "D" anymore.


-Jon


[OT] Things I like about Andrei

2016-06-02 Thread Adam D. Ruppe via Digitalmars-d
A lot of us, myself included, have been very critical of Andrei 
lately, but I want to list some of the excellent work he has done 
over the years:


First, early D was very different to D of today. Andrei changed 
that, for the better. He's a genius of innovation with templates 
and good at getting to the bottom of generic code.


The Range concept is excellent, the logical extension of 
iterators just as slices are to pointers, and std.algorithm is 
generally brilliant.


Many of the patterns we take for granted in D, from templates in 
general to conversion and literals on top of them, to ranges and 
algorithms, were principally designed and implemented by Andrei.


std.experimental.allocator is very well done, and Design by 
Introspection is not just a smart insight into the generic 
programming problem, but actually explored and explained in 
such a way that we can hook onto it.


His talks and writing are amusing and informative, and his 
dedication unquestionable.


Andrei Alexandrescu is a very good, innovative programmer and 
writer who invents and explains things that others can't even 
consider.


We're lucky to have him with us!


Re: Broken links continue to exist on major pages on dlang.org

2016-06-02 Thread Adam D. Ruppe via Digitalmars-d
On Thursday, 2 June 2016 at 20:34:24 UTC, Andrei Alexandrescu 
wrote:
Interestingly it came as encouraging and empowering some 
fledgling work that had compelling things going for it 
(including but not limited to enthusiastic receipt in this 
forum), which ironically is exactly what you just asked for.


Yes, indeed, it was a good first (and second) step. But further 
steps are necessary too in order to finish a project.


Here's what would have been ideal to me:

1) Someone writes a cool thing.

2) We encourage further exploration and see interest.

3) After deciding there's serious potential, we decide on the end 
goal, a timeframe, and set the conditions of success. For 
example: ddox becomes the official documentation generator at the 
end of the year if there are no major bugs remaining open.


4) We put it on the website and work toward the goal, with all 
the teams - Phobos, dlang.org, RejectedSoftware, etc., 
understanding their role.


5) When the goal deadline arrives, if it passes the major bug 
test, it goes live and we are committed to it going forward.




Why this order? First, someone writing the cool thing means we 
actually have something to sink our teeth into and a de facto 
champion in the original author.


Second, we need to incubate this work and not discourage the 
author.


ddox got a decent go up to here.

But then we need to decide what's next - a clear goal, including 
a due date, gets us all aligned and removes a lot of the 
uncertainty on the author's side; it is some reassurance that 
they aren't wasting their time, and encourages outside teams to 
get onboard.


That leads directly into step four, and then step five actually 
proves that the others were not in vain.


Re: Unicode Normalization (and graphemes and locales)

2016-06-02 Thread Jack Stouffer via Digitalmars-d

On Friday, 3 June 2016 at 00:14:13 UTC, Walter Bright wrote:
5. Normalization, graphemes, and locales should all be 
explicitly opt-in with corresponding library code.


Add decoding to that list and we're right there with you.

7. At some point, as the threads on autodecode amply 
illustrate, working with level 2 or level 3 Unicode requires a 
certain level of understanding on the part of the programmer 
writing the code, because there simply is no overarching 
correct way to do things. The programmer is going to have to 
understand what he is trying to accomplish with Unicode and 
select the code/algorithms accordingly.


Working at any level of Unicode in a systems programming language 
requires knowledge of Unicode. The thing is, because D is a 
systems language, we can't have the default behavior be to decode 
to grapheme clusters, and because of that, we have to have 
everything be opt-in, because everything else is fundamentally 
wrong on some level. Once you step out of scripting language 
land, you can't get around requiring Unicode knowledge. Like I 
said in my blog,



Unicode is hard. Trying to hide Unicode specifics helps
no one because it's going to bite you in the ass eventually.


Re: Lifetime tracking

2016-06-02 Thread Stefan Koch via Digitalmars-d

On Friday, 3 June 2016 at 00:31:31 UTC, Walter Bright wrote:

If they cover the cases that matter, it's good. Rust has the 
type system annotations you want, but Rust has a reputation for 
being difficult to write code for.


I think we can incorporate typesafe borrowing without making it 
difficult to write.




Re: Lifetime tracking

2016-06-02 Thread Walter Bright via Digitalmars-d

On 6/2/2016 5:21 PM, Walter Bright wrote:

Please give an example.


I see you did, so ignore that.



Re: Lifetime tracking

2016-06-02 Thread Walter Bright via Digitalmars-d

On 6/2/2016 4:29 PM, Timon Gehr wrote:

// need to know that lifetime of a ends not after lifetime of b
void assign(S,T)(ref S a, T b){ a = b; }

void foo(scope int* k){
    void bar(){
        scope int* x;
        // need to check that lifetime of x ends not after lifetime of k
        assign(x,k);


It'll fail to compile because T is not annotated with 'scope'. Annotating T with 
scope will then fail to compile because the assignment to 'a' may outlive 'b'.



}
}



> Note that it is tempting to come up with ad-hoc solutions that make some 
small finite set of examples work.


If they cover the cases that matter, it's good. Rust has the type system 
annotations you want, but Rust has a reputation for being difficult to write 
code for.


Re: Lifetime tracking

2016-06-02 Thread Walter Bright via Digitalmars-d

On 6/2/2016 4:05 PM, Timon Gehr wrote:

I'd like to point out again why that design is inadequate:

Whenever the type checker is using a certain piece of information to check
validity of a program, there should be a way to pass that kind of information
across function boundaries. Otherwise the type system is not modular. This is a
serious defect.


I don't understand where the defect is. Please give an example.


Re: The Case Against Autodecode

2016-06-02 Thread Walter Bright via Digitalmars-d

On 6/2/2016 3:27 PM, John Colvin wrote:

I wonder what rationale there is for Unicode to have two different sequences
of codepoints be treated as the same. It's madness.


There are languages that make heavy use of diacritics, often several on a single
"character". Hebrew is a good example. Should there be only one valid ordering
of any given set of diacritics on any given character?


I didn't say ordering, I said there should be no such thing as "normalization" 
in Unicode, where two codepoints are considered to be identical to some other 
codepoint.




Re: The Case Against Autodecode

2016-06-02 Thread Walter Bright via Digitalmars-d

On 6/2/2016 2:25 PM, deadalnix wrote:

On Thursday, 2 June 2016 at 20:27:27 UTC, Walter Bright wrote:

I wonder what rationale there is for Unicode to have two different sequences
of codepoints be treated as the same. It's madness.

To be able to convert back and forth from/to unicode in a lossless manner.



Sorry, that makes no sense, as it is saying "they're the same, only different."


Unicode Normalization (and graphemes and locales)

2016-06-02 Thread Walter Bright via Digitalmars-d

On 6/2/2016 4:29 PM, Jonathan M Davis via Digitalmars-d wrote:
> How do you suggest that we handle the normalization issue? Should we just
> assume NFC like std.uni.normalize does and provide an optional template
> argument to indicate a different normalization (like normalize does)? Since
> without providing a way to deal with the normalization, we're not actually
> making the code fully correct, just faster.

The short answer is, we don't.

1. D is a systems programming language. Baking normalization, graphemes and 
Unicode locales in at a low level will have a disastrous negative effect on 
performance and size.


2. Very little systems programming work requires level 2 or 3 Unicode support.

3. Are they needed? Pedantically, yes. Practically, not necessarily.

4. What we must do is, for each algorithm, document how it handles Unicode.

5. Normalization, graphemes, and locales should all be explicitly opt-in with 
corresponding library code.


Normalization: s.normalize.algorithm()  (see the sketch after this list)
Graphemes: may require separate algorithms, maybe std.grapheme?
Locales: I have no idea, given that I have not studied that issue

6. std.string has many analogues for std.algorithms that are specific to the 
peculiarities of strings. I think this is a perfectly acceptable approach. For 
example, there are many ways to sort Unicode strings, and many of them do not 
fit in with std.algorithm.sort's ways. Having special std.string.sort's for them 
would be the most practical solution.


7. At some point, as the threads on autodecode amply illustrate, working with 
level 2 or level 3 Unicode requires a certain level of understanding on the part 
of the programmer writing the code, because there simply is no overarching 
correct way to do things. The programmer is going to have to understand what he 
is trying to accomplish with Unicode and select the code/algorithms accordingly.
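
A minimal sketch of the opt-in normalization pattern from point 5, 
using std.uni.normalize as it exists today (the sample string is 
only illustrative):

import std.algorithm.searching : canFind;
import std.uni : NFC, normalize;

void main()
{
    // 'e' followed by a combining acute accent: two code points.
    string s = "cause\u0301";
    // Without normalization, a code-point search misses the
    // precomposed form...
    assert(!s.canFind('\u00E9'));
    // ...with opt-in NFC normalization first, it finds it.
    assert(s.normalize!NFC.canFind('\u00E9'));
}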


Re: Areas of D usage

2016-06-02 Thread jmh530 via Digitalmars-d

On Thursday, 2 June 2016 at 21:47:13 UTC, qznc wrote:

On Thursday, 2 June 2016 at 13:59:13 UTC, Seb wrote:
If I left out an area or you miss an application/usage - 
please let me know!


The Javascript JIT Compiler Higgs: 
https://github.com/higgsjs/Higgs


Vibe.d needs some examples. Looks like their website does not 
have any.


It wasn't too many clicks away to get to the tutorial on building 
a chat service.


Re: non empty slices

2016-06-02 Thread Alex via Digitalmars-d-learn

On Thursday, 2 June 2016 at 23:44:49 UTC, ag0aep6g wrote:

On 06/03/2016 01:35 AM, ag0aep6g wrote:
The alternative `peek` method is not documented to throw an 
exception, but it's not @nogc either. No idea why. Maybe Algebraic 
does GC allocations internally. I wouldn't know for what, though. 
Or it misses a @nogc somewhere.


I've looked at the source to see if it's something simple, and 
Algebraic/VariantN seems to be terribly complicated. Writing a 
simpler @nogc tagged union may be easier than fixing the phobos 
one, if the phobos one can even be made @nogc.


I'm also inside the source... yes, it's not a simple one. I think 
I will try to write my own...
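
For reference, a minimal sketch of what a hand-rolled @nogc tagged 
union could look like (a two-type toy with hypothetical names, not 
a drop-in Algebraic replacement):

struct Sum(A, B)
{
    private union { A a; B b; }
    private int tag = -1; // -1: empty, 0: A, 1: B

    this(A v) @nogc nothrow { a = v; tag = 0; }
    this(B v) @nogc nothrow { b = v; tag = 1; }

    // Like Algebraic's peek: a pointer into the union, or null if
    // the requested type is not the active one. No GC involved.
    T* peek(T)() @nogc nothrow
    {
        static if (is(T == A)) return tag == 0 ? &a : null;
        else static if (is(T == B)) return tag == 1 ? &b : null;
        else static assert(0, "no such member type");
    }
}

void main() @nogc
{
    auto s = Sum!(int, double)(42);
    assert(*s.peek!int == 42);
    assert(s.peek!double is null);
}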


Re: The Case Against Autodecode

2016-06-02 Thread Walter Bright via Digitalmars-d

On 6/2/2016 4:29 PM, Jonathan M Davis via Digitalmars-d wrote:

How do you suggest that we handle the normalization issue?


Started a new thread for that one.



[Issue 14403] DDox: std.algorithm index links are 404

2016-06-02 Thread via Digitalmars-d-bugs
https://issues.dlang.org/show_bug.cgi?id=14403

--- Comment #3 from github-bugzi...@puremagic.com ---
Commit pushed to master at https://github.com/dlang/dlang.org

https://github.com/dlang/dlang.org/commit/8e12e01f388097bc947ef8e7ace1fef5926b3521
Merge pull request #1322 from s-ludwig/master

Fix formatting of (M)REF_ALTTEXT. See issue 14403.

--


Re: Lifetime tracking

2016-06-02 Thread tsbockman via Digitalmars-d

On Thursday, 2 June 2016 at 23:29:57 UTC, Timon Gehr wrote:

On 03.06.2016 01:12, tsbockman wrote:

On Thursday, 2 June 2016 at 23:05:40 UTC, Timon Gehr wrote:
Whenever the type checker is using a certain piece of 
information to
check validity of a program, there should be a way to pass 
that kind
of information across function boundaries. Otherwise the type 
system

is not modular. This is a serious defect.


Would you mind giving a brief example of how that applies to 
`scope`?


(I'm asking for my own education; I have no personal opinion 
as to the

right implementation at the moment.)


The simplest example is this [1]:


Thanks for the explanation.


Re: Lifetime tracking

2016-06-02 Thread Timon Gehr via Digitalmars-d

On 03.06.2016 01:29, Timon Gehr wrote:



[1] It might be possible to get that example to pass the type checker
with 'return' annotations only if I change 'ref' to 'out', but often
more than two lifetimes are involved, and then it falls flat on its face.


To be slightly more explicit:

void multiAssign(A,B,C,D)(ref A a,B b,ref C c,D d){ a = b; c = d; }


Re: non empty slices

2016-06-02 Thread Alex via Digitalmars-d-learn

On Thursday, 2 June 2016 at 23:35:53 UTC, ag0aep6g wrote:
It's the Algebraic. The `get` method isn't @nogc. The 
documentation [1] says that it may throw an exception, which is 
most probably being allocated through the GC. So that's a 
reason why it can't be @nogc.


The alternative `peek` method is not documented to throw an 
exception, but it's not @nogc either. No idea why. Maybe 
Algebraic does GC allocations internally. I wouldn't know for 
what, though. Or it misses a @nogc somewhere.



[1] http://dlang.org/phobos/std_variant#.VariantN.get


Yeah... thanks a lot!


Re: non empty slices

2016-06-02 Thread ag0aep6g via Digitalmars-d-learn

On 06/03/2016 01:35 AM, ag0aep6g wrote:

The alternative `peek` method is not documented to throw an exception,
but it's not @nogc either. No idea why. Maybe Algebraic does GC
allocations internally. I wouldn't know for what, though. Or it misses a
@nogc somewhere.


I've looked at the source to see if it's something simple, and 
Algebraic/VariantN seems to be terribly complicated. Writing a simpler 
@nogc tagged union may be easier than fixing the phobos one, if the 
phobos one can even be made @nogc.


Re: non empty slices

2016-06-02 Thread ag0aep6g via Digitalmars-d-learn

On 06/03/2016 01:17 AM, Alex wrote:

But still, I can't mark the f-method @nogc, and this is not due to the
writeln calls... why is the GC invoked, although everything is known and
no memory allocation should happen?


It's the Algebraic. The `get` method isn't @nogc. The documentation [1] 
says that it may throw an exception, which is most probably being 
allocated through the GC. So that's a reason why it can't be @nogc.


The alternative `peek` method is not documented to throw an exception, 
but it's not @nogc either. No idea why. Maybe Algebraic does GC 
allocations internally. I wouldn't know for what, though. Or it misses a 
@nogc somewhere.



[1] http://dlang.org/phobos/std_variant#.VariantN.get


Re: Lifetime tracking

2016-06-02 Thread Timon Gehr via Digitalmars-d

On 03.06.2016 01:12, tsbockman wrote:

On Thursday, 2 June 2016 at 23:05:40 UTC, Timon Gehr wrote:

Whenever the type checker is using a certain piece of information to
check validity of a program, there should be a way to pass that kind
of information across function boundaries. Otherwise the type system
is not modular. This is a serious defect.


Would you mind giving a brief example of how that applies to `scope`?

(I'm asking for my own education; I have no personal opinion as to the
right implementation at the moment.)


The simplest example is this [1]:

void foo(scope int* k){
    void bar(){
        scope int* x;
        x = k; // ok: lifetime of x ends not after lifetime of k
    }
}

Now we factor out the assignment:

// need to know that lifetime of a ends not after lifetime of b
void assign(S,T)(ref S a, T b){ a = b; }

void foo(scope int* k){
    void bar(){
        scope int* x;
        // need to check that lifetime of x ends not after lifetime of k
        assign(x,k);
    }
}

I.e. now we need a way to annotate 'assign' in order to specify the 
contract I have written down in the comments.


Note that it is tempting to come up with ad-hoc solutions that make some 
small finite set of examples work. This is not how well-designed type 
systems usually come about. You need to think about what information the 
type checker requires, and how to pass it across function boundaries 
(i.e. how to encode that information in types). Transfer of information 
must be lossless, so one should resist the temptation to use more 
information than can be passed across function boundaries in case it is 
accidentally available.




[1] It might be possible to get that example to pass the type checker 
with 'return' annotations only if I change 'ref' to 'out', but often 
more than two lifetimes are involved, and then it falls flat on its face.
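
For context, the 'return' annotation mentioned in the footnote is 
DIP25's 'return ref', which covers only the case where a parameter 
escapes through the return value. A minimal sketch of that one 
supported pattern (it does not address the assign example above):

// 'return ref' tells the caller that the returned reference may
// point into the arguments, so the lifetime can be checked across
// the function boundary for this one pattern.
ref int pick(return ref int a, return ref int b)
{
    return a < b ? a : b;
}

void main()
{
    int x = 1, y = 2;
    pick(x, y) = 10; // assigns through the returned reference
    assert(x == 10);
}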


Re: The Case Against Autodecode

2016-06-02 Thread Jonathan M Davis via Digitalmars-d
On Thursday, June 02, 2016 15:48:03 Walter Bright via Digitalmars-d wrote:
> On 6/2/2016 3:23 PM, Andrei Alexandrescu wrote:
> > On 06/02/2016 05:58 PM, Walter Bright wrote:
> >>  > * s.balancedParens('〈', '〉') works only with autodecoding.
> >>  > * s.canFind('ö') works only with autodecoding. It returns always
> >>
> >> false without.
> >>
> >> Can be made to work without autodecoding.
> >
> > By special casing? Perhaps.
>
> The argument to canFind() can be detected as not being a char, then decoded
> into a sequence of char's, then forwarded to a substring search.

How do you suggest that we handle the normalization issue? Should we just
assume NFC like std.uni.normalize does and provide an optional template
argument to indicate a different normalization (like normalize does)? Since
without providing a way to deal with the normalization, we're not actually
making the code fully correct, just faster.

- Jonathan M Davis




Re: Creating a "fixed-range int" with opDispatch and/or alias this?

2016-06-02 Thread tsbockman via Digitalmars-d-learn

On Wednesday, 1 June 2016 at 19:59:51 UTC, Mark Isaacson wrote:
I'm trying to create a type that for all intents and purposes 
behaves exactly like an int except that it limits its values to 
be within a certain range [a,b]. Theoretically, I would think 
this looks something like:


...

It looks like opDispatch doesn't participate in resolution of 
operator overloads. Is there any way I can achieve my desired 
result? I know alias this forwards operations like +=, but with 
alias this I cannot wrap the operation to do the bounds 
checking.


I think you would need to implement all of:

* this(...)

* opAssign(...)

* opOpAssign(...)

* opBinary(...)

* opBinaryRight(...)

* opUnary(...)


FWIW, the fixed range int part of this question is just an 
example, I'm mostly just interested in whether this idea is 
possible without a lot of bloat/duplication.


For a single type, I think the bloat is required. If you want to 
generate a lot of similar types, though, you could probably write 
a mixin template to generate the methods for you.
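
A minimal sketch of the single-type version, with a hypothetical 
name and only a few of the operators filled in, just to show the 
shape of the boilerplate:

struct Bounded(int lo, int hi)
{
    private int _value = lo;

    this(int v) { opAssign(v); }

    // Every mutation funnels through here, so the bounds check
    // cannot be bypassed the way it can with alias this.
    ref Bounded opAssign(int v)
    {
        assert(v >= lo && v <= hi, "value out of range");
        _value = v;
        return this;
    }

    ref Bounded opOpAssign(string op)(int rhs)
    {
        return opAssign(mixin("_value " ~ op ~ " rhs"));
    }

    Bounded opBinary(string op)(int rhs) const
    {
        return Bounded(mixin("_value " ~ op ~ " rhs"));
    }

    int get() const { return _value; }
}

void main()
{
    Bounded!(0, 10) x = 5;
    x += 3;
    assert(x.get == 8);
    // x += 100; // would trip the bounds assert at run time
}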


Re: The Case Against Autodecode

2016-06-02 Thread Jonathan M Davis via Digitalmars-d
On Thursday, June 02, 2016 22:27:16 John Colvin via Digitalmars-d wrote:
> On Thursday, 2 June 2016 at 20:27:27 UTC, Walter Bright wrote:
> > I wonder what rationale there is for Unicode to have two
> > different sequences of codepoints be treated as the same. It's
> > madness.
>
> There are languages that make heavy use of diacritics, often
> several on a single "character". Hebrew is a good example. Should
> there be only one valid ordering of any given set of diacritics
> on any given character? It's an interesting idea, but it's not
> how things are.

Yeah. I'm inclined to think that the fact that there are multiple
normalizations was a huge mistake on the part of the Unicode folks, but
we're stuck dealing with it. And as horrible as it is for most cases, maybe
it _does_ ultimately make sense because of certain use cases; I don't know.
But bad idea or not, we're stuck. :(

- Jonathan M Davis



Re: The Case Against Autodecode

2016-06-02 Thread Jonathan M Davis via Digitalmars-d
On Thursday, June 02, 2016 18:23:19 Andrei Alexandrescu via Digitalmars-d 
wrote:
> On 06/02/2016 05:58 PM, Walter Bright wrote:
> > On 6/2/2016 1:27 PM, Andrei Alexandrescu wrote:
> >> The lambda returns bool. -- Andrei
> >
> > Yes, I was wrong about that. But the point still stands with:
> >  > * s.balancedParens('〈', '〉') works only with autodecoding.
> >  > * s.canFind('ö') works only with autodecoding. It returns always
> >
> > false without.
> >
> > Can be made to work without autodecoding.
>
> By special casing? Perhaps. I seem to recall though that one major issue
> with autodecoding was that it special-cases certain algorithms. So you'd
> need to go through all of std.algorithm and make sure you can
> special-case your way out of situations that work today.

Yeah, I believe that you do have to do some special casing, though it would
be special casing on ranges of code units in general and not strings
specifically, and a lot of those functions are already special cased on
string in an attempt be efficient. In particular, with a function like find
or canFind, you'd take the needle and encode it to match the haystack it was
passed so that you can do the comparisons via code units. So, you incur the
encoding cost once when encoding the needle rather than incurring the
decoding cost of each code point or grapheme as you iterate over the
haystack. So, you end up with something that's correct and efficient. It's
also much friendlier to code that only operates on ASCII.
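
As a minimal sketch of that encode-the-needle idea (canFindEncoded 
is a hypothetical helper; std.utf.encode does the one-time 
conversion):

import std.algorithm.searching : canFind;
import std.utf : encode;

// Encode the dchar needle into UTF-8 once, then search the
// haystack's code units directly, with no per-element decoding.
bool canFindEncoded(const(char)[] haystack, dchar needle)
{
    char[4] buf;
    immutable len = encode(buf, needle);
    return haystack.canFind(buf[0 .. len]);
}

unittest
{
    assert(canFindEncoded("cass\u00E9", '\u00E9'));
    assert(!canFindEncoded("casse", '\u00E9'));
}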

The one issue that I'm not quite sure how we'd handle in that case is
normalization (which auto-decoding doesn't handle either), since you'd need
to normalize the needle to match the haystack (which also assumes that the
haystack was already normalized). Certainly, it's the sort of thing that
makes it so that you kind of wish you were dealing with a string type that
had the normalization built into it rather than either an array of code
units or an arbitrary range of code units. But maybe we could assume the NFC
normalization like std.uni.normalize does and provide an optional template
argument for the normalization scheme.

In any case, while it's not entirely straightforward, it is quite possible
to write some algorithms in a way which works on arbitrary ranges of code
units and deals with Unicode correctly without auto-decoding or requiring
that the user convert it to a range of code points or graphemes in order to
properly handle the full range of Unicode. And even if we keep
auto-decoding, we pretty much need to fix it so that std.algorithm and
friends are Unicode-aware in this manner so that ranges of code units work
in general without requiring that you use byGrapheme. So, this sort of thing
could have a large impact on RCStr, even if we keep auto-decoding for narrow
strings.

Other algorithms, however, can't be made to work automatically with Unicode
- at least not with the current range paradigm. filter, for instance, really
needs to operate on graphemes to filter on characters, but with a range of
code units, that would mean operating on groups of code units as a single
element, which you can't do with something like a range of char, since that
essentially becomes a range of ranges. It has to be wrapped in a range
that's going to provide graphemes - and of course, if you know that you're
operating only on ASCII, then you wouldn't want to deal with graphemes
anyway, so automatically converting to graphemes would be undesirable. So,
for a function like filter, it really does have to be up to the programmer
to indicate what level of Unicode they want to operate at.
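
To illustrate the filter point, a minimal sketch using std.uni's 
byGrapheme/byCodePoint round trip (the sample string is just for 
illustration):

import std.algorithm.comparison : equal;
import std.algorithm.iteration : filter;
import std.uni : byCodePoint, byGrapheme, Grapheme;

void main()
{
    // The 'b' carries a combining acute accent: one grapheme,
    // two code points. A grapheme-level filter keeps them
    // together; a code-unit filter could separate them.
    auto s = "ab\u0301c";
    auto kept = s.byGrapheme
                 .filter!(g => g != Grapheme("c"))
                 .byCodePoint;
    assert(kept.equal("ab\u0301"));
}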

But if we don't make functions Unicode-aware where possible, then we're
going to take a performance hit by essentially forcing everyone to use
explicit ranges of code points or graphemes even when they should be
unnecessary. So, I think that we're stuck with some level of special casing,
but it would then be for ranges of code units and code points and not
strings. So, it would work efficiently for stuff like RCStr, which the
current scheme does not.

I think that the reality of the matter is that regardless of whether we keep
auto-decoding for narrow strings in place, we need to make Phobos operate on
arbitrary ranges of code units and code points, since even stuff like RCStr
won't work efficiently otherwise, and stuff like byCodeUnit won't be usuable
in as many cases otherwise, because if a generic function isn't
Unicode-aware, then in many cases, byCodeUnit will be very wrong, just like
byCodePoint would be wrong. So, as far as Phobos goes, I'm not sure that the
question of auto-decoding matters much for what we need to do at this point.
If we do what we need to do, then Phobos will work whether we have
auto-decoding or not (working in a Unicode-aware manner where possible and
forcing the user to decide the correct level of Unicode to work at where
not), and then it just becomes a question of whether we can or should
deprecate auto-decoding once all that's done.

- 

Re: non empty slices

2016-06-02 Thread Alex via Digitalmars-d-learn

On Thursday, 2 June 2016 at 22:17:32 UTC, ag0aep6g wrote:


Yeah, can't do it that way. You have only one f_impl call, but 
want it to go to different overloads based on dynamic 
information (caseS). That doesn't work.


You need three different f_impl calls. You can generate them, 
so there's only one in the source, but it's a bit involved:


sw: switch (caseS)
{
    foreach (i, T; TL)
    {
        case i: f_impl(result.get!T); break sw;
    }
    default: assert(false);
}

Oh... wow... cool! :)
But still, I can't mark the f-method @nogc, and this is not due 
to the writeln calls... why is the GC invoked, although everything 
is known and no memory allocation should happen?


Re: The Case Against Autodecode

2016-06-02 Thread Vladimir Panteleev via Digitalmars-d

On Thursday, 2 June 2016 at 21:56:10 UTC, Walter Bright wrote:

Yes, you have a good point. But we do allow things like:

    byte b;
    if (b == 1000) ...


Why allowing char/wchar/dchar comparisons is wrong:

void main()
{
string s = "Привет";
foreach (c; s)
assert(c != 'Ñ');
}

(The assert fires: 0xD1, the code unit value of 'Ñ', appears in 
the UTF-8 encoding of "Привет" as a lead byte, even though the 
string contains no 'Ñ'.)

From my post from 2014:

http://forum.dlang.org/post/knrwiqxhlvqwxqshy...@forum.dlang.org



Re: Lifetime tracking

2016-06-02 Thread Stefan Koch via Digitalmars-d

On Thursday, 2 June 2016 at 23:05:40 UTC, Timon Gehr wrote:

On 03.06.2016 00:29, Walter Bright wrote:

On 6/2/2016 3:10 PM, Marco Leise wrote:

we haven't looked into borrowing/scoped enough


That's my fault.

As for scoped, the idea is to make scope work analogously to 
DIP25's
'return ref'. I don't believe we need borrowing, we've worked 
out

another solution that will work for ref counting.

Please do not reply to this in this thread - start a new one 
if you wish

to continue with this topic.



I'd like to point out again why that design is inadequate:

Whenever the type checker is using a certain piece of 
information to check validity of a program, there should be a 
way to pass that kind of information across function 
boundaries. Otherwise the type system is not modular. This is a 
serious defect.


Seconded.




Lifetime tracking

2016-06-02 Thread Timon Gehr via Digitalmars-d

On 03.06.2016 00:29, Walter Bright wrote:

On 6/2/2016 3:10 PM, Marco Leise wrote:

we haven't looked into borrowing/scoped enough


That's my fault.

As for scoped, the idea is to make scope work analogously to DIP25's
'return ref'. I don't believe we need borrowing, we've worked out
another solution that will work for ref counting.

Please do not reply to this in this thread - start a new one if you wish
to continue with this topic.



I'd like to point out again why that design is inadequate:

Whenever the type checker is using a certain piece of information to 
check validity of a program, there should be a way to pass that kind of 
information across function boundaries. Otherwise the type system is not 
modular. This is a serious defect.


Re: The Case Against Autodecode

2016-06-02 Thread Timon Gehr via Digitalmars-d

On 03.06.2016 00:23, Andrei Alexandrescu wrote:

On 06/02/2016 05:58 PM, Walter Bright wrote:

On 6/2/2016 1:27 PM, Andrei Alexandrescu wrote:

The lambda returns bool. -- Andrei


Yes, I was wrong about that. But the point still stands with:

 > * s.balancedParens('〈', '〉') works only with autodecoding.
 > * s.canFind('ö') works only with autodecoding. It returns always
false without.

Can be made to work without autodecoding.


By special casing? Perhaps. I seem to recall though that one major issue
with autodecoding was that it special-cases certain algorithms.


The major issue is that it special-cases when there are different, more 
natural semantics available.


Re: The Case Against Autodecode

2016-06-02 Thread Timon Gehr via Digitalmars-d

On 03.06.2016 00:26, Walter Bright wrote:

On 6/2/2016 3:11 PM, Timon Gehr wrote:

Well, this is a somewhat different case, because 1000 is just not
representable as a byte. Every value that fits in a byte fits in an int
though.

It's different for code units. They are incompatible both ways.


Not exactly. (c == 'ö') is always false for the same reason that (b ==
1000) is always false.
...


Yes. And _additionally_, some other concerns apply that are not there 
for byte vs. int. I.e. if b == 1000 is disallowed, then c == d should 
be disallowed too, but b == 1000 can be allowed even if c == d is 
disallowed.



I'm not sure what the right answer is here.


char to dchar is a lossy conversion, so it shouldn't happen.
byte to int is a lossless conversion, so there is no problem a priori.


Re: The Case Against Autodecode

2016-06-02 Thread tsbockman via Digitalmars-d

On Thursday, 2 June 2016 at 22:20:49 UTC, Walter Bright wrote:

On 6/2/2016 2:05 PM, tsbockman wrote:

Presumably if someone marks their own
PR as "do not merge", it means they're planning to either 
close it themselves
after it has served its purpose, or they plan to fix/finish it 
and then remove

the "do not merge" label.


That doesn't seem to apply here, either.


Either way, they shouldn't be closed just because they say "do 
not merge"

(unless they're abandoned or something, obviously).


Something like that could not be merged until 132 other PRs are 
done to fix Phobos. It doesn't belong as a PR.


I was just responding to the general question you posed about "do 
not merge" PRs, not really arguing for that one, in particular, 
to be re-opened. I'm sure @wilzbach is willing to explain if 
anyone cares to ask him why he did it as a PR, though.


Re: The Case Against Autodecode

2016-06-02 Thread Walter Bright via Digitalmars-d

On 6/2/2016 3:10 PM, Marco Leise wrote:

we haven't looked into borrowing/scoped enough


That's my fault.

As for scoped, the idea is to make scope work analogously to DIP25's 'return 
ref'. I don't believe we need borrowing, we've worked out another solution that 
will work for ref counting.


Please do not reply to this in this thread - start a new one if you wish to 
continue with this topic.




Re: Why does DMD on Debian need xdg-utils

2016-06-02 Thread flamencofantasy via Digitalmars-d

On Thursday, 2 June 2016 at 21:32:28 UTC, Mathias Lang wrote:
It shouldn't be necessary. I believe that is because of `dmd 
-man`, which opens a web browser.


That's an apt-d issue (and hopefully Jordi Sayol will read this) 
which prevents using this repository if your machine has no X (I 
guess you discovered that on a server, as I did).




Yes. It also supports multiple versions side by side and also 
installs dub. You can find the source here: 
https://github.com/dlang/installer/blob/master/script/install.sh


Example usage:

# Install dmd 2.070.0
~/dlang/install.sh install dmd-2.070.0

# Install dmd 2.069.0
~/dlang/install.sh install dmd-2.069.0

# start using version 2.070.0
activate ~/dlang/dmd-2.070.0

# stop using version 2.070.0
deactivate

# start using version 2.069.0
activate ~/dlang/dmd-2.069.0

# stop using version 2.069.0
deactivate

# uninstall version 2.069.0
~/dlang/install.sh uninstall dmd-2.069.0

# removes everything installed so far
rm -rf ~/dlang

# downloads (again) the install script and
# installs the latest stable version of the compiler.
curl -fsS https://dlang.org/install.sh | bash -s dmd


Yes, it's a server.
It's actually a linux branded SmartOS zone and the install script 
does not seem to work.
I have always been using the .deb package and it's been working, 
I just didn't want xdg-utils and all the other stuff that comes 
with it.







Re: The Case Against Autodecode

2016-06-02 Thread John Colvin via Digitalmars-d

On Thursday, 2 June 2016 at 20:27:27 UTC, Walter Bright wrote:

On 6/2/2016 12:34 PM, deadalnix wrote:
On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu 
wrote:
Pretty much everything. Consider s and s1 string variables 
with possibly

different encodings (UTF8/UTF16).

* s.all!(c => c == 'ö') works only with autodecoding. It 
returns always false

without.



False. Many characters can be represented by different 
sequences of codepoints.
For instance, ê can be ê as one codepoint or ^ as a modifier 
followed by e. ö is

one such character.


There are 3 levels of Unicode support. What Andrei is talking 
about is Level 1.


http://unicode.org/reports/tr18/tr18-5.1.html

I wonder what rationale there is for Unicode to have two 
different sequences of codepoints be treated as the same. It's 
madness.


There are languages that make heavy use of diacritics, often 
several on a single "character". Hebrew is a good example. Should 
there be only one valid ordering of any given set of diacritics 
on any given character? It's an interesting idea, but it's not 
how things are.


Re: The Case Against Autodecode

2016-06-02 Thread Walter Bright via Digitalmars-d

On 6/2/2016 3:11 PM, Timon Gehr wrote:

Well, this is a somewhat different case, because 1000 is just not representable
as a byte. Every value that fits in a byte fits in an int though.

It's different for code units. They are incompatible both ways.


Not exactly. (c == 'ö') is always false for the same reason that (b == 1000) is 
always false.


I'm not sure what the right answer is here.


Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d

On 06/02/2016 05:58 PM, Walter Bright wrote:

On 6/2/2016 1:27 PM, Andrei Alexandrescu wrote:

The lambda returns bool. -- Andrei


Yes, I was wrong about that. But the point still stands with:

 > * s.balancedParens('〈', '〉') works only with autodecoding.
 > * s.canFind('ö') works only with autodecoding. It returns always
false without.

Can be made to work without autodecoding.


By special casing? Perhaps. I seem to recall though that one major issue 
with autodecoding was that it special-cases certain algorithms. So you'd 
need to go through all of std.algorithm and make sure you can 
special-case your way out of situations that work today.



Andrei



Re: non empty slices

2016-06-02 Thread ag0aep6g via Digitalmars-d-learn

On 06/02/2016 11:37 PM, Alex wrote:

Just tried this instead of your f-function:

void f(int[] arr)
{
    A result;
    import std.meta;
    alias TL = AliasSeq!(Empty, int, Many!int);
    int caseS;
    switch (arr.length)
    {
        case 0: result = Empty.init; caseS = 0; break;
        case 1: result = arr[0]; caseS = 1; break;
        default: result = Many!int(arr); caseS = 2;
    }
    f_impl(*result.get!(TL[caseS]));
}
But got: Error: variable caseS cannot be read at compile time
which is obviously true...


Yeah, can't do it that way. You have only one f_impl call, but want it 
to go to different overloads based on dynamic information (caseS). That 
doesn't work.


You need three different f_impl calls. You can generate them, so there's 
only one in the source, but it's a bit involved:


sw: switch (caseS)
{
    foreach (i, T; TL)
    {
        case i: f_impl(result.get!T); break sw;
    }
    default: assert(false);
}
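
Fleshed out into a self-contained sketch (Empty, Many, and the 
f_impl overloads are stand-ins reconstructed from the thread, so 
treat those names as assumptions):

import std.meta : AliasSeq;
import std.stdio;
import std.variant : Algebraic;

struct Empty {}
struct Many(T) { T[] values; }

alias A = Algebraic!(Empty, int, Many!int);

void f_impl(Empty) { writeln("empty"); }
void f_impl(int x) { writeln("one: ", x); }
void f_impl(Many!int m) { writeln("many: ", m.values); }

void f(int[] arr)
{
    alias TL = AliasSeq!(Empty, int, Many!int);
    A result;
    int caseS;
    switch (arr.length)
    {
        case 0: result = Empty.init; caseS = 0; break;
        case 1: result = arr[0]; caseS = 1; break;
        default: result = Many!int(arr); caseS = 2;
    }
    // The compile-time foreach expands to one case per type, so
    // each f_impl call is statically typed even though caseS is
    // only known at run time.
    sw: switch (caseS)
    {
        foreach (i, T; TL)
        {
            case i: f_impl(result.get!T); break sw;
        }
        default: assert(false);
    }
}

void main()
{
    f([]);
    f([42]);
    f([1, 2, 3]);
}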



Re: The Case Against Autodecode

2016-06-02 Thread tsbockman via Digitalmars-d

On Thursday, 2 June 2016 at 22:03:01 UTC, default0 wrote:

*sigh* reading comprehension.
...
Please do not take what I say out of context, thank you.


Earlier you said:

The level 2 support description noted that it should be opt-in 
because its slow.


My main point is simply that you mischaracterized what the 
standard says. Making level 1 opt-in, rather than level 2, would 
be just as compliant as the reverse. The standard makes no 
suggestion as to which should be default.


Re: The Case Against Autodecode

2016-06-02 Thread Marco Leise via Digitalmars-d
Am Thu, 2 Jun 2016 15:05:44 -0400
schrieb Andrei Alexandrescu :

> On 06/02/2016 01:54 PM, Marc Schütz wrote:
> > Which practical tasks are made possible (and work _correctly_) if you
> > decode to code points, that don't already work with code units?  
> 
> Pretty much everything.
>
> s.all!(c => c == 'ö')

Andrei, your ignorance is really starting to grind on
everyone's nerves. If after 350 posts you still don't see
why this is incorrect: s.any!(c => c == 'o'), you must be
actively skipping the informational content of this thread.

You are in error, no one agrees with you, and you refuse to see
it and in the end we have to assume you will make a decisive
vote against any PR with the intent to remove auto-decoding
from Phobos.

Your so called vocal minority is actually D's panel of Unicode
experts who understand that auto-decoding is a false ally and
should be on the deprecation track.

Remember final-by-default? You promised, that your objection
about breaking code means that D2 will only continue to be
fixed in a backwards compatible way, be it the implementation
of shared or whatever else. Yet months later you opened a
thread with the title "inout must go". So that must have been
an appeasement back then. People don't forget these things
easily and RCStr seems to be a similar distraction,
considering we haven't looked into borrowing/scoped enough and
you promise wonders from it.

-- 
Marco



Re: The Case Against Autodecode

2016-06-02 Thread Timon Gehr via Digitalmars-d

On 02.06.2016 23:56, Walter Bright wrote:

On 6/2/2016 1:12 PM, Timon Gehr wrote:

...
It is not
meaningful to compare utf-8 and utf-16 code units directly.


Yes, you have a good point. But we do allow things like:

byte b;
if (b == 1000) ...



Well, this is a somewhat different case, because 1000 is just not 
representable as a byte. Every value that fits in a byte fits in an int 
though.


It's different for code units. They are incompatible both ways. E.g. 
dchar obviously does not fit in a char, and while the lower half of char 
is compatible with dchar, the upper half is specific to the encoding. 
dchar cannot represent upper half char code units. You get the code 
points with the corresponding values instead.


E.g.:

void main(){
    import std.stdio, std.utf;
    foreach(dchar d; "ö".byCodeUnit)
        writeln(d); // "Ã", "¶"
}



Re: Areas of D usage

2016-06-02 Thread Seb via Digitalmars-d

On Thursday, 2 June 2016 at 21:47:13 UTC, qznc wrote:

On Thursday, 2 June 2016 at 13:59:13 UTC, Seb wrote:
If I left out an area or you miss an application/usage - 
please let me know!


The Javascript JIT Compiler Higgs: 
https://github.com/higgsjs/Higgs




Wow that's a great example!

Vibe.d needs some examples. Looks like their website does not 
have any.


I was also looking for public Vibe.d instances out there - does 
anyone know a large website using Vibe.d that we could quote?


Re: non empty slices

2016-06-02 Thread ag0aep6g via Digitalmars-d-learn

On 06/02/2016 10:11 PM, Alex wrote:

The cool thing about the Algebraic is, as I expected, that it doesn't
change its type... And the hard thing is that I'm not used to its
Empty, Many, ... things yet.


I just made those up on the spot. Note that Many is not actually 
implemented at all. There is no check that the array has at least two 
elements. And Empty is just there, because I needed a type for the 
Algebraic.



But the question remains how to keep this @nogc?


Apparently, it's Algebraic that isn't @nogc. I don't know what it 
allocates. Maybe it allocates space for large types (but there aren't 
any here), or maybe it can throw a GC-allocated exception.



I wonder at the line
with peek... and why it is not just returning the value...


I wouldn't expect that to be the problem with @nogc. As far as I see, 
the pointer is used as a way to return "not found" in the form of null. 
When you get a non-null pointer it's probably just into the Algebraic.


Re: The Case Against Autodecode

2016-06-02 Thread default0 via Digitalmars-d

On Thursday, 2 June 2016 at 21:51:51 UTC, tsbockman wrote:

On Thursday, 2 June 2016 at 21:38:02 UTC, default0 wrote:

On Thursday, 2 June 2016 at 21:30:51 UTC, tsbockman wrote:
1) It does not say that level 2 should be opt-in; it says 
that level 2 should be toggle-able. Nowhere does it say which 
of level 1 and 2 should be the default.


2) It says that working with graphemes is slower than UTF-16 
code UNITS (level 1), but says nothing about streaming 
decoding of code POINTS (what we have).


3) That document is from 2000, and its claims about 
performance are surely extremely out-dated, anyway. Computers 
and the Unicode standard have both changed much since then.


1) Right because a special toggleable syntax is definitely not 
"opt-in".


It is not "opt-in" unless it is toggled off by default. The 
only reason it doesn't talk about toggling in the level 1 
section, is because that section is written with the assumption 
that many programs will *only* support level 1.




*sigh* reading comprehension. Needing to write .byGrapheme or 
similar to enable the behaviour qualifies for what that 
description was arguing for. I hope you understand that now that 
I am repeating this for you.


2) Several people in this thread noted that working on 
graphemes is way slower (which makes sense, because it's yet 
another processing step you need to do after you decode - 
therefore more work - therefore slower) than working on code 
points.


And working on code points is way slower than working on code 
units (the actual level 1).




Never claimed the opposite. Do note however that its specifically 
talking about UTF-16 code units.



3) Not an argument - doing more work makes code slower.


What do you think I'm arguing for? It's not 
graphemes-by-default.


Unrelated. I was refuting the point you made about the relevance 
of the performance claims of the unicode level 2 support 
description, not evaluating your hypothetical design. Please do 
not take what I say out of context, thank you.




Re: The Case Against Autodecode

2016-06-02 Thread Timon Gehr via Digitalmars-d

On 02.06.2016 23:46, Andrei Alexandrescu wrote:

On 6/2/16 5:43 PM, Timon Gehr wrote:


.̂ ̪.̂

(Copy-paste it somewhere else, I think it might not be rendered
correctly on the forum.)

The point is that if I do:

".̂ ̪.̂".normalize!NFC.byGrapheme.findAmong([Grapheme("."),Grapheme(",")])

no match is returned.

If I use your method with dchars, I will get spurious matches. I.e. the
suggested method to look for punctuation symbols is incorrect:

writeln(".̂ ̪.̂".findAmong(",.")); // ".̂ ̪.̂"


Nice example.
...


Thanks! :o)


(Also, do you have a use case for this?)


Count delimited words. Did you also look at balancedParens?


Andrei



On 02.06.2016 22:01, Timon Gehr wrote:



* s.balancedParens('〈', '〉') works only with autodecoding.
...


Doesn't work, e.g. s="⟨⃖". Shouldn't compile.


assert("⟨⃖".normalize!NFC.byGrapheme.balancedParens(Grapheme("⟨"),Grapheme("⟩")));

writeln("⟨⃖".balancedParens('⟨','⟩')); // false




Re: The Case Against Autodecode

2016-06-02 Thread Walter Bright via Digitalmars-d

On 6/2/2016 1:27 PM, Andrei Alexandrescu wrote:

The lambda returns bool. -- Andrei


Yes, I was wrong about that. But the point still stands with:

> * s.balancedParens('〈', '〉') works only with autodecoding.
> * s.canFind('ö') works only with autodecoding. It returns always false 
without.

Can be made to work without autodecoding.



Re: The Case Against Autodecode

2016-06-02 Thread Walter Bright via Digitalmars-d

On 6/2/2016 1:12 PM, Timon Gehr wrote:

On 02.06.2016 22:07, Walter Bright wrote:

On 6/2/2016 12:05 PM, Andrei Alexandrescu wrote:

* s.all!(c => c == 'ö') works only with autodecoding. It returns
always false
without.


The 'ö' is inferred as a wchar. The lambda then is inferred to return a
wchar.


No, the lambda returns a bool.


Thanks for the correction.



The algorithm can check that the input is char[], and is being
tested against a wchar. Therefore, the algorithm can specialize to do
the decoding itself.

No autodecoding necessary, and it does the right thing.


It still would not be the right thing. The lambda shouldn't compile. It is not
meaningful to compare utf-8 and utf-16 code units directly.


Yes, you have a good point. But we do allow things like:

    byte b;
    if (b == 1000) ...



Re: The Case Against Autodecode

2016-06-02 Thread tsbockman via Digitalmars-d

On Thursday, 2 June 2016 at 21:38:02 UTC, default0 wrote:

On Thursday, 2 June 2016 at 21:30:51 UTC, tsbockman wrote:
1) It does not say that level 2 should be opt-in; it says that 
level 2 should be toggle-able. Nowhere does it say which of 
level 1 and 2 should be the default.


2) It says that working with graphemes is slower than UTF-16 
code UNITS (level 1), but says nothing about streaming 
decoding of code POINTS (what we have).


3) That document is from 2000, and its claims about 
performance are surely extremely out-dated, anyway. Computers 
and the Unicode standard have both changed much since then.


1) Right because a special toggleable syntax is definitely not 
"opt-in".


It is not "opt-in" unless it is toggled off by default. The only 
reason it doesn't talk about toggling in the level 1 section, is 
because that section is written with the assumption that many 
programs will *only* support level 1.


2) Several people in this thread noted that working on 
graphemes is way slower (which makes sense, because its yet 
another processing you need to do after you decoded - therefore 
more work - therefore slower) than working on code points.


And working on code points is way slower than working on code 
units (the actual level 1).



3) Not an argument - doing more work makes code slower.


What do you think I'm arguing for? It's not graphemes-by-default.

What I actually want to see: permanently deprecate the 
auto-decoding range primitives. Force the user to explicitly 
specify whichever of `by!dchar`, `byCodePoint`, or `byGrapheme` 
their specific algorithm actually needs. Removing the implicit 
conversions between `char`, `wchar`, and `dchar` would also be 
nice, but isn't really necessary I think.


That would be a standards-compliant solution (one of several 
possible). What we have now is non-standard, at least going by 
the old version Walter linked.
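
For a feel of the three levels side by side, a minimal sketch with 
the opt-in primitives that exist today (the sample string is just 
for illustration):

import std.range : walkLength;
import std.stdio;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

void main()
{
    string s = "noe\u0308l"; // "noël" with a combining diaeresis
    writeln(s.byCodeUnit.walkLength); // 6 UTF-8 code units
    writeln(s.walkLength);            // 5 code points (autodecoded)
    writeln(s.byGrapheme.walkLength); // 4 user-perceived characters
}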


Re: Areas of D usage

2016-06-02 Thread qznc via Digitalmars-d

On Thursday, 2 June 2016 at 13:59:13 UTC, Seb wrote:
If I left out an area or you miss an application/usage - please 
let me know!


The Javascript JIT Compiler Higgs: 
https://github.com/higgsjs/Higgs


Vibe.d needs some examples. Looks like their website does not 
have any.




Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d

On 6/2/16 5:43 PM, Timon Gehr wrote:


.̂ ̪.̂

(Copy-paste it somewhere else, I think it might not be rendered
correctly on the forum.)

The point is that if I do:

".̂ ̪.̂".normalize!NFC.byGrapheme.findAmong([Grapheme("."),Grapheme(",")])

no match is returned.

If I use your method with dchars, I will get spurious matches. I.e. the
suggested method to look for punctuation symbols is incorrect:

writeln(".̂ ̪.̂".findAmong(",.")); // ".̂ ̪.̂"


Nice example.


(Also, do you have a use case for this?)


Count delimited words. Did you also look at balancedParens?


Andrei


Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d

On 6/2/16 5:38 PM, cym13 wrote:

Allow me to try another angle:

- There are different levels of unicode support and you don't want to
support them all transparently. That's understandable.


Cool.


- The level you choose to support is the code point level. There are
many good arguments about why this isn't a good default but you won't
change your mind. I don't like that at all and I'm not alone but let's
forget the entirety of the vocal D community for a moment.


You mean all 35 of them?

It's not about changing my mind! The massive thing is that code point 
level handling is the incumbent, and that changing it would need to bring 
an absolutely Earth-shattering improvement to be worth it!



- A huge part of unicode chars can be normalized to fit your
definition. That way not everything works (far from it) but a
sufficiently big subset works.


Cool.


- On the other hand, without normalization it just doesn't make any
sense from a user perspective. The ö example has clearly shown that
much; you even admitted it yourself by stating that many counter
arguments would have worked had the string been normalized.


Yah, operating at code point level does not come free of caveats. It is 
vastly superior to operating on code units, and did I mention it's the 
incumbent.



- The most prominent problem is with graphemes that can have different
representations, as those that can't be normalized can't be searched as
dchars either.


Yah, I'd say if the program needs graphemes the option is there. Phobos 
by default deals with code points which are not perfect but are 
independent of representation, produce meaningful and consistent results 
with std.algorithm etc.



- If autodecoding to code points is to stay, then in an effort to find a
compromise normalizing should be done by default. Sure it would
take some more time but it wouldn't break any code (I think) and would
actually make things more correct. They still wouldn't be correct but
I feel that something as crazy as unicode cannot be tackled
generically anyway.


Some more work on normalization at strategic points in Phobos would be 
interesting!
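
For instance, a minimal sketch with the existing std.uni normalize, 
fixing the earlier 'é' example:

    import std.algorithm.searching : canFind;
    import std.uni : normalize, NFC;

    void main()
    {
        string s = "casse\u0301"; // 'é' as 'e' + combining acute (NFD)

        assert(!s.canFind('\u00E9'));              // code-point search misses
        assert(s.normalize!NFC.canFind('\u00E9')); // works after normalization
    }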



Andrei




Re: The Case Against Autodecode

2016-06-02 Thread Timon Gehr via Digitalmars-d

On 02.06.2016 23:23, Andrei Alexandrescu wrote:

On 6/2/16 5:19 PM, Timon Gehr wrote:

On 02.06.2016 23:16, Timon Gehr wrote:

On 02.06.2016 23:06, Andrei Alexandrescu wrote:

As the examples show, the examples would be entirely meaningless at
code
unit level.


So far, I needed to count the number of characters 'ö' inside some
string exactly zero times,


(Obviously this isn't even what the example would do. I predict I will
never need to count the number of code points 'ö' by calling some
function from std.algorithm directly.)


You may look for a specific dchar, and it'll work. How about
findAmong("...") with a bunch of ASCII and Unicode punctuation symbols?
-- Andrei




.̂ ̪.̂

(Copy-paste it somewhere else, I think it might not be rendered 
correctly on the forum.)


The point is that if I do:

".̂ ̪.̂".normalize!NFC.byGrapheme.findAmong([Grapheme("."),Grapheme(",")])

no match is returned.

If I use your method with dchars, I will get spurious matches. I.e. the 
suggested method to look for punctuation symbols is incorrect:


writeln(".̂ ̪.̂".findAmong(",.")); // ".̂ ̪.̂"


(Also, do you have a use case for this?)


Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d

On 6/2/16 5:38 PM, deadalnix wrote:

On Thursday, 2 June 2016 at 21:37:11 UTC, Andrei Alexandrescu wrote:

On 6/2/16 5:35 PM, deadalnix wrote:

On Thursday, 2 June 2016 at 21:24:15 UTC, Andrei Alexandrescu wrote:

On 6/2/16 5:20 PM, deadalnix wrote:

The good thing when you define works by whatever it does right now


No, it works as it was designed. -- Andrei


Nobody says it doesn't. Everybody says the design is crap.


I think I like it more after this thread. -- Andrei


You start reminding me of the joke about that guy complaining that
everybody is going backward on the highway.


Touché. (Get it?) -- Andrei



Re: non empty slices

2016-06-02 Thread Alex via Digitalmars-d-learn

On Thursday, 2 June 2016 at 20:11:21 UTC, Alex wrote:

On Thursday, 2 June 2016 at 16:21:03 UTC, ag0aep6g wrote:

void f(int[] arr)
{
    A a = arrayToA(arr);
    foreach (T; A.AllowedTypes)
    {
        if (T* p = a.peek!T) f_impl(*p);
    }
}

You totally hit the point!
The cool thing about the Algebraic is, as I expected, that it 
doesn't change its type... And the hard thing is that I'm not 
used to its Empty, Many, ... things yet.
But the question remains how to keep this @nogc? I wonder at 
the line with peek... and why it is not just returning the 
value...


Just tried this instead of your f-function:
void f(int[] arr)
{
    A result;
    import std.meta;
    alias TL = AliasSeq!(Empty, int, Many!int);
    int caseS;
    switch (arr.length)
    {
        case 0: result = Empty.init; caseS = 0; break;
        case 1: result = arr[0]; caseS = 1; break;
        default: result = Many!int(arr); caseS = 2;
    }
    f_impl(*result.get!(TL[caseS]));
}
But got: Error: variable caseS cannot be read at compile time
which is obviously true...
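
A sketch of why: TL[caseS] needs a compile-time index, so the dispatch 
has to happen where each type is statically known - e.g. directly in 
the switch branches (assuming the same A/Empty/Many!int/f_impl 
declarations as above):

    void f(int[] arr)
    {
        switch (arr.length)
        {
            case 0:  f_impl(Empty.init);    break; // type known here
            case 1:  f_impl(arr[0]);        break;
            default: f_impl(Many!int(arr)); break;
        }
    }

This also sidesteps reading TL[caseS] at run time entirely, since no 
Algebraic needs to be constructed.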


Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d

On 6/2/16 5:37 PM, Andrei Alexandrescu wrote:

On 6/2/16 5:35 PM, deadalnix wrote:

On Thursday, 2 June 2016 at 21:24:15 UTC, Andrei Alexandrescu wrote:

On 6/2/16 5:20 PM, deadalnix wrote:

The good thing when you define works by whatever it does right now


No, it works as it was designed. -- Andrei


Nobody says it doesn't. Everybody says the design is crap.


I think I like it more after this thread. -- Andrei


Meh, thinking of it again: I don't like it more, I'd still do it 
differently given a clean slate (viz. RCStr). But let's say I didn't get 
many compelling reasons to remove autodecoding from this thread. -- Andrei




Re: The Case Against Autodecode

2016-06-02 Thread cym13 via Digitalmars-d
On Thursday, 2 June 2016 at 20:29:48 UTC, Andrei Alexandrescu 
wrote:

On 06/02/2016 04:22 PM, cym13 wrote:


A:“We should decode to code points”
B:“No, decoding to code points is a stupid idea.”
A:“No it's not!”
B:“Can you show a concrete example where it does something 
useful?”

A:“Sure, look at that!”
B:“This isn't working at all, look at all those 
counter-examples!”

A:“It may not work for your examples but look how easy it is to
find code points!”


With autodecoding all of std.algorithm operates correctly on 
code points. Without it all it does for strings is gibberish. 
-- Andrei


Allow me to try another angle:

- There are different levels of unicode support and you don't want to 
support them all transparently. That's understandable.

- The level you choose to support is the code point level. There are 
many good arguments about why this isn't a good default but you won't 
change your mind. I don't like that at all and I'm not alone but let's 
forget the entirety of the vocal D community for a moment.

- A huge part of unicode chars can be normalized to fit your 
definition. That way not everything works (far from it) but a 
sufficiently big subset works.

- On the other hand, without normalization it just doesn't make any 
sense from a user perspective. The ö example has clearly shown that 
much; you even admitted it yourself by stating that many counter 
arguments would have worked had the string been normalized.

- The most prominent problem is with graphemes that can have different 
representations, as those that can't be normalized can't be searched as 
dchars either.

- If autodecoding to code points is to stay, then in an effort to find a 
compromise normalizing should be done by default. Sure it would 
take some more time but it wouldn't break any code (I think) and would 
actually make things more correct. They still wouldn't be correct but 
I feel that something as crazy as unicode cannot be tackled 
generically anyway.



Re: The Case Against Autodecode

2016-06-02 Thread deadalnix via Digitalmars-d
On Thursday, 2 June 2016 at 21:37:11 UTC, Andrei Alexandrescu 
wrote:

On 6/2/16 5:35 PM, deadalnix wrote:
On Thursday, 2 June 2016 at 21:24:15 UTC, Andrei Alexandrescu 
wrote:

On 6/2/16 5:20 PM, deadalnix wrote:
The good thing when you define works by whatever it does 
right now


No, it works as it was designed. -- Andrei


Nobody says it doesn't. Everybody says the design is crap.


I think I like it more after this thread. -- Andrei


You start reminding me of the joke about that guy complaining that 
everybody is going backward on the highway.




Re: The Case Against Autodecode

2016-06-02 Thread default0 via Digitalmars-d

On Thursday, 2 June 2016 at 21:30:51 UTC, tsbockman wrote:

On Thursday, 2 June 2016 at 21:07:19 UTC, default0 wrote:
The level 2 support description noted that it should be opt-in 
because it's slow.


1) It does not say that level 2 should be opt-in; it says that 
level 2 should be toggle-able. Nowhere does it say which of 
level 1 and 2 should be the default.


2) It says that working with graphemes is slower than UTF-16 
code UNITS (level 1), but says nothing about streaming decoding 
of code POINTS (what we have).


3) That document is from 2000, and its claims about performance 
are surely extremely out-dated, anyway. Computers and the 
Unicode standard have both changed much since then.


1) Right because a special toggleable syntax is definitely not 
"opt-in".
2) Several people in this thread noted that working on graphemes 
is way slower (which makes sense, because it's yet another 
processing step you need to do after decoding - therefore more work 
- therefore slower) than working on code points.
3) Not an argument - doing more work makes code slower. The only 
thing that changes is what specific operations have what cost 
(for instance, memory access has a much higher cost now than it 
had then). Considering the way the process works and judging from 
what others in this thread have said about it, I will stick with 
"always decoding to graphemes for all operations is very slow" 
and indulge in being too lazy to write benchmarks for it to show 
just how bad it is.


Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d

On 6/2/16 5:35 PM, deadalnix wrote:

On Thursday, 2 June 2016 at 21:24:15 UTC, Andrei Alexandrescu wrote:

On 6/2/16 5:20 PM, deadalnix wrote:

The good thing when you define works by whatever it does right now


No, it works as it was designed. -- Andrei


Nobody says it doesn't. Everybody says the design is crap.


I think I like it more after this thread. -- Andrei


Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d

On 6/2/16 5:35 PM, ag0aep6g wrote:

On 06/02/2016 11:27 PM, Andrei Alexandrescu wrote:

On 6/2/16 5:24 PM, ag0aep6g wrote:

On 06/02/2016 11:06 PM, Andrei Alexandrescu wrote:

Nope, that's a radically different matter. As the examples show, the
examples would be entirely meaningless at code unit level.


They're simply not possible. Won't compile.


They do compile.


Yes, you're right, of course they do. char implicitly converts to dchar.
I didn't think of that anti-feature.


As I said: this thread produces an unpleasant amount of arguments in
favor of autodecoding. Even I don't like that :o).


It's more of an argument against char : dchar, I'd say.


I do think that's an interesting option in PL design space, but that 
would be super disruptive. -- Andrei


Re: The Case Against Autodecode

2016-06-02 Thread deadalnix via Digitalmars-d
On Thursday, 2 June 2016 at 21:24:15 UTC, Andrei Alexandrescu 
wrote:

On 6/2/16 5:20 PM, deadalnix wrote:
The good thing when you define works by whatever it does right 
now


No, it works as it was designed. -- Andrei


Nobody says it doesn't. Everybody says the design is crap.


Re: The Case Against Autodecode

2016-06-02 Thread ag0aep6g via Digitalmars-d

On 06/02/2016 11:27 PM, Andrei Alexandrescu wrote:

On 6/2/16 5:24 PM, ag0aep6g wrote:

On 06/02/2016 11:06 PM, Andrei Alexandrescu wrote:

Nope, that's a radically different matter. As the examples show, the
examples would be entirely meaningless at code unit level.


They're simply not possible. Won't compile.


They do compile.


Yes, you're right, of course they do. char implicitly converts to dchar. 
I didn't think of that anti-feature.



As I said: this thread produces an unpleasant amount of arguments in
favor of autodecoding. Even I don't like that :o).


It's more of an argument against char : dchar, I'd say.


Re: The Case Against Autodecode

2016-06-02 Thread tsbockman via Digitalmars-d

On Thursday, 2 June 2016 at 21:07:19 UTC, default0 wrote:
The level 2 support description noted that it should be opt-in 
because it's slow.


1) It does not say that level 2 should be opt-in; it says that 
level 2 should be toggle-able. Nowhere does it say which of level 
1 and 2 should be the default.


2) It says that working with graphemes is slower than UTF-16 code 
UNITS (level 1), but says nothing about streaming decoding of 
code POINTS (what we have).


3) That document is from 2000, and its claims about performance 
are surely extremely out-dated, anyway. Computers and the Unicode 
standard have both changed much since then.




Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d

On 6/2/16 5:27 PM, Andrei Alexandrescu wrote:

On 6/2/16 5:24 PM, ag0aep6g wrote:

Just like there is no single code point for 'a⃗' so you can't
search for it in a range of code points.


Of course you can.


Correx, indeed you can't. -- Andrei


Re: D's Auto Decoding and You

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d-announce

On 6/2/16 5:27 PM, Steven Schveighoffer wrote:

On 6/2/16 5:21 PM, jmh530 wrote:

On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:


If you think there should be any more information included in the
article, please let me know so I can add it.


I was a little confused by something in the main autodecoding thread, so
I read your article again. Unfortunately, I don't think my confusion is
resolved. I was trying one of your examples (full code I used below).
You claim it works, but I keep getting assertion failures. I'm just
running it with rdmd on Windows 7.


import std.algorithm : canFind;

void main()
{
    string s = "cassé";

    assert(s.canFind!(x => x == 'é'));
}


If that é above is an e followed by a combining character, then you will
get the error. This is because autodecoding does not auto normalize as
well -- the code points have to match exactly.

-Steve


Indeed. FWIW I just copied OP's code from Thunderbird into Chrome (on 
OSX) and it worked: https://dpaste.dzfl.pl/09b9188d87a5


Should I assume some normalization occurred on the way?


Andrei



Re: D's Auto Decoding and You

2016-06-02 Thread Steven Schveighoffer via Digitalmars-d-announce

On 6/2/16 5:21 PM, jmh530 wrote:

On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:


If you think there should be any more information included in the
article, please let me know so I can add it.


I was a little confused by something in the main autodecoding thread, so
I read your article again. Unfortunately, I don't think my confusion is
resolved. I was trying one of your examples (full code I used below).
You claim it works, but I keep getting assertion failures. I'm just
running it with rdmd on Windows 7.


import std.algorithm : canFind;

void main()
{
    string s = "cassé";

    assert(s.canFind!(x => x == 'é'));
}


If that é above is an e followed by a combining character, then you 
will get the error. This is because autodecoding does not auto normalize 
as well -- the code points have to match exactly.


-Steve


Re: D's Auto Decoding and You

2016-06-02 Thread Jack Stouffer via Digitalmars-d-announce

On Thursday, 2 June 2016 at 21:21:50 UTC, jmh530 wrote:
I was a little confused by something in the main autodecoding 
thread, so I read your article again. Unfortunately, I don't 
think my confusion is resolved. I was trying one of your 
examples (full code I used below). You claim it works, but I 
keep getting assertion failures. I'm just running it with rdmd 
on Windows 7.



import std.algorithm : canFind;

void main()
{
    string s = "cassé";

    assert(s.canFind!(x => x == 'é'));
}


Your browser is turning the é in the string into two code points 
via normalization whereas it should be one. Try using \u00E9 
instead.
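
Spelled out, that suggested fix is (a minimal sketch):

    import std.algorithm.searching : canFind;

    void main()
    {
        string s = "cass\u00E9"; // 'é' guaranteed to be a single code point

        assert(s.canFind!(x => x == '\u00E9'));
    }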


Re: Why does DMD on Debian need xdg-utils

2016-06-02 Thread Mathias Lang via Digitalmars-d
It shouldn't be necessary. I believe that is because of `dmd -man`, which
opens a web browser.

That's an apt-d issue (hopefully Jordi Sayol will read this) which
prevents using this repository if your machine has no X (I guess you
discovered that on a server, as I did).

2016-06-02 20:17 GMT+02:00 ZombineDev via Digitalmars-d <
digitalmars-d@puremagic.com>:

> On Thursday, 2 June 2016 at 18:04:43 UTC, flamencofantasy wrote:
>
>> On Thursday, 2 June 2016 at 17:54:07 UTC, ZombineDev wrote:
>>
>>> On Thursday, 2 June 2016 at 17:36:46 UTC, flamencofantasy wrote:
>>>
 DMD on debian depends on the xdg-utils package.

 When I install xdg-utils I get many more packages (see bottom of
 message).

 Is that really necessary?

 Thanks.

>>>
>>> It shouldn't be necessary. It's probably a packaging issue. Meanwhile,
>>> you can try the install.sh script on listed on http://dlang.org/download.
>>> It shouldn't have any unnecessary dependencies.
>>>
>>
>> Thanks, but does the script handle upgrades?
>>
>
> Yes. It also supports multiple versions side by side and also installs
> dub. You can find the source here:
> https://github.com/dlang/installer/blob/master/script/install.sh
>
> Example usage:
>
> # Install dmd 2.70.0
> ~/dlang/install.sh install dmd-2.70.0
>
> # Install dmd 2.69.0
> ~/dlang/install.sh install dmd-2.69.0
>
> # start using version 2.70.0
> activate ~/dlang/dmd-2.70.0
>
> # stop using version 2.70.0
> deactivate
>
> # start using version 2.69.0
> activate ~/dlang/dmd-2.69.0
>
> # stop using version 2.69.0
> deactivate
>
> # uninstall version 2.69.0
> ~/dlang/install.sh uninstall dmd-2.69.0
>
> # removes everything installed so far
> rm -rf ~/dlang
>
> # downloads (again) the install script and
> # installs the latest stable version of the compiler.
> curl -fsS https://dlang.org/install.sh | bash -s dmd
>
>


Re: The Case Against Autodecode

2016-06-02 Thread Timon Gehr via Digitalmars-d

On 02.06.2016 22:51, Andrei Alexandrescu wrote:

On 06/02/2016 04:50 PM, Timon Gehr wrote:

On 02.06.2016 22:28, Andrei Alexandrescu wrote:

On 06/02/2016 04:12 PM, Timon Gehr wrote:

It is not meaningful to compare utf-8 and utf-16 code units directly.


But it is meaningful to compare Unicode code points. -- Andrei



It is also meaningful to compare two utf-8 code units or two utf-16 code
units.


By decoding them of course. -- Andrei



That makes no sense, I cannot decode single code units.

BTW, I guess the reason why char converts to wchar converts to dchar is 
that the lower half of code units in char and the lower half of code 
units in wchar are code points. Maybe code units and code points with 
low numerical values should have distinct types.
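
A small illustration of that observation, using only the built-in 
conversions:

    void main()
    {
        char  c = 'a'; // UTF-8 code unit 0x61
        wchar w = c;   // also a valid UTF-16 code unit
        dchar d = w;   // also the code point U+0061
        assert(c == d);
    }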


Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d

On 6/2/16 5:20 PM, deadalnix wrote:

The good thing when you define works by whatever it does right now


No, it works as it was designed. -- Andrei


Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d

On 6/2/16 5:23 PM, Timon Gehr wrote:

On 02.06.2016 22:51, Andrei Alexandrescu wrote:

On 06/02/2016 04:50 PM, Timon Gehr wrote:

On 02.06.2016 22:28, Andrei Alexandrescu wrote:

On 06/02/2016 04:12 PM, Timon Gehr wrote:

It is not meaningful to compare utf-8 and utf-16 code units directly.


But it is meaningful to compare Unicode code points. -- Andrei



It is also meaningful to compare two utf-8 code units or two utf-16 code
units.


By decoding them of course. -- Andrei



That makes no sense, I cannot decode single code units.

BTW, I guess the reason why char converts to wchar converts to dchar is
that the lower half of code units in char and the lower half of code
units in wchar are code points. Maybe code units and code points with
low numerical values should have distinct types.


Then you lost me. (I'm sure you're making a good point.) -- Andrei


Re: The Case Against Autodecode

2016-06-02 Thread Timon Gehr via Digitalmars-d

On 02.06.2016 23:20, deadalnix wrote:


The sample code won't count the instances of the grapheme 'ö', as some of
its encodings won't be counted, which definitely counts as doesn't work.


It also has false positives (you can combine 'ö' with some combining 
character in order to get some strange character that is not an 'ö', and 
not even NFC helps with that).
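
A sketch of such a false positive (assuming U+0309, COMBINING HOOK 
ABOVE, has no precomposed form with 'ö'):

    import std.algorithm.searching : canFind;
    import std.uni : normalize, NFC;

    void main()
    {
        // 'o' + diaeresis + hook above: one grapheme that is not 'ö',
        // yet still contains the 'ö' code point after NFC.
        string s = "o\u0308\u0309";
        assert(s.normalize!NFC.canFind('\u00F6')); // spurious match
    }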


Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d

On 6/2/16 5:24 PM, ag0aep6g wrote:

On 06/02/2016 11:06 PM, Andrei Alexandrescu wrote:

Nope, that's a radically different matter. As the examples show, the
examples would be entirely meaningless at code unit level.


They're simply not possible. Won't compile.


They do compile.


There is no single UTF-8
code unit for 'ö', so you can't (easily) search for it in a range of
code units.


Of course you can. Can you search for an int in a short[]? Oh yes you 
can. Can you search for a dchar in a char[]? Of course you can. 
Autodecoding also gives it meaning.
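
A minimal sketch of both searches, using only std.algorithm:

    import std.algorithm.searching : canFind;

    void main()
    {
        short[] a = [1, 2, 3];
        assert(a.canFind(2));    // int needle, short[] haystack

        string s = "日本語";
        assert(s.canFind('本')); // dchar needle, char[] haystack (autodecoded)
    }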



Just like there is no single code point for 'a⃗' so you can't
search for it in a range of code points.


Of course you can.


You can still search for 'a', and 'o', and the rest of ASCII in a range
of code units.


You can search for a dchar in a char[] because you can compare an 
individual dchar with either another dchar (correct, autodecoding) or 
with a char (incorrect, no autodecoding).


As I said: this thread produces an unpleasant amount of arguments in 
favor of autodecoding. Even I don't like that :o).



Andrei



Re: The Case Against Autodecode

2016-06-02 Thread ag0aep6g via Digitalmars-d

On 06/02/2016 11:24 PM, ag0aep6g wrote:

They're simply not possible. Won't compile. There is no single UTF-8
code unit for 'ö', so you can't (easily) search for it in a range of
code units. Just like there is no single code point for 'a⃗' so you can't
search for it in a range of code points.

You can still search for 'a', and 'o', and the rest of ASCII in a range
of code units.


I'm ignoring combining characters there. You can search for 'a' in code 
units in the same way that you can search for 'ä' in code points. I.e., 
more or less, depending on how serious you are about combining characters.


Re: The Case Against Autodecode

2016-06-02 Thread deadalnix via Digitalmars-d

On Thursday, 2 June 2016 at 20:27:27 UTC, Walter Bright wrote:

On 6/2/2016 12:34 PM, deadalnix wrote:
On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu 
wrote:
Pretty much everything. Consider s and s1 string variables 
with possibly different encodings (UTF8/UTF16).

* s.all!(c => c == 'ö') works only with autodecoding. It 
always returns false without.



False. Many characters can be represented by different 
sequences of codepoints.
For instance, ê can be ê as one codepoint, or e followed by a 
combining ^ modifier. ö is one such character.


There are 3 levels of Unicode support. What Andrei is talking 
about is Level 1.


http://unicode.org/reports/tr18/tr18-5.1.html

I wonder what rationale there is for Unicode to have two 
different sequences of codepoints be treated as the same. It's 
madness.


To be able to convert back and forth from/to unicode in a 
lossless manner.




Re: The Case Against Autodecode

2016-06-02 Thread ag0aep6g via Digitalmars-d

On 06/02/2016 11:06 PM, Andrei Alexandrescu wrote:

Nope, that's a radically different matter. As the examples show, the
examples would be entirely meaningless at code unit level.


They're simply not possible. Won't compile. There is no single UTF-8 
code unit for 'ö', so you can't (easily) search for it in a range of 
code units. Just like there is no single code point for 'a⃗' so you can't 
search for it in a range of code points.


You can still search for 'a', and 'o', and the rest of ASCII in a range 
of code units.


Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d

On 6/2/16 5:19 PM, Timon Gehr wrote:

On 02.06.2016 23:16, Timon Gehr wrote:

On 02.06.2016 23:06, Andrei Alexandrescu wrote:

As the examples show, the examples would be entirely meaningless at code
unit level.


So far, I needed to count the number of characters 'ö' inside some
string exactly zero times,


(Obviously this isn't even what the example would do. I predict I will
never need to count the number of code points 'ö' by calling some
function from std.algorithm directly.)


You may look for a specific dchar, and it'll work. How about 
findAmong("...") with a bunch of ASCII and Unicode punctuation symbols? 
-- Andrei





Re: year to date pull statistics (week ending 2016-05-28)

2016-06-02 Thread Seb via Digitalmars-d

On Thursday, 2 June 2016 at 18:36:02 UTC, Basile B. wrote:

On Tuesday, 31 May 2016 at 23:48:00 UTC, Brad Roberts wrote:

[...]


You should take Jack Stouffer into dlang ;). Personally I think 
that the problem with Phobos is that the people who should manage 
it are not available enough.


I am fully for that - Jack has been doing a great job lately at 
cleaning up & reviewing Phobos. He has more than earned his 
promotion!


Re: The Case Against Autodecode

2016-06-02 Thread deadalnix via Digitalmars-d
On Thursday, 2 June 2016 at 20:13:52 UTC, Andrei Alexandrescu 
wrote:

On 06/02/2016 03:34 PM, deadalnix wrote:
On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu 
wrote:
Pretty much everything. Consider s and s1 string variables with 
possibly different encodings (UTF8/UTF16).

* s.all!(c => c == 'ö') works only with autodecoding. It 
always returns false without.



False.


True. "Are all code points equal to this one?" -- Andrei


The good thing when you define works by whatever it does right 
now is that everything always works and there are literally 
never any bugs. The bad thing is that this is a completely useless 
definition of works.


The sample code won't count the instances of the grapheme 'ö', as 
some of its encodings won't be counted, which definitely counts 
as doesn't work.


When your point needs to redefine words in ways that nobody agrees 
with, it is time to admit the point is bogus.




Re: D's Auto Decoding and You

2016-06-02 Thread jmh530 via Digitalmars-d-announce

On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:


If you think there should be any more information included in 
the article, please let me know so I can add it.


I was a little confused by something in the main autodecoding 
thread, so I read your article again. Unfortunately, I don't 
think my confusion is resolved. I was trying one of your examples 
(full code I used below). You claim it works, but I keep getting 
assertion failures. I'm just running it with rdmd on Windows 7.



import std.algorithm : canFind;

void main()
{
    string s = "cassé";

    assert(s.canFind!(x => x == 'é'));
}


Re: Blocking points for further D adoption

2016-06-02 Thread Jack Stouffer via Digitalmars-d

On Thursday, 2 June 2016 at 21:01:53 UTC, Jacob Carlborg wrote:
Don't you have that issue with most stuff? Not everything can 
fit everyone's need.


Sure, it's a sliding scale. But, web servers, even ones that sit 
behind Apache or Nginx, are specialized much more than what we 
currently have in Phobos.


It would make more sense from a maintenance standpoint to have a 
toy server, but I don't see the utility of including one in 
Phobos over just having it in dub.




Re: The Case Against Autodecode

2016-06-02 Thread Timon Gehr via Digitalmars-d

On 02.06.2016 23:06, Andrei Alexandrescu wrote:

As the examples show, the examples would be entirely meaningless at code
unit level.


So far, I needed to count the number of characters 'ö' inside some 
string exactly zero times, but I wanted to chain or join strings 
relatively often.


Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d

On 6/2/16 5:05 PM, tsbockman wrote:

On Thursday, 2 June 2016 at 20:56:26 UTC, Walter Bright wrote:

What is supposed to be done with "do not merge" PRs other than close
them?


Occasionally people need to try something on the auto tester (not sure
if that's relevant to that particular PR, though). Presumably if someone
marks their own PR as "do not merge", it means they're planning to
either close it themselves after it has served its purpose, or they plan
to fix/finish it and then remove the "do not merge" label.


Feel free to reopen if it helps, it wasn't closed in anger. -- Andrei



Re: The Case Against Autodecode

2016-06-02 Thread default0 via Digitalmars-d

On Thursday, 2 June 2016 at 20:52:29 UTC, ag0aep6g wrote:

On 06/02/2016 10:36 PM, Andrei Alexandrescu wrote:
By whom? The "support level 1" folks yonder at the Unicode 
standard? :o)

-- Andrei


Do they say that level 1 should be the default, and do they 
give a rationale for that? Would you kindly link or quote that?


The level 2 support description noted that it should be opt-in 
because it's slow.
Arguably it should be easier to operate on code units if you know 
it's safe to do so, but either always working on code units or 
always working on graphemes as the default seems to be either too 
broken too often or too slow too often.


Now one can argue either consistency for code units (because then 
we can treat char[] and friends as a slice) or correctness for 
graphemes, but really the more I think about it the more I think 
there is no good default and you need to learn unicode anyway. 
The only sad parts here are that 1) we hijacked an array type for 
strings, which sucks, and 2) that we don't have an api that is 
actually good at teaching the user what it does and doesn't do.


The consequence of 1 is that generic code that also wants to deal 
with strings will want to special-case to get rid of 
auto-decoding; the consequence of 2 is that we will have tons of 
not-actually-correct string handling.
I would assume that almost all string handling code that is out 
in the wild is broken anyway (in code I have encountered I have 
never seen attempts to normalize or do other things before or 
after comparisons, searching, etc), unless of course YOU or one 
of your colleagues wrote it (consider that checking the length of 
a string in Java or C# to validate it is no longer than X 
characters is often done and wrong, because .Length is the number 
of UTF-16 code units in those languages) :o)
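
The same pitfall, phrased in D terms (a sketch; .length counts code 
units here too):

    import std.range : walkLength;
    import std.uni : byGrapheme;

    void main()
    {
        string s = "\U0001F355";  // one user-perceived character (🍕)
        assert(s.length == 4);                // UTF-8 code units
        assert(s.walkLength == 1);            // code points (autodecoded)
        assert(s.byGrapheme.walkLength == 1); // graphemes
    }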


So really, as bad and alarming as "incorrect string handling" by 
default seems, in practice it has not prevented people from 
writing working (internationalized!) applications in other 
languages that get used way more than D.
One could say we should do it better than them, but I would be 
inclined to believe that RCStr provides our opportunity to do so. 
Having char[] be what it is is an annoying wart, and maybe at 
some point we can deprecate/remove that behaviour, but for now I'd 
rather see if RCStr is viable than attempt to change semantics of 
all string handling code in D.


Re: The Case Against Autodecode

2016-06-02 Thread tsbockman via Digitalmars-d

On Thursday, 2 June 2016 at 20:56:26 UTC, Walter Bright wrote:
What is supposed to be done with "do not merge" PRs other than 
close them?


Occasionally people need to try something on the auto tester (not 
sure if that's relevant to that particular PR, though). 
Presumably if someone marks their own PR as "do not merge", it 
means they're planning to either close it themselves after it has 
served its purpose, or they plan to fix/finish it and then remove 
the "do not merge" label.


Either way, they shouldn't be closed just because they say "do 
not merge" (unless they're abandoned or something, obviously).


Re: The Case Against Autodecode

2016-06-02 Thread Andrei Alexandrescu via Digitalmars-d

On 6/2/16 5:01 PM, ag0aep6g wrote:

On 06/02/2016 10:50 PM, Andrei Alexandrescu wrote:

It does not fall apart for code points.


Yes it does. You've been given plenty examples where it falls apart.


There weren't any.


Your answer to that was that it operates on code points, not graphemes.


That is correct.


Well, duh. Comparing UTF-8 code units against each other works, too.
That's not an argument for doing that by default.


Nope, that's a radically different matter. As the examples show, the 
examples would be entirely meaningless at code unit level.



Andrei



Re: Phobos needs a (part-time) maintainer

2016-06-02 Thread qznc via Digitalmars-d

On Thursday, 2 June 2016 at 20:59:52 UTC, Basile B. wrote:
Eventually I'll come back to bugfix if they take Jake, but not 
you Seb.

For a reason or another I don't like you wilzbach.


You are frustrated. I get that.

Don't make this personal for others, please. Maybe you should 
ignore this thread for today?


Re: Blocking points for further D adoption

2016-06-02 Thread Jacob Carlborg via Digitalmars-d

On 2016-06-02 20:14, Jack Stouffer wrote:


Just to be clear, it's not a good idea to have a full blown server in
your stdlib. Non-toy web servers are complicated pieces of software
involving > 10KLOC. Not only that, but there are many ways to skin a cat
in this field. Different products need varying, sometimes mutually
exclusive, features from their servers.

Therefore, I don't think web servers are good candidates for standardization.


Don't you have that issue with most stuff? Not everything can fit 
everyone's need. I have never used std.bigint but it's still present in 
Phobos because it's useful for someone.


I agree with the complexity of web servers but they don't need to handle 
all the gory details of clients not following the protocol. I would 
think it works perfectly fine for non-public facing servers. For public 
facing servers it should sit behind a well-tested, well-understood 
implementation like Apache or nginx, regardless of whether the 
implementation is in Go, Node.js or D.


--
/Jacob Carlborg


Re: The Case Against Autodecode

2016-06-02 Thread ag0aep6g via Digitalmars-d

On 06/02/2016 10:50 PM, Andrei Alexandrescu wrote:

It does not fall apart for code points.


Yes it does. You've been given plenty examples where it falls apart. 
Your answer to that was that it operates on code points, not graphemes. 
Well, duh. Comparing UTF-8 code units against each other works, too. 
That's not an argument for doing that by default.


Re: The Case Against Autodecode

2016-06-02 Thread tsbockman via Digitalmars-d
On Thursday, 2 June 2016 at 20:49:52 UTC, Andrei Alexandrescu 
wrote:

On 06/02/2016 04:47 PM, tsbockman wrote:
That doesn't sound like much of an endorsement for defaulting 
to only
level 1 support to me - "it does not handle more complex 
languages or

extensions to the Unicode Standard very well".


Code point/Level 1 support sounds like a sweet spot between 
efficiency/complexity and conviviality. Level 2 is opt-in with 
byGrapheme. -- Andrei


Actually, according to the document Walter Bright linked, level 1 
does NOT operate at the code point level:


Level 1: Basic Unicode Support. At this level, the regular 
expression engine provides support for Unicode characters as 
basic 16-bit logical units. (This is independent of the actual 
serialization of Unicode as UTF-8, UTF-16BE, UTF-16LE, or 
UTF-32.)

...
Level 1 support works well in many circumstances. However, it 
does not handle more complex languages or extensions to the 
Unicode Standard very well. Particularly important cases are 
**surrogates** ...


So, level 1 appears to be UTF-16 code units, not code points. To 
do code points it would have to recognize surrogates, which are 
specifically mentioned as not supported.
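
A quick illustration of the distinction, using D's string types (a 
sketch):

    void main()
    {
        // A non-BMP code point is two UTF-16 code units (a surrogate pair):
        wstring w = "\U0001F355"w; // 🍕
        assert(w.length == 2);     // code units: 0xD83C, 0xDF55

        dstring d = "\U0001F355"d;
        assert(d.length == 1);     // one code point
    }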


Level 2 skips straight to graphemes, and there is no code point 
level.


However, this document is very old - from Unicode 3.0 and the 
year 2000:


While there are no surrogate characters in Unicode 3.0 (outside 
of private use characters), future versions of Unicode will 
contain them...


Perhaps level 1 has since been redefined?


Re: Phobos needs a (part-time) maintainer

2016-06-02 Thread Basile B. via Digitalmars-d

On Thursday, 2 June 2016 at 20:23:37 UTC, Seb wrote:
On Thursday, 2 June 2016 at 20:17:32 UTC, Andrei Alexandrescu 
wrote:

On 06/02/2016 03:41 PM, Basile B. wrote:
Once a pr gets the label "@andrei". It basically means that 
"it's dead".


You mean @andralex? You are right. I am sorry, I'm coming off 
an unprecedentedly busy spring spent mostly evangelizing D at 
various conferences, or doing contract work that will pour 
money in the Foundation's coffers. This is not work I can 
delegate, but is poised to have great impact (more on that 
later). I thought leaving Facebook would free my time, but 
things have gotten really crazily busy. And look at me - I 
spend most of my time on the autodecoding thread.


Andrei


Can't we have someone who can dedicate a fixed amount of his 
professional time to maintaining the D infrastructure?
There is so much to do - reviewing and categorizing PRs is just 
the tip of the iceberg.


Ideally it would be a full-time position, but if a company 
would dedicate 20% of an employee's time to contributing to D, 
that would be an awesome step forward.


Eventually I'll come back to bugfix if they take Jake, but not 
you Seb.

For a reason or another I don't like you wilzbach.

