Re: [OT] Re: First Impressions!

2017-12-04 Thread Joakim via Digitalmars-d
On Monday, 4 December 2017 at 21:23:51 UTC, Andrei Alexandrescu 
wrote:

On 12/2/17 5:16 PM, Joakim wrote:
Yep, that's why five years back many of the major Chinese 
sites were still not using UTF-8:


http://xahlee.info/w/what_encoding_do_chinese_websites_use.html

That led that Chinese guy to also rant against UTF-8 a couple 
years ago:


http://xahlee.info/comp/unicode_utf8_encoding_propaganda.html


BTW has anyone been in contact with Xah Lee? Perhaps we could 
commission him to write some tutorial material for D. -- Andrei


I traded email with him last summer, emailed you his email 
address just now.


[OT] Re: First Impressions!

2017-12-04 Thread Andrei Alexandrescu via Digitalmars-d

On 12/2/17 5:16 PM, Joakim wrote:
Yep, that's why five years back many of the major Chinese sites were 
still not using UTF-8:


http://xahlee.info/w/what_encoding_do_chinese_websites_use.html

That led that Chinese guy to also rant against UTF-8 a couple years ago:

http://xahlee.info/comp/unicode_utf8_encoding_propaganda.html


BTW has anyone been in contact with Xah Lee? Perhaps we could commission 
him to write some tutorial material for D. -- Andrei


Re: First Impressions!

2017-12-04 Thread Kagamin via Digitalmars-d

On Sunday, 3 December 2017 at 01:59:58 UTC, H. S. Teoh wrote:
Still, it betrays the emperor's invisible clothes of the 
"graphics == intuitive" mantra -- you still have to learn the 
icons just like you have to learn the keywords of a text-based 
UI, before you can use the software effectively.


What happened when you ran vi for the first time?


Re: First Impressions!

2017-12-04 Thread Steven Schveighoffer via Digitalmars-d

On 12/2/17 11:28 PM, Walter Bright wrote:

On 12/2/2017 5:59 PM, H. S. Teoh wrote:

[...]


Even worse, companies go and copyright their icons, guaranteeing they 
have to be substantially different for every company!


I like this site for icons. Only requires you to reference them in your 
about box:


https://icons8.com/

-Steve


Re: First Impressions!

2017-12-03 Thread Patrick Schluter via Digitalmars-d

On Saturday, 2 December 2017 at 22:16:09 UTC, Joakim wrote:

On Friday, 1 December 2017 at 23:16:45 UTC, H. S. Teoh wrote:
On Fri, Dec 01, 2017 at 03:04:44PM -0800, Walter Bright via 
Digitalmars-d wrote:

On 11/30/2017 9:23 AM, Kagamin wrote:
> On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki 
> cattermole wrote:
> > Be aware Microsoft is alone in thinking that UTF-16 was 
> > awesome. Everybody else standardized on UTF-8 for Unicode.
> 
> UCS2 was awesome. UTF-16 is used by Java, JavaScript, 
> Objective-C, Swift, Dart and ms tech, which is 28% of tiobe 
> index.


"was" :-) Those are pretty much pre-surrogate pair designs, 
or based

on them (Dart compiles to JavaScript, for example).

UCS2 has serious problems:

1. Most strings are in ascii, meaning UCS2 doubles memory 
consumption. Strings in the executable file are twice the 
size.


This is not true in Asia, esp. where the CJK block is 
extensively used. A CJK block character is 3 bytes in UTF-8, 
meaning that string sizes are 150% of the UCS2 encoding.  If 
your code contains a lot of CJK text, that's a lot of bloat.


Yep, that's why five years back many of the major Chinese sites 
were still not using UTF-8:


http://xahlee.info/w/what_encoding_do_chinese_websites_use.html


Summary

Taiwan sites almost all use UTF-8. Very old ones still use BIG5.

Mainland China sites mostly still use GBK or GB2312, but a few 
newer ones use UTF-8.


Many top Japanese and Korean sites also use UTF-8, but some use 
EUC (Extended Unix Code) variants.


This probably means that UTF-8 might dominate in the future.

mmmh


That led that Chinese guy to also rant against UTF-8 a couple 
years ago:


http://xahlee.info/comp/unicode_utf8_encoding_propaganda.html


A rant from someone reproaching a video for not providing reasons 
why utf-8 is good, while himself not providing any reasons why 
utf-8 is bad. I'm not denying the issues with utf-8, only that 
the ranter doesn't provide any useful info on what issues 
"Asians" encounter with it, besides legacy reasons (which are 
important but do not enter into judging the technical quality of 
an encoding).
Add to that that he advocates for GB18030, which is quite 
inferior to utf-8 except in the legacy support area (here are 
some of the advantages of utf-8 that GB18030 does not possess: 
auto-synchronization, algorithmic mapping of codepoints, error 
detection).
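
To make the auto-synchronization point concrete, here is a 
minimal D sketch (an illustration, not anything from the linked 
pages): every UTF-8 continuation byte has the bit pattern 
10xxxxxx, so from any position you can back up to the start of 
the current code point without decoding from the beginning. 
GB18030 gives no such guarantee.

    // Back up from index i to the first byte of the code point containing it.
    size_t syncBack(const(ubyte)[] buf, size_t i)
    {
        while (i > 0 && (buf[i] & 0xC0) == 0x80) // 0b10xxxxxx: continuation
            --i;
        return i;
    }

    unittest
    {
        auto bytes = cast(const(ubyte)[])"aé"; // 'é' is 0xC3 0xA9
        assert(syncBack(bytes, 2) == 1); // from the trail byte back to 0xC3
    }
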
If his only beef with utf-8 is the size of CJK text, then he 
shouldn't argue for UTF-32 as he seems to do at the end.


Re: First Impressions!

2017-12-02 Thread Walter Bright via Digitalmars-d

On 12/2/2017 5:59 PM, H. S. Teoh wrote:

[...]


Even worse, companies go and copyright their icons, guaranteeing they have to be 
substantially different for every company!


If there ever was an Emperor's New Clothes, it's icons and emojis.


Re: First Impressions!

2017-12-02 Thread H. S. Teoh via Digitalmars-d
On Sat, Dec 02, 2017 at 02:20:10AM -0800, Walter Bright via Digitalmars-d wrote:
[...]
> My car has a bunch of emoticons labeling the controls. I can't figure out
> what any of them do without reading the manual, or just pushing random
> buttons until what I want happens. One button has an icon on it that
> looks like a snowflake. What does that do? Turn on the A/C? Defrost
> the frosty windows?  Set the AWD in slippery mode? Turn on the
> Christmas lights?

The same can be argued for the icon mania started by the GUI craze in
the 90's that has now become the de facto standard.  Some icons are more
obvious than others, but nowadays GUI toolbars are full of inscrutable
icons of unclear meaning that are basically opaque unless you already
have prior knowledge of what they're supposed to represent. Thankfully
most(?) GUI programs have enough sanity left to provide tooltips with
textual labels for what each button means.  Still, it betrays the
emperor's invisible clothes of the "graphics == intuitive" mantra -- you
still have to learn the icons just like you have to learn the keywords
of a text-based UI, before you can use the software effectively.

Reminds me also of the infamous Mystery Meat navigation style of the
90's, where people would use images for navigation weblinks on their
website, so you basically don't know where they're linking to until
you click on them.

This is why I think GUIs and the whole "desktop metaphor" craze are
heading the wrong direction, and why 95% of my computer usage is via a
text terminal. There's a place for graphical interfaces, but it's gone
too far these days.

But thanks to Unicode emoticons, we can now have icons on my text
terminal too, isn't that just wonderful?! Esp. when a
missing/incompatible font causes them to show up as literal blank boxes.
The power of a standardized, universal character set, lemme tell ya!


T

-- 
Almost all proofs have bugs, but almost all theorems are true. -- Paul Pedersen


Re: First Impressions!

2017-12-02 Thread codephantom via Digitalmars-d

On Sunday, 3 December 2017 at 01:11:14 UTC, codephantom wrote:


but my wider point is, unicode emojis are useless if they only 
contain those that 'some' consider to be politically correct, 
or socially acceptable.


The Unicode consortium is a bunch of ...   (I don't have the 
unicode emoji representation yet to complete that sentence).


btw. Good article here, further demonstrating my point..

"We're talking about engineers that are concerned about standards 
and internationalization issues who now have to do something more 
in line with Apple or Google's marketing teams,".


https://www.buzzfeed.com/charliewarzel/thanks-to-apples-influence-youre-not-getting-a-rifle-emoji



Re: First Impressions!

2017-12-02 Thread codephantom via Digitalmars-d
On Saturday, 2 December 2017 at 16:44:56 UTC, Ola Fosheim Grøstad 
wrote:

On Saturday, 2 December 2017 at 12:25:22 UTC, codephantom wrote:
Do the people on the unicode consortium consider such 
communication to be invalid?


https://splinternews.com/violent-emoji-are-starting-to-get-people-in-trouble-wit-1793845130

On the other hand try to google "emoji sexual"…


No. Humans never express negative emotions, and also, never 
communicate a desire to have sex. That explains a lot about the 
unicode consortium. 's', 'e', 'x' is ok, just not together.


Q. What's the difference between a politician and an emoji?

A. Nothing. You cannot take either at face value.

...oops, politics again. I should know better.

but my wider point is, unicode emojis are useless if they only 
contain those that 'some' consider to be politically correct, or 
socially acceptable.


The Unicode consortium is a bunch of ...   (I don't have the 
unicode emoji representation yet to complete that sentence).




Re: First Impressions!

2017-12-02 Thread Walter Bright via Digitalmars-d

On 11/30/2017 10:07 PM, Patrick Schluter wrote:

endianness


Yeah, I forgot to mention that one. As if anyone remembers to put in the Byte 
Order Mark :-(
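
For reference, a small sketch of what BOM sniffing looks like in 
D (the constants are the standard ones: the mark U+FEFF is 
serialized as FF FE in little-endian UTF-16 and FE FF in 
big-endian):

    enum Bom { none, utf16le, utf16be }

    // Detect a UTF-16 byte order mark at the start of a buffer.
    Bom detectBom(const(ubyte)[] data)
    {
        if (data.length >= 2 && data[0] == 0xFF && data[1] == 0xFE)
            return Bom.utf16le;
        if (data.length >= 2 && data[0] == 0xFE && data[1] == 0xFF)
            return Bom.utf16be;
        return Bom.none;
    }

    unittest
    {
        assert(detectBom([0xFF, 0xFE, 0x41, 0x00]) == Bom.utf16le);
        assert(detectBom([0x41, 0x42]) == Bom.none); // no BOM: guess or assume
    }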


Re: First Impressions!

2017-12-02 Thread Joakim via Digitalmars-d

On Friday, 1 December 2017 at 23:16:45 UTC, H. S. Teoh wrote:
On Fri, Dec 01, 2017 at 03:04:44PM -0800, Walter Bright via 
Digitalmars-d wrote:

On 11/30/2017 9:23 AM, Kagamin wrote:
> On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki 
> cattermole wrote:
> > Be aware Microsoft is alone in thinking that UTF-16 was 
> > awesome. Everybody else standardized on UTF-8 for Unicode.
> 
> UCS2 was awesome. UTF-16 is used by Java, JavaScript, 
> Objective-C, Swift, Dart and ms tech, which is 28% of tiobe 
> index.


"was" :-) Those are pretty much pre-surrogate pair designs, or 
based

on them (Dart compiles to JavaScript, for example).

UCS2 has serious problems:

1. Most strings are in ascii, meaning UCS2 doubles memory 
consumption. Strings in the executable file are twice the size.


This is not true in Asia, esp. where the CJK block is 
extensively used. A CJK block character is 3 bytes in UTF-8, 
meaning that string sizes are 150% of the UCS2 encoding.  If 
your code contains a lot of CJK text, that's a lot of bloat.


Yep, that's why five years back many of the major Chinese sites 
were still not using UTF-8:


http://xahlee.info/w/what_encoding_do_chinese_websites_use.html

That led that Chinese guy to also rant against UTF-8 a couple 
years ago:


http://xahlee.info/comp/unicode_utf8_encoding_propaganda.html

Considering China buys more smartphones than the US and Europe 
combined, it's time people started recognizing their importance 
when it comes to issues like this:


https://www.statista.com/statistics/412108/global-smartphone-shipments-global-region/

Regarding the unique representation issue Jonathan brings up, 
I've heard people say that was to provide an easier path for 
legacy encodings, i.e. some used combining characters and others 
didn't, so Unicode chose to accommodate both so both groups would 
move to Unicode.  It would be nice if the Unicode people spent 
their time pruning and regularizing what they have, rather than 
adding more useless stuff.


Speaking of which, I completely agree with Walter and Jonathan that 
there's no need to add emoji and other such symbols to Unicode; they 
should have never been added.  Unicode is supposed to standardize 
long-existing characters, not promote marginal new symbols to 
characters.  If there's a real need for it, chat software will 
figure out a way to do it, no need to add such symbols to the 
Unicode character set.


Re: First Impressions!

2017-12-02 Thread Ola Fosheim Grøstad via Digitalmars-d

On Saturday, 2 December 2017 at 12:25:22 UTC, codephantom wrote:
Do the people on the unicode consortium consider such 
communication to be invalid?


https://splinternews.com/violent-emoji-are-starting-to-get-people-in-trouble-wit-1793845130

On the other hand try to google "emoji sexual"…



Re: First Impressions!

2017-12-02 Thread Patrick Schluter via Digitalmars-d

On Saturday, 2 December 2017 at 10:20:10 UTC, Walter Bright wrote:

On 12/1/2017 8:08 PM, Jonathan M Davis wrote:

[...]


Yup. I've presented that point of view a couple times on 
HackerNews, and some Unicode people took umbrage at that. The 
case they presented fell a little flat.


[...]


Where it gets really fun is when there is color composition 
for emoticons:

U+1F466 = 👦
U+1F466 U+1F3FF = 👦🏿
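
A quick D sketch of what that composition looks like at the code 
unit and code point level (walkLength over a string counts 
auto-decoded code points; both code points are in the SMP, so 
each costs a UTF-16 surrogate pair):

    import std.range : walkLength;
    import std.utf : codeLength;

    void main()
    {
        string boy   = "\U0001F466";           // U+1F466
        string toned = "\U0001F466\U0001F3FF"; // U+1F466 U+1F3FF

        assert(boy.walkLength == 1);    // one code point
        assert(toned.walkLength == 2);  // two code points, one rendered glyph
        assert(boy.length == 4);        // 4 UTF-8 code units
        assert(toned.codeLength!wchar == 4); // 4 UTF-16 units: 2 surrogate pairs
    }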


Re: First Impressions!

2017-12-02 Thread Ola Fosheim Grøstad via Digitalmars-d
On Saturday, 2 December 2017 at 04:08:54 UTC, Jonathan M Davis 
wrote:
code points. Emojis are specifically representable by a 
sequence of existing characters (usually ASCII), because they 
came from folks trying to represent pictures with text.


They are used as symbols culturally, which is how written 
languages happen, so I think the real question is whether they 
have just implemented the ones that have become widespread over a 
long period of time or deliberately created completely new 
ones... It makes sense for the most used ones.


E.g. I don't want "8-(3+4)" to render as "😳3+4" ;-)

There is also a difference between Ø and ∅, because the meaning 
is different. Too bad the same does not apply to arrows (math vs 
non math usage).


So yeah, they could do better, but it's not too bad. If something 
is widely used in a way that gives signs a different meaning, 
then it makes sense to introduce a new symbol for it, so that one 
can both render them slightly differently and so that programs 
can interpret them correctly.






Re: First Impressions!

2017-12-02 Thread codephantom via Digitalmars-d
On Saturday, 2 December 2017 at 04:08:54 UTC, Jonathan M Davis 
wrote:


The fact that they're then trying to put those pictures into 
the Unicode standard just blatantly shows that the Unicode 
folks have lost sight of what they're up to. It's like if they 
started trying to add Unicode characters for words. It makes no 
sense. But unfortunately, we just have to live with it... :(


- Jonathan M Davis


The real problem is that sometimes people don't feel like a 
little cat with a smiling face. Sometimes, people actually get 
pissed off at something, and would like to express it.


Do the people on the unicode consortium consider such 
communication to be invalid?


Where are the emojis for saying... I'm pissed off at this... or 
that...


(unicode consortium == emoji censorship)

https://www.google.com.au/search?q=fuck+you+emoticon&source=lnms&tbm=isch&sa=X&ved=0ahUKEwiWkMzMpOvXAhWIj5QKHVnGC5YQ_AUICigB&biw=1536&bih=736



Re: First Impressions!

2017-12-02 Thread Patrick Schluter via Digitalmars-d
On Saturday, 2 December 2017 at 10:35:50 UTC, Patrick Schluter 
wrote:

On Friday, 1 December 2017 at 23:16:45 UTC, H. S. Teoh wrote:

[...]


That's true in theory; in practice it's not that severe, as the 
CJK languages are never isolated and appear embedded in a lot 
of ASCII. You can read a case study here [1] which shows 106% 
for Simplified Chinese, 76% for Traditional Chinese, 129% for 
Japanese and 94% for Korean. These numbers are for pure text.


106% for Korean; I copied the wrong column. Traditional Chinese 
was smaller, probably because of whitespace.


Publish it on the web embedded in bloated HTML and there goes 
the size advantage of UTF-16.


[...]




Re: First Impressions!

2017-12-02 Thread Patrick Schluter via Digitalmars-d

On Friday, 1 December 2017 at 23:16:45 UTC, H. S. Teoh wrote:
On Fri, Dec 01, 2017 at 03:04:44PM -0800, Walter Bright via 
Digitalmars-d wrote:

On 11/30/2017 9:23 AM, Kagamin wrote:
> On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki 
> cattermole wrote:
> > Be aware Microsoft is alone in thinking that UTF-16 was 
> > awesome. Everybody else standardized on UTF-8 for Unicode.
> 
> UCS2 was awesome. UTF-16 is used by Java, JavaScript, 
> Objective-C, Swift, Dart and ms tech, which is 28% of tiobe 
> index.


"was" :-) Those are pretty much pre-surrogate pair designs, or 
based

on them (Dart compiles to JavaScript, for example).

UCS2 has serious problems:

1. Most strings are in ascii, meaning UCS2 doubles memory 
consumption. Strings in the executable file are twice the size.


This is not true in Asia, esp. where the CJK block is 
extensively used. A CJK block character is 3 bytes in UTF-8, 
meaning that string sizes are 150% of the UCS2 encoding.  If 
your code contains a lot of CJK text, that's a lot of bloat.


That's true in theory; in practice it's not that severe, as the 
CJK languages are never isolated and appear embedded in a lot of 
ASCII. You can read a case study here [1] which shows 106% for 
Simplified Chinese, 76% for Traditional Chinese, 129% for 
Japanese and 94% for Korean. These numbers are for pure text. 
Publish it on the web embedded in bloated HTML and there goes the 
size advantage of UTF-16.






But then again, in non-Latin locales you'd generally store your 
strings separately from the executable (usually in l10n files), 
so this may not be that big an issue. But the blanket statement 
"Most strings are in ASCII" is not correct.


False, in the sense that isolated pure text is rare and is 
generally delivered inside some file format, most times 
ASCII-based, like docx, odf, tmx, xliff, akoma ntoso, etc.


[1]: 
https://stackoverflow.com/questions/6883434/at-all-times-text-encoded-in-utf-8-will-never-give-us-more-than-a-50-file-size




Re: First Impressions!

2017-12-02 Thread Jacob Carlborg via Digitalmars-d

On 2017-12-02 11:02, Walter Bright wrote:

Are you sure about that? I know that Asian languages will be longer in 
UTF-8. But how much data that programs handle is in those languages? The 
language of business, science, programming, aviation, and engineering is 
English.


Not necessarily. I've seen code in non-English languages, i.e. when the 
identifiers are non-English. But of course, most programming languages 
will use English for keywords and built-in functions.


--
/Jacob Carlborg


Re: First Impressions!

2017-12-02 Thread Walter Bright via Digitalmars-d

On 12/1/2017 8:08 PM, Jonathan M Davis wrote:

And personally, I think that their worst decisions tend to be at the code
point level (e.g. having the same character being representable by different
combinations of code points).


Yup. I've presented that point of view a couple times on HackerNews, and some 
Unicode people took umbrage at that. The case they presented fell a little flat.




Quite possibly the most depressing thing that I've run into with Unicode
though was finding out that emojis had their own code points. Emojis are
specifically representable by a sequence of existing characters (usually
ASCII), because they came from folks trying to represent pictures with text.
The fact that they're then trying to put those pictures into the Unicode
standard just blatantly shows that the Unicode folks have lost sight of what
they're up to. It's like if they started trying to add Unicode characters
for words. It makes no sense. But unfortunately, we just have to live with
it... :(


Yah, I've argued against that, too. And those "international" icons are arguably 
one of the dumber ideas to ever sweep the world, yet they seem to be celebrated 
without question.


Have you ever tried to look up an icon in a dictionary? It doesn't work. So if 
you don't know what an icon means, you're hosed. If it is a word you don't 
understand, you can look it up in a dictionary.


Furthermore, you don't need to know English to know what "ON" means. There is no 
more cognitive difficulty asking someone what "ON" means than there is asking 
what "|" means. Is an illiterate person from XxLand really going to understand 
that "|" means "ON" without help?


My car has a bunch of emoticons labeling the controls. I can't figure out what any 
of them do without reading the manual, or just pushing random buttons until what 
I want happens. One button has an icon on it that looks like a snowflake. What 
does that do? Turn on the A/C? Defrost the frosty windows? Set the AWD in 
slippery mode? Turn on the Christmas lights?


On my pre-madness truck, they're labeled in English. Never had any trouble with 
that.


Part of the problem I've seen is that people do things like "vote for my 
emoji/icon and I'll vote for yours!" And then when they get something accepted, 
they wear it as a badge of status and write articles saying how you, too, can 
get your whatever accepted as an icon. It's madness, madness I say!


Re: First Impressions!

2017-12-02 Thread Walter Bright via Digitalmars-d

On 12/1/2017 3:16 PM, H. S. Teoh wrote:

This is not true in Asia, esp. where the CJK block is extensively used.
A CJK block character is 3 bytes in UTF-8, meaning that string sizes are
150% of the UCS2 encoding.  If your code contains a lot of CJK text,
that's a lot of bloat.

But then again, in non-Latin locales you'd generally store your strings
separately from the executable (usually in l10n files), so this may not be
that big an issue. But the blanket statement "Most strings are in ASCII"
is not correct.


Are you sure about that? I know that Asian languages will be longer in UTF-8. 
But how much data that programs handle is in those languages? The language of 
business, science, programming, aviation, and engineering is English.


Of course, D itself is agnostic about that. The compiler, for example, accepts 
strings, identifiers, and comments in Chinese in UTF-16 format.


Re: First Impressions!

2017-12-01 Thread Jonathan M Davis via Digitalmars-d
On Friday, December 01, 2017 15:54:31 Walter Bright via Digitalmars-d wrote:
> On 11/30/2017 9:56 AM, Jonathan M Davis wrote:
> > I'm sure that we could come up with a better encoding than UTF-8 (e.g.
> > getting rid of Unicode normalization as being a thing and never having
> > multiple encodings for the same character), but _that_'s never going to
> > happen.
>
> UTF-8 is not the cause of that particular problem, it's caused by the
> Unicode committee being a committee. Other Unicode problems are caused by
> the committee trying to add semantic information to code points, which
> causes nothing but problems. I.e. the committee forgot that Unicode is a
> character set, and nothing more.

Oh, definitely. UTF-8 is arguably the best that Unicode has, but Unicode in
general is what's broken, because the folks designing it made poor choices.
And personally, I think that their worst decisions tend to be at the code
point level (e.g. having the same character being representable by different
combinations of code points).
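
As a concrete illustration of that in D (a minimal sketch using 
std.uni; 'é' is representable either as the single code point 
U+00E9 or as 'e' plus the combining acute U+0301, and the two 
only compare equal after normalization):

    import std.uni : normalize, NFC;

    void main()
    {
        string precomposed = "\u00E9";  // é as one code point
        string combining   = "e\u0301"; // 'e' + combining acute accent

        assert(precomposed != combining);                // different code points
        assert(normalize!NFC(combining) == precomposed); // same character in NFC
    }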

Quite possibly the most depressing thing that I've run into with Unicode
though was finding out that emojis had their own code points. Emojis are
specifically representable by a sequence of existing characters (usually
ASCII), because they came from folks trying to represent pictures with text.
The fact that they're then trying to put those pictures into the Unicode
standard just blatantly shows that the Unicode folks have lost sight of what
they're up to. It's like if they started trying to add Unicode characters
for words. It makes no sense. But unfortunately, we just have to live with
it... :(

- Jonathan M Davis



Re: First Impressions!

2017-12-01 Thread Walter Bright via Digitalmars-d

On 11/30/2017 9:56 AM, Jonathan M Davis wrote:

I'm sure that we could come up with a better encoding than UTF-8 (e.g.
getting rid of Unicode normalization as being a thing and never having
multiple encodings for the same character), but _that_'s never going to
happen.


UTF-8 is not the cause of that particular problem, it's caused by the Unicode 
committee being a committee. Other Unicode problems are caused by the committee 
trying to add semantic information to code points, which causes nothing but 
problems. I.e. the committee forgot that Unicode is a character set, and nothing 
more.




Re: First Impressions!

2017-12-01 Thread H. S. Teoh via Digitalmars-d
On Fri, Dec 01, 2017 at 03:04:44PM -0800, Walter Bright via Digitalmars-d wrote:
> On 11/30/2017 9:23 AM, Kagamin wrote:
> > On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole wrote:
> > > Be aware Microsoft is alone in thinking that UTF-16 was awesome.
> > > Everybody else standardized on UTF-8 for Unicode.
> > 
> > UCS2 was awesome. UTF-16 is used by Java, JavaScript, Objective-C,
> > Swift, Dart and ms tech, which is 28% of tiobe index.
> 
> "was" :-) Those are pretty much pre-surrogate pair designs, or based
> on them (Dart compiles to JavaScript, for example).
> 
> UCS2 has serious problems:
> 
> 1. Most strings are in ascii, meaning UCS2 doubles memory consumption.
> Strings in the executable file are twice the size.

This is not true in Asia, esp. where the CJK block is extensively used.
A CJK block character is 3 bytes in UTF-8, meaning that string sizes are
150% of the UCS2 encoding.  If your code contains a lot of CJK text,
that's a lot of bloat.
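
(A quick check of the arithmetic in D, assuming the four 
characters below are typical BMP CJK:)

    import std.utf : codeLength;

    void main()
    {
        string cjk = "中文字符"; // four CJK characters

        assert(cjk.length == 12);          // UTF-8: 3 bytes each
        assert(cjk.codeLength!wchar == 4); // UCS2/UTF-16: 2 bytes each, 8 total
        // 12 / 8 = 150%, the bloat described above
    }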

But then again, in non-Latin locales you'd generally store your strings
separately from the executable (usually in l10n files), so this may not be
that big an issue. But the blanket statement "Most strings are in ASCII"
is not correct.


T

-- 
Bare foot: (n.) A device for locating thumb tacks on the floor.


Re: First Impressions!

2017-12-01 Thread Walter Bright via Digitalmars-d

On 11/30/2017 9:23 AM, Kagamin wrote:

On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole wrote:
Be aware Microsoft is alone in thinking that UTF-16 was awesome. Everybody 
else standardized on UTF-8 for Unicode.


UCS2 was awesome. UTF-16 is used by Java, JavaScript, Objective-C, Swift, Dart 
and ms tech, which is 28% of tiobe index.


"was" :-) Those are pretty much pre-surrogate pair designs, or based on them 
(Dart compiles to JavaScript, for example).


UCS2 has serious problems:

1. Most strings are in ascii, meaning UCS2 doubles memory consumption. Strings 
in the executable file are twice the size.


2. The code doesn't work well with C. C doesn't even have a UCS2 type.

3. There's no reasonable way to audit the code to see if it handles surrogate 
pairs correctly. Surrogate pairs occur only rarely, so the code is never tested 
for it, and the bugs may remain latent for many, many years.


With UTF8, multibyte code points are much more common, so bugs are detected much 
earlier.
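
A small sketch of that in D, using std.utf.stride, which returns 
the number of code units in the sequence starting at a given 
index; in UTF-8, anything outside ASCII exercises the multi-unit 
code path:

    import std.utf : stride;

    void main()
    {
        assert("a".stride(0) == 1);  // ASCII
        assert("ä".stride(0) == 2);  // Latin-1 supplement
        assert("中".stride(0) == 3); // CJK
        assert("💩".stride(0) == 4); // SMP
    }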


Re: First Impressions!

2017-12-01 Thread A Guy With a Question via Digitalmars-d
On Friday, 1 December 2017 at 18:31:46 UTC, Jonathan M Davis 
wrote:
On Friday, December 01, 2017 09:49:08 Steven Schveighoffer via 
Digitalmars-d wrote:

On 12/1/17 7:26 AM, Patrick Schluter wrote:
> On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter 
> wrote:

>>  isolated codepoints.
>
> I meant isolated code-units, of course.

Hehe, it's impossible for me to talk about code points and 
code units without having to pause and consider which one I 
mean :)


What, you mean that Unicode can be confusing? No way! ;)

LOL. I have to be careful with that too. What bugs me even more 
though is that the Unicode spec talks about code points being 
characters, and then talks about combining characters for 
grapheme clusters - and this in spite of the fact that what 
most people would consider a character is a grapheme cluster 
and _not_ a code point. But they presumably had to come up with 
new terms for a lot of this nonsense, and that's not always 
easy.


Regardless, what they came up with is complicated enough that 
it's arguably a miracle whenever a program actually handles 
Unicode text 100% correctly. :|


- Jonathan M Davis


And dealing with that complexity can often introduce bugs of 
its own, because it's hard to get right. That's why sometimes 
it's easier just to simplify things and to exclude certain ways 
of looking at the string.


Re: First Impressions!

2017-12-01 Thread Jonathan M Davis via Digitalmars-d
On Friday, December 01, 2017 09:49:08 Steven Schveighoffer via Digitalmars-d 
wrote:
> On 12/1/17 7:26 AM, Patrick Schluter wrote:
> > On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter wrote:
> >>  isolated codepoints.
> >
> > I meant isolated code-units, of course.
>
> Hehe, it's impossible for me to talk about code points and code units
> without having to pause and consider which one I mean :)

What, you mean that Unicode can be confusing? No way! ;)

LOL. I have to be careful with that too. What bugs me even more though is
that the Unicode spec talks about code points being characters, and then
talks about combining characters for grapheme clusters - and this in spite
of the fact that what most people would consider a character is a grapheme
cluster and _not_ a code point. But they presumably had to come up with new
terms for a lot of this nonsense, and that's not always easy.

Regardless, what they came up with is complicated enough that it's arguably
a miracle whenever a program actually handles Unicode text 100% correctly.
:|

- Jonathan M Davis



Re: First Impressions!

2017-12-01 Thread Steven Schveighoffer via Digitalmars-d

On 12/1/17 7:26 AM, Patrick Schluter wrote:

On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter wrote:


 isolated codepoints. 


I meant isolated code-units, of course.


Hehe, it's impossible for me to talk about code points and code units 
without having to pause and consider which one I mean :)


-Steve


Re: First Impressions!

2017-12-01 Thread Patrick Schluter via Digitalmars-d
On Friday, 1 December 2017 at 12:21:22 UTC, A Guy With a Question 
wrote:
On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter 
wrote:
On Thursday, 30 November 2017 at 19:37:47 UTC, Steven 
Schveighoffer wrote:

On 11/30/17 1:20 PM, Patrick Schluter wrote:

[...]


iopipe handles this: 
http://schveiguy.github.io/iopipe/iopipe/textpipe/ensureDecodeable.html




It was only to give an example. With UTF-8 people who 
implement the low level code in general think about the 
multiple codeunits at the buffer boundary. With UTF-16 it's 
often forgotten. In UTF-16 there are also 2 other common 
pitfalls that exist also in UTF-8 but are less consciously 
acknowledged: overlong encoding and isolated codepoints. So 
UTF-16 has the same issues as UTF-8, plus some more: endianness 
and size.


Most problems with UTF16 are applicable to UTF8. The only issue 
that isn't: if you are just dealing with ASCII, it's a bit of 
a waste of space.


That's what I said. UTF-16 and UTF-8 have the same issues, but 
UTF-16 has 2 more: endianness and bloat for ASCII. All 3 
encodings have their pluses and minuses; that's why D supports 
all 3, but with a preference for utf-8.


Re: First Impressions!

2017-12-01 Thread Patrick Schluter via Digitalmars-d
On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter 
wrote:
On Thursday, 30 November 2017 at 19:37:47 UTC, Steven 
Schveighoffer wrote:

On 11/30/17 1:20 PM, Patrick Schluter wrote:

[...]


iopipe handles this: 
http://schveiguy.github.io/iopipe/iopipe/textpipe/ensureDecodeable.html




It was only to give an example. With UTF-8 people who implement 
the low level code in general think about the multiple 
codeunits at the buffer boundary. With UTF-16 it's often 
forgotten. In UTF-16 there are also 2 other common pitfalls, 
that exist also in UTF-8 but are less consciously acknowledged, 
overlong encoding and isolated codepoints. So UTF-16 has the


I meant isolated code-units, of course.


same issues as UTF-8, plus some more, endianness and size.




Re: First Impressions!

2017-12-01 Thread A Guy With a Question via Digitalmars-d
On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter 
wrote:
On Thursday, 30 November 2017 at 19:37:47 UTC, Steven 
Schveighoffer wrote:

On 11/30/17 1:20 PM, Patrick Schluter wrote:
On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M 
Davis wrote:
English and thus don't as easily hit the cases where their 
code is wrong. For better or worse, UTF-16 hides it better 
than UTF-8, but the problem exists in both.




To give just an example of what can go wrong with UTF-16. 
Reading a file in UTF-16 and converting it to something else 
like UTF-8 or UTF-32. Reading block by block and hitting 
exactly a SMP codepoint at the buffer limit, high surrogate 
at the end of the first buffer, low surrogate at the start of 
the next. If you don't think about it => 2 invalid characters 
instead of your nice poop 💩 emoji character (emojis are in 
the SMP and they are more and more frequent).


iopipe handles this: 
http://schveiguy.github.io/iopipe/iopipe/textpipe/ensureDecodeable.html




It was only to give an example. With UTF-8 people who implement 
the low level code in general think about the multiple 
codeunits at the buffer boundary. With UTF-16 it's often 
forgotten. In UTF-16 there are also 2 other common pitfalls 
that exist also in UTF-8 but are less consciously acknowledged: 
overlong encoding and isolated codepoints. So UTF-16 has the 
same issues as UTF-8, plus some more: endianness and size.


Most problems with UTF16 are applicable to UTF8. The only issue 
that isn't: if you are just dealing with ASCII, it's a bit of a 
waste of space.


Re: First Impressions!

2017-11-30 Thread Patrick Schluter via Digitalmars-d
On Thursday, 30 November 2017 at 19:37:47 UTC, Steven 
Schveighoffer wrote:

On 11/30/17 1:20 PM, Patrick Schluter wrote:
On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M 
Davis wrote:
English and thus don't as easily hit the cases where their 
code is wrong. For better or worse, UTF-16 hides it better 
than UTF-8, but the problem exists in both.




To give just an example of what can go wrong with UTF-16. 
Reading a file in UTF-16 and converting it to something else 
like UTF-8 or UTF-32. Reading block by block and hitting 
exactly a SMP codepoint at the buffer limit, high surrogate at 
the end of the first buffer, low surrogate at the start of the 
next. If you don't think about it => 2 invalid characters 
instead of your nice poop 💩 emoji character (emojis are in the 
SMP and they are more and more frequent).


iopipe handles this: 
http://schveiguy.github.io/iopipe/iopipe/textpipe/ensureDecodeable.html




It was only to give an example. With UTF-8 people who implement 
the low level code in general think about the multiple codeunits 
at the buffer boundary. With UTF-16 it's often forgotten. In 
UTF-16 there are also 2 other common pitfalls that exist also in 
UTF-8 but are less consciously acknowledged: overlong encoding 
and isolated codepoints. So UTF-16 has the same issues as UTF-8, 
plus some more: endianness and size.




Re: First Impressions!

2017-11-30 Thread Walter Bright via Digitalmars-d

On 11/30/2017 5:22 AM, A Guy With a Question wrote:
It's also worth mentioning that the more I think about it, the UTF8 vs. UTF16 
thing was probably not worth mentioning with the rest of the things I listed 
out. It's pretty minor and more of a preference.


Both Windows and Java selected UTF16 before surrogates were added, so it was a 
reasonable decision made in good faith. But an awful lot of Windows/Java code 
has latent bugs in it because of not dealing with surrogates.


D is designed from the ground up to work smoothly with UTF8/UTF16 multi-codeunit 
encodings. If you do decide to use UTF16, please take advantage of this and deal 
with surrogates correctly. When you do decide to give up on UTF16 (!), your code 
will be easy to convert to UTF8.
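
A minimal illustration of that support, using nothing beyond the 
language itself: foreach with a dchar loop variable decodes code 
units on the fly, for UTF-8 and UTF-16 alike, so surrogate pairs 
are handled for you.

    void main()
    {
        string  s8  = "poop 💩";
        wstring s16 = "poop 💩"w;

        assert(s8.length == 9);  // UTF-8 code units
        assert(s16.length == 7); // UTF-16 code units (one surrogate pair)

        size_t n8, n16;
        foreach (dchar c; s8)  ++n8;  // decodes UTF-8 sequences
        foreach (dchar c; s16) ++n16; // decodes surrogate pairs
        assert(n8 == 6 && n16 == 6);  // same six code points either way
    }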


Re: First Impressions!

2017-11-30 Thread Steven Schveighoffer via Digitalmars-d

On 11/30/17 1:20 PM, Patrick Schluter wrote:

On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M Davis wrote:
English and thus don't as easily hit the cases where their code is 
wrong. For better or worse, UTF-16 hides it better than UTF-8, but the 
problem exists in both.




To give just an example of what can go wrong with UTF-16. Reading a file 
in UTF-16 and converting it to something else like UTF-8 or UTF-32. 
Reading block by block and hitting exactly a SMP codepoint at the buffer 
limit, high surrogate at the end of the first buffer, low surrogate at 
the start of the next. If you don't think about it => 2 invalid 
characters instead of your nice poop 💩 emoji character (emojis are in 
the SMP and they are more and more frequent).


iopipe handles this: 
http://schveiguy.github.io/iopipe/iopipe/textpipe/ensureDecodeable.html


-Steve


Re: First Impressions!

2017-11-30 Thread Jonathan M Davis via Digitalmars-d
On Thursday, November 30, 2017 18:32:46 A Guy With a Question via 
Digitalmars-d wrote:
> On Thursday, 30 November 2017 at 17:56:58 UTC, Jonathan M Davis
>
> wrote:
> > On Thursday, November 30, 2017 03:37:37 Walter Bright via
> > Digitalmars-d wrote:
> > Language-wise, I think that most of the UTF-16 is driven by the
> > fact that Java went with UCS-2 / UTF-16, and C# followed them
> > (both because they were copying Java and because the Win32 API
> > had gone with UCS-2 / UTF-16). So, that's had a lot of
> > influence on folks, though most others have gone with UTF-8 for
> > backwards compatibility and because it typically takes up less
> > space for non-Asian text. But the use of UTF-16 in Windows,
> > Java, and C# does seem to have resulted in some folks thinking
> > that wide characters means Unicode, and narrow characters
> > meaning ASCII.
> >
> > - Jonathan M Davis
>
> I think it also simplifies the logic. You are not always looking
> to represent the codepoints symbolically. You are just trying to
> see what information is in it. Therefore, if you can practically
> treat a codepoint as the unit of data behind the scenes, it
> simplifies the logic.

Even if that were true, UTF-16 code units are not code points. If you want
to operate on code points, you have to go to UTF-32. And even if you're at
UTF-32, you have to worry about Unicode normalization, otherwise the same
information can be represented differently even if all you care about is
code points and not graphemes. And of course, some stuff really does care
about graphemes, since those are the actual characters.

Ultimately, you have to understand how code units, code points, and
graphemes work and what you're doing with a particular algorithm so that you
know at which level you should operate and where the pitfalls are. Some
code can operate on code units and be fine; some can operate on code points;
and some can operate on graphemes. But there is no one-size-fits-all
solution that makes it all magically easy and efficient to use.
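
D exposes all three levels explicitly; a short sketch (byGrapheme 
lives in std.uni, byCodeUnit in std.utf, and plain auto-decoding 
iterates code points):

    import std.range : walkLength;
    import std.uni : byGrapheme;
    import std.utf : byCodeUnit;

    void main()
    {
        auto s = "e\u0301"; // 'e' + combining acute: one visible character

        assert(s.byCodeUnit.walkLength == 3); // UTF-8 code units
        assert(s.walkLength == 2);            // code points (auto-decoding)
        assert(s.byGrapheme.walkLength == 1); // graphemes
    }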

And UTF-16 does _nothing_ to improve any of this over UTF-8. It's just a
different way to encode code points. And really, it makes things worse,
because it usually takes up more space than UTF-8, and it makes it easier to
miss when you screw up your Unicode handling, because more UTF-16 code units
are valid code points than UTF-8 code units are, but they still aren't all
valid code points. So, if you use UTF-8, you're more likely to catch your
mistakes.

Honestly, I think that the only good reason to use UTF-16 is if you're
interacting with existing APIs that use UTF-16, and even then, I think that
in most cases, you're better off using UTF-8 and converting to UTF-16 only
when you have to. Strings eat less memory that way, and mistakes are more
easily caught. And if you're writing cross-platform code in D, then Windows
is really the only place that you're typically going to have to deal with
UTF-16, so it definitely works better in general to favor UTF-8 in D
programs. But regardless, at least D gives you the tools to deal with the
different Unicode encodings relatively cleanly and easily, so you can use
whichever Unicode encoding you need to. Most D code is going to use UTF-8
though.

- Jonathan M Davis



Re: First Impressions!

2017-11-30 Thread A Guy With a Question via Digitalmars-d
On Thursday, 30 November 2017 at 17:56:58 UTC, Jonathan M Davis 
wrote:
On Thursday, November 30, 2017 03:37:37 Walter Bright via 
Digitalmars-d wrote:
Language-wise, I think that most of the UTF-16 is driven by the 
fact that Java went with UCS-2 / UTF-16, and C# followed them 
(both because they were copying Java and because the Win32 API 
had gone with UCS-2 / UTF-16). So, that's had a lot of 
influence on folks, though most others have gone with UTF-8 for 
backwards compatibility and because it typically takes up less 
space for non-Asian text. But the use of UTF-16 in Windows, 
Java, and C# does seem to have resulted in some folks thinking 
that wide characters means Unicode, and narrow characters 
meaning ASCII.



- Jonathan M Davis


I think it also simplifies the logic. You are not always looking 
to represent the codepoints symbolically. You are just trying to 
see what information is in it. Therefore, if you can practically 
treat a codepoint as the unit of data behind the scenes, it 
simplifies the logic.


Re: First Impressions!

2017-11-30 Thread A Guy With a Question via Digitalmars-d
On Thursday, 30 November 2017 at 17:56:58 UTC, Jonathan M Davis 
wrote:
On Thursday, November 30, 2017 03:37:37 Walter Bright via 
Digitalmars-d wrote:

On 11/30/2017 2:39 AM, Joakim wrote:
> Java, .NET, Qt, Javascript, and a handful of others use 
> UTF-16 too, some starting off with the earlier UCS-2:

>
> https://en.m.wikipedia.org/wiki/UTF-16#Usage
>
> Not saying either is better, each has their flaws, just 
> pointing out it's more than just Windows.


I stand corrected.


I get the impression that the stuff that uses UTF-16 is mostly 
stuff that picked an encoding early on in the Unicode game and 
thought that they picked one that guaranteed that a code unit 
would be an entire character.


I don't think that's true though. Haven't you always been able to 
combine two codepoints into one visual representation (Ä for 
example)? To me it's still two characters to look for when going 
through the string, but the UI or text interpreter might choose 
to combine them. So in certain domains, such as trying to 
visually represent the character, yes, a codepoint is not a 
character, if what you mean by character is the visual 
representation. But what we are referring to as a character can 
kind of morph depending on context. When you are running through 
the data in the algorithm behind the scenes, you care about the 
*information*, therefore the codepoint. And we really just have 
a semantics battle if someone calls that a character.


Many of them picked UCS-2 and then switched later to UTF-16, 
but once they picked a 16-bit encoding, they were kind of stuck.


Others - most notably C/C++ and the *nix world - picked UTF-8 
for backwards compatibility, and once it became clear that 
UCS-2 / UTF-16 wasn't going to cut it for a code unit 
representing a character, most stuff that went Unicode went 
UTF-8.


That's only because C used ASCII and thus a char was a byte. 
UTF-8 is in line with this, so literally nothing needs to change 
to get pretty much the same behavior. It makes sense. With this 
in mind, it actually might make sense for D to use it.







Re: First Impressions!

2017-11-30 Thread Patrick Schluter via Digitalmars-d
On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M Davis 
wrote:
English and thus don't as easily hit the cases where their code 
is wrong. For better or worse, UTF-16 hides it better than 
UTF-8, but the problem exists in both.




To give just an example of what can go wrong with UTF-16. Reading 
a file in UTF-16 and converting it to something else like UTF-8 or 
UTF-32. Reading block by block and hitting exactly a SMP 
codepoint at the buffer limit, high surrogate at the end of the 
first buffer, low surrogate at the start of the next. If you 
don't think about it => 2 invalid characters instead of your nice 
poop 💩 emoji character (emojis are in the SMP and they are more 
and more frequent).
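
A minimal sketch of the usual fix, assuming you control the read 
loop: check whether a block ends in a high surrogate 
(0xD800-0xDBFF) and hold that code unit back so it can be paired 
with the low surrogate at the start of the next block.

    // Returns how many code units of the block are safe to convert now.
    size_t completeLength(const(wchar)[] block)
    {
        if (block.length > 0
            && block[$ - 1] >= 0xD800 && block[$ - 1] <= 0xDBFF)
            return block.length - 1; // dangling high surrogate: defer it
        return block.length;
    }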


Re: First Impressions!

2017-11-30 Thread Patrick Schluter via Digitalmars-d
On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M Davis 
wrote:


[...] And if you're not dealing with Asian languages, UTF-16 
uses up more space than UTF-8.


Not even that in most cases. Only if you use unstructured text 
can it happen that UTF-16 needs less space than UTF-8. In most 
cases, the text is embedded in some sort of ML (html, odf, docx, 
tmx, xliff, akoma ntoso, etc...) which puts the balance again on 
the side of UTF-8.


Re: First Impressions!

2017-11-30 Thread Jonathan M Davis via Digitalmars-d
On Thursday, November 30, 2017 03:37:37 Walter Bright via Digitalmars-d 
wrote:
> On 11/30/2017 2:39 AM, Joakim wrote:
> > Java, .NET, Qt, Javascript, and a handful of others use UTF-16 too, some
> > starting off with the earlier UCS-2:
> >
> > https://en.m.wikipedia.org/wiki/UTF-16#Usage
> >
> > Not saying either is better, each has their flaws, just pointing out
> > it's more than just Windows.
>
> I stand corrected.

I get the impression that the stuff that uses UTF-16 is mostly stuff that
picked an encoding early on in the Unicode game and thought that they picked
one that guaranteed that a code unit would be an entire character. Many of
them picked UCS-2 and then switched later to UTF-16, but once they picked a
16-bit encoding, they were kind of stuck.

Others - most notably C/C++ and the *nix world - picked UTF-8 for backwards
compatibility, and once it became clear that UCS-2 / UTF-16 wasn't going to
cut it for a code unit representing a character, most stuff that went
Unicode went UTF-8.

Language-wise, I think that most of the UTF-16 is driven by the fact that
Java went with UCS-2 / UTF-16, and C# followed them (both because they were
copying Java and because the Win32 API had gone with UCS-2 / UTF-16). So,
that's had a lot of influence on folks, though most others have gone with
UTF-8 for backwards compatibility and because it typically takes up less
space for non-Asian text. But the use of UTF-16 in Windows, Java, and C#
does seem to have resulted in some folks thinking that wide characters means
Unicode, and narrow characters meaning ASCII.

I really wish that everything would just go to UTF-8 and that UTF-16 would
die, but that would just break too much code. And if we were willing to do
that, I'm sure that we could come up with a better encoding than UTF-8 (e.g.
getting rid of Unicode normalization as being a thing and never having
multiple encodings for the same character), but _that_'s never going to
happen.

- Jonathan M Davis



Re: First Impressions!

2017-11-30 Thread Kagamin via Digitalmars-d
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an 
Opinion wrote:
- Attributes. I had another post in the Learn forum about 
attributes which was unfortunate. At first I was excited 
because it seems like on the surface it would help me write 
better code, but it gets a little tedious and tiresome to have 
to remember to decorate code with them.


Then do it the C# way. There's choice.

I think the better decision would be to not have the errors 
occur.


Hehe, I'm not against living in an ideal world either.

- Immutable. I'm not sure I fully understand it. On the surface 
it seemed like const but transitive. I tried having a method 
return an immutable value, but when I used it in my unit test I 
got some weird errors about objects not being able to return 
immutable (I forget the exact error...apologies).


That's the point of static type system: if you make a mistake, 
the code doesn't compile.


+- Unicode support is good. Although I think D's string type 
should have probably been utf16 by default. Especially 
considering the utf module states:


"UTF character support is restricted to '\u' <= character 
<= '\U0010'."


Seems like the natural fit for me.


UTF-16 is inadequate for the range '\u0000' <= character <= 
'\U0010FFFF', though. UCS2 was adequate (for '\u0000' <= 
character <= '\uFFFF'), but lost relevance. UTF-16 is only 
backward compatibility for early adopters of unicode based on 
UCS2.


Plus for the vast majority of use cases I am pretty guaranteed 
a char = codepoint.


That way only end users will be able to catch bugs in production 
systems. It's not the best strategy, is it? Text is often 
persistent data; how do you plan to fix a text handling bug when 
corruption has accumulated for years and spilled all over the 
place?


Re: First Impressions!

2017-11-30 Thread Jonathan M Davis via Digitalmars-d
On Thursday, November 30, 2017 13:18:37 A Guy With a Question via 
Digitalmars-d wrote:
> As long as you understand its limitations I think most bugs can
> be avoided. Where UTF16 breaks down is pretty well defined.
> Also, super rare. I think UTF32 would be great too, but it seems
> like just a waste of space 99% of the time. UTF8 isn't horrible,
> I am not going to never use D because it uses UTF8 (that would be
> silly). Especially when wstring also seems baked into the
> language. However, it can complicate code because you pretty much
> always have to assume character != codepoint outside of ASCII. I
> can see a reasonable person arguing that forcing you to assume
> character != code point is actually a good thing. And that is a
> valid opinion.

The reality of the matter is that if you want to write fully valid Unicode,
then you have to understand the differences between code units, code points,
and graphemes, and since it really doesn't make sense to operate at the
grapheme level for everything (it would be terribly slow and is completely
unnecessary for many algorithms), you pretty much have to come to accept
that in the general case, you can't assume that something like a char
represents an actual character, regardless of its encoding. UTF-8 vs UTF-16
doesn't change anything in that respect except for the fact that there are
more characters which fit fully in a UTF-16 code unit than a UTF-8 code
unit, so it's easier to think that you're correctly handling Unicode when
you actually aren't. And if you're not dealing with Asian languages, UTF-16
uses up more space than UTF-8. But either way, they're both wrong if you're
trying to treat a code unit as a code point, let alone a grapheme. It's just
that we have a lot of programmers who only deal with English and thus don't
as easily hit the cases where their code is wrong. For better or worse,
UTF-16 hides it better than UTF-8, but the problem exists in both.

- Jonathan M Davis



Re: First Impressions!

2017-11-30 Thread Kagamin via Digitalmars-d
On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole 
wrote:
Be aware Microsoft is alone in thinking that UTF-16 was 
awesome. Everybody else standardized on UTF-8 for Unicode.


UCS2 was awesome. UTF-16 is used by Java, JavaScript, 
Objective-C, Swift, Dart and ms tech, which is 28% of tiobe index.


Re: First Impressions!

2017-11-30 Thread Dukc via Digitalmars-d

On Tuesday, 28 November 2017 at 16:14:52 UTC, Jack Stouffer wrote:

you can apply attributes to your whole project by adding them 
to main


void main(string[] args) @safe {}

Although this isn't recommended, as almost no program can be 
completely safe.


In fact I believe it is. When you have something unsafe you can 
manually wrap it with @trusted. Same goes with nothrow, since you 
can catch everything thrown.


But putting @nogc on main is of course not recommended except in 
special cases, and pure is completely out of the question.
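
A minimal sketch of that wrapping pattern (the printf call stands 
in for any un-@safe operation, and the try/catch keeps main 
nothrow):

    void main() @safe nothrow
    {
        // Unsafe code goes behind a manually audited @trusted boundary.
        () @trusted {
            import core.stdc.stdio : printf;
            printf("hello\n");
        }();

        // Anything that throws can be caught to satisfy nothrow.
        try
            throwingOperation();
        catch (Exception e) {}
    }

    void throwingOperation() @safe { throw new Exception("oops"); }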


Re: First Impressions!

2017-11-30 Thread A Guy With a Question via Digitalmars-d
On Thursday, 30 November 2017 at 11:41:09 UTC, Walter Bright 
wrote:

On 11/30/2017 2:47 AM, Nicholas Wilson wrote:
As far as I can tell, pretty much the only users of UTF16 are 
Windows programs. Everyone else uses UTF8 or UCS32.
I assume you meant UTF32 not UCS32, given UCS2 is Microsoft's 
half-assed UTF16.


I meant UCS-4, which is identical to UTF-32. It's hard keeping 
all that stuff straight. Sigh.


https://en.wikipedia.org/wiki/UTF-32


It's also worth mentioning that the more I think about it, the 
UTF8 vs. UTF16 thing was probably not worth mentioning with the 
rest of the things I listed out. It's pretty minor and more of a 
preference.


Re: First Impressions!

2017-11-30 Thread A Guy With a Question via Digitalmars-d
On Thursday, 30 November 2017 at 10:19:18 UTC, Walter Bright 
wrote:

On 11/27/2017 7:01 PM, A Guy With an Opinion wrote:
+- Unicode support is good. Although I think D's string type 
should have probably been utf16 by default. Especially 
considering the utf module states:


"UTF character support is restricted to '\u' <= character 
<= '\U0010'."


Seems like the natural fit for me. Plus for the vast majority 
of use cases I am pretty guaranteed a char = codepoint. Not 
the biggest issue in the world and maybe I'm just being overly 
critical here.


Sooner or later your code will exhibit bugs if it assumes that 
char==codepoint with UTF16, because of surrogate pairs.


https://stackoverflow.com/questions/5903008/what-is-a-surrogate-pair-in-java

As far as I can tell, pretty much the only users of UTF16 are 
Windows programs. Everyone else uses UTF8 or UCS32.


I recommend using UTF8.


As long as you understand its limitations I think most bugs can 
be avoided. Where UTF16 breaks down is pretty well defined. 
Also, super rare. I think UTF32 would be great too, but it seems 
like just a waste of space 99% of the time. UTF8 isn't horrible, 
I am not going to never use D because it uses UTF8 (that would be 
silly). Especially when wstring also seems baked into the 
language. However, it can complicate code because you pretty much 
always have to assume character != codepoint outside of ASCII. I 
can see a reasonable person arguing that forcing you to assume 
character != code point is actually a good thing. And that is a 
valid opinion.


Re: First Impressions!

2017-11-30 Thread Walter Bright via Digitalmars-d

On 11/30/2017 2:47 AM, Nicholas Wilson wrote:
As far as I can tell, pretty much the only users of UTF16 are Windows 
programs. Everyone else uses UTF8 or UCS32.

I assume you meant UTF32 not UCS32, given UCS2 is Microsoft's half-assed UTF16.


I meant UCS-4, which is identical to UTF-32. It's hard keeping all that stuff 
straight. Sigh.


https://en.wikipedia.org/wiki/UTF-32


Re: First Impressions!

2017-11-30 Thread Walter Bright via Digitalmars-d

On 11/30/2017 2:39 AM, Joakim wrote:
Java, .NET, Qt, Javascript, and a handful of others use UTF-16 too, some 
starting off with the earlier UCS-2:


https://en.m.wikipedia.org/wiki/UTF-16#Usage

Not saying either is better, each has their flaws, just pointing out it's more 
than just Windows.


I stand corrected.


Re: First Impressions!

2017-11-30 Thread Nicholas Wilson via Digitalmars-d
On Thursday, 30 November 2017 at 10:19:18 UTC, Walter Bright 
wrote:

On 11/27/2017 7:01 PM, A Guy With an Opinion wrote:

[...]


Sooner or later your code will exhibit bugs if it assumes that 
char==codepoint with UTF16, because of surrogate pairs.


https://stackoverflow.com/questions/5903008/what-is-a-surrogate-pair-in-java

As far as I can tell, pretty much the only users of UTF16 are 
Windows programs. Everyone else uses UTF8 or UCS32.


I recommend using UTF8.


I assume you meant UTF32 not UCS32, given UCS2 is Microsoft's 
half-assed UTF16.


Re: First Impressions!

2017-11-30 Thread Joakim via Digitalmars-d
On Thursday, 30 November 2017 at 10:19:18 UTC, Walter Bright 
wrote:

On 11/27/2017 7:01 PM, A Guy With an Opinion wrote:
+- Unicode support is good. Although I think D's string type 
should have probably been utf16 by default. Especially 
considering the utf module states:


"UTF character support is restricted to '\u' <= character 
<= '\U0010'."


Seems like the natural fit for me. Plus for the vast majority 
of use cases I am pretty guaranteed a char = codepoint. Not 
the biggest issue in the world and maybe I'm just being overly 
critical here.


Sooner or later your code will exhibit bugs if it assumes that 
char==codepoint with UTF16, because of surrogate pairs.


https://stackoverflow.com/questions/5903008/what-is-a-surrogate-pair-in-java

As far as I can tell, pretty much the only users of UTF16 are 
Windows programs. Everyone else uses UTF8 or UCS32.


I recommend using UTF8.


Java, .NET, Qt, Javascript, and a handful of others use UTF-16 
too, some starting off with the earlier UCS-2:


https://en.m.wikipedia.org/wiki/UTF-16#Usage

Not saying either is better, each has their flaws, just pointing 
out it's more than just Windows.


Re: First Impressions!

2017-11-30 Thread Walter Bright via Digitalmars-d

On 11/27/2017 7:01 PM, A Guy With an Opinion wrote:
+- Unicode support is good. Although I think D's string type should have 
probably been utf16 by default. Especially considering the utf module states:


"UTF character support is restricted to '\u' <= character <= '\U0010'."

Seems like the natural fit for me. Plus for the vast majority of use cases I am 
pretty guaranteed a char = codepoint. Not the biggest issue in the world and 
maybe I'm just being overly critical here.


Sooner or later your code will exhibit bugs if it assumes that char==codepoint 
with UTF16, because of surrogate pairs.


https://stackoverflow.com/questions/5903008/what-is-a-surrogate-pair-in-java

As far as I can tell, pretty much the only users of UTF16 are Windows programs. 
Everyone else uses UTF8 or UCS32.


I recommend using UTF8.
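
A minimal D sketch of the pitfall (any code point outside the Basic 
Multilingual Plane will do):

import std.range : walkLength;

unittest {
    wstring s = "\U0001F600"w;  // U+1F600, stored as a surrogate pair in UTF-16
    assert(s.length == 2);      // two UTF-16 code units...
    assert(s.walkLength == 1);  // ...but a single code point once decoded
}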


Re: First Impressions!

2017-11-29 Thread A Guy With a Question via Digitalmars-d

On Tuesday, 28 November 2017 at 22:08:48 UTC, Mike Parker wrote:
On Tuesday, 28 November 2017 at 19:39:19 UTC, Michael V. 
Franklin wrote:


This DIP is related 
(https://github.com/dlang/DIPs/blob/master/DIPs/DIP1012.md) 
but I don't know what's happening with it.




It's awaiting formal review. I'll move it forward when the 
formal review queue clears out a bit.


How well does Phobos play with it? I'm finding, for instance, 
that it's not playing too well with nothrow. Things throw, and I 
don't understand why.


Re: First Impressions!

2017-11-28 Thread Mike Parker via Digitalmars-d
On Tuesday, 28 November 2017 at 19:39:19 UTC, Michael V. Franklin 
wrote:


This DIP is related 
(https://github.com/dlang/DIPs/blob/master/DIPs/DIP1012.md) but 
I don't know what's happening with it.




It's awaiting formal review. I'll move it forward when the formal 
review queue clears out a bit.


Re: First Impressions!

2017-11-28 Thread Adam D. Ruppe via Digitalmars-d
On Tuesday, 28 November 2017 at 19:34:27 UTC, A Guy With an 
Opinion wrote:

I take it adding those inverse attributes is no trivial thing?


Technically, it is extremely trivial.

Politically, that's a different matter. There have been arguments 
before about the words or the syntax (is it "@gc" or 
"@nogc(false)", for example? tbh i think the latter is kinda 
elegant, but the former works too, i just want something that 
works) and the process (so much paperwork!) and all kinds of 
nonsense.


Re: First Impressions!

2017-11-28 Thread Michael V. Franklin via Digitalmars-d
On Tuesday, 28 November 2017 at 19:34:27 UTC, A Guy With an 
Opinion wrote:



I take it adding those inverse attributes is no trivial thing?


It would require a DIP: https://github.com/dlang/DIPs

This DIP is related 
(https://github.com/dlang/DIPs/blob/master/DIPs/DIP1012.md) but I 
don't know what's happening with it.


Mike


Re: First Impressions!

2017-11-28 Thread A Guy With an Opinion via Digitalmars-d

On Tuesday, 28 November 2017 at 16:24:56 UTC, Adam D. Ruppe wrote:


That doesn't quite work since it doesn't descend into 
aggregates. And you can't turn most them off.


I take it adding those inverse attributes is no trivial thing?


Re: First Impressions!

2017-11-28 Thread Jacob Carlborg via Digitalmars-d

On 2017-11-28 17:24, Adam D. Ruppe wrote:

That doesn't quite work since it doesn't descend into aggregates. And 
you can't turn most them off.


And if your project is a library.

--
/Jacob Carlborg


Re: First Impressions!

2017-11-28 Thread Patrick Schluter via Digitalmars-d
On Tuesday, 28 November 2017 at 04:19:40 UTC, A Guy With an 
Opinion wrote:
On Tuesday, 28 November 2017 at 04:17:18 UTC, A Guy With an 
Opinion wrote:

[...]


Also, C and C++ don't just have undefined behavior; sometimes 
they have inconsistent behavior. Sometimes int a; is actually set 
to 0.
It's only auto variables that are undefined. Statics and 
file-scope variables (aka globals) are defined: they are zero-initialized.


Re: First Impressions!

2017-11-28 Thread Patrick Schluter via Digitalmars-d
On Tuesday, 28 November 2017 at 04:17:18 UTC, A Guy With an 
Opinion wrote:

On Tuesday, 28 November 2017 at 04:12:14 UTC, ketmar wrote:

A Guy With an Opinion wrote:

That is true, but I'm still unconvinced that making the 
person's program likely to error is better than initializing 
a number to 0. Zero is such a fundamental default for so many 
things. And it would be consistent with the other number 
types.
basically, default initializers aren't meant to give a "usable 
value", they meant to give a *defined* value, so we don't have 
UB. that is, just initialize your variables explicitly, don't 
rely on defaults. writing:


int a;
a += 42;

is still bad code, even if you know that `a` is guaranteed 
to be zero.


int a = 0;
a += 42;

is the "right" way to write it.

if you'll look at default values from this PoV, you'll see 
that NaN makes more sense than zero. if there was a NaN for 
ints, ints would be inited with it too. ;-)


Eh...I still don't agree. I think C and C++ just gave that 
style of coding a bad rap due to the undefined behavior. But 
the issue is it was undefined behavior. A lot of language 
features aim to make things well defined and have less verbose 
representations. Once a language matures that's what a big 
portion of their newer features become. Less verbose shortcuts 
of commonly done things. I agree it's important that it's well 
defined, I'm just thinking it should be a value that someone 
actually wants some notable fraction of the time. Not something 
no one wants ever.


I could be persuaded, but so far I'm not drinking the koolaid 
on that. It's not the end of the world, I was just confused 
when my float was NaN.


Just a little anecdote from a maintainer of a legacy project in C. 
My predecessors on that project had the habit of systematically 
initializing every auto-declared variable at the beginning of a 
function. The code base was initiated in the early '90s and 
written by people who were typical BASIC programmers, so functions 
were very often hundreds of lines long and all started with a long 
block of declarations.
In my years of reviewing that code, I was really surprised by how 
often I found bugs because variables had been wrongly initialized. 
Blanket-initializing with 0 or NULL essentially suppressed the 
compiler's data-flow analysis at the start, so it could not detect 
when variables were used before they had been properly populated 
with the values the logic required. These kinds of bugs were very 
subtle.

To make it short: 0 is an arbitrary number that is often the right 
value, but when it isn't, it can be a pain to detect that it was 
the wrong one.
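
A minimal D sketch of the flip side, where the NaN default makes a 
forgotten initialization loud instead of silently wrong 
(illustrative code, not from the original post):

float smallest(in float[] xs) {
    float min;                 // D defaults this to float.nan
    foreach (x; xs)
        if (x < min) min = x;  // comparisons against NaN are false, so min stays NaN
    return min;                // a NaN result betrays the missing 'min = xs[0];'
}

With a 0 default, the same forgotten line would silently return 0 
for any all-positive input, which is exactly the subtle bug class 
described above.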




Re: First Impressions!

2017-11-28 Thread Adam D. Ruppe via Digitalmars-d

On Tuesday, 28 November 2017 at 16:14:52 UTC, Jack Stouffer wrote:
You can do it on a per-file basis by putting the attributes at 
the top like so


That doesn't quite work since it doesn't descend into aggregates. 
And you can't turn most them off.


Re: First Impressions!

2017-11-28 Thread Jack Stouffer via Digitalmars-d
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an 
Opinion wrote:
- Attributes. I had another post in the Learn forum about 
attributes which was unfortunate. At first I was excited 
because it seems like on the surface it would help me write 
better code, but it gets a little tedious and tiresome to have 
to remember to decorate code with them. It seems like most of 
them should have been the defaults. I would have preferred if 
the compiler helped me and reminded me. I asked if there was a 
way to enforce them globally, which I guess there is, but I 
guess there's also not a way to turn some of them off 
afterwards. A bit unfortunate. But at least I can see some 
solutions to this.


Attributes were one of my biggest hurdles when working on my own 
projects. For example, it's a huge PITA when you have to add a 
debug writeln deep down in your call stack, and it ends up 
violating a bunch of function attributes further up. Thankfully, 
wrapping statements in debug {} allows you to ignore pure and 
@safe violations in that code if you compile with the flag -debug.


Also, you can apply attributes to your whole project by adding 
them to main


void main(string[] args) @safe {}

Although this isn't recommended, as almost no program can be 
completely safe. You can do it on a per-file basis by putting the 
attributes at the top like so


@safe:
pure:
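
A sketch of the debug escape hatch mentioned above (the statement 
only runs when compiled with -debug):

void log(string msg) {  // impure: does I/O
    import std.stdio : writeln;
    writeln(msg);
}

void compute() pure {
    debug { log("tracing compute"); }  // the purity check is waived inside debug
}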


Re: First Impressions!

2017-11-28 Thread A Guy With an Opinion via Digitalmars-d
On Tuesday, 28 November 2017 at 13:17:16 UTC, Steven 
Schveighoffer wrote:

https://github.com/schveiguy/dcollections


On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole 
wrote:

https://github.com/economicmodeling/containers


Thanks. I'll check both out. It's not that I don't want to write 
them, it's just I don't want to stop what I'm doing when I need 
them and write them. It takes me out of my thought process.


Re: First Impressions!

2017-11-28 Thread A Guy With an Opinion via Digitalmars-d
On Tuesday, 28 November 2017 at 13:17:16 UTC, Steven 
Schveighoffer wrote:
This is likely because of Adam's suggestion -- you were 
incorrectly declaring a function that returned an immutable 
like this:



immutable T foo();

-Steve


That's exactly what it was, I think. As I stated before, I tried 
to do immutable(T), but I was drowning in errors at that point, 
so I just took a step back. I'll try to refactor it back to 
using immutable. I just honestly didn't quite know what I was 
doing obviously.




Re: First Impressions!

2017-11-28 Thread Guillaume Piolat via Digitalmars-d
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an 
Opinion wrote:

So those are just some of my thoughts. Tell me why I'm wrong :P


You are not supposed to come to this forum with well-balanced 
opinions and reasonable arguments. It's not colourful enough to 
be heard!


Instead make a dent in the universe. Prepare your most impactful, 
most offensive statements to push your personal agenda of what 
your own system programming language would be like, if you had 
the stamina. Use doubtful analogies and references to languages 
with wildly different goals than D. Prepare to abuse the 
volunteers, and say how much you would dare to use D, if only it 
would do "just this one obvious change". Having this feature 
would make the BlobTech industry switch to D overnight!


And you haven't asked for any new feature; especially, no new 
_syntax_ was demanded! I don't know, find anything:


"It would be nice to have a shortcut syntax for when you wan't to 
add zero. Writing 0 + x is cumbersome, when +x would do it. It 
has the nice benefit or unifying unary and binary operators, and 
thus leads to a simplified implementation."


Do you realize the dangers of looking satisfied?




Re: First Impressions!

2017-11-28 Thread Steven Schveighoffer via Digitalmars-d

On 11/27/17 10:01 PM, A Guy With an Opinion wrote:

Hi,


Hi Guy, welcome, and I wanted to say I was saying "me too" while reading 
much of your post. I worked on a C# based client/server for about 5 
years, and the biggest thing I agree with you on is the generic 
programming. I was also using D at the time, and using generics felt 
like eating a superbly under-baked cake.


A few points:

- Some of the errors from DMD are a little strange. I don't want to crap 
on this too much, because for the most part it's fine. However 
occasionally it throws errors I still can't really work out why THAT is 
the error it gave me. Some of you may have seen my question in the 
"Learn" forum about not knowing to use static in an embedded class, but 
the error was the following:


Error: 'this' is only defined in non-static member functions


Yes, this is simply a bad error message. Many of our bad error messages 
come from something called "lowering", where one piece of code is 
converted to another piece of code, and then the error message happens 
on the converted code. So essentially you are getting errors on code you 
didn't write!


They are more difficult to fix, since we can't change the real error 
message (it applies to real code as well), and the code that generated 
the lowered code is decoupled from the error. I think this is one of 
those cases.
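
A classic instance, as a sketch: scope(exit) is lowered to 
try/finally behind the scenes, so a diagnostic about the rewritten 
form can reference a try or finally you never typed.

void cleanup() {}

void f() {
    scope(exit) cleanup();
    // the rest of f behaves as if it were wrapped in:
    // try { ... } finally { cleanup(); }
}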


I'd say the errors so far are above some of the cryptic stuff C++ can 
throw at you (however, I haven't delved that deeply into D templates 
yet, so don't hold me to this yet), but in terms of quality I'd put it 
somewhere between C# and C++ in quality. With C# being the ideal.


Once you use templates a lot, the error messages explode in 
crypticness :) But generally, you can get the gist of your errors 
if you can halfway decipher the mangling.


- ...however, where are all of the collections? No Queue? No Stack? No 
HashTable? I've read that it's not a big focus because some of the built 
in stuff *can* behave like those things. The C# project I'm porting 
utilizes queues and specifically C#'s Dictionary<> quite a bit, so I'm 
not looking forward to having to hand roll my own or use something that 
aren't fundamentally them. This is definitely the biggest negative I've 
come across. I want a queue, not something that *can* behave as a queue. 
I definitely expected more from a language that is this old.


I haven't touched this in years, but it should still work pretty well 
(if you try it and it doesn't compile for some reason, please submit an 
issue there): https://github.com/schveiguy/dcollections


It has more of a Java/C# feel than other libraries, including an 
interface hierarchy.


That being said, Queue is just so easy to implement given a linked list, 
I never bothered :)
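
Something like this minimal sketch over std.container.dlist (the 
names are illustrative):

import std.container.dlist : DList;

struct Queue(T) {
    private DList!T items;
    void enqueue(T value) { items.insertBack(value); }
    T dequeue() {
        auto head = items.front;  // caller should check empty first
        items.removeFront();
        return head;
    }
    @property bool empty() const { return items.empty; }
}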


+ Unit tests. Finally built in unit tests. Enough said here. If the lack 
of collections was the biggest negative, this is the biggest positive. I 
would like to enable them at build time if possible though.


+1000

About the running of unit tests at build time, many people version their 
main function like this:


version(unittest) void main() {}
else
int main(string[] args) // real declaration
{ ... }

This way, when you build with -unittest, you only run unit tests, and 
exit immediately. So enabling them at build time is quite easy.
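
And the tests themselves are just blocks sitting next to the code 
they exercise:

int twice(int x) { return 2 * x; }

unittest {
    assert(twice(21) == 42);  // compiled and run only with -unittest
}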


- Attributes. I had another post in the Learn forum about attributes 
which was unfortunate. At first I was excited because it seems like on 
the surface it would help me write better code, but it gets a little 
tedious and tiresome to have to remember to decorate code with them. It 
seems like most of them should have been the defaults. I would have 
preferred if the compiler helped me and reminded me. I asked if there 
was a way to enforce them globally, which I guess there is, but I guess 
there's also not a way to turn some of them off afterwards. A bit 
unfortunate. But at least I can see some solutions to this.


If you are using more templates (and I use them the more I write D 
code), you will not have this problem. Templates infer almost all 
attributes.


- Immutable. I'm not sure I fully understand it. On the surface it 
seemed like const but transitive. I tried having a method return an 
immutable value, but when I used it in my unit test I got some weird 
errors about objects not being able to return immutable (I forget the 
exact error...apologies). I refactored to use const, and it all worked 
as I expected, but I don't get why the immutable didn't work. I was 
returning a value type, so I don't see why passing in 
assert(object.errorCount == 0) would have triggered errors. But it did. 


This is likely because of Adam's suggestion -- you were incorrectly 
declaring a function that returned an immutable like this:


immutable T foo();

Where the immutable *doesn't* apply to the return value, but to the 
function itself. immutable applied to a function is really applying 
immutable to the 'this' reference.


+ Templates seem powerful.

Re: First Impressions!

2017-11-27 Thread A Guy With an Opinion via Digitalmars-d
On Tuesday, 28 November 2017 at 05:16:54 UTC, Michael V. Franklin 
wrote:
On Tuesday, 28 November 2017 at 04:48:57 UTC, A Guy With an 
Opinion wrote:


I'd be happy to submit an issue, but I'm not quite sure I'd be 
the best to determine an error message (at least not this 
early). Mainly because I have no clue what it was yelling at 
me about. I only new to add static because I told people my 
intentions and they suggested it. I guess having a non 
statically marked class is a valid feature imported from Java 
world.


If this was on the forum, please point me to it.  I'll see if I 
can understand what's going on and do something about it.


Thanks,
Mike


https://forum.dlang.org/thread/vcvlffjxowgdvpvjs...@forum.dlang.org


Re: First Impressions!

2017-11-27 Thread Michael V. Franklin via Digitalmars-d
On Tuesday, 28 November 2017 at 04:48:57 UTC, A Guy With an 
Opinion wrote:


I'd be happy to submit an issue, but I'm not quite sure I'd be 
the best to determine an error message (at least not this 
early). Mainly because I have no clue what it was yelling at me 
about. I only new to add static because I told people my 
intentions and they suggested it. I guess having a non 
statically marked class is a valid feature imported from Java 
world.


If this was on the forum, please point me to it.  I'll see if I 
can understand what's going on and do something about it.


Thanks,
Mike




Re: First Impressions!

2017-11-27 Thread A Guy With an Opinion via Digitalmars-d
On Tuesday, 28 November 2017 at 04:37:04 UTC, Michael V. Franklin 
wrote:
Please submit things like this to the issue tracker.  They are 
very easy to fix, and if I'm aware of them, I'll probably do 
the work.  But, please provide a code example and
offer a suggestion of what you would prefer it to say; it just 
makes things easier.>


I'd be happy to submit an issue, but I'm not quite sure I'd be 
the best to determine an error message (at least not this early). 
Mainly because I have no clue what it was yelling at me about. I 
only knew to add static because I told people my intentions and 
they suggested it. I guess having a non statically marked class 
is a valid feature imported from Java world. I'm just not as 
familiar with that specific feature of Java. Therefore I have no 
idea what the text really had to do with anything. Maybe 
appending "if you meant to make a static class" would have been 
helpful. I fiddled with Rust a little too, and it's what they 
tend to do very well. Make verbose error messages.



We're not alone:  https://youtu.be/6_xdfSVRrKo?t=353


And he was so much better at articulating it than I was. Another 
C# guy though. :)




Re: First Impressions!

2017-11-27 Thread Michael V. Franklin via Digitalmars-d
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an 
Opinion wrote:


+ D code so far is pushing me towards more "flat" code (for a 
lack of a better way to phrase it) and so far that has helped 
tremendously when it comes to readability. C# kind is the 
opposite. With it's namespace -> class -> method coupled with 
lock, using, etc...you tend to do a lot of nesting. You are 
generally 3 '{' in before any true logic even begins. Then 
couple that with try/catch, IDisposable/using, locking, and 
then if/else, it can get quite chaotic very easily. So right 
away, I saw my C# code actually appear more readable when I 
translated it and I think it has to do with the flatness. I'm 
not sure if that opinion will hold when I delve into 'static 
if' a little more, but so far my uses of it haven't really 
dampened that opinion.


I come from a heavy C#/C++ background.  I also I *felt* this as 
well, but never really consciously though about it, until you 
mentioned it :-)


- Some of the errors from DMD are a little strange. I don't 
want to crap on this too much, because for the most part it's 
fine. However occasionally it throws errors I still can't 
really work out why THAT is the error it gave me. Some of you 
may have seen my question in the "Learn" forum about not knowing 
to use static in an embedded class, but the error was the 
following:


Error: 'this' is only defined in non-static member functions


Please submit things like this to the issue tracker.  They are 
very easy to fix, and if I'm aware of them, I'll probably do the 
work.  But, please provide a code example and offer a suggestion 
of what you would prefer it to say; it just makes things easier.


- Modules. I like modules better than #include, but I don't 
like them better than C#'s namespaces. Specifically I don't 
like how there is this gravity that kind of pulls me to 
associate a module with a file. It appears you don't have to, 
because I can do the package thing, but whenever I try to do 
things outside that one idiom I end up in a soup of errors. I'm 
sure I'm just not use to it, but so far it's been a little 
dissatisfying. Sometimes I want where it is physically on my 
file system to be different from how I include it in other 
source files. To me, C#'s namespaces are really the standard to 
beat or meet.


I feel the same.  I don't like that modules are tied to files; it 
seems like such an arbitrary limitation.  We're not alone:  
https://youtu.be/6_xdfSVRrKo?t=353


- Attributes. I had another post in the Learn forum about 
attributes which was unfortunate. At first I was excited 
because it seems like on the surface it would help me write 
better code, but it gets a little tedious and tiresome to have 
to remember to decorate code with them. It seems like most of 
them should have been the defaults. I would have preferred if 
the compiler helped me and reminded me. I asked if there was a 
way to enforce them globally, which I guess there is, but I 
guess there's also not a way to turn some of them off 
afterwards. A bit unfortunate. But at least I can see some 
solutions to this.


Yep.  One of my pet peeves in D.

- The defaults for primitives seem off. They seem to encourage 
errors. I don't think that is the best design decision even if 
it encourages the errors to be caught as quickly as possible. I 
think the better decision would be to not have the errors 
occur. When I asked about this, there seemed to be a 
disassociation between the spec and the implementation. The 
spec says a declaration should error if not explicitly set, but 
the implementation just initializes them to something that is 
likely to error. Like NaN for floats which I would have thought 
would have been 0 based on prior experiences with other 
languages.


Another one of my pet peeves in D.  Though this post 
(http://forum.dlang.org/post/tcldaatzzbhjoamnv...@forum.dlang.org) made me realize we might be able to do something about that.


+- Unicode support is good. Although I think D's string type 
should have probably been utf16 by default. Especially 
considering the utf module states:


"UTF character support is restricted to '\u' <= character 
<= '\U0010'."


See http://utf8everywhere.org/

+ Templates seem powerful. I've only fiddled thus far, but I 
don't think I've quite comprehended their usefulness yet. It 
will probably take me some time to figure out how to wield them 
effectively. One thing I accidentally stumbled upon that I 
liked was that I could simulate inheritance in structs with 
them, by using the mixin keyword. That was cool, and I'm not 
even sure if that is what they were really meant to enable.


Templates, CTFE, and mixins are gravy! And D's the only language 
I know of that has this symbiotic feature set.
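
For the struct-"inheritance" trick mentioned above, a small sketch 
with a mixin template (illustrative names):

mixin template Counted() {
    int count;
    void bump() { ++count; }
}

struct Widget {
    mixin Counted;  // Widget gains count and bump() without class inheritance
    string name;
}

unittest {
    Widget w;
    w.bump();
    assert(w.count == 1);
}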



So those are just some of my thoughts. Tell me why I'm wrong :P


I share much of your perspective.  Thanks for the interesting 
read.


Mike




Re: First Impressions!

2017-11-27 Thread A Guy With an Opinion via Digitalmars-d

On Tuesday, 28 November 2017 at 04:24:46 UTC, Adam D. Ruppe wrote:

immutable(int) errorCount() { return ...; }


I actually did try something like that, because I remembered 
seeing the parens around the string definition. I think at that 
point I was just so riddled with errors I just took a step back 
and went back to something I knew, just to make sure I wasn't 
going insane.




Re: First Impressions!

2017-11-27 Thread codephantom via Digitalmars-d
On Tuesday, 28 November 2017 at 04:19:40 UTC, A Guy With an 
Opinion wrote:
Also, C and C++ don't just have undefined behavior; sometimes 
they have inconsistent behavior. Sometimes int a; is actually set 
to 0.


set to?


Re: First Impressions!

2017-11-27 Thread ketmar via Digitalmars-d

A Guy With an Opinion wrote:


Eh...I still don't agree.
anyway, it is something that won't be changed, 'cause there may be code 
that relies on the current default values.


i'm not really trying to change your mind, i just tried to give a rationale 
behind the choice. that's why `char.init` is 255 too, not zero.


still, explicit variable initialization looks better to me. with default 
init, it is hard to say if the author just forgot to initialize a variable 
and it happens to work, or if he knows about the default value and used it. 
with explicit init, the reader doesn't have to guess what the default value is.


Re: First Impressions!

2017-11-27 Thread Adam D. Ruppe via Digitalmars-d
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an 
Opinion wrote:

- Some of the errors from DMD are a little strange.


Yes, indeed, and many of them don't help much in finding the real 
source of your problem. I think improvements to dmd's error 
reporting would be the #1 productivity gain D could get right now.


- ...however, where are all of the collections? No Queue? No 
Stack? No HashTable?


I always say "meh" to that because any second year student can 
slap those together in... well, maybe a couple hours for the 
student, but after that you're looking at just a few minutes, 
especially leveraging D's built in arrays and associative arrays 
as your foundation.


Sure, they'd be nice to have, but it isn't a dealbreaker in the 
slightest.


Try turning a C# Dictionary<string, string> into D's string[string], 
for example.
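
For instance, a quick sketch:

// C#: var lookup = new Dictionary<string, string>();
string[string] lookup;      // D's built-in associative array
lookup["color"] = "red";
assert("color" in lookup);  // 'in' yields a pointer, null if absent
assert(lookup["color"] == "red");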


Sometimes I want where it is physically on my file system to be 
different from how I include it in other source files.


This is a common misconception, though one promoted by several of 
the tools: you don't actually need to match file system layout to 
modules.


OK, sure, D does require one module == one file. But the file 
name and location is not actually tied to the import name you use 
in code. They can be anything, you just need to pass the list of 
files to the compiler so it can parse them and figure out the 
names.
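
A sketch with hypothetical paths:

// file: source/util_strings.d
module myapp.text;  // the module name need not mirror the path

// any other file can then say: import myapp.text;
// build: dmd app.d source/util_strings.d

As long as both files are passed to the compiler, the import 
resolves by module name, not by file location.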


- Attributes. I had another post in the Learn forum about 
attributes which was unfortunate.


Yeah, of course, from my post there you know my basic opinion on 
them. I've written in more detail about them elsewhere and don't 
feel like it tonight, but I think they are a big failure right 
now but they could be fixed if we're willing to take a few 
steps (#0 improve the error messages, #1 add opposites to all of 
them, e.g. throws and @gc, #2, change the defaults via a single 
declaration at the module level, #3 omg revel in how useful they 
are)


- Immutable. I'm not sure I fully understand it. On the surface 
it seemed like const but transitive.


const is transitive too. So the difference is really that `const` 
means YOU won't change it, whereas `immutable` means NOBODY will 
change it.


What's important there is that to make something immutable, you 
need to prove to the compiler's satisfaction that nobody else can 
change it either.


const/immutable in D isn't as common as in its family of 
languages (C++ notably), but when you do get to use it - at least 
once you get to know it - it is useful.


I was returning a value type, so I don't see why passing in 
assert(object.errorCount == 0) would have triggered errors.


Was the object itself immutable? I suspect you wrote something 
like this:


immutable int errorCount() { return ...; }


But this is a curious syntax... the `immutable` there actually 
applies to the *object*, not the return value! It means you can 
call this method on an immutable object (in fact, it means you 
MUST call it on an immutable object. const is the middle ground 
that allows you to call it on either)



immutable(int) errorCount() { return ...; }

note the parens, is how you apply it to the return value. Yes, 
this is kinda weird, and style guides tend to suggest putting the 
qualifiers after the argument list for the `this` thing instead 
of before... but the language allows it before, so it trips up a 
LOT of people like this.
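
A compilable sketch of both spellings (illustrative names):

struct Counter {
    int errors;
    // qualifier applies to 'this': callable only on immutable objects
    int viaThis() immutable { return errors; }
    // parenthesized qualifier applies only to the return type
    immutable(int) viaReturn() { return errors; }
}

void main() {
    immutable Counter a = Counter(3);  // a unique value converts to immutable
    Counter b = Counter(4);
    assert(a.viaThis() == 3);
    assert(b.viaReturn() == 4);
}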


The type string seems to be an immutable(char[]) which works 
exactly the way I was expecting,


It is actually `immutable(char)[]`. The parens are important here 
- it applies to the contents of the array, but not the array 
itself here.
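
That is:

string s = "abc";  // immutable(char)[]
s = "xyz";         // fine: the array reference itself is mutable
// s[0] = 'q';     // error: cannot modify immutable contents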


+- Unicode support is good. Although I think D's string type 
should have probably been utf16 by default. Especially 
considering the utf module states:


Note that it has UTF-16 built in as well, with almost equal 
support. Put `w` at the end of a literal:


`"this literal is UTF-16"w` // notice the w after the "

and you get utf16. It considers that to be `wstring` instead of 
`string`, but it works basically the same.


If you are doing a lot of Windows API work, this is pretty useful!

That was cool, and I'm not even sure if that is what they were 
really meant to enable.


yes, indeed. plugging my book 
https://www.packtpub.com/application-development/d-cookbook i 
talk about much of this stuff in there




Re: First Impressions!

2017-11-27 Thread A Guy With an Opinion via Digitalmars-d
On Tuesday, 28 November 2017 at 04:17:18 UTC, A Guy With an 
Opinion wrote:

On Tuesday, 28 November 2017 at 04:12:14 UTC, ketmar wrote:

A Guy With an Opinion wrote:

That is true, but I'm still unconvinced that making the 
person's program likely to error is better than initializing 
a number to 0. Zero is such a fundamental default for so many 
things. And it would be consistent with the other number 
types.
basically, default initializers aren't meant to give a "usable 
value", they meant to give a *defined* value, so we don't have 
UB. that is, just initialize your variables explicitly, don't 
rely on defaults. writing:


int a;
a += 42;

is still bad code, even if you know that `a` is guaranteed 
to be zero.


int a = 0;
a += 42;

is the "right" way to write it.

if you'll look at default values from this PoV, you'll see 
that NaN makes more sense than zero. if there was a NaN for 
ints, ints would be inited with it too. ;-)


Eh...I still don't agree. I think C and C++ just gave that 
style of coding a bad rap due to the undefined behavior. But 
the issue is it was undefined behavior. A lot of language 
features aim to make things well defined and have less verbose 
representations. Once a language matures that's what a big 
portion of their newer features become. Less verbose shortcuts 
of commonly done things. I agree it's important that it's well 
defined, I'm just thinking it should be a value that someone 
actually wants some notable fraction of the time. Not something 
no one wants ever.


I could be persuaded, but so far I'm not drinking the koolaid 
on that. It's not the end of the world, I was just confused 
when my float was NaN.


Also, C and C++ don't just have undefined behavior; sometimes they 
have inconsistent behavior. Sometimes int a; is actually set to 0.


Re: First Impressions!

2017-11-27 Thread A Guy With an Opinion via Digitalmars-d

On Tuesday, 28 November 2017 at 04:12:14 UTC, ketmar wrote:

A Guy With an Opinion wrote:

That is true, but I'm still unconvinced that making the 
person's program likely to error is better than initializing a 
number to 0. Zero is such a fundamental default for so many 
things. And it would be consistent with the other number types.
basically, default initializers aren't meant to give a "usable 
value", they meant to give a *defined* value, so we don't have 
UB. that is, just initialize your variables explicitly, don't 
rely on defaults. writing:


int a;
a += 42;

is still bad code, even if you know that `a` is guaranteed 
to be zero.


int a = 0;
a += 42;

is the "right" way to write it.

if you'll look at default values from this PoV, you'll see that 
NaN makes more sense than zero. if there was a NaN for ints, ints 
would be inited with it too. ;-)


Eh...I still don't agree. I think C and C++ just gave that style 
of coding a bad rap due to the undefined behavior. But the issue 
is it was undefined behavior. A lot of language features aim to 
make things well defined and have less verbose representations. 
Once a language matures that's what a big portion of their newer 
features become. Less verbose shortcuts of commonly done things. 
I agree it's important that it's well defined, I'm just thinking 
it should be a value that someone actually wants some notable 
fraction of the time. Not something no one wants ever.


I could be persuaded, but so far I'm not drinking the koolaid on 
that. It's not the end of the world, I was just confused when my 
float was NaN.


Re: First Impressions!

2017-11-27 Thread ketmar via Digitalmars-d

A Guy With an Opinion wrote:

That is true, but I'm still unconvinced that making the person's program 
likely to error is better than initializing a number to 0. Zero is such a 
fundamental default for so many things. And it would be consistent with 
the other number types.
basically, default initializers aren't meant to give a "usable value", they're 
meant to give a *defined* value, so we don't have UB. that is, just 
initialize your variables explicitly, don't rely on defaults. writing:


int a;
a += 42;

is still bad code, even if you know that `a` is guaranteed to be zero.

int a = 0;
a += 42;

is the "right" way to write it.

if you'll look at default values from this PoV, you'll see that NaN makes 
more sense than zero. if there was a NaN for ints, ints would be inited 
with it too. ;-)
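
For reference, a quick check of the defaults in question:

unittest {
    int i;
    float f;
    char c;
    assert(i == 0);     // ints default to 0
    assert(f != f);     // floats default to float.nan; NaN never equals itself
    assert(c == 0xFF);  // chars default to 0xFF, an invalid UTF-8 code unit
}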


Re: First Impressions!

2017-11-27 Thread A Guy With an Opinion via Digitalmars-d
On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole 
wrote:


Its on our TODO list.

Allocators need to come out of experimental and some form of RC 
before we tackle it again.


In the mean time https://github.com/economicmodeling/containers 
is pretty good.


That's good to hear.

I keep saying it, if you don't have unit tests built in, you 
don't care about code quality!




I just like not having to create a throwaway project to test my 
code. It's nice to just use unit tests for what I used to create 
console apps for, and then it forever ensures my code works the 
same!


You don't need to bother with them for most code :)


That seems to be what people here are saying, but that seems so 
sad...




Doesn't mean the other languages are right either.



That is true, but I'm still unconvinced that making the person's 
program likely to error is better than initializing a number to 
0. Zero is such a fundamental default for so many things. And it 
would be consistent with the other number types.




If you need a wstring, use a wstring!

Be aware Microsoft is alone in thinking that UTF-16 was 
awesome. Everybody else standardized on UTF-8 for Unicode.




I do come from that world, so there is a chance I'm just 
comfortable with it.






Re: First Impressions!

2017-11-27 Thread rikki cattermole via Digitalmars-d

On 28/11/2017 3:01 AM, A Guy With an Opinion wrote:

Hi,

I've been using D for a personal project for about two weeks now and 
just thought I'd share my initial impression just in case it's useful! I 
like feedback on things I do, so I just assume others do to. Plus my 
opinion is the best on the internet! You will see (hopefully the sarcasm 
is obvious otherwise I'll just appear pompous). It would probably be 
better if I did a retrospective after my project is completed, but with 
life who knows if that will happen. I could lose interest or something 
and not finish it. And then you guys wouldn't know my opinion. I can't 
allow that.


I'll start off by saying I like the overall experience. I come from a C# 
and C++ background with a little bit of C mixed in. For the most part 
though, I work with C#, SQL and web technologies on a day to day basis. 
I did do a three year stint working with C/C++ (mostly C++), but I never 
really enjoyed it much. C++ is overly verbose, overly complicated, 
overly littered with poor legacy decisions, and too error prone. C# on 
the other hand has for the most part been a delight. The only problem is 
I don't find it to be the best when it comes to generative programming. 
C# can do some generative programming with its generics, but for the 
most part it's always struck me as more specialized for container types, 
and to do anything remotely outside of that purpose takes a fair bit of 
cleverness. I'm sick of being clever in that aspect.


So here are some impressions good and bad:

+ Porting straight C# seems pretty straight forward. Even some of the 
.NET framework, like files and unicode, have fairly direct counterparts 
in D.


+ D code so far is pushing me towards more "flat" code (for a lack of a 
better way to phrase it) and so far that has helped tremendously when it 
comes to readability. C# kind is the opposite. With it's namespace -> 
class -> method coupled with lock, using, etc...you tend to do a lot of 
nesting. You are generally 3 '{' in before any true logic even begins. 
Then couple that with try/catch, IDisposable/using, locking, and then 
if/else, it can get quite chaotic very easily. So right away, I saw my 
C# code actually appear more readable when I translated it and I think 
it has to do with the flatness. I'm not sure if that opinion will hold 
when I delve into 'static if' a little more, but so far my uses of it 
haven't really dampened that opinion.


+ Visual D. It might be that I had poor expectations of it, because I 
read D's tooling was poor on the internet (and nothing is ever wrong on 
the internet), however, the combination of Visual D and DMD actually 
exceeded my expectations. I've been quite happy with it. It was 
relatively easy to set up and worked as I would expect it to work. It 
lets me debug, add breakpoints, and does the basic syntax highlighting I 
would expect. It could have a few other features, but for a project that 
is not corporate backed, it was really above what I could have asked for.


+ So far, compiling is fast. And from what I hear it will stay fast. A 
big motivator. The one commercial C++ project I worked on was a beast 
and could take an hour+ to compile if you needed to compile something 
fundamental. C# is fairly fast, so I've grown accustomed to not having 
to go to the bathroom, get a drink, etc...before returning to find out 
I'm on the linking step. I'm used to it: if a build doesn't finish in 
under ten seconds (probably less), I prep myself for an error to deal 
with. I want this to remain.


- Some of the errors from DMD are a little strange. I don't want to crap 
on this too much, because for the most part it's fine. However 
occasionally it throws errors I still can't really work out why THAT is 
the error it gave me. Some of you may have seen my question in the 
"Learn" forum about not knowing to use static in an embedded class, but 
the error was the following:


Error: 'this' is only defined in non-static member functions

I'd say the errors so far are above some of the cryptic stuff C++ can 
throw at you (however, I haven't delved that deeply into D templates 
yet, so don't hold me to this yet), but in terms of quality I'd put it 
somewhere between C# and C++ in quality. With C# being the ideal.


+ The standard library so far is really good. Nullable worked as I 
thought it should. I just guessed a few of the methods based on what I 
had seen at that point and got it right. So it appears consistent and 
intuitive. I also like the fact I can peek at the code and understand it 
by just reading it. Unlike with C++ where I still don't know how some of 
the stuff is *really* implemented. The STL almost seems like it's 
written in a completely different language than the stuff it enables. 
For instance, I figured out how to do packages by seeing it in Phobos.


- ...however, where are all of the collections? No Queue? No Stack? No 
HashTable? I've read that it's not a big focus because some of the built 
in stuff *can* behave like those things.

Re: First Impressions!

2017-11-27 Thread docandrew via Digitalmars-d
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an 
Opinion wrote:


- ...however, where are all of the collections? No Queue? No 
Stack? No HashTable? I've read that it's not a big focus 
because some of the built in stuff *can* behave like those 
things. The C# project I'm porting utilizes queues and a 
specifically C#'s Dictionary<> quite a bit, so I'm not looking 
forward to having to hand roll my own or use something that 
aren't fundamentally them. This is definitely the biggest 
negative I've come across. I want a queue, not something that 
*can* behave as a queue. I definitely expected more from a 
language that is this old.




Good feedback overall, thanks for checking it out. You're not 
wrong, but some of the design decisions that feel strange to 
newcomers at first have been heavily-debated, generally 
well-reasoned, and just take some time to get used to. That 
sounds like a cop-out, but stick with it and I think you'll find 
that a lot of the decisions make sense - see the extensive 
discussion on NaN-default for floats, for example.


Just one note about the above comment though: the 
std.container.dlist doubly-linked list has methods that you can 
use to put together stacks and queues easily:


https://dlang.org/phobos/std_container_dlist.html

Also, D's associative arrays implement a hash map 
(https://dlang.org/spec/hash-map.html), which I think should take 
care of most of C#'s Dictionary functionality.
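
A quick sketch of both, using a DList as a stack and an 
associative array as a dictionary:

import std.container.dlist : DList;

unittest {
    DList!int stack;
    stack.insertFront(1);
    stack.insertFront(2);  // push
    assert(stack.front == 2);
    stack.removeFront();   // pop
    assert(stack.front == 1);

    int[string] counts;    // the built-in hash map
    counts["warnings"] = 3;
    assert(counts["warnings"] == 3);
}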


Anyhow, D is a big language (for better and sometimes worse), so 
it's easy to miss some of the good nuggets buried within the 
spec/library.


-Doc