Re: [OT] Re: First Impressions!
On Monday, 4 December 2017 at 21:23:51 UTC, Andrei Alexandrescu wrote: On 12/2/17 5:16 PM, Joakim wrote: Yep, that's why five years back many of the major Chinese sites were still not using UTF-8: http://xahlee.info/w/what_encoding_do_chinese_websites_use.html That led that Chinese guy to also rant against UTF-8 a couple years ago: http://xahlee.info/comp/unicode_utf8_encoding_propaganda.html BTW has anyone been in contact with Xah Lee? Perhaps we could commission him to write some tutorial material for D. -- Andrei I traded email with him last summer, emailed you his email address just now.
[OT] Re: First Impressions!
On 12/2/17 5:16 PM, Joakim wrote: Yep, that's why five years back many of the major Chinese sites were still not using UTF-8: http://xahlee.info/w/what_encoding_do_chinese_websites_use.html That led that Chinese guy to also rant against UTF-8 a couple years ago: http://xahlee.info/comp/unicode_utf8_encoding_propaganda.html BTW has anyone been in contact with Xah Lee? Perhaps we could commission him to write some tutorial material for D. -- Andrei
Re: First Impressions!
On Sunday, 3 December 2017 at 01:59:58 UTC, H. S. Teoh wrote: Still, it betrays the emperor's invisible clothes of the "graphics == intuitive" mantra -- you still have to learn the icons just like you have to learn the keywords of a text-based UI, before you can use the software effectively. What happened when you ran vi for the first time?
Re: First Impressions!
On 12/2/17 11:28 PM, Walter Bright wrote: On 12/2/2017 5:59 PM, H. S. Teoh wrote: [...] Even worse, companies go and copyright their icons, guaranteeing they have to be substantially different for every company! I like this site for icons. Only requires you to reference them in your about box: https://icons8.com/ -Steve
Re: First Impressions!
On Saturday, 2 December 2017 at 22:16:09 UTC, Joakim wrote: On Friday, 1 December 2017 at 23:16:45 UTC, H. S. Teoh wrote: On Fri, Dec 01, 2017 at 03:04:44PM -0800, Walter Bright via Digitalmars-d wrote: On 11/30/2017 9:23 AM, Kagamin wrote: > On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki > cattermole wrote: > > Be aware Microsoft is alone in thinking that UTF-16 was > > awesome. Everybody else standardized on UTF-8 for Unicode. > > UCS2 was awesome. UTF-16 is used by Java, JavaScript, > Objective-C, Swift, Dart and ms tech, which is 28% of tiobe > index. "was" :-) Those are pretty much pre-surrogate pair designs, or based on them (Dart compiles to JavaScript, for example). UCS2 has serious problems: 1. Most strings are in ascii, meaning UCS2 doubles memory consumption. Strings in the executable file are twice the size. This is not true in Asia, esp. where the CJK block is extensively used. A CJK block character is 3 bytes in UTF-8, meaning that string sizes are 150% of the UCS2 encoding. If your code contains a lot of CJK text, that's a lot of bloat. Yep, that's why five years back many of the major Chinese sites were still not using UTF-8: http://xahlee.info/w/what_encoding_do_chinese_websites_use.html Summary: Taiwan sites almost all use UTF-8; very old ones still use BIG5. Mainland China sites mostly still use GBK or GB2312, but a few newer ones use UTF-8. Many top Japan and Korea sites also use UTF-8, but some use EUC (Extended Unix Code) variants. This probably means that UTF-8 might dominate in the future. mmmh That led that Chinese guy to also rant against UTF-8 a couple years ago: http://xahlee.info/comp/unicode_utf8_encoding_propaganda.html A rant reproaching a video for not providing reasons why utf-8 is good, while itself not providing any reasons why utf-8 is bad.
I'm not denying the issues with utf-8, only that the ranter doesn't provide any useful info on what issues "Asians" encounter with it, besides legacy reasons (which are important, but don't enter into judging the technical quality of an encoding). Add to that that he advocates for GB18030, which is quite inferior to utf-8 except in the legacy support area (here are some of the advantages of utf-8 that GB18030 does not possess: auto-synchronization, algorithmic mapping of codepoints, error detection). If his only beef with utf-8 is the size of CJK text, then he shouldn't argue for UTF-32 as he seems to do at the end.
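For readers unfamiliar with the auto-synchronization property mentioned above, here is a small Python sketch (illustrative only, not from the thread). In UTF-8, every continuation byte has the bit pattern 10xxxxxx, so a decoder dropped into the middle of a stream can always skip forward to the next character boundary; GB2312/GBK/GB18030 trailing bytes overlap the lead-byte range, so a reader cut mid-character can misinterpret everything that follows.

```python
def resync(buf: bytes) -> int:
    """Return the index of the first byte in `buf` that starts a UTF-8
    sequence, skipping continuation bytes (0b10xxxxxx) left over from a
    truncated character."""
    i = 0
    while i < len(buf) and (buf[i] & 0xC0) == 0x80:
        i += 1
    return i

# Cut a UTF-8 stream at an arbitrary byte offset, mid-character:
data = "例子 example".encode("utf-8")
tail = data[1:]                      # starts inside the 3-byte '例'
start = resync(tail)                 # skips the 2 leftover continuation bytes
print(tail[start:].decode("utf-8"))  # 子 example
```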
Re: First Impressions!
On 12/2/2017 5:59 PM, H. S. Teoh wrote: [...] Even worse, companies go and copyright their icons, guaranteeing they have to be substantially different for every company! If there ever was an Emperor's New Clothes, it's icons and emojis.
Re: First Impressions!
On Sat, Dec 02, 2017 at 02:20:10AM -0800, Walter Bright via Digitalmars-d wrote: [...] > My car has a bunch of emoticons labeling the controls. I can't figure out > what any of them do without reading the manual, or just pushing random > buttons until what I want happens. One button has an icon on it that > looks like a snowflake. What does that do? Turn on the A/C? Defrost > the frosty windows? Set the AWD in slippery mode? Turn on the > Christmas lights? The same can be argued for the icon mania started by the GUI craze in the 90's that has now become the de facto standard. Some icons are more obvious than others, but nowadays GUI toolbars are full of inscrutable icons of unclear meaning that are basically opaque unless you already have prior knowledge of what they're supposed to represent. Thankfully most(?) GUI programs have enough sanity left to provide tooltips with textual labels for what each button means. Still, it betrays the emperor's invisible clothes of the "graphics == intuitive" mantra -- you still have to learn the icons just like you have to learn the keywords of a text-based UI, before you can use the software effectively. Reminds me also of the infamous Mystery Meat navigation style of the 90's, where people would use images for navigation links on their website, so you basically don't know where a link leads until you click on it. This is why I think GUIs and the whole "desktop metaphor" craze are heading the wrong direction, and why 95% of my computer usage is via a text terminal. There's a place for graphical interfaces, but it's gone too far these days. But thanks to Unicode emoticons, we can now have icons on my text terminal too, isn't that just wonderful?! Esp. when a missing/incompatible font causes them to show up as literal blank boxes. The power of a standardized, universal character set, lemme tell ya! T -- Almost all proofs have bugs, but almost all theorems are true. -- Paul Pedersen
Re: First Impressions!
On Sunday, 3 December 2017 at 01:11:14 UTC, codephantom wrote: but my wider point is, unicode emojis are useless if they only contain those that 'some' consider to be politically correct, or socially acceptable. The Unicode consortium is a bunch of ... (I don't have the unicode emoji representation yet to complete that sentence). btw. Good article here, further demonstrating my point.. "We're talking about engineers that are concerned about standards and internationalization issues who now have to do something more in line with Apple or Google's marketing teams,". https://www.buzzfeed.com/charliewarzel/thanks-to-apples-influence-youre-not-getting-a-rifle-emoji
Re: First Impressions!
On Saturday, 2 December 2017 at 16:44:56 UTC, Ola Fosheim Grøstad wrote: On Saturday, 2 December 2017 at 12:25:22 UTC, codephantom wrote: Do the people on the unicode consortium consider such communication to be invalid? https://splinternews.com/violent-emoji-are-starting-to-get-people-in-trouble-wit-1793845130 On the other hand try to google "emoji sexual"… No. Humans never express negative emotions, and also, never communicate a desire to have sex. That explains a lot about the unicode consortium. 's', 'e', 'x' is ok, just not together. Q. What's the difference between a politician and an emoji? A. Nothing. You cannot take either at face value. ..oops. politics again. I should know better. but my wider point is, unicode emojis are useless if they only contain those that 'some' consider to be politically correct, or socially acceptable. The Unicode consortium is a bunch of ... (I don't have the unicode emoji representation yet to complete that sentence).
Re: First Impressions!
On 11/30/2017 10:07 PM, Patrick Schluter wrote: endianness Yeah, I forgot to mention that one. As if anyone remembers to put in the Byte Order Mark :-(
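To make the BOM issue concrete, a small Python illustration (not from the thread): the same UTF-16 bytes decode to entirely different text depending on the assumed byte order, which is exactly what the Byte Order Mark is supposed to disambiguate.

```python
raw = "hi".encode("utf-16-le")   # b'h\x00i\x00', no BOM
print(raw.decode("utf-16-le"))   # hi
print(raw.decode("utf-16-be"))   # 栀椀 -- same bytes, wrong byte order
bom = "hi".encode("utf-16")[:2]  # the generic codec prepends a BOM
print(bom in (b"\xff\xfe", b"\xfe\xff"))  # True
```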
Re: First Impressions!
On Friday, 1 December 2017 at 23:16:45 UTC, H. S. Teoh wrote: On Fri, Dec 01, 2017 at 03:04:44PM -0800, Walter Bright via Digitalmars-d wrote: On 11/30/2017 9:23 AM, Kagamin wrote: > On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki > cattermole wrote: > > Be aware Microsoft is alone in thinking that UTF-16 was > > awesome. Everybody else standardized on UTF-8 for Unicode. > > UCS2 was awesome. UTF-16 is used by Java, JavaScript, > Objective-C, Swift, Dart and ms tech, which is 28% of tiobe > index. "was" :-) Those are pretty much pre-surrogate pair designs, or based on them (Dart compiles to JavaScript, for example). UCS2 has serious problems: 1. Most strings are in ascii, meaning UCS2 doubles memory consumption. Strings in the executable file are twice the size. This is not true in Asia, esp. where the CJK block is extensively used. A CJK block character is 3 bytes in UTF-8, meaning that string sizes are 150% of the UCS2 encoding. If your code contains a lot of CJK text, that's a lot of bloat. Yep, that's why five years back many of the major Chinese sites were still not using UTF-8: http://xahlee.info/w/what_encoding_do_chinese_websites_use.html That led that Chinese guy to also rant against UTF-8 a couple years ago: http://xahlee.info/comp/unicode_utf8_encoding_propaganda.html Considering China buys more smartphones than the US and Europe combined, it's time people started recognizing their importance when it comes to issues like this: https://www.statista.com/statistics/412108/global-smartphone-shipments-global-region/ Regarding the unique representation issue Jonathan brings up, I've heard people say that was to provide an easier path for legacy encodings, ie some used combining characters and others didn't, so Unicode chose to accommodate both so both groups would move to Unicode. It would be nice if the Unicode people spent their time pruning and regularizing what they have, rather than adding more useless stuff. 
Speaking of which, completely agree with Walter and Jonathan that there's no need to add emoji and other such symbols to Unicode, should have never been added. Unicode is supposed to standardize long-existing characters, not promote marginal new symbols to characters. If there's a real need for it, chat software will figure out a way to do it, no need to add such symbols to the Unicode character set.
Re: First Impressions!
On Saturday, 2 December 2017 at 12:25:22 UTC, codephantom wrote: Do the people on the unicode consortium consider such communication to be invalid? https://splinternews.com/violent-emoji-are-starting-to-get-people-in-trouble-wit-1793845130 On the other hand try to google "emoji sexual"…
Re: First Impressions!
On Saturday, 2 December 2017 at 10:20:10 UTC, Walter Bright wrote: On 12/1/2017 8:08 PM, Jonathan M Davis wrote: [...] Yup. I've presented that point of view a couple times on HackerNews, and some Unicode people took umbrage at that. The case they presented fell a little flat. [...] Where it gets really fun is when there is color composition for emoticons U+1F466 = 👦 U+1F466 U+1F3FF = 👦🏿
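The composition above can be checked in any Unicode-aware language; a Python illustration: the dark-skin-tone boy is not a single code point but the base emoji followed by an emoji modifier, so code that assumes one code point per visible character breaks here too.

```python
import unicodedata

boy  = "\U0001F466"            # BOY
tone = "\U0001F3FF"            # skin-tone modifier
combo = boy + tone             # renders as one glyph where fonts support it
print(len(combo))              # 2 code points for one visible character
print(unicodedata.name(tone))  # EMOJI MODIFIER FITZPATRICK TYPE-6
```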
Re: First Impressions!
On Saturday, 2 December 2017 at 04:08:54 UTC, Jonathan M Davis wrote: code points. Emojis are specifically representable by a sequence of existing characters (usually ASCII), because they came from folks trying to represent pictures with text. They are used as symbols culturally, which is how written languages happen, so I think the real question is whether they have just implemented the ones that have become widespread over a long period of time or whether they have deliberately created completely new ones... It makes sense for the most used ones. E.g. I don't want "8-(3+4)" to render as "😳3+4" ;-) There is also a difference between Ø and ∅, because the meaning is different. Too bad the same does not apply to arrows (math vs non-math usage). So yeah, they could do better, but it's not too bad. If something is widely used in a way that gives signs a different meaning, then it makes sense to introduce a new symbol for it, so that one can both render them slightly differently and so that programs can interpret them correctly.
Re: First Impressions!
On Saturday, 2 December 2017 at 04:08:54 UTC, Jonathan M Davis wrote: The fact that they're then trying to put those pictures into the Unicode standard just blatantly shows that the Unicode folks have lost sight of what they're up to. It's like if they started trying to add Unicode characters for words. It makes no sense. But unfortunately, we just have to live with it... :( - Jonathan M Davis The real problem is that sometimes people don't feel like a little cat with a smiling face. Sometimes, people actually get pissed off at something, and would like to express it. Do the people on the unicode consortium consider such communication to be invalid? Where are the emojis for saying.. I'm pissed off at this.. or that.. (unicode consortium == emoji censorship) https://www.google.com.au/search?q=fuck+you+emoticon&source=lnms&tbm=isch&sa=X&ved=0ahUKEwiWkMzMpOvXAhWIj5QKHVnGC5YQ_AUICigB&biw=1536&bih=736
Re: First Impressions!
On Saturday, 2 December 2017 at 10:35:50 UTC, Patrick Schluter wrote: On Friday, 1 December 2017 at 23:16:45 UTC, H. S. Teoh wrote: [...] That's true in theory; in practice it's not that severe, as the CJK languages are never isolated and appear embedded in a lot of ASCII. You can read here a case study [1] which shows 106% for Simplified Chinese, 76% for Traditional Chinese, 129% for Japanese and 94% for Korean. These numbers are for pure text. Correction: 106% for Korean; I copied the wrong column. Traditional Chinese was smaller, probably because of whitespace. Publish it on the web embedded in bloated html and there goes the size advantage of UTF-16 [...]
Re: First Impressions!
On Friday, 1 December 2017 at 23:16:45 UTC, H. S. Teoh wrote: On Fri, Dec 01, 2017 at 03:04:44PM -0800, Walter Bright via Digitalmars-d wrote: On 11/30/2017 9:23 AM, Kagamin wrote: > On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki > cattermole wrote: > > Be aware Microsoft is alone in thinking that UTF-16 was > > awesome. Everybody else standardized on UTF-8 for Unicode. > > UCS2 was awesome. UTF-16 is used by Java, JavaScript, > Objective-C, Swift, Dart and ms tech, which is 28% of tiobe > index. "was" :-) Those are pretty much pre-surrogate pair designs, or based on them (Dart compiles to JavaScript, for example). UCS2 has serious problems: 1. Most strings are in ascii, meaning UCS2 doubles memory consumption. Strings in the executable file are twice the size. This is not true in Asia, esp. where the CJK block is extensively used. A CJK block character is 3 bytes in UTF-8, meaning that string sizes are 150% of the UCS2 encoding. If your code contains a lot of CJK text, that's a lot of bloat. That's true in theory; in practice it's not that severe, as the CJK languages are never isolated and appear embedded in a lot of ASCII. You can read here a case study [1] which shows 106% for Simplified Chinese, 76% for Traditional Chinese, 129% for Japanese and 94% for Korean. These numbers are for pure text. Publish it on the web embedded in bloated html and there goes the size advantage of UTF-16. But then again, in non-Latin locales you'd generally store your strings separately from the executable (usually in l10n files), so this may not be that big an issue. But the blanket statement "Most strings are in ASCII" is not correct. False, in the sense that isolated pure text is rare and is generally delivered inside some file format, most times ASCII based like docx, odf, tmx, xliff, akoma ntoso etc... [1]: https://stackoverflow.com/questions/6883434/at-all-times-text-encoded-in-utf-8-will-never-give-us-more-than-a-50-file-size
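The size ratios being debated are easy to reproduce; a quick Python check with made-up sample strings (the exact numbers depend on the text chosen, unlike the corpus-based case study above):

```python
samples = {
    "English":  "The quick brown fox jumps over the lazy dog",
    "Chinese":  "统一码是一种字符编码标准",
    "Japanese": "ウィキペディアへようこそ",
}
for lang, text in samples.items():
    u8, u16 = len(text.encode("utf-8")), len(text.encode("utf-16-le"))
    # BMP CJK characters cost 3 bytes in UTF-8 but 2 in UTF-16 (ratio 1.5);
    # pure ASCII is the reverse (ratio 0.5).
    print(f"{lang:9} utf-8={u8:3}B utf-16={u16:3}B ratio={u8 / u16:.2f}")
```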
Re: First Impressions!
On 2017-12-02 11:02, Walter Bright wrote: Are you sure about that? I know that Asian languages will be longer in UTF-8. But how much data that programs handle is in those languages? The language of business, science, programming, aviation, and engineering is english. Not necessarily. I've seen code in non-English languages, i.e. when the identifiers are non-English. But of course, most programming languages will use English for keywords and built-in functions. -- /Jacob Carlborg
Re: First Impressions!
On 12/1/2017 8:08 PM, Jonathan M Davis wrote: And personally, I think that their worst decisions tend to be at the code point level (e.g. having the same character being representable by different combinations of code points). Yup. I've presented that point of view a couple times on HackerNews, and some Unicode people took umbrage at that. The case they presented fell a little flat. Quite possibly the most depressing thing that I've run into with Unicode though was finding out that emojis had their own code points. Emojis are specifically representable by a sequence of existing characters (usually ASCII), because they came from folks trying to represent pictures with text. The fact that they're then trying to put those pictures into the Unicode standard just blatantly shows that the Unicode folks have lost sight of what they're up to. It's like if they started trying to add Unicode characters for words. It makes no sense. But unfortunately, we just have to live with it... :( Yah, I've argued against that, too. And those "international" icons are arguably one of the dumber ideas to ever sweep the world, yet they seem to be celebrated without question. Have you ever tried to look up an icon in a dictionary? It doesn't work. So if you don't know what an icon means, you're hosed. If it is a word you don't understand, you can look it up in a dictionary. Furthermore, you don't need to know English to know what "ON" means. There is no more cognitive difficulty asking someone what "ON" means than there is asking what "|" means. Is an illiterate person from XxLand really going to understand that "|" means "ON" without help? My car has a bunch of emoticons labeling the controls. I can't figure out what any of them do without reading the manual, or just pushing random buttons until what I want happens. One button has an icon on it that looks like a snowflake. What does that do? Turn on the A/C? Defrost the frosty windows? Set the AWD in slippery mode?
Turn on the Christmas lights? On my pre-madness truck, they're labeled in English. Never had any trouble with that. Part of the problem I've seen is that people do things like "vote for my emoji/icon and I'll vote for yours!" And then when they get something accepted, they wear it as a badge of status and write articles saying how you, too, can get your whatever accepted as an icon. It's madness, madness I say!
Re: First Impressions!
On 12/1/2017 3:16 PM, H. S. Teoh wrote: This is not true in Asia, esp. where the CJK block is extensively used. A CJK block character is 3 bytes in UTF-8, meaning that string sizes are 150% of the UCS2 encoding. If your code contains a lot of CJK text, that's a lot of bloat. But then again, in non-Latin locales you'd generally store your strings separately of the executable (usually in l10n files), so this may not be that big an issue. But the blanket statement "Most strings are in ASCII" is not correct. Are you sure about that? I know that Asian languages will be longer in UTF-8. But how much data that programs handle is in those languages? The language of business, science, programming, aviation, and engineering is english. Of course, D itself is agnostic about that. The compiler, for example, accepts strings, identifiers, and comments in Chinese in UTF-16 format.
Re: First Impressions!
On Friday, December 01, 2017 15:54:31 Walter Bright via Digitalmars-d wrote: > On 11/30/2017 9:56 AM, Jonathan M Davis wrote: > > I'm sure that we could come up with a better encoding than UTF-8 (e.g. > > getting rid of Unicode normalization as being a thing and never having > > multiple encodings for the same character), but _that_'s never going to > > happen. > > UTF-8 is not the cause of that particular problem, it's caused by the > Unicode committee being a committee. Other Unicode problems are caused by > the committee trying to add semantic information to code points, which > causes nothing but problems. I.e. the committee forgot that Unicode is a > character set, and nothing more. Oh, definitely. UTF-8 is arguably the best that Unicode has, but Unicode in general is what's broken, because the folks designing it made poor choices. And personally, I think that their worst decisions tend to be at the code point level (e.g. having the same character being representable by different combinations of code points). Quite possibly the most depressing thing that I've run into with Unicode though was finding out that emojis had their own code points. Emojis are specifically representable by a sequence of existing characters (usually ASCII), because they came from folks trying to represent pictures with text. The fact that they're then trying to put those pictures into the Unicode standard just blatantly shows that the Unicode folks have lost sight of what they're up to. It's like if they started trying to add Unicode characters for words. It makes no sense. But unfortunately, we just have to live with it... :( - Jonathan M Davis
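The "same character, multiple encodings" problem Jonathan describes is Unicode normalization; a minimal Python illustration:

```python
import unicodedata

precomposed = "\u00e9"    # é as a single code point
decomposed  = "e\u0301"   # e + COMBINING ACUTE ACCENT
print(precomposed == decomposed)          # False, yet they render identically
print(len(precomposed), len(decomposed))  # 1 2
# Comparing user-visible text therefore requires normalizing first:
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```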
Re: First Impressions!
On 11/30/2017 9:56 AM, Jonathan M Davis wrote: I'm sure that we could come up with a better encoding than UTF-8 (e.g. getting rid of Unicode normalization as being a thing and never having multiple encodings for the same character), but _that_'s never going to happen. UTF-8 is not the cause of that particular problem, it's caused by the Unicode committee being a committee. Other Unicode problems are caused by the committee trying to add semantic information to code points, which causes nothing but problems. I.e. the committee forgot that Unicode is a character set, and nothing more.
Re: First Impressions!
On Fri, Dec 01, 2017 at 03:04:44PM -0800, Walter Bright via Digitalmars-d wrote: > On 11/30/2017 9:23 AM, Kagamin wrote: > > On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole wrote: > > > Be aware Microsoft is alone in thinking that UTF-16 was awesome. > > > Everybody else standardized on UTF-8 for Unicode. > > > > UCS2 was awesome. UTF-16 is used by Java, JavaScript, Objective-C, > > Swift, Dart and ms tech, which is 28% of tiobe index. > > "was" :-) Those are pretty much pre-surrogate pair designs, or based > on them (Dart compiles to JavaScript, for example). > > UCS2 has serious problems: > > 1. Most strings are in ascii, meaning UCS2 doubles memory consumption. > Strings in the executable file are twice the size. This is not true in Asia, esp. where the CJK block is extensively used. A CJK block character is 3 bytes in UTF-8, meaning that string sizes are 150% of the UCS2 encoding. If your code contains a lot of CJK text, that's a lot of bloat. But then again, in non-Latin locales you'd generally store your strings separately from the executable (usually in l10n files), so this may not be that big an issue. But the blanket statement "Most strings are in ASCII" is not correct. T -- Bare foot: (n.) A device for locating thumb tacks on the floor.
Re: First Impressions!
On 11/30/2017 9:23 AM, Kagamin wrote: On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole wrote: Be aware Microsoft is alone in thinking that UTF-16 was awesome. Everybody else standardized on UTF-8 for Unicode. UCS2 was awesome. UTF-16 is used by Java, JavaScript, Objective-C, Swift, Dart and ms tech, which is 28% of tiobe index. "was" :-) Those are pretty much pre-surrogate pair designs, or based on them (Dart compiles to JavaScript, for example). UCS2 has serious problems: 1. Most strings are in ascii, meaning UCS2 doubles memory consumption. Strings in the executable file are twice the size. 2. The code doesn't work well with C. C doesn't even have a UCS2 type. 3. There's no reasonable way to audit the code to see if it handles surrogate pairs correctly. Surrogate pairs occur only rarely, so the code is never tested for it, and the bugs may remain latent for many, many years. With UTF8, multibyte code points are much more common, so bugs are detected much earlier.
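Point 3 above is easy to demonstrate; a Python sketch of what a surrogate pair looks like at the code-unit level (the pair arithmetic is fixed by the UTF-16 spec):

```python
ch = "\U0001F4A9"                 # a code point outside the BMP
units = ch.encode("utf-16-le")
print(len(units) // 2)            # 2 code units for one code point
hi = int.from_bytes(units[0:2], "little")
lo = int.from_bytes(units[2:4], "little")
print(hex(hi), hex(lo))           # 0xd83d 0xdca9 -- high and low surrogate
# Pre-surrogate UCS2 code sees these as two separate "characters",
# which is exactly the class of latent bug described above.
```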
Re: First Impressions!
On Friday, 1 December 2017 at 18:31:46 UTC, Jonathan M Davis wrote: On Friday, December 01, 2017 09:49:08 Steven Schveighoffer via Digitalmars-d wrote: On 12/1/17 7:26 AM, Patrick Schluter wrote: > On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter > wrote: >> isolated codepoints. > > I meant isolated code-units, of course. Hehe, it's impossible for me to talk about code points and code units without having to pause and consider which one I mean :) What, you mean that Unicode can be confusing? No way! ;) LOL. I have to be careful with that too. What bugs me even more though is that the Unicode spec talks about code points being characters, and then talks about combining characters for grapheme clusters - and this in spite of the fact that what most people would consider a character is a grapheme cluster and _not_ a code point. But they presumably had to come up with new terms for a lot of this nonsense, and that's not always easy. Regardless, what they came up with is complicated enough that it's arguably a miracle whenever a program actually handles Unicode text 100% correctly. :| - Jonathan M Davis And dealing with that complexity can often introduce bugs in its own right, because it's hard to get right. That's why sometimes it's easier just to simplify things and exclude certain ways of looking at the string.
Re: First Impressions!
On Friday, December 01, 2017 09:49:08 Steven Schveighoffer via Digitalmars-d wrote: > On 12/1/17 7:26 AM, Patrick Schluter wrote: > > On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter wrote: > >> isolated codepoints. > > > > I meant isolated code-units, of course. > > Hehe, it's impossible for me to talk about code points and code units > without having to pause and consider which one I mean :) What, you mean that Unicode can be confusing? No way! ;) LOL. I have to be careful with that too. What bugs me even more though is that the Unicode spec talks about code points being characters, and then talks about combining characters for grapheme clusters - and this in spite of the fact that what most people would consider a character is a grapheme cluster and _not_ a code point. But they presumably had to come up with new terms for a lot of this nonsense, and that's not always easy. Regardless, what they came up with is complicated enough that it's arguably a miracle whenever a program actually handles Unicode text 100% correctly. :| - Jonathan M Davis
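The three levels of "length" being discussed can be shown in a few lines of Python (the stdlib counts code units and code points; grapheme clustering per UAX #29 needs a third-party library, so it is only noted in a comment):

```python
s = "e\u0301\U0001F4A9"   # 'e' + combining acute accent + one astral emoji
print(len(s))                            # 3 code points
print(len(s.encode("utf-16-le")) // 2)   # 4 UTF-16 code units
print(len(s.encode("utf-8")))            # 7 UTF-8 code units (bytes)
# What a reader would call "characters" (grapheme clusters) is 2:
# the accented 'e' and the emoji.
```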
Re: First Impressions!
On 12/1/17 7:26 AM, Patrick Schluter wrote: On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter wrote: isolated codepoints. I meant isolated code-units, of course. Hehe, it's impossible for me to talk about code points and code units without having to pause and consider which one I mean :) -Steve
Re: First Impressions!
On Friday, 1 December 2017 at 12:21:22 UTC, A Guy With a Question wrote: On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter wrote: On Thursday, 30 November 2017 at 19:37:47 UTC, Steven Schveighoffer wrote: On 11/30/17 1:20 PM, Patrick Schluter wrote: [...] iopipe handles this: http://schveiguy.github.io/iopipe/iopipe/textpipe/ensureDecodeable.html It was only to give an example. With UTF-8 people who implement the low level code in general think about the multiple codeunits at the buffer boundary. With UTF-16 it's often forgotten. In UTF-16 there are also 2 other common pitfalls, that exist also in UTF-8 but are less consciously acknowledged, overlong encoding and isolated codepoints. So UTF-16 has the same issues as UTF-8, plus some more, endianness and size. Most problems with UTF16 are applicable to UTF8. The only one that isn't is that, if you are just dealing with ASCII, it's a bit of a waste of space. That's what I said. UTF-16 and UTF-8 have the same issues, but UTF-16 has 2 more: endianness and bloat for ASCII. All 3 encodings have their pluses and minuses; that's why D supports all 3, but with a preference for utf-8.
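The overlong-encoding pitfall mentioned above deserves a concrete example: a two-byte sequence can smuggle in a code point that must be encoded in one byte, and a strict decoder (Python's is used here purely for illustration) has to reject it, since lenient decoders have historically enabled '/'-filter bypasses.

```python
# 0xC0 0xAF would decode to U+002F ('/') if overlong forms were allowed.
try:
    b"\xc0\xaf".decode("utf-8")
    print("accepted (broken decoder)")
except UnicodeDecodeError:
    print("rejected: overlong form")
print(b"/".decode("utf-8"))   # '/' has exactly one valid encoding: 0x2F
```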
Re: First Impressions!
On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter wrote: On Thursday, 30 November 2017 at 19:37:47 UTC, Steven Schveighoffer wrote: On 11/30/17 1:20 PM, Patrick Schluter wrote: [...] iopipe handles this: http://schveiguy.github.io/iopipe/iopipe/textpipe/ensureDecodeable.html It was only to give an example. With UTF-8 people who implement the low level code in general think about the multiple codeunits at the buffer boundary. With UTF-16 it's often forgotten. In UTF-16 there are also 2 other common pitfalls, that exist also in UTF-8 but are less consciously acknowledged, overlong encoding and isolated codepoints. So UTF-16 has the same issues as UTF-8, plus some more, endianness and size. I meant isolated code-units, of course.
Re: First Impressions!
On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter wrote: On Thursday, 30 November 2017 at 19:37:47 UTC, Steven Schveighoffer wrote: On 11/30/17 1:20 PM, Patrick Schluter wrote: On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M Davis wrote: English and thus don't as easily hit the cases where their code is wrong. For better or worse, UTF-16 hides it better than UTF-8, but the problem exists in both. To give just an example of what can go wrong with UTF-16. Reading a file in UTF-16 and converting it to something else like UTF-8 or UTF-32. Reading block by block and hitting exactly a SMP codepoint at the buffer limit, high surrogate at the end of the first buffer, low surrogate at the start of the next. If you don't think about it => 2 invalid characters instead of your nice poop 💩 emoji character (emojis are in the SMP and they are more and more frequent). iopipe handles this: http://schveiguy.github.io/iopipe/iopipe/textpipe/ensureDecodeable.html It was only to give an example. With UTF-8 people who implement the low level code in general think about the multiple codeunits at the buffer boundary. With UTF-16 it's often forgotten. In UTF-16 there are also 2 other common pitfalls, that exist also in UTF-8 but are less consciously acknowledged, overlong encoding and isolated codepoints. So UTF-16 has the same issues as UTF-8, plus some more, endianness and size. Most problems with UTF16 are applicable to UTF8. The only one that isn't is that, if you are just dealing with ASCII, it's a bit of a waste of space.
Re: First Impressions!
On Thursday, 30 November 2017 at 19:37:47 UTC, Steven Schveighoffer wrote: On 11/30/17 1:20 PM, Patrick Schluter wrote: On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M Davis wrote: English and thus don't as easily hit the cases where their code is wrong. For better or worse, UTF-16 hides it better than UTF-8, but the problem exists in both. To give just an example of what can go wrong with UTF-16. Reading a file in UTF-16 and converting it to something else like UTF-8 or UTF-32. Reading block by block and hitting exactly a SMP codepoint at the buffer limit, high surrogate at the end of the first buffer, low surrogate at the start of the next. If you don't think about it => 2 invalid characters instead of your nice poop 💩 emoji character (emojis are in the SMP and they are more and more frequent). iopipe handles this: http://schveiguy.github.io/iopipe/iopipe/textpipe/ensureDecodeable.html It was only to give an example. With UTF-8 people who implement the low level code in general think about the multiple codeunits at the buffer boundary. With UTF-16 it's often forgotten. In UTF-16 there are also 2 other common pitfalls, that exist also in UTF-8 but are less consciously acknowledged, overlong encoding and isolated codepoints. So UTF-16 has the same issues as UTF-8, plus some more, endianness and size.
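The buffer-boundary bug Patrick describes can be reproduced in a few lines; this Python sketch shows both the failure and the fix using the stdlib's incremental decoder (iopipe's ensureDecodeable plays the analogous buffering role in D):

```python
import codecs

# UTF-16-LE bytes for "a💩b"; the emoji is a surrogate pair (4 bytes).
data = "a\U0001F4A9b".encode("utf-16-le")
cut = 4  # the block boundary falls between the high and low surrogate

# Naive per-block decoding corrupts the character at the boundary:
naive = (data[:cut].decode("utf-16-le", errors="replace")
         + data[cut:].decode("utf-16-le", errors="replace"))
print(naive)   # 'a' + two replacement characters + 'b'

# An incremental decoder buffers the dangling high surrogate:
dec = codecs.getincrementaldecoder("utf-16-le")()
good = dec.decode(data[:cut]) + dec.decode(data[cut:], final=True)
print(good)    # a💩b
```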
Re: First Impressions!
On 11/30/2017 5:22 AM, A Guy With a Question wrote: It's also worth mentioning that the more I think about it, the UTF8 vs. UTF16 thing was probably not worth mentioning with the rest of the things I listed out. It's pretty minor and more of a preference. Both Windows and Java selected UTF16 before surrogates were added, so it was a reasonable decision made in good faith. But an awful lot of Windows/Java code has latent bugs in it because of not dealing with surrogates. D is designed from the ground up to work smoothly with UTF8/UTF16 multi-codeunit encodings. If you do decide to use UTF16, please take advantage of this and deal with surrogates correctly. When you do decide to give up on UTF16 (!), your code will be easy to convert to UTF8.
Re: First Impressions!
On 11/30/17 1:20 PM, Patrick Schluter wrote: On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M Davis wrote: English and thus don't as easily hit the cases where their code is wrong. For better or worse, UTF-16 hides it better than UTF-8, but the problem exists in both. To give just an example of what can go wrong with UTF-16: reading a file in UTF-16 and converting it to something else like UTF-8 or UTF-32, reading block by block and hitting exactly an SMP codepoint at the buffer limit, high surrogate at the end of the first buffer, low surrogate at the start of the next. If you don't think about it => 2 invalid characters instead of your nice poop 💩 emoji character (emojis are in the SMP and they are more and more frequent). iopipe handles this: http://schveiguy.github.io/iopipe/iopipe/textpipe/ensureDecodeable.html -Steve
Re: First Impressions!
On Thursday, November 30, 2017 18:32:46 A Guy With a Question via Digitalmars-d wrote: > On Thursday, 30 November 2017 at 17:56:58 UTC, Jonathan M Davis > > wrote: > > On Thursday, November 30, 2017 03:37:37 Walter Bright via > > Digitalmars-d wrote: > > Language-wise, I think that most of the UTF-16 is driven by the > > fact that Java went with UCS-2 / UTF-16, and C# followed them > > (both because they were copying Java and because the Win32 API > > had gone with UCS-2 / UTF-16). So, that's had a lot of > > influence on folks, though most others have gone with UTF-8 for > > backwards compatibility and because it typically takes up less > > space for non-Asian text. But the use of UTF-16 in Windows, > > Java, and C# does seem to have resulted in some folks thinking > > that wide characters means Unicode, and narrow characters > > meaning ASCII. > > > > - Jonathan M Davis > > I think it also simplifies the logic. You are not always looking > to represent the codepoints symbolically. You are just trying to > see what information is in it. Therefore, if you can practically > treat a codepoint as the unit of data behind the scenes, it > simplifies the logic. Even if that were true, UTF-16 code units are not code points. If you want to operate on code points, you have to go to UTF-32. And even if you're at UTF-32, you have to worry about Unicode normalization, otherwise the same information can be represented differently even if all you care about is code points and not graphemes. And of course, some stuff really does care about graphemes, since those are the actual characters. Ultimately, you have to understand how code units, code points, and graphemes work and what you're doing with a particular algorithm so that you know at which level you should operate at and where the pitfalls are. Some code can operate on code units and be fine; some can operate on code points; and some can operate on graphemes. 
But there is no one-size-fits-all solution that makes it all magically easy and efficient to use. And UTF-16 does _nothing_ to improve any of this over UTF-8. It's just a different way to encode code points. And really, it makes things worse, because it usually takes up more space than UTF-8, and it makes it easier to miss when you screw up your Unicode handling, because more UTF-16 code units are valid code points than UTF-8 code units are, but they still aren't all valid code points. So, if you use UTF-8, you're more likely to catch your mistakes. Honestly, I think that the only good reason to use UTF-16 is if you're interacting with existing APIs that use UTF-16, and even then, I think that in most cases, you're better off using UTF-8 and converting to UTF-16 only when you have to. Strings eat less memory that way, and mistakes are more easily caught. And if you're writing cross-platform code in D, then Windows is really the only place that you're typically going to have to deal with UTF-16, so it definitely works better in general to favor UTF-8 in D programs. But regardless, at least D gives you the tools to deal with the different Unicode encodings relatively cleanly and easily, so you can use whichever Unicode encoding you need to. Most D code is going to use UTF-8 though. - Jonathan M Davis
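The code unit / code point distinction Jonathan draws can be made concrete by counting units at each level. A small illustration, sketched in Python because the arithmetic is identical in any language: the same three-code-point string has a different code unit count in each encoding, and only in UTF-32 do code units coincide with code points.

```python
s = "aÄ💩"  # ASCII char, Latin-1-range char, and an SMP emoji

assert len(s) == 3                           # 3 code points
assert len(s.encode("utf-8")) == 7           # UTF-8: 1 + 2 + 4 code units
assert len(s.encode("utf-16-le")) // 2 == 4  # UTF-16: 1 + 1 + 2 (surrogate pair)
assert len(s.encode("utf-32-le")) // 4 == 3  # UTF-32: code unit == code point
```

This is exactly why code written as if one UTF-16 code unit were one character works until the first SMP character shows up, while the same assumption in UTF-8 breaks on the first non-ASCII character and so gets caught earlier.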
Re: First Impressions!
On Thursday, 30 November 2017 at 17:56:58 UTC, Jonathan M Davis wrote: On Thursday, November 30, 2017 03:37:37 Walter Bright via Digitalmars-d wrote: Language-wise, I think that most of the UTF-16 is driven by the fact that Java went with UCS-2 / UTF-16, and C# followed them (both because they were copying Java and because the Win32 API had gone with UCS-2 / UTF-16). So, that's had a lot of influence on folks, though most others have gone with UTF-8 for backwards compatibility and because it typically takes up less space for non-Asian text. But the use of UTF-16 in Windows, Java, and C# does seem to have resulted in some folks thinking that wide characters means Unicode, and narrow characters meaning ASCII. - Jonathan M Davis I think it also simplifies the logic. You are not always looking to represent the codepoints symbolically. You are just trying to see what information is in it. Therefore, if you can practically treat a codepoint as the unit of data behind the scenes, it simplifies the logic.
Re: First Impressions!
On Thursday, 30 November 2017 at 17:56:58 UTC, Jonathan M Davis wrote: On Thursday, November 30, 2017 03:37:37 Walter Bright via Digitalmars-d wrote: On 11/30/2017 2:39 AM, Joakim wrote: > Java, .NET, Qt, Javascript, and a handful of others use > UTF-16 too, some starting off with the earlier UCS-2: > > https://en.m.wikipedia.org/wiki/UTF-16#Usage > > Not saying either is better, each has their flaws, just > pointing out it's more than just Windows. I stand corrected. I get the impression that the stuff that uses UTF-16 is mostly stuff that picked an encoding early on in the Unicode game and thought that they picked one that guaranteed that a code unit would be an entire character. I don't think that's true though. Haven't you always been able to combine two codepoints into one visual representation (Ä for example)? To me it's still two characters to look for when going through the string, but the UI or text interpreter might choose to combine them. So in certain domains, such as trying to visually represent the character, yes, a codepoint is not a character, if what you mean by character is the visual representation. But what we are referring to as a character can kind of morph depending on context. When you are running through the data in the algorithm behind the scenes, you care about the *information*, therefore the codepoint. And we really just have a semantics battle if someone calls that a character. Many of them picked UCS-2 and then switched later to UTF-16, but once they picked a 16-bit encoding, they were kind of stuck. Others - most notably C/C++ and the *nix world - picked UTF-8 for backwards compatibility, and once it became clear that UCS-2 / UTF-16 wasn't going to cut it for a code unit representing a character, most stuff that went Unicode went UTF-8. That's only because C used ASCII and thus a char was a byte. UTF-8 is in line with this, so literally nothing needs to change to get pretty much the same behavior. It makes sense.
With this in mind, it actually might make sense for D to use it.
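The Ä example above can be checked directly. A short sketch, in Python since the behavior comes from Unicode itself rather than any particular language: the precomposed character U+00C4 and the sequence A + U+0308 (combining diaeresis) render identically but are different code point sequences until normalized, which is why comparing strings by code point alone is not enough.

```python
import unicodedata

pre = "\u00c4"    # 'Ä' as a single precomposed code point
dec = "A\u0308"   # 'A' followed by COMBINING DIAERESIS: same glyph on screen

assert pre != dec
assert len(pre) == 1 and len(dec) == 2

# NFC composes, NFD decomposes; after normalization the two forms compare equal.
assert unicodedata.normalize("NFC", dec) == pre
assert unicodedata.normalize("NFD", pre) == dec
```

So even at the code point level, "two codepoints, one visual character" has been possible from the start; normalization is the machinery that papers over it.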
Re: First Impressions!
On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M Davis wrote: English and thus don't as easily hit the cases where their code is wrong. For better or worse, UTF-16 hides it better than UTF-8, but the problem exists in both. To give just an example of what can go wrong with UTF-16: reading a file in UTF-16 and converting it to something else like UTF-8 or UTF-32, reading block by block and hitting exactly an SMP codepoint at the buffer limit, high surrogate at the end of the first buffer, low surrogate at the start of the next. If you don't think about it => 2 invalid characters instead of your nice poop 💩 emoji character (emojis are in the SMP and they are more and more frequent).
Re: First Impressions!
On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M Davis wrote: [...] And if you're not dealing with Asian languages, UTF-16 uses up more space than UTF-8. Not even that in most cases. Only if you use unstructured text can it happen that UTF-16 needs less space than UTF-8. In most cases, the text is embedded in some sort of ML (html, odf, docx, tmx, xliff, akoma ntoso, etc...) which puts the balance again to the side of UTF-8.
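Patrick's point about markup is easy to quantify: tags and attributes are ASCII, so they double in size under UTF-16, while each CJK character only saves one byte. A rough illustration in Python with a made-up snippet (the exact byte counts depend on the text, but the shape of the trade-off doesn't):

```python
doc = '<p class="note">你好，世界</p>'  # 20 ASCII markup chars + 5 CJK chars

u8 = doc.encode("utf-8")
u16 = doc.encode("utf-16-le")

# ASCII markup: 20 bytes in UTF-8 vs 40 in UTF-16.
# CJK payload: 15 bytes in UTF-8 (3 each) vs 10 in UTF-16 (2 each).
assert len(u8) == 35
assert len(u16) == 50
assert len(u8) < len(u16)
```

Even for this CJK-heavy fragment, the ASCII overhead of the markup already tips the balance to UTF-8, which is Patrick's argument about HTML, XML, and friends.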
Re: First Impressions!
On Thursday, November 30, 2017 03:37:37 Walter Bright via Digitalmars-d wrote: > On 11/30/2017 2:39 AM, Joakim wrote: > > Java, .NET, Qt, Javascript, and a handful of others use UTF-16 too, some > > starting off with the earlier UCS-2: > > > > https://en.m.wikipedia.org/wiki/UTF-16#Usage > > > > Not saying either is better, each has their flaws, just pointing out > > it's more than just Windows. > > I stand corrected. I get the impression that the stuff that uses UTF-16 is mostly stuff that picked an encoding early on in the Unicode game and thought that they picked one that guaranteed that a code unit would be an entire character. Many of them picked UCS-2 and then switched later to UTF-16, but once they picked a 16-bit encoding, they were kind of stuck. Others - most notably C/C++ and the *nix world - picked UTF-8 for backwards compatibility, and once it became clear that UCS-2 / UTF-16 wasn't going to cut it for a code unit representing a character, most stuff that went Unicode went UTF-8. Language-wise, I think that most of the UTF-16 is driven by the fact that Java went with UCS-2 / UTF-16, and C# followed them (both because they were copying Java and because the Win32 API had gone with UCS-2 / UTF-16). So, that's had a lot of influence on folks, though most others have gone with UTF-8 for backwards compatibility and because it typically takes up less space for non-Asian text. But the use of UTF-16 in Windows, Java, and C# does seem to have resulted in some folks thinking that wide characters mean Unicode, and narrow characters mean ASCII. I really wish that everything would just go to UTF-8 and that UTF-16 would die, but that would just break too much code. And if we were willing to do that, I'm sure that we could come up with a better encoding than UTF-8 (e.g. getting rid of Unicode normalization as being a thing and never having multiple encodings for the same character), but _that_'s never going to happen. - Jonathan M Davis
Re: First Impressions!
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an Opinion wrote: - Attributes. I had another post in the Learn forum about attributes which was unfortunate. At first I was excited because it seems like on the surface it would help me write better code, but it gets a little tedious and tiresome to have to remember to decorate code with them. Then do it the C# way. There's choice. I think the better decision would be to not have the errors occur. Hehe, I'm not against living in an ideal world either. - Immutable. I'm not sure I fully understand it. On the surface it seemed like const but transitive. I tried having a method return an immutable value, but when I used it in my unit test I got some weird errors about objects not being able to return immutable (I forget the exact error...apologies). That's the point of a static type system: if you make a mistake, the code doesn't compile. +- Unicode support is good. Although I think D's string type should have probably been utf16 by default. Especially considering the utf module states: "UTF character support is restricted to '\u0000' <= character <= '\U0010FFFF'." Seems like the natural fit for me. UTF-16 is inadequate for the range '\u0000' <= character <= '\U0010FFFF', though. UCS2 was adequate (for '\u0000' <= character <= '\uFFFF'), but lost relevance. UTF-16 is only backward compatibility for early adopters of unicode based on UCS2. Plus for the vast majority of use cases I am pretty guaranteed a char = codepoint. That way only end users will be able to catch bugs in production systems. It's not the best strategy, is it? Text is often persistent data; how do you plan to fix a text handling bug when corruption has accumulated for years and spilled all over the place?
Re: First Impressions!
On Thursday, November 30, 2017 13:18:37 A Guy With a Question via Digitalmars-d wrote: > As long as you understand its limitations, I think most bugs can > be avoided. Where UTF16 breaks down is pretty well defined. > Also, super rare. I think UTF32 would be great too, but it seems > like just a waste of space 99% of the time. UTF8 isn't horrible, > I am not going to never use D because it uses UTF8 (that would be > silly). Especially when wstring also seems baked into the > language. However, it can complicate code because you pretty much > always have to assume character != codepoint outside of ASCII. I > can see a reasonable person arguing that forcing you to assume > character != code point is actually a good thing. And that is a > valid opinion. The reality of the matter is that if you want to write fully valid Unicode, then you have to understand the differences between code units, code points, and graphemes, and since it really doesn't make sense to operate at the grapheme level for everything (it would be terribly slow and is completely unnecessary for many algorithms), you pretty much have to come to accept that in the general case, you can't assume that something like a char represents an actual character, regardless of its encoding. UTF-8 vs UTF-16 doesn't change anything in that respect except for the fact that there are more characters which fit fully in a UTF-16 code unit than a UTF-8 code unit, so it's easier to think that you're correctly handling Unicode when you actually aren't. And if you're not dealing with Asian languages, UTF-16 uses up more space than UTF-8. But either way, they're both wrong if you're trying to treat a code unit as a code point, let alone a grapheme. It's just that we have a lot of programmers who only deal with English and thus don't as easily hit the cases where their code is wrong. For better or worse, UTF-16 hides it better than UTF-8, but the problem exists in both. - Jonathan M Davis
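To make the grapheme level concrete: one user-perceived character can span several code points, so no fixed-width code unit saves you. A small illustration in Python using a ZWJ emoji sequence (Python's len counts code points; proper grapheme segmentation per UAX #29 needs a third-party library, which is exactly Jonathan's point about it being a separate, heavier level):

```python
# A family emoji built from three emoji joined by ZERO WIDTH JOINERs:
# one grapheme on screen, five code points underneath.
family = "\U0001f468\u200d\U0001f469\u200d\U0001f467"  # 👨‍👩‍👧

assert len(family) == 5                           # code points
assert len(family.encode("utf-16-le")) // 2 == 8  # UTF-16 code units (3 pairs + 2 ZWJ)
assert len(family.encode("utf-8")) == 18          # UTF-8 code units (3*4 + 2*3)
```

Whichever encoding you pick, "one unit == one character" fails here; the only question is how soon your data makes the failure visible.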
Re: First Impressions!
On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole wrote: Be aware Microsoft is alone in thinking that UTF-16 was awesome. Everybody else standardized on UTF-8 for Unicode. UCS2 was awesome. UTF-16 is used by Java, JavaScript, Objective-C, Swift, Dart and ms tech, which is 28% of tiobe index.
Re: First Impressions!
On Tuesday, 28 November 2017 at 16:14:52 UTC, Jack Stouffer wrote: you can apply attributes to your whole project by adding them to main void main(string[] args) @safe {} Although this isn't recommended, as almost no program can be completely safe. In fact I believe it is. When you have something unsafe you can manually wrap it with @trusted. Same goes with nothrow, since you can catch everything thrown. But putting @nogc on main is of course not recommended except in special cases, and pure is completely out of the question.
Re: First Impressions!
On Thursday, 30 November 2017 at 11:41:09 UTC, Walter Bright wrote: On 11/30/2017 2:47 AM, Nicholas Wilson wrote: As far as I can tell, pretty much the only users of UTF16 are Windows programs. Everyone else uses UTF8 or UCS32. I assume you meant UTF32 not UCS32, given UCS2 is Microsoft's half-assed UTF16. I meant UCS-4, which is identical to UTF-32. It's hard keeping all that stuff straight. Sigh. https://en.wikipedia.org/wiki/UTF-32 It's also worth mentioning that the more I think about it, the UTF8 vs. UTF16 thing was probably not worth mentioning with the rest of the things I listed out. It's pretty minor and more of a preference.
Re: First Impressions!
On Thursday, 30 November 2017 at 10:19:18 UTC, Walter Bright wrote: On 11/27/2017 7:01 PM, A Guy With an Opinion wrote: +- Unicode support is good. Although I think D's string type should have probably been utf16 by default. Especially considering the utf module states: "UTF character support is restricted to '\u0000' <= character <= '\U0010FFFF'." Seems like the natural fit for me. Plus for the vast majority of use cases I am pretty guaranteed a char = codepoint. Not the biggest issue in the world and maybe I'm just being overly critical here. Sooner or later your code will exhibit bugs if it assumes that char==codepoint with UTF16, because of surrogate pairs. https://stackoverflow.com/questions/5903008/what-is-a-surrogate-pair-in-java As far as I can tell, pretty much the only users of UTF16 are Windows programs. Everyone else uses UTF8 or UCS32. I recommend using UTF8. As long as you understand its limitations, I think most bugs can be avoided. Where UTF16 breaks down is pretty well defined. Also, super rare. I think UTF32 would be great too, but it seems like just a waste of space 99% of the time. UTF8 isn't horrible, I am not going to never use D because it uses UTF8 (that would be silly). Especially when wstring also seems baked into the language. However, it can complicate code because you pretty much always have to assume character != codepoint outside of ASCII. I can see a reasonable person arguing that forcing you to assume character != code point is actually a good thing. And that is a valid opinion.
Re: First Impressions!
On 11/30/2017 2:47 AM, Nicholas Wilson wrote: As far as I can tell, pretty much the only users of UTF16 are Windows programs. Everyone else uses UTF8 or UCS32. I assume you meant UTF32 not UCS32, given UCS2 is Microsoft's half-assed UTF16. I meant UCS-4, which is identical to UTF-32. It's hard keeping all that stuff straight. Sigh. https://en.wikipedia.org/wiki/UTF-32
Re: First Impressions!
On 11/30/2017 2:39 AM, Joakim wrote: Java, .NET, Qt, Javascript, and a handful of others use UTF-16 too, some starting off with the earlier UCS-2: https://en.m.wikipedia.org/wiki/UTF-16#Usage Not saying either is better, each has their flaws, just pointing out it's more than just Windows. I stand corrected.
Re: First Impressions!
On Thursday, 30 November 2017 at 10:19:18 UTC, Walter Bright wrote: On 11/27/2017 7:01 PM, A Guy With an Opinion wrote: [...] Sooner or later your code will exhibit bugs if it assumes that char==codepoint with UTF16, because of surrogate pairs. https://stackoverflow.com/questions/5903008/what-is-a-surrogate-pair-in-java As far as I can tell, pretty much the only users of UTF16 are Windows programs. Everyone else uses UTF8 or UCS32. I recommend using UTF8. I assume you meant UTF32 not UCS32, given UCS2 is Microsoft's half-assed UTF16.
Re: First Impressions!
On Thursday, 30 November 2017 at 10:19:18 UTC, Walter Bright wrote: On 11/27/2017 7:01 PM, A Guy With an Opinion wrote: +- Unicode support is good. Although I think D's string type should have probably been utf16 by default. Especially considering the utf module states: "UTF character support is restricted to '\u0000' <= character <= '\U0010FFFF'." Seems like the natural fit for me. Plus for the vast majority of use cases I am pretty guaranteed a char = codepoint. Not the biggest issue in the world and maybe I'm just being overly critical here. Sooner or later your code will exhibit bugs if it assumes that char==codepoint with UTF16, because of surrogate pairs. https://stackoverflow.com/questions/5903008/what-is-a-surrogate-pair-in-java As far as I can tell, pretty much the only users of UTF16 are Windows programs. Everyone else uses UTF8 or UCS32. I recommend using UTF8. Java, .NET, Qt, Javascript, and a handful of others use UTF-16 too, some starting off with the earlier UCS-2: https://en.m.wikipedia.org/wiki/UTF-16#Usage Not saying either is better, each has their flaws, just pointing out it's more than just Windows.
Re: First Impressions!
On 11/27/2017 7:01 PM, A Guy With an Opinion wrote: +- Unicode support is good. Although I think D's string type should have probably been utf16 by default. Especially considering the utf module states: "UTF character support is restricted to '\u0000' <= character <= '\U0010FFFF'." Seems like the natural fit for me. Plus for the vast majority of use cases I am pretty guaranteed a char = codepoint. Not the biggest issue in the world and maybe I'm just being overly critical here. Sooner or later your code will exhibit bugs if it assumes that char==codepoint with UTF16, because of surrogate pairs. https://stackoverflow.com/questions/5903008/what-is-a-surrogate-pair-in-java As far as I can tell, pretty much the only users of UTF16 are Windows programs. Everyone else uses UTF8 or UCS32. I recommend using UTF8.
Re: First Impressions!
On Tuesday, 28 November 2017 at 22:08:48 UTC, Mike Parker wrote: On Tuesday, 28 November 2017 at 19:39:19 UTC, Michael V. Franklin wrote: This DIP is related (https://github.com/dlang/DIPs/blob/master/DIPs/DIP1012.md) but I don't know what's happening with it. It's awaiting formal review. I'll move it forward when the formal review queue clears out a bit. How well does phobos play with it? I'm finding, for instance, that it's not playing too well with nothrow. Things throw and I don't understand why.
Re: First Impressions!
On Tuesday, 28 November 2017 at 19:39:19 UTC, Michael V. Franklin wrote: This DIP is related (https://github.com/dlang/DIPs/blob/master/DIPs/DIP1012.md) but I don't know what's happening with it. It's awaiting formal review. I'll move it forward when the formal review queue clears out a bit.
Re: First Impressions!
On Tuesday, 28 November 2017 at 19:34:27 UTC, A Guy With an Opinion wrote: I take it adding those inverse attributes is no trivial thing? Technically, it is extremely trivial. Politically, that's a different matter. There's been arguments before about the words or the syntax (is it "@gc" or "@nogc(false)", for example? tbh i think the latter is kinda elegant, but the former works too, i just want something that works) and the process (so much paperwork!) and all kinds of nonsense.
Re: First Impressions!
On Tuesday, 28 November 2017 at 19:34:27 UTC, A Guy With an Opinion wrote: I take it adding those inverse attributes is no trivial thing? It would require a DIP: https://github.com/dlang/DIPs This DIP is related (https://github.com/dlang/DIPs/blob/master/DIPs/DIP1012.md) but I don't know what's happening with it. Mike
Re: First Impressions!
On Tuesday, 28 November 2017 at 16:24:56 UTC, Adam D. Ruppe wrote: That doesn't quite work since it doesn't descend into aggregates. And you can't turn most them off. I take it adding those inverse attributes is no trivial thing?
Re: First Impressions!
On 2017-11-28 17:24, Adam D. Ruppe wrote: That doesn't quite work since it doesn't descend into aggregates. And you can't turn most them off. And if your project is a library. -- /Jacob Carlborg
Re: First Impressions!
On Tuesday, 28 November 2017 at 04:19:40 UTC, A Guy With an Opinion wrote: On Tuesday, 28 November 2017 at 04:17:18 UTC, A Guy With an Opinion wrote: [...] Also, C and C++ didn't just have undefined behavior; sometimes they had inconsistent behavior. Sometimes int a; is actually set to 0. It's only auto variables that are undefined. Statics and module-level variables (aka globals) are defined.
Re: First Impressions!
On Tuesday, 28 November 2017 at 04:17:18 UTC, A Guy With an Opinion wrote: On Tuesday, 28 November 2017 at 04:12:14 UTC, ketmar wrote: A Guy With an Opinion wrote: That is true, but I'm still unconvinced that making the person's program likely to error is better than initializing a number to 0. Zero is such a fundamental default for so many things. And it would be consistent with the other number types. basically, default initializers aren't meant to give a "usable value", they're meant to give a *defined* value, so we don't have UB. that is, just initialize your variables explicitly, don't rely on defaults. writing: int a; a += 42; is still bad code, even if you know that `a` is guaranteed to be zero. int a = 0; a += 42; is the "right" way to write it. if you look at default values from this PoV, you'll see that NaN makes more sense than zero. if there was a NaN for ints, ints would be inited with it too. ;-) Eh...I still don't agree. I think C and C++ just gave that style of coding a bad rap due to the undefined behavior. But the issue is it was undefined behavior. A lot of language features aim to make things well defined and have less verbose representations. Once a language matures, that's what a big portion of its newer features become: less verbose shortcuts for commonly done things. I agree it's important that it's well defined, I'm just thinking it should be a value that someone actually wants some notable fraction of the time, not something no one wants ever. I could be persuaded, but so far I'm not drinking the koolaid on that. It's not the end of the world, I was just confused when my float was NaN. Just a little anecdote from a maintainer of a legacy project in C. My predecessors in that project had the habit of systematically initializing any auto-declared variable at the beginning of a function.
The code base was initiated in the early '90s and written by people who were typical BASIC programmers, so the consequence was that functions were very often hundreds of lines long and they all started with a lot of declarations. In my years of reviewing that code, I was really surprised by how often I found bugs because the variables had been wrongly initialised. By initialising with 0 or NULL, the data flow pass was essentially suppressed at the start, so it could not detect when variables were used before they had been properly populated with the right values the functionality required. The thing with these kinds of bugs was that they were very subtle. To make it short, 0 is an arbitrary number that often is the right value, but when it isn't, it can be a pain to detect that it was the wrong value.
Re: First Impressions!
On Tuesday, 28 November 2017 at 16:14:52 UTC, Jack Stouffer wrote: You can do it on a per-file basis by putting the attributes at the top like so That doesn't quite work since it doesn't descend into aggregates. And you can't turn most them off.
Re: First Impressions!
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an Opinion wrote: - Attributes. I had another post in the Learn forum about attributes which was unfortunate. At first I was excited because it seems like on the surface it would help me write better code, but it gets a little tedious and tiresome to have to remember to decorate code with them. It seems like most of them should have been the defaults. I would have preferred if the compiler helped me and reminded me. I asked if there was a way to enforce them globally, which I guess there is, but I guess there's also not a way to turn some of them off afterwards. A bit unfortunate. But at least I can see some solutions to this. Attributes were one of my biggest hurdles when working on my own projects. For example, it's a huge PITA when you have to add a debug writeln deep down in your call stack, and it ends up violating a bunch of function attributes further up. Thankfully, wrapping statements in debug {} allows you to ignore pure and @safe violations in that code if you compile with the flag -debug. Also, you can apply attributes to your whole project by adding them to main void main(string[] args) @safe {} Although this isn't recommended, as almost no program can be completely safe. You can do it on a per-file basis by putting the attributes at the top like so @safe: pure:
Re: First Impressions!
On Tuesday, 28 November 2017 at 13:17:16 UTC, Steven Schveighoffer wrote: https://github.com/schveiguy/dcollections On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole wrote: https://github.com/economicmodeling/containers Thanks. I'll check both out. It's not that I don't want to write them, it's just I don't want to stop what I'm doing when I need them and write them. It takes me out of my thought process.
Re: First Impressions!
On Tuesday, 28 November 2017 at 13:17:16 UTC, Steven Schveighoffer wrote: This is likely because of Adam's suggestion -- you were incorrectly declaring a function that returned an immutable like this: immutable T foo(); -Steve That's exactly what it was, I think. As I stated before, I tried to do immutable(T), but I was drowning in errors at that point, so I just took a step back. I'll try to refactor it back to using immutable. I just honestly didn't quite know what I was doing, obviously.
Re: First Impressions!
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an Opinion wrote: So those are just some of my thoughts. Tell me why I'm wrong :P You are not supposed to come to this forum with well-balanced opinions and reasonable arguments. It's not colourful enough to be heard! Instead make a dent in the universe. Prepare your most impactful, most offensive statements to push your personal agenda of what your own system programming language would be like, if you had the stamina. Use doubtful analogies and references to languages with wildly different goals than D. Prepare to abuse the volunteers, and say how much you would dare to use D, if only it would do "just this one obvious change". Having this feature would make the BlobTech industry switch to D overnight! And you haven't asked for any new feature, especially no new _syntax_ was demanded! I don't know, find anything: "It would be nice to have a shortcut syntax for when you want to add zero. Writing 0 + x is cumbersome, when +x would do it. It has the nice benefit of unifying unary and binary operators, and thus leads to a simplified implementation." Do you realize the dangers of looking satisfied?
Re: First Impressions!
On 11/27/17 10:01 PM, A Guy With an Opinion wrote: Hi, Hi Guy, welcome, and I wanted to say I was saying "me too" while reading much of your post. I worked on a C# based client/server for about 5 years, and the biggest thing I agree with you on is the generic programming. I was also using D at the time, and using generics felt like eating a superbly under-baked cake. A few points: - Some of the errors from DMD are a little strange. I don't want to crap on this too much, because for the most part it's fine. However occasionally it throws errors I still can't really work out why THAT is the error it gave me. Some of you may have seen my question in the "Learn" forum about not knowing to use static in an embedded class, but the error was the following: Error: 'this' is only defined in non-static member functions Yes, this is simply a bad error message. Many of our bad error messages come from something called "lowering", where one piece of code is converted to another piece of code, and then the error message happens on the converted code. So essentially you are getting errors on code you didn't write! They are more difficult to fix, since we can't change the real error message (it applies to real code as well), and the code that generated the lowered code is decoupled from the error. I think this is one of those cases. I'd say the errors so far are above some of the cryptic stuff C++ can throw at you (however, I haven't delved that deeply into D templates yet, so don't hold me to this yet), but in terms of quality I'd put it somewhere between C# and C++ in quality. With C# being the ideal. Once you use templates a lot, the error messages explode in cryptology :) But generally, you can get the gist of your errors if you can decipher half-way the mangling. - ...however, where are all of the collections? No Queue? No Stack? No HashTable? I've read that it's not a big focus because some of the built in stuff *can* behave like those things. 
The C# project I'm porting utilizes queues and a specifically C#'s Dictionary<> quite a bit, so I'm not looking forward to having to hand roll my own or use something that aren't fundamentally them. This is definitely the biggest negative I've come across. I want a queue, not something that *can* behave as a queue. I definitely expected more from a language that is this old. I haven't touched this in years, but it should still work pretty well (if you try it and it doesn't compile for some reason, please submit an issue there): https://github.com/schveiguy/dcollections It has more of a Java/C# feel than other libraries, including an interface hierarchy. That being said, Queue is just so easy to implement given a linked list, I never bothered :) + Unit tests. Finally built in unit tests. Enough said here. If the lack of collections was the biggest negative, this is the biggest positive. I would like to enable them at build time if possible though. +1000 About the running of unit tests at build time, many people version their main function like this:

version(unittest)
    void main() {}
else
    int main(string[] args) // real declaration
    {
        ...
    }

This way, when you build with -unittest, you only run unit tests, and exit immediately. So enabling them at build time is quite easy.
If you are using more templates (and I use them the more I write D code), you will not have this problem. Templates infer almost all attributes. - Immutable. I'm not sure I fully understand it. On the surface it seemed like const but transitive. I tried having a method return an immutable value, but when I used it in my unit test I got some weird errors about objects not being able to return immutable (I forget the exact error...apologies). I refactored to use const, and it all worked as I expected, but I don't get why the immutable didn't work. I was returning a value type, so I don't see why passing in assert(object.errorCount == 0) would have triggered errors. But it did. This is likely because of Adam's suggestion -- you were incorrectly declaring a function that returned an immutable like this: immutable T foo(); Where the immutable *doesn't* apply to the return value, but to the function itself. immutable applied to a function is really applying immutable to the 'this' reference. + Templates seem powerful.
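Steve's point that templates infer almost all attributes can be seen in a small sketch (the function names here are made up for illustration, not code from the thread):

```d
// Attribute inference: for template functions the compiler deduces
// @safe, pure, nothrow, and @nogc from the body, so you rarely need
// to write them by hand.
auto twice(T)(T x)
{
    return x + x;
}

// A non-template function must spell out its attributes explicitly
// to be callable from attributed code.
@safe pure nothrow @nogc int twiceInt(int x)
{
    return x + x;
}

// This caller demands all four attributes; twice!int qualifies
// because its inferred attributes satisfy them.
@safe pure nothrow @nogc bool check()
{
    return twice(21) == 42 && twiceInt(21) == 42;
}
```

If `twice`'s body did something unsafe or GC-allocating, inference would simply drop the corresponding attribute and `check` would fail to compile, so the attribute tedium mostly disappears once code is templated.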
Re: First Impressions!
On Tuesday, 28 November 2017 at 05:16:54 UTC, Michael V. Franklin wrote: On Tuesday, 28 November 2017 at 04:48:57 UTC, A Guy With an Opinion wrote: I'd be happy to submit an issue, but I'm not quite sure I'd be the best to determine an error message (at least not this early). Mainly because I have no clue what it was yelling at me about. I only new to add static because I told people my intentions and they suggested it. I guess having a non statically marked class is a valid feature imported from Java world. If this was on the forum, please point me to it. I'll see if I can understand what's going on and do something about it. Thanks, Mike https://forum.dlang.org/thread/vcvlffjxowgdvpvjs...@forum.dlang.org
Re: First Impressions!
On Tuesday, 28 November 2017 at 04:48:57 UTC, A Guy With an Opinion wrote: I'd be happy to submit an issue, but I'm not quite sure I'd be the best to determine an error message (at least not this early). Mainly because I have no clue what it was yelling at me about. I only new to add static because I told people my intentions and they suggested it. I guess having a non statically marked class is a valid feature imported from Java world. If this was on the forum, please point me to it. I'll see if I can understand what's going on and do something about it. Thanks, Mike
Re: First Impressions!
On Tuesday, 28 November 2017 at 04:37:04 UTC, Michael V. Franklin wrote: Please submit things like this to the issue tracker. They are very easy to fix, and if I'm aware of them, I'll probably do the work. But, please provide a code example and offer a suggestion of what you would prefer it to say; it just makes things easier. > I'd be happy to submit an issue, but I'm not quite sure I'd be the best to determine an error message (at least not this early). Mainly because I have no clue what it was yelling at me about. I only knew to add static because I told people my intentions and they suggested it. I guess having a non-statically-marked class is a valid feature imported from the Java world. I'm just not as familiar with that specific feature of Java. Therefore I have no idea what the text really had to do with anything. Maybe appending "if you meant to make a static class" would have been helpful. I fiddled with Rust a little too, and that's something they do very well: verbose error messages. We're not alone: https://youtu.be/6_xdfSVRrKo?t=353 And he was so much better at articulating it than I was. Another C# guy though. :)
Re: First Impressions!
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an Opinion wrote: + D code so far is pushing me towards more "flat" code (for lack of a better way to phrase it) and so far that has helped tremendously when it comes to readability. C# is kind of the opposite. With its namespace -> class -> method coupled with lock, using, etc...you tend to do a lot of nesting. You are generally 3 '{' in before any true logic even begins. Then couple that with try/catch, IDisposable/using, locking, and then if/else, it can get quite chaotic very easily. So right away, I saw my C# code actually appear more readable when I translated it and I think it has to do with the flatness. I'm not sure if that opinion will hold when I delve into 'static if' a little more, but so far my uses of it haven't really dampened that opinion. I come from a heavy C#/C++ background. I also *felt* this, but never really consciously thought about it, until you mentioned it :-) - Some of the errors from DMD are a little strange. I don't want to crap on this too much, because for the most part it's fine. However occasionally it throws errors I still can't really work out why THAT is the error it gave me. Some of you may have seen my question in the "Learn" forum about not knowing to use static in an embedded class, but the error was the following: Error: 'this' is only defined in non-static member functions Please submit things like this to the issue tracker. They are very easy to fix, and if I'm aware of them, I'll probably do the work. But, please provide a code example and offer a suggestion of what you would prefer it to say; it just makes things easier. - Modules. I like modules better than #include, but I don't like them better than C#'s namespaces. Specifically I don't like how there is this gravity that kind of pulls me to associate a module with a file.
It appears you don't have to, because I can do the package thing, but whenever I try to do things outside that one idiom I end up in a soup of errors. I'm sure I'm just not used to it, but so far it's been a little dissatisfying. Sometimes I want where it is physically on my file system to be different from how I include it in other source files. To me, C#'s namespaces are really the standard to beat or meet. I feel the same. I don't like that modules are tied to files; it seems like such an arbitrary limitation. We're not alone: https://youtu.be/6_xdfSVRrKo?t=353 - Attributes. I had another post in the Learn forum about attributes which was unfortunate. At first I was excited because it seems like on the surface it would help me write better code, but it gets a little tedious and tiresome to have to remember to decorate code with them. It seems like most of them should have been the defaults. I would have preferred if the compiler helped me and reminded me. I asked if there was a way to enforce them globally, which I guess there is, but I guess there's also not a way to turn some of them off afterwards. A bit unfortunate. But at least I can see some solutions to this. Yep. One of my pet peeves in D. - The defaults for primitives seem off. They seem to encourage errors. I don't think that is the best design decision even if it encourages the errors to be caught as quickly as possible. I think the better decision would be to not have the errors occur. When I asked about this, there seemed to be a disassociation between the spec and the implementation. The spec says a declaration should error if not explicitly set, but the implementation just initializes them to something that is likely to error. Like NaN for floats which I would have thought would have been 0 based on prior experiences with other languages. Another one of my pet peeves in D.
Though this post (http://forum.dlang.org/post/tcldaatzzbhjoamnv...@forum.dlang.org) made me realize we might be able to do something about that. +- Unicode support is good. Although I think D's string type should have probably been utf16 by default. Especially considering the utf module states: "UTF character support is restricted to '\u0000' <= character <= '\U0010FFFF'." See http://utf8everywhere.org/ + Templates seem powerful. I've only fiddled thus far, but I don't think I've quite comprehended their usefulness yet. It will probably take me some time to figure out how to wield them effectively. One thing I accidentally stumbled upon that I liked was that I could simulate inheritance in structs with them, by using the mixin keyword. That was cool, and I'm not even sure if that is what they were really meant to enable. Templates, CTFE, and mixins are gravy! and D's the only language I know of that has this symbiotic feature set. So those are just some of my thoughts. Tell me why I'm wrong :P I share much of your perspective. Thanks for the interesting read. Mike
Re: First Impressions!
On Tuesday, 28 November 2017 at 04:24:46 UTC, Adam D. Ruppe wrote: immutable(int) errorCount() { return ...; } I actually did try something like that, because I remembered seeing the parens around the string definition. I think at that point I was just so riddled with errors I just took a step back and went back to something I know. Just to make sure I wasn't going insane.
Re: First Impressions!
On Tuesday, 28 November 2017 at 04:19:40 UTC, A Guy With an Opinion wrote: Also, C and C++ didn't just have undefined behavior, sometimes it has inconsistent behavior. Sometimes int a; is actually set to 0. set to?
Re: First Impressions!
A Guy With an Opinion wrote: Eh...I still don't agree. anyway, it is something that won't be changed, 'cause there may be code that rely on current default values. i'm not really trying to change your mind, i just tried to give a rationale behind the choice. that's why `char.init` is 255 too, not zero. still, explicit variable initialization looks better for me. with default init, it is hard to say if the author just forget to initialize a variable, and it happens to work, or he knows about the default value and used it. and the reader don't have to guess what default value is.
Re: First Impressions!
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an Opinion wrote: - Some of the errors from DMD are a little strange. Yes, indeed, and many of them don't help much in finding the real source of your problem. I think improvements to dmd's error reporting would be the #1 productivity gain D could get right now. - ...however, where are all of the collections? No Queue? No Stack? No HashTable? I always say "meh" to that because any second year student can slap those together in... well, for a second year student, maybe a couple hours for the student, but after that you're looking at just a few minutes, especially leveraging D's built in arrays and associative arrays as your foundation. Sure, they'd be nice to have, but it isn't a dealbreaker in the slightest. Try turning Dictionary into D's string[string], for example. Sometimes I want where it is physically on my file system to be different from how I include it in other source files. This is a common misconception, though one promoted by several of the tools: you don't actually need to match file system layout to modules. OK, sure, D does require one module == one file. But the file name and location is not actually tied to the import name you use in code. They can be anything, you just need to pass the list of files to the compiler so it can parse them and figure out the names. - Attributes. I had another post in the Learn forum about attributes which was unfortunate. Yeah, of course, from my post there you know my basic opinion on them. I've written in more detail about them elsewhere and don't feel like it tonight, but I think they are a big failure right now but they could be fixed if we're willing to take a few steps (#0 improve the error messages, #1 add opposites to all of them, e.g. throws and @gc, #2, change the defaults via a single declaration at the module level, #3 omg revel in how useful they are) - Immutable. I'm not sure I fully understand it. 
On the surface it seemed like const but transitive. const is transitive too. So the difference is really that `const` means YOU won't change it, whereas `immutable` means NOBODY will change it. What's important there is that to make something immutable, you need to prove to the compiler's satisfaction that nobody else can change it either. const/immutable in D isn't as common as in its family of languages (C++ notably), but when you do get to use it - at least once you get to know it - it is useful. I was returning a value type, so I don't see why passing in assert(object.errorCount == 0) would have triggered errors. Was the object itself immutable? I suspect you wrote something like this: immutable int errorCount() { return ...; } But this is a curious syntax... the `immutable` there actually applies to the *object*, not the return value! It means you can call this method on an immutable object (in fact, it means you MUST call it on an immutable object. const is the middle ground that allows you to call it on either) immutable(int) errorCount() { return ...; } note the parens, is how you apply it to the return value. Yes, this is kinda weird, and style guides tend to suggest putting the qualifiers after the argument list for the `this` thing instead of before... but the language allows it before, so it trips up a LOT of people like this. The type string seems to be an immutable(char[]) which works exactly the way I was expecting, It is actually `immutable(char)[]`. The parens are important here - it applies to the contents of the array, but not the array itself here. +- Unicode support is good. Although I think D's string type should have probably been utf16 by default. Especially considering the utf module states: Note that it has UTF-16 built in as well, with almost equal support. Put `w` at the end of a literal: `"this literal is UTF-16"w` // notice the w after the " and you get utf16. 
It considers that to be `wstring` instead of `string`, but it works basically the same. If you are doing a lot of Windows API work, this is pretty useful! That was cool, and I'm not even sure if that is what they were really meant to enable. yes, indeed. plugging my book https://www.packtpub.com/application-development/d-cookbook i talk about much of this stuff in there
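The qualifier-placement trap Adam describes can be condensed into a few lines; the `Counter` struct below is a hypothetical illustration, not code from the thread:

```d
struct Counter
{
    int errors;

    // immutable *before* the name applies to `this`: this method can
    // only be called on an immutable Counter.
    immutable int errorCountA() { return errors; }

    // const is the middle ground: callable on mutable, const, or
    // immutable objects alike.
    const int errorCountB() { return errors; }

    // Parens apply the qualifier to the return type instead.
    immutable(int) errorCountC() const { return errors; }
}

void demo()
{
    immutable ic = Counter(3);
    assert(ic.errorCountA() == 3); // OK: the object is immutable

    auto mc = Counter(5);
    // mc.errorCountA();           // error: mutable object, immutable method
    assert(mc.errorCountB() == 5);
    assert(mc.errorCountC() == 5);
}
```

Putting the qualifier after the parameter list (`int errorCountA() immutable`) means the same thing but makes it visually obvious it belongs to `this`, which is why the style guides prefer it.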
Re: First Impressions!
On Tuesday, 28 November 2017 at 04:17:18 UTC, A Guy With an Opinion wrote: On Tuesday, 28 November 2017 at 04:12:14 UTC, ketmar wrote: A Guy With an Opinion wrote: That is true, but I'm still unconvinced that making the person's program likely to error is better than initializing a number to 0. Zero is such a fundamental default for so many things. And it would be consistent with the other number types. basically, default initializers aren't meant to give a "usable value", they meant to give a *defined* value, so we don't have UB. that is, just initialize your variables explicitly, don't rely on defaults. writing: int a; a += 42; is still bad code, even if you're know that `a` is guaranteed to be zero. int a = 0; a += 42; is the "right" way to write it. if you'll look at default values from this PoV, you'll see that NaN has more sense that zero. if there was a NaN for ints, ints would be inited with it too. ;-) Eh...I still don't agree. I think C and C++ just gave that style of coding a bad rap due to the undefined behavior. But the issue is it was undefined behavior. A lot of language features aim to make things well defined and have less verbose representations. Once a language matures that's what a big portion of their newer features become. Less verbose shortcuts of commonly done things. I agree it's important that it's well defined, I'm just thinking it should be a value that someone actually wants some notable fraction of the time. Not something no one wants ever. I could be persuaded, but so far I'm not drinking the koolaid on that. It's not the end of the world, I was just confused when my float was NaN. Also, C and C++ didn't just have undefined behavior, sometimes it has inconsistent behavior. Sometimes int a; is actually set to 0.
Re: First Impressions!
On Tuesday, 28 November 2017 at 04:12:14 UTC, ketmar wrote: A Guy With an Opinion wrote: That is true, but I'm still unconvinced that making the person's program likely to error is better than initializing a number to 0. Zero is such a fundamental default for so many things. And it would be consistent with the other number types. basically, default initializers aren't meant to give a "usable value", they meant to give a *defined* value, so we don't have UB. that is, just initialize your variables explicitly, don't rely on defaults. writing: int a; a += 42; is still bad code, even if you're know that `a` is guaranteed to be zero. int a = 0; a += 42; is the "right" way to write it. if you'll look at default values from this PoV, you'll see that NaN has more sense that zero. if there was a NaN for ints, ints would be inited with it too. ;-) Eh...I still don't agree. I think C and C++ just gave that style of coding a bad rap due to the undefined behavior. But the issue is it was undefined behavior. A lot of language features aim to make things well defined and have less verbose representations. Once a language matures that's what a big portion of their newer features become. Less verbose shortcuts of commonly done things. I agree it's important that it's well defined, I'm just thinking it should be a value that someone actually wants some notable fraction of the time. Not something no one wants ever. I could be persuaded, but so far I'm not drinking the koolaid on that. It's not the end of the world, I was just confused when my float was NaN.
Re: First Impressions!
A Guy With an Opinion wrote: That is true, but I'm still unconvinced that making the person's program likely to error is better than initializing a number to 0. Zero is such a fundamental default for so many things. And it would be consistent with the other number types. basically, default initializers aren't meant to give a "usable value", they're meant to give a *defined* value, so we don't have UB. that is, just initialize your variables explicitly, don't rely on defaults. writing: int a; a += 42; is still bad code, even if you know that `a` is guaranteed to be zero. int a = 0; a += 42; is the "right" way to write it. if you'll look at default values from this PoV, you'll see that NaN makes more sense than zero. if there was a NaN for ints, ints would be inited with it too. ;-)
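ketmar's distinction between a *usable* default and a *defined* default is easy to check in code; a minimal sketch (the function name is invented for illustration):

```d
import std.math : isNaN;

// Every type has a guaranteed .init value, chosen to make "forgot to
// initialize" conspicuous wherever the type allows it.
void showDefaults()
{
    float f;   // float.init is NaN: "not yet set" is loud, not a silent 0
    double d;  // likewise double.nan
    int i;     // int has no NaN to fall back on, so it defaults to 0
    char c;    // char.init is 0xFF, an invalid UTF-8 code unit

    assert(isNaN(f));
    assert(isNaN(d));
    assert(i == 0);
    assert(c == 0xFF);
}
```

So the behavior is deterministic in every case; the float and char defaults are simply chosen to propagate visibly rather than masquerade as meaningful values.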
Re: First Impressions!
On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole wrote: It's on our TODO list. Allocators need to come out of experimental and some form of RC before we tackle it again. In the meantime https://github.com/economicmodeling/containers is pretty good. That's good to hear. I keep saying it, if you don't have unit tests built in, you don't care about code quality! I just like not having to create a throwaway project to test my code. It's nice to just use unit tests for what I used to create console apps for and then it forever ensures my code works the same! You don't need to bother with them for most code :) That seems to be what people here are saying, but that seems so sad... Doesn't mean the other languages are right either. That is true, but I'm still unconvinced that making the person's program likely to error is better than initializing a number to 0. Zero is such a fundamental default for so many things. And it would be consistent with the other number types. If you need a wstring, use a wstring! Be aware Microsoft is alone in thinking that UTF-16 was awesome. Everybody else standardized on UTF-8 for Unicode. I do come from that world, so there is a chance I'm just comfortable with it.
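On the "use a wstring" point: D ships all three Unicode encodings as first-class string types, selected by a literal suffix. A small sketch (the literal contents are arbitrary examples):

```d
// string, wstring, and dstring are just immutable arrays of char,
// wchar, and dchar: UTF-8, UTF-16, and UTF-32 respectively.
void encodings()
{
    string  s = "hello";   // immutable(char)[],  UTF-8
    wstring w = "hello"w;  // immutable(wchar)[], UTF-16
    dstring d = "hello"d;  // immutable(dchar)[], UTF-32

    assert(s.length == 5 && w.length == 5 && d.length == 5);

    // .length counts code units, so non-ASCII text differs per encoding:
    assert("é".length == 2);  // two UTF-8 code units
    assert("é"w.length == 1); // one UTF-16 code unit
    assert("é"d.length == 1); // one code point
}
```

So code that talks to the Windows API a lot can standardize on wstring internally without fighting the language.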
Re: First Impressions!
On 28/11/2017 3:01 AM, A Guy With an Opinion wrote: Hi, I've been using D for a personal project for about two weeks now and just thought I'd share my initial impression just in case it's useful! I like feedback on things I do, so I just assume others do too. Plus my opinion is the best on the internet! You will see (hopefully the sarcasm is obvious otherwise I'll just appear pompous). It would probably be better if I did a retrospective after my project is completed, but with life who knows if that will happen. I could lose interest or something and not finish it. And then you guys wouldn't know my opinion. I can't allow that. I'll start off by saying I like the overall experience. I come from a C# and C++ background with a little bit of C mixed in. For the most part though, I work with C#, SQL and web technologies on a day to day basis. I did do a three year stint working with C/C++ (mostly C++), but I never really enjoyed it much. C++ is overly verbose, overly complicated, overly littered with poor legacy decisions, and too error prone. C# on the other hand has for the most part been a delight. The only problem is I don't find it to be the best when it comes to generative programming. C# can do some generative programming with its generics, but for the most part it's always struck me as more specialized for container types and to do anything remotely outside its purpose takes a fair bit of cleverness. I'm sick of being clever in that aspect. So here are some impressions good and bad: + Porting straight C# seems pretty straightforward. Even some of the .NET framework, like files and unicode, have fairly direct counterparts in D. + D code so far is pushing me towards more "flat" code (for lack of a better way to phrase it) and so far that has helped tremendously when it comes to readability. C# is kind of the opposite. With its namespace -> class -> method coupled with lock, using, etc...you tend to do a lot of nesting.
You are generally 3 '{' in before any true logic even begins. Then couple that with try/catch, IDisposable/using, locking, and then if/else, it can get quite chaotic very easily. So right away, I saw my C# code actually appear more readable when I translated it and I think it has to do with the flatness. I'm not sure if that opinion will hold when I delve into 'static if' a little more, but so far my uses of it haven't really dampened that opinion. + Visual D. It might be that I had poor expectations of it, because I read D's tooling was poor on the internet (and nothing is ever wrong on the internet), however, the combination of Visual D and DMD actually exceeded my expectations. I've been quite happy with it. It was relatively easy to set up and worked as I would expect it to work. It lets me debug, add breakpoints, and does the basic syntax highlighting I would expect. It could have a few other features, but for a project that is not corporate backed, it was really above what I could have asked for. + So far, compiling is fast. And from what I hear it will stay fast. A big motivator. The one commercial C++ project I worked on was a beast and could take an hour+ to compile if you needed to compile something fundamental. C# is fairly fast, so I've grown accustomed to not having to go to the bathroom, get a drink, etc...before returning to find out I'm on the linking step. I'm used to if it doesn't take less than ten seconds (probably less) then I prep myself for an error to deal with. I want this to remain. - Some of the errors from DMD are a little strange. I don't want to crap on this too much, because for the most part it's fine. However occasionally it throws errors I still can't really work out why THAT is the error it gave me. 
Some of you may have seen my question in the "Learn" forum about not knowing to use static in an embedded class, but the error was the following: Error: 'this' is only defined in non-static member functions I'd say the errors so far are above some of the cryptic stuff C++ can throw at you (however, I haven't delved that deeply into D templates yet, so don't hold me to this yet), but I'd put them somewhere between C# and C++ in quality. With C# being the ideal. + The standard library so far is really good. Nullable worked as I thought it should. I just guessed a few of the methods based on what I had seen at that point and got it right. So it appears consistent and intuitive. I also like the fact I can peek at the code and understand it by just reading it. Unlike with C++ where I still don't know how some of the stuff is *really* implemented. The STL almost seems like it's written in a completely different language than the stuff it enables. For instance, I figured out how to do packages by seeing it in Phobos. - ...however, where are all of the collections? No Queue? No Stack? No HashTable? I've read that it's not a big focus because some of the built in
Re: First Impressions!
On Tuesday, 28 November 2017 at 03:01:33 UTC, A Guy With an Opinion wrote: - ...however, where are all of the collections? No Queue? No Stack? No HashTable? I've read that it's not a big focus because some of the built in stuff *can* behave like those things. The C# project I'm porting utilizes queues and a specifically C#'s Dictionary<> quite a bit, so I'm not looking forward to having to hand roll my own or use something that aren't fundamentally them. This is definitely the biggest negative I've come across. I want a queue, not something that *can* behave as a queue. I definitely expected more from a language that is this old. Good feedback overall, thanks for checking it out. You're not wrong, but some of the design decisions that feel strange to newcomers at first have been heavily-debated, generally well-reasoned, and just take some time to get used to. That sounds like a cop-out, but stick with it and I think you'll find that a lot of the decisions make sense - see the extensive discussion on NaN-default for floats, for example. Just one note about the above comment though: the std.container.dlist doubly-linked list has methods that you can use to put together stacks and queues easily: https://dlang.org/phobos/std_container_dlist.html Also, D's associative arrays implement a hash map https://dlang.org/spec/hash-map.html, which I think should take care of most of C#'s Dictionary functionality. Anyhow, D is a big language (for better and sometimes worse), so it's easy to miss some of the good nuggets buried within the spec/library. -Doc
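Doc's DList and associative-array suggestions can be sketched concretely; the variable names below are illustrative, not from the thread:

```d
import std.container.dlist : DList;

void collections()
{
    // DList as a FIFO queue: insertBack to enqueue, front/removeFront
    // to dequeue.
    DList!int q;
    q.insertBack(1);
    q.insertBack(2);
    assert(q.front == 1);
    q.removeFront();
    assert(q.front == 2);

    // ...and as a LIFO stack: insertBack/back/removeBack.
    DList!string st;
    st.insertBack("a");
    st.insertBack("b");
    assert(st.back == "b");
    st.removeBack();
    assert(st.back == "a");

    // The built-in associative array covers most of what C#'s
    // Dictionary<string, int> does.
    int[string] counts;
    counts["errors"] = 3;
    assert("errors" in counts);
    assert(counts.get("warnings", 0) == 0); // default value, like TryGetValue
}
```

Wrapping these few lines in a `Queue` or `Stack` type with the C# method names is a short exercise, which is probably why Phobos never bothered shipping dedicated ones.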