Re: Programming in D book is about 88% translated
On Thursday, 6 March 2014 at 02:57:51 UTC, Puming wrote: Hi, I am translating this book at http://git.oschina.net/lucifer2031/Programming-in-D-in-Chinese. My email is 786325...@qq.com. Can we talk about this? Thanks Ali, I've sent you an email. On Wednesday, 5 March 2014 at 19:05:51 UTC, Ali Çehreli wrote: On 03/04/2014 09:58 PM, Puming wrote: I'd like to translate your book into Chinese, can we talk about this? Of course. :) You can email me at acehr...@yahoo.com or start translating from the sources: https://code.google.com/p/ddili/source/checkout Here are the build instructions: https://code.google.com/p/ddili/source/browse/trunk/README Somebody has created a git clone of that svn repo but I haven't gotten to use that yet. Sorry... Ali
Re: Bounty for -minimal compiler flag
So? Is anyone working on these features?
Re: Article: Increasing the D Compiler Speed by Over 75%
On 8/3/2013 1:54 PM, Andrej Mitrovic wrote: On 8/3/13, Walter Bright newshou...@digitalmars.com wrote: /delexe http://www.digitalmars.com/ctg/ctgLinkSwitches.html#delexecutable Note that this switch doesn't actually work. We've talked about this somewhere in an Optlink-related bugzilla issue. I don't recall seeing it in bugzilla. If it isn't there, please add it.
Re: Emacs users: flycheck-dmd-dub
Cool, I should get something similar for Vim. I keep finding myself in the situation where the linting (I think through DMD) with syntastic doesn't know where my source files are a lot of the time, so I get a lot of problems with it not knowing where to find imports.
Mono-D v1.7 - Struct init member completion parser refactorings
Hi everyone, just wanted to drop a small sign of life of Mono-D. http://mono-d.alexanderbothe.com/mono-d-v1-7-struct-init-member-completion-massive-parse-improvements/ Cheers, Alex
Re: Programming in D book is about 88% translated
Hi Lucifer, Seems like you've got a team doing this :-) Hope we can collaborate on this translation. On Monday, 10 March 2014 at 10:12:47 UTC, Lucifer wrote: On Thursday, 6 March 2014 at 02:57:51 UTC, Puming wrote: Hi, I am translating this book at http://git.oschina.net/lucifer2031/Programming-in-D-in-Chinese. My email is 786325...@qq.com. Can we talk about this? Thanks Ali, I've sent you an email. On Wednesday, 5 March 2014 at 19:05:51 UTC, Ali Çehreli wrote: On 03/04/2014 09:58 PM, Puming wrote: I'd like to translate your book into Chinese, can we talk about this? Of course. :) You can email me at acehr...@yahoo.com or start translating from the sources: https://code.google.com/p/ddili/source/checkout Here are the build instructions: https://code.google.com/p/ddili/source/browse/trunk/README Somebody has created a git clone of that svn repo but I haven't gotten to use that yet. Sorry... Ali
Re: Mono-D v1.7 - Struct init member completion parser refactorings
Hi Alexander, Thanks for the great work. I'm always using Mono-D. On Monday, 10 March 2014 at 20:37:31 UTC, Alexander Bothe wrote: Hi everyone, just wanted to drop a small sign of life of Mono-D. http://mono-d.alexanderbothe.com/mono-d-v1-7-struct-init-member-completion-massive-parse-improvements/ Cheers, Alex
Re: DIP 57: static foreach
2014-03-10 6:31 GMT+09:00 Timon Gehr timon.g...@gmx.ch:

http://wiki.dlang.org/DIP57 Thoughts?

From the Semantics section: "For static foreach statements, break and continue are supported and treated like for foreach statements over tuples."

This is a questionable sentence. In a foreach over a tuple, break and continue have no effect on the unrolling.

    void main()
    {
        import std.typetuple, std.stdio;
        foreach (i; TypeTuple!(1, 2, 3))
        {
            static if (i == 2) continue;
            else static if (i == 3) break;

            pragma(msg, "CT: i = ", i);  // prints 1, 2, and 3 in CT
            writeln("RT: i = ", i);      // prints only 1 in RT
        }
    }

So I think that static foreach *cannot* support break and continue in the same way as foreach over tuples.

Kenji Hara
Re: Major performance problem with std.array.front()
On 3/10/2014 12:23 AM, Walter Bright wrote: On 3/9/2014 9:19 PM, Nick Sabalausky wrote: On 3/9/2014 6:31 PM, Walter Bright wrote: On 3/9/2014 6:08 AM, Marc Schütz schue...@gmx.net wrote:

Also, `byCodeUnit` and `byCodePoint` would probably be better names than `raw` and `decode`, to match the already existing `byGrapheme` in std.uni.

I'd vastly prefer 'byChar', 'byWchar', 'byDchar' for each of string, wstring, dstring, and InputRange!char, etc.

'byCodePoint' and 'byDchar' are the same. However, 'byCodeUnit' is completely different from anything else:

    string str;
    wstring wstr;
    dstring dstr;

    (str|wstr|dstr).byChar   // Always range of char
    (str|wstr|dstr).byWchar  // Always range of wchar
    (str|wstr|dstr).byDchar  // Always range of dchar

    str.representation   // Range of ubyte
    wstr.representation  // Range of ushort
    dstr.representation  // Range of uint

    str.byCodeUnit   // Range of char
    wstr.byCodeUnit  // Range of wchar
    dstr.byCodeUnit  // Range of dchar

I don't see much point to the latter 3.

Do you mean:

1. You don't see the point to iterating by code unit?
2. You don't see the point to 'byCodeUnit' if we have 'representation'?
3. You don't see the point to 'byCodeUnit' if we have 'byChar/byWchar/byDchar'?
4. You don't see the point to having 'byCodeUnit' work on UTF-32 dstrings?

Responses:

1. Iterating by code unit: Useful for tweaking performance anytime decoding is unnecessary. For example, parsing a grammar where the bulk of the keywords and operators are ASCII. (Occasional uses of Unicode, like Unicode whitespace, can of course be handled easily enough by the lexer FSM.)

2. 'byCodeUnit' if we have 'representation': This one I have trouble answering since I'm still unclear on the purpose of 'representation' (I wasn't even aware of it until a few days ago). I've been assuming there's some specific use-case I've overlooked where it's useful to iterate by code unit *while* treating the code units as if they weren't UTF-8/16/32 at all. But since 'representation' is called *on* a string/wstring/dstring, they should already be UTF-8/16/32 anyway, not some other encoding that would necessitate using integer types. Or maybe it's just for working around problems with the auto-verification being too eager (I've run into those)? I admit I don't quite get 'representation'.

3. 'byCodeUnit' if we have 'byChar/byWchar/byDchar': To avoid a static if chain every time you want to use code units inside generic code. Also, so that in non-generic code you can change your data type without updating instances of 'by*char'.

4. Having 'byCodeUnit' work on UTF-32 dstrings: So generic code working on code units doesn't have to special-case UTF-32.
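To make the distinction between the three families of names concrete, here is a minimal sketch. It assumes a recent Phobos in which std.utf.byCodeUnit, std.utf.byDchar and std.string.representation are all available (the thread itself discusses them partly as proposals); the helper names codeUnitCount and codePointCount are made up for illustration.

```d
import std.range : walkLength;
import std.range.primitives : ElementType;
import std.string : representation;
import std.traits : Unqual;
import std.utf : byCodeUnit, byDchar;

/// Hypothetical helper: number of code units in `s`, counted without decoding.
size_t codeUnitCount(string s)
{
    return s.byCodeUnit.length;
}

/// Hypothetical helper: number of code points in `s`, counted by decoding.
size_t codePointCount(string s)
{
    return s.byDchar.walkLength;
}

void main()
{
    string s = "\u00E9"; // one code point (é), two UTF-8 code units

    // byCodeUnit keeps the character type of the string it is called on.
    static assert(is(Unqual!(ElementType!(typeof(s.byCodeUnit()))) == char));
    // representation swaps the character type for the matching integer type.
    static assert(is(typeof(s.representation()) == immutable(ubyte)[]));
    // byDchar always yields decoded code points.
    static assert(is(ElementType!(typeof(s.byDchar())) == dchar));

    assert(codeUnitCount(s) == 2);
    assert(codePointCount(s) == 1);
    assert(s.representation.length == 2);
}
```

Generic code written against byCodeUnit works unchanged for string, wstring and dstring, which is the point made in responses 3 and 4.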
Re: ddox-generated Phobos documentation is available for review
Fantastic! The organization makes it easy to find the right tool for the job. This is probably nitpicking, but in std.algorithm and other modules ( http://dlang.org/library/std/algorithm.html ) there are multiple overloads of the same function (splitter, reverse, etc); it'd be nice if these could be organized into their own sub-categories, so there's no unnecessary visual redundancy. There's also the library list which displays all modules; do the internal modules (druntime, etc) need to be exposed? It might be nicer for the end-user for these to be hidden, or kept in their own category. Otherwise, very nice! :)
Re: ddox-generated Phobos documentation is available for review
On Monday, 10 March 2014 at 03:44:54 UTC, Andrei Alexandrescu wrote: http://dlang.org/library Looking good! The module list currently shows deeply nested modules (e.g. std.c.stdio) before less nested ones (std.stdio). I think it should be the other way round, otherwise you have all the std.c.* modules listed first.
Re: ddox-generated Phobos documentation is available for review
10-Mar-2014 07:44, Andrei Alexandrescu wrote: Consider it alpha quality. Please don't announce yet before we put it in good shape. https://github.com/D-Programming-Language/dlang.org/pull/516 http://dlang.org/library http://dlang.org/library-prerelease I needed to change quite a bit about the makefile. It was building everything over and over again, and it's _slow_. Some functions are not ready, compare e.g. http://dlang.org/library/std/algorithm/balancedParens.html with http://dlang.org/library/std/algorithm/any.html Andrei

The front page shouldn't contain std.internal.* stuff, and we probably need to adjust DDocs so that all modules have proper blurb text.

-- Dmitry Olshansky
Re: ddox-generated Phobos documentation is available for review
On Monday, 10 March 2014 at 03:44:54 UTC, Andrei Alexandrescu wrote: Consider it alpha quality. Please don't announce yet before we put it in good shape. https://github.com/D-Programming-Language/dlang.org/pull/516 http://dlang.org/library http://dlang.org/library-prerelease I needed to change quite a bit about the makefile. It was building everything over and over again, and it's _slow_. Some functions are not ready, compare e.g. http://dlang.org/library/std/algorithm/balancedParens.html with http://dlang.org/library/std/algorithm/any.html Andrei

For me it's a real improvement! One thing: symbol names (modules, functions, etc.) shouldn't be hyphenated, especially in tables.

Nicolas
Re: Major performance problem with std.array.front()
On 3/10/2014 12:09 AM, Nick Sabalausky wrote: On 3/10/2014 12:23 AM, Walter Bright wrote: On 3/9/2014 9:19 PM, Nick Sabalausky wrote: On 3/9/2014 6:31 PM, Walter Bright wrote: On 3/9/2014 6:08 AM, Marc Schütz schue...@gmx.net wrote:

Also, `byCodeUnit` and `byCodePoint` would probably be better names than `raw` and `decode`, to match the already existing `byGrapheme` in std.uni.

I'd vastly prefer 'byChar', 'byWchar', 'byDchar' for each of string, wstring, dstring, and InputRange!char, etc.

'byCodePoint' and 'byDchar' are the same. However, 'byCodeUnit' is completely different from anything else:

    string str;
    wstring wstr;
    dstring dstr;

    (str|wstr|dstr).byChar   // Always range of char
    (str|wstr|dstr).byWchar  // Always range of wchar
    (str|wstr|dstr).byDchar  // Always range of dchar

    str.representation   // Range of ubyte
    wstr.representation  // Range of ushort
    dstr.representation  // Range of uint

    str.byCodeUnit   // Range of char
    wstr.byCodeUnit  // Range of wchar
    dstr.byCodeUnit  // Range of dchar

I don't see much point to the latter 3.

Do you mean:

1. You don't see the point to iterating by code unit?
2. You don't see the point to 'byCodeUnit' if we have 'representation'?
3. You don't see the point to 'byCodeUnit' if we have 'byChar/byWchar/byDchar'?
4. You don't see the point to having 'byCodeUnit' work on UTF-32 dstrings?

(3)

3. 'byCodeUnit' if we have 'byChar/byWchar/byDchar': To avoid a static if chain every time you want to use code units inside generic code. Also, so that in non-generic code you can change your data type without updating instances of 'by*char'.

Just not sure I see a use for that.
Re: Major performance problem with std.array.front()
On Sunday, 9 March 2014 at 21:14:30 UTC, Nick Sabalausky wrote:

With all due respect, D's string type is exclusively for UTF-8 strings. If it is not valid UTF-8, it should never have been a D string in the first place. In the other cases, ubyte[] is there.

This is an arbitrary self-imposed limitation caused by the choice in how strings are handled in Phobos.

Yea, I've had problems before - completely unnecessary problems that were *not* helpful or indicative of latent bugs - which were a direct result of Phobos being overly pedantic and eager about UTF validation. And yet the implicit UTF validation has never actually *helped* me in any way.

"Self-imposed limitation": for the greater good. I find this article very telling about why strings should be converted to UTF-8 as often as possible. http://www.utf8everywhere.org/ I agree 100% with its content; it's impossibly hard to have a sane handling of encodings on Windows (even more so in a team) without following the drastic rules the article lays out. This happens to be what Phobos gently mandates. UTF validation is certainly the lesser evil compared to the mess that everything becomes without it.

How is mandating valid UTF-8 being overly pedantic? This is the sanest behaviour. Just use sanitizeUTF8 (http://vibed.org/api/vibe.utils.string/sanitizeUTF8) or equivalent.
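The "validate at the boundary, keep ubyte[] otherwise" rule argued for above can be sketched as follows. This is only an illustration: it uses std.utf.validate from Phobos rather than vibe.d's sanitizeUTF8, and the function name isValidUtf8 is an assumption, not an existing API.

```d
import std.utf : UTFException, validate;

/// Hypothetical helper: true if `bytes` is well-formed UTF-8 and can
/// therefore be treated as a D string; false if it should stay ubyte[].
bool isValidUtf8(const(ubyte)[] bytes)
{
    try
    {
        // validate() throws UTFException on the first malformed sequence.
        validate(cast(const(char)[]) bytes);
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    immutable ubyte[] good = [0x63, 0xC3, 0xA9]; // "cé"
    immutable ubyte[] bad = [0x63, 0xFF];        // 0xFF never occurs in UTF-8

    assert(isValidUtf8(good));
    assert(!isValidUtf8(bad));
}
```

Checking once on input means the rest of the program can rely on string holding valid UTF-8, which is the trade-off this post defends and the preceding one objects to.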
Re: ddox-generated Phobos documentation is available for review
On Monday, 10 March 2014 at 03:44:54 UTC, Andrei Alexandrescu wrote: Consider it alpha quality. Please don't announce yet before we put it in good shape. https://github.com/D-Programming-Language/dlang.org/pull/516 http://dlang.org/library http://dlang.org/library-prerelease I needed to change quite a bit about the makefile. It was building everything over and over again, and it's _slow_. Some functions are not ready, compare e.g. http://dlang.org/library/std/algorithm/balancedParens.html with http://dlang.org/library/std/algorithm/any.html Andrei Nice, but those duplicates have got to go!
Re: Major performance problem with std.array.front()
I'm not sure I understood the point of this (long) thread. Is the main problem that decode() is called even when it is not needed? Well, in that case it's not a problem only for strings. I found this problem also when I was writing other ranges, for example when reading binary data from a db stream. Front represents a single row, and I decode it every time, even when it is not needed.

On Friday, 7 March 2014 at 02:37:11 UTC, Walter Bright wrote:

In "Lots of low hanging fruit in Phobos" the issue came up about the automatic encoding and decoding of char ranges.

Throughout D's history, there are regular and repeated proposals to redesign D's view of char[] to pretend it is not UTF-8, but UTF-32. I.e. so D will automatically generate code to decode and encode on every attempt to index char[].

I have strongly objected to these proposals on the grounds that:

1. It is a MAJOR performance problem to do this.
2. Very, very few manipulations of strings ever actually need decoded values.
3. D is a systems/native programming language, and systems/native programming languages must not hide the underlying representation (I make similar arguments about proposals to make ints issue errors on overflow, etc.).
4. Users should choose when decode/encode happens, not the language.

and I have been successful at heading these off. But one slipped by me. See this in std.array:

    @property dchar front(T)(T[] a) @safe pure if (isNarrowString!(T[]))
    {
        assert(a.length, "Attempting to fetch the front of an empty array of " ~ T.stringof);
        size_t i = 0;
        return decode(a, i);
    }

What that means is that if I implement an algorithm that accepts, as input, an InputRange of char's, it will ALWAYS try to decode it. This means that even:

    from.copy(to)

will decode 'from', and then re-encode it for 'to'. And it will do it SILENTLY. The user won't notice, and he'll just assume that D performance sux. Even if he does notice, his options to make his code run faster are poor.

If the user wants decoding, it should be explicit, as in:

    from.decode.copy(encode!to)

The USER should decide where and when the decoding goes. 'decode' should be just another algorithm.

(Yes, I know that std.algorithm.copy() has some specializations to take care of this. But these specializations would have to be written for EVERY algorithm, which is thoroughly unreasonable. Furthermore, copy()'s specializations only apply if BOTH source and destination are arrays. If just one is, the decode/encode penalty applies.)

Is there any hope of fixing this?
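The silent decoding described above is easy to observe. The following is a small sketch, assuming a Phobos that still auto-decodes narrow strings and that provides std.utf.byCodeUnit as the explicit opt-out; decodedLength and rawLength are illustrative names only.

```d
import std.range : walkLength;
import std.range.primitives : front;
import std.utf : byCodeUnit;

/// Hypothetical helper: walks the string through front/popFront,
/// so every element is decoded to a dchar along the way.
size_t decodedLength(string s)
{
    return s.walkLength;
}

/// Hypothetical helper: the same walk over code units, no decoding.
size_t rawLength(string s)
{
    return s.byCodeUnit.walkLength;
}

void main()
{
    string s = "cass\u00E9"; // "cassé": 5 code points, 6 UTF-8 code units

    // front on a narrow string returns a decoded dchar, not a char.
    static assert(is(typeof(s.front) == dchar));

    assert(s.length == 6);         // array length counts code units
    assert(decodedLength(s) == 5); // range iteration counts code points
    assert(rawLength(s) == 6);     // explicit opt-out of decoding
}
```

The mismatch between s.length and the range-based count is the visible symptom; the cost argument in the post is about the decode call hidden inside every front.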
Re: Major performance problem with std.array.front()
On 3/10/2014 6:21 AM, ponce wrote: On Sunday, 9 March 2014 at 21:14:30 UTC, Nick Sabalausky wrote:

Yea, I've had problems before - completely unnecessary problems that were *not* helpful or indicative of latent bugs - which were a direct result of Phobos being overly pedantic and eager about UTF validation. And yet the implicit UTF validation has never actually *helped* me in any way.

"Self-imposed limitation": for the greater good. I find this article very telling about why strings should be converted to UTF-8 as often as possible. http://www.utf8everywhere.org/ I agree 100% with its content; it's impossibly hard to have a sane handling of encodings on Windows (even more so in a team) without following the drastic rules the article lays out.

I may have missed it, but I don't see where it says anything about validation or immediate sanitation of invalid sequences. It's mostly "UTF-16 sucks and so does Windows" (not that I'm necessarily disagreeing with it).

(ot: Kinda wish they hadn't used such a hard to read font...)
Re: Major performance problem with std.array.front()
On Monday, 10 March 2014 at 11:04:43 UTC, Nick Sabalausky wrote:

I may have missed it, but I don't see where it says anything about validation or immediate sanitation of invalid sequences. It's mostly "UTF-16 sucks and so does Windows" (not that I'm necessarily disagreeing with it). (ot: Kinda wish they hadn't used such a hard to read font...)

I should have highlighted it: their recommendations for proper encoding handling on Windows are in section 5 (How to do text on Windows). One of them is "std::strings and char*, anywhere in the program, are considered UTF-8 (if not said otherwise)". I find it interesting that D tends to enforce this lesson learned from mixed-encoding codebases.
Re: Major performance problem with std.array.front()
On 3/9/2014 11:27 AM, Vladimir Panteleev wrote: On Sunday, 9 March 2014 at 08:32:09 UTC, monarch_dodra wrote:

On topic, I think D's implicit default decode to dchar is *infinity* times better than C++'s char-based strings. While imperfect in terms of grapheme, it was still a design decision made of win.

Care to argue?

It's simple: Breaking things on all non-English languages is worse than breaking things on non-western[1] languages. It is still breakage, and that *is* bad, but there's no question which breakage is significantly larger.

[1] (And yes, I realize "western" is a gross over-simplification here. Point is one working language vs several working languages.)
Re: ddox-generated Phobos documentation is available for review
Very nice! std.algorithm, std.net.curl etc. have their functions/classes split in categories. I haven't used ddox myself but would it be possible to modify it to read a category variable in the documentation for a function and then use that to group things in the resulting html file? Or would that need modifications to dmd itself. /Jonas
Re: DIP 57: static foreach
On Sunday, 9 March 2014 at 21:31:40 UTC, Timon Gehr wrote: http://wiki.dlang.org/DIP57/ Thoughts?

1) "Additionally, CTFE is invoked on all expressions occurring in the ForeachAggregate"

I think it can be phrased more universally: "ForeachTypeList symbols must be evaluated as compile-time entities; if that is not possible, an implementation-defined compilation error happens."

2) Saying that it does not introduce a new scope is not entirely true, as symbols from ForeachTypeList should not be available outside of static foreach. You mention it later in the same block, but it is an important concept to define as we currently don't have such pseudo-scopes (do we?)

3) "The body of the static foreach statement or static foreach declaration is duplicated once for each iteration which the corresponding foreach statement with an empty body would perform when executed in CTFE"

I don't understand the reason behind limiting static foreach to CTFE semantics. Simply evaluating and pasting the body for each iteration should be enough. It is much closer to mixin template instances in that regard. This will also remove the necessity to rely on shadowing rules to re-define ForeachTypeList symbols, as at the time of pasting the body those won't exist anymore.

4) "Declarations introduced in the body itself are inserted into this enclosing scope"

Isn't the term "enclosing" used only for scope-to-scope relations, or is it applicable to any language construct? (I don't know)

5) "For static foreach statements, break and continue are supported and treated like for foreach statements over tuples."

It is impossible as far as I understand existing semantics. Currently, a placed continue/break refers to the created scope and doesn't stop iteration over the remaining template argument list members. This is not applicable to generic foreach.

6) In the "Iterating over members of a scope" example there is a strange Python-like colon after the `static if` condition. Typo? :)

7) In "Relation to tuple foreach", stating equivalency is not correct. It is more of a subset, and not even a strict one, as semantics will differ in some corner cases. For example, iterating over an expression list will create a local copy right now if `ref` is not used. I'd really want this to not be the case for static foreach.

Overall, the provided examples seem to match my expectations, but the semantics description could be more structured and detailed.
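Points 2 and 4 are about declarations escaping the loop body into the enclosing scope. A minimal sketch of that behaviour follows, written against the `static foreach` that current D compilers accept; whether this matches the DIP57 draft in every detail is an assumption, and the struct Point and its field names are invented for the example.

```d
/// Hypothetical struct whose fields are generated at compile time.
struct Point
{
    // The body is pasted once per element; the loop adds no scope of its
    // own, so each mixed-in declaration becomes a member of Point.
    static foreach (name; ["x", "y"])
    {
        mixin("int " ~ name ~ ";");
    }
}

/// Hypothetical helper that uses the generated fields.
int sum(Point p)
{
    return p.x + p.y;
}

void main()
{
    Point p;
    p.x = 3;
    p.y = 4;

    assert(sum(p) == 7);
    // Exactly two int fields were declared by the loop.
    static assert(Point.sizeof == 2 * int.sizeof);
}
```

The loop variable `name` itself is not visible outside the loop, while the declarations it produces are, which is the pseudo-scope behaviour point 2 asks the DIP to define explicitly.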
Re: DIP 57: static foreach
On Sunday, 9 March 2014 at 21:53:45 UTC, Adam D. Ruppe wrote: On Sunday, 9 March 2014 at 21:47:17 UTC, bearophile wrote:

suggest to add to DIP57 one more thing: that the introduction of static foreach should come with a warning against the usage of not-static foreach on tuples (and eventually this warning should become a deprecation message).

I don't agree because foreach on a tuple is just plain foreach. That it unrolls is just an implementation detail that doesn't change much else. I think considering it to be a separate kind of loop is like considering foreach over arrays, ranges, and opApply items separate loops. Those are just different implementation details of the same user concept.

Can't agree. You can't call it an implementation detail if it is a property that leaks into user code and can be relied upon. I sometimes hear statements akin to "a tuple is like a container and tuple foreach is just like foreach", but that is a very idealistic view that simply does not match the current state of D, despite all the behavior hacks that try to make it look so. So right now it _is_ a separate and distinctive kind of loop. At the same time it is a very specialized tool, and deprecating it does not sound like a practical approach for reducing language complexity. Probably some years later, if we eventually find out no one uses it anymore.
Re: Major performance problem with std.array.front()
On Sunday, 9 March 2014 at 17:27:20 UTC, Andrei Alexandrescu wrote: On 3/9/14, 6:47 AM, Marc Schütz schue...@gmx.net wrote: On Friday, 7 March 2014 at 15:03:24 UTC, Dicebot wrote:

2) It is regression back to C++ days of no-one-cares-about-Unicode pain. Thinking about strings as character arrays is so natural and convenient that if language/Phobos won't punish you for that, it will be extremely widespread.

Not with Nick Sabalausky's suggestion to remove the implementation of front from char arrays. This way, everyone will be forced to decide whether they want code units or code points or something else.

Such as giving up on that crappy language that keeps on breaking their code. Andrei

That was more along the lines of "if you are crazy enough to even consider such breakage, this is closer to my personal idea of perfection" than an actual proposal ;)
Proposal for fixing dchar ranges
I proposed this inside the long "major performance problem with std.array.front" thread, and I've also proposed it before, a long time ago. But it seems to be getting no attention buried in that thread, not even negative attention :)

An idea to fix the whole set of problems I see with char[] being treated specially by phobos: introduce an actual string type, with char[] as backing, that is a dchar range, that actually dictates the rules we want. Then, make the compiler use this type for literals. e.g.:

    struct string
    {
        immutable(char)[] representation;
        this(immutable(char)[] data) { representation = data; }
        ... // dchar range primitives
    }

Then, a char[] array is simply an array of char.

points:

1. No more issues with foreach(c; "cassé"), it iterates via dchar
2. No more issues with "cassé"[4], it is a static compiler error.
3. No more awkward ASCII manipulation using ubyte[].
4. No more phobos schizophrenia saying char[] is not an array.
5. No more special casing char[] array templates to fool the compiler.
6. Any other special rules we come up with can be dictated by the library, and not ignored by the compiler.

Note, std.algorithm.copy(string1, mutablestring) will still decode/encode, but it's more explicit. It's EXPLICITLY a dchar range. Using std.algorithm.copy(string1.representation, mutablestring.representation) will avoid the issues.

I imagine only code that is currently UTF ignorant will break, and that code is easily 'fixed' by adding the 'representation' qualifier.

-Steve
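A library-only approximation of the proposed type can be sketched as below. It is named String here (an assumption, since `string` is already an alias in object.d), it is not how the compiler or Phobos treats literals, and countCodePoints is an invented helper; the range primitives are built on std.utf.decode and std.utf.stride.

```d
import std.range.primitives : ElementType, isInputRange;
import std.utf : decode, stride;

/// Hypothetical wrapper: UTF-8 storage that is explicitly a range of dchar.
struct String
{
    immutable(char)[] representation;

    this(immutable(char)[] data) { representation = data; }

    @property bool empty() { return representation.length == 0; }

    /// Decodes the first code point without consuming it.
    @property dchar front()
    {
        size_t i = 0;
        return decode(representation, i);
    }

    /// Skips over all code units of the first code point.
    void popFront()
    {
        representation = representation[stride(representation, 0) .. $];
    }
}

/// Hypothetical helper: counts code points by iterating the wrapper.
size_t countCodePoints(String s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    static assert(isInputRange!String);
    static assert(is(ElementType!String == dchar));
    // No opIndex is defined, so indexing is rejected at compile time.
    static assert(!__traits(compiles, String("x")[0]));

    auto s = String("cass\u00E9");
    assert(countCodePoints(s) == 5);
    assert(s.representation.length == 6); // raw access stays explicit
}
```

With such a wrapper, generic algorithms see a plain dchar range and char[] can go back to being an ordinary array, which covers points 1, 2 and 4 of the proposal; making the compiler type literals this way is the part a library cannot do.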
Re: ddox-generated Phobos documentation is available for review
On Sun, 09 Mar 2014 23:44:43 -0400, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: Consider it alpha quality. Please don't announce yet before we put it in good shape. I LOVE this. Been waiting for it for a long time. The cross-links themselves are worth the wait. Just look at how organized std.datetime has become! Now, one nitpick -- I would like to see leaf links expand locally instead of opening a new page. Perhaps you can click on the link, and it opens a new page, but have a + button to expand in-line if desired. Essentially, the disruption of going to a new page when looking at the details of a function, I feel is too much. And look at that, disqus comments! -Steve
Re: Major performance problem with std.array.front()
On Friday, 7 March 2014 at 19:43:57 UTC, Walter Bright wrote: On 3/7/2014 7:03 AM, Dicebot wrote:

1) It is a huge breakage and you have been refusing to do one even for more important problems. What is about this sudden change of mind?

1. Performance Performance Performance

Not important enough. D has always been a "safe by default, fast when asked to" language, not the other way around. There is no fundamental performance problem here, only lack of knowledge about Phobos.

2. The current behavior is surprising (it sure surprised me, I didn't notice it until I looked at the assembler to figure out why the performance sucked)

That may imply that better documentation is needed. You were only surprised because of a wrong initial assumption about what the `char[]` type means.

3. Weirdnesses like ElementEncodingType

ElementEncodingType is extremely annoying, but I think it is just a side effect of a bigger problem: how string algorithms are handled currently. It does not need to be that way.

4. Strange behavior differences between char[], char*, and InputRange!char types

Again, there is nothing strange about it. `char[]` is a special type with special semantics that is defined in documentation and consistently follows that definition in all but raw array indexing/slicing (which is what I find unfortunate but also beyond fixing feasibility).

5. Funky anomalous issues with writing OutputRange!char (the put(T) must take a dchar)

Bad but not worth even a small breaking change.

2) lack of convenient .raw property which will effectively do cast(ubyte[])

I've done the cast as a workaround, but when working with generic code it turns out the ubyte type becomes viral - you have to use it everywhere. So all over the place you're having casts between ubyte = char in unexpected places. You also wind up with ugly ubyte = dchar casts, with the commensurate risk that you goofed and have a truncation bug.

Of course it is viral. Because you never ever want to have char[] at all if you don't work with Unicode (or work with it on the raw byte level). And in that case it is your responsibility to do manual decoding when appropriate. Trying to dish out that performance often means going to a low level with all the associated risks; there is nothing special about char[] here. It is not a common use case.

Essentially, the auto-decode makes trivial code look better, but if you're writing a more comprehensive string processing program, and care about performance, it makes a regular ugly mess of things.

And this is how it should be. Again, I am all for creating a language that favors performance-critical power programming needs over common/casual needs, but that is not what D is, and you have been making such choices consistently over quite a long time now (array literals that allocate, I will never forgive that). Suddenly changing your mind only because you have encountered this specific issue personally, as opposed to just hearing reports, does not fit a language author's role. It does not really matter if any new approach itself is good or bad - being unpredictable is reputation damage D simply can't afford.
Re: Major performance problem with std.array.front()
On Monday, 10 March 2014 at 10:52:02 UTC, Andrea Fontana wrote: I'm not sure I understood the point of this (long) thread. The main problem is that decode() is called also if not needed?

I'd like to offer up one D 'user' perspective; it's just a single data point but perhaps useful. I write applications that process Arabic, and I'm thinking about converting one of those apps from Python to D, for performance reasons. My app deals with Unicode Arabic text that is 'out there', and the Unicode™ support for Arabic is not that well thought out, so the data is often (always) inconsistent in terms of sequencing diacritics etc. Even the code page can vary. Therefore my code has to cater to various ways that other developers have sequenced the code points.

So, my needs as a 'user' are:

* I want to encode all incoming data immediately into Unicode, usually UTF-8, if it isn't already.
* I want to iterate over code points. I don't care about the raw data.
* When I get the length of my string it should be the number of code points.
* When I index my string it should return the nth code point.
* When I manipulate my strings I want to work with code points

... you get the drift. If I want to access the raw data, which I don't, then I'm very happy to cast to ubyte etc.

If encode/decode is a performance issue then perhaps there could be a cache for recently used strings where the code point representation is held.

BTW to answer a question in the thread, yes the data is left-to-right and visualised right-to-left.
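The wishes in the list above (length and indexing in code points, plus a cached decoded form) can be approximated today with existing Phobos pieces. This is a rough sketch under the assumption that auto-decoding ranges and std.conv.to!dstring are acceptable; the function names and the sample word are illustrative only.

```d
import std.conv : to;
import std.range : drop, walkLength;
import std.range.primitives : front;

/// Hypothetical helper: length in code points, O(n) because it decodes.
size_t codePointCount(string s)
{
    return s.walkLength;
}

/// Hypothetical helper: the nth code point, O(n) on UTF-8 storage.
dchar nthCodePoint(string s, size_t n)
{
    return s.drop(n).front;
}

/// Hypothetical helper: decode once and keep the UTF-32 form, which plays
/// the role of the "cache" idea: length and indexing become O(1).
dstring decodedCopy(string s)
{
    return s.to!dstring;
}

void main()
{
    // "salam" in Arabic script: 4 code points, 8 UTF-8 code units.
    string s = "\u0633\u0644\u0627\u0645";

    assert(s.length == 8);
    assert(codePointCount(s) == 4);
    assert(nthCodePoint(s, 2) == '\u0627');

    dstring d = decodedCopy(s);
    assert(d.length == 4);
    assert(d[2] == '\u0627');
}
```

Note that code points are still not user-perceived characters: combining diacritics, which the post mentions, remain separate elements in both forms.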
Re: Major performance problem with std.array.front()
In Italian we need Unicode too. We have several accented letters, and programming languages often don't handle UTF-8 and other encodings very well... In D I have never had any problem with this, and I work a lot on text processing. So my question: is there a problem with Unicode support in D that I'm missing, or is it just a performance problem in algorithms? If the problem is the performance of algorithms that use .front() but don't care to understand their data, why don't we add a .rawFront() property, implemented only where it makes sense, plus a fallback like: auto rawFront(R)(R range) if ( ... isrange ... !__traits(compiles, range.rawFront)) { return range.front; } This way copy() and other algorithms can use rawFront(), and it stays backward compatible with other ranges too. But I guess I'm missing the point :) On Monday, 10 March 2014 at 13:48:44 UTC, Abdulhaq wrote: On Monday, 10 March 2014 at 10:52:02 UTC, Andrea Fontana wrote: I'm not sure I understood the point of this (long) thread. Is the main problem that decode() is called even when it is not needed? I'd like to offer up one D 'user' perspective; it's just a single data point but perhaps useful. I write applications that process Arabic, and I'm thinking about converting one of those apps from Python to D, for performance reasons. My app deals with Unicode Arabic text that is 'out there', and the Unicode(TM) support for Arabic is not that well thought out, so the data is often (always) inconsistent in terms of sequencing diacritics etc. Even the code page can vary. Therefore my code has to cater to the various ways that other developers have sequenced the code points. So, my needs as a 'user' are: * I want to encode all incoming data immediately into Unicode, usually UTF-8, if it isn't already. * I want to iterate over code points. I don't care about the raw data. * When I get the length of my string it should be the number of code points. * When I index my string it should return the nth code point. * When I manipulate my strings I want to work with code points ... you get the drift. If I want to access the raw data, which I don't, then I'm very happy to cast to ubyte etc. If encode/decode is a performance issue then perhaps there could be a cache for recently used strings where the code point representation is held. BTW to answer a question in the thread, yes the data is left-to-right and visualised right-to-left.
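The rawFront() fallback suggested above could look roughly like the sketch below. It is only an illustration: rawFront is the hypothetical name from the post, not an existing Phobos primitive, and the constraints used here (isNarrowString, isInputRange) are an assumption about what the elided condition might be. It also shows the behaviour the thread is about: front decodes a string to dchar, while the raw variant hands back a single code unit.

```d
import std.range.primitives : front, isInputRange;
import std.traits : isNarrowString;

// Hypothetical primitive: for char/wchar strings, return the first
// code unit without any UTF decoding.
auto rawFront(R)(R range)
if (isNarrowString!R)
{
    return range[0];
}

// Fallback for every other input range: simply forward to front.
auto rawFront(R)(R range)
if (isInputRange!R && !isNarrowString!R)
{
    return range.front;
}

void main()
{
    string s = "é!"; // 'é' occupies two UTF-8 code units

    // front auto-decodes a string into a code point...
    static assert(is(typeof(s.front) == dchar));
    // ...while the raw variant yields a plain code unit.
    static assert(is(typeof(rawFront(s)) : char));

    assert(s.front == 'é');
    assert(rawFront(s) == 0xC3);      // first byte of 'é' in UTF-8
    assert(rawFront([1, 2, 3]) == 1); // fallback path for other ranges
}
```

An algorithm such as copy() could then call rawFront() when it does not care about the meaning of the data, and still work on ranges that never heard of it.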
Re: ddox-generated Phobos documentation is available for review
On Monday, 10 March 2014 at 03:44:54 UTC, Andrei Alexandrescu wrote: Consider it alpha quality. Please don't announce yet before we put it in good shape. https://github.com/D-Programming-Language/dlang.org/pull/516 http://dlang.org/library http://dlang.org/library-prerelease I needed to change quite a bit about the makefile. It was building everything over and over again, and it's _slow_. Some functions are not ready, compare e.g. http://dlang.org/library/std/algorithm/balancedParens.html with http://dlang.org/library/std/algorithm/any.html Andrei I still don't like disqus :) Documentation in general would probably benefit from some styling tweaks - for example, std.algorithm looks funny when its manually crafted tables turn into the usual generated function list. But overall it looks solid.
Re: Major performance problem with std.array.front()
On 07.03.2014 03:37, Walter Bright wrote: In "Lots of low hanging fruit in Phobos" the issue came up about the automatic encoding and decoding of char ranges. After reading many of the attached posts, the question is: what could D's future design for introducing breaking changes look like? It's not a solution to say it's not possible because of too many breaking changes - that will become more and more of a problem for D's evolution - much like C++
Re: ddox-generated Phobos documentation is available for review
Thank you to everyone who worked on this. It's quite an improvement. Problem: http://dlang.org/library/std/compiler/vendor.html is a 404 Recommendation: I really liked the immediate link to the source file on github in the old layout. If possible please add it to the new layout. Mike
Re: DIP 57: static foreach
On 3/10/14, Kenji Hara k.hara...@gmail.com wrote: This is a questionable sentence. On the foreach with tuple iteration, break and continue have no effect for the unrolling. Whatever is implemented, we need to make sure the current code remains possible, e.g. in std.conv.to:

    switch (value)
    {
        foreach (I, member; NoDuplicates!(EnumMembers!S))
        {
            case member:
                return to!T(enumRep!(immutable(T), S, I));
        }
        default:
    }
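For readers who have not met this idiom, here is a small self-contained sketch of the same pattern: a tuple foreach inside a switch, unrolled into one case label per enum member. The Color enum, the colorNames table and nameOf are made up for illustration and are not part of std.conv.

```d
import std.traits : EnumMembers;

enum Color { red, green, blue }

// Member names in declaration order, gathered at compile time.
static immutable string[] colorNames = [__traits(allMembers, Color)];

string nameOf(Color value)
{
    switch (value)
    {
        // The tuple foreach is unrolled at compile time, so this
        // expands to one case label per member of Color.
        foreach (i, member; EnumMembers!Color)
        {
            case member:
                return colorNames[i];
        }
        default:
            assert(0, "value is not a named member of Color");
    }
}

void main()
{
    assert(nameOf(Color.red) == "red");
    assert(nameOf(Color.blue) == "blue");
}
```

Whatever semantics static foreach ends up with, code of this shape is what would need to keep compiling.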
Re: Formal review of std.lexer
Reminder about benchmarks. By the way, is generated lexer usable at CTFE? Imaginary use case : easier DSL implementation.
Re: ddox-generated Phobos documentation is available for review
On Monday, 10 March 2014 at 14:08:07 UTC, Mike wrote: Thank you to everyone who worked on this. It's quite an improvement. Problem: http://dlang.org/library/std/compiler/vendor.html is a 404 Recommendation: I really liked the immediate link to the source file on github in the old layout. If possible please add it to the new layout. Since (IIRC) DDox parses JSON layout, I think it is capable of generating exact links to the file:line of each symbol. That would be neat, as it allows quickly seeing the implementation if the documentation is not sufficient.
Re: Major performance problem with std.array.front()
On Monday, 10 March 2014 at 14:05:39 UTC, dennis luehring wrote: On 07.03.2014 03:37, Walter Bright wrote: In "Lots of low hanging fruit in Phobos" the issue came up about the automatic encoding and decoding of char ranges. After reading many of the attached posts, the question is: what could D's future design for introducing breaking changes look like? It's not a solution to say it's not possible because of too many breaking changes - that will become more and more of a problem for D's evolution - much like C++ Historically 2 approaches have been practiced: 1) argue a lot and then do nothing 2) suddenly change something and tell users it was necessary. I also think that this is a much more important issue than this whole thread, but it does not seem to attract any real attention when mentioned.
Re: Major performance problem with std.array.front()
On Monday, 10 March 2014 at 14:05:39 UTC, dennis luehring wrote: On 07.03.2014 03:37, Walter Bright wrote: In "Lots of low hanging fruit in Phobos" the issue came up about the automatic encoding and decoding of char ranges. After reading many of the attached posts, the question is: what could D's future design for introducing breaking changes look like? It's not a solution to say it's not possible because of too many breaking changes - that will become more and more of a problem for D's evolution - much like C++ I'm a newbie here but I've been waiting for D to mature for a long time. D IMHO has to stabilise now because: * D needs a bigger community so that the big fish who have learnt the ins and outs don't get bored and move on due to lack of kudos etc. * To get the bigger community D needs more _working_ libraries for major toolkits (GUI etc. etc.) * Libraries will cease to work if there is significant change in D, and then can stay broken because there isn't the inertial mass of other developers to maintain them after the initial developer has moved on. You can see that this has happened a LOT * Anyway the D that I read about in TDPL is already very exciting for programmers like myself, we just want that thanks. Breaking changes can go into D3, if and whenever that is. Keep breaking D2 now and it risks just being forevermore a playpen for computer scientist types. Anyway who cares what I think but I think it reflects a lot of people's opinions too.
Re: Formal review of std.lexer
On Wednesday, 26 February 2014 at 18:07:37 UTC, Jacob Carlborg wrote: On 2014-02-26 00:25, Dicebot wrote: Don't know if it makes sense to introduce random package categorization. I'd love to see more hierarchy in Phobos too but we'd first need to agree to package separation principles then. Then that's what we need to do. I don't want any more top level modules. There are already too many. As much as I hate to say it, such a hierarchy is worth a DIP. Once it is formalized, I can proceed with it in the review queue as if it were a new module proposal.
Re: Major performance problem with std.array.front()
On Monday, 10 March 2014 at 14:11:13 UTC, Dicebot wrote: Historically 2 approaches have been practiced: 1) argue a lot and then do nothing 2) suddenly change something and tell users it was necessary These are one and the same, just from the two opposing points of view. I also think that this is a much more important issue than this whole thread, but it does not seem to attract any real attention when mentioned. You mean the whole policy on breaking changes?
Re: Major performance problem with std.array.front()
Historically 2 approaches have been practiced: 1) argue a lot and then do nothing This happens (I think) because Andrei and Walter really value your and other experts' opinions, but nevertheless have to preserve the general way things work to protect the long term future of D. They have to be open to persuasion, but it would have to be very compelling to get them to change basics now - it seems to me. D is at that difficult 90% stage that we all know about, where the boring difficult stuff is left to do. People like to discuss interesting new stuff which at the time seems oh-so-important.
Re: Major performance problem with std.array.front()
On Monday, 10 March 2014 at 14:27:02 UTC, Vladimir Panteleev wrote: On Monday, 10 March 2014 at 14:11:13 UTC, Dicebot wrote: Historically 2 approaches have been practiced: 1) argue a lot and then do nothing 2) suddenly change something and tell users it was necessary These are one and the same, just from the two opposing points of view. /sarcasm :) I also think that this is a much more important issue than this whole thread, but it does not seem to attract any real attention when mentioned. You mean the whole policy on breaking changes? Yes. I gave up on this idea at some point, as there seemed to be consensus that no breaking changes would even be considered for D2 and that those that come from fixing bugs are not worth the fuss. This is exactly why I was so shocked that Walter even started this thread. If breaking changes are actually considered (rare or not), then it is absolutely critical to define the process for them and put a link to its description on the dlang.org front page.
Re: Proposal for fixing dchar ranges
On Mon, 10 Mar 2014 09:35:44 -0400, Steven Schveighoffer schvei...@yahoo.com wrote: Then, a char[] array is simply an array of char[]. An array of char even. -Steve
Re: Proposal for fixing dchar ranges
On Monday, 10 March 2014 at 13:35:33 UTC, Steven Schveighoffer wrote: I proposed this inside the long "major performance problem with std.array.front" thread, and I've also proposed it before, a long time ago. But it seems to be getting no attention buried in that thread, not even negative attention :) An idea to fix the whole set of problems I see with char[] being treated specially by phobos: introduce an actual string type, with char[] as backing, that is a dchar range, that actually dictates the rules we want. Then, make the compiler use this type for literals. e.g.: struct string { immutable(char)[] representation; this(char[] data) { representation = data;} ... // dchar range primitives } Then, a char[] array is simply an array of char[]. points: 1. No more issues with foreach(c; "cassé"), it iterates via dchar 2. No more issues with "cassé"[4], it is a static compiler error. 3. No more awkward ASCII manipulation using ubyte[]. 4. No more phobos schizophrenia saying char[] is not an array. 5. No more special casing char[] array templates to fool the compiler. 6. Any other special rules we come up with can be dictated by the library, and not ignored by the compiler. Note, std.algorithm.copy(string1, mutablestring) will still decode/encode, but it's more explicit. It's EXPLICITLY a dchar range. Using std.algorithm.copy(string1.representation, mutablestring.representation) will avoid the issues. I imagine only code that is currently UTF ignorant will break, and that code is easily 'fixed' by adding the 'representation' qualifier. -Steve It will break any code that slices stored char[] strings directly, which may or may not be breaking UTF depending on how the indices are calculated. It also adds one more runtime dependency to the language, but there are so many that it probably does not matter.
Maybe in D3...
From time to time, there are discussions concerning ideas which would impact the language, as it is now, too drastically to be implemented (it would break too much code or require a significant reengineering effort). These discussions get lost, which is regrettable since some of the discussions sometimes produce genuinely great ideas. Although there is no D3 on the horizon, I think it would be nice to keep track of these ideas anyway. http://wiki.dlang.org/Language_issues
Re: Proposal for fixing dchar ranges
On Mon, Mar 10, 2014 at 09:35:44AM -0400, Steven Schveighoffer wrote: [...] An idea to fix the whole problems I see with char[] being treated specially by phobos: introduce an actual string type, with char[] as backing, that is a dchar range, that actually dictates the rules we want. Then, make the compiler use this type for literals. e.g.: struct string { immutable(char)[] representation; this(char[] data) { representation = data;} ... // dchar range primitives } Then, a char[] array is simply an array of char[]. points: 1. No more issues with foreach(c; cassé), it iterates via dchar 2. No more issues with cassé[4], it is a static compiler error. 3. No more awkward ASCII manipulation using ubyte[]. 4. No more phobos schizophrenia saying char[] is not an array. 5. No more special casing char[] array templates to fool the compiler. 6. Any other special rules we come up with can be dictated by the library, and not ignored by the compiler. I like this idea. Special-casing char[] in templates was a bad idea. It makes Phobos code needlessly complex, and the inconsistent treatment of char[] sometimes as an array of char and sometimes not causes silly issues like foreach defaulting to char but range iteration defaulting to dchar. Enclosing it in a struct means we can enforce string rules separately from the fact that it's a char array. Note, std.algorithm.copy(string1, mutablestring) will still decode/encode, but it's more explicit. It's EXPLICITLY a dchar range. Use std.algorithm.copy(string1.representation, mutablestring.representation) will avoid the issues. I imagine only code that is currently UTF ignorant will break, and that code is easily 'fixed' by adding the 'representation' qualifier. [...] The only concern I have is the current use of char[] and const(char)[] as mutable strings, and the current implicit conversion from string to const(char)[]. 
We would need similar wrappers for char[] and const(char)[], and string and mutablestring must be implicitly convertible to conststring, otherwise a LOT of existing code will break in a major way. Plus, these wrappers should also expose the same dchar range API with .representation giving a way to get at the raw code units. T -- It is the quality rather than the quantity that matters. -- Lucius Annaeus Seneca
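A minimal sketch of the kind of wrapper being discussed might look like this. It is an assumption-laden illustration, not the proposed implementation: the type is called String here (hypothetical name) to avoid clashing with the built-in alias, it covers only the immutable case, and it leaves out the const/mutable variants and implicit conversions raised above.

```d
import std.range.primitives : ElementType, isInputRange;
import std.utf : decode, stride;

// Hypothetical wrapper type: a dchar range over UTF-8 code units.
struct String
{
    immutable(char)[] representation; // raw code units stay reachable

    @property bool empty() { return representation.length == 0; }

    // Decode the first code point without consuming it.
    @property dchar front()
    {
        size_t index = 0;
        return decode(representation, index);
    }

    // Skip over all code units of the first code point.
    void popFront()
    {
        representation = representation[stride(representation, 0) .. $];
    }
}

// The wrapper is an input range of dchar...
static assert(isInputRange!String);
static assert(is(ElementType!String == dchar));
// ...and indexing it directly is rejected at compile time.
static assert(!__traits(compiles, String("cassé")[4]));

void main()
{
    auto s = String("cassé");

    size_t codePoints;
    foreach (c; s) // c is a dchar, one per code point
        ++codePoints;

    assert(codePoints == 5);
    assert(s.representation.length == 6); // 'é' takes two code units
    assert(s.representation[4] == 0xC3);  // raw access is explicit
}
```

Because the string rules live in the struct, char[] itself can go back to being an ordinary array, which is the point made above.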
Re: ddox-generated Phobos documentation is available for review
On 3/10/14, 1:35 AM, Nicolas Sicard wrote: For me it's a real improvement! One thing: symbol names (modules, functions, etc.) shouldn't be hyphenated, specially in tables. All: how does one turn off css hyphenation? Andrei
Re: Proposal for fixing dchar ranges
On Mon, 10 Mar 2014 10:48:26 -0400, Dicebot pub...@dicebot.lv wrote: On Monday, 10 March 2014 at 13:35:33 UTC, Steven Schveighoffer wrote: I proposed this inside the long major performance problem with std.array.front, I've also proposed it before, a long time ago. But seems to be getting no attention buried in that thread, not even negative attention :) An idea to fix the whole problems I see with char[] being treated specially by phobos: introduce an actual string type, with char[] as backing, that is a dchar range, that actually dictates the rules we want. Then, make the compiler use this type for literals. e.g.: struct string { immutable(char)[] representation; this(char[] data) { representation = data;} ... // dchar range primitives } Then, a char[] array is simply an array of char[]. points: 1. No more issues with foreach(c; cassé), it iterates via dchar 2. No more issues with cassé[4], it is a static compiler error. 3. No more awkward ASCII manipulation using ubyte[]. 4. No more phobos schizophrenia saying char[] is not an array. 5. No more special casing char[] array templates to fool the compiler. 6. Any other special rules we come up with can be dictated by the library, and not ignored by the compiler. Note, std.algorithm.copy(string1, mutablestring) will still decode/encode, but it's more explicit. It's EXPLICITLY a dchar range. Use std.algorithm.copy(string1.representation, mutablestring.representation) will avoid the issues. I imagine only code that is currently UTF ignorant will break, and that code is easily 'fixed' by adding the 'representation' qualifier. It will break any code that slices stored char[] strings directly which may or may not be breaking UTF depending on how indices are calculated. That is already broken. What I'm looking to do is remove the cruft and WTF factor of the current state of affairs (an array that's not an array). 
Originally (in that long ago proposal) I had proposed to check for and disallow invalid slicing during runtime. In fact, it could be added if desired with the type defined by the library. Also adding one more runtime dependency into language but there are so many that it probably does not matter. alias string = immutable(char)[]; There isn't much extra dependency one must add to revert to the original behavior. In fact, one nice thing about this proposal is the compiler changes can be done and tested before any real meddling with the string type is done. -Steve
Re: ddox-generated Phobos documentation is available for review
On 3/10/14, 7:00 AM, Dicebot wrote: I still don't like disqus :) Are there better such systems available? Documentation in general would probably benefit from some styling tweaks - for example, std.algorithm looks funny when its manually crafted tables turn into the usual generated function list. But overall it looks solid. Yah, we need a solid community effort on this all. Please file issues appropriately, and hopefully fix others directly. Folks, this is the long tail. Please help us improve our documentation. Andrei
Re: Proposal for fixing dchar ranges
On Mon, 10 Mar 2014 10:54:50 -0400, H. S. Teoh hst...@quickfur.ath.cx wrote: The only concern I have is the current use of char[] and const(char)[] as mutable strings, and the current implicit conversion from string to const(char)[]. We would need similar wrappers for char[] and const(char)[], and string and mutablestring must be implicitly convertible to conststring, otherwise a LOT of existing code will break in a major way. I agree that is a limitation of the proposal. It's more of a language-wide problem that one cannot make a struct that can be tail-const-ified. One idea to begin with is to weakly bind to immutable(char)[] using alias this. That way, existing code devolves to current behavior. Then you pick off the primitives you want by defining them in the struct itself. Plus, these wrappers should also expose the same dchar range API with .representation giving a way to get at the raw code units. It already does that, representation is a public member. -Steve
Re: Proposal for fixing dchar ranges
On Mon, 10 Mar 2014 11:11:50 -0400, Boyd gaboonvi...@gmx.net wrote: I personally love this idea, though I think it probably introduces too many silent breaking changes for it to be universally acceptable to D users. What silent breaking changes? -Steve
Re: Proposal for fixing dchar ranges
On Monday, 10 March 2014 at 15:01:54 UTC, Steven Schveighoffer wrote: That is already broken. What I'm looking to do is remove the cruft and WTF factor of the current state of affairs (an array that's not an array). Originally (in that long ago proposal) I had proposed to check for and disallow invalid slicing during runtime. In fact, it could be added if desired with the type defined by the library. Broken as in "you are not supposed to do it in user code"? Yes. Broken as in "does the wrong thing"? No. If your index is properly calculated, it is no different from casting to ubyte[] and then slicing. I am pretty sure even Phobos does it here and there.
Re: Proposal for fixing dchar ranges
I personally love this idea, though I think it probably introduces too many silent breaking changes for it to be universally acceptable to D users. Perhaps naming it 'String', and deprecating 'string' would make it more acceptable? On Monday, 10 March 2014 at 13:35:33 UTC, Steven Schveighoffer wrote: I proposed this inside the long "major performance problem with std.array.front" thread, and I've also proposed it before, a long time ago. But it seems to be getting no attention buried in that thread, not even negative attention :) An idea to fix the whole set of problems I see with char[] being treated specially by phobos: introduce an actual string type, with char[] as backing, that is a dchar range, that actually dictates the rules we want. Then, make the compiler use this type for literals. e.g.: struct string { immutable(char)[] representation; this(char[] data) { representation = data;} ... // dchar range primitives } Then, a char[] array is simply an array of char[]. points: 1. No more issues with foreach(c; "cassé"), it iterates via dchar 2. No more issues with "cassé"[4], it is a static compiler error. 3. No more awkward ASCII manipulation using ubyte[]. 4. No more phobos schizophrenia saying char[] is not an array. 5. No more special casing char[] array templates to fool the compiler. 6. Any other special rules we come up with can be dictated by the library, and not ignored by the compiler. Note, std.algorithm.copy(string1, mutablestring) will still decode/encode, but it's more explicit. It's EXPLICITLY a dchar range. Using std.algorithm.copy(string1.representation, mutablestring.representation) will avoid the issues. I imagine only code that is currently UTF ignorant will break, and that code is easily 'fixed' by adding the 'representation' qualifier. -Steve
Re: Maybe in D3...
On Monday, 10 March 2014 at 14:50:27 UTC, Vladimir Panteleev wrote: From time to time, there are discussions concerning ideas which would impact the language, as it is now, too drastically to be implemented (it would break too much code or require a significant reengineering effort). These discussions get lost, which is regrettable since some of the discussions sometimes produce genuinely great ideas. Although there is no D3 on the horizon, I think it would be nice to keep track of these ideas anyway. http://wiki.dlang.org/Language_issues I remember someone already creating such a page but can't remember the title :( The main problem with it is that, with D3 not being a realistic option, there is not much motivation for maintaining it. Some ideas are great, but by the time they are in demand the collective consciousness is likely to have produced even greater ideas :)
Re: Proposal for fixing dchar ranges
On Mon, 10 Mar 2014 11:20:49 -0400, Boyd gaboonvi...@gmx.net wrote: Utf8 aware slicing for strings would be an issue. I'm not proposing to add this. -Steve
Re: Proposal for fixing dchar ranges
UTF-8 aware slicing for strings would be an issue. -- On Monday, 10 March 2014 at 15:13:26 UTC, Steven Schveighoffer wrote: On Mon, 10 Mar 2014 11:11:50 -0400, Boyd gaboonvi...@gmx.net wrote: I personally love this idea, though I think it probably introduces too many silent breaking changes for it to be universally acceptable to D users. What silent breaking changes? -Steve
Re: Proposal for fixing dchar ranges
Ok, then you just destroyed my sole hypothetical objection to this. --- On Monday, 10 March 2014 at 15:22:41 UTC, Steven Schveighoffer wrote: On Mon, 10 Mar 2014 11:20:49 -0400, Boyd gaboonvi...@gmx.net wrote: Utf8 aware slicing for strings would be an issue. I'm not proposing to add this. -Steve
Re: DIP 57: static foreach
On 03/10/2014 02:08 PM, Dicebot wrote: On Sunday, 9 March 2014 at 21:31:40 UTC, Timon Gehr wrote: http://wiki.dlang.org/DIP57/ Thoughts? 1) Additionally, CTFE is invoked on all expressions occurring in the ForeachAggregate I think it can be phrased more universally ForeachTypeList symbols must be evaluated as compile-time entities, if it is not possible, implementation-defined compilation error happens. ... I don't see how this is more universal. 2) Saying that it does not introduce a new scope is not entirely true as symbols from ForeachTypeList should not be available outside of static foreach. You mention it later in the same block but it is important concept to define as we currently don't have such pseudo-scopes ... The description only says that the usual scope for foreach statements is not introduced. (do we?) ... Nope. 3) The body of the static foreach statement or static foreach declaration is duplicated once for each iteration which the corresponding foreach statement with an empty body would perform when executed in CTFE I don't understand the reason behind limiting static foreach to CTFE semantics. Simply evaluating and pasting the body for each iteration should be enough. It is much closer to mixin template instances in that regard. ... I don't understand how the DIP is 'limiting static foreach to CTFE semantics' and/or why this is a bad thing or how your suggestion is different. This will also remove necessity to rely on shadowing rules to re-define ForeachTypeList symbols as at the time of pasting the body those won't exist anymore. ... I have no idea what this means. 4) Declarations introduced in the body itself are inserted into this enclosing scope Isn't enclosing term used only for scope-to-scope relations or it is applicable to any language construct? (I don't know) ... There is no formal language spec. What is meant is the scope `hosting' the static foreach construct. 
5) For static foreach statements, break and continue are supported and treated like for foreach statements over tuples. It is impossible as far as I understand existing semantics. Currently placed continue/break refer to created scope and don't stop iteration over remaining template argument list members. This is not applicable to generic foreach. ... This is not 'impossible', it is trivial to implement. Is your point that you would prefer break and continue to affect static foreach expansion? 6) In the "Iterating over members of a scope" example there is a strange Python-like colon after the `static if` condition. Typo? :) ... Nope. This is a language feature. See: http://dlang.org/version.html 7) In "Relation to tuple foreach", stating equivalency is not correct. I have removed the section. It is more of a subset, and not even a strict one, as semantics will differ in some corner cases. I think as described they would not need to. For example, iterating over an expression list will create a local copy right now if `ref` is not used. I'd really want this to not be the case for static foreach. ... I think the description is actually not detailed enough to warrant this critique. (In particular, it is not clear what 'ref' should do.) I.e., I think currently the following code is ambiguous: int y,z; static foreach(x;Seq!(y,z)) x = 2; // what is the value of y and z now? Overall the provided examples seem to match my expectations, but the semantics description could be more structured and detailed. Agreed. I will do another iteration when I can find the time. Maybe I will have to re-specify the behaviour of foreach though.
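For comparison, this is how the existing (non-static) tuple foreach treats the same situation, as far as I understand the current behaviour; it says nothing about what DIP 57 should eventually specify for static foreach. AliasSeq from std.meta stands in here for the Seq template used in the post.

```d
import std.meta : AliasSeq;

void main()
{
    int y, z;

    // Without ref, each unrolled iteration assigns to a local copy,
    // so y and z keep their initial values.
    foreach (x; AliasSeq!(y, z))
        x = 2;
    assert(y == 0 && z == 0);

    // With ref, the loop variable refers to y and z themselves.
    foreach (ref x; AliasSeq!(y, z))
        x = 2;
    assert(y == 2 && z == 2);
}
```

The open question in the exchange above is which of these two behaviours, if either, an unqualified loop variable in static foreach should have.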
Re: Proposal for fixing dchar ranges
On Mon, 10 Mar 2014 11:11:23 -0400, Dicebot pub...@dicebot.lv wrote: On Monday, 10 March 2014 at 15:01:54 UTC, Steven Schveighoffer wrote: That is already broken. What I'm looking to do is remove the cruft and WTF factor of the current state of affairs (an array that's not an array). Originally (in that long ago proposal) I had proposed to check for and disallow invalid slicing during runtime. In fact, it could be added if desired with the type defined by the library. Broken as in "you are not supposed to do it in user code"? Yes. Broken as in "does the wrong thing"? No. If your index is properly calculated, it is no different from casting to ubyte[] and then slicing. I am pretty sure even Phobos does it here and there. If a check ensuring the user cannot slice through a code point were added, you would still be able to slice via str.representation[a..b], or even str.ptr[a..b] if you were so sure of the length that you didn't want it to be checked ;) The idea behind the proposal is to make it fully backwards compatible with existing code, except for randomly accessing a char, and probably .length. Slicing would still work as it does now, but could be adjusted later. It will break existing code. To fix those breaks, you would need to use the char[] array directly via the representation member, or rethink your code to be UTF-correct. Basically, instead of pretending an array isn't an array, create a new mostly-compatible type that behaves as we want it to behave in all circumstances, not just when you use phobos algorithms. The breaks may be trivial to work around, and might seem annoying. However, they may be actual UTF bugs, and fixing them makes your code more correct. The biggest problem right now is the lack of the ability to implicitly cast to tail-const with a custom struct. We can keep an alias-this link for those cases until we can fix that in the compiler. -Steve
Re: DIP 57: static foreach
On 03/10/2014 07:40 AM, Kenji Hara wrote: 2014-03-10 6:31 GMT+09:00 Timon Gehr timon.g...@gmx.ch: http://wiki.dlang.org/DIP57 Thoughts? From the Semantics section: "For static foreach statements, break and continue are supported and treated like for foreach statements over tuples." This is questionable sentence. On the foreach with tuple iteration, break and continue have no effect for the unrolling. ... That's what is meant, and indeed this is visible in the examples section.

    void main()
    {
        import std.typetuple, std.stdio;
        foreach (i; TypeTuple!(1, 2, 3))
        {
            static if (i == 2) continue;
            else static if (i == 3) break;
            pragma(msg, "CT: i = ", i); // prints 1, 2, and 3 in CT
            writeln("RT: i = ", i);     // prints only 1 in RT
        }
    }

So, I think that static foreach *cannot* support break and continue as same as foreach with tuples. Kenji Hara Yes it can. What is your suggestion? Influencing the unrolling?
Re: ddox-generated Phobos documentation is available for review
On Monday, 10 March 2014 at 14:56:13 UTC, Andrei Alexandrescu wrote: On 3/10/14, 1:35 AM, Nicolas Sicard wrote: For me it's a real improvement! One thing: symbol names (modules, functions, etc.) shouldn't be hyphenated, specially in tables. All: how does one turn off css hyphenation? Andrei word-wrap: break-word; -webkit-hyphens: none; -moz-hyphens: none; -ms-hyphens: none; hyphens: none; should do the trick..
Re: Maybe in D3...
On Monday, 10 March 2014 at 15:16:13 UTC, Dicebot wrote: On Monday, 10 March 2014 at 14:50:27 UTC, Vladimir Panteleev wrote: From time to time, there are discussions concerning ideas which would impact the language, as it is now, too drastically to be implemented (it would break too much code or require a significant reengineering effort). These discussions get lost, which is regrettable since some of the discussions sometimes produce genuinely great ideas. Although there is no D3 on the horizon, I think it would be nice to keep track of these ideas anyway. http://wiki.dlang.org/Language_issues I remember someone already creating such a page but can't remember the title :( The main problem with it is that, with D3 not being a realistic option, there is not much motivation for maintaining it. Some ideas are great, but by the time they are in demand the collective consciousness is likely to have produced even greater ideas :) Keeping track of the ideas is still worthwhile though, if only to bring people up to speed who haven't been part of the whole conversation.
Re: ddox-generated Phobos documentation is available for review
On Monday, 10 March 2014 at 14:56:13 UTC, Andrei Alexandrescu wrote: On 3/10/14, 1:35 AM, Nicolas Sicard wrote: For me it's a real improvement! One thing: symbol names (modules, functions, etc.) shouldn't be hyphenated, specially in tables. All: how does one turn off css hyphenation? Andrei class=donthyphenate
Re: ddox-generated Phobos documentation is available for review
On Monday, 10 March 2014 at 14:11:06 UTC, Vladimir Panteleev wrote: On Monday, 10 March 2014 at 14:08:07 UTC, Mike wrote: Thank you to everyone who worked on this. It's quite an improvement. Problem: http://dlang.org/library/std/compiler/vendor.html is a 404 Recommendation: I really liked the immediate link to the source file on github in the old layout. If possible please add it to the new layout. Since (IIRC) DDox parses JSON layout, I think it is capable of generating exact links to the file:line of each symbol. That would be neat, as it allows quickly seeing the implementation if the documentation is not sufficient. I wanted to do just this, so I considered adding a predefined macro to ddoc to get line numbers, like I did to get filenames (I needed SRCFILENAME to add the Improve This Page button). But the line numbers would pretty quickly lose sync between master and the documentation, so that would also require integrating the release tag into the documentation somehow, and I gave up on that idea.
Re: ddox-generated Phobos documentation is available for review
On Monday, 10 March 2014 at 16:54:37 UTC, Brad Anderson wrote: On Monday, 10 March 2014 at 14:11:06 UTC, Vladimir Panteleev wrote: On Monday, 10 March 2014 at 14:08:07 UTC, Mike wrote: Thank you to everyone who worked on this. It's quite an improvement. Problem: http://dlang.org/library/std/compiler/vendor.html is a 404 Recommendation: I really liked the immediate link to the source file on github in the old layout. If possible please add it to the new layout. Since (IIRC) DDox parses JSON layout, I think it is capable of generating exact links to the file:line of each symbol. That would be neat, as it allows quickly seeing the implementation if the documentation is not sufficient. I wanted to do just this, so I considered adding a predefined macro to ddoc to get line numbers, like I did to get filenames (I needed SRCFILENAME to add the Improve This Page button). But the line numbers would pretty quickly lose sync between master and the documentation, so that would also require integrating the release tag into the documentation somehow, and I gave up on that idea. So... don't link to master? The dmd repo has a VERSION file. Can that be used to link to the respective tag instead?
Re: Proposal for fixing dchar ranges
On Monday, 10 March 2014 at 13:35:33 UTC, Steven Schveighoffer wrote:
I proposed this inside the long "Major performance problem with std.array.front()" thread, and I've also proposed it before, a long time ago. But it seems to be getting no attention buried in that thread, not even negative attention :)
An idea to fix the whole set of problems I see with char[] being treated specially by phobos: introduce an actual string type, with char[] as backing, that is a dchar range and that actually dictates the rules we want. Then, make the compiler use this type for literals. e.g.:
    struct string {
        immutable(char)[] representation;
        this(immutable(char)[] data) { representation = data; }
        ... // dchar range primitives
    }
Then, a char[] array is simply an array of char.
Points:
1. No more issues with foreach(c; "cassé"), it iterates via dchar.
2. No more issues with "cassé"[4], it is a static compiler error.
3. No more awkward ASCII manipulation using ubyte[].
4. No more phobos schizophrenia saying char[] is not an array.
5. No more special casing char[] array templates to fool the compiler.
6. Any other special rules we come up with can be dictated by the library, and not ignored by the compiler.
Note, std.algorithm.copy(string1, mutablestring) will still decode/encode, but it's more explicit. It's EXPLICITLY a dchar range. Using std.algorithm.copy(string1.representation, mutablestring.representation) will avoid the issues. I imagine only code that is currently UTF ignorant will break, and that code is easily 'fixed' by adding the 'representation' qualifier.
-Steve
Generally I think it's a good idea. Going a bit further, you could also enable Short String Optimization, but you'd have to encapsulate the backing array. It seems like this would be an even bigger breaking change than Walter's proposal though (right or wrong, slicing strings is very common).
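For readers who want something concrete, here is a rough sketch of the kind of wrapper the proposal describes. It is not Steven's implementation. The name String (capitalised so it does not clash with the built-in alias), the use of std.utf.decode and std.utf.stride for the range primitives, and the disabled opIndex are all assumptions made for illustration.

```d
import std.range : ElementType, isInputRange;
import std.utf : decode, stride;

// Hypothetical wrapper type; the name String is an assumption.
struct String
{
    immutable(char)[] representation;

    this(immutable(char)[] data) { representation = data; }

    // dchar range primitives
    @property bool empty() { return representation.length == 0; }

    @property dchar front()
    {
        size_t i = 0;
        return decode(representation, i); // decodes one code point
    }

    void popFront()
    {
        // Drop all code units belonging to the first code point.
        representation = representation[stride(representation, 0) .. $];
    }

    // Random access to individual code units is rejected at compile time.
    @disable dchar opIndex(size_t i);
}

static assert(isInputRange!String);
static assert(is(ElementType!String == dchar));
static assert(!__traits(compiles, String("abc")[1]));

void main()
{
    auto s = String("cass\u00E9");        // the final letter is two code units
    size_t count;
    foreach (dchar c; s)
        ++count;
    assert(count == 5);                   // five code points
    assert(s.representation.length == 6); // six code units underneath
}
```

Code that really wants the raw code units goes through .representation, which is the explicit opt-in the post argues for.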
Re: Proposal for fixing dchar ranges
On Mon, 10 Mar 2014 13:06:08 -0400, Brad Anderson e...@gnuk.net wrote: It seems like this would be an even bigger breaking change than Walter's proposal though (right or wrong, slicing strings is very common). You're the second person to mention that, I was not planning on disabling string slicing. Just random access to individual chars, and probably .length. -Steve
Re: Proposal for fixing dchar ranges
On Monday, 10 March 2014 at 17:54:49 UTC, Steven Schveighoffer wrote: On Mon, 10 Mar 2014 13:06:08 -0400, Brad Anderson e...@gnuk.net wrote: It seems like this would be an even bigger breaking change than Walter's proposal though (right or wrong, slicing strings is very common). You're the second person to mention that, I was not planning on disabling string slicing. Just random access to individual chars, and probably .length. -Steve How is slicing any better than indexing?
Re: Proposal for fixing dchar ranges
On Mon, 10 Mar 2014 14:01:45 -0400, John Colvin john.loughran.col...@gmail.com wrote: On Monday, 10 March 2014 at 17:54:49 UTC, Steven Schveighoffer wrote: On Mon, 10 Mar 2014 13:06:08 -0400, Brad Anderson e...@gnuk.net wrote: It seems like this would be an even bigger breaking change than Walter's proposal though (right or wrong, slicing strings is very common). You're the second person to mention that, I was not planning on disabling string slicing. Just random access to individual chars, and probably .length. -Steve How is slicing any better than indexing? Because one can slice out a multi-code-unit code point, one cannot access it via index. Strings would be horribly crippled without slicing. Without indexing, they are fine. A possibility is to allow index, but actually decode the code point at that index (error on invalid index). That might actually be the correct mechanism. -Steve
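The difference between indexing and slicing can be seen with today's built-in strings. The sketch below uses std.utf.stride and std.utf.decode as a stand-in for the "decode the code point at that index" idea; it only illustrates the behaviour and is not the proposed mechanism itself.

```d
import std.exception : assertThrown;
import std.utf : decode, stride, UTFException;

void main()
{
    string s = "cass\u00E9"; // the final letter takes two UTF-8 code units
    assert(s.length == 6);

    // Indexing yields a single code unit, which cannot hold the whole letter.
    assert(s[4] == 0xC3);

    // Slicing can take out the complete multi-code-unit code point.
    size_t i = 4;
    string e = s[i .. i + stride(s, i)];
    assert(e == "\u00E9");

    // Decoding at an index returns a dchar and advances the index.
    size_t j = 4;
    dchar c = decode(s, j);
    assert(c == 0xE9 && j == 6);

    // An index in the middle of a code point is rejected at run time.
    size_t k = 5;
    assertThrown!UTFException(decode(s, k));
}
```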
Re: Proposal for fixing dchar ranges
On Mon, 10 Mar 2014 13:59:53 -0400, John Colvin john.loughran.col...@gmail.com wrote: On Monday, 10 March 2014 at 13:35:33 UTC, Steven Schveighoffer wrote:
I proposed this inside the long "Major performance problem with std.array.front()" thread, and I've also proposed it before, a long time ago. But it seems to be getting no attention buried in that thread, not even negative attention :)
An idea to fix the whole set of problems I see with char[] being treated specially by phobos: introduce an actual string type, with char[] as backing, that is a dchar range and that actually dictates the rules we want. Then, make the compiler use this type for literals. e.g.:
    struct string {
        immutable(char)[] representation;
        this(immutable(char)[] data) { representation = data; }
        ... // dchar range primitives
    }
Then, a char[] array is simply an array of char.
Points:
1. No more issues with foreach(c; "cassé"), it iterates via dchar.
2. No more issues with "cassé"[4], it is a static compiler error.
3. No more awkward ASCII manipulation using ubyte[].
4. No more phobos schizophrenia saying char[] is not an array.
5. No more special casing char[] array templates to fool the compiler.
6. Any other special rules we come up with can be dictated by the library, and not ignored by the compiler.
Note, std.algorithm.copy(string1, mutablestring) will still decode/encode, but it's more explicit. It's EXPLICITLY a dchar range. Using std.algorithm.copy(string1.representation, mutablestring.representation) will avoid the issues. I imagine only code that is currently UTF ignorant will break, and that code is easily 'fixed' by adding the 'representation' qualifier.
-Steve
I know warnings are disliked, but couldn't we make the slicing and indexing work as currently but issue a warning*? It's not ideal but it does mean we get backwards compatibility.
As I mentioned elsewhere (but repeating here for viewers), I was not planning on disabling slicing. Indexing is rarely a feature one needs or should use, especially with encoded strings. -Steve
Re: Proposal for fixing dchar ranges
On Mon, 10 Mar 2014 14:30:07 -0400, Walter Bright newshou...@digitalmars.com wrote: On 3/10/2014 6:35 AM, Steven Schveighoffer wrote: An idea to fix the whole problems I see with char[] being treated specially by phobos: introduce an actual string type, with char[] as backing, that is a dchar range, that actually dictates the rules we want. Then, make the compiler use this type for literals. Proposals to make a string class for D have come up many times. I have a kneejerk dislike for it. It's a really strong feature for D to have strings be an array type, and I'll go to great lengths to keep it that way. I wholly agree, they should be an array type. But what they are now is worse. -Steve
Re: Proposal for fixing dchar ranges
On 3/10/2014 6:35 AM, Steven Schveighoffer wrote: An idea to fix the whole problems I see with char[] being treated specially by phobos: introduce an actual string type, with char[] as backing, that is a dchar range, that actually dictates the rules we want. Then, make the compiler use this type for literals. Proposals to make a string class for D have come up many times. I have a kneejerk dislike for it. It's a really strong feature for D to have strings be an array type, and I'll go to great lengths to keep it that way.
Re: Major performance problem with std.array.front()
On Mon, 10 Mar 2014 14:05:03 +, Andrea Fontana nos...@example.com wrote: In Italian we need unicode too. We have several accented letters and often programming languages don't handle utf-8 and other encodings so well... In D I never had any problem with this, and I work a lot on text processing. So my question: is there any problem I'm missing in D with unicode support or is it just a performance problem in algorithms? The only real problem, apart from potential performance issues, I've seen mentioned in this thread is that indexing/slicing is done with code units. I think this: auto index = countUntil(...); auto slice = str[0 .. index]; is really the only problem with the current implementation. If we could start from scratch I'd say we keep operating on code points by default but don't make strings arrays of char/wchar/dchar. Instead they should be special types which do all operations (especially indexing, slicing) on code points. This would be as safe as the current implementation, always consistent but probably even slower in some cases. Then offer some nice way to get the raw data for algorithms which can deal with it. However, I think it's too late to make these changes.
Re: Major performance problem with std.array.front()
On Monday, 10 March 2014 at 13:18:50 UTC, Dicebot wrote: On Sunday, 9 March 2014 at 17:27:20 UTC, Andrei Alexandrescu wrote: On 3/9/14, 6:47 AM, Marc Schütz schue...@gmx.net wrote: On Friday, 7 March 2014 at 15:03:24 UTC, Dicebot wrote: 2) It is a regression back to the C++ days of no-one-cares-about-Unicode pain. Thinking about strings as character arrays is so natural and convenient that if language/Phobos won't punish you for that, it will be extremely widespread. Not with Nick Sabalausky's suggestion to remove the implementation of front from char arrays. This way, everyone will be forced to decide whether they want code units or code points or something else. Such as giving up on that crappy language that keeps on breaking their code. Andrei That was more about "if you are that crazy to even consider such breakage, this is closer to my personal perfection" than an actual proposal ;) BTW, I don't believe it would be that bad, because there's a straightforward path of deprecation: First, std.range.front for narrow strings (and dchar, for consistency) can be marked as deprecated. The deprecation message can say: "Please specify .byCodePoint()/.byCodeUnit()", guiding the users towards a better style (assuming one agrees that explicit is indeed better than implicit in this case). After some time, the functionality can be moved into a compatibility module, with the deprecated functions still there, but now additionally telling the user about the quick fix of importing that module. The deprecation period can be very long, and even if the functions should never be removed, at least everyone writing new code would do so in the new style.
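To illustrate what the explicit style could look like at the call site: later Phobos releases provide std.utf.byCodeUnit and std.utf.byDchar, which are used below as stand-ins for the .byCodeUnit()/.byCodePoint() spelling in the post. That mapping is an assumption of this sketch, not something the post specifies.

```d
import std.range : walkLength;
import std.utf : byCodeUnit, byDchar;

void main()
{
    string s = "cass\u00E9";

    // Explicitly iterate code units: no decoding happens.
    assert(s.byCodeUnit.walkLength == 6);

    // Explicitly iterate code points: decoding is visible in the code.
    assert(s.byDchar.walkLength == 5);

    // The implicit form that the post suggests deprecating: it decodes silently.
    assert(s.walkLength == 5);
}
```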
Re: Proposal for fixing dchar ranges
On Mon, 10 Mar 2014 14:30:07 -0400, Walter Bright newshou...@digitalmars.com wrote: On 3/10/2014 6:35 AM, Steven Schveighoffer wrote: An idea to fix the whole problems I see with char[] being treated specially by phobos: introduce an actual string type, with char[] as backing, that is a dchar range, that actually dictates the rules we want. Then, make the compiler use this type for literals. Proposals to make a string class for D have come up many times. I have a kneejerk dislike for it. It's a really strong feature for D to have strings be an array type, and I'll go to great lengths to keep it that way. BTW, this escaped my view the first time reading your post, but I am NOT proposing a string *class*. In fact, I'm not proposing we change anything technical about strings, the code generated should be basically identical. What I'm proposing is to encapsulate what you can and can't do with a string in the type itself, instead of making the standard library flip over backwards to treat it as something else when the compiler treats it as a simple array of char. -Steve
Re: Major performance problem with std.array.front()
On Monday, 10 March 2014 at 13:48:44 UTC, Abdulhaq wrote: My app deals with unicode arabic text that is 'out there', and the Unicode™ support for Arabic is not that well thought out, so the data is often (always) inconsistent in terms of sequencing diacritics etc. Even the code page can vary. Therefore my code has to cater to various ways that other developers have sequenced the code points. So, my needs as a 'user' are: * I want to encode all incoming data immediately into unicode, usually UTF8, if it isn't already. * I want to iterate over code points. I don't care about the raw data. * When I get the length of my string it should be the number of code points. * When I index my string it should return the nth code point. * When I manipulate my strings I want to work with code points ... you get the drift. Are you sure that code points are what you want? AFAIK there are lots of diacritics in Arabic, and I believe they are not precomposed with their carrying letters...
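The distinction behind that question can be made concrete with a small sketch. It assumes a Phobos release that provides std.uni.byGrapheme, and the sample text (ARABIC LETTER BEH followed by the diacritic ARABIC FATHA) is chosen here purely for illustration.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // U+0628 ARABIC LETTER BEH followed by U+064E ARABIC FATHA (a diacritic)
    string s = "\u0628\u064E";

    assert(s.length == 4);                // four UTF-8 code units
    assert(s.walkLength == 2);            // two code points
    assert(s.byGrapheme.walkLength == 1); // one user-perceived character
}
```

Whether length and indexing should count code points or graphemes therefore depends on whether diacritics are meant to be handled separately from their carrying letters.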
Re: Proposal for fixing dchar ranges
On Mon, 10 Mar 2014 11:30:07 -0700, Walter Bright newshou...@digitalmars.com wrote: On 3/10/2014 6:35 AM, Steven Schveighoffer wrote: An idea to fix the whole problems I see with char[] being treated specially by phobos: introduce an actual string type, with char[] as backing, that is a dchar range, that actually dictates the rules we want. Then, make the compiler use this type for literals. Proposals to make a string class for D have come up many times. I have a kneejerk dislike for it. It's a really strong feature for D to have strings be an array type, and I'll go to great lengths to keep it that way. Question: which type T doesn't have slicing, has an ElementType of dchar, has typeof(T[0]).sizeof == 4, ElementEncodingType!T == char and still satisfies isArray? It's a string. Would you call that 'an array type'?
    writeln(isArray!string); //true
    writeln(hasSlicing!string); //false
    writeln(ElementType!string.stringof); //dchar
    writeln(ElementEncodingType!string.stringof); //char
I wouldn't call that an array. Part of the problem is that you want strings to be arrays (fixed size elements, direct indexing) and Andrei doesn't want them to be arrays (operating on code points = not fixed size = not arrays).
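The trait results quoted above can be checked at compile time. This is a small sketch assuming a Phobos with auto-decoding for narrow strings, as discussed in this thread; the dstring lines are added here only for contrast.

```d
import std.range : ElementEncodingType, ElementType, hasLength, hasSlicing,
    isRandomAccessRange;
import std.traits : isArray, isNarrowString;

// string is an array as far as the language is concerned...
static assert(isArray!string);
static assert(isNarrowString!string);

// ...but the range traits in Phobos deliberately deny it array-like powers.
static assert(!hasSlicing!string);
static assert(!hasLength!string);
static assert(!isRandomAccessRange!string);
static assert(is(ElementType!string == dchar));
static assert(is(ElementEncodingType!string == immutable(char)));

// For contrast: a dstring has fixed-size elements and keeps all of them.
static assert(hasSlicing!dstring);
static assert(hasLength!dstring);
static assert(isRandomAccessRange!dstring);

void main() {}
```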
Re: Proposal for fixing dchar ranges
On Mon, 10 Mar 2014 13:55:00 -0400, Steven Schveighoffer schvei...@yahoo.com wrote: On Mon, 10 Mar 2014 13:06:08 -0400, Brad Anderson e...@gnuk.net wrote: It seems like this would be an even bigger breaking change than Walter's proposal though (right or wrong, slicing strings is very common). You're the second person to mention that, I was not planning on disabling string slicing. Just random access to individual chars, and probably .length. -Steve Unfortunately slicing by code units is probably the most important safety issue with the current implementation. As was mentioned in the other thread: size_t index = str.countUntil('a'); auto slice = str[0..index]; This can be a safety and security issue. (I realize that this would break lots of code so I'm not sure if we should/can fix it. But I think this was the most important problem mentioned in the other thread.)
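A minimal sketch of the problem described above, assuming auto-decoding Phobos: countUntil counts code points, while the slice operator counts code units. The sample string and the use of std.string.indexOf as the code-unit-based alternative are choices made for this illustration.

```d
import std.algorithm : countUntil;
import std.exception : assertThrown;
import std.string : indexOf;
import std.utf : UTFException, validate;

void main()
{
    string str = "\u00E9a"; // a two-code-unit letter followed by 'a'

    // countUntil walks the dchar range, so it reports one element before 'a'.
    auto index = str.countUntil('a');
    assert(index == 1);

    // Used as a slice bound, that count cuts the first letter in half.
    auto slice = str[0 .. index];
    assert(slice.length == 1);
    assertThrown!UTFException(validate(slice));

    // indexOf returns a code unit offset, which is what slicing expects.
    auto offset = str.indexOf('a');
    assert(offset == 2);
    assert(str[0 .. offset] == "\u00E9");
}
```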
Re: ddox-generated Phobos documentation is available for review
On 10.03.2014 15:11, Vladimir Panteleev wrote: On Monday, 10 March 2014 at 14:08:07 UTC, Mike wrote: Thank you to everyone who worked on this. It's quite an improvement. Problem: http://dlang.org/library/std/compiler/vendor.html is a 404. Recommendation: I really liked the immediate link to the source file on GitHub in the old layout. If possible, please add it to the new layout. Since (IIRC) DDox parses JSON layout, I think it is capable of generating exact links to the file:line of each symbol. That would be neat, as it allows quickly seeing the implementation if the documentation is not sufficient. It's actually already there: at the top of each page, there is a View source code button that goes to the proper file/line and to the proper branch/tag. I've used the same style as the already existing buttons, but those are indeed not very noticeable on the right side of the page. Any suggestions for a better place/style without visually cluttering up the actual documentation?
Re: Proposal for fixing dchar ranges
On 3/10/2014 11:54 AM, Steven Schveighoffer wrote: BTW, this escaped my view the first time reading your post, but I am NOT proposing a string *class*. Right, but here I used the term class to be more generic as in being a user defined type, i.e. struct or class. I should have been more clear.
Re: Duals or ranges and reactive D
On Saturday, 8 March 2014 at 12:01:10 UTC, Timon Gehr wrote: On 02/27/2014 01:41 PM, Szymon Gatner wrote: C#'s IObservable/IObserver made me think about how a dual [1][2] of a range concept would look in D. Since D has no equivalent of IEnumerable (as it is not needed thanks to templates), it is only about the IEnumerator / IObserver part, which relates to a D range. Ranges/enumerators are models of a 'pull' style interface, whereas their duals represent models of a 'push' style, enabling reactive programming [3] techniques which solve issues of asynchronous / event-based programming really nicely. I suppose OutputRange is similar in concept, although it has 'OnCompleted' / 'OnError' missing. What do you think? Rx along with LINQ is a really clean solution to the problem of asynchronous ranges of values. I think it would be very nice to have in D too. [1] http://csl.stanford.edu/~christos/pldi2010.fit/meijer.duality.pdf [2] http://josemigueltorres.net/index.php/ienumerableiobservable-duality/ [3] https://channel9.msdn.com/Shows/Going+Deep/Expert-to-Expert-Brian-Beckman-and-Erik-Meijer-Inside-the-NET-Reactive-Framework-Rx In case you are interested, I have thrown together a small proof of concept implementation: http://dpaste.dzfl.pl/9d8386768da0 Wow, that is not what I'd call small ;) I will definitely take a look. Is it something you already had written or something new? How do you feel about the concept?
Re: Proposal for fixing dchar ranges
On Monday, 10 March 2014 at 18:50:28 UTC, Johannes Pfau wrote: Question: which type T doesn't have slicing, has an ElementType of dchar, has typeof(T[0]).sizeof == 4, ElementEncodingType!T == char and still satisfies isArray? In addition, hasLength!T == false, which totally freaked me out when I first discovered that.
Re: Proposal for fixing dchar ranges
On Monday, 10 March 2014 at 18:09:51 UTC, Steven Schveighoffer wrote: On Mon, 10 Mar 2014 14:01:45 -0400, John Colvin john.loughran.col...@gmail.com wrote: On Monday, 10 March 2014 at 17:54:49 UTC, Steven Schveighoffer wrote: On Mon, 10 Mar 2014 13:06:08 -0400, Brad Anderson e...@gnuk.net wrote: It seems like this would be an even bigger breaking change than Walter's proposal though (right or wrong, slicing strings is very common). You're the second person to mention that, I was not planning on disabling string slicing. Just random access to individual chars, and probably .length. -Steve How is slicing any better than indexing? Because one can slice out a multi-code-unit code point, one cannot access it via index. Strings would be horribly crippled without slicing. Without indexing, they are fine. A possibility is to allow index, but actually decode the code point at that index (error on invalid index). That might actually be the correct mechanism. -Steve In order to be correct, both require exactly the same knowledge: the beginning of a code point, followed by the end of a code point. In the indexing case they just happen to belong to the same code point and happen to be one code unit from each other. I don't see how one is any more or less error-prone or fundamentally wrong than the other. I do understand that slicing is more important, however.
Re: ddox-generated Phobos documentation is available for review
The documentation is looking very good, good work to all involved. There are a few bugs here and there. Appender's docs were missing, some runtime modules are in there which should maybe be hidden. Still, this is a massive improvement, and I love it.
Re: Proposal for fixing dchar ranges
On Mon, Mar 10, 2014 at 07:49:04PM +0100, Johannes Pfau wrote: On Mon, 10 Mar 2014 11:30:07 -0700, Walter Bright newshou...@digitalmars.com wrote: On 3/10/2014 6:35 AM, Steven Schveighoffer wrote: An idea to fix the whole problems I see with char[] being treated specially by phobos: introduce an actual string type, with char[] as backing, that is a dchar range, that actually dictates the rules we want. Then, make the compiler use this type for literals. Proposals to make a string class for D have come up many times. I have a kneejerk dislike for it. It's a really strong feature for D to have strings be an array type, and I'll go to great lengths to keep it that way. I'm on the fence about this one. The nice thing about strings being an array type, is that it is a familiar concept to C coders, and it allows array slicing for extracting substrings, etc., which fits nicely with the C view of strings as character arrays. As a C coder myself, I like it this way too. But the bad thing about strings being an array type, is that it's a holdover from C, and it allows slicing for extracting substrings -- malformed substrings by permitting slicing a multibyte (multiword) character. Basically, the nice aspects of strings being arrays only apply when you're dealing with ASCII (or mostly-ASCII) strings. These very same nice aspects turn into problems when dealing with anything non-ASCII. The only way the user can get it right using only array operations, is if they understand the whole of Unicode in their head and are willing to reinvent Unicode algorithms every time they slice a string or do some operation on it. Since D purportedly supports Unicode by default, it shouldn't be this way. D should *actually* support Unicode all the way -- use proper Unicode algorithms for substring extraction, collation, line-breaking, normalization, etc.
Being a systems language, of course, means that D should allow you to get under the hood and do things directly with the raw string representation -- but this shouldn't be the *default* modus operandi. The default should be a properly-encapsulated string type with Unicode algorithms to operate on it (with the option of reaching into the raw representation where necessary). Question: which type T doesn't have slicing, has an ElementType of dchar, has typeof(T[0]).sizeof == 4, ElementEncodingType!T == char and still satisfies isArray? It's a string. Would you call that 'an array type'?
    writeln(isArray!string); //true
    writeln(hasSlicing!string); //false
    writeln(ElementType!string.stringof); //dchar
    writeln(ElementEncodingType!string.stringof); //char
I wouldn't call that an array. Part of the problem is that you want strings to be arrays (fixed size elements, direct indexing) and Andrei doesn't want them to be arrays (operating on code points = not fixed size = not arrays). Exactly. What we have right now is a frankensteinian hybrid that's neither fully an array, nor fully a Unicode string type. If we call the current messy AA implementation split between compiler, aaA.d, and object.di a design problem, then I'd call the current state of D strings a design problem too. This underlying inconsistency is ultimately what leads to the poor performance of strings in std.algorithm. It's precisely because of this that I've given up on using std.algorithm for strings altogether -- std.regex is far better: more flexible, more expressive, and more performant, and specifically designed to operate on strings. Nowadays I only use std.algorithm for non-string ranges (because then the behaviour is actually consistent!!). T -- MS Windows: 64-bit overhaul of 32-bit extensions and a graphical shell for a 16-bit patch to an 8-bit operating system originally coded for a 4-bit microprocessor, written by a 2-bit company that can't stand 1-bit of competition.
Re: Proposal for fixing dchar ranges
On Mon, 10 Mar 2014 15:30:00 -0400, John Colvin john.loughran.col...@gmail.com wrote: On Monday, 10 March 2014 at 18:09:51 UTC, Steven Schveighoffer wrote: Because one can slice out a multi-code-unit code point, one cannot access it via index. Strings would be horribly crippled without slicing. Without indexing, they are fine. A possibility is to allow index, but actually decode the code point at that index (error on invalid index). That might actually be the correct mechanism. In order to be correct, both require exactly the same knowledge: the beginning of a code point, followed by the end of a code point. In the indexing case they just happen to belong to the same code point and happen to be one code unit from each other. I don't see how one is any more or less error-prone or fundamentally wrong than the other. Using indexing, you simply cannot get the single code unit that represents a multi-code-unit code point. It doesn't fit in a char. It's guaranteed to fail, whereas slicing will give you access to all the data in the string. Now, with indexing actually decoding a code point, one can alias a[i] to a[i..$].front(), which means decode the first code point you come to at index i. This means indexing is slow(er), and returns a dchar. I think as a first step, that might be too much to add silently. I'd rather break it first, then add it back later. -Steve
Re: Proposal for fixing dchar ranges
On Mon, 10 Mar 2014 14:54:22 -0400, Johannes Pfau nos...@example.com wrote: On Mon, 10 Mar 2014 13:55:00 -0400, Steven Schveighoffer schvei...@yahoo.com wrote: On Mon, 10 Mar 2014 13:06:08 -0400, Brad Anderson e...@gnuk.net wrote: It seems like this would be an even bigger breaking change than Walter's proposal though (right or wrong, slicing strings is very common). You're the second person to mention that, I was not planning on disabling string slicing. Just random access to individual chars, and probably .length. -Steve Unfortunately slicing by code units is probably the most important safety issue with the current implementation. As was mentioned in the other thread: size_t index = str.countUntil('a'); auto slice = str[0..index]; This can be a safety and security issue. (I realize that this would break lots of code so I'm not sure if we should/can fix it. But I think this was the most important problem mentioned in the other thread.) Slicing can never be a code point based operation. It would be too slow (read: linear complexity). What needs to be broken is the expectation that an index is the number of code points or characters in a string. Think of an index as a position that has no real meaning except that positions are ordered in the stream. Like a set of ordered numbers, not necessarily consecutive. The index 4 may not exist, while 5 does. At this point, my proposal does not fix that particular problem, but I don't think there's any way to fix that problem except to train the user who wrote it not to do that. However, it does not leave us in a worse position. -Steve
Re: Proposal for fixing dchar ranges
On Mon, 10 Mar 2014 16:06:25 -0400, Steven Schveighoffer schvei...@yahoo.com wrote: Think of an index as a position that has no real meaning except they are ordered in the stream. Like a set of ordered numbers, not necessarily consecutive. The index 4 may not exist, while 5 does. I said that wrong, of course it has meaning. What I mean is that if you have two positions, the ordering will indicate where the characters/graphemes/code points occur in the stream, but their value will not be indicative of how far they are apart in terms of characters/graphemes/code points. In other words, if I have two characters, at position p1 and p2, then:
    p1 > p2 => p1 comes later in the string than p2
    p1 == p2 => p1 and p2 refer to the same character
    p1 - p2 => not defined to any particular value
-Steve
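The three rules can be illustrated with code unit offsets into a built-in string. This is only a sketch: the sample string and the use of std.string.indexOf to obtain the positions are assumptions made for the example.

```d
import std.range : walkLength;
import std.string : indexOf;

void main()
{
    string s = "a\u00E9b"; // 'a', a two-code-unit letter, 'b'

    auto p1 = s.indexOf('\u00E9'); // position of the accented letter
    auto p2 = s.indexOf('b');      // position of 'b'
    assert(p1 == 1 && p2 == 3);

    // Ordering is meaningful: 'b' comes later in the string.
    assert(p2 > p1);

    // The difference is not: the positions are 2 apart,
    // yet only one code point lies between them.
    assert(p2 - p1 == 2);
    assert(s[p1 .. p2].walkLength == 1);

    // Position 2 falls inside the accented letter, so no character starts there.
}
```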
Re: Proposal for fixing dchar ranges
On Monday, 10 March 2014 at 17:54:49 UTC, Steven Schveighoffer wrote: On Mon, 10 Mar 2014 13:06:08 -0400, Brad Anderson e...@gnuk.net wrote: It seems like this would be an even bigger breaking change than Walter's proposal though (right or wrong, slicing strings is very common). You're the second person to mention that, I was not planning on disabling string slicing. Just random access to individual chars, and probably .length. -Steve Sorry, I misunderstood. That sounds reasonable.
Re: Proposal for fixing dchar ranges
On 3/10/2014 1:36 PM, Steven Schveighoffer wrote: "What strings are already is a user-defined type," No, they are not. "but with horrible enforcement." With no enforcement, and that is by design. Keep in mind that D is a systems programming language, and that means unfettered access to strings.
Re: Proposal for fixing dchar ranges
On Monday, 10 March 2014 at 20:00:07 UTC, Steven Schveighoffer wrote: On Mon, 10 Mar 2014 15:30:00 -0400, John Colvin john.loughran.col...@gmail.com wrote: On Monday, 10 March 2014 at 18:09:51 UTC, Steven Schveighoffer wrote: Because one can slice out a multi-code-unit code point, one cannot access it via index. Strings would be horribly crippled without slicing. Without indexing, they are fine. A possibility is to allow index, but actually decode the code point at that index (error on invalid index). That might actually be the correct mechanism. In order to be correct, both require exactly the same knowledge: the beginning of a code point, followed by the end of a code point. In the indexing case they just happen to belong to the same code point and happen to be one code unit from each other. I don't see how one is any more or less error-prone or fundamentally wrong than the other. Using indexing, you simply cannot get the single code unit that represents a multi-code-unit code point. It doesn't fit in a char. It's guaranteed to fail, whereas slicing will give you access to all the data in the string. I think I understand your motivation now. Indexing never provides anything that slicing doesn't do more generally. Now, with indexing actually decoding a code point, one can alias a[i] to a[i..$].front(), which means decode the first code point you come to at index i. This means indexing is slow(er), and returns a dchar. I think as a first step, that might be too much to add silently. I'd rather break it first, then add it back later. -Steve Of course that i has to be at the beginning of a code point. It doesn't seem like that useful a feature, and it is potentially very confusing for people who naively expect normal indexing.
Re: Maybe in D3...
On Monday, 10 March 2014 at 14:50:27 UTC, Vladimir Panteleev wrote: From time to time, there are discussions concerning ideas which would impact the language, as it is now, too drastically to be implemented (it would break too much code or require a significant reengineering effort). These discussions get lost, which is regrettable since some of the discussions sometimes produce genuinely great ideas. Although there is no D3 on the horizon, I think it would be nice to keep track of these ideas anyway. http://wiki.dlang.org/Language_issues I imagine that someone else could write it better than I, but having to explicitly break out of safe, pure, nothrow, etc. should be the default, rather than the reverse. Of course, no option has to be language breaking. If two alternate implementations are incompatible, you can add a version/feature flag to the compiler and deprecate the older versions over time. Releasing a new version of the compiler which breaks everything ever made is bad, but if people have 12 months and a working compiler for both versions, moving over to the new expectations isn't unreasonable.
Re: Proposal for fixing dchar ranges
On Monday, 10 March 2014 at 19:48:34 UTC, H. S. Teoh wrote: On Mon, Mar 10, 2014 at 07:49:04PM +0100, Johannes Pfau wrote: On Mon, 10 Mar 2014 11:30:07 -0700, Walter Bright newshou...@digitalmars.com wrote: On 3/10/2014 6:35 AM, Steven Schveighoffer wrote: An idea to fix the whole problems I see with char[] being treated specially by phobos: introduce an actual string type, with char[] as backing, that is a dchar range, that actually dictates the rules we want. Then, make the compiler use this type for literals. Proposals to make a string class for D have come up many times. I have a kneejerk dislike for it. It's a really strong feature for D to have strings be an array type, and I'll go to great lengths to keep it that way. I'm on the fence about this one. The nice thing about strings being an array type, is that it is a familiar concept to C coders, and it allows array slicing for extracting substrings, etc., which fits nicely with the C view of strings as character arrays. As a C coder myself, I like it this way too. But the bad thing about strings being an array type, is that it's a holdover from C, and it allows slicing for extracting substrings -- malformed substrings by permitting slicing a multibyte (multiword) character. Basically, the nice aspects of strings being arrays only apply when you're dealing with ASCII (or mostly-ASCII) strings. These very same nice aspects turn into problems when dealing with anything non-ASCII. The only way the user can get it right using only array operations, is if they understand the whole of Unicode in their head and are willing to reinvent Unicode algorithms every time they slice a string or do some operation on it. Since D purportedly supports Unicode by default, it shouldn't be this way. D should *actually* support Unicode all the way -- use proper Unicode algorithms for substring extraction, collation, line-breaking, normalization, etc.
Being a systems language, of course, means that D should allow you to get under the hood and do things directly with the raw string representation -- but this shouldn't be the *default* modus operandi. The default should be a properly-encapsulated string type with Unicode algorithms to operate on it (with the option of reaching into the raw representation where necessary). You started off on the fence, but you seem pretty convinced by the end!
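To make the "malformed substrings" point concrete, here is a minimal sketch, assuming a current Phobos where std.utf.validate throws UTFException on invalid UTF-8. The helper name isValidUtf8 is made up for this illustration and is not a Phobos function.

```d
import std.utf : validate, UTFException;

// Hypothetical helper (not part of Phobos): true if s is well-formed UTF-8.
bool isValidUtf8(string s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    // "café" with a precomposed é (U+00E9), which takes two UTF-8 code units.
    string s = "caf\u00E9";
    assert(s.length == 5);            // 3 ASCII bytes + 2 bytes for é

    // Slicing on a code point boundary keeps the string well-formed.
    assert(isValidUtf8(s[0 .. 3]));   // "caf"

    // Slicing through the middle of é compiles and runs without complaint,
    // but the result is no longer valid UTF-8.
    assert(!isValidUtf8(s[0 .. 4]));
}
```

The slice itself never fails; the damage only shows up when something later tries to decode the result.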
Re: Major performance problem with std.array.front()
On Monday, 10 March 2014 at 18:54:26 UTC, Marc Schütz wrote: On Monday, 10 March 2014 at 13:48:44 UTC, Abdulhaq wrote: My app deals with unicode arabic text that is 'out there', and the UnicodeTM support for Arabic is not that well thought out, so the data is often (always) inconsistent in terms of sequencing diacritics etc. Even the code page can vary. Therefore my code has to cater to various ways that other developers have sequenced the code points. So, my needs as a 'user' are: * I want to encode all incoming data immediately into unicode, usually UTF8, if it isn't already. * I want to iterate over code points. I don't care about the raw data. * When I get the length of my string it should be the number of code points. * When I index my string it should return the nth code point. * When I manipulate my strings I want to work with code points ... you get the drift. Are you sure that code points is what you want? AFAIK there are lots of diacritics in Arabic, and I believe they are not precomposed with their carrying letters... I checked the terminology before posting so I'm pretty sure. Arabic has a code page for the logical characters, one code point for each letter of the alphabet and others for various diacritics. Because of the 'shaping' each logical character has various glyphs, found on other code pages. Text editing programs tend to store typed Arabic as the user entered it, and because there can be more than one diacritic per alphabetic letter the sequence varies as to how the user sequenced them.
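A rough sketch of what "length and indexing in code points" can look like with today's D, assuming the text is converted to dstring (UTF-32) up front. The sample letter and diacritics are chosen only for illustration; this is one possible approach, not what the poster's app actually does.

```d
import std.conv : to;
import std.range.primitives : walkLength;

void main()
{
    // BEH (U+0628) followed by FATHA (U+064E) and SHADDA (U+0651),
    // and the same letter with the two diacritics entered in the other order.
    string a = "\u0628\u064E\u0651";
    string b = "\u0628\u0651\u064E";

    assert(a.length == 6);        // UTF-8 code units: 2 bytes per code point here
    assert(a.walkLength == 3);    // code points, counted by decoding (linear time)

    // Converting to dstring gives one array element per code point,
    // so length and indexing work in code points directly.
    dstring d = a.to!dstring;
    assert(d.length == 3);
    assert(d[1] == '\u064E');     // the second code point is the FATHA

    // The two input orders are different code point sequences,
    // so a plain comparison treats them as different strings.
    assert(a != b);
}
```

The cost of this choice is four bytes per code point in memory, in exchange for constant-time indexing.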
Re: Proposal for fixing dchar ranges
On Mon, 10 Mar 2014 16:52:27 -0400, Walter Bright newshou...@digitalmars.com wrote: On 3/10/2014 1:36 PM, Steven Schveighoffer wrote: What strings are already is a user-defined type, No, they are not. The functionality added via phobos can hardly be considered extraneous. One would not use strings without the library. but with horrible enforcement. With no enforcement, and that is by design. The enforcement is opt-in. That is, you have to use phobos' templates in order to use them properly: auto getIt(R)(R r, size_t idx) { if(idx < r.length) return r[idx]; } The above compiles fine for strings. However, it does not compile fine if you do: auto getIt(R)(R r, size_t idx) if(hasLength!R && isRandomAccessRange!R) Any other range will fail to compile for the more strict version and the simple implementation without template constraints. In other words, the compiler doesn't believe the same thing phobos does. shooting one's foot is quite easy. Keep in mind that D is a systems programming language, and that means unfettered access to strings. Access is fine, with clear intentions. And we do not have unfettered access. I cannot sort a mutable string of ASCII characters without first converting it to ubyte[]. What in my proposal makes you think you don't have unfettered access? The underlying immutable(char)[] representation is accessible. In fact, you would have more access, since phobos functions would then work with a char[] like it's a proper array. -Steve
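The mismatch described above can be checked at compile time. The following is a sketch, assuming a Phobos with autodecoding still enabled (the default as far as I know); getItChecked is just a renamed copy of the constrained getIt from the post, and both bodies use an assert instead of the bare if so that every path returns a value.

```d
import std.algorithm.sorting : sort;
import std.range.primitives : hasLength, isRandomAccessRange;
import std.string : representation;

// Unconstrained: the compiler sees string as a plain array, so this compiles.
auto getIt(R)(R r, size_t idx)
{
    assert(idx < r.length);
    return r[idx];
}

// Constrained the Phobos way: string is rejected, because Phobos treats it
// as a range of dchar with no length and no random access.
auto getItChecked(R)(R r, size_t idx)
if (hasLength!R && isRandomAccessRange!R)
{
    assert(idx < r.length);
    return r[idx];
}

static assert(!hasLength!string);
static assert(!isRandomAccessRange!string);
static assert( __traits(compiles, getIt("abc", 1)));
static assert(!__traits(compiles, getItChecked("abc", 1)));
static assert( __traits(compiles, getItChecked([10, 20, 30], 1)));

void main()
{
    assert(getIt("abc", 1) == 'b');
    assert(getItChecked([10, 20, 30], 1) == 20);

    // Sorting ASCII in place: sort() refuses char[] directly...
    char[] ascii = "dcba".dup;
    static assert(!__traits(compiles, sort(ascii)));

    // ...but accepts the ubyte[] view of the same memory.
    sort(ascii.representation);
    assert(ascii == "abcd");
}
```

The ubyte[] returned by representation aliases the original buffer, which is why the char[] ends up sorted.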
Re: Major performance problem with std.array.front()
On Monday, 10 March 2014 at 18:54:26 UTC, Marc Schütz wrote: On Monday, 10 March 2014 at 13:48:44 UTC, Abdulhaq wrote: My app deals with unicode arabic text that is 'out there', and the UnicodeTM support for Arabic is not that well thought out, so the data is often (always) inconsistent in terms of sequencing diacritics etc. Even the code page can vary. Therefore my code has to cater to various ways that other developers have sequenced the code points. So, my needs as a 'user' are: * I want to encode all incoming data immediately into unicode, usually UTF8, if isn't already. * I want to iterate over code points. I don't care about the raw data. * When I get the length of my string it should be the number of code points. * When I index my string it should return the nth code point. * When I manipulate my strings I want to work with code points ... you get the drift. Are you sure that code points is what you want? AFAIK there are lots of diacritics in Arabic, and I believe they are not precomposed with their carrying letters... Adding to my other comment I don't expect a string type to understand arabic and merge the diacritics for me. In fact there are other symbols (code points) that can also be present, for instance instructions on how Quranic text is to be read. These issues have not been standardised and I would say are not well understood generally.
Re: Proposal for fixing dchar ranges
On Mon, 10 Mar 2014 16:54:34 -0400, John Colvin john.loughran.col...@gmail.com wrote: Of course that i has to be at the beginning of a code-point. Doesn't seem like that useful a feature and potentially very confusing for people who naively expect normal indexing. What it would do is remove the confusion of is(typeof(r.front) != typeof(r[0])) Naivety is to be expected when you have made your C-derived language's default string type an encoded UTF8 array called char[]. It doesn't magically make D programs UTF aware. I would suggest that a lofty goal is for the string type to be completely safe, and efficient, and only allow raw access via the .representation member. But I don't think, given the current code base, that we can achieve that in one proposal. It has to be gradual. This is a first step. -Steve
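For anyone who wants to see the front versus index mismatch directly, a small compile-time check (a sketch, assuming autodecoding is enabled in the Phobos being used):

```d
import std.range.primitives : ElementType, front;
import std.string : representation;

// r.front decodes a whole code point...
static assert(is(typeof("a".front) == dchar));
// ...while r[0] hands back a single UTF-8 code unit.
static assert(is(typeof("a"[0]) == immutable(char)));
// Phobos' notion of the element type follows front, not indexing.
static assert(is(ElementType!string == dchar));
// Raw access stays available through .representation.
static assert(is(typeof("a".representation) == immutable(ubyte)[]));

void main() {}
```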
Re: Major performance problem with std.array.front()
On 3/7/2014 8:40 AM, Michel Fortin wrote: On 2014-03-07 03:59:55 +, bearophile bearophileh...@lycos.com said: Walter Bright: I understand this all too well. (Note that we currently have a different silent problem: unnoticed large performance problems.) On the other hand your change could introduce Unicode-related bugs in future code (that the current Phobos avoids) (and here I am not talking about code breakage). The way Phobos works isn't any more correct than dealing with code units. Many graphemes span multiple code points -- because of combined diacritics or character variant modifiers -- and decoding at the code-point level is thus often insufficient for correctness. Well, it is *more* correct, as many western languages are more likely in current Phobos to just work in most cases. It's just that things still aren't completely correct overall. From my experience, I'd suggest these basic operations for a string range instead of the regular range interface: .empty .frontCodeUnit .frontCodePoint .frontGrapheme .popFrontCodeUnit .popFrontCodePoint .popFrontGrapheme .codeUnitLength (aka length) .codePointLength (for dchar[] only) .codePointLengthLinear .graphemeLengthLinear Someone should be able to mix all the three 'front' and 'pop' function variants above in any code dealing with a string type. In my XML parser for instance I regularly use frontCodeUnit to avoid the decoding penalty when matching the next character with an ASCII one such as '<' or '&'. An API like the one above forces you to be aware of the level you're working on, making bugs and inefficiencies stand out (as long as you're familiar with each representation). If someone wants to use a generic array/range algorithm with a string, my opinion is that he should have to wrap it in a range type that maps front and popFront to one of the above variants. 
Having to do that should make it obvious that there's an inefficiency there, as you're using an algorithm that wasn't tailored to work with strings and that more decoding than strictly necessary is being done. I actually like this suggestion quite a bit.
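The three levels in that list map fairly closely onto range adaptors that, as far as I know, exist in current Phobos: std.utf.byCodeUnit, std.utf.byDchar and std.uni.byGrapheme. These are not the frontCodeUnit-style members proposed above, and they may not have been available at the time of this thread, so treat this as a sketch of the same idea with today's library.

```d
import std.range.primitives : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT (two code points, one grapheme).
    string s = "casse\u0301";

    // Same string, three different "lengths" depending on the level chosen.
    assert(s.byCodeUnit.walkLength == 7);   // UTF-8 code units
    assert(s.byDchar.walkLength == 6);      // code points
    assert(s.byGrapheme.walkLength == 5);   // user-perceived characters

    // Matching against an ASCII character needs no decoding at all.
    assert(s.byCodeUnit.front == 'c');
}
```

Picking the adaptor at the call site has the effect asked for above: the level of abstraction is visible in the code rather than implied by the type.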
Re: Proposal for fixing dchar ranges
On Monday, 10 March 2014 at 13:35:33 UTC, Steven Schveighoffer wrote: I proposed this inside the long "major performance problem with std.array.front" thread, I've also proposed it before, a long time ago. But seems to be getting no attention buried in that thread, not even negative attention :) An idea to fix the whole problems I see with char[] being treated specially by phobos: introduce an actual string type, with char[] as backing, that is a dchar range, that actually dictates the rules we want. Then, make the compiler use this type for literals. e.g.: struct string { immutable(char)[] representation; this(char[] data) { representation = data;} ... // dchar range primitives } Then, a char[] array is simply an array of char. points: 1. No more issues with foreach(c; "cassé"), it iterates via dchar 2. No more issues with "cassé"[4], it is a static compiler error. 3. No more awkward ASCII manipulation using ubyte[]. 4. No more phobos schizophrenia saying char[] is not an array. 5. No more special casing char[] array templates to fool the compiler. 6. Any other special rules we come up with can be dictated by the library, and not ignored by the compiler. Note, std.algorithm.copy(string1, mutablestring) will still decode/encode, but it's more explicit. It's EXPLICITLY a dchar range. Use std.algorithm.copy(string1.representation, mutablestring.representation) will avoid the issues. I imagine only code that is currently UTF ignorant will break, and that code is easily 'fixed' by adding the 'representation' qualifier. -Steve Just to check I understand this fully: in this new scheme, what would this do? auto s = "cassé".representation; foreach(i, c; s) write(i, ':', c, ' '); writeln(s); Currently - without the .representation - I get 0:c 1:a 2:s 3:s 4:e 5:̠6:` cassé or, to spell it out a bit more: 0:c 1:a 2:s 3:s 4:e 5:\xCC 6:\x81 cassé
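For reference, this is roughly how the same string behaves with the existing string type, written with an explicit combining accent so the byte values match the output quoted above. It is a sketch of current behaviour only; it says nothing about what the proposed struct would print.

```d
import std.range.primitives : walkLength;
import std.stdio : writef, writeln;
import std.string : representation;

void main()
{
    // 'e' + U+0301 COMBINING ACUTE ACCENT, encoded in UTF-8 as 0xCC 0x81.
    string s = "casse\u0301";

    assert(s.length == 7);        // indices 0..6, as in the post
    assert(s.walkLength == 6);    // decoded code points

    // Raw view: the last two code units are the bytes of the combining accent.
    auto raw = s.representation;  // immutable(ubyte)[]
    assert(raw[5] == 0xCC && raw[6] == 0x81);

    // Iterating the raw view visits bytes, never decoded characters.
    foreach (i, ubyte b; raw)
        writef("%s:%02X ", i, b);
    writeln();

    // Asking for dchar in foreach makes the compiler decode as it goes.
    size_t codePoints;
    foreach (dchar c; s)
        ++codePoints;
    assert(codePoints == 6);
}
```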
Re: Proposal for fixing dchar ranges
On Monday, 10 March 2014 at 18:13:14 UTC, Steven Schveighoffer wrote: Indexing is rarely a feature one needs or should use, especially with encoded strings. If I was writing something like a chat or terminal window, I would want to be able to jump to chunks of text based on some sort of buffer length, then search for actual character boundaries. Similarly, if I was indexing text, I don't care what the underlying data is, just whether any particular set of n bytes has been seen together in some document. For the latter case, I don't need to be able to interpret the data as text while indexing, but once I perform an actual search and want to jump the user to that line in the file, being able to take a byte offset that I had stored in the index and convert that to a textual position would be good. I do think that D should have something like alias String8 = UTF!char; alias String16 = UTF!wchar; alias String32 = UTF!dchar; And that those sit on top of an underlying immutable(xchar)[] buffer, providing variants of things like foreach and length based on code-point or grapheme boundaries. But I don't think there's any value in reinterpreting string. Not being a struct or an object, it doesn't have the extensibility to be useful for all the variations of access that working with Unicode and the underlying bytes warrant.
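A very rough sketch of what such a UTF!Char wrapper might look like, built on existing Phobos adaptors. Everything here (the member names codeUnits, codePoints, graphemes and codePointIndex) is invented for illustration; no such type exists in Phobos as far as I know.

```d
import std.range.primitives : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

// Hypothetical wrapper over an immutable code unit buffer.
struct UTF(Char)
{
    immutable(Char)[] data;

    // Three explicit views of the same buffer.
    auto codeUnits()  { return data.byCodeUnit; }
    auto codePoints() { return data.byDchar; }
    auto graphemes()  { return data.byGrapheme; }

    // Convert a stored code unit offset into a code point position.
    // Assumes the offset falls on a code point boundary.
    size_t codePointIndex(size_t offset)
    {
        return data[0 .. offset].byDchar.walkLength;
    }
}

alias String8  = UTF!char;
alias String16 = UTF!wchar;
alias String32 = UTF!dchar;

void main()
{
    auto s = String8("casse\u0301 ok");

    assert(s.codeUnits.walkLength == 10);
    assert(s.codePoints.walkLength == 9);
    assert(s.graphemes.walkLength == 8);

    // Byte offset 8 is where "ok" starts; that is the 8th code point (index 7).
    assert(s.data[8] == 'o');
    assert(s.codePointIndex(8) == 7);
}
```

The byte-offset lookup covers the indexing use case described above: store cheap byte offsets, and only pay for decoding when a position has to be shown to the user.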
Re: Proposal for fixing dchar ranges
On 3/10/2014 2:09 PM, Steven Schveighoffer wrote: What in my proposal makes you think you don't have unfettered access? The underlying immutable(char)[] representation is accessible. In fact, you would have more access, since phobos functions would then work with a char[] like it's a proper array. You divide the D world into two camps - those that use 'struct string', and those that use immutable(char)[] strings. I imagine only code that is currently UTF ignorant will break, This also makes it a non-starter.