Re: Why I won't fix phobos bugs anymore
On Friday, 3 June 2016 at 05:18:49 UTC, Patrick Schluter wrote: On Thursday, 2 June 2016 at 20:20:58 UTC, Andrei Alexandrescu wrote: On 06/02/2016 03:41 PM, Basile B. wrote: Yesterday I decided not to propose any more PRs for Phobos bugfixes, even though most of the time it's easy. 1) It can take up to 2 or 3 weeks until a Phobos bugfix gets merged. Even a straightforward one. 2) Once a PR gets the label "@andrei", it basically means "it's dead". Also meant to add: email should help. I am currently nursing 3-4 emails that are Phobos-related among some 50 other important and urgent emails. There's some Midas effect - every day I wake up thinking "I'll halve my inbox today" and by the evening I still have 50 emails. It's maddening. But please don't take offense. Do email and I'll get to your work. Thanks! That would be Sisyphus; Midas was the king who starved because everything he touched became gold. Kinda silly, right? He could have just as easily paid someone to feed him with all the gold he made, right? Also, Sisyphus must not have been too crafty! If he had spent all that time digging out the hill, it would have been lower and he wouldn't have had to carry the boulder for eternity... just give it a nudge and it would roll down. Then he could use all that dirt to build a barricade to keep the boulder from rolling away (unless it could magically go through the dirt).
Re: Why I won't fix phobos bugs anymore
On Thursday, 2 June 2016 at 20:20:58 UTC, Andrei Alexandrescu wrote: On 06/02/2016 03:41 PM, Basile B. wrote: Yesterday I decided not to propose any more PRs for Phobos bugfixes, even though most of the time it's easy. 1) It can take up to 2 or 3 weeks until a Phobos bugfix gets merged. Even a straightforward one. 2) Once a PR gets the label "@andrei", it basically means "it's dead". Also meant to add: email should help. I am currently nursing 3-4 emails that are Phobos-related among some 50 other important and urgent emails. There's some Midas effect - every day I wake up thinking "I'll halve my inbox today" and by the evening I still have 50 emails. It's maddening. But please don't take offense. Do email and I'll get to your work. Thanks! That would be Sisyphus; Midas was the king who starved because everything he touched became gold.
Re: Phobos needs a (part-time) maintainer
On Thursday, 2 June 2016 at 21:04:46 UTC, qznc wrote: On Thursday, 2 June 2016 at 20:59:52 UTC, Basile B. wrote: Eventually I'll come back to bugfixing if they take Jake, but not you, Seb. For one reason or another I don't like you, wilzbach. You are frustrated. I get that. Don't make this personal for others, please. Maybe you should ignore this thread for today? My POV is that it's easy to fix some Phobos bugs, but getting the easy fixes merged is a PITA. PRs that fix a bug are hard to get merged. Why? I don't know. Sometimes we have to act like a bully to get a PR merged, and that's not normal. Personally, I give up.
Re: Free the DMD backend
On Thursday, 2 June 2016 at 18:16:33 UTC, Basile B. wrote: It's also that LDC is at front end 2.070 and GDC 2.067 ;););) GDC is actively maintained, and it would have the latest features if more developers came - which is what would happen if it were the reference compiler.
Re: D's Auto Decoding and You
On Thursday, 2 June 2016 at 21:33:02 UTC, Andrei Alexandrescu wrote: Should I assume some normalization occurred on the way? I'm just looking over std.uni's section on normalization and realizing that I had basically no idea what it is or what's going on. The wikipedia page on unicode equivalence is a bit clearer. I'm definitely nowhere near qualified to have an opinion on these issues.
Re: [OT] Things I like about Andrei
On 03/06/2016 2:17 PM, Adam D. Ruppe wrote: A lot of us, myself included, have been very critical of Andrei lately, but I want to list some of the excellent work he has done over the years: First, early D was very different from the D of today. Andrei changed that, for the better. He's a genius of innovation with templates and good at getting to the bottom of generic code. The Range concept is excellent - the logical extension of iterators, just as slices are of pointers - and std.algorithm is generally brilliant. Many of the patterns we take for granted in D, from templates in general to conversion and literals on top of them, to ranges and algorithms, were principally designed and implemented by Andrei. std.experimental.allocator is very well done, and Design by Introspection is not just a smart insight into the generic programming problem, but actually explored and explained in such a way that we can hook onto it. His talks and writing are amusing and informative, and his dedication unquestionable. Andrei Alexandrescu is a very good, innovative programmer and writer who invents and explains things that others can't even consider. We're lucky to have him with us! Wooow, go Andrei!
Re: [OT] Things I like about Andrei
He's also very good looking!! That makes a difference! ;)
Re: D's Auto Decoding and You
On Thursday, 2 June 2016 at 21:31:39 UTC, Jack Stouffer wrote: On Thursday, 2 June 2016 at 21:21:50 UTC, jmh530 wrote: I was a little confused by something in the main autodecoding thread, so I read your article again. Unfortunately, I don't think my confusion is resolved. I was trying one of your examples (full code I used below). You claim it works, but I keep getting assertion failures. I'm just running it with rdmd on Windows 7.

import std.algorithm : canFind;

void main()
{
    string s = "cassé";
    assert(s.canFind!(x => x == 'é'));
}

Your browser is turning the é in the string into two code points via normalization, whereas it should be one. Try using \u00E9 instead.

That doesn't cause an assert to fail, but when I do writeln('\u00E9') I get ├⌐. So there might still be something wonky going on. I looked up \u00E9 online and I don't think there's an error with that.
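[Editor's note: the composed/decomposed distinction discussed above can be made explicit in code. This is a small sketch of my own (not from the article) showing both forms of "é" and how std.uni.normalize reconciles them.]

```d
import std.algorithm : canFind;
import std.uni : NFC, normalize;

void main()
{
    // Decomposed form: 'e' followed by U+0301 (combining acute accent).
    string decomposed = "casse\u0301";
    // Composed form: the single code point U+00E9.
    string composed = "cass\u00E9";

    // Searching for the composed code point fails on the decomposed string,
    // because autodecoding yields 'e' and U+0301 as two separate code points...
    assert(!decomposed.canFind('\u00E9'));

    // ...but normalizing to NFC first collapses the pair into U+00E9.
    assert(decomposed.normalize!NFC.canFind('\u00E9'));
    assert(composed.canFind('\u00E9'));
}
```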
Re: [OT] Things I like about Andrei
On Friday, 3 June 2016 at 02:17:51 UTC, Adam D. Ruppe wrote: A lot of us, myself included, have been very critical of Andrei lately, but I want to list some of the excellent work he has done over the years: First, early D was very different from the D of today. Andrei changed that, for the better. He's a genius of innovation with templates and good at getting to the bottom of generic code. The Range concept is excellent - the logical extension of iterators, just as slices are of pointers - and std.algorithm is generally brilliant. Many of the patterns we take for granted in D, from templates in general to conversion and literals on top of them, to ranges and algorithms, were principally designed and implemented by Andrei. std.experimental.allocator is very well done, and Design by Introspection is not just a smart insight into the generic programming problem, but actually explored and explained in such a way that we can hook onto it. His talks and writing are amusing and informative, and his dedication unquestionable. Andrei Alexandrescu is a very good, innovative programmer and writer who invents and explains things that others can't even consider. We're lucky to have him with us! +1. I've been following D since the (dead tree) Dr. Dobbs article I found in a supermarket a decade ago, and it's been amazing to watch it grow since his participation. Even concepts that Walter used to swear off, like templates, have become not just bearable but a legitimately kick-ass feature thanks to Andrei's help. D owes him a lot! -Jon
Re: Lifetime tracking
On Friday, 3 June 2016 at 00:40:09 UTC, Stefan Koch wrote: On Friday, 3 June 2016 at 00:31:31 UTC, Walter Bright wrote: If they cover the cases that matter, it's good. Rust has the type system annotations you want, but Rust has a reputation for being difficult to write code for. I think we can incorporate typesafe borrowing without making it difficult to write. +1, a big problem with Rust is just that the syntax is really ugly to those coming from D/C/C++/Java. An idea I had was using plain English attributes in function signatures to denote ownership. e.g.

void myfunc(sees Myobj arg1, copies Myobj arg2, borrows Myobj arg3, owns Myobj arg4)
{
    // "sees arg1" - read-only reference (basically const now, but cannot be
    // cast away).
    // "copies arg2" - read/write copy of the argument. It works the same way
    // value types work now, and will be freed after function exit (unless it
    // is returned).
    // "borrows arg3" is a by-reference pass; it may have the benefit of
    // enabling optimization for small functions since it eliminates a copy
    // (maybe save a stack push and allow register re-use). It will not be
    // freed after the function exits (ownership returns to the calling
    // function). The reference can be locked for multi-threaded apps.
    // "owns arg4" - freed after function exit (unless it is returned).
}

At a glance it's obvious who owns what, what's read-only, etc. Also, a nice bonus is that "const" can become a more rigid guarantee - as in Rust, there can exist multiple const references to an object, but only one mutable reference. Immutable or const by default is probably a bridge too far from what we're used to. There are still a lot of corner cases that I'd have to think through, i.e. calling class methods through a const/"sees" reference (would have to be "pure" calls only), good syntax for ownership changes mid-function (maybe use "sees", "copies", "borrows" and "owns" as operators?), passing to C functions, mangling, etc. Anyhow, just some brainstorming to stir discussion.
It looks pleasant to me, but I'm not sure if you can call it "D" anymore. -Jon
[OT] Things I like about Andrei
A lot of us, myself included, have been very critical of Andrei lately, but I want to list some of the excellent work he has done over the years: First, early D was very different from the D of today. Andrei changed that, for the better. He's a genius of innovation with templates and good at getting to the bottom of generic code. The Range concept is excellent - the logical extension of iterators, just as slices are of pointers - and std.algorithm is generally brilliant. Many of the patterns we take for granted in D, from templates in general to conversion and literals on top of them, to ranges and algorithms, were principally designed and implemented by Andrei. std.experimental.allocator is very well done, and Design by Introspection is not just a smart insight into the generic programming problem, but actually explored and explained in such a way that we can hook onto it. His talks and writing are amusing and informative, and his dedication unquestionable. Andrei Alexandrescu is a very good, innovative programmer and writer who invents and explains things that others can't even consider. We're lucky to have him with us!
Re: Broken links continue to exist on major pages on dlang.org
On Thursday, 2 June 2016 at 20:34:24 UTC, Andrei Alexandrescu wrote: Interestingly it came as encouraging and empowering some fledgling work that had compelling things going for it (including but not limited to enthusiastic receipt in this forum), which ironically is exactly what you just asked for. Yes, indeed, it was a good first (and second) step. But further steps are necessary too in order to finish a project. Here's what would have been ideal to me: 1) Someone writes a cool thing. 2) We encourage further exploration and see interest. 3) After deciding there's serious potential, we decide on the end goal, a timeframe, and set the conditions of success. For example: ddox becomes the official documentation generator at the end of the year if there are no major bugs remaining open. 4) We put it on the website and work toward the goal, with all the teams - Phobos, dlang.org, RejectedSoftware, etc., understanding their role. 5) When the goal deadline arrives, if it passes the major bug test, it goes live and we are committed to it going forward. Why this order? First, someone writing the cool thing means we actually have something to sink our teeth into and a de facto champion in the original author. Second, we need to incubate this work and not discourage the author. ddox got a decent go up to here. But then we need to decide what's next - a clear goal, including a due date, gets us all aligned and removes a lot of the uncertainty on the author's side; it is some reassurance that they aren't wasting their time, and encourages outside teams to get onboard. That leads directly into step four, and then step five actually proves that the others were not in vain.
Re: Unicode Normalization (and graphemes and locales)
On Friday, 3 June 2016 at 00:14:13 UTC, Walter Bright wrote: 5. Normalization, graphemes, and locales should all be explicitly opt-in with corresponding library code. Add decoding to that list and we're right there with you. 7. At some point, as the threads on autodecode amply illustrate, working with level 2 or level 3 Unicode requires a certain level of understanding on the part of the programmer writing the code, because there simply is no overarching correct way to do things. The programmer is going to have to understand what he is trying to accomplish with Unicode and select the code/algorithms accordingly. Working at any level of Unicode in a systems programming language requires knowledge of Unicode. The thing is, because D is a systems language, we can't have the default behavior to decode to grapheme clusters, and because of that, we have to have everything be opt-in, because everything else is fundamentally wrong on some level. Once you step out of scripting language land, you can't get around requiring Unicode knowledge. Like I said in my blog, Unicode is hard. Trying to hide Unicode specifics helps no one because it's going to bite you in the ass eventually.
Re: Lifetime tracking
On Friday, 3 June 2016 at 00:31:31 UTC, Walter Bright wrote: If they cover the cases that matter, it's good. Rust has the type system annotations you want, but Rust has a reputation for being difficult to write code for. I think we can incorporate typesafe borrowing without making it difficult to write.
Re: Lifetime tracking
On 6/2/2016 5:21 PM, Walter Bright wrote: Please give an example. I see you did, so ignore that.
Re: Lifetime tracking
On 6/2/2016 4:29 PM, Timon Gehr wrote:
> // need to know that lifetime of a ends not after lifetime of b
> void assign(S,T)(ref S a, T b){ a = b; }
> void foo(scope int* k){
>     void bar(){
>         scope int* x;
>         // need to check that lifetime of x ends not after lifetime of k
>         assign(x,k);

It'll fail to compile because T is not annotated with 'scope'. Annotating T with scope will then fail to compile because the assignment to 'a' may outlive 'b'.

>     }
> }
> Note that it is tempting to come up with ad-hoc solutions that make some small finite set of examples work.

If they cover the cases that matter, it's good. Rust has the type system annotations you want, but Rust has a reputation for being difficult to write code for.
Re: Lifetime tracking
On 6/2/2016 4:05 PM, Timon Gehr wrote: I'd like to point out again why that design is inadequate: Whenever the type checker is using a certain piece of information to check validity of a program, there should be a way to pass that kind of information across function boundaries. Otherwise the type system is not modular. This is a serious defect. I don't understand where the defect is. Please give an example.
Re: The Case Against Autodecode
On 6/2/2016 3:27 PM, John Colvin wrote: I wonder what rationale there is for Unicode to have two different sequences of codepoints be treated as the same. It's madness. There are languages that make heavy use of diacritics, often several on a single "character". Hebrew is a good example. Should there be only one valid ordering of any given set of diacritics on any given character? I didn't say ordering, I said there should be no such thing as "normalization" in Unicode, where two codepoints are considered to be identical to some other codepoint.
Re: The Case Against Autodecode
On 6/2/2016 2:25 PM, deadalnix wrote: On Thursday, 2 June 2016 at 20:27:27 UTC, Walter Bright wrote: I wonder what rationale there is for Unicode to have two different sequences of codepoints be treated as the same. It's madness. To be able to convert back and forth from/to unicode in a lossless manner. Sorry, that makes no sense, as it is saying "they're the same, only different."
Unicode Normalization (and graphemes and locales)
On 6/2/2016 4:29 PM, Jonathan M Davis via Digitalmars-d wrote:
> How do you suggest that we handle the normalization issue? Should we just assume NFC like std.uni.normalize does and provide an optional template argument to indicate a different normalization (like normalize does)? Since without providing a way to deal with the normalization, we're not actually making the code fully correct, just faster.

The short answer is, we don't.

1. D is a systems programming language. Baking normalization, graphemes and Unicode locales in at a low level will have a disastrous negative effect on performance and size.
2. Very little systems programming work requires level 2 or 3 Unicode support.
3. Are they needed? Pedantically, yes. Practically, not necessarily.
4. What we must do is, for each algorithm, document how it handles Unicode.
5. Normalization, graphemes, and locales should all be explicitly opt-in with corresponding library code.
   Normalization: s.normalize.algorithm()
   Graphemes: may require separate algorithms, maybe std.grapheme?
   Locales: I have no idea, given that I have not studied that issue.
6. std.string has many analogues for std.algorithm that are specific to the peculiarities of strings. I think this is a perfectly acceptable approach. For example, there are many ways to sort Unicode strings, and many of them do not fit in with std.algorithm.sort's ways. Having special std.string.sort's for them would be the most practical solution.
7. At some point, as the threads on autodecode amply illustrate, working with level 2 or level 3 Unicode requires a certain level of understanding on the part of the programmer writing the code, because there simply is no overarching correct way to do things. The programmer is going to have to understand what he is trying to accomplish with Unicode and select the code/algorithms accordingly.
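[Editor's note: the opt-in style of point 5 can already be sketched with today's std.uni; the following is only an illustration of the usage pattern being proposed, with walkLength standing in for "algorithm".]

```d
import std.range : walkLength;
import std.uni : byGrapheme, normalize;

void main()
{
    // "cassé" in decomposed form: 'e' followed by U+0301 (combining acute).
    string s = "casse\u0301";

    // Default (code point) view: 6 code points, because the accent is separate.
    assert(s.walkLength == 6);

    // Explicitly opt in to normalization: NFC composes the pair into U+00E9.
    assert(s.normalize.walkLength == 5);

    // Explicitly opt in to graphemes: 5 user-perceived characters either way.
    assert(s.byGrapheme.walkLength == 5);
}
```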
Re: Areas of D usage
On Thursday, 2 June 2016 at 21:47:13 UTC, qznc wrote: On Thursday, 2 June 2016 at 13:59:13 UTC, Seb wrote: If I left out an area or you miss an application/usage - please let me know! The JavaScript JIT compiler Higgs: https://github.com/higgsjs/Higgs Vibe.d needs some examples. Looks like their website does not have any. It wasn't too many clicks to get to the tutorial on building a chat service.
Re: non empty slices
On Thursday, 2 June 2016 at 23:44:49 UTC, ag0aep6g wrote: On 06/03/2016 01:35 AM, ag0aep6g wrote: The alternative `peek` method is not documented to throw an exception, but it's not @nogc either. No idea why. Maybe Algebraic does GC allocations internally. I wouldn't know for what, though. Or it misses a @nogc somewhere. I've looked at the source to see if it's something simple, and Algebraic/VariantN seems to be terribly complicated. Writing a simpler @nogc tagged union may be easier than fixing the Phobos one, if the Phobos one can even be made @nogc. I'm also inside the source... yes, it's not a simple one. I think I will try to write my own.
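[Editor's note: a minimal sketch of what such a hand-rolled @nogc tagged union could look like. The type and method names are made up for illustration; it covers none of Algebraic's generality, and uses peek-style (null on mismatch) access so nothing ever needs to throw.]

```d
/// Hypothetical minimal @nogc tagged union over exactly two types.
struct Tagged(A, B)
{
    private union { A a; B b; }
    private enum Tag { A, B }
    private Tag tag;

    this(A v) @nogc nothrow { a = v; tag = Tag.A; }
    this(B v) @nogc nothrow { b = v; tag = Tag.B; }

    // peek-style access: a null pointer on type mismatch instead of an
    // exception, so @nogc nothrow is easy to guarantee.
    inout(A)* peekA() inout @nogc nothrow { return tag == Tag.A ? &a : null; }
    inout(B)* peekB() inout @nogc nothrow { return tag == Tag.B ? &b : null; }
}

void main() @nogc nothrow
{
    auto t = Tagged!(int, double)(42);
    assert(t.peekA !is null && *t.peekA == 42);
    assert(t.peekB is null);
}
```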
Re: The Case Against Autodecode
On 6/2/2016 4:29 PM, Jonathan M Davis via Digitalmars-d wrote: How do you suggest that we handle the normalization issue? Started a new thread for that one.
[Issue 14403] DDox: std.algorithm index links are 404
https://issues.dlang.org/show_bug.cgi?id=14403 --- Comment #3 from github-bugzi...@puremagic.com --- Commit pushed to master at https://github.com/dlang/dlang.org https://github.com/dlang/dlang.org/commit/8e12e01f388097bc947ef8e7ace1fef5926b3521 Merge pull request #1322 from s-ludwig/master Fix formatting of (M)REF_ALTTEXT. See issue 14403. --
Re: Lifetime tracking
On Thursday, 2 June 2016 at 23:29:57 UTC, Timon Gehr wrote: On 03.06.2016 01:12, tsbockman wrote: On Thursday, 2 June 2016 at 23:05:40 UTC, Timon Gehr wrote: Whenever the type checker is using a certain piece of information to check validity of a program, there should be a way to pass that kind of information across function boundaries. Otherwise the type system is not modular. This is a serious defect. Would you mind giving a brief example of how that applies to `scope`? (I'm asking for my own education; I have no personal opinion as to the right implementation at the moment.) The simplest example is this [1]: Thanks for the explanation.
Re: Lifetime tracking
On 03.06.2016 01:29, Timon Gehr wrote: [1] It might be possible to get that example to pass the type checker with 'return' annotations only if I change 'ref' to 'out', but often more than two lifetimes are involved, and then it falls flat on its face. To be slightly more explicit:

void multiAssign(A,B,C,D)(ref A a, B b, ref C c, D d){
    a = b;
    c = d;
}
Re: non empty slices
On Thursday, 2 June 2016 at 23:35:53 UTC, ag0aep6g wrote: It's the Algebraic. The `get` method isn't @nogc. The documentation [1] says that it may throw an exception, which is most probably being allocated through the GC. So that's a reason why it can't be @nogc. The alternative `peek` method is not documented to throw an exception, but it's not @nogc either. No idea why. Maybe Algebraic does GC allocations internally. I wouldn't know for what, though. Or it misses a @nogc somewhere. [1] http://dlang.org/phobos/std_variant#.VariantN.get Yeah... thanks a lot!
Re: non empty slices
On 06/03/2016 01:35 AM, ag0aep6g wrote: The alternative `peek` method is not documented to throw an exception, but it's not @nogc either. No idea why. Maybe Algebraic does GC allocations internally. I wouldn't know for what, though. Or it misses a @nogc somewhere. I've looked at the source to see if it's something simple, and Algebraic/VariantN seems to be terribly complicated. Writing a simpler @nogc tagged union may be easier than fixing the phobos one, if the phobos one can even be made @nogc.
Re: non empty slices
On 06/03/2016 01:17 AM, Alex wrote: But still, I can't mark the f-method @nogc, and this is not due to the writeln calls... why GC is invoked, although everything is known and no memory allocation should happen? It's the Algebraic. The `get` method isn't @nogc. The documentation [1] says that it may throw an exception, which is most probably being allocated through the GC. So that's a reason why it can't be @nogc. The alternative `peek` method is not documented to throw an exception, but it's not @nogc either. No idea why. Maybe Algebraic does GC allocations internally. I wouldn't know for what, though. Or it misses a @nogc somewhere. [1] http://dlang.org/phobos/std_variant#.VariantN.get
Re: Lifetime tracking
On 03.06.2016 01:12, tsbockman wrote: On Thursday, 2 June 2016 at 23:05:40 UTC, Timon Gehr wrote: Whenever the type checker is using a certain piece of information to check validity of a program, there should be a way to pass that kind of information across function boundaries. Otherwise the type system is not modular. This is a serious defect. Would you mind giving a brief example of how that applies to `scope`? (I'm asking for my own education; I have no personal opinion as to the right implementation at the moment.) The simplest example is this [1]:

void foo(scope int* k){
    void bar(){
        scope int* x;
        x = k; // ok: lifetime of x ends not after lifetime of k
    }
}

Now we factor out the assignment:

// need to know that lifetime of a ends not after lifetime of b
void assign(S,T)(ref S a, T b){ a = b; }

void foo(scope int* k){
    void bar(){
        scope int* x;
        // need to check that lifetime of x ends not after lifetime of k
        assign(x,k);
    }
}

I.e. now we need a way to annotate 'assign' in order to specify the contract I have written down in the comments. Note that it is tempting to come up with ad-hoc solutions that make some small finite set of examples work. This is not how well-designed type systems usually come about. You need to think about what information the type checker requires, and how to pass it across function boundaries (i.e. how to encode that information in types). Transfer of information must be lossless, so one should resist the temptation to use more information than can be passed across function boundaries in case it is accidentally available.

[1] It might be possible to get that example to pass the type checker with 'return' annotations only if I change 'ref' to 'out', but often more than two lifetimes are involved, and then it falls flat on its face.
Re: The Case Against Autodecode
On Thursday, June 02, 2016 15:48:03 Walter Bright via Digitalmars-d wrote:
> On 6/2/2016 3:23 PM, Andrei Alexandrescu wrote:
> > On 06/02/2016 05:58 PM, Walter Bright wrote:
> >> > * s.balancedParens('〈', '〉') works only with autodecoding.
> >> > * s.canFind('ö') works only with autodecoding. It returns always false without.
> >>
> >> Can be made to work without autodecoding.
> >
> > By special casing? Perhaps.
>
> The argument to canFind() can be detected as not being a char, then decoded into a sequence of char's, then forwarded to a substring search.

How do you suggest that we handle the normalization issue? Should we just assume NFC like std.uni.normalize does and provide an optional template argument to indicate a different normalization (like normalize does)? Since without providing a way to deal with the normalization, we're not actually making the code fully correct, just faster.

- Jonathan M Davis
Re: Creating a "fixed-range int" with opDispatch and/or alias this?
On Wednesday, 1 June 2016 at 19:59:51 UTC, Mark Isaacson wrote: I'm trying to create a type that for all intents and purposes behaves exactly like an int except that it limits its values to be within a certain range [a,b]. Theoretically, I would think this looks something like: ... It looks like opDispatch doesn't participate in resolution of operator overloads. Is there any way I can achieve my desired result? I know alias this forwards operations like +=, but with alias this I cannot wrap the operation to do the bounds checking.

I think you would need to implement all of:

* this(...)
* opAssign(...)
* opOpAssign(...)
* opBinary(...)
* opBinaryRight(...)
* opUnary(...)

FWIW, the fixed range int part of this question is just an example; I'm mostly just interested in whether this idea is possible without a lot of bloat/duplication.

For a single type, I think the bloat is required. If you want to generate a lot of similar types, though, you could probably write a mixin template to generate the methods for you.
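[Editor's note: to illustrate the amount of boilerplate involved, here is a rough sketch of such a type. The name RangedInt and the assert-based checking policy are illustrative choices, not a finished design; a real version would also decide what out-of-range arithmetic should do.]

```d
/// Hypothetical int-like type whose value is checked to stay in [a, b].
struct RangedInt(int a, int b)
{
    private int value;

    this(int v) { opAssign(v); }

    // Every mutation funnels through opAssign, where the check lives.
    ref RangedInt opAssign(int v)
    {
        assert(v >= a && v <= b, "value out of range");
        value = v;
        return this;
    }

    // x += n, x *= n, ... are rewritten to the checked assignment.
    ref RangedInt opOpAssign(string op)(int v)
    {
        return opAssign(mixin("value " ~ op ~ " v"));
    }

    // Plain arithmetic yields int; the sketch only checks on assignment.
    int opBinary(string op)(int v) const { return mixin("value " ~ op ~ " v"); }
    int opBinaryRight(string op)(int v) const { return mixin("v " ~ op ~ " value"); }
    int opUnary(string op)() const { return mixin(op ~ "value"); }

    alias value this; // implicit conversion *to* int stays cheap
}

void main()
{
    auto x = RangedInt!(0, 100)(42);
    x += 10; // goes through opOpAssign, so it is bounds-checked
    assert(x == 52);
}
```

A mixin template could stamp out the operator methods for a family of such wrapper types, as suggested above.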
Re: The Case Against Autodecode
On Thursday, June 02, 2016 22:27:16 John Colvin via Digitalmars-d wrote:
> On Thursday, 2 June 2016 at 20:27:27 UTC, Walter Bright wrote:
> > I wonder what rationale there is for Unicode to have two different sequences of codepoints be treated as the same. It's madness.
>
> There are languages that make heavy use of diacritics, often several on a single "character". Hebrew is a good example. Should there be only one valid ordering of any given set of diacritics on any given character? It's an interesting idea, but it's not how things are.

Yeah. I'm inclined to think that the fact that there are multiple normalizations was a huge mistake on the part of the Unicode folks, but we're stuck dealing with it. And as horrible as it is for most cases, maybe it _does_ ultimately make sense because of certain use cases; I don't know. But bad idea or not, we're stuck. :(

- Jonathan M Davis
Re: The Case Against Autodecode
On Thursday, June 02, 2016 18:23:19 Andrei Alexandrescu via Digitalmars-d wrote: > On 06/02/2016 05:58 PM, Walter Bright wrote: > > On 6/2/2016 1:27 PM, Andrei Alexandrescu wrote: > >> The lambda returns bool. -- Andrei > > > > Yes, I was wrong about that. But the point still stands with: > > > * s.balancedParens('〈', '〉') works only with autodecoding. > > > * s.canFind('ö') works only with autodecoding. It returns always > > > > false without. > > > > Can be made to work without autodecoding. > > By special casing? Perhaps. I seem to recall though that one major issue > with autodecoding was that it special-cases certain algorithms. So you'd > need to go through all of std.algorithm and make sure you can > special-case your way out of situations that work today. Yeah, I believe that you do have to do some special casing, though it would be special casing on ranges of code units in general and not strings specifically, and a lot of those functions are already special cased on string in an attempt be efficient. In particular, with a function like find or canFind, you'd take the needle and encode it to match the haystack it was passed so that you can do the comparisons via code units. So, you incur the encoding cost once when encoding the needle rather than incurring the decoding cost of each code point or grapheme as you iterate over the haystack. So, you end up with something that's correct and efficient. It's also much friendlier to code that only operates on ASCII. The one issue that I'm not quite sure how we'd handle in that case is normalization (which auto-decoding doesn't handle either), since you'd need to normalize the needle to match the haystack (which also assumes that the haystack was already normalized). Certainly, it's the sort of thing that makes it so that you kind of wish you were dealing with a string type that had the normalization built into it rather than either an array of code units or an arbitrary range of code units. 
But maybe we could assume the NFC normalization like std.uni.normalize does and provide an optional template argument for the normalization scheme. In any case, while it's not entirely straightforward, it is quite possible to write some algorithms in a way which works on arbitrary ranges of code units and deals with Unicode correctly without auto-decoding or requiring that the user convert it to a range of code points or graphemes in order to properly handle the full range of Unicode. And even if we keep auto-decoding, we pretty much need to fix it so that std.algorithm and friends are Unicode-aware in this manner so that ranges of code units work in general without requiring that you use byGrapheme. So, this sort of thing could have a large impact on RCStr, even if we keep auto-decoding for narrow strings. Other algorithms, however, can't be made to work automatically with Unicode - at least not with the current range paradigm. filter, for instance, really needs to operate on graphemes to filter on characters, but with a range of code units, that would mean operating on groups of code units as a single element, which you can't do with something like a range of char, since that essentially becomes a range of ranges. It has to be wrapped in a range that's going to provide graphemes - and of course, if you know that you're operating only on ASCII, then you wouldn't want to deal with graphemes anyway, so automatically converting to graphemes would be undesirable. So, for a function like filter, it really does have to be up to the programmer to indicate what level of Unicode they want to operate at. But if we don't make functions Unicode-aware where possible, then we're going to take a performance hit by essentially forcing everyone to use explicit ranges of code points or graphemes even when they should be unnecessary. So, I think that we're stuck with some level of special casing, but it would then be for ranges of code units and code points and not strings. 
So, it would work efficiently for stuff like RCStr, which the current scheme does not. I think that the reality of the matter is that regardless of whether we keep auto-decoding for narrow strings in place, we need to make Phobos operate on arbitrary ranges of code units and code points, since even stuff like RCStr won't work efficiently otherwise, and stuff like byCodeUnit won't be usable in as many cases otherwise, because if a generic function isn't Unicode-aware, then in many cases, byCodeUnit will be very wrong, just like byCodePoint would be wrong. So, as far as Phobos goes, I'm not sure that the question of auto-decoding matters much for what we need to do at this point. If we do what we need to do, then Phobos will work whether we have auto-decoding or not (working in a Unicode-aware manner where possible and forcing the user to decide the correct level of Unicode to work at where not), and then it just becomes a question of whether we can or should deprecate auto-decoding once all that's done. -
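[Editor's note: the "encode the needle once, then search at the code-unit level" approach described in this post can be sketched as follows. The helper name is hypothetical; it works because UTF-8 is self-synchronizing, so a byte-level substring search never matches in the middle of a multi-byte sequence.]

```d
import std.algorithm.searching : canFind;
import std.string : representation;
import std.utf : encode;

/// Hypothetical helper: search without decoding the haystack.
bool canFindNoDecode(string haystack, dchar needle)
{
    char[4] buf;
    immutable len = encode(buf, needle); // pay the encoding cost exactly once
    // Plain code-unit (byte) substring search; no per-element decoding.
    return haystack.representation.canFind(buf[0 .. len].representation);
}

void main()
{
    assert(canFindNoDecode("cass\u00E9", '\u00E9'));
    assert(!canFindNoDecode("Привет", 'x'));
}
```

Note that this sketch deliberately ignores the normalization issue raised above: the needle only matches if haystack and needle use the same normalization form.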
Re: non empty slices
On Thursday, 2 June 2016 at 22:17:32 UTC, ag0aep6g wrote: Yeah, can't do it that way. You have only one f_impl call, but want it to go to different overloads based on dynamic information (caseS). That doesn't work. You need three different f_impl calls. You can generate them, so there's only one in the source, but it's a bit involved: sw: switch (caseS) { foreach (i, T; TL) { case i: f_impl(result.get!T); break sw; } default: assert(false); } Oh... wow... cool! :) But still, I can't mark the f-method @nogc, and this is not due to the writeln calls... why is the GC invoked, although everything is known and no memory allocation should happen?
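The trick ag0aep6g describes - one call site in the source, expanded into one branch per type of the tagged union - can be sketched in Python. All names below (Empty, Many, f_impl_*) are made up for illustration; this is an analogue of the dispatch idea, not the D compile-time foreach semantics:

```python
# Hypothetical tagged-union dispatch: one dispatch point in the source,
# routed to the right handler by the value's dynamic type.

class Empty:
    pass

class Many:
    def __init__(self, items):
        self.items = items

def f_impl_empty(_):
    return "empty"

def f_impl_int(v):
    return f"one: {v}"

def f_impl_many(m):
    return f"many: {len(m.items)}"

# Analogous to the generated switch over the AliasSeq in D:
HANDLERS = {Empty: f_impl_empty, int: f_impl_int, Many: f_impl_many}

def f(arr):
    if len(arr) == 0:
        value = Empty()
    elif len(arr) == 1:
        value = arr[0]
    else:
        value = Many(arr)
    # Single call site; the table plays the role of the unrolled cases.
    return HANDLERS[type(value)](value)

assert f([]) == "empty"
assert f([7]) == "one: 7"
assert f([1, 2, 3]) == "many: 3"
```

In D the table is built at compile time by the foreach over the type list, so each branch calls a statically typed f_impl overload; here the lookup is dynamic, which is the closest Python analogue.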
Re: The Case Against Autodecode
On Thursday, 2 June 2016 at 21:56:10 UTC, Walter Bright wrote: Yes, you have a good point. But we do allow things like: byte b; if (b == 1) ... Why allowing char/wchar/dchar comparisons is wrong: void main() { string s = "Привет"; foreach (c; s) assert(c != 'Ñ'); } From my post from 2014: http://forum.dlang.org/post/knrwiqxhlvqwxqshy...@forum.dlang.org
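The failure mode in the "Привет" example is mechanical: a UTF-8 code unit of a Cyrillic letter happens to have the same numeric value as the code point of 'Ñ'. A small Python sketch of the same arithmetic:

```python
s = "Привет"
units = s.encode("utf-8")   # the UTF-8 code units D iterates as char

assert ord("Ñ") == 0xD1     # 'Ñ' is U+00D1
assert 0xD1 in units        # the lead byte of 'р' (and 'т') is also 0xD1

# So comparing raw code units against 'Ñ' yields a spurious match,
# even though the string contains no 'Ñ' at all:
assert "Ñ" not in s
```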
Re: Lifetime tracking
On Thursday, 2 June 2016 at 23:05:40 UTC, Timon Gehr wrote: On 03.06.2016 00:29, Walter Bright wrote: On 6/2/2016 3:10 PM, Marco Leise wrote: we haven't looked into borrowing/scoped enough That's my fault. As for scoped, the idea is to make scope work analogously to DIP25's 'return ref'. I don't believe we need borrowing, we've worked out another solution that will work for ref counting. Please do not reply to this in this thread - start a new one if you wish to continue with this topic. I'd like to point out again why that design is inadequate: Whenever the type checker is using a certain piece of information to check validity of a program, there should be a way to pass that kind of information across function boundaries. Otherwise the type system is not modular. This is a serious defect. Seconded.
Lifetime tracking
On 03.06.2016 00:29, Walter Bright wrote: On 6/2/2016 3:10 PM, Marco Leise wrote: we haven't looked into borrowing/scoped enough That's my fault. As for scoped, the idea is to make scope work analogously to DIP25's 'return ref'. I don't believe we need borrowing, we've worked out another solution that will work for ref counting. Please do not reply to this in this thread - start a new one if you wish to continue with this topic. I'd like to point out again why that design is inadequate: Whenever the type checker is using a certain piece of information to check validity of a program, there should be a way to pass that kind of information across function boundaries. Otherwise the type system is not modular. This is a serious defect.
Re: The Case Against Autodecode
On 03.06.2016 00:23, Andrei Alexandrescu wrote: On 06/02/2016 05:58 PM, Walter Bright wrote: On 6/2/2016 1:27 PM, Andrei Alexandrescu wrote: The lambda returns bool. -- Andrei Yes, I was wrong about that. But the point still stands with: > * s.balancedParens('〈', '〉') works only with autodecoding. > * s.canFind('ö') works only with autodecoding. It returns always false without. Can be made to work without autodecoding. By special casing? Perhaps. I seem to recall though that one major issue with autodecoding was that it special-cases certain algorithms. The major issue is that it special cases when there's different, more natural semantics available.
Re: The Case Against Autodecode
On 03.06.2016 00:26, Walter Bright wrote: On 6/2/2016 3:11 PM, Timon Gehr wrote: Well, this is a somewhat different case, because 1 is just not representable as a byte. Every value that fits in a byte fits in an int though. It's different for code units. They are incompatible both ways. Not exactly. (c == 'ö') is always false for the same reason that (b == 1000) is always false. ... Yes. And _additionally_, some other concerns apply that are not there for byte vs. int. I.e. if b == 1 is disallowed, then c == d should be disallowed too, but b == 1 can be allowed even if c == d is disallowed. I'm not sure what the right answer is here. char to dchar is a lossy conversion, so it shouldn't happen. byte to int is a lossless conversion, so there is no problem a priori.
Re: The Case Against Autodecode
On Thursday, 2 June 2016 at 22:20:49 UTC, Walter Bright wrote: On 6/2/2016 2:05 PM, tsbockman wrote: Presumably if someone marks their own PR as "do not merge", it means they're planning to either close it themselves after it has served its purpose, or they plan to fix/finish it and then remove the "do not merge" label. That doesn't seem to apply here, either. Either way, they shouldn't be closed just because they say "do not merge" (unless they're abandoned or something, obviously). Something like that could not be merged until 132 other PRs are done to fix Phobos. It doesn't belong as a PR. I was just responding to the general question you posed about "do not merge" PRs, not really arguing for that one, in particular, to be re-opened. I'm sure @wilzbach is willing to explain if anyone cares to ask him why he did it as a PR, though.
Re: The Case Against Autodecode
On 6/2/2016 3:10 PM, Marco Leise wrote: we haven't looked into borrowing/scoped enough That's my fault. As for scoped, the idea is to make scope work analogously to DIP25's 'return ref'. I don't believe we need borrowing, we've worked out another solution that will work for ref counting. Please do not reply to this in this thread - start a new one if you wish to continue with this topic.
Re: Why does DMD on Debian need xdg-utils
On Thursday, 2 June 2016 at 21:32:28 UTC, Mathias Lang wrote: It shouldn't be necessary. I believe that is because of `dmd -man`, which opens a web browser. That's an apt-d issue (and hopefully Jordi Sayol will read this) which prevents using this repository if your machine has no X (I guess you discovered that on a server, as I did). Yes. It also supports multiple versions side by side and also installs dub. You can find the source here: https://github.com/dlang/installer/blob/master/script/install.sh Example usage:
# Install dmd 2.70.0
~/dlang/install.sh install dmd-2.70.0
# Install dmd 2.69.0
~/dlang/install.sh install dmd-2.69.0
# start using version 2.70.0
activate ~/dlang/dmd-2.70.0
# stop using version 2.70.0
deactivate
# start using version 2.69.0
activate ~/dlang/dmd-2.69.0
# stop using version 2.69.0
deactivate
# uninstall version 2.69.0
~/dlang/install.sh uninstall dmd-2.69.0
# removes everything installed so far
rm -rf ~/dlang
# downloads (again) the install script and
# installs the latest stable version of the compiler.
curl -fsS https://dlang.org/install.sh | bash -s dmd
Yes, it's a server. It's actually a Linux-branded SmartOS zone and the install script does not seem to work. I have always been using the .deb package and it's been working, I just didn't want xdg-utils and all the other stuff that comes with it.
Re: The Case Against Autodecode
On Thursday, 2 June 2016 at 20:27:27 UTC, Walter Bright wrote: On 6/2/2016 12:34 PM, deadalnix wrote: On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu wrote: Pretty much everything. Consider s and s1 string variables with possibly different encodings (UTF8/UTF16). * s.all!(c => c == 'ö') works only with autodecoding. It always returns false without. False. Many characters can be represented by different sequences of codepoints. For instance, ê can be ê as one codepoint or ^ as a modifier followed by e. ö is one such character. There are 3 levels of Unicode support. What Andrei is talking about is Level 1. http://unicode.org/reports/tr18/tr18-5.1.html I wonder what rationale there is for Unicode to have two different sequences of codepoints be treated as the same. It's madness. There are languages that make heavy use of diacritics, often several on a single "character". Hebrew is a good example. Should there be only one valid ordering of any given set of diacritics on any given character? It's an interesting idea, but it's not how things are.
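The two-sequences point is Unicode canonical equivalence. A Python sketch using the standard unicodedata module shows both spellings of 'ê' and how NFC/NFD convert between them:

```python
import unicodedata

composed = "\u00ea"       # 'ê' as a single precomposed code point
decomposed = "e\u0302"    # 'e' followed by U+0302 COMBINING CIRCUMFLEX

# Same rendered character, different code point sequences:
assert composed != decomposed

# Normalization maps between the two canonical forms:
assert unicodedata.normalize("NFD", composed) == decomposed
assert unicodedata.normalize("NFC", decomposed) == composed
```

This is why a code point level search for 'ö' can miss a decomposed 'ö' unless the input is normalized first.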
Re: The Case Against Autodecode
On 6/2/2016 3:11 PM, Timon Gehr wrote: Well, this is a somewhat different case, because 1 is just not representable as a byte. Every value that fits in a byte fits in an int though. It's different for code units. They are incompatible both ways. Not exactly. (c == 'ö') is always false for the same reason that (b == 1000) is always false. I'm not sure what the right answer is here.
Re: The Case Against Autodecode
On 06/02/2016 05:58 PM, Walter Bright wrote: On 6/2/2016 1:27 PM, Andrei Alexandrescu wrote: The lambda returns bool. -- Andrei Yes, I was wrong about that. But the point still stands with: > * s.balancedParens('〈', '〉') works only with autodecoding. > * s.canFind('ö') works only with autodecoding. It returns always false without. Can be made to work without autodecoding. By special casing? Perhaps. I seem to recall though that one major issue with autodecoding was that it special-cases certain algorithms. So you'd need to go through all of std.algorithm and make sure you can special-case your way out of situations that work today. Andrei
Re: non empty slices
On 06/02/2016 11:37 PM, Alex wrote: Just tried this instead of your f-function: void f(int[] arr) { A result; import std.meta; alias TL = AliasSeq!(Empty, int, Many!int); int caseS; switch (arr.length) { case 0: result = Empty.init; caseS = 0; break; case 1: result = arr[0]; caseS = 1; break; default: result = Many!int(arr); caseS = 2; } f_impl(*result.get!(TL[caseS])); } But got: Error: variable caseS cannot be read at compile time which is obviously true... Yeah, can't do it that way. You have only one f_impl call, but want it to go to different overloads based on dynamic information (caseS). That doesn't work. You need three different f_impl calls. You can generate them, so there's only one in the source, but it's a bit involved: sw: switch (caseS) { foreach (i, T; TL) { case i: f_impl(result.get!T); break sw; } default: assert(false); }
Re: The Case Against Autodecode
On Thursday, 2 June 2016 at 22:03:01 UTC, default0 wrote: *sigh* reading comprehension. ... Please do not take what I say out of context, thank you. Earlier you said: The level 2 support description noted that it should be opt-in because it's slow. My main point is simply that you mischaracterized what the standard says. Making level 1 opt-in, rather than level 2, would be just as compliant as the reverse. The standard makes no suggestion as to which should be the default.
Re: The Case Against Autodecode
Am Thu, 2 Jun 2016 15:05:44 -0400 schrieb Andrei Alexandrescu: > On 06/02/2016 01:54 PM, Marc Schütz wrote: > > Which practical tasks are made possible (and work _correctly_) if you > > decode to code points, that don't already work with code units? > > Pretty much everything. > > s.all!(c => c == 'ö') Andrei, your ignorance is really starting to grind on everyone's nerves. If after 350 posts you still don't see why this is incorrect: s.any!(c => c == 'o'), you must be actively skipping the informational content of this thread. You are in error, no one agrees with you, and you refuse to see it, and in the end we have to assume you will make a decisive vote against any PR with the intent to remove auto-decoding from Phobos. Your so-called vocal minority is actually D's panel of Unicode experts who understand that auto-decoding is a false ally and should be on the deprecation track. Remember final-by-default? You promised that your objection about breaking code means that D2 will only continue to be fixed in a backwards compatible way, be it the implementation of shared or whatever else. Yet months later you opened a thread with the title "inout must go". So that must have been an appeasement back then. People don't forget these things easily and RCStr seems to be a similar distraction, considering we haven't looked into borrowing/scoped enough and you promise wonders from it. -- Marco
Re: The Case Against Autodecode
On 02.06.2016 23:56, Walter Bright wrote: On 6/2/2016 1:12 PM, Timon Gehr wrote: ... It is not meaningful to compare utf-8 and utf-16 code units directly. Yes, you have a good point. But we do allow things like: byte b; if (b == 1) ... Well, this is a somewhat different case, because 1 is just not representable as a byte. Every value that fits in a byte fits in an int though. It's different for code units. They are incompatible both ways. E.g. dchar obviously does not fit in a char, and while the lower half of char is compatible with dchar, the upper half is specific to the encoding. dchar cannot represent upper half char code units. You get the code points with the corresponding values instead. E.g.: void main(){ import std.stdio,std.utf; foreach(dchar d;"ö".byCodeUnit) writeln(d); // "Ã", "¶" }
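The "Ã", "¶" output in Timon's example falls out of treating each UTF-8 code unit as a code point with the same numeric value. A Python sketch of the same reinterpretation:

```python
units = "ö".encode("utf-8")        # the two UTF-8 code units of 'ö'
assert units == b"\xc3\xb6"

# Promoting each code unit to a code point of equal value is exactly
# a Latin-1 reading of the bytes, which yields the classic mojibake:
assert units.decode("latin-1") == "Ã¶"   # U+00C3 'Ã' and U+00B6 '¶'
```

That is the sense in which upper-half char code units are "specific to the encoding": widened to dchar, they name unrelated code points.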
Re: Areas of D usage
On Thursday, 2 June 2016 at 21:47:13 UTC, qznc wrote: On Thursday, 2 June 2016 at 13:59:13 UTC, Seb wrote: If I left out an area or you miss an application/usage - please let me know! The Javascript JIT Compiler Higgs: https://github.com/higgsjs/Higgs Wow that's a great example! Vibe.d needs some examples. Looks like their website does not have any. I was also looking for public Vibe.d instances out there - does anyone know a large website using Vibe.d that we could quote?
Re: non empty slices
On 06/02/2016 10:11 PM, Alex wrote: The cool thing about the Algebraic is as I expected, that it doesn't change its type... And the hard thing is that I'm not used to its Empty, Many, ... things yet. I just made those up on the spot. Note that Many is not actually implemented at all. There is no check that the array has at least two elements. And Empty is just there, because I needed a type for the Algebraic. But the question remains how to keep this @nogc? Apparently, it's Algebraic that isn't @nogc. I don't know what it allocates. Maybe it allocates space for large types (but there aren't any here), or maybe it can throw a GC-allocated exception. I wonder at the line with peek... and why it is not just returning the value... I wouldn't expect that to be the problem with @nogc. As far as I see, the pointer is used as a way to return "not found" in the form of null. When you get a non-null pointer it's probably just into the Algebraic.
Re: The Case Against Autodecode
On Thursday, 2 June 2016 at 21:51:51 UTC, tsbockman wrote: On Thursday, 2 June 2016 at 21:38:02 UTC, default0 wrote: On Thursday, 2 June 2016 at 21:30:51 UTC, tsbockman wrote: 1) It does not say that level 2 should be opt-in; it says that level 2 should be toggle-able. Nowhere does it say which of level 1 and 2 should be the default. 2) It says that working with graphemes is slower than UTF-16 code UNITS (level 1), but says nothing about streaming decoding of code POINTS (what we have). 3) That document is from 2000, and its claims about performance are surely extremely out-dated, anyway. Computers and the Unicode standard have both changed much since then. 1) Right because a special toggleable syntax is definitely not "opt-in". It is not "opt-in" unless it is toggled off by default. The only reason it doesn't talk about toggling in the level 1 section is because that section is written with the assumption that many programs will *only* support level 1. *sigh* reading comprehension. Needing to write .byGrapheme or similar to enable the behaviour qualifies for what that description was arguing for. I hope you understand that now that I am repeating this for you. 2) Several people in this thread noted that working on graphemes is way slower (which makes sense, because it's yet another processing step you need to do after you decoded - therefore more work - therefore slower) than working on code points. And working on code points is way slower than working on code units (the actual level 1). Never claimed the opposite. Do note however that it's specifically talking about UTF-16 code units. 3) Not an argument - doing more work makes code slower. What do you think I'm arguing for? It's not graphemes-by-default. Unrelated. I was refuting the point you made about the relevance of the performance claims of the unicode level 2 support description, not evaluating your hypothetical design. Please do not take what I say out of context, thank you.
Re: The Case Against Autodecode
On 02.06.2016 23:46, Andrei Alexandrescu wrote: On 6/2/16 5:43 PM, Timon Gehr wrote: .̂ ̪.̂ (Copy-paste it somewhere else, I think it might not be rendered correctly on the forum.) The point is that if I do: ".̂ ̪.̂".normalize!NFC.byGrapheme.findAmong([Grapheme("."),Grapheme(",")]) no match is returned. If I use your method with dchars, I will get spurious matches. I.e. the suggested method to look for punctuation symbols is incorrect: writeln(".̂ ̪.̂".findAmong(",.")); // ".̂ ̪.̂" Nice example. ... Thanks! :o) (Also, do you have a use case for this?) Count delimited words. Did you also look at balancedParens? Andrei On 02.06.2016 22:01, Timon Gehr wrote: * s.balancedParens('〈', '〉') works only with autodecoding. ... Doesn't work, e.g. s="⟨⃖". Shouldn't compile. assert("⟨⃖".normalize!NFC.byGrapheme.balancedParens(Grapheme("⟨"),Grapheme("⟩"))); writeln("⟨⃖".balancedParens('⟨','⟩')); // false
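Timon's spurious findAmong match can be reproduced at the code point level in Python. Note that NFC normalization cannot rescue this case, because Unicode defines no precomposed "full stop with circumflex" (this is a sketch of the concept, not the D API):

```python
import unicodedata

s = ".\u0302"    # '.' plus a combining circumflex: one grapheme, not a period

# A code point level search for punctuation matches the '.' anyway:
assert any(c in ",." for c in s)

# Normalization does not help: there is no composed form for this pair,
# so NFC leaves the string unchanged.
assert unicodedata.normalize("NFC", s) == s
```

Only grapheme-level comparison (D's byGrapheme, here unavailable in the stdlib) treats ".\u0302" as distinct from ".".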
Re: The Case Against Autodecode
On 6/2/2016 1:27 PM, Andrei Alexandrescu wrote: The lambda returns bool. -- Andrei Yes, I was wrong about that. But the point still stands with: > * s.balancedParens('〈', '〉') works only with autodecoding. > * s.canFind('ö') works only with autodecoding. It returns always false without. Can be made to work without autodecoding.
Re: The Case Against Autodecode
On 6/2/2016 1:12 PM, Timon Gehr wrote: On 02.06.2016 22:07, Walter Bright wrote: On 6/2/2016 12:05 PM, Andrei Alexandrescu wrote: * s.all!(c => c == 'ö') works only with autodecoding. It returns always false without. The o is inferred as a wchar. The lamda then is inferred to return a wchar. No, the lambda returns a bool. Thanks for the correction. The algorithm can check that the input is char[], and is being tested against a wchar. Therefore, the algorithm can specialize to do the decoding itself. No autodecoding necessary, and it does the right thing. It still would not be the right thing. The lambda shouldn't compile. It is not meaningful to compare utf-8 and utf-16 code units directly. Yes, you have a good point. But we do allow things like: byte b; if (b == 1) ...
Re: The Case Against Autodecode
On Thursday, 2 June 2016 at 21:38:02 UTC, default0 wrote: On Thursday, 2 June 2016 at 21:30:51 UTC, tsbockman wrote: 1) It does not say that level 2 should be opt-in; it says that level 2 should be toggle-able. Nowhere does it say which of level 1 and 2 should be the default. 2) It says that working with graphemes is slower than UTF-16 code UNITS (level 1), but says nothing about streaming decoding of code POINTS (what we have). 3) That document is from 2000, and its claims about performance are surely extremely out-dated, anyway. Computers and the Unicode standard have both changed much since then. 1) Right because a special toggleable syntax is definitely not "opt-in". It is not "opt-in" unless it is toggled off by default. The only reason it doesn't talk about toggling in the level 1 section is because that section is written with the assumption that many programs will *only* support level 1. 2) Several people in this thread noted that working on graphemes is way slower (which makes sense, because it's yet another processing step you need to do after you decoded - therefore more work - therefore slower) than working on code points. And working on code points is way slower than working on code units (the actual level 1). 3) Not an argument - doing more work makes code slower. What do you think I'm arguing for? It's not graphemes-by-default. What I actually want to see: permanently deprecate the auto-decoding range primitives. Force the user to explicitly specify whichever of `by!dchar`, `byCodePoint`, or `byGrapheme` their specific algorithm actually needs. Removing the implicit conversions between `char`, `wchar`, and `dchar` would also be nice, but isn't really necessary I think. That would be a standards-compliant solution (one of several possible). What we have now is non-standard, at least going by the old version Walter linked.
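The three explicit levels tsbockman proposes map naturally onto code units, code points, and grapheme clusters. A Python sketch, with a deliberately simplified grapheme segmentation (real clustering follows UAX #29; this version only attaches combining marks to their base character):

```python
import unicodedata

s = "o\u0308!"   # decomposed 'ö' followed by '!'

# The level is chosen explicitly, as the post proposes for D:
code_units = list(s.encode("utf-8"))   # rough byCodeUnit analogue
code_points = list(s)                  # rough byCodePoint analogue

# Simplified grapheme segmentation: glue mark characters (categories
# Mn/Mc/Me) onto the preceding base character.
graphemes = []
for c in s:
    if graphemes and unicodedata.category(c).startswith("M"):
        graphemes[-1] += c
    else:
        graphemes.append(c)

assert len(code_units) == 4            # 'o' + 2 bytes for U+0308 + '!'
assert len(code_points) == 3
assert graphemes == ["o\u0308", "!"]   # 2 user-perceived characters
```

Each level gives a different answer to "how long is the string", which is the argument for making the choice explicit rather than defaulting to one of them.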
Re: Areas of D usage
On Thursday, 2 June 2016 at 13:59:13 UTC, Seb wrote: If I left out an area or you miss an application/usage - please let me know! The Javascript JIT Compiler Higgs: https://github.com/higgsjs/Higgs Vibe.d needs some examples. Looks like their website does not have any.
Re: The Case Against Autodecode
On 6/2/16 5:43 PM, Timon Gehr wrote: .̂ ̪.̂ (Copy-paste it somewhere else, I think it might not be rendered correctly on the forum.) The point is that if I do: ".̂ ̪.̂".normalize!NFC.byGrapheme.findAmong([Grapheme("."),Grapheme(",")]) no match is returned. If I use your method with dchars, I will get spurious matches. I.e. the suggested method to look for punctuation symbols is incorrect: writeln(".̂ ̪.̂".findAmong(",.")); // ".̂ ̪.̂" Nice example. (Also, do you have a use case for this?) Count delimited words. Did you also look at balancedParens? Andrei
Re: The Case Against Autodecode
On 6/2/16 5:38 PM, cym13 wrote: Allow me to try another angle: - There are different levels of unicode support and you don't want to support them all transparently. That's understandable. Cool. - The level you choose to support is the code point level. There are many good arguments about why this isn't a good default but you won't change your mind. I don't like that at all and I'm not alone but let's forget the entirety of the vocal D community for a moment. You mean all 35 of them? It's not about changing my mind! A massive thing is that code point level handling is the incumbent, and changing it would need to mark an absolutely Earth-shattering improvement to be worth it! - A huge part of unicode chars can be normalized to fit your definition. That way not everything works (far from it) but a sufficiently big subset works. Cool. - On the other hand without normalization it just doesn't make any sense from a user perspective. The ö example has clearly shown that much, you even admitted it yourself by stating that many counter arguments would have worked had the string been normalized). Yah, operating at code point level does not come free of caveats. It is vastly superior to operating on code units, and did I mention it's the incumbent. - The most prominent problem is with graphemes that can have different representations, as those that can't be normalized can't be searched as dchars as well. Yah, I'd say if the program needs graphemes the option is there. Phobos by default deals with code points which are not perfect but are independent of representation, produce meaningful and consistent results with std.algorithm etc. - If autodecoding to code points is to stay and in an effort to find a compromise then normalizing should be done by default. Sure it would take some more time but it wouldn't break any code (I think) and would actually make things more correct. 
They still wouldn't be correct but I feel that something as crazy as unicode cannot be tackled generically anyway. Some more work on normalization at strategic points in Phobos would be interesting! Andrei
Re: The Case Against Autodecode
On 02.06.2016 23:23, Andrei Alexandrescu wrote: On 6/2/16 5:19 PM, Timon Gehr wrote: On 02.06.2016 23:16, Timon Gehr wrote: On 02.06.2016 23:06, Andrei Alexandrescu wrote: As the examples show, the examples would be entirely meaningless at code unit level. So far, I needed to count the number of characters 'ö' inside some string exactly zero times, (Obviously this isn't even what the example would do. I predict I will never need to count the number of code points 'ö' by calling some function from std.algorithm directly.) You may look for a specific dchar, and it'll work. How about findAmong("...") with a bunch of ASCII and Unicode punctuation symbols? -- Andrei .̂ ̪.̂ (Copy-paste it somewhere else, I think it might not be rendered correctly on the forum.) The point is that if I do: ".̂ ̪.̂".normalize!NFC.byGrapheme.findAmong([Grapheme("."),Grapheme(",")]) no match is returned. If I use your method with dchars, I will get spurious matches. I.e. the suggested method to look for punctuation symbols is incorrect: writeln(".̂ ̪.̂".findAmong(",.")); // ".̂ ̪.̂" (Also, do you have a use case for this?)
Re: The Case Against Autodecode
On 6/2/16 5:38 PM, deadalnix wrote: On Thursday, 2 June 2016 at 21:37:11 UTC, Andrei Alexandrescu wrote: On 6/2/16 5:35 PM, deadalnix wrote: On Thursday, 2 June 2016 at 21:24:15 UTC, Andrei Alexandrescu wrote: On 6/2/16 5:20 PM, deadalnix wrote: The good thing when you define works by whatever it does right now No, it works as it was designed. -- Andrei Nobody says it doesn't. Everybody says the design is crap. I think I like it more after this thread. -- Andrei You start reminding me of the joke with that guy complaining that everybody is going backward on the highway. Touché. (Get it?) -- Andrei
Re: non empty slices
On Thursday, 2 June 2016 at 20:11:21 UTC, Alex wrote: On Thursday, 2 June 2016 at 16:21:03 UTC, ag0aep6g wrote: void f(int[] arr) { A a = arrayToA(arr); foreach (T; A.AllowedTypes) { if (T* p = a.peek!T) f_impl(*p); } } You totally hit the point! The cool thing about the Algebraic is as I expected, that it doesn't change its type... And the hard thing is that I'm not used to its Empty, Many, ... things yet. But the question remains how to keep this @nogc? I wonder at the line with peek... and why it is not just returning the value... Just tried this instead of your f-function: void f(int[] arr) { A result; import std.meta; alias TL = AliasSeq!(Empty, int, Many!int); int caseS; switch (arr.length) { case 0: result = Empty.init; caseS = 0; break; case 1: result = arr[0]; caseS = 1; break; default: result = Many!int(arr); caseS = 2; } f_impl(*result.get!(TL[caseS])); } But got: Error: variable caseS cannot be read at compile time which is obviously true...
Re: The Case Against Autodecode
On 6/2/16 5:37 PM, Andrei Alexandrescu wrote: On 6/2/16 5:35 PM, deadalnix wrote: On Thursday, 2 June 2016 at 21:24:15 UTC, Andrei Alexandrescu wrote: On 6/2/16 5:20 PM, deadalnix wrote: The good thing when you define works by whatever it does right now No, it works as it was designed. -- Andrei Nobody says it doesn't. Everybody says the design is crap. I think I like it more after this thread. -- Andrei Meh, thinking of it again: I don't like it more, I'd still do it differently given a clean slate (viz. RCStr). But let's say I didn't get many compelling reasons to remove autodecoding from this thread. -- Andrei
Re: The Case Against Autodecode
On Thursday, 2 June 2016 at 20:29:48 UTC, Andrei Alexandrescu wrote: On 06/02/2016 04:22 PM, cym13 wrote: A:“We should decode to code points” B:“No, decoding to code points is a stupid idea.” A:“No it's not!” B:“Can you show a concrete example where it does something useful?” A:“Sure, look at that!” B:“This isn't working at all, look at all those counter-examples!” A:“It may not work for your examples but look how easy it is to find code points!” With autodecoding all of std.algorithm operates correctly on code points. Without it all it does for strings is gibberish. -- Andrei Allow me to try another angle: - There are different levels of unicode support and you don't want to support them all transparently. That's understandable. - The level you choose to support is the code point level. There are many good arguments about why this isn't a good default but you won't change your mind. I don't like that at all and I'm not alone but let's forget the entirety of the vocal D community for a moment. - A huge part of unicode chars can be normalized to fit your definition. That way not everything works (far from it) but a sufficiently big subset works. - On the other hand without normalization it just doesn't make any sense from a user perspective. The ö example has clearly shown that much, you even admitted it yourself by stating that many counter arguments would have worked had the string been normalized). - The most prominent problem is with graphemes that can have different representations, as those that can't be normalized can't be searched as dchars as well. - If autodecoding to code points is to stay and in an effort to find a compromise then normalizing should be done by default. Sure it would take some more time but it wouldn't break any code (I think) and would actually make things more correct. They still wouldn't be correct but I feel that something as crazy as unicode cannot be tackled generically anyway.
Re: The Case Against Autodecode
On Thursday, 2 June 2016 at 21:37:11 UTC, Andrei Alexandrescu wrote: On 6/2/16 5:35 PM, deadalnix wrote: On Thursday, 2 June 2016 at 21:24:15 UTC, Andrei Alexandrescu wrote: On 6/2/16 5:20 PM, deadalnix wrote: The good thing when you define works by whatever it does right now No, it works as it was designed. -- Andrei Nobody says it doesn't. Everybody says the design is crap. I think I like it more after this thread. -- Andrei You start reminding me of the joke with that guy complaining that everybody is going backward on the highway.
Re: The Case Against Autodecode
On Thursday, 2 June 2016 at 21:30:51 UTC, tsbockman wrote: On Thursday, 2 June 2016 at 21:07:19 UTC, default0 wrote: The level 2 support description noted that it should be opt-in because it's slow. 1) It does not say that level 2 should be opt-in; it says that level 2 should be toggle-able. Nowhere does it say which of level 1 and 2 should be the default. 2) It says that working with graphemes is slower than UTF-16 code UNITS (level 1), but says nothing about streaming decoding of code POINTS (what we have). 3) That document is from 2000, and its claims about performance are surely extremely out-dated, anyway. Computers and the Unicode standard have both changed much since then. 1) Right because a special toggleable syntax is definitely not "opt-in". 2) Several people in this thread noted that working on graphemes is way slower (which makes sense, because it's yet another processing step you need to do after you decoded - therefore more work - therefore slower) than working on code points. And working on code points is way slower than working on code units (the actual level 1). 3) Not an argument - doing more work makes code slower. The only thing that changes is what specific operations have what cost (for instance, memory access has a much higher cost now than it had then). Considering the way the process works and judging from what others in this thread have said about it, I will stick with "always decoding to graphemes for all operations is very slow" and indulge in being too lazy to write benchmarks for it to show just how bad it is.
Re: The Case Against Autodecode
On 6/2/16 5:35 PM, deadalnix wrote: On Thursday, 2 June 2016 at 21:24:15 UTC, Andrei Alexandrescu wrote: On 6/2/16 5:20 PM, deadalnix wrote: The good thing when you define works by whatever it does right now No, it works as it was designed. -- Andrei Nobody says it doesn't. Everybody says the design is crap. I think I like it more after this thread. -- Andrei
Re: The Case Against Autodecode
On 6/2/16 5:35 PM, ag0aep6g wrote: On 06/02/2016 11:27 PM, Andrei Alexandrescu wrote: On 6/2/16 5:24 PM, ag0aep6g wrote: On 06/02/2016 11:06 PM, Andrei Alexandrescu wrote: Nope, that's a radically different matter. As the examples show, the examples would be entirely meaningless at code unit level. They're simply not possible. Won't compile. They do compile. Yes, you're right, of course they do. char implicitly converts to dchar. I didn't think of that anti-feature. As I said: this thread produces an unpleasant amount of arguments in favor of autodecoding. Even I don't like that :o). It's more of an argument against char : dchar, I'd say. I do think that's an interesting option in PL design space, but that would be super disruptive. -- Andrei
Re: The Case Against Autodecode
On Thursday, 2 June 2016 at 21:24:15 UTC, Andrei Alexandrescu wrote: On 6/2/16 5:20 PM, deadalnix wrote: The good thing when you define works by whatever it does right now No, it works as it was designed. -- Andrei Nobody says it doesn't. Everybody says the design is crap.
Re: The Case Against Autodecode
On 06/02/2016 11:27 PM, Andrei Alexandrescu wrote: On 6/2/16 5:24 PM, ag0aep6g wrote: On 06/02/2016 11:06 PM, Andrei Alexandrescu wrote: Nope, that's a radically different matter. As the examples show, the examples would be entirely meaningless at code unit level. They're simply not possible. Won't compile. They do compile. Yes, you're right, of course they do. char implicitly converts to dchar. I didn't think of that anti-feature. As I said: this thread produces an unpleasant amount of arguments in favor of autodecoding. Even I don't like that :o). It's more of an argument against char : dchar, I'd say.
Re: The Case Against Autodecode
On Thursday, 2 June 2016 at 21:07:19 UTC, default0 wrote: The level 2 support description noted that it should be opt-in because it's slow. 1) It does not say that level 2 should be opt-in; it says that level 2 should be toggle-able. Nowhere does it say which of level 1 and 2 should be the default. 2) It says that working with graphemes is slower than UTF-16 code UNITS (level 1), but says nothing about streaming decoding of code POINTS (what we have). 3) That document is from 2000, and its claims about performance are surely extremely out-dated, anyway. Computers and the Unicode standard have both changed much since then.
Re: The Case Against Autodecode
On 6/2/16 5:27 PM, Andrei Alexandrescu wrote: On 6/2/16 5:24 PM, ag0aep6g wrote: Just like there is no single code point for 'a⃗' so you can't search for it in a range of code points. Of course you can. Correx, indeed you can't. -- Andrei
Re: D's Auto Decoding and You
On 6/2/16 5:27 PM, Steven Schveighoffer wrote: On 6/2/16 5:21 PM, jmh530 wrote: On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote: If you think there should be any more information included in the article, please let me know so I can add it. I was a little confused by something in the main autodecoding thread, so I read your article again. Unfortunately, I don't think my confusion is resolved. I was trying one of your examples (full code I used below). You claim it works, but I keep getting assertion failures. I'm just running it with rdmd on Windows 7. import std.algorithm : canFind; void main() { string s = "cassé"; assert(s.canFind!(x => x == 'é')); } If that é above is an e followed by a combining character, then you will get the error. This is because autodecoding does not auto normalize as well -- the code points have to match exactly. -Steve Indeed. FWIW I just copied OP's code from Thunderbird into Chrome (on OSX) and it worked: https://dpaste.dzfl.pl/09b9188d87a5 Should I assume some normalization occurred on the way? Andrei
Re: D's Auto Decoding and You
On 6/2/16 5:21 PM, jmh530 wrote: On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote: If you think there should be any more information included in the article, please let me know so I can add it. I was a little confused by something in the main autodecoding thread, so I read your article again. Unfortunately, I don't think my confusion is resolved. I was trying one of your examples (full code I used below). You claim it works, but I keep getting assertion failures. I'm just running it with rdmd on Windows 7. import std.algorithm : canFind; void main() { string s = "cassé"; assert(s.canFind!(x => x == 'é')); } If that é above is an e followed by a combining character, then you will get the error. This is because autodecoding does not auto normalize as well -- the code points have to match exactly. -Steve
Re: D's Auto Decoding and You
On Thursday, 2 June 2016 at 21:21:50 UTC, jmh530 wrote: I was a little confused by something in the main autodecoding thread, so I read your article again. Unfortunately, I don't think my confusion is resolved. I was trying one of your examples (full code I used below). You claim it works, but I keep getting assertion failures. I'm just running it with rdmd on Windows 7. import std.algorithm : canFind; void main() { string s = "cassé"; assert(s.canFind!(x => x == 'é')); } Your browser is turning the é in the string into two code points via normalization whereas it should be one. Try using \u00E9 instead.
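The normalization mismatch described above is a property of Unicode itself, not of D, so it can be illustrated with a short Python sketch using only the standard `unicodedata` module (D's `std.uni` offers analogous normalization). The two strings render identically but are different code point sequences, which is exactly why the `canFind` assertion fails when the source file contains the decomposed form.

```python
import unicodedata

precomposed = "cass\u00e9"   # 'é' as one code point, U+00E9
decomposed = "casse\u0301"   # 'e' followed by COMBINING ACUTE ACCENT, U+0301

# Both render as "cassé", yet they are different code point sequences,
# so a code point level search for U+00E9 fails on the decomposed form.
assert precomposed != decomposed
assert "\u00e9" not in decomposed

# NFC normalization composes 'e' + U+0301 into U+00E9, fixing the mismatch.
assert unicodedata.normalize("NFC", decomposed) == precomposed
```

This also explains why pasting the snippet through a browser can "fix" it: some intermediate step normalized the text to NFC.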
Re: Why does DMD on Debian need xdg-utils
It shouldn't be necessary. I believe that is because of `dmd -man`, which opens a web browser. That's an apt-d issue (I hope Jordi Sayol will read this) which prevents using this repository if your machine has no X (I guess you discovered that on a server, as I did). 2016-06-02 20:17 GMT+02:00 ZombineDev via Digitalmars-d <digitalmars-d@puremagic.com>: On Thursday, 2 June 2016 at 18:04:43 UTC, flamencofantasy wrote: On Thursday, 2 June 2016 at 17:54:07 UTC, ZombineDev wrote: On Thursday, 2 June 2016 at 17:36:46 UTC, flamencofantasy wrote: DMD on Debian depends on the xdg-utils package. When I install xdg-utils I get many more packages (see bottom of message). Is that really necessary? Thanks. It shouldn't be necessary. It's probably a packaging issue. Meanwhile, you can try the install.sh script listed on http://dlang.org/download. It shouldn't have any unnecessary dependencies. Thanks, but does the script handle upgrades? Yes. It also supports multiple versions side by side and also installs dub. You can find the source here: https://github.com/dlang/installer/blob/master/script/install.sh Example usage:

# Install dmd 2.70.0
~/dlang/install.sh install dmd-2.70.0

# Install dmd 2.69.0
~/dlang/install.sh install dmd-2.69.0

# start using version 2.70.0
activate ~/dlang/dmd-2.70.0

# stop using version 2.70.0
deactivate

# start using version 2.69.0
activate ~/dlang/dmd-2.69.0

# stop using version 2.69.0
deactivate

# uninstall version 2.69.0
~/dlang/install.sh uninstall dmd-2.69.0

# removes everything installed so far
rm -rf ~/dlang

# downloads (again) the install script and
# installs the latest stable version of the compiler.
curl -fsS https://dlang.org/install.sh | bash -s dmd
Re: The Case Against Autodecode
On 02.06.2016 22:51, Andrei Alexandrescu wrote: On 06/02/2016 04:50 PM, Timon Gehr wrote: On 02.06.2016 22:28, Andrei Alexandrescu wrote: On 06/02/2016 04:12 PM, Timon Gehr wrote: It is not meaningful to compare utf-8 and utf-16 code units directly. But it is meaningful to compare Unicode code points. -- Andrei It is also meaningful to compare two utf-8 code units or two utf-16 code units. By decoding them of course. -- Andrei That makes no sense, I cannot decode single code units. BTW, I guess the reason why char converts to wchar converts to dchar is that the lower half of code units in char and the lower half of code units in wchar are code points. Maybe code units and code points with low numerical values should have distinct types.
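Timon's two points — that a lone code unit of a multi-byte sequence cannot be meaningfully compared, and that the low half of the `char`/`wchar` range coincides with code points — can be checked concretely. A Python sketch (the byte values are the same UTF-8 code units a D `char[]` stores):

```python
s = "\u00f6"                 # 'ö', a single code point (U+00F6)
units = s.encode("utf-8")    # its UTF-8 code units

assert len(s) == 1           # one code point...
assert units == b"\xc3\xb6"  # ...but two code units; neither unit equals 0xF6,
                             # so comparing a lone unit to the code point is meaningless

# For ASCII, code unit and code point coincide, which is why char -> dchar
# conversion looks harmless in the low half of the range.
assert "a".encode("utf-8")[0] == ord("a") == 0x61
```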
Re: The Case Against Autodecode
On 6/2/16 5:20 PM, deadalnix wrote: The good thing when you define works by whatever it does right now No, it works as it was designed. -- Andrei
Re: The Case Against Autodecode
On 6/2/16 5:23 PM, Timon Gehr wrote: On 02.06.2016 22:51, Andrei Alexandrescu wrote: On 06/02/2016 04:50 PM, Timon Gehr wrote: On 02.06.2016 22:28, Andrei Alexandrescu wrote: On 06/02/2016 04:12 PM, Timon Gehr wrote: It is not meaningful to compare utf-8 and utf-16 code units directly. But it is meaningful to compare Unicode code points. -- Andrei It is also meaningful to compare two utf-8 code units or two utf-16 code units. By decoding them of course. -- Andrei That makes no sense, I cannot decode single code units. BTW, I guess the reason why char converts to wchar converts to dchar is that the lower half of code units in char and the lower half of code units in wchar are code points. Maybe code units and code points with low numerical values should have distinct types. Then you lost me. (I'm sure you're making a good point.) -- Andrei
Re: The Case Against Autodecode
On 02.06.2016 23:20, deadalnix wrote: The sample code won't count the instance of the grapheme 'ö' as some of its encodings won't be counted, which definitely counts as doesn't work. It also has false positives (you can combine 'ö' with some combining character in order to get some strange character that is not an 'ö', and not even NFC helps with that).
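The false-positive claim can be demonstrated directly. A Python sketch, using COMBINING RIGHT ARROW ABOVE (the same mark as in the 'a⃗' example elsewhere in this thread); since Unicode defines no precomposed character for 'ö' plus that mark, NFC leaves the sequence as-is:

```python
import unicodedata

# 'ö' (U+00F6) followed by COMBINING RIGHT ARROW ABOVE (U+20D7):
# a single grapheme that is not the character 'ö'.
s = "\u00f6\u20d7"

# No precomposed form exists, so NFC normalization changes nothing...
assert unicodedata.normalize("NFC", s) == s
# ...and a code point level search still reports an 'ö': a false positive
# at the grapheme level.
assert "\u00f6" in s
```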
Re: The Case Against Autodecode
On 6/2/16 5:24 PM, ag0aep6g wrote: On 06/02/2016 11:06 PM, Andrei Alexandrescu wrote: Nope, that's a radically different matter. As the examples show, the examples would be entirely meaningless at code unit level. They're simply not possible. Won't compile. They do compile. There is no single UTF-8 code unit for 'ö', so you can't (easily) search for it in a range for code units. Of course you can. Can you search for an int in a short[]? Oh yes you can. Can you search for a dchar in a char[]? Of course you can. Autodecoding also gives it meaning. Just like there is no single code point for 'a⃗' so you can't search for it in a range of code points. Of course you can. You can still search for 'a', and 'o', and the rest of ASCII in a range of code units. You can search for a dchar in a char[] because you can compare an individual dchar with either another dchar (correct, autodecoding) or with a char (incorrect, no autodecoding). As I said: this thread produces an unpleasant amount of arguments in favor of autodecoding. Even I don't like that :o). Andrei
Re: The Case Against Autodecode
On 06/02/2016 11:24 PM, ag0aep6g wrote: They're simply not possible. Won't compile. There is no single UTF-8 code unit for 'ö', so you can't (easily) search for it in a range for code units. Just like there is no single code point for 'a⃗' so you can't search for it in a range of code points. You can still search for 'a', and 'o', and the rest of ASCII in a range of code units. I'm ignoring combining characters there. You can search for 'a' in code units in the same way that you can search for 'ä' in code points. I.e., more or less, depending on how serious you are about combining characters.
Re: The Case Against Autodecode
On Thursday, 2 June 2016 at 20:27:27 UTC, Walter Bright wrote: On 6/2/2016 12:34 PM, deadalnix wrote: On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu wrote: Pretty much everything. Consider s and s1 string variables with possibly different encodings (UTF8/UTF16). * s.all!(c => c == 'ö') works only with autodecoding. It always returns false without. False. Many characters can be represented by different sequences of codepoints. For instance, ê can be ê as one codepoint, or e followed by ^ as a combining modifier. ö is one such character. There are 3 levels of Unicode support. What Andrei is talking about is Level 1. http://unicode.org/reports/tr18/tr18-5.1.html I wonder what rationale there is for Unicode to have two different sequences of codepoints be treated as the same. It's madness. To be able to convert back and forth from/to unicode in a lossless manner.
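The two encodings of 'ê' mentioned above, and the lossless round trip deadalnix refers to, can be sketched in Python (the sequences themselves are defined by the Unicode standard, not by any particular language):

```python
import unicodedata

one_cp = "\u00ea"    # 'ê' as the precomposed code point U+00EA
two_cp = "e\u0302"   # 'e' plus COMBINING CIRCUMFLEX ACCENT U+0302

assert one_cp != two_cp   # distinct sequences, same rendered character

# Canonical decomposition (NFD) and composition (NFC) convert losslessly
# between the two representations.
assert unicodedata.normalize("NFD", one_cp) == two_cp
assert unicodedata.normalize("NFC", two_cp) == one_cp
```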
Re: The Case Against Autodecode
On 06/02/2016 11:06 PM, Andrei Alexandrescu wrote: Nope, that's a radically different matter. As the examples show, the examples would be entirely meaningless at code unit level. They're simply not possible. Won't compile. There is no single UTF-8 code unit for 'ö', so you can't (easily) search for it in a range for code units. Just like there is no single code point for 'a⃗' so you can't search for it in a range of code points. You can still search for 'a', and 'o', and the rest of ASCII in a range of code units.
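The claim that ASCII can safely be searched at the code unit level holds because UTF-8 reserves byte values 0x80-0xFF for the lead and continuation bytes of multi-byte sequences, so an ASCII byte can never occur inside one. A quick Python check (bytes here play the role of D's `char`):

```python
s = "cass\u00e9"             # "cassé", with 'é' as U+00E9
units = s.encode("utf-8")    # b"cass\xc3\xa9"

# ASCII bytes never occur inside a multi-byte sequence, so counting
# 's' among code units is exact.
assert units.count(b"s") == 2

# 'é' has no single code unit; it spans two bytes, so a single-unit
# search for it is impossible.
assert b"\xc3\xa9" in units
assert len("\u00e9".encode("utf-8")) == 2
```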
Re: The Case Against Autodecode
On 6/2/16 5:19 PM, Timon Gehr wrote: On 02.06.2016 23:16, Timon Gehr wrote: On 02.06.2016 23:06, Andrei Alexandrescu wrote: As the examples show, the examples would be entirely meaningless at code unit level. So far, I needed to count the number of characters 'ö' inside some string exactly zero times, (Obviously this isn't even what the example would do. I predict I will never need to count the number of code points 'ö' by calling some function from std.algorithm directly.) You may look for a specific dchar, and it'll work. How about findAmong("...") with a bunch of ASCII and Unicode punctuation symbols? -- Andrei
Re: year to date pull statistics (week ending 2016-05-28)
On Thursday, 2 June 2016 at 18:36:02 UTC, Basile B. wrote: On Tuesday, 31 May 2016 at 23:48:00 UTC, Brad Roberts wrote: [...] You should take Jack Stouffer into dlang ;). Personally, I think the problem with Phobos is that the people who should manage it are not available enough. I am fully for that - Jack has been doing a great job lately at cleaning up & reviewing Phobos. He has more than earned his promotion!
Re: The Case Against Autodecode
On Thursday, 2 June 2016 at 20:13:52 UTC, Andrei Alexandrescu wrote: On 06/02/2016 03:34 PM, deadalnix wrote: On Thursday, 2 June 2016 at 19:05:44 UTC, Andrei Alexandrescu wrote: Pretty much everything. Consider s and s1 string variables with possibly different encodings (UTF8/UTF16). * s.all!(c => c == 'ö') works only with autodecoding. It always returns false without. False. True. "Are all code points equal to this one?" -- Andrei The good thing when you define works by whatever it does right now is that everything always works and there are literally never any bugs. The bad thing is that this is a completely useless definition of work. The sample code won't count the instance of the grapheme 'ö' as some of its encodings won't be counted, which definitely counts as doesn't work. When your point needs to redefine words in ways that nobody agrees with, it is time to admit the point is bogus.
Re: D's Auto Decoding and You
On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote: If you think there should be any more information included in the article, please let me know so I can add it. I was a little confused by something in the main autodecoding thread, so I read your article again. Unfortunately, I don't think my confusion is resolved. I was trying one of your examples (full code I used below). You claim it works, but I keep getting assertion failures. I'm just running it with rdmd on Windows 7. import std.algorithm : canFind; void main() { string s = "cassé"; assert(s.canFind!(x => x == 'é')); }
Re: Blocking points for further D adoption
On Thursday, 2 June 2016 at 21:01:53 UTC, Jacob Carlborg wrote: Don't you have that issue with most stuff. Not everything can fit everyone's need. Sure, it's a sliding scale. But, web servers, even ones that sit behind Apache or Nginx, are specialized much more than what we currently have in Phobos. It would make more sense from a maintenance standpoint to have a toy server, but I don't see the utility of including one in Phobos over just having it in dub.
Re: The Case Against Autodecode
On 02.06.2016 23:06, Andrei Alexandrescu wrote: As the examples show, the examples would be entirely meaningless at code unit level. So far, I needed to count the number of characters 'ö' inside some string exactly zero times, but I wanted to chain or join strings relatively often.
Re: The Case Against Autodecode
On 6/2/16 5:05 PM, tsbockman wrote: On Thursday, 2 June 2016 at 20:56:26 UTC, Walter Bright wrote: What is supposed to be done with "do not merge" PRs other than close them? Occasionally people need to try something on the auto tester (not sure if that's relevant to that particular PR, though). Presumably if someone marks their own PR as "do not merge", it means they're planning to either close it themselves after it has served its purpose, or they plan to fix/finish it and then remove the "do not merge" label. Feel free to reopen if it helps, it wasn't closed in anger. -- Andrei
Re: The Case Against Autodecode
On Thursday, 2 June 2016 at 20:52:29 UTC, ag0aep6g wrote: On 06/02/2016 10:36 PM, Andrei Alexandrescu wrote: By whom? The "support level 1" folks yonder at the Unicode standard? :o) -- Andrei Do they say that level 1 should be the default, and do they give a rationale for that? Would you kindly link or quote that? The level 2 support description noted that it should be opt-in because it's slow. Arguably it should be easier to operate on code units if you know it's safe to do so, but either always working on code units or always working on graphemes as the default seems to be either too broken too often or too slow too often. Now one can argue either consistency for code units (because then we can treat char[] and friends as a slice) or correctness for graphemes, but really, the more I think about it, the more I think there is no good default and you need to learn Unicode anyway. The only sad parts here are that 1) we hijacked an array type for strings, which sucks, and 2) we don't have an API that is actually good at teaching the user what it does and doesn't do. The consequence of 1 is that generic code that also wants to deal with strings will want to special-case to get rid of auto-decoding; the consequence of 2 is that we will have tons of not-actually-correct string handling.
I would assume that almost all string handling code that is out in the wild is broken anyway (in code I have encountered I have never seen attempts to normalize or do other things before or after comparisons, searching, etc.), unless of course YOU or one of your colleagues wrote it (consider that checking the length of a string in Java or C# to validate that it is no longer than X characters is often done and wrong, because .Length is the number of UTF-16 code units in those languages) :o) So as bad and alarming as "incorrect string handling" by default seems, in practice it has not prevented people from writing working (internationalized!) applications in languages that get used way more than D. One could say we should do it better than them, but I would be inclined to believe that RCStr provides our opportunity to do so. Having char[] be what it is is an annoying wart, and maybe at some point we can deprecate/remove that behaviour, but for now I'd rather see if RCStr is viable than attempt to change the semantics of all string handling code in D.
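The parenthetical about Java/C# .Length can be made concrete. Python's len counts code points, so the UTF-16 code unit count has to be computed from the encoded bytes; a hypothetical length-based validation in Java or C# would count 2 "characters" here:

```python
s = "\U0001F600"   # one code point (an emoji), outside the Basic Multilingual Plane

assert len(s) == 1                            # one code point...
assert len(s.encode("utf-16-le")) // 2 == 2   # ...but two UTF-16 code units (a
                                              # surrogate pair), which is what
                                              # Java/C# .Length reports
assert len(s.encode("utf-8")) == 4            # and four UTF-8 code units
```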
Re: The Case Against Autodecode
On Thursday, 2 June 2016 at 20:56:26 UTC, Walter Bright wrote: What is supposed to be done with "do not merge" PRs other than close them? Occasionally people need to try something on the auto tester (not sure if that's relevant to that particular PR, though). Presumably if someone marks their own PR as "do not merge", it means they're planning to either close it themselves after it has served its purpose, or they plan to fix/finish it and then remove the "do not merge" label. Either way, they shouldn't be closed just because they say "do not merge" (unless they're abandoned or something, obviously).
Re: The Case Against Autodecode
On 6/2/16 5:01 PM, ag0aep6g wrote: On 06/02/2016 10:50 PM, Andrei Alexandrescu wrote: It does not fall apart for code points. Yes it does. You've been given plenty examples where it falls apart. There weren't any. Your answer to that was that it operates on code points, not graphemes. That is correct. Well, duh. Comparing UTF-8 code units against each other works, too. That's not an argument for doing that by default. Nope, that's a radically different matter. As the examples show, the examples would be entirely meaningless at code unit level. Andrei
Re: Phobos needs a (part-time) maintainer
On Thursday, 2 June 2016 at 20:59:52 UTC, Basile B. wrote: Eventually I'll come back to bugfixing if they take Jack, but not you Seb. For one reason or another I don't like you wilzbach. You are frustrated. I get that. Don't make this personal for others, please. Maybe you should ignore this thread for today?
Re: Blocking points for further D adoption
On 2016-06-02 20:14, Jack Stouffer wrote: Just to be clear, it's not a good idea to have a full blown server in your stdlib. Non-toy web servers are complicated pieces of software involving > 10KLOC. Not only that, but there are many ways to skin a cat in this field. Different products need varying, sometimes mutually exclusive, features from their servers. Therefore, I don't think web servers are good candidates for standardization. Don't you have that issue with most stuff? Not everything can fit everyone's need. I have never used std.bigint but it is still present in Phobos because it's useful for someone. I agree with the complexity of web servers, but they don't need to handle all the gory details of clients not following the protocol. I would think it works perfectly fine for non-public facing servers. A public facing server should sit behind a well-tested, well-understood implementation like Apache or nginx, regardless of whether the implementation is in Go, Node.js or D. -- /Jacob Carlborg
Re: The Case Against Autodecode
On 06/02/2016 10:50 PM, Andrei Alexandrescu wrote: It does not fall apart for code points. Yes it does. You've been given plenty examples where it falls apart. Your answer to that was that it operates on code points, not graphemes. Well, duh. Comparing UTF-8 code units against each other works, too. That's not an argument for doing that by default.
Re: The Case Against Autodecode
On Thursday, 2 June 2016 at 20:49:52 UTC, Andrei Alexandrescu wrote: On 06/02/2016 04:47 PM, tsbockman wrote: That doesn't sound like much of an endorsement for defaulting to only level 1 support to me - "it does not handle more complex languages or extensions to the Unicode Standard very well". Code point/Level 1 support sounds like a sweet spot between efficiency/complexity and conviviality. Level 2 is opt-in with byGrapheme. -- Andrei Actually, according to the document Walter Bright linked level 1 does NOT operate at the code point level: Level 1: Basic Unicode Support. At this level, the regular expression engine provides support for Unicode characters as basic 16-bit logical units. (This is independent of the actual serialization of Unicode as UTF-8, UTF-16BE, UTF-16LE, or UTF-32.) ... Level 1 support works well in many circumstances. However, it does not handle more complex languages or extensions to the Unicode Standard very well. Particularly important cases are **surrogates** ... So, level 1 appears to be UTF-16 code units, not code points. To do code points it would have to recognize surrogates, which are specifically mentioned as not supported. Level 2 skips straight to graphemes, and there is no code point level. However, this document is very old - from Unicode 3.0 and the year 2000: While there are no surrogate characters in Unicode 3.0 (outside of private use characters), future versions of Unicode will contain them... Perhaps level 1 has since been redefined?
Re: Phobos needs a (part-time) maintainer
On Thursday, 2 June 2016 at 20:23:37 UTC, Seb wrote: On Thursday, 2 June 2016 at 20:17:32 UTC, Andrei Alexandrescu wrote: On 06/02/2016 03:41 PM, Basile B. wrote: Once a PR gets the label "@andrei", it basically means that "it's dead". You mean @andralex? You are right. I am sorry; I'm coming off an unprecedentedly busy spring spent mostly evangelizing D at various conferences, or doing contract work that will pour money into the Foundation's coffers. This is not work I can delegate, but it is poised to have great impact (more on that later). I thought leaving Facebook would free my time, but things have gotten really crazily busy. And look at me - I spend most of my time on the autodecoding thread. Andrei Can't we have someone who can dedicate a fixed amount of their professional time to maintaining the D infrastructure? There is so much to do - reviewing and categorizing PRs is just the tip of the iceberg. Ideally it would be a full-time position, but if a company would dedicate 20% of an employee's time to contributing to D, that would be an awesome step forward. Eventually I'll come back to bugfixing if they take Jack, but not you Seb. For one reason or another I don't like you wilzbach.