Re: std.unittests for (final?) review [Update]
On 11/01/11 03:09, Jonathan M Davis wrote: On Monday 10 January 2011 05:40:39 Justin Johansson wrote: On 10/01/11 23:29, Jonathan M Davis wrote: From the sounds of it, if this code gets voted in, it'll be going into std.exception. - Jonathan M Davis May it be asked by what authority you can say that (i.e. as said above)? Andrei's posts in this thread. Assuming that the code passes the vote (the deadline for which he set as February 7th), he thinks that std.exception is the best place for it rather than it being in its own module. Oh okay. Thanks for that; I missed Andrei's post earlier in this thread but found it now. Good luck with the voting. -- Justin
Re: either
On 10/01/11 05:42, Andrei Alexandrescu wrote:

I wrote a simple helper, in spirit with some recent discussions:

    // either
    struct Either(Ts...)
    {
        Tuple!Ts data_;
        bool opEquals(E)(E e)
        {
            foreach (i, T; Ts)
            {
                if (data_[i] == e) return true;
            }
            return false;
        }
    }

    auto either(Ts...)(Ts args)
    {
        return Either!Ts(tuple(args));
    }

    unittest
    {
        assert(1 == either(1, 2, 3));
        assert(4 != either(1, 2, 3));
        assert("abac" != either("aasd", "s"));
        assert("abac" == either("aasd", "abac", "s"));
    }

Turns out this is very useful in a variety of algorithms. I just don't know where in std this helper belongs! Any ideas?

Despite that it may be very useful as you say, personally I think it is a fundamental no-no to overload the meaning of == in any manner that does not preserve the generally accepted semantics of equality which include the notions of reflexivity, symmetry and transitivity**.

**See http://en.wikipedia.org/wiki/Equality_%28mathematics%29

The symmetric and transitive properties of the equality relation imply that if (a == c) is true and if (b == c) is true then (a == b) is also true. In this case the semantics of the overloaded == operator have the expressions 1 == either(1, 2, 3) and 2 == either(1, 2, 3) both evaluating to true and by implication/expectation (1 == 2). Clearly though, (1 == 2) evaluates to false in terms of the commonly accepted meaning of equality.

Just my 2 cents and I wonder if there is some other way of achieving the desired functionality of your helper without resorting to overloading == and the consequential violation of the commonly held semantics of equality.

Cheers, Justin Johansson
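One possible answer to the closing question is a plainly named membership test that leaves == alone. The following is only a sketch: `isAnyOf` is a hypothetical name invented for this illustration, not a Phobos function and not something proposed in the thread.

```d
import std.stdio : writeln;

/// Hypothetical helper (not part of Phobos): true if `value` compares
/// equal to at least one of `candidates`. Ordinary == keeps its usual
/// meaning; the "one of" semantics live in the function name instead.
bool isAnyOf(V, Cs...)(V value, Cs candidates)
{
    // foreach over the argument list is unrolled at compile time.
    foreach (c; candidates)
    {
        if (value == c)
            return true;
    }
    return false;
}

void main()
{
    // The same four checks as the unittest in the quoted post.
    assert(1.isAnyOf(1, 2, 3));
    assert(!4.isAnyOf(1, 2, 3));
    assert(!"abac".isAnyOf("aasd", "s"));
    assert("abac".isAnyOf("aasd", "abac", "s"));
    writeln("all checks passed");
}
```

With this shape, `1.isAnyOf(1, 2, 3)` and `2.isAnyOf(1, 2, 3)` are both true without suggesting that 1 == 2, which is the transitivity concern raised above.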
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
On 2011-01-10 22:57:36 -0500, Andrei Alexandrescu seewebsiteforem...@erdani.org said: I've been thinking on how to better deal with Unicode strings. Currently strings are formally bidirectional ranges with a surreptitious random access interface. The random access interface accesses the support of the string, which is understood to hold data in a variable-encoded format. For as long as the programmer understands this relationship, code for string manipulation can be written with relative ease. However, there is still room for writing wrong code that looks legit. Sometimes the best way to tackle a hairy reality is to invite it to the negotiation table and offer it promotion to first-class abstraction status. Along that vein I was thinking of defining a new range: VLERange, i.e. Variable Length Encoding Range. Such a range would have the power somewhere in between bidirectional and random access. The primitives offered would include empty, access to front and back, popFront and popBack (just like BidirectionalRange), and in addition properties typical of random access ranges: indexing, slicing, and length. Note that the result of the indexing operator is not the same as the element type of the range, as it only represents the unit of encoding. Seems like a good idea to define things formally. In addition to these (and connecting the two), a VLERange would offer two additional primitives: 1. size_t stepSize(size_t offset) gives the length of the step needed to skip to the next element. 2. size_t backstepSize(size_t offset) gives the size of the _backward_ step that goes to the previous element. I like the idea, but I'm not sure about this interface. What's the result of stepSize if your range must create two elements from one underlying unit? Perhaps in those cases the element type could be an array (to return more than one element from one iteration). For instance, say we have a conversion range taking a Unicode string and converting it to ISO Latin 1. 
The best (lossy) conversion for œ is oe (one character to two characters); in this case 'front' could simply return oe (two characters) in one iteration, with stepSize being the size of the œ code point. In the same conversion process, encountering e followed by a combining ´ would return pre-combined character é (two characters to one character). In both cases, offset is assumed to be at the beginning of a logical element of the range. I suspect that a lot of functions in std.string can be written without Unicode-specific knowledge just by relying on such an interface. Moreover, algorithms can be generalized to other structures that use variable-length encoding, such as those used in data compression. (In that case, the support would be a bit array and the encoded type would be ubyte.) Applicability to other problems seems like a valuable benefit. Writing to such ranges is not addressed by this design. Ideas are welcome. Writing, as in assigning to 'front'? That's not really possible with variable-length units as it'd need to shift everything in case of a length difference. Or maybe you meant writing as in having an output range for variable-length elements... I'm not sure. Adding VLERange would legitimize strings and would clarify their handling, at the cost of adding one additional concept that needs to be minded. Is the trade-off worthwhile? In my opinion it's not a trade-off at all, it's a formalization of how strings are handled which is better in every regard than a special case. I welcome this move very much. -- Michel Fortin michel.for...@michelf.com http://michelf.com/
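To make the two proposed primitives concrete, here is a minimal sketch of what stepSize and backstepSize could look like for UTF-8 code units. The free-function form, the parameter list and the assumption of valid UTF-8 input are choices made for this illustration only; the proposal does not specify them. Phobos's std.utf.stride and std.utf.strideBack cover similar ground.

```d
import std.stdio : writeln;

/// Illustrative only: length, in code units, of the element starting at
/// `offset`. Assumes `units` is valid UTF-8 and `offset` is at the
/// beginning of a logical element.
size_t stepSize(const(char)[] units, size_t offset)
{
    immutable uint lead = units[offset];
    if (lead < 0x80) return 1;            // 0xxxxxxx: single unit
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx: two units
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx: three units
    return 4;                             // 11110xxx: four units
}

/// Illustrative only: size of the backward step from `offset` (an element
/// start, or the end of the data) to the start of the previous element.
size_t backstepSize(const(char)[] units, size_t offset)
{
    size_t n = 1;
    // Continuation units look like 10xxxxxx; skip them until a lead unit.
    while ((units[offset - n] & 0xC0) == 0x80)
        ++n;
    return n;
}

void main()
{
    // 'a' is 1 unit, U+0153 (œ) is 2 units, U+20AC (€) is 3 units.
    string s = "a\u0153\u20AC";
    assert(s.length == 6);

    assert(stepSize(s, 0) == 1);
    assert(stepSize(s, 1) == 2);
    assert(stepSize(s, 3) == 3);

    assert(backstepSize(s, 6) == 3);
    assert(backstepSize(s, 3) == 2);
    assert(backstepSize(s, 1) == 1);

    // Walking the logical elements needs nothing but stepSize.
    size_t elements = 0;
    for (size_t i = 0; i < s.length; i += stepSize(s, i))
        ++elements;
    assert(elements == 3);
    writeln(elements, " elements in ", s.length, " code units");
}
```

Note that this sketch only covers the case where one logical element spans one or more units; the one-unit-to-several-elements case raised in the reply above is not addressed by it.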
Re: About std.container.RedBlackTree
On Mon, 10 Jan 2011 18:14:31 -0500, bearophile bearophileh...@lycos.com wrote: I've had to use a search tree, so RedBlackTree was the right data structure. It seems to do what I need, so thank you for this useful data structure. Some of the things I write here are questions or things that show my ignorance about this implementation. - Please add some usage examples to this page, this is important and helps people reduce a lot the number of experiments to do to use this tree: http://www.digitalmars.com/d/2.0/phobos/std_container.html# I will do this, when I have some free time. If you want to submit some examples, I would gladly include them. - This doesn't seem to work, it gives a forward reference error: import std.container: RedBlackTree; RedBlackTree!int t; void main() { t = RedBlackTree!int(1); } Grrr... I had issues with forward references (you can see from this comment: http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/container.d#L4071), I thought by reordering the functions I had fixed it, but apparently, it resurfaces under certain conditions. Please vote for that bug. http://d.puremagic.com/issues/show_bug.cgi?id=2810 I don't really know what to do about fixing it. Most likely any 'fix' I try will result in some other situation not compiling. I probably should just avoid using auto, not being able to declare a red black tree as a global variable is a huge limitation. - I need to create an empty tree and add items to it later (where I declare a fixed-sized array of trees I don't know the items to add). How do you do it? This code doesn't work: import std.container: RedBlackTree; void main() { auto t = RedBlackTree!int(); t.insert(1); } RedBlackTree must be initialized with a constructor. Otherwise, your root node is null. I chose this path instead of checking for null on every function. I realize the mistake -- you cannot create an empty tree, because you cannot have a default constructor. 
I have another function that I use to help create trees during unit tests because IFTI can be weird. I will make this function public and always present, then you can create an empty tree like this: auto t = RedBlackTree!int.create(); If Andrei decides eventually that containers should be classes, then this problem goes away. Please bugzillize this - Is the tree insert() method documented in the HTML docs? I thought this would do it, but apparently it doesn't: http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/container.d#L4457 I will try to make those docs show up. - A tree is a kind of set, so instead of insert() I'd like a name like add(). (But maybe this is not standard in D). The function names must be consistent across containers, because the point is that complexity and semantic requirements are attached to the function name. The function names were decided long ago by Andrei, and I don't think insert is a bad name (I believe std::set and std::map in C++ STL uses insert). - In theory an helper redBlackTree() function allows to write just this, with no need to write types: redBlackTree(1, 2, 3) Yes, this should be done. Please make a bugzilla report. In fact, this can extend to all std.container types. - I have tried to use printTree(), but I have failed. I don't know what to give to it and the docs don't say that it requires -unittest If it's a private debug function then there's no need to give it a ddoc comment. It is a private debug function, only enabled when version = doRBChecks is enabled. When developing the red black node, the red-black algorithms to fix the tree are very complex and error prone to write. This function basically printed the tree layout in a horribly ugly fashion when the red-black properties were not preserved. It helped me find bugs, but is mostly no-longer needed unless I try some more optimizations. Please ignore the function. I will make sure the comment is not ddoc'd. 
- I've seen that the tree doesn't seem to contain a length. Using walkLength is an option, but a possible idea is to replace: struct RedBlackTree(T, alias less = "a < b", bool allowDuplicates = false) With: struct RedBlackTree(T, alias less = "a < b", bool allowDuplicates = false, bool withLength = false) The reason for this is to keep it a reference-type (pImpl style), but I realize that I can easily fix this (I can just make the root node contain a length field). Please file a bugzilla to add length. - If you need to add many value nodes quickly to the tree a memory pool may speed up mass allocation. This has some disadvantages too. I have done this in dcollections, and it helps immensely in node-based containers. It is not
Re: filling an array of structures
On Tue, 11 Jan 2011 00:39:55 -0500, Brad brad.lanam.comp_nos...@nospam_gmail.com wrote:

Given an array of structures that you need to populate. Also assume the structure is quite large and has many elements to fill in.

    S s[];
    while (something) {
        s.length += 1;
        auto sp = s[$-1];              // method 1
        sp.a = 1;
        ...
        with (s[$-1]) {                // method 2
            a = 1;
        }
        ...
        foreach (ref sp; s[$-1..$]) {  // method 3
            sp.a = 1;
        }
    }

I don't mind 'with' statements, but they have a readability and maintenance problem if their scope is large. The reader would have to be aware of the context of the structure and the local variables, whereas 'sp.a' is self documenting. method 3 is fine, and provides me with a reference to s[$-1], but I'd really like to have:

    auto sp = ref s[$-1];  // possible method 4

where sp is a reference, but no pointer arithmetic can be done on it. Another alternative would be runtime aliases.

    alias s[$-1] as sp;

Or

    sp = with (s[$-1]);  // I don't much like this syntax...

In the meantime, I'll go with method 1.

What about:

    S sp;
    sp.a = 1;
    s ~= sp;

Or if you have a constructor for S, or a is the only member in it:

    s ~= S(1);

-Steve
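A point worth checking before settling on method 1: because S is a struct (a value type), `auto sp = s[$-1]` makes a copy, so assignments through `sp` do not reach the array element. The sketch below contrasts that with a pointer and with the append-based approach from the reply. The two-field `S` and the function name `fill` are made up for the illustration.

```d
import std.stdio : writeln;

struct S
{
    int a;
    int b;
}

/// Builds a small array using the approaches discussed above.
S[] fill()
{
    S[] s;

    // Method 1 as posted: the struct is copied out of the array, so
    // writes through the copy never reach the array element.
    s.length = s.length + 1;
    auto copy = s[$ - 1];
    copy.a = 99;
    assert(s[$ - 1].a == 0);

    // A pointer gives reference-like access; `.` dereferences it
    // automatically. It should not be kept across a later append,
    // which may move the array to new storage.
    auto sp = &s[$ - 1];
    sp.a = 1;
    assert(s[$ - 1].a == 1);

    // The reply's suggestion: fill a local value, then append it.
    S tmp;
    tmp.a = 2;
    tmp.b = 3;
    s ~= tmp;

    // Or append a struct literal directly.
    s ~= S(4, 5);
    return s;
}

void main()
{
    writeln(fill());
}
```

The append form avoids both the accidental copy and the need for a reference into the array, at the cost of one struct copy per element.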
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
On Mon, 10 Jan 2011 22:57:36 -0500, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: I've been thinking on how to better deal with Unicode strings. Currently strings are formally bidirectional ranges with a surreptitious random access interface. The random access interface accesses the support of the string, which is understood to hold data in a variable-encoded format. For as long as the programmer understands this relationship, code for string manipulation can be written with relative ease. However, there is still room for writing wrong code that looks legit. Sometimes the best way to tackle a hairy reality is to invite it to the negotiation table and offer it promotion to first-class abstraction status. Along that vein I was thinking of defining a new range: VLERange, i.e. Variable Length Encoding Range. Such a range would have the power somewhere in between bidirectional and random access. The primitives offered would include empty, access to front and back, popFront and popBack (just like BidirectionalRange), and in addition properties typical of random access ranges: indexing, slicing, and length. Note that the result of the indexing operator is not the same as the element type of the range, as it only represents the unit of encoding. In addition to these (and connecting the two), a VLERange would offer two additional primitives: 1. size_t stepSize(size_t offset) gives the length of the step needed to skip to the next element. 2. size_t backstepSize(size_t offset) gives the size of the _backward_ step that goes to the previous element. In both cases, offset is assumed to be at the beginning of a logical element of the range. I suspect that a lot of functions in std.string can be written without Unicode-specific knowledge just by relying on such an interface. Moreover, algorithms can be generalized to other structures that use variable-length encoding, such as those used in data compression. 
(In that case, the support would be a bit array and the encoded type would be ubyte.) Writing to such ranges is not addressed by this design. Ideas are welcome. Adding VLERange would legitimize strings and would clarify their handling, at the cost of adding one additional concept that needs to be minded. Is the trade-off worthwhile? While this makes it possible to write algorithms that only accept VLERanges, I don't think it solves the major problem with strings -- they are treated as arrays by the compiler. I'd also rather see an indexing operation return the element type, and have a separate function to get the encoding unit. This makes more sense for generic code IMO. I noticed you never commented on my proposed string type... That reminds me, I should update with suggested changes and re-post it. -Steve
Re: eliminate junk from std.string?
Hi Andrei, It looks nice. Just a small comment: in many of your comments you use words that not all of us might know. For instance: sans. I happen to know it because I studied French, but otherwise I wouldn't know that. I just showed that phrase to a colleague here in Argentina and he didn't understand it. He thought it maybe meant since. Maybe sans and in lieu are memes there in the USA, but not everywhere. So please, stick with English. :-)
Re: eliminate junk from std.string?
Oh, one more thing: can the names be consistent? inpattern countChars expandtabs chompPrefix toupper toupperInPlace ?? If this can't be done for backwards compatibility maybe you can make alias for the previous ones. Also: stripl stripr strip Strips *l*eading and *t*railing whitespaces... It took me some time to notice that it was strip*r* (for right), but the comment says trailing, and I never think of remove right space, always remove trailing spaces (like in the comment!). So why not name that function stript?
Re: eliminate junk from std.string?
On 01/11/2011 04:34 PM, Ary Borenszweig wrote: Oh, one more thing: can the names be consistent? inpattern countChars expandtabs chompPrefix toupper toupperInPlace ?? If this can't be done for backwards compatibility maybe you can make alias for the previous ones. Also: stripl stripr strip Strips *l*eading and *t*railing whitespaces... stripLeft, stripRight Anyway, the necessity for super-cryptic abbreviated names doesn't exist any more. Maybe, they are justified for very frequently used stuff but stripl/stripr is not the case. It took me some time to notice that it was strip*r* (for right), but the comment says trailing, and I never think of remove right space, always remove trailing spaces (like in the comment!). So why not name that function stript?
Re: eliminate junk from std.string?
Yes, what I meant was that the names are stripl and stripr yet the descriptions of those functions are strip leading and strip trailing... at least put strip left and strip right in the description so it matches the names.
Re: eliminate junk from std.string?
On 01/11/2011 05:36 PM, Ary Borenszweig wrote: Yes, what I meant was that the names are stripl and stripr yet the descriptions of those functions are strip leading and strip trailing... at least put strip left and strip right in the description so it matches the names. Sorry for misunderstanding. I don't think that the description needs to match the names literally. However, I would avoid trailing and leading, because in RTL environments they can have the opposite meaning.
Re: eliminate junk from std.string?
On 1/11/11 6:29 AM, Ary Borenszweig wrote: Hi Andrei, It looks nice. Just a small comment: in many of your comments you use words that not all of us might know. For instance: sans. I happen to know it because I studied French, but otherwise I wouldn't know that. I just showed that phrase to a colleague here in Argentina and he didn't understand it. He thought it maybe meant since. Maybe sans and in lieu are memes there in the USA, but not everywhere. So please, stick with English. :-) Okay. I think sans is Walter's... Andrei
Re: eliminate junk from std.string?
On Tue, 11 Jan 2011 11:39:11 -0500, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: On 1/11/11 6:29 AM, Ary Borenszweig wrote: Hi Andrei, It looks nice. Just a small comment: in many of your comments you use words that not all of us might know. For instance: sans. I happen to know it because I studied French, but otherwise I wouldn't know that. I just showed that phrase to a colleague here in Argentina and he didn't understand it. He thought it maybe meant since. Maybe sans and in lieu are memes there in the USA, but not everywhere. So please, stick with English. :-) Okay. I think sans is Walter's... sans is in the English dictionary: http://www.merriam-webster.com/dictionary/sans According to that reference, Shakespeare used it :) Don't think you can get more English than that... BTW, it would be impossible to phrase everything so that everyone who has their specific dialect of English would understand it, so I don't think there's much sense in worrying about it. That being said, using 'without' instead of 'sans' is probably fine. -Steve
Re: eliminate junk from std.string?
On 1/11/11 6:34 AM, Ary Borenszweig wrote: Oh, one more thing: can the names be consistent? inpattern countChars expandtabs chompPrefix toupper toupperInPlace ?? If this can't be done for backwards compatibility maybe you can make alias for the previous ones. The names are for compatibility with... other languages :o|. Also: stripl stripr strip Strips *l*eading and *t*railing whitespaces... It took me some time to notice that it was strip*r* (for right), but the comment says trailing, and I never think of remove right space, always remove trailing spaces (like in the comment!). So why not name that function stript? Same thing. These names are imported from other languages. Andrei
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
On 1/11/11 5:30 AM, Steven Schveighoffer wrote: While this makes it possible to write algorithms that only accept VLERanges, I don't think it solves the major problem with strings -- they are treated as arrays by the compiler. Except when they're not - foreach with dchar... I'd also rather see an indexing operation return the element type, and have a separate function to get the encoding unit. This makes more sense for generic code IMO. But that's neither here nor there. That would return the logical element at a physical position. I am very doubtful that much generic code could work without knowing they are in fact dealing with a variable-length encoding. I noticed you never commented on my proposed string type... That reminds me, I should update with suggested changes and re-post it. To be frank, I think it didn't mark a visible improvement. It solved some problems and brought others. There was disagreement over the offered primitives and their semantics. That being said, it's good you are doing this work. In the best case, you could bring a compelling abstraction to the table. In the worst, you'll become as happy about D's strings as I am :o). Andrei
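The "arrays, except when they're not" point can be seen in a few lines. This is a small illustration, with `countCodePoints` being a name chosen for the example: `.length` and indexing see UTF-8 code units, while `foreach` with a `dchar` loop variable makes the compiler decode code points.

```d
import std.stdio : writeln;

/// Counts elements the way `foreach (dchar ...)` sees them: the
/// compiler decodes the UTF-8 code units into code points.
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    string s = "h\u00E9llo";          // U+00E9 (é) takes two code units

    assert(s.length == 6);            // array view: code units
    assert(countCodePoints(s) == 5);  // foreach-with-dchar view: code points
    assert(s[1] == 0xC3);             // indexing yields a code unit, not é

    writeln(s.length, " code units, ", countCodePoints(s), " code points");
}
```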
Re: eliminate junk from std.string?
Andrei Alexandrescu seewebsiteforem...@erdani.org wrote in message news:igi18o$e5...@digitalmars.com... On 1/11/11 6:34 AM, Ary Borenszweig wrote: Oh, one more thing: can the names be consistent? inpattern countChars expandtabs chompPrefix toupper toupperInPlace ?? If this can't be done for backwards compatibility maybe you can make alias for the previous ones. The names are for compatibility with... other languages :o|. Would that other language be Walterish or C? If C, it's not like using the wrong case will suddenly change the semantics of the function. And if the worry is other non-phobos functions that might have the old C-style name (but different semantics), then Ary's suggestion of compatibly-named aliases would take care of that.
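The alias suggestion is cheap to express. The sketch below assumes a Phobos release in which std.string provides stripLeft and stripRight (at the time of this thread the functions were still named stripl and stripr, so the aliases would have pointed the other way). The alias declarations are hypothetical user-side code, not something Phobos ships.

```d
import std.stdio : writeln;
import std.string : stripLeft, stripRight;

// Hypothetical compatibility aliases: the old short names forward to the
// descriptive ones, so existing callers keep compiling unchanged.
alias stripl = stripLeft;
alias stripr = stripRight;

void main()
{
    assert(stripl("  text  ") == "text  ");
    assert(stripr("  text  ") == "  text");

    // Old and new names resolve to the same function.
    assert(stripl("  x") == stripLeft("  x"));
    writeln("aliases behave identically");
}
```

Because an alias introduces no wrapper function, there is no run-time cost and the semantics cannot drift between the two names.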
Re: eliminate junk from std.string?
Steven Schveighoffer schvei...@yahoo.com wrote in message news:op.vo5kspmfeav...@steve-laptop... On Tue, 11 Jan 2011 11:39:11 -0500, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: On 1/11/11 6:29 AM, Ary Borenszweig wrote: Hi Andrei, It looks nice. Just a small comment: in many of your comments you use words that not all of us might know. For instance: sans. I happen to know it because I studied French, but otherwise I wouldn't know that. I just showed that phrase to a colleague here in Argentina and he didn't understand it. He thought it maybe meant since. Maybe sans and in lieu are memes there in the USA, but not everywhere. So please, stick with English. :-) Okay. I think sans is Walter's... sans is in the English dictionary: http://www.merriam-webster.com/dictionary/sans According to that reference, Shakespeare used it :) Don't think you can get more English than that... Thoust words are true. Seriously though, I'm pretty sure a lot of native English speakers don't know sans either, unless they're familiar with font-related terminology. In lieu of is widely-known though, at least in the US.
Re: eliminate junk from std.string?
On 11.01.2011 19:07, Nick Sabalausky wrote: Thoust words are true. Seriously though, I'm pretty sure a lot of native English speakers don't know sans either, unless they're familiar with font-related terminology. In lieu of is widely-known though, at least in the US. I'm neither representative nor a native speaker (I'm German) and I knew sans, but didn't know In lieu of.
Re: About std.container.RedBlackTree
On 01/11/2011 02:22 PM, Steven Schveighoffer wrote: A tree is a kind of set, so instead of insert() I'd like a name like add(). (But maybe this is not standard in D). The function names must be consistent across containers, because the point is that complexity and semantic requirements are attached to the function name. The function names were decided long ago by Andrei, and I don't think insert is a bad name (I believe std::set and std::map in C++ STL uses insert). I have thought about this naming issue, precisely, for a while. add is bad because of connotation with addition. D does not use '+' as operator for putting new elements in a container: this is a very sensible choice imo. insert is bad because of in-between connotation: does not fit when putting an element at the end of a seq, even less for unordered containers. put instead seems to me the right term, obvious and general enough: one puts a new element in there. This can nicely adapt to very diverse container types such as sequences including stacks (no explicit index -- put at end), sets/AAs, trees,... Denis _ vita es estrany spir.wikidot.com
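For readers following the RedBlackTree discussion, here is a short usage sketch. It assumes a current Phobos, in which RedBlackTree is a class living in std.container.rbtree, a redBlackTree() helper exists, and the tree exposes length; none of that held for the release discussed in this thread, so treat it as an illustration of the requested features and not as the API of the time.

```d
import std.algorithm.comparison : equal;
import std.container.rbtree : RedBlackTree, redBlackTree;
import std.stdio : writeln;

void main()
{
    // Helper function with type inference: no need to spell out !int.
    auto t = redBlackTree(3, 1, 2);
    assert(equal(t[], [1, 2, 3]));    // iteration is in sorted order

    // An empty tree that is filled later.
    auto e = new RedBlackTree!int();
    e.insert(5);
    e.insert(4);
    e.insert(5);                      // duplicates are ignored by default
    assert(e.length == 2);
    assert(4 in e);
    assert(equal(e[], [4, 5]));

    writeln(t[], " ", e[]);
}
```

The method is still called insert here, in line with the argument that function names stay consistent across containers.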
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
On 01/11/2011 05:36 PM, Andrei Alexandrescu wrote: On 1/11/11 4:41 AM, Michel Fortin wrote: On 2011-01-10 22:57:36 -0500, Andrei Alexandrescu seewebsiteforem...@erdani.org said: In addition to these (and connecting the two), a VLERange would offer two additional primitives: 1. size_t stepSize(size_t offset) gives the length of the step needed to skip to the next element. 2. size_t backstepSize(size_t offset) gives the size of the _backward_ step that goes to the previous element. I like the idea, but I'm not sure about this interface. What's the result of stepSize if your range must create two elements from one underlying unit? Perhaps in those cases the element type could be an array (to return more than one element from one iteration). For instance, say we have a conversion range taking a Unicode string and converting it to ISO Latin 1. The best (lossy) conversion for œ is oe (one character to two characters); in this case 'front' could simply return oe (two characters) in one iteration, with stepSize being the size of the œ code point. In the same conversion process, encountering e followed by a combining ´ would return pre-combined character é (two characters to one character). In the design as I thought of it, the effective length of one logical element is one or more representation units. My understanding is that you are referring to a fractional number of representation units for one logical element. I think Michel is right. If I understand correctly, VLERange addresses the low-level and rather simple issue of each codepoint being encoded as a variable number of code units. Right? If yes, then what is the advantage of VLERange? D already has string/wstring/dstring, allowing one to work with the most advantageous encoding according to given source data, and dstring abstracting from low-level encoding issues. 
The main (and massively ignored) issue when manipulating unicode text is rather that, unlike with legacy character sets, one codepoint does *not* represent a character in the common sense. In character sets like latin-1:
* each code represents a character, in the common sense (eg à)
* each character representation has the same size (1 or 2 bytes)
* each character has a single representation (à -- always 0xe0)
All of this is wrong with unicode. And these are complicated and high-level issues, that appear _after_ decoding, on codepoint sequences. If VLERange is helpful in dealing with those problems, then I don't understand your presentation, sorry. Do you for instance mean such a range would, under the hood, group together codes belonging to the same character (thus making indexing meaningful), and/or normalise (decomp order) (thus allowing one to compare/find/count correctly)? denis _ vita es estrany spir.wikidot.com
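The distinction drawn here, between code points and characters as the reader perceives them, can be demonstrated with facilities found in a current std.uni (byGrapheme and normalize). These postdate the thread and are not Denis's library; the sketch only shows that the issues appear after decoding. `graphemeCount` is a name chosen for the example.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni;

/// Number of user-perceived characters (grapheme clusters) in `s`.
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    string composed   = "\u00E9";    // é as a single code point
    string decomposed = "e\u0301";   // e followed by a combining acute accent

    // One character for the reader, two different code point sequences.
    assert(composed != decomposed);
    assert(composed.walkLength == 1);    // code points
    assert(decomposed.walkLength == 2);

    // Grouping code points into graphemes makes counting meaningful.
    assert(graphemeCount(composed) == 1);
    assert(graphemeCount(decomposed) == 1);

    // Normalizing to a common form makes comparison meaningful.
    assert(normalize!NFC(decomposed) == composed);

    writeln("grapheme and normalization checks passed");
}
```

Neither step is something a stepSize/backstepSize interface over code units provides by itself, which is the gap the post is pointing at.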
Re: eliminate junk from std.string?
On 01/11/2011 04:11 PM, Max Samukha wrote: Anyway, the necessity for super-cryptic abbreviated names doesn't exist any more. Maybe, they are justified for very frequently used stuff but stripl/stripr is not the case. +++ Standard names should all be as obvious as possible. Then, everyone is free to alias stripLeft stripRight to sl sr ;-) But standard lib should be super clear code; show the right example of what clarity means --not the opposite! And I ask again: what to do with all inherited junk breaking naming rules like uint, size_t, malloc...? Denis _ vita es estrany spir.wikidot.com
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
On 01/11/2011 02:30 PM, Steven Schveighoffer wrote: On Mon, 10 Jan 2011 22:57:36 -0500, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: I've been thinking on how to better deal with Unicode strings. Currently strings are formally bidirectional ranges with a surreptitious random access interface. The random access interface accesses the support of the string, which is understood to hold data in a variable-encoded format. For as long as the programmer understands this relationship, code for string manipulation can be written with relative ease. However, there is still room for writing wrong code that looks legit. Sometimes the best way to tackle a hairy reality is to invite it to the negotiation table and offer it promotion to first-class abstraction status. Along that vein I was thinking of defining a new range: VLERange, i.e. Variable Length Encoding Range. Such a range would have the power somewhere in between bidirectional and random access. The primitives offered would include empty, access to front and back, popFront and popBack (just like BidirectionalRange), and in addition properties typical of random access ranges: indexing, slicing, and length. Note that the result of the indexing operator is not the same as the element type of the range, as it only represents the unit of encoding. In addition to these (and connecting the two), a VLERange would offer two additional primitives: 1. size_t stepSize(size_t offset) gives the length of the step needed to skip to the next element. 2. size_t backstepSize(size_t offset) gives the size of the _backward_ step that goes to the previous element. In both cases, offset is assumed to be at the beginning of a logical element of the range. I suspect that a lot of functions in std.string can be written without Unicode-specific knowledge just by relying on such an interface. Moreover, algorithms can be generalized to other structures that use variable-length encoding, such as those used in data compression. 
(In that case, the support would be a bit array and the encoded type would be ubyte.) Writing to such ranges is not addressed by this design. Ideas are welcome. Adding VLERange would legitimize strings and would clarify their handling, at the cost of adding one additional concept that needs to be minded. Is the trade-off worthwhile? While this makes it possible to write algorithms that only accept VLERanges, I don't think it solves the major problem with strings -- they are treated as arrays by the compiler. I'd also rather see an indexing operation return the element type, and have a separate function to get the encoding unit. This makes more sense for generic code IMO. I noticed you never commented on my proposed string type... That reminds me, I should update with suggested changes and re-post it. People interested in solving the general problem with Unicode strings may have a look at https://bitbucket.org/denispir/denispir-d. All constructive feedback welcome. (This will be asked for review in a short while. The main / client interface module is Text.d. A (long) presentation of the issues, reasons, solution can be found in the text called U missing level of abstraction) Denis _ vita es estrany spir.wikidot.com
Re: eliminate junk from std.string?
Daniel Gibson metalcae...@gmail.com wrote in message news:igi6n5$27p...@digitalmars.com... On 11.01.2011 19:07, Nick Sabalausky wrote: Thoust words are true. Seriously though, I'm pretty sure a lot of native English speakers don't know sans either, unless they're familiar with font-related terminology. In lieu of is widely-known though, at least in the US. I'm neither representative nor a native speaker (I'm German) and I knew sans, but didn't know In lieu of. I guess that just goes to show, we should all just switch to Esperanto ;)
Re: eliminate junk from std.string?
Max Samukha spam...@d-coding.com wrote in message news:ighvca$ap...@digitalmars.com... On 01/11/2011 05:36 PM, Ary Borenszweig wrote: Yes, what I meant was that the names are stripl and stripr yet the descriptions of those functions are strip leading and strip trailing... at least put strip left and strip right in the description so it matches the names. Sorry for misunderstanding. I don't think that the description needs to match the names literally. However, I would avoid trailing and leading, because in RTL environments they can have the opposite meaning. I would have thought RTL languages got stored as RTL. If so, then leading and trailing would be correct and left/right would be wrong (unless the internal behavior of stripl and stripr takes language-direction into account, which would surprise me).
Re: eliminate junk from std.string?
On 01/11/2011 07:14 PM, Nick Sabalausky wrote: Daniel Gibsonmetalcae...@gmail.com wrote in message news:igi6n5$27p...@digitalmars.com... Am 11.01.2011 19:07, schrieb Nick Sabalausky: Thoust words are true. Seriously though, I'm pretty sure a lot of native english speakers don't know sans either, unless they're familiar with font-related terminology. In lieu of is widely-known though, at least in the US. I'm neither representative nor a native speaker (I'm german) and I knew sans, but didn't know In lieu of. I guess that just goes to show, we should all just switch to Esperanto ;) No, esperanto is just a heap of language-design errors! Denis _ vita es estrany spir.wikidot.com
Re: eliminate junk from std.string?
On 12/01/11 05:07, Nick Sabalausky wrote: Steven Schveighofferschvei...@yahoo.com wrote in message news:op.vo5kspmfeav...@steve-laptop... On Tue, 11 Jan 2011 11:39:11 -0500, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: On 1/11/11 6:29 AM, Ary Borenszweig wrote: Hi Andrei, It looks nice. Just a small comment: in many of your comments you use words that not all of us might now. For instance: sans. I happen to know it because I studied French, but otherwise I wouldn't know that. I just showed that phrase to a colleague here in Argentina and he didn't understand it. He thought it maybe meant since. Maybe sans and in lieu are memes there in the USA, but not everywhere. So please, stick with English. :-) Okay. I think sans is Walter's... sans is in the english dictionary: http://www.merriam-webster.com/dictionary/sans According to that reference, Shakespeare used it :) Don't think you can get more English than that... Thoust words are true. As an aside you might find some amusement in The Shakespeare Programming Language http://shakespearelang.sourceforge.net/report/shakespeare/
Re: eliminate junk from std.string?
On 01/11/2011 07:01 PM, Nick Sabalausky wrote: The names are for compatibility with... other languages :o|. Would that other language be Walterish or C? If C, it's not like using the wrong case will suddenly change the semantics of the function. And if the worry is other non-Phobos functions that might have the old C-style name (but different semantics), then Ary's suggestion of compatibly-named aliases would take care of that. Agreed, Ary's suggestion makes a lot of sense. Anyway, when shall we finally get rid of half-a-century-old naming issues? In the XXIInd century? Denis _ vita es estrany spir.wikidot.com
Re: eliminate junk from std.string?
Nick Sabalausky wrote: Andrei Alexandrescu seewebsiteforem...@erdani.org wrote in message news:igi18o$e5...@digitalmars.com... On 1/11/11 6:34 AM, Ary Borenszweig wrote: Oh, one more thing: can the names be consistent? inpattern countChars expandtabs chompPrefix toupper toupperInPlace ?? If this can't be done for backwards compatibility maybe you can make alias for the previous ones. The names are for compatibility with... other languages :o|. Would that other language be Walterish or C? The names generally come from Python, Ruby and Javascript.
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
On 1/11/11 9:09 AM, spir wrote: On 01/11/2011 05:36 PM, Andrei Alexandrescu wrote: On 1/11/11 4:41 AM, Michel Fortin wrote: On 2011-01-10 22:57:36 -0500, Andrei Alexandrescu seewebsiteforem...@erdani.org said: In addition to these (and connecting the two), a VLERange would offer two additional primitives: 1. size_t stepSize(size_t offset) gives the length of the step needed to skip to the next element. 2. size_t backstepSize(size_t offset) gives the size of the _backward_ step that goes to the previous element. I like the idea, but I'm not sure about this interface. What's the result of stepSize if your range must create two elements from one underlying unit? Perhaps in those cases the element type could be an array (to return more than one element from one iteration). For instance, say we have a conversion range taking a Unicode string and converting it to ISO Latin 1. The best (lossy) conversion for œ is oe (one character to two characters), in this case 'front' could simply return oe (two characters) in one iteration, with stepSize being the size of the œ code point. In the same conversion process, encountering e followed by a combining ´ would return pre-combined character é (two characters to one character). In the design as I thought of it, the effective length of one logical element is one or more representation units. My understanding is that you are referring to a fractional number of representation units for one logical element. I think Michel is right. If I understand correctly, VLERange addresses the low-level and rather simple issue of each codepoint being encoded as a variable number of code units. Right? If yes, then what is the advantage of VLERange? D already has string/wstring/dstring, allowing one to work with the most advantageous encoding according to given source data, and dstring abstracting from low-level encoding issues. It's not about the data, it's about algorithms. 
Currently there are algorithms that ostensibly work for bidirectional ranges, but internally cheat by detecting that the input is actually a string, and use that knowledge for better implementations. The benefit of VLERange would be that it legitimizes those algorithms. I wouldn't be surprised if an entire class of algorithms would in fact require VLERange (e.g. many of those that we commonly consider today string algorithms). The main (and massively ignored) issue when manipulating unicode text is rather that, unlike with legacy character sets, one codepoint does *not* represent a character in the common sense. In character sets like latin-1: * each code represents a character, in the common sense (eg à) * each character representation has the same size (1 or 2 bytes) * each character has a single representation (à -- always 0xe0) All of this is wrong with unicode. And these are complicated and high-level issues, that appear _after_ decoding, on codepoint sequences. If VLERange is helpful in dealing with those problems, then I don't understand your presentation, sorry. Do you for instance mean such a range would, under the hood, group together codes belonging to the same character (thus making indexing meaningful), and/or normalise (decomp order) (thus allowing to comp/find/count correctly).? VLERange would offer automatic decoding in front, back, popFront, and popBack - just like BidirectionalRange does right now. It would also offer access to the representational support by means of indexing - also like char[] et al already do now. The difference is that VLERange being a formal concept, algorithms can specialize on it instead of (a) specializing for UTF strings or (b) specializing for BidirectionalRange and then manually detecting isSomeString inside. Conversely, when defining an algorithm you can specify VLERange as a requirement. Boyer-Moore is a perfect example - it doesn't work on bidirectional ranges, but it does work on VLERange. 
I suspect there are many like it. Of course, it would help a lot if we figured other remarkable VLERanges. Here are a few that come to mind: * Multibyte encodings other than UTF. Currently we have no special support for those beyond e.g. forward or bidirectional ranges. * Huffman, RLE, LZ encoded buffers (and many other compressed formats) * Vocabulary-based translation systems, e.g. associate each word with a number. * Others...? Some of these are forward-only (don't allow bidirectional access). Once we have a number of examples, it would be great to figure a number of remarkable algorithms operating on them. Andrei
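The two primitives proposed in this message can be sketched for UTF-8 text as follows. This is only an illustration under assumptions: the type name Utf8View is hypothetical, VLERange itself is a proposal rather than an existing Phobos concept, and the sketch simply forwards to std.utf.stride and std.utf.strideBack.

```d
import std.utf : stride, strideBack;

/// Hypothetical VLERange-style view over UTF-8 text.
/// Only the two proposed primitives are shown; the name is an assumption.
struct Utf8View
{
    string data;

    /// Number of code units in the element that starts at `offset`.
    size_t stepSize(size_t offset)
    {
        return stride(data, offset);
    }

    /// Number of code units in the element that ends just before `offset`.
    size_t backstepSize(size_t offset)
    {
        return strideBack(data, offset);
    }
}

void main()
{
    // 'a' (1 code unit), U+00E9 (2 code units), U+20AC (3 code units)
    auto v = Utf8View("a\u00E9\u20AC");
    assert(v.data.length == 6);

    assert(v.stepSize(0) == 1);
    assert(v.stepSize(1) == 2);
    assert(v.stepSize(3) == 3);

    assert(v.backstepSize(6) == 3);
    assert(v.backstepSize(3) == 2);
    assert(v.backstepSize(1) == 1);
}
```

Indexing data directly still yields code units, which matches the "access to the representational support by means of indexing" described above.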
Re: eliminate junk from std.string?
spir denis.s...@gmail.com wrote in message news:mailman.550.1294771968.4748.digitalmar...@puremagic.com... On 01/11/2011 07:14 PM, Nick Sabalausky wrote: Daniel Gibsonmetalcae...@gmail.com wrote in message news:igi6n5$27p...@digitalmars.com... Am 11.01.2011 19:07, schrieb Nick Sabalausky: Thoust words are true. Seriously though, I'm pretty sure a lot of native english speakers don't know sans either, unless they're familiar with font-related terminology. In lieu of is widely-known though, at least in the US. I'm neither representative nor a native speaker (I'm german) and I knew sans, but didn't know In lieu of. I guess that just goes to show, we should all just switch to Esperanto ;) No, esperanto is just a heap of language-design errors! And that differs from English, how? ;)
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
On 2011-01-11 11:36:54 -0500, Andrei Alexandrescu seewebsiteforem...@erdani.org said: On 1/11/11 4:41 AM, Michel Fortin wrote: For instance, say we have a conversion range taking a Unicode string and converting it to ISO Latin 1. The best (lossy) conversion for œ is oe (one character to two characters), in this case 'front' could simply return oe (two characters) in one iteration, with stepSize being the size of the œ code point. In the same conversion process, encountering e followed by a combining ´ would return pre-combined character é (two characters to one character). In the design as I thought of it, the effective length of one logical element is one or more representation units. My understanding is that you are referring to a fractional number of representation units for one logical element. Your understanding is correct. I think both cases (one becomes many, many becomes one) are important and must be supported. Your proposal only deals with the many-becomes-one case. I proposed returning arrays so we can deal with the one-becomes-many case (œ becoming oe). Another idea would be to introduce substeps. When checking for the next character, in addition to determining its step length you could also determine the number of substeps in it. œ would have two substeps, o and e, and when there is no longer any substep you move to the next step. All this said, I think this should stay an implementation detail as this would allow a variety of strategies. Also, keeping this an implementation detail means that your proposed 'stepSize' and 'backstepSize' need to be an implementation detail too (because they won't make sense for the one-to-many case). So they can't really be part of a standard VLE interface. As far as I know, all we really need to expose to algorithms is whether a range has elements of variable length, because this has an impact on your indexing capabilities. The rest seems unnecessary to me, or am I missing some use cases? 
-- Michel Fortin michel.for...@michelf.com http://michelf.com/
Re: eliminate junk from std.string?
Why care where they come from? Why not make them intuitive? Say, like, Always camel case?
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
On Tue, 11 Jan 2011 11:54:08 -0500, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: On 1/11/11 5:30 AM, Steven Schveighoffer wrote: While this makes it possible to write algorithms that only accept VLERanges, I don't think it solves the major problem with strings -- they are treated as arrays by the compiler. Except when they're not - foreach with dchar... This solitary difference is a very thin argument -- foreach(d; byDchar(str)) would be just as good without requiring compiler help. I'd also rather see an indexing operation return the element type, and have a separate function to get the encoding unit. This makes more sense for generic code IMO. But that's neither here nor there. That would return the logical element at a physical position. I am very doubtful that much generic code could work without knowing it is in fact dealing with a variable-length encoding. It depends on the function, and the way the indexing is implemented. I noticed you never commented on my proposed string type... That reminds me, I should update with suggested changes and re-post it. To be frank, I think it didn't mark a visible improvement. It solved some problems and brought others. There was disagreement over the offered primitives and their semantics. It is supposed to be simple, and provide the expected interface, without causing any undue performance degradation. That is, I should be able to do all the things with a replacement string type that I can with a char array today, as efficiently as I can today, except I should have to work to get at the code-units. The huge benefit is that I can say I'm dealing with this as an array when I know it's safe. The disagreement will never be fully solved, as there is just as much disagreement about the current state of affairs ;) e.g. should foreach default to using dchar? That being said, it's good you are doing this work. In the best case, you could bring a compelling abstraction to the table. 
In the worst, you'll become as happy about D's strings as I am :o). I don't think I'll ever be 'happy' with the way strings sit in phobos currently. I typically deal in ASCII (i.e. code units), and phobos works very hard to prevent that. -Steve
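The foreach(d; byDchar(str)) idea mentioned in this message can be tried out roughly as below. This is a sketch that assumes a present-day Phobos in which std.utf.byDchar is available; it was not necessarily available when the message was written. It only contrasts code-unit access by indexing with code-point access through the range.

```d
import std.range : walkLength;
import std.utf : byDchar;

void main()
{
    string s = "h\u00E9llo";   // U+00E9 takes two UTF-8 code units

    // Indexing and .length work on code units.
    assert(s.length == 6);
    assert(s[1] == 0xC3);      // first code unit of U+00E9

    // byDchar decodes lazily, one code point per element,
    // without special compiler support for strings.
    assert(s.byDchar.walkLength == 5);

    dchar[] decoded;
    foreach (d; s.byDchar)
        decoded ~= d;
    assert(decoded.length == 5);
    assert(decoded[1] == '\u00E9');
}
```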
Re: eliminate junk from std.string?
Nick Sabalausky wrote: Andrej Mitrovic andrej.mitrov...@gmail.com wrote in message news:mailman.543.1294713068.4748.digitalmar...@puremagic.com... Speaking of regex.. I see there are two enums in std.regex, email and url, which are regular expressions. Why not collect more of these common regexes? And we could pack them up in a struct to avoid polluting the local namespace. I think this might encourage the use of std.regex, since the average Joe wouldn't have to reach for the regex book whenever he's processing strings. E.g.: foreach(m; match("10abc20def30", regex(patterns.number))) // std.regex.patterns.number { writefln("%s[%s]%s", m.pre, m.hit, m.post); } Just a passing thought.. I think that's a great idea. I agree.
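A minimal sketch of the "pack them up in a struct" suggestion follows. The struct patterns and its member number are hypothetical names taken from the example in the message; they are not part of std.regex. The sketch uses matchAll from current std.regex to visit every match.

```d
import std.regex : matchAll, regex;

/// Hypothetical namespace-like container for common patterns.
/// Not an actual Phobos symbol; the name follows the example above.
struct patterns
{
    enum number = `\d+`;   // one or more decimal digits
}

void main()
{
    string[] hits;
    foreach (m; matchAll("10abc20def30", regex(patterns.number)))
        hits ~= m.hit;     // m.pre and m.post hold the surrounding text

    assert(hits == ["10", "20", "30"]);
}
```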
Re: either
On Jan 11, 11 17:10, Justin Johansson wrote: On 10/01/11 05:42, Andrei Alexandrescu wrote: I wrote a simple helper, in spirit with some recent discussions: // either struct Either(Ts...) { Tuple!Ts data_; bool opEquals(E)(E e) { foreach (i, T; Ts) { if (data_[i] == e) return true; } return false; } } auto either(Ts...)(Ts args) { return Either!Ts(tuple(args)); } unittest { assert(1 == either(1, 2, 3)); assert(4 != either(1, 2, 3)); assert("abac" != either("aasd", "s")); assert("abac" == either("aasd", "abac", "s")); } Turns out this is very useful in a variety of algorithms. I just don't know where in std this helper belongs! Any ideas? Despite that it may be very useful as you say, personally I think it is a fundamental no-no to overload the meaning of == in any manner that does not preserve the generally accepted semantics of equality which include the notions of reflexivity, symmetry and transitivity**. **See http://en.wikipedia.org/wiki/Equality_%28mathematics%29 The symmetric and transitive properties of the equality relation imply that if (a == c) is true and if (b == c) is true then (a == b) is also true. In this case the semantics of the overloaded == operator have the expressions 1 == either(1, 2, 3) and 2 == either(1, 2, 3) both evaluating to true and by implication/expectation (1 == 2). Clearly though, (1 == 2) evaluates to false in terms of the commonly accepted meaning of equality. Just my 2 cents and I wonder if there is some other way of achieving the desired functionality of your helper without resorting to overloading == and the consequential violation of the commonly held semantics of equality. Cheers, Justin Johansson We could use in instead of == if (1 in oneOf(1, 2, 3)) { ... } if (4 !in oneOf(1, 2, 3)) { ... }
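The in-based alternative suggested at the end of this message might look like the sketch below. It reuses the structure of the Either helper quoted above and only swaps opEquals for opBinaryRight with "in". The names OneOf and oneOf come from the suggestion, but the implementation is an assumption, not code posted in the thread.

```d
import std.typecons : Tuple, tuple;

/// Sketch of the suggested helper: membership is expressed with `in`,
/// so the meaning of == is left untouched.
struct OneOf(Ts...)
{
    Tuple!Ts data_;

    /// Implements `e in oneOf(...)`; `e !in x` is rewritten to `!(e in x)`.
    bool opBinaryRight(string op, E)(E e)
        if (op == "in")
    {
        foreach (i, T; Ts)
        {
            if (data_[i] == e)
                return true;
        }
        return false;
    }
}

auto oneOf(Ts...)(Ts args)
{
    return OneOf!Ts(tuple(args));
}

void main()
{
    assert(1 in oneOf(1, 2, 3));
    assert(4 !in oneOf(1, 2, 3));
    assert("abac" !in oneOf("aasd", "s"));
    assert("abac" in oneOf("aasd", "abac", "s"));
}
```

Unlike built-in associative arrays, this in returns a plain bool rather than a pointer, which is enough for use in conditions.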
Re: eliminate junk from std.string?
Adam Ruppe wrote: I don't know about bearophile, but I used a lot of the functions you are talking about removing in my HTML - Plain Text conversion function used for emails and other similar environments. squeeze the whitespace, align text, wrap for the target, etc. As has been pointed out, a lot of these seemingly odd functions come from Python/Ruby/Javascript. Users of those languages will be familiar with them, and they've proven themselves handy in those languages. Let's not be cavalier about dumping them just because they aren't familiar to C programmers.
Re: eliminate junk from std.string?
Ary Borenszweig wrote: Why care where they come from? Why not make them intuitive? Say, like, Always camel case? Because people are used to those names due to their wide use. It's the same reason that we still use Qwerty keyboards.
Re: DVCS (was Re: Moving to D)
retard wrote: Ubuntu has a menu entry for restricted drivers. It provides support for both ATI/AMD (Radeon 8500 or better, appeared in 1998 or 1999!) and NVIDIA cards (Geforce 256 or better, appeared in 1999!) and I think it automatically suggests (a pop-up window) correct drivers in the latest releases right after the first install. Intel chips are automatically supported by the open source drivers. VIA and S3 may or may not work out of the box. I'm just a bit curious to know what GPU you have? If it's some ancient VLB (vesa local bus) or ISA card, I can donate $15 for buying one that uses AGP or PCI Express. Ubuntu doesn't support all video formats out of the box, but the media players and browsers automatically suggest loading missing drivers. At least in the 3 or 4 latest releases. Maybe the problem isn't the encoder, it might be the Linux incompatible web site. My mobo is an ASUS M2A-VM. No graphics cards, or any other cards plugged into it. It's hardly weird or wacky or old (it was new at the time I bought it to install Ubuntu). My display is 1920 x 1200. That just seems to cause grief for Ubuntu. Windows has no issues at all with it. Or you could download the latest version from meld's website and compile it yourself. Yeah, I could spend an afternoon doing that. Another one of these jokes? Probably one of the best compiler authors in the whole world uses a whole afternoon doing something (compiling a program) On the other hand, I regularly get emails from people with 10 years of coding experience who are flummoxed by a symbol not defined message from the linker. :-) that total Linux noobs do in less than 30 minutes with the help of Google search. Yeah, I've spent a lot of time googling for solutions to problems with Linux. You know what? I get pages of results from support forums - every solution is different and comes with statements like seems to work, doesn't work for me, etc. 
The advice is clearly from people who do not know what they are doing, and randomly stab at things, and these are the first page of google results.
Re: eliminate junk from std.string?
Am 11.01.2011 20:42, schrieb Walter Bright: Ary Borenszweig wrote: Why care where they come from? Why not make them intuitive? Say, like, Always camel case? Because people are used to those names due to their wide use. It's the same reason that we still use Qwerty keyboards. And C++ :-P
Re: DVCS (was Re: Moving to D)
retard wrote: One thing came to my mind. Unless you're using Ubuntu 8.04 LTS, I'm using 8.10, and I've noticed that no more updates are coming. your Ubuntu version isn't supported anymore. They might have already removed the package repositories for unsupported versions and that might indeed lead to problems with graphics and video players as you said. What annoyed the heck out of me was the earlier (7.xx) version of Ubuntu *did* work. The support for desktop 8.04 and 9.10 is also nearing its end (April this year). I'd recommend backing up your /home and installing 10.04 LTS or 10.10 instead. Yeah, I know I'll be forced to upgrade soon. One thing that'll make it easier is I abandoned using Ubuntu for multimedia. For example, to play Pandora I now just plug my iPod into my stereo <g>. I just stopped using YouTube on Ubuntu, as I got tired of the video randomly going black, freezing, etc.
Re: eliminate junk from std.string?
Agreed. So what's wrong with improving things and leaving old things as aliases?
Re: eliminate junk from std.string?
Welcome to D. Do you program in C, Javascript, Python or Ruby? Cool! Then you will feel at home. That phrase currently ends like this: You don't? Oh, sorry, you will have to learn that some names are all lowercase, some not. But it could end like this: You don't? Don't worry. D has the convention of writing all function names with X convention, but we keep some aliases for things that we want to keep backwards compatibility for.
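The "one convention plus compatibility aliases" approach described here could be sketched as follows. The alias names and deprecation messages are illustrative assumptions, and std.uni.toLower and std.uni.toUpper stand in for whichever camel-case names a library settles on.

```d
import std.uni : toLower, toUpper;

// Hypothetical compatibility layer: the camel-case names are the real
// functions, the all-lowercase spellings are kept only as deprecated aliases.
deprecated("use toLower instead") alias tolower = toLower;
deprecated("use toUpper instead") alias toupper = toUpper;

void main()
{
    // Preferred spelling.
    assert(toLower("Hello") == "hello");
    assert(toUpper("Hello") == "HELLO");

    // Old spelling still compiles, but the compiler reports a deprecation.
    assert(tolower("Hello") == "hello");
    assert(toupper("Hello") == "HELLO");
}
```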
Re: either
On 12/01/11 06:28, KennyTM~ wrote: On Jan 11, 11 17:10, Justin Johansson wrote: On 10/01/11 05:42, Andrei Alexandrescu wrote: I wrote a simple helper, in spirit with some recent discussions: unittest { assert(1 == either(1, 2, 3)); assert(4 != either(1, 2, 3)); assert("abac" != either("aasd", "s")); assert("abac" == either("aasd", "abac", "s")); } Just my 2 cents and I wonder if there is some other way of achieving the desired functionality of your helper without resorting to overloading == and the consequential violation of the commonly held semantics of equality. We could use in instead of == if (1 in oneOf(1, 2, 3)) { ... } if (4 !in oneOf(1, 2, 3)) { ... } Nice suggestion. At the end of the day though it basically boils down to having either a binary operator** or a function for it. (** preferably excluding == and other undesirable operator overloads of course).
Re: eliminate junk from std.string?
On 01/11/2011 09:42 PM, Walter Bright wrote: Ary Borenszweig wrote: Why care where they come from? Why not make them intuitive? Say, like, Always camel case? Because people are used to those names due to their wide use. It's the same reason that we still use Qwerty keyboards. We should be careful in assuming what people are used to. Compare: D/Python/Lisp/... - strip .NET/Delphi/Java/Qt/Haskell/... - Trim/trim/trimmed stripl/stripr are TrimStart/TrimEnd in .NET
Re: eliminate junk from std.string?
On 01/11/2011 08:18 PM, Nick Sabalausky wrote: Max Samukha spam...@d-coding.com wrote in message news:ighvca$ap...@digitalmars.com... On 01/11/2011 05:36 PM, Ary Borenszweig wrote: Yes, what I meant was that the names are stripl and stripr yet the descriptions of those functions are strip leading and strip trailing... at least put strip left and strip right in the description so it matches the names. Sorry for misunderstanding. I don't think that the description needs to match the names literally. However, I would avoid trailing and leading, because in RTL environments they can have the opposite meaning. I would have thought RTL languages got stored as RTL. If so, then leading and trailing would be correct and left/right would be wrong (unless the internal behavior of stripl and stripr takes language-direction into account, which would surprise me). AFAIK, there is no universal standard on storing RTL text. There are recommendations to prefer logical order over visual order because visual order is extremely inflexible. I am not an expert in this field and have to shut up.
Re: std.unittests for (final?) review [Update]
Jonathan M Davis napisał: On Monday, January 10, 2011 13:48:50 Tomek Sowiński wrote: Jonathan M Davis napisał: I followed Andrei's suggestion and merged most of the functions into a highly flexible assertPred. I also renamed the functions as suggested and attempted to fully document everything with fully functional examples instead of examples using types or functions which don't actually exist. Did you zip the right file? I still see things like nameFunc and assertPlease. ??? Those are supposed to be there. All examples are tested in the unit tests exactly as they are. I just thought instead of examples using types or functions which don't actually exist meant well-known Phobos functions would be used. On the whole the examples are too long. It's just daunting I can't see docs for *one* function without scrolling. Please give them a solid hair-cut -- max 10 lines with a median of 5. The descriptions are also watered down by over-explanatory writing. Perhaps. If I cut down on the examples though, the usage wouldn't be as clear. The idea was to be thorough. Andrei wanted better examples, so I gave better examples. Not sure if longer means better. However, it is a bit of a balancing act, and I may have put too many in. It's debatable. Nick's suggestion of a main description before each individual overload would help with that. I agree. Perhaps a synopsis for the whole module like in std.variant would help too. So, now there's just assertThrown, assertNotThrown, collectExceptionMsg, and assertPred (though there are eight different overloads of assertPred). So, review away. Some suggestions: assertPred: Try putting expected in front; uniform call syntax can then set it apart from the operands: assertPred!"%"(7, 5, 2); // old 2.assertPred!"%"(7, 5); // new I really don't see any value to this. 1. You can't do that with assert, and assertPred is essentially supposed to be a fancy assert. 2. 
A number of assertPred overloads don't even have an expected, so it would be inconsistent. 3. People already are annoyed enough that the operator doesn't end up between the arguments. Putting the result on the left-hand side of the operator like that would make it that much more confusing. OK, I understand. assertNotThrown: chain the original exception with AssertError as its cause? Oh, this one badly needs a real-life example. I suppose that chaining it would be a good idea. I didn't think of that. But if you want examples, it's used in the unit tests in this very module, and I used it heavily in std.datetime. I meant a real-life example in documentation. People may often ask themselves how is it different than !assertThrown()?. assertThrown: I'd rather see generified collectException (call it collectThrown?). assertThrown may stay as a convenience wrapper, though. ??? I don't get what you're trying for here. assertThrown isn't trying to collect exceptions at all. It's testing whether the given exception was thrown like it's supposed to be for the given function call. If it was, then the assertion succeeded. If it wasn't, then an AssertError is thrown. Just like assert. I mean now collectException doesn't have a parametrized catch block like assertThrown does. If it did, the latter could come down to: void assertThrown(T : Throwable = Exception, F) (lazy F funcToCall, string msg = null, string file = __FILE__, size_t line = __LINE__) { T e = collectThrown!T(funcToCall); if (e is null) throw new AssertError(...); } Shortening assertThrown's implementation is a bonus, main gain is better collectThrown(). [there's more down] Looking at the code I'm seeing the same cancerous coding style std.datetime suffered from (to a lesser extent, I admit). 
For instance, this routine: if(result != expected) { if(msg.empty) { throw new AssertError(format(`assertPred!%s failed: [%s] %s [%s]: actual [%s], expected [%s].`, op, lhs, op, rhs, result, expected), file, line); } else { throw new AssertError(format(`assertPred!%s failed: [%s] %s [%s]: actual [%s], expected [%s]: %s`, op, lhs, op, rhs, result, expected, msg), file, line);
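The collectThrown proposal made earlier in this message (a collectException with a parametrized catch block, on top of which assertThrown becomes a thin wrapper) can be sketched roughly as follows. The name collectThrown is the hypothetical one from the proposal, the failure message text is an assumption, and the helper functions throwsException and throwsNothing exist only for the demonstration.

```d
import core.exception : AssertError;

/// Hypothetical helper from the proposal: evaluates the expression and
/// returns the thrown T, or null if nothing of type T was thrown.
T collectThrown(T : Throwable = Exception, E)(lazy E expression)
{
    try
    {
        expression();
    }
    catch (T t)
    {
        return t;
    }
    return null;
}

/// assertThrown expressed as a thin wrapper over collectThrown.
void assertThrown(T : Throwable = Exception, E)(lazy E expression,
    string msg = null, string file = __FILE__, size_t line = __LINE__)
{
    if (collectThrown!T(expression) is null)
    {
        throw new AssertError("assertThrown failed: no " ~ T.stringof
            ~ " was thrown" ~ (msg.length ? ": " ~ msg : "."), file, line);
    }
}

// Demonstration helpers (not part of the proposal).
void throwsException() { throw new Exception("boom"); }
void throwsNothing() {}

void main()
{
    assert(collectThrown(throwsException()).msg == "boom");
    assert(collectThrown(throwsNothing()) is null);

    assertThrown(throwsException());   // passes silently

    // A failing assertThrown surfaces as an AssertError.
    assert(collectThrown!AssertError(assertThrown(throwsNothing())) !is null);
}
```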
Re: eliminate junk from std.string?
Ary Borenszweig wrote: Agreed. So what's wrong with improving things and leaving old things as aliases? Clutter. One of the risks with Phobos development is it becoming a river miles wide, and only an inch deep. In other words, endless gobs of shallow, trite functions, with very little depth. (Aliases are as shallow as they get!) As a general rule, I don't want functionality in Phobos that takes more time for a user to find/read/understand the documentation on than to reimplement it himself. Those things give the illusion of comprehensiveness, but are just useless wankery. Do we really want a 1000 page reference manual on Phobos, but no database interface? No network interface? No D lexer? No disassembler? No superfast XML parser? No best-of-breed regex implementation? No CGI support? No HTML parsing? No sound support? No jpg reading? I worry by endless bikeshedding about perfecting the spelling of some name, we miss the whole show. I'd like to see more meat. For example, Don has recently added gamma functions to the math library. These are hard to implement correctly, and are perfect for inclusion.
Re: eliminate junk from std.string?
Walter Bright newshou...@digitalmars.com wrote in message news:igib2q$12g...@digitalmars.com... Adam Ruppe wrote: I don't know about bearophile, but I used a lot of the functions you are talking about removing in my HTML - Plain Text conversion function used for emails and other similar environments. squeeze the whitespace, align text, wrap for the target, etc. As has been pointed out, a lot of these seemingly odd functions come from Python/Ruby/Javascript. Users of those languages will be familiar with them, and they've proven themselves handy in those languages. Let's not be cavalier about dumping them just because they aren't familiar to C programmers. I agree with this reasoning for having them. However, I don't think it means we shouldn't D-ify or Phobos-ify them, at least as far as capitalization conventions.
Re: eliminate junk from std.string?
Nick Sabalausky wrote: I agree with this reasoning for having them. However, I don't think it means we shouldn't D-ify or Phobos-ify them, at least as far as capitalization conventions. I also object to rather pointlessly annoying people wanting to move their code from D1 to D2 by renaming everything. Endlessly renaming things searching for the perfect name gives the illusion of progress, whereas time would be better spent on improving the documentation, unittests, performance, etc. Naming of things isn't nearly as critical an issue in D as it is in, say, C, because of the excellent antihijacking support in D's module system. Some name changes have turned out to be a big win, like invariant = immutable. But I don't think that implies open season for wholesale renaming of swaths of functions.
Re: eliminate junk from std.string?
Ary Borenszweig wrote: Agreed. So what's wrong with improving things and leaving old things as aliases? I want to add that having multiple names for the same thing doesn't really do anyone any good.
Re: filling an array of structures
Brad wrote: Given an array of structures that you need to populate. Also assume the structure is quite large and has many elements to fill in. S s[]; while (something) { s.length += 1; auto sp = s[$-1]; // method 1 sp.a = 1; ... with (s[$-1]) { // method 2 a = 1; } ... foreach (ref sp; s[$-1..$]) { // method 3 sp.a = 1; } } I don't mind 'with' statements, but they have a readability and maintenance problem if their scope is large. The reader would have to be aware of the context of the structure and the local variables, whereas 'sp.a' is self documenting. method 3 is fine, and provides me with a reference to s[$-1], but I'd really like to have: auto sp = ref s[$-1]; // possible method 4 where sp is a reference, but no pointer arithmetic can be done on it. Another alternative would be runtime aliases. alias s[$-1] as sp; Or sp = with (s[$-1]); // I don't much like this syntax... In the meantime, I'll go with method 1. -- Brad I've been using a method in C++, which involves boost::shared_ptr boost::enable_from_shared boost::list_of That was useful when objects had both some required and some optional properties. Anyway... If polymorphism is not needed something similar can be achieved very simply in D: S[] esses = [ S(42), S(100).optional(3) ]; The whole code: import std.stdio; import std.string; struct S { int must_have_; int optional_; this(int must_have) { must_have_ = must_have; } ref S optional(int optional_arg) { optional_ = optional_arg; return this; } string toString() const { return format("%s.%s", must_have_, optional_); } } void main() { S[] esses = [ S(42), S(100).optional(3) ]; writeln(esses); } Ali
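One point about "method 1" in the quoted question may be worth illustrating: since structs are value types in D, auto sp = s[$-1] makes a copy, so writes through sp would not reach the array. A pointer to the element is one possible stand-in for the wished-for "method 4"; this is a sketch of that idea, not something proposed in the thread. It does allow pointer arithmetic in ordinary code, although @safe code rejects such arithmetic.

```d
struct S
{
    int a;
    int b;
}

void main()
{
    S[] s;

    s.length += 1;
    auto copy = s[$ - 1];        // structs are value types: this is a copy
    copy.a = 1;
    assert(s[$ - 1].a == 0);     // the array element is unchanged

    auto sp = &s[$ - 1];         // pointer to the element itself
    sp.a = 1;                    // '.' dereferences the pointer automatically
    sp.b = 2;
    assert(s[$ - 1] == S(1, 2)); // the element was filled in place
}
```

The pointer should not be kept across a later s.length += 1, because growing the array may move its elements.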
Re: eliminate junk from std.string?
On Tue, Jan 11, 2011 at 12:43:28PM -0800, Walter Bright wrote: Naming of things isn't nearly as critical an issue in D as it is in, say, C, because of the excellent antihijacking support in D's module system. And the spell checker will quickly point out messed up capitalization at compile time anyway.
Re: eliminate junk from std.string?
Walter Bright newshou...@digitalmars.com wrote in message news:igibu6$154...@digitalmars.com... Ary Borenszweig wrote: Why care where they come from? Why not make them intuitive? Say, like, Always camel case? Because people are used to those names due to their wide use. It's the same reason that we still use Qwerty keyboards. Then why switch languages at all? When you move to a different language you expect that language is going to have its own set of conventions. And even more than that, you also expect it to at least be internally-consistent, not a grab-bag of different styles. Are they really supposed to remember "Oh, oh, this func comes from this language, so it's capitalized this way, and that one comes from that language so it's capitalized that way..."? Not only that, but D has far, far bigger, more significant differences from Ruby/Python/JS/etc than the capitalization of a few functions. If people are going to come over and get used to *those* changes, then using toLower instead of tolower is going to be a downright triviality for them. Your cart is before your horse.
Re: eliminate junk from std.string?
Walter Bright newshou...@digitalmars.com wrote in message news:igifgt$1cu...@digitalmars.com... Nick Sabalausky wrote: I agree with this reasoning for having them. However, I don't think it means we shouldn't D-ify or Phobos-ify them, at least as far as capitalization conventions. I also object to rather pointlessly annoying people wanting to move their code from D1 to D2 by renaming everything. Endlessly renaming things searching for the perfect name gives the illusion of progress, whereas time would be better spent on improving the documentation, unittests, performance, etc. Naming of things isn't nearly as critical an issue in D as it is in, say, C, because of the excellent antihijacking support in D's module system. Some name changes have turned out to be a big win, like invariant = immutable. But I don't think that implies open season for wholesale renaming of swaths of functions. We're not asking for free-for-all bikeshedding, we're asking to get rid of the free-for-all naming-convention-carnival in the std lib. Just basic sensible consistency, that's all. And breaking compatibility with D1 for the sake of progress is the whole point of D2.
Re: DVCS (was Re: Moving to D)
On 1/11/11, Walter Bright newshou...@digitalmars.com wrote: Yeah, I've spent a lot of time googling for solutions to problems with Linux. You know what? I get pages of results from support forums - every solution is different and comes with statements like "seems to work", "doesn't work for me", etc. The advice is clearly from people who do not know what they are doing, and randomly stab at things, and these are the first page of google results. That's my biggest problem with Linux. Having technical problems is not the issue, finding the right solution in the sea of forum posts is the problem. When I have a problem with something breaking down on Windows, most of the time a single google search reveals the solution in one of the very first results (it's either on an MSDN page or one of the more popular forums). This probably has to do with the fact that regular users have either XP or Vista/7 installed. So there's really not much searching you have to do. Once someone posts a solution, that's the end of the story (more often than not). I remember a few years ago I got a copy of Ubuntu, and I wanted to disable antialiased fonts (they looked really bad on the screen). So I simply disabled antialiased fonts in one of the display property panels, and thought that would be the end of the story. But guess what? Firefox and other applications don't want to follow the OS settings, and they will override your settings and render websites with antialiased fonts. So now I had to search for half an hour to find a solution. I finally find a guide where the instructions are to edit the etc/fonts.conf file. So I do that. But antialiased fonts were still active. So I spend another 30 minutes looking for more information. Then I run into another website where the instructions are to delete a couple of fonts from the system. OK. I run the command in the terminal, I reset the system, but then on boot x-org crashes. 
So now I'm left with a blinking cursor on a black background, with no knowledge whatsoever of how to fix x-org or reset its settings. Instinctively I run help and I get back a list of 100 commands, but I can only read the last 20 and I've no idea how to scroll up to read more. So, hours wasted and a broken Linux system all because I wanted to disable antialiased fonts. But that's just one example. I have plenty more. GRUB failing to install properly, GRUB failing to detect all of my windows installations, and then there's that wubi which *does not* work. Of course there are numerous guides on how to fix wubi as well but those fail too. Bleh. I like open-source, Linux - the kernel might be awesome for all I know, but the distributions plain-simple *suck*.
Re: eliminate junk from std.string?
Andrei Alexandrescu Wrote: On 1/9/11 4:51 PM, Andrei Alexandrescu wrote: There's a lot of junk in std.string that should be gone. I'm trying to motivate myself to port some functions to different string widths and... it's not worth it. What functions do you think we should remove from std.string? Let's make a string and then send them the way of the dino. Thanks, Andrei I have uploaded a preview of the changed APIs here: http://d-programming-language.org/cutting-edge/phobos/std_string.html Unclear if iswhite() refers to ASCII whitespace or Unicode. If Unicode, which version of the standard? Same comment for icmp(). Also, in the Unicode standard, case folding can depend on the specific language. There is room for ascii-only functions, but unless a D version of ICU is going to be done separately, it would be nice to have full unicode-aware functions available. You've got chop() marked as deprecated. Is popBack() going to make sense as something that removes a variable number of chars from a string in the CR-LF case? That might be a bit too magical. Rather than zfill, what about modifying ljustify, rjustify, and center to take an optional fill character? One set of functions I'd like to see are startsWith() and endsWith(). I find them frequently useful in Java and an irritating lack in the C++ standard library. Jerry
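For readers wondering what the optional fill character suggested above could look like: as far as I know, present-day Phobos (not the 2011 preview under discussion) spells these functions leftJustify, rightJustify and center in std.string, each taking an optional fill character. A small sketch under that assumption, with rightJustify and '0' standing in for zfill:

```d
import std.stdio;
import std.string : center, leftJustify, rightJustify;

void main()
{
    // Right-justify with '0' as the fill character, which covers the zfill use case.
    writeln(rightJustify("42", 5, '0'));

    // The same optional parameter on the other two functions.
    writeln(leftJustify("ab", 5, '.'));
    writeln(center("ab", 6, '*'));
}
```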
Re: eliminate junk from std.string?
Jerry Quinn Wrote: One set of functions I'd like to see are startsWith() and endsWith(). I find them frequently useful in Java and an irritating lack in the C++ standard library. Just adding that these functions are useful because they're more efficient than doing a find and checking that the match is in the first position. Jerry
Re: DVCS (was Re: Moving to D)
On 11.01.2011 22:36, Walter Bright wrote: Andrej Mitrovic wrote: That's my biggest problem with Linux. Having technical problems is not the issue, finding the right solution in the sea of forum posts is the problem. The worst ones begin with "you might try this..." or "I think this might work, but YMMV..." How do these wind up being the top ranked results by google? Who embeds links to that stuff? My experience with Windows is, like yours, the opposite. The top ranked result will be correct and to the point. No weasel wording. Those results are often in big forums like ubuntuforums.org that get a lot of links etc, so even if one thread doesn't have many incoming links, it may still get a top ranking. Also my blog entries (hosted at wordpress.com) get on the google frontpage when looking for the specific topic, even though my blog is mostly unknown, has 2-20 visitors per day and almost no incoming links. Google's algorithms often do seem like voodoo ;) Also: Many problems (and their correct solutions) heavily depend on your system. What desktop environment is used, what additional stuff (dbus, hal, ...) is used, what are the versions of this stuff (and X.org), what distribution is used, ... There may be different default configurations shipped depending on what distribution (and what version of that distribution) you use, ... So there often is no single correct answer that will work for everyone. Still, in my experience those HOWTOs often work (it may help to look at multiple HOWTOs and compare them if you're not sure whether it applies to your system) or at least push you in the right direction. Cheers, - Daniel
Re: eliminate junk from std.string?
On 12.01.2011 0:47, Jerry Quinn wrote: Jerry Quinn Wrote: One set of functions I'd like to see are startsWith() and endsWith(). I find them frequently useful in Java and an irritating lack in the C++ standard library. Just adding that these functions are useful because they're more efficient than doing a find and checking that the match is in the first position. Jerry Those are present in std.algorithm and seem to work just fine. What's wrong with them? -- Dmitry Olshansky
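A short sketch of the std.algorithm functions mentioned above, next to the "find and check the position" approach Jerry contrasts them with. The helper hasPrefixViaFind is a made-up name for illustration; it has to scan the haystack until the first match, whereas startsWith only looks at the front.

```d
import std.algorithm : endsWith, find, startsWith;
import std.stdio;

// Hypothetical helper: prefix test done the roundabout way, via find.
bool hasPrefixViaFind(string haystack, string prefix)
{
    // find returns the haystack advanced to the first match (or an empty
    // range if there is none); the match sits at the front only if nothing
    // was skipped. The length guard covers the empty-haystack case.
    return prefix.length <= haystack.length
        && find(haystack, prefix).length == haystack.length;
}

void main()
{
    writeln("std.string".startsWith("std."));
    writeln("std.string".endsWith(".string"));
    writeln(hasPrefixViaFind("std.string", "std."));
}
```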
Re: std.unittests for (final?) review [Update]
On Tuesday, January 11, 2011 12:25:53 Tomek Sowiński wrote: Jonathan M Davis wrote: On Monday, January 10, 2011 13:48:50 Tomek Sowiński wrote: Jonathan M Davis wrote: I followed Andrei's suggestion and merged most of the functions into a highly flexible assertPred. I also renamed the functions as suggested and attempted to fully document everything with fully functional examples instead of examples using types or functions which don't actually exist. Did you zip the right file? I still see things like nameFunc and assertPlease. ??? Those are supposed to be there. All examples are tested in the unit tests exactly as they are. I just thought "instead of examples using types or functions which don't actually exist" meant well-known Phobos functions would be used. Well, that would be better, but at least when it comes to types, that doesn't work. Not only is Phobos generally lacking in types, but some of the examples which show what a typical error message from the functions would look like require incorrectly implemented types. I might be able to use existing functions for the examples using functions though. assertThrown: I'd rather see generified collectException (call it collectThrown?). assertThrown may stay as a convenience wrapper, though. ??? I don't get what you're trying for here. assertThrown isn't trying to collect exceptions at all. It's testing whether the given exception was thrown like it's supposed to be for the given function call. If it was, then the assertion succeeded. If it wasn't, then an AssertError is thrown. Just like assert. I mean now collectException doesn't have a parametrized catch block like assertThrown does. 
If it did, the latter could come down to: void assertThrown(T : Throwable = Exception, F) (lazy F funcToCall, string msg = null, string file = __FILE__, size_t line = __LINE__) { T e = collectThrown!T(funcToCall); if (e is null) throw new AssertError(...); } Shortening assertThrown's implementation is a bonus, main gain is better collectThrown(). [there's more down] Looking at the code I'm seeing the same cancerous coding style std.datetime suffered from (to a lesser extent, I admit). For instance, this routine: if(result != expected) { if(msg.empty) { throw new AssertError(format(`assertPred!%s failed: [%s] %s [%s]: actual [%s], expected [%s].`, op, lhs, op, rhs, result, expected), file, line); } else { throw new AssertError(format(`assertPred!%s failed: [%s] %s [%s]: actual [%s], expected [%s]: %s`, op, lhs, op, rhs, result, expected, msg), file, line); } } can be easily compressed to: enforce(result==expected, new AssertError( format("[%s] %s [%s] failed: actual [%s], expected [%s]" ~ (msg.empty ? "." : ": %s"), op, lhs, op, rhs, result, expected, msg), file, line)); I really have no problem with them being separate as they are. I think that I end up writing them that way because I see them as two separate code paths. It wouldn't necessarily be a bad idea to combine them, but I really don't think that it's a big deal. Another example: { bool thrown = false; try assertNotThrown!AssertError(throwEx(new AssertError("It's an AssertError", __FILE__, __LINE__)), "It's a message"); catch(AssertError) thrown = true; assert(thrown); } can be: try { assertNotThrown!AssertError(throwEx(new AssertError("It's an AssertError", __FILE__, __LINE__)), "It's a message"); assert(false); } catch(AssertError) { /*OK*/ } and you don't have to introduce a new scope every time. Doesn't work actually - at least not in the general case (for this particular test, it's arguably okay). It doesn't take into account the case where an exception other than AssertError is
Re: eliminate junk from std.string?
On Tuesday, January 11, 2011 12:44:57 Nick Sabalausky wrote: Walter Bright newshou...@digitalmars.com wrote in message news:igibu6$154...@digitalmars.com... Ary Borenszweig wrote: Why care where they come from? Why not make them intuitive? Say, like, Always camel case? Because people are used to those names due to their wide use. It's the same reason that we still use Qwerty keyboards. Then why switch languages at all? When you move to a different language you expect that language is going to have its own set of conventions. And even more than that, you also expect it to at least be internally-consistent, not a grab-bag of different styles. Are they really supposed to remember "Oh, oh, this func comes from this language, so it's capitalized this way, and that one comes from that language so it's capitalized that way..."? Not only that, but D has far, far bigger, more significant differences from Ruby/Python/JS/etc than the capitalization of a few functions. If people are going to come over and get used to *those* changes, then using toLower instead of tolower is going to be a downright triviality for them. Your cart is before your horse. I agree. Having the functions named similarly so that they're quickly recognized is good - if a function has a particular name in a variety of languages, why not give it essentially the same name in D? But I don't see why it must be _exactly_ the same name. At least using the same casing as the rest of Phobos. Unless you're directly porting code, the fact that it's toLower instead of tolower really shouldn't be an issue. It's a new language, a new library, you're going to have to learn how it works anyway. The function names don't need to be _exactly_ the same as other languages. It does look bad when functions in Phobos don't follow the same naming conventions as the rest of it, and it makes it much harder to remember exactly how they're named. 
So, I'm all for picking names which are essentially the same as functions with the same functionality in other languages, but I think that insisting that the casing of the names match the casing of the functions from other languages when it doesn't match how functions are normally cased in Phobos is definitely a bad idea. Not to mention, I don't think that I've ever heard anyone complain that the casing on a function in Phobos didn't match the casing of a function with essentially the same name in another language, but complaints definitely pop up about how some of the std.string functions don't use the same casing as the rest of Phobos. I vote for consistency. Using essentially the same names for functions as is used in other languages is great. Insisting on the same casing for the function names strikes me as inconsistent and undesirable. I find that it increases the burden of remembering function names rather than reducing it. - Jonathan M Davis
Re: eliminate junk from std.string?
On Tuesday, January 11, 2011 11:12:44 Nick Sabalausky wrote: spir denis.s...@gmail.com wrote in message news:mailman.550.1294771968.4748.digitalmar...@puremagic.com... On 01/11/2011 07:14 PM, Nick Sabalausky wrote: Daniel Gibson metalcae...@gmail.com wrote in message news:igi6n5$27p...@digitalmars.com... On 11.01.2011 19:07, Nick Sabalausky wrote: Thoust words are true. Seriously though, I'm pretty sure a lot of native english speakers don't know "sans" either, unless they're familiar with font-related terminology. "In lieu of" is widely-known though, at least in the US. I'm neither representative nor a native speaker (I'm German) and I knew "sans", but didn't know "in lieu of". I guess that just goes to show, we should all just switch to Esperanto ;) No, esperanto is just a heap of language-design errors! And that differs from English, how? ;) English wasn't designed. - Jonathan M Davis
Re: DVCS (was Re: Moving to D)
Google does seem to take into account whatever information it has on you, which might explain why your own blog is a top result for you. If I log out of Google and delete my preferences, searching for D won't find anything about the D language in the top results. But if I log in and search D again, the D website will be the top result.
@templated()
(I am busy, I am late with some answers, I am sorry, I will catch up) This paper is "Minimizing Dependencies within Generic Classes for Faster and Smaller Programs", by Dan Tsafrir, Bjarne Stroustrup and others: http://www2.research.att.com/~bs/SCARY.pdf The article shows problems of C++/D template bloat, and a way to avoid some of it. It talks a bit about D too, in two points. Near the end it shows an idea for C++-like languages, Figure 21, page 18: template<typename X, typename Y, typename Z> struct C { void f1() utilizes X,Z { // only allowed to use X or Z, not Y } void f2() { // for backward compatibility, this is // equivalent to: void f2() utilizes X,Y,Z } class Inner_t utilizes Y { // only allowed to use Y, not X nor Z }; }; I have adapted it to a possible syntax for D: struct C(X, Y, Z) { // only allowed to use X or Z, not Y @templated(X,Z) void f1() { } // for backward compatibility, this is // equivalent to: @templated(X,Y,Z) void f2() void f2() { } // only allowed to use Y, not X nor Z @templated(Y) static class Inner { } } The purpose of @templated() is to help the compiler avoid some template bloat. Here the class Inner is allowed to use just the Y template argument of C, this means that if you instantiate C in two ways like this: C!(int, int, float) C!(float, int, double) The Y doesn't change, so the compiler instantiates the code of Inner only once. If you try to use X or Z in Inner it will not compile. A sufficiently smart compiler is able to remove duplicated functions with no need of @templated(); in practice an annotation may help reduce compiler work or compilation time, to produce smaller code. It also helps document a bit of the semantics of the code, an enforced documentation. Bye, bearophile
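The @templated() annotation is only a proposal and does not exist in D. The effect described for Inner can be approximated by hand today by hoisting the nested type out of C and aliasing it back in. The sketch below assumes that workaround; InnerImpl is a made-up name.

```d
// Hand-written approximation of "@templated(Y) static class Inner":
// the class depends on Y only, so it lives in its own template.
class InnerImpl(Y)
{
    Y value;
}

struct C(X, Y, Z)
{
    // Every C with the same Y shares one InnerImpl instantiation.
    alias Inner = InnerImpl!Y;

    X first;
    Z last;
}

// The two instantiations from the post differ as a whole...
static assert(!is(C!(int, int, float) == C!(float, int, double)));
// ...but their Inner is the very same type, so its code exists once.
static assert(is(C!(int, int, float).Inner == C!(float, int, double).Inner));

void main()
{
    alias A = C!(int, int, float);
    alias B = C!(float, int, double);

    auto inner = new A.Inner;
    inner.value = 7;

    B.Inner same = inner; // no conversion needed: identical type
    assert(same.value == 7);
}
```

Unlike the proposed annotation, nothing here stops a member function of C from using all three parameters; the sharing is achieved purely by where the code is placed.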
Re: @templated()
I think that hardcoding instructions in user code for how the compiler should do its optimizations is a bad idea. But that's just me!
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
Andrei Alexandrescu wrote: I've been thinking on how to better deal with Unicode strings. Currently strings are formally bidirectional ranges with a surreptitious random access interface. The random access interface accesses the support of the string, which is understood to hold data in a variable-encoded format. For as long as the programmer understands this relationship, code for string manipulation can be written with relative ease. However, there is still room for writing wrong code that looks legit. Sometimes the best way to tackle a hairy reality is to invite it to the negotiation table and offer it promotion to first-class abstraction status. Along that vein I was thinking of defining a new range: VLERange, i.e. Variable Length Encoding Range. Such a range would have the power somewhere in between bidirectional and random access. The primitives offered would include empty, access to front and back, popFront and popBack (just like BidirectionalRange), and in addition properties typical of random access ranges: indexing, slicing, and length. For some compressions implementing *back is troublesome if not impossible... Note that the result of the indexing operator is not the same as the element type of the range, as it only represents the unit of encoding. It's worth mentioning explicitly -- a VLERange is dually typed. It's important for searching. Statically check if original and encoded match, if so, perform a fast search directly on the encoded elements. I think an important feature of a VLERange should be dropping itself down to an encoded-typed range, so that front and back return raw data. Dual typing will also affect foreach -- in the general case you'd want to choose whether to decode or not by typing the element. 
I can't stop thinking that VLERange is a two-piece bikini making a bare random-access range safe to look at, and that you can take off when partners have confidence, not a limited random-access probing facility to span the void between front and back. In addition to these (and connecting the two), a VLERange would offer two additional primitives: 1. size_t stepSize(size_t offset) gives the length of the step needed to skip to the next element. 2. size_t backstepSize(size_t offset) gives the size of the _backward_ step that goes to the previous element. In both cases, offset is assumed to be at the beginning of a logical element of the range. So when I move the spinner in an iPod, I get catapulted in position with the raw data opIndex and from there I try to work my way to the next frame to start playback. Sounds promising. I suspect that a lot of functions in std.string can be written without Unicode-specific knowledge just by relying on such an interface. Moreover, algorithms can be generalized to other structures that use variable-length encoding, such as those used in data compression. (In that case, the support would be a bit array and the encoded type would be ubyte.) I agree, acknowledging encoding/compression as a general direction will bring substantial benefits. Writing to such ranges is not addressed by this design. Ideas are welcome. Yeah, we can address outputting later, that's fair. Adding VLERange would legitimize strings and would clarify their handling, at the cost of adding one additional concept that needs to be minded. Is the trade-off worthwhile? Well, the only way to find out is try it. My advice: VLERanges originated as a solution to the string problem, so start with a non-string incarnation. Having at least two (one, we know, is string) plugs that fit the same socket will spur confidence in the abstraction. -- Tomek
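To make the two proposed primitives concrete for the string case, here is a minimal sketch assuming UTF-8 as the encoding and std.utf.stride / std.utf.strideBack as the building blocks. The wrapper type Utf8Steps is made up for illustration and is not the VLERange design itself.

```d
import std.stdio;
import std.utf : stride, strideBack;

// Hypothetical wrapper exposing only the two extra VLERange primitives.
struct Utf8Steps
{
    string support; // the underlying code units

    // Length of the step from offset to the next logical element.
    size_t stepSize(size_t offset)
    {
        return stride(support, offset);
    }

    // Length of the backward step from offset to the previous element.
    size_t backstepSize(size_t offset)
    {
        return strideBack(support, offset);
    }
}

void main()
{
    // 'a' and 'z' take one code unit each, U+00E9 takes two.
    auto v = Utf8Steps("a\u00E9z");

    // Walk forward element by element using integral offsets only.
    for (size_t i = 0; i < v.support.length; i += v.stepSize(i))
        writeln("element starts at offset ", i);
}
```

Indexing v.support directly yields a char (the encoding unit), while decoding yields a dchar, which is the dual typing mentioned above.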
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
On 1/11/11 11:13 AM, Michel Fortin wrote: On 2011-01-11 11:36:54 -0500, Andrei Alexandrescu seewebsiteforem...@erdani.org said: On 1/11/11 4:41 AM, Michel Fortin wrote: For instance, say we have a conversion range taking a Unicode string and converting it to ISO Latin 1. The best (lossy) conversion for œ is oe (one character to two characters), in this case 'front' could simply return oe (two characters) in one iteration, with stepSize being the size of the œ code point. In the same conversion process, encountering e followed by a combining ´ would return pre-combined character é (two characters to one character). In the design as I thought of it, the effective length of one logical element is one or more representation units. My understanding is that you are referring to a fractional number of representation units for one logical element. Your understanding is correct. I think both cases (one becomes many, many becomes one) are important and must be supported. Your proposal only deals with the many-becomes-one case. I disagree. When I suggested this design I was worried of over-abstracting. Now this looks like abstracting for stuff that hasn't even been addressed concretely yet. Besides, using bit as an encoding unit sounds like an acceptable approach for anything fractional. I proposed returning arrays so we can deal with the one-becomes-many case (œ becoming oe). Another idea would be to introduce substeps. When checking for the next character, in addition to determining its step length you could also determine the number of substeps in it. œ would have two substeps, o and e, and when there is no longer any substep you move to the next step. All this said, I think this should stay an implementation detail as this would allow a variety of strategies. Also, keeping this an implementation detail means that your proposed 'stepSize' and 'backstepSize' need to be an implementation detail too (because they won't make sense for the one-to-many case). 
So they can't really be part of a standard VLE interface. If you don't have at least stepSize that tells you how large the stride is to get to the next element, it becomes impossible to move within the range using integral indexes. As far as I know, all we really need to expose to algorithms is whether a range has elements of variable length, because this has an impact on your indexing capabilities. The rest seems unnecessary to me, or am I missing some use cases? I think you could say that you don't really need stepSize because you can compute it as follows: auto r1 = r; r1.popFront(); size_t stepSize = r.length - r1.length; This is tenuous, inefficient, and impossible if the support range doesn't support length (I realize that variable-length encodings work over other ranges than random access, but then again this may be an overgeneralization). Andrei
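The popFront-and-subtract computation quoted above can be tried out on a built-in string, where popFront removes one code point and length counts code units. A minimal sketch, assuming a non-empty UTF-8 string; stepSizeViaPop is a made-up name, and std.utf.stride is shown only for comparison:

```d
import std.range.primitives : popFront;
import std.stdio;
import std.utf : stride;

// Hypothetical helper: derive the step size from two lengths.
// Precondition: r is not empty.
size_t stepSizeViaPop(string r)
{
    auto r1 = r;
    r1.popFront();               // drops one logical element (code point)
    return r.length - r1.length; // difference in encoding units
}

void main()
{
    string s = "\u20ACuro"; // the euro sign takes three UTF-8 code units

    writeln(stepSizeViaPop(s));
    writeln(stride(s, 0)); // direct computation, no copy and no length needed
}
```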
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
On 1/11/11 11:21 AM, Steven Schveighoffer wrote: On Tue, 11 Jan 2011 11:54:08 -0500, Andrei Alexandrescu seewebsiteforem...@erdani.org wrote: On 1/11/11 5:30 AM, Steven Schveighoffer wrote: While this makes it possible to write algorithms that only accept VLERanges, I don't think it solves the major problem with strings -- they are treated as arrays by the compiler. Except when they're not - foreach with dchar... This solitary difference is a very thin argument -- foreach(d; byDchar(str)) would be just as good without requiring compiler help. I'd also rather see an indexing operation return the element type, and have a separate function to get the encoding unit. This makes more sense for generic code IMO. But that's neither here nor there. That would return the logical element at a physical position. I am very doubtful that much generic code could work without knowing they are in fact dealing with a variable-length encoding. It depends on the function, and the way the indexing is implemented. I noticed you never commented on my proposed string type... That reminds me, I should update with suggested changes and re-post it. To be frank, I think it didn't mark a visible improvement. It solved some problems and brought others. There was disagreement over the offered primitives and their semantics. It is supposed to be simple, and provide the expected interface, without causing any undue performance degradation. That is, I should be able to do all the things with a replacement string type that I can with a char array today, as efficiently as I can today, except I should have to work to get at the code-units. The huge benefit is that I can say I'm dealing with this as an array when I know it's safe Unfinished sentence? Anyway, for my money you just described what we have now. The disagreement will never be fully solved, as there is just as much disagreement about the current state of affairs ;) e.g. should foreach default to using dchar? 
I disagree about the disagreement being unsolvable. I'm not rigid; if I saw a terrific abstraction in your string, I'd be all for it. It just shuffles some issues about, and although I agree it does one thing or two better than char[], at the end of the day it doesn't carry its weight. That being said, it's good you are doing this work. In the best case, you could bring a compelling abstraction to the table. In the worst, you'll become as happy about D's strings as I am :o). I don't think I'll ever be 'happy' with the way strings sit in phobos currently. I typically deal in ASCII (i.e. code units), and phobos works very hard to prevent that. I wonder if we could and should extend some of the functions in std.string to work with ubyte[]. I did add a function called representation() that I didn't document yet. Essentially representation gives you the ubyte[], ushort[], or uint[] underneath a string, with the same qualifiers. Whenever you want an algorithm to work on ASCII in earnest, you can pass representation(s) to it instead of s. If you work a lot with ASCII, an AsciiString abstraction may be a better and more likely to be successful string type. Better yet, you could simply focus on AsciiChar and then define ASCII strings as arrays of AsciiChar. Andrei
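A small sketch of the representation() idea described above, using std.string.representation as it exists in current Phobos. The helper countByte is made up for illustration; it runs a generic algorithm over the raw code units instead of decoded characters.

```d
import std.algorithm : count;
import std.stdio;
import std.string : representation;

// For a string, representation yields immutable(ubyte)[] with no copying.
static assert(is(typeof("x".representation) == immutable(ubyte)[]));

// Hypothetical helper: count occurrences of an ASCII character by
// comparing raw bytes, with no UTF decoding involved.
size_t countByte(string s, char c)
{
    immutable(ubyte)[] bytes = s.representation;
    return bytes.count(cast(ubyte) c);
}

void main()
{
    writeln(countByte("banana", 'a'));

    // Code units, not characters: U+00E9 contributes two bytes.
    writeln("\u00E9".representation.length);
}
```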
Re: eliminate junk from std.string?
On 1/11/11 11:21 AM, Ary Borenszweig wrote: Why care where they come from? Why not make them intuitive? Say, like, Always camel case? If there's enough support for this, I'll do it. Andrei
Re: eliminate junk from std.string?
On 1/12/11 12:00 AM, Andrei Alexandrescu wrote: If there's enough support for this, I'll do it. Andrei +1 from me – sticking to names commonly used in other programming languages is good for ease of adoption, but also inheriting the various naming conventions is, in my humble opinion, just plain weird. David
Re: @templated()
I've now remembered that I have discussed this a bit in past, I am sorry for the partially dupe thread: http://www.digitalmars.com/d/archives/digitalmars/D/Few_ideas_to_reduce_template_bloat_108136.html Bye, bearophile
Re: eliminate junk from std.string?
On 1/11/11 1:47 PM, Jerry Quinn wrote: Jerry Quinn Wrote: One set of functions I'd like to see are startsWith() and endsWith(). I find them frequently useful in Java and an irritating lack in the C++ standard library. Just adding that these functions are useful because they're more efficient than doing a find and checking that the match is in the first position. Jerry They're in std.algorithm. Andrei
Re: eliminate junk from std.string?
So what's a good use for aliases?
Re: @templated()
Can't the compiler see what is used and where?
Re: eliminate junk from std.string?
On 1/11/11 1:45 PM, Jerry Quinn wrote: Andrei Alexandrescu Wrote: On 1/9/11 4:51 PM, Andrei Alexandrescu wrote: There's a lot of junk in std.string that should be gone. I'm trying to motivate myself to port some functions to different string widths and... it's not worth it. What functions do you think we should remove from std.string? Let's make a string and then send them the way of the dino. Thanks, Andrei I have uploaded a preview of the changed APIs here: http://d-programming-language.org/cutting-edge/phobos/std_string.html Unclear if iswhite() refers to ASCII whitespace or Unicode. If Unicode, which version of the standard? Not sure. enum dchar LS = '\u2028'; /// UTF line separator enum dchar PS = '\u2029'; /// UTF paragraph separator bool iswhite(dchar c) { return c <= 0x7F ? indexOf(whitespace, c) != -1 : (c == PS || c == LS); } Which version? Same comment for icmp(). Also, in the Unicode standard, case folding can depend on the specific language. That uses toUniLower. Not sure how that works. There is room for ascii-only functions, but unless a D version of ICU is going to be done separately, it would be nice to have full unicode-aware functions available. Yah, I'm increasingly thinking of defining an AsciiChar entity and perhaps a Zstring one for zero-terminated strings. You've got chop() marked as deprecated. Is popBack() going to make sense as something that removes a variable number of chars from a string in the CR-LF case? That might be a bit too magical. Well I found little use for chop in e.g. Perl. People either use chomp or want to remove the last character. I think chop is useless. Rather than zfill, what about modifying ljustify, rjustify, and center to take an optional fill character? Yah, I wanted to do that but postponed because it's quite a bit of work with general dchars etc. One set of functions I'd like to see are startsWith() and endsWith(). I find them frequently useful in Java and an irritating lack in the C++ standard library. 
Yah, those are in std.algorithm. Ideally we'd move everything that's applicable beyond strings to std.algorithm. Andrei
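To make the ASCII-versus-Unicode question about iswhite() concrete, here is a minimal sketch of the quoted predicate. The names isWhiteSketch and asciiWhitespace are hypothetical, and the whitespace set is spelled out locally as an assumption standing in for the `whitespace` constant in the quoted code. Read this way, the predicate accepts ASCII whitespace plus only the two separators LS and PS, so other Unicode space characters are rejected.

```d
import std.string : indexOf;

enum dchar LS = '\u2028'; // Unicode LINE SEPARATOR
enum dchar PS = '\u2029'; // Unicode PARAGRAPH SEPARATOR

// ASCII whitespace, spelled out locally (an assumption standing in for
// the `whitespace` constant used in the quoted code).
enum string asciiWhitespace = " \t\v\r\n\f";

/// Sketch of the quoted predicate under a hypothetical name.
bool isWhiteSketch(dchar c)
{
    return c <= 0x7F
        ? indexOf(asciiWhitespace, c) != -1   // ASCII: look it up in the set
        : (c == PS || c == LS);               // non-ASCII: only LS and PS count
}

void main()
{
    assert(isWhiteSketch(' '));
    assert(isWhiteSketch('\n'));
    assert(isWhiteSketch(LS));
    assert(!isWhiteSketch('a'));
    // NO-BREAK SPACE is above 0x7F and is neither LS nor PS,
    // so this sketch does not treat it as whitespace.
    assert(!isWhiteSketch('\u00A0'));
}
```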
Re: eliminate junk from std.string?
On Tuesday, January 11, 2011 15:29:54 Ary Borenszweig wrote: So what's a good use for aliases? Oh, there's not necessarily anything wrong with aliases. The problem is if an API has a lot of them. The typical place to use typedef in C++ is when you have long, nasty template types which you don't want to actually have to type out, and while auto and D's improved templates reduce the need for that sort of typedef, I'm sure that folks will still want to use them for that sort of thing. Personally, I've used them for three things: 1. When there's a templated function that you want to be able to call with a set of specific names. A prime example would be get on core.time.Duration. It properly genericizes dealing with that functionality, but it would be annoying to have to type duration.get!days(), duration.get!hours, etc. all over the place, so it aliases them to the properties days, hours, etc. 2. Deprecating a function name. For instance, let's say that we rename splitl to splitL or SplitLeft in std.string. Having a deprecated alias to splitl would avoid immediately breaking code. 3. In the new std.datetime, DateTimeException is an alias of core.time.TimeException, so that you can use the same exception type throughout the time stuff (std.datetime also publicly imports core.time) without worrying whether it was core.time or std.datetime which threw the exception and yet still have an exception type with the same name as the module as is typical in a number of Phobos modules. So, you get one exception type for all of the time code but still follow the typical naming convention. However, none of these are things that I'd do very often. alias is a tool that can be very handy at times, and I think that it's very good that we have it, but using it all over the place is likely ill-advised - especially if all you're really doing with it is making it possible to call the same function with different names. 
I'd say that, on the whole, aliases should be used when they simplify code or when renaming functions or types, and you want a good deprecation path, but other than that, in general, it's probably not a good idea to use them much. - Jonathan M Davis
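The first two uses of alias described above can be sketched as follows. Everything here is hypothetical and only for illustration: Span, trimLeft and triml are made-up names, not Phobos APIs, and the sketch uses the current alias syntax rather than the 2011 one.

```d
import std.string : stripLeft;

/// Use 1: give convenient names to instantiations of a templated member.
/// Span is a made-up type, loosely modelled on the Duration example.
struct Span
{
    long totalSeconds;

    long get(string unit)() const
    {
        static if (unit == "hours")        return totalSeconds / 3600;
        else static if (unit == "minutes") return totalSeconds / 60;
        else static assert(0, "unsupported unit: " ~ unit);
    }

    alias hours   = get!"hours";
    alias minutes = get!"minutes";
}

/// Use 2: the new, convention-following name (hypothetical function)...
string trimLeft(string s)
{
    return stripLeft(s);
}

/// ...and the old name kept only as a deprecated alias. Code that still
/// calls triml keeps compiling but gets a deprecation message.
deprecated("use trimLeft instead") alias triml = trimLeft;

void main()
{
    auto span = Span(7200);
    assert(span.hours == 2);
    assert(span.minutes == 120);
    assert(trimLeft("  abc") == "abc");
    // triml is deliberately not called here, so the sketch compiles
    // without triggering the deprecation message.
}
```

Once the transition period is over, removing the deprecated alias line is the only change needed.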
Re: eliminate junk from std.string?
On Tuesday, January 11, 2011 16:07:11 Daniel Gibson wrote: Am 12.01.2011 00:59, schrieb Jonathan M Davis: On Tuesday, January 11, 2011 15:29:54 Ary Borenszweig wrote: So what's a good use for aliases? 2. Deprecating a function name. For instance, let's say that we rename splitl to splitL or SplitLeft in std.string. Having a deprecated alias to splitl would avoid immediately breaking code. Isn't this exactly what Ary had in mind? :-) No, or at least that's not the impression that I got. I understood that he meant to have the aliases around permanently. It's just confusing and adds clutter to do things like have both splitl and splitLeft (or splitL or whatever splitl got renamed to) around in the long run. _That_ is what Andrei and Walter are objecting to. Renaming a function and keeping a deprecated alias to the old name for a few releases to ease the transition would definitely be good practice. Aliasing a function just to have another name for the same thing wouldn't be good practice. There has to be a real benefit to having the second name. Providing a smooth deprecation route would be a case where there's a real benefit. - Jonathan M Davis
Re: eliminate junk from std.string?
Am 12.01.2011 01:17, schrieb Jonathan M Davis: On Tuesday, January 11, 2011 16:07:11 Daniel Gibson wrote: Am 12.01.2011 00:59, schrieb Jonathan M Davis: On Tuesday, January 11, 2011 15:29:54 Ary Borenszweig wrote: So what's a good use for aliases? 2. Deprecating a function name. For instance, let's say that we rename splitl to splitL or SplitLeft in std.string. Having a deprecated alias to splitl would avoid immediately breaking code. Isn't this exactly what Ary had in mind? :-) No, or at least that's not the impression that I got. I understood that he meant to have to aliases around permanently. It's just confusing and adds clutter to do things like have both splitl and splitLeft (or splitL or whotever splitl got renamed to) around in the long run. _That_ is what Andrei and Walter is objecting to. Renaming a function and having a deprecated alias to the old name for a few releases eases the transition would definitely be good practice. aliasing a function just to have another name for the same thing wouldn't be good practice. There has to be a real benefit to having the second name. Providing a smooth deprecation route would be a case where there's a real benefit. - Jonathan M Davis Ok, you're right, that is a slight difference. Deprecating them is certainly a good idea, but I'd suggest to keep the deprecated aliases around for longer (until D3), so anybody porting a Phobos1-based application to D2/Phobos2 can use them, even if he doesn't do this within the next few releases. Cheers, - Daniel
Re: eliminate junk from std.string?
On 2011-01-12 01:00:51 +0200, Andrei Alexandrescu said: On 1/11/11 11:21 AM, Ary Borenszweig wrote: Why care where they come from? Why not make them intuitive? Say, like, Always camel case? If there's enough support for this, I'll do it. Andrei ++vote. Uniformity in how functions are named will improve readability.
Re: DVCS (was Re: Moving to D)
Andrej Mitrovic Wrote: Google does seem to take into account whatever information it has on you, which might explain why your own blog is a top result for you. If I log out of Google and delete my preferences, searching for D won't find anything about the D language in the top results. But if I log in and search D again, the D website will be the top result. Best place to go for ranking information on your website: https://www.google.com/webmasters/tools/home?hl=en&pli=1 Need to show you own the site though.
Re: DVCS (was Re: Moving to D)
Walter Bright newshou...@digitalmars.com wrote in message news:igb5uo$26a...@digitalmars.com... Vladimir Panteleev wrote: From taking a quick look, I don't see meld's advantage over WinMerge (other than being cross-platform). Thanks for pointing me at winmerge. I've been looking for one to work on Windows. Beyond Compare and Ultra Compare
Re: eliminate junk from std.string?
You are right, deprecating those names and removing them in the long run is what I think should be done.
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
On 01/11/2011 08:09 PM, Andrei Alexandrescu wrote: The main (and massively ignored) issue when manipulating unicode text is rather that, unlike with legacy character sets, one codepoint does *not* represent a character in the common sense. In character sets like latin-1: * each code represents a character, in the common sense (eg à) * each character representation has the same size (1 or 2 bytes) * each character has a single representation (à -- always 0xe0) All of this is wrong with unicode. And these are complicated and high-level issues, that appear _after_ decoding, on codepoint sequences. If VLERange is helpful in dealing with those problems, then I don't understand your presentation, sorry. Do you for instance mean such a range would, under the hood, group together codes belonging to the same character (thus making indexing meaningful), and/or normalise (decomp order) (thus allowing to comp/find/count correctly).? VLERange would offer automatic decoding in front, back, popFront, and popBack - just like BidirectionalRange does right now. It would also offer access to the representational support by means of indexing - also like char[] et al already do now. IIUC, for the case of text, VLERange helps abstracting from the annoying fact that a codepoint is encoded as a variable number of code units. What I meant is issues like: auto text = "a\u0302d"; writeln(text); // â auto range = VLERange(text); // extracts characters correctly? auto letter = range.front();// a or â? // case yes: compares correctly? assert(range.front() == "â"); // fail or pass? Both fail using all unicode-aware types I know of, because 1. They do not recognise that a character is represented by an arbitrary number of codes (code _points_). 2. They do not use normalised forms for comp, search, count, etc... (while in unicode a given char can have several representations). 
The difference is that VLERange being a formal concept, algorithms can specialize on it instead of (a) specializing for UTF strings or (b) specializing for BidirectionalRange and then manually detecting isSomeString inside. Conversely, when defining an algorithm you can specify VLERange as a requirement. Boyer-Moore is a perfect example - it doesn't work on bidirectional ranges, but it does work on VLERange. I suspect there are many like it. Of course, it would help a lot if we figured other remarkable VLERanges. I think I see the point, and the general usefulness of such an abstraction. But it would certainly be more useful in other fields than text manipulation, because there are far more annoying issues (that, like in example above, simply prevent code correctness). Denis _ vita es estrany spir.wikidot.com
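The two issues raised in the example above (one character spanning several code points, and one character having several representations) can be sketched as follows. This assumes a present-day Phobos: byGrapheme and normalize come from std.uni as it exists now, which postdates this thread, and the three helper names are made up for the illustration.

```d
import std.range : front, walkLength;
import std.uni : byGrapheme, normalize, NFC;

// Hypothetical helpers: count the same text at three levels.
size_t codeUnits(string s)  { return s.length; }                // UTF-8 bytes
size_t codePoints(string s) { return s.walkLength; }            // decoded code points
size_t graphemes(string s)  { return s.byGrapheme.walkLength; } // user-perceived characters

void main()
{
    string decomposed  = "a\u0302"; // 'a' + COMBINING CIRCUMFLEX ACCENT
    string precomposed = "\u00E2";  // the same character as one code point

    // Issue 1: one character, several code points.
    assert(codeUnits(decomposed)  == 3);
    assert(codePoints(decomposed) == 2);
    assert(graphemes(decomposed)  == 1);
    assert(decomposed.front == 'a'); // decoding alone yields the bare 'a'

    // Issue 2: two representations of the same character compare unequal
    // until both are brought to the same normalisation form.
    assert(decomposed != precomposed);
    assert(normalize!NFC(decomposed) == precomposed);
}
```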
Re: eliminate junk from std.string?
On Tuesday, January 11, 2011 16:23:13 Daniel Gibson wrote: Am 12.01.2011 01:17, schrieb Jonathan M Davis: On Tuesday, January 11, 2011 16:07:11 Daniel Gibson wrote: Am 12.01.2011 00:59, schrieb Jonathan M Davis: On Tuesday, January 11, 2011 15:29:54 Ary Borenszweig wrote: So what's a good use for aliases? 2. Deprecating a function name. For instance, let's say that we rename splitl to splitL or SplitLeft in std.string. Having a deprecated alias to splitl would avoid immediately breaking code. Isn't this exactly what Ary had in mind? :-) No, or at least that's not the impression that I got. I understood that he meant to have to aliases around permanently. It's just confusing and adds clutter to do things like have both splitl and splitLeft (or splitL or whotever splitl got renamed to) around in the long run. _That_ is what Andrei and Walter is objecting to. Renaming a function and having a deprecated alias to the old name for a few releases eases the transition would definitely be good practice. aliasing a function just to have another name for the same thing wouldn't be good practice. There has to be a real benefit to having the second name. Providing a smooth deprecation route would be a case where there's a real benefit. - Jonathan M Davis Ok, you're right, that is a slight difference. Deprecating them is certainly a good idea, but I'd suggest to keep the deprecated aliases around for longer (until D3), so anybody porting a Phobos1-based application to D2/Phobos2 can use them, even if he doesn't do this within the next few releases. Well, leaving an alias until D3 would equate to a permanent alias in D2, which is exactly what Walter and Andrei don't want (and I don't either). There's already plenty in Phobos 2 that's different from Phobos 1. 
So, while I don't think that we should rename stuff just to rename stuff, I also don't think that we should keep aliases around just to make porting D1 code easier - especially when most D1 code is probably using Tango anyway. We don't really have a policy in place for how long deprecation should last prior to outright removal, but until D3 is definitely too long. I would have thought that the question would be more along the lines of whether it should be a couple of releases or more like 6 months to a year before removing deprecated functions and modules at this point, not whether something will remain deprecated until D3. - Jonathan M Davis
Re: eliminate junk from std.string?
On 01/11/2011 09:11 PM, Ary Borenszweig wrote: Welcome to D. Do you program in C, Javascript, Python or Ruby? Cool! Then you will feel at home. That phrase currently ends like this: You don't? Oh, sorry, you will have to learn that some names are all lowercase, some not. But it could end like this: You don't? Don't worry. D has the convention of writing all function names with X convention, but we keep some aliases for things that we want to keep backwards compatibility for. Yop. And anyway those legacy names are not all the same in C, Javascript, Python, Ruby, etc.. One has to be chosen or created for D, why not follow a guideline for the standard D name? (I really cannot (under)stand this general policy of sticking to wrong design choices from the past for generations and generations --even in brand new languages. How do improvements happen in other fields than programming? One day or the other, one needs to throw away old (mental) garbage.) Denis _ vita es estrany spir.wikidot.com
levenshteinDistanceAndPath Source bug
Hello, there is a bug at std.algorithm source. dsource.org's source: 4120 levenshteinDistanceAndPath(alias equals = "a == b", Range1, Range2) 4121 (Range1 s, Range2 t) 4122 if (isForwardRange!(Range1) && isForwardRange!(Range2)) 4123 { 4124 Levenshtein!(Range, binaryFun!(equals)) lev; 'Range' at line 4124 (3975 in my downloaded dmd 2.051) should be 'Range1'? The windows lib binary seems ok if this source line is fixed.
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
Sorry if I'm jumping in here without the appropriate background, but I don't understand why jumping through these hoops is necessary. Please let me know if I'm missing anything. Many problems can be solved by another layer of indirection. Isn't a string essentially a bidirectional range of code points built on top of a random access range of code units? It seems to me that each abstraction separately already fits within the existing D range framework and all the difficulties arise as a consequence of trying to lump them into a single abstraction. Why not choose which of these abstractions is most appropriate in a given situation instead of trying to shoe-horn both concepts into a single abstraction, and provide for easy conversion between them? When character representation is the primary requirement then make it a bidirectional range of code points. When storage representation and random access is required then make it a random access range of code units.
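The two separate views described above can be sketched with what Phobos offers today. This assumes current std.range and std.string (representation may not have existed in this form when the thread was written), and the helper names unitCount and pointCount are made up for the illustration.

```d
import std.range : back, isBidirectionalRange, isRandomAccessRange, walkLength;
import std.string : representation;

// View 1: a string is a bidirectional range of code points,
// but not a random access range.
static assert(isBidirectionalRange!string);
static assert(!isRandomAccessRange!string);

// View 2: its raw code units form a random access range.
static assert(isRandomAccessRange!(immutable(ubyte)[]));

size_t unitCount(string s)  { return s.representation.length; } // code units
size_t pointCount(string s) { return s.walkLength; }            // code points

void main()
{
    string s = "h\u00E9llo"; // U+00E9 takes two UTF-8 code units

    auto units = s.representation; // immutable(ubyte)[]
    assert(units[1] == 0xC3);      // O(1) indexing into code units
    assert(unitCount(s) == 6);

    assert(pointCount(s) == 5);    // linear walk over code points
    assert(s.back == 'o');         // decoding from the back also works
}
```

Converting between the views costs nothing in this direction: representation only reinterprets the same memory.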
Re: DVCS (was Re: Moving to D)
Daniel Gibson metalcae...@gmail.com wrote in message news:igijc7$27p...@digitalmars.com... Am 11.01.2011 22:36, schrieb Walter Bright: Andrej Mitrovic wrote: That's my biggest problem with Linux. Having technical problems is not the issue, finding the right solution in the sea of forum posts is the problem. The worst ones begin with you might try this... or I think this might work, but YMMV... How do these wind up being the top ranked results by google? Who embeds links to that stuff? My experience with Windows is, like yours, the opposite. The top ranked result will be correct and to the point. No weasel wording. Those results are often in big forums like ubuntuforums.org that get a lot of links etc, so even if one thread doesn't have many incoming links, it may still get a top ranking. Also my blog entries (hosted at wordpress.com) get on the google frontpage when looking for the specific topic, even though my blog is mostly unknown, has 2-20 visitors per day and almost no incoming links.. Googles algorithms often do seem like voodoo ;) Also: Many problems (and their correct solutions) heavily depend on your system. What desktop environment is used, what additional stuff (dbus, hal, ...) is used, what are the versions of this stuff (and X.org), what distribution is used, ... There may be different default configurations shipped depending on what distribution (and what version of that distribution) you use, ... So there often is no single correct answer that will work for anyone. Still, in my experience those HOWTOs often work (it may help to look at multiple HOWTOs and compare them if you're not sure, if it applies to your system) or at least push you in the right direction. That's probably one of the biggest things that's always bothered me about linux (not that there aren't plenty of other things that bother me about every other OS in existence). 
For something that's considered so standards-compliant/standards-friendly (compared to, say MS), it's painfully *un*standardized.
D standard style [was: Re: eliminate junk from std.string?]
On 01/12/2011 12:07 AM, Daniel Gibson wrote: Am 12.01.2011 00:00, schrieb Andrei Alexandrescu: On 1/11/11 11:21 AM, Ary Borenszweig wrote: Why care where they come from? Why not make them intuitive? Say, like, Always camel case? If there's enough support for this, I'll do it. Andrei Please do, having different naming conventions of functions within the standard library makes it harder to remember the exact spelling of a function and also doesn't look professional. +1 vote for making the standard library comply with the D style guide[1] +1 as well But while we're at conventions, and before any change is actually done, we may take the opportunity to agree not only on morphology, but on semantics ;-) For instance, from online doc: string capitalize(string s); Capitalize first character of string s[], convert rest of string s[] to lower case. Then, use it: auto s = "capital"; s.capitalize(); writeln(s); // capital Uh? Not only the name is misleading, but the doc as well. For this kind of issue, some guidelines read like: * perform an action -- action verb (eg capitalise: changes the passed string) * return a result -- named after result (eg capitalised: return new string) Sure, the func's interface also tells the reader what's actually done. But having name (and doc) contradict it is not very helpful. And being forced to open the doc or even the source for every unknown bit is an annoying obstacle. There are probably other common issues like this. My personal evaluation is whether some newcomer can guess the purpose of the func, the type, the constant, etc... I would also vote for: * full words, except for rare exception used everywhere in programming _and_ really helpful (eg OS) * get rid of obscure, ambiguous, or misleading namings * when possible, use international words rather than english-only (eg section better than slice if everything else equal) Finally, take the opportunity to make the doc usable, eg: string format(...); Format arguments into a string. ??? 
Denis _ vita es estrany spir.wikidot.com
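The capitalize point above can be checked directly: the function returns a new string and leaves its argument alone. In the sketch below, capitalized is a hypothetical wrapper showing what a "named after the result" spelling would look like; it is not a Phobos function.

```d
import std.string : capitalize;

/// Hypothetical name following the "named after the result" guideline.
string capitalized(string s)
{
    return capitalize(s);
}

void main()
{
    string s = "capital";
    string t = s.capitalize();

    assert(s == "capital"); // the argument is unchanged
    assert(t == "Capital"); // the capitalised text is the return value

    // First character upper-cased, the rest lower-cased.
    assert(capitalized("hELLO") == "Hello");
}
```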
Re: levenshteinDistanceAndPath Source bug
tsukikage wrote: Hello, there is a bug at std.algorithm source. dsource.org's source: 4120 levenshteinDistanceAndPath(alias equals = "a == b", Range1, Range2) 4121 (Range1 s, Range2 t) 4122 if (isForwardRange!(Range1) && isForwardRange!(Range2)) 4123 { 4124 Levenshtein!(Range, binaryFun!(equals)) lev; 'Range' at line 4124 (3975 in my downloaded dmd 2.051) should be 'Range1'? The windows lib binary seems ok if this source line is fixed. sorry, wrong place, please ignore.
Re: levenshteinDistanceAndPath Source bug
On 1/11/11 5:28 PM, tsukikage wrote: tsukikage wrote: Hello, there is a bug at std.algorithm source. dsource.org's source: 4120 levenshteinDistanceAndPath(alias equals = "a == b", Range1, Range2) 4121 (Range1 s, Range2 t) 4122 if (isForwardRange!(Range1) && isForwardRange!(Range2)) 4123 { 4124 Levenshtein!(Range, binaryFun!(equals)) lev; 'Range' at line 4124 (3975 in my downloaded dmd 2.051) should be 'Range1'? The windows lib binary seems ok if this source line is fixed. sorry, wrong place, please ignore. Fixed and readded unittest: http://www.dsource.org/projects/phobos/changeset/2315 http://www.dsource.org/projects/phobos/changeset/2316 To post bugs, you may want to go to http://d.puremagic.com/issues. What you post there will automatically appear in digitalmars.d.bugs (no need to post there). Andrei
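A note on why the re-added unittest matters: the body of a D template is only fully checked when the template is instantiated, so a misspelled identifier such as the 'Range' above can go unnoticed until some code actually calls the function. A minimal sketch of a test that forces that instantiation, assuming a current Phobos (the helper name distanceViaPath is made up):

```d
import std.algorithm : levenshteinDistance, levenshteinDistanceAndPath;

// Hypothetical helper: calling it instantiates levenshteinDistanceAndPath
// for string arguments, so the template body gets compiled and checked.
size_t distanceViaPath(string a, string b)
{
    return levenshteinDistanceAndPath(a, b)[0]; // first tuple field is the distance
}

void main()
{
    // One substitution turns "cat" into "rat".
    assert(levenshteinDistance("cat", "rat") == 1);
    assert(distanceViaPath("cat", "rat") == 1);

    // Both entry points should agree on the distance.
    assert(distanceViaPath("kitten", "sitting")
        == levenshteinDistance("kitten", "sitting"));
}
```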
Re: eliminate junk from std.string?
On 01/12/2011 02:17 AM, Daniel Gibson wrote: Somewhere in this thread: Am 11.01.2011 21:43, schrieb Walter Bright: Nick Sabalausky wrote: I agree with this reasoning for having them. However, I don't think it means we shouldn't D-ify or Phobos-ify them, at least as far as capitalization conventions. I also object to rather pointlessly annoying people wanting to move their code from D1 to D2 by renaming everything. Endlessly renaming things searching for the perfect name gives the illusion of progress, whereas time would be better spent on improving the documentation, unittests, performance, etc. So his objection was specifically that renaming those functions could annoy people migrating D1 code (and certainly he meant Phobos1 users, because Tango-people either port (parts of) Tango or will have to rewrite that anyway). So, to accomplish that goal (not annoying those people), these aliases should be kept for longer. (An alternative may be to one/some phobos1-compat modules that contain such aliases and maybe even wrappers with old signatures for new functions, that could be imported to ease porting of old applications. That would have the benefit of not cluttering the regular Phobos2 modules with that legacy stuff.) When D2 / Phobos2 stabilise, what about a semi-automatic porting tool (at least signaling potential issues, first of all occurrences of deprecated stdlib names)? Denis _ vita es estrany spir.wikidot.com
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
On 01/12/2011 02:22 AM, Andrei Alexandrescu wrote: IIUC, for the case of text, VLERange helps abstracting from the annoying fact that a codepoint is encoded as a variable number of code units. What I meant is issues like: auto text = "a\u0302d"; writeln(text); // â auto range = VLERange(text); // extracts characters correctly? auto letter = range.front(); // a or â? // case yes: compares correctly? assert(range.front() == "â"); // fail or pass? You should try text.front right now, you might be surprised :o). Hum, right now incorrectly returns a as expected. And indeed assert("â" == "a\u0302"); incorrectly fails as expected. Both would work with legacy charsets like latin-1. This is a new issue introduced with UCS, that requires an additional level of abstraction (in addition to the one required by the distinction codepoint/codeunit!) You may have a look at https://bitbucket.org/denispir/denispir-d/src/5ec6fe1e1065/Text.html for a rough implementation of a type that does the right thing, at https://bitbucket.org/denispir/denispir-d/src/5ec6fe1e1065/U%20missing%20level%20of%20abstraction for a (far too long) explanation. (I have tried to mention those problems a dozen times already, but for some reason nearly everybody seems definitely deaf in front of them.) Denis _ vita es estrany spir.wikidot.com
Re: eliminate junk from std.string?
On Tuesday, January 11, 2011 17:17:43 Daniel Gibson wrote: Am 12.01.2011 01:55, schrieb Jonathan M Davis: On Tuesday, January 11, 2011 16:23:13 Daniel Gibson wrote: Deprecating them is certainly a good idea, but I'd suggest to keep the deprecated aliases around for longer (until D3), so anybody porting a Phobos1-based application to D2/Phobos2 can use them, even if he doesn't do this within the next few releases. Well, leaving an alias until D3 would equate to a permanent alias in D2, which is exactly what Walter and Andrei don't want (and I don't either). There's already plenty in Phobos 2 that's different from Phobos 1. So, while I don't think that we should rename stuff just to rename stuff, I also don't think that we should keep aliases around just to make porting D1 code easier - especially when most D1 code is probably using Tango anyway. We don't really have a policy in place for how long deprecation should last prior to outright removal, but until D3 is definitely too long. I would have thought that the question would be more along the lines of whether it should be a couple of releases or more like 6 months to a year before removing deprecated functions and modules at this point, not whether something will remain deprecated until D3. - Jonathan M Davis Somewhere in this thread: Am 11.01.2011 21:43, schrieb Walter Bright: Nick Sabalausky wrote: I agree with this reasoning for having them. However, I don't think it means we shouldn't D-ify or Phobos-ify them, at least as far as capitalization conventions. I also object to rather pointlessly annoying people wanting to move their code from D1 to D2 by renaming everything. Endlessly renaming things searching for the perfect name gives the illusion of progress, whereas time would be better spent on improving the documentation, unittests, performance, etc. 
So his objection was specifically that renaming those functions could annoy people migrating D1 code (and certainly he meant Phobos1 users, because Tango-people either port (parts of) Tango or will have to rewrite that anyway). So, to accomplish that goal (not annoying those people), these aliases should be kept for longer. (An alternative may be to have one or more phobos1-compat modules that contain such aliases and maybe even wrappers with old signatures for new functions, that could be imported to ease porting of old applications. That would have the benefit of not cluttering the regular Phobos2 modules with that legacy stuff.) Well, I didn't say that Walter wasn't concerned about it. I just don't see the point. Phobos has changed enough from D1 to D2 that, even for D1 Phobos users (of which I get the impression there are relatively few), there's probably already plenty of stuff which is going to break for anyone porting over. I do think that keeping a deprecated alias around longer for a function which has been around longer makes sense, and the Phobos 1 functions have been around longer than anything else. So, deprecating a function that was added 2 releases ago probably shouldn't require a deprecated alias for as long as deprecating a function that was in Phobos 1 would, but there's still a limit to how long it makes sense. And given that your average D1 user uses Tango rather than Phobos, it makes that much less sense to keep aliases to Phobos 1 functions around for a long time. So, no, we shouldn't get rid of the deprecated alias for a Phobos 1 function after only a release or two, but I don't think that it makes sense to keep it around for a year or two either. - Jonathan M Davis
Re: eliminate junk from std.string?
Am 12.01.2011 03:10, schrieb Jonathan M Davis: On Tuesday, January 11, 2011 17:17:43 Daniel Gibson wrote: Am 12.01.2011 01:55, schrieb Jonathan M Davis: On Tuesday, January 11, 2011 16:23:13 Daniel Gibson wrote: Deprecating them is certainly a good idea, but I'd suggest to keep the deprecated aliases around for longer (until D3), so anybody porting a Phobos1-based application to D2/Phobos2 can use them, even if he doesn't do this within the next few releases. Well, leaving an alias until D3 would equate to a permanent alias in D2, which is exactly what Walter and Andrei don't want (and I don't either). There's already plenty in Phobos 2 that's different from Phobos 1. So, while I don't think that we should rename stuff just to rename stuff, I also don't think that we should keep aliases around just to make porting D1 code easier - especially when most D1 code is probably using Tango anyway. We don't really have a policy in place for how long deprecation should last prior to outright removal, but until D3 is definitely too long. I would have thought that the question would be more along the lines of whether it should be a couple of releases or more like 6 months to a year before removing deprecated functions and modules at this point, not whether something will remain deprecated until D3. - Jonathan M Davis Somewhere in this thread: Am 11.01.2011 21:43, schrieb Walter Bright: Nick Sabalausky wrote: I agree with this reasoning for having them. However, I don't think it means we shouldn't D-ify or Phobos-ify them, at least as far as capitalization conventions. I also object to rather pointlessly annoying people wanting to move their code from D1 to D2 by renaming everything. Endlessly renaming things searching for the perfect name gives the illusion of progress, whereas time would be better spent on improving the documentation, unittests, performance, etc. 
So his objection was specifically that renaming those functions could annoy people migrating D1 code (and certainly he meant Phobos1 users, because Tango-people either port (parts of) Tango or will have to rewrite that anyway). So, to accomplish that goal (not annoying those people), these aliases should be kept for longer. (An alternative may be to one/some phobos1-compat modules that contain such aliases and maybe even wrappers with old signatures for new functions, that could be imported to ease porting of old applications. That would have the benefit of not cluttering the regular Phobos2 modules with that legacy stuff.) Well, I didn't say that Walter wasn't concerned about it. I just don't see the point. Phobos has changed enough from D1 to D2 that even D1 Phobos users (of which I get the impression there are relatively few) that there's probably already plenty of stuff which is going to break for anyone porting over. I do think that keeping a deprecated alias around longer for a function which has been around longer makes sense, and the Phobos 1 functions have been around longer than anything else. So, deprecating a function that was added 2 releases ago probably shouldn't require a deprecated alias for as long as deprecating a function that was in Phobos 1 would, but there's still a limit to how long it makes sense. And given that your average D1 user uses Tango rather than Phobos, it makes that much less sense to keep aliases to Phobos 1 functions around for a long time. So, no, we shoudln't get rid of the deprecated alias for a Phobos 1 function after only a release or two, but I don't think that it makes sense to keep it around for a year or two either. - Jonathan M Davis Hmm maybe. I guess there will be further similar discussions (e.g. the depreation of std.stream once the successor is ready). I think those aliases should at least be kept until all Phobos1 stuff that is to be replaced is indeed replaced. 
That'd allow a decision that is at least consistent for most Phobos1 stuff (some has already been removed/replaced, e.g. by the druntime modules like core.thread). Cheers, - Daniel
Re: DVCS (was Re: Moving to D)
Walter Bright Wrote: retard wrote: One thing came to my mind. Unless you're using Ubuntu 8.04 LTS, I'm using 8.10, and I've noticed that no more updates are coming. Huh! You should seriously consider upgrading. If you are running any kind of services in the system or browsing the web, you're exposed to both remote and local attacks. I know at least one local root exploit 8.10 is vulnerable to. It's just plainly stupid to use a distro after the support has died. Are you running Windows 98 still too? If you upgrade Ubuntu, do a clean install. Upgrading 8.10 in-place goes via -> 9.04 -> 9.10 -> 10.04 -> 10.10. Each one takes 1 or 2 hours. A clean install of Ubuntu 10.10 or 11.04 (soon available) will take less than 30 minutes. The support for desktop 8.04 and 9.10 is also nearing its end (April this year). I'd recommend backing up your /home and installing 10.04 LTS or 10.10 instead. Yeah, I know I'll be forced to upgrade soon. Soon? Your system already sounds like it's broken. One thing that'll make it easier is I abandoned using Ubuntu for multimedia. For example, to play Pandora I now just plug my ipod into my stereo g. I just stopped using youtube on Ubuntu, as I got tired of the video randomly going black, freezing, etc. I'm using Amarok and Spotify. Both work fine.
Re: levenshteinDistanceAndPath Source bug
Andrei Alexandrescu Wrote: Fixed and readded unittest: http://www.dsource.org/projects/phobos/changeset/2315 http://www.dsource.org/projects/phobos/changeset/2316 To post bugs, you may want to go to http://d.puremagic.com/issues. What you post there will automatically appear in digitalmars.d.bugs (no need to post there). Andrei In fact bugs should not be posted to digitalmars.d.bugs directly as it is not there for tracking bugs and many may not follow it.
Re: DVCS (was Re: Moving to D)
Walter Bright Wrote: My mobo is an ASUS M2A-VM. No graphics cards, or any other cards plugged into it. It's hardly weird or wacky or old (it was new at the time I bought it to install Ubuntu). ASUS M2A-VM has 690G chipset. Wikipedia says: http://en.wikipedia.org/wiki/AMD_690_chipset_series#690G AMD recently dropped support for Windows and Linux drivers made for Radeon X1250 graphics integrated in the 690G chipset, stating that users should use the open-source graphics drivers instead. The latest available AMD Linux driver for the 690G chipset is fglrx version 9.3, so all newer Linux distributions using this chipset are unsupported. Fast forward to this day: http://www.phoronix.com/scan.php?page=article&item=amd_driver_q111&num=2 Benchmark page says: the only available driver for your graphics gives only about 10-20% of the real performance. Why? ATI sucks on Linux. Don't buy ATI. Buy Nvidia instead: http://geizhals.at/a466974.html This is 3rd latest Nvidia GPU generation. How long support lasts? Ubuntu 10.10 still supports all Geforce 2+ which is 10 years old. I foretell Ubuntu 19.04 is last one supporting this. Use Nvidia and your problems are gone.
Re: DVCS (was Re: Moving to D)
Did you hear that, Walter? Just buy a 500$ video card so you can watch youtube videos on Linux. Easy. :D
Re: VLERange: a range in between BidirectionalRange and RandomAccessRange
On 2011-01-11 20:28:26 -0500, Steven Wawryk stev...@acres.com.au said: Sorry if I'm jumping in here without the appropriate background, but I don't understand why jumping through these hoops is necessary. Please let me know if I'm missing anything. Many problems can be solved by another layer of indirection. Isn't a string essentially a bidirectional range of code points built on top of a random access range of code units? Actually, displaying a UTF-8/UTF-16 string involves a range of glyphs layered over a range of graphemes layered over a range of code points layered over a range of code units. Glyphs represent the visual characters you can get from a font; they often map one-to-one with graphemes but not always (ligatures for instance). Graphemes are what people generally reason about when they see text (the so-called user-perceived characters); they often map one-to-one with code points but not always (combining marks for instance). Code points are a list of standardized codes representing various elements of a string, and code units basically encode the code points. If you're writing an XML, JSON or whatever else parser you'll probably care about code points. If you're advancing the insertion point in a text field or counting the number of user-perceived characters you'll probably want to deal with graphemes. For searching a substring inside a string, or comparing strings, you'll probably want to deal with either graphemes or collation elements (collation elements are layered on top of code points). To print a string you'll need to map graphemes to the glyphs from a particular font. Reducing string operations to code point manipulations will only work as long as all your graphemes, collation elements, or glyphs map one-to-one with code points. It seems to me that each abstraction separately already fits within the existing D range framework and all the difficulties arise as a consequence of trying to lump them into a single abstraction.
It's true that each of these abstractions can fit within the existing range framework. Why not choose which of these abstractions is most appropriate in a given situation instead of trying to shoe-horn both concepts into a single abstraction, and provide for easy conversion between them? When character representation is the primary requirement then make it a bidirectional range of code points. When storage representation and random access is required then make it a random access range of code units. I think you're right. The need for a new concept isn't that great, and it gets complicated really fast. -- Michel Fortin michel.for...@michelf.com http://michelf.com/
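To make the layering discussed in this thread concrete, here is a minimal sketch of counting the same string at three levels: code units, code points, and graphemes. It assumes a present-day Phobos in which std.uni provides byGrapheme (that function is not part of the library being discussed in this thread), and the helper names codeUnits, codePoints and graphemes are hypothetical, chosen only for illustration.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

// Hypothetical helpers, one per layer of abstraction.

// Storage layer: number of UTF-8 code units (random access, O(1)).
size_t codeUnits(string s) { return s.length; }

// Code point layer: a string iterated as a range yields decoded dchars.
size_t codePoints(string s) { return s.walkLength; }

// Grapheme layer: user-perceived characters (assumes std.uni.byGrapheme).
size_t graphemes(string s) { return s.byGrapheme.walkLength; }

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    string s = "e\u0301";

    assert(codeUnits(s) == 3);  // 1 byte for 'e', 2 bytes for U+0301
    assert(codePoints(s) == 2); // two code points
    assert(graphemes(s) == 1);  // one user-perceived character
}
```

The three counts differ for a single accented letter, which is why the appropriate range depends on whether the task is storage, parsing, or cursor movement.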