Re: Is @property implementable?
== Quote from Jonathan M Davis (jmdavisp...@gmx.com)'s article
But that's not how the function is written. The left parameter is the destination. If myString.strcpy(myString2) is confusing, I would expect strcpy(myString, myString2) to be just as confusing. I don't see how using the member function call syntax like that makes it any more confusing. Not to mention, functions with a source and destination like that end up in both orders all the time, so I don't think that you can generally expect it in one order or the other anyway.
- Jonathan M Davis

I personally think strcpy got it wrong anyway -- everyone says 'input and output', not 'output and input', so it's weird to see the output mentioned first. It's as if you wrote:

// everyone who uses this will get the argument order wrong at least once
int everythingNice(int spice, int sugar);

But that's a side note... Personally, I think UFCS for anything but arrays is an engraved invitation to function hijacking. Imagine defining this in your code:

int find(T, U)(T a, U b) { ... }

class Foo {
    int find(double x, char[] y);
}

Foo x;
x.find(a, "hello"); // would this call the template or the method?

Whoops -- now every class, structure, and array in your program has a method called find() that hijacks any arguments that don't exactly match an existing function. If you call x.find(), you can't know if you are getting the method in the class or a free function that got sucked in because of this syntax. Of course, if you get the signature exactly right it probably prefers the method over the free function.

Arrays are special because the only methods they have are *built-in* methods. Thus, a basic understanding of the language makes it possible to avoid the ambiguity. But implementing UFCS for other types is a problem. If I do a.getIndex(5) and the compiler says there is no method getIndex on class A, this is a gift -- it is almost always the case that this is an error. I doubt I would ever *want* getIndex(A, int) but write a.getIndex(5), or want f(A, int) but write a.f(5). If you want another method in your class, *add it to your class*.

The reason it makes sense to have UFCS for arrays is so that you can define common methods across all your types -- a serialize method, or toString, or deepCopy -- and know what these methods do. Then you can use them in templates. Arrays can't be subclassed to add the methods, so you need another way to produce the same syntax as a method call. Since arrays can't be derived from, UFCS fills a real gap, and it doesn't mask existing or future methods unless the interface for arrays changes, which is rare, we hope. Allowing it for other types doesn't fill a gap; it creates an overlap/ambiguity. It doesn't let you do anything you can't already do. It's like a built-in alias this that you can't turn off and that matches across all the free functions of the world, if I'm understanding the matching correctly.

In most cases you can also use free functions (for deepCopy etc.) to work with both classes and arrays, but this doesn't work with virtual functions, so the method syntax is nicer for that. So UFCS is justified to make arrays 'look like classes' to template code. But it's not justified, in my opinion, to make classes look like classes with one more method.

Kevin
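For contrast, here is the array case argued for above -- a minimal sketch of how UFCS lets a free function act like an array method (sumSquares is a made-up name):

```d
import std.stdio;

// A free function; UFCS on arrays rewrites a.sumSquares() as sumSquares(a).
int sumSquares(int[] xs)
{
    int total = 0;
    foreach (x; xs)
        total += x * x;
    return total;
}

void main()
{
    int[] a = [1, 2, 3];
    writeln(a.sumSquares());  // prints 14 -- looks like a method, but isn't one
    writeln(sumSquares(a));   // the same call in ordinary syntax
}
```

Since arrays have no user-defined methods, the a.sumSquares() call is unambiguous -- which is exactly the property that classes and structs lack.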
Re: Appender and CTFE
== Quote from Andrei Alexandrescu (seewebsiteforem...@erdani.org)'s article
I see nothing wrong with the occasional forking conditioned by __ctfe. Even today, code may fork to an optimized but nonportable implementation of some algorithm. The main requirement is that such forks are rare enough to not cause undue maintenance burden.
Andrei

Regarding maintenance burden, it should be easy to test the correctness of such code in a unit test:

enum a = f(...);      // enum forces compile-time evaluation
assert(a == f(...));  // compares against the runtime result

Kevin
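Spelled out (triple() is a hypothetical function with the kind of __ctfe fork Andrei describes), the whole test fits in a unittest block:

```d
// Hypothetical function that forks on __ctfe.
int triple(int x)
{
    if (__ctfe)
        return x + x + x;  // simple, portable compile-time path
    else
        return x * 3;      // runtime path
}

unittest
{
    enum a = triple(14);     // enum forces the compile-time (__ctfe) branch
    assert(a == triple(14)); // compares it against the runtime branch
}
```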
Re: std.parallelism: Call for benchmarks
I would say look for small things people do which are CPU intensive. One example is sorting -- it can be parallelized fairly well, and yet sorting 10_000_000 elements still takes long enough to be worth parallelizing. If you want examples of a working parallel sort etc., you can look in the 'Futurism' library that I wrote when exploring futures for D a few years ago; it's on dsource. Feel free to use whatever you find in there... There might be some other examples but I don't remember offhand. Kevin
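As a trivial sketch of the kind of CPU-bound loop that benchmarks well with std.parallelism (the array size and workload here are arbitrary):

```d
import std.parallelism;

void main()
{
    auto nums = new double[1_000_000];
    foreach (i, ref x; nums)
        x = i;
    // parallel() distributes the loop iterations across the task pool's threads.
    foreach (ref x; parallel(nums))
        x = x * x;
    assert(nums[3] == 9.0);
}
```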
uniqueness propagation
I think immutable could benefit from a Value Range Propagation-like uniqueness logic:

string a;
char[] b;
string c = a ~ b;  // result of ~ is always unique
string z = c.dup;  // any kind of .dup is definitely unique
char[] q = z.dup;  // also okay -- assign to a non-immutable

The lines with dup or ~ could be considered legal, since these operations produce definitely-unique values. A handful of specific actions known to produce unique values would produce temporaries with a 'uniqueness' property, which allows them to go either way, toward mutable or immutable. This list includes malloc, new, ~, and .dup, but maybe others. Once an expression is assigned to a variable it becomes immutable or mutable as per that variable's traits.

As an extension, variables could be labeled as 'unique', which essentially means they contain a value that is not shared. Unique would only be legal in cases where escape analysis can easily prove that the value does not get copied into both immutable and mutable variables (see below about portability). You could label a function as producing unique values -- only legal if the value returned is definitely unique. Again, if the language cannot do the escape analysis then the user will be required to do a defensive dup.

Portability note: for portability reasons it is better to have well-defined escape analysis rules that can be implemented by any compiler -- so a smart compiler might actually know there is no escape but still be required to forbid a particular expression, since it breaks the rules.

Kevin
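The 'defensive dup' mentioned above looks like this today (a minimal sketch):

```d
void main()
{
    char[] buf = ['h', 'i'];
    // .idup makes a fresh immutable copy, so typing the result as string
    // is safe even when the compiler can't prove buf never escapes.
    string s = buf.idup;
    assert(s == "hi");
}
```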
Re: float equality
== Quote from Sean Kelly (s...@invisibleduck.org)'s article
Andrei Alexandrescu (seewebsiteforem...@erdani.org) wrote: On 2/21/11 6:08 PM, bearophile wrote:
[Andrei:] This is a long-standing myth. I worked on Wall Street and have friends who have been doing it for years. Everybody uses double.
[bearophile:] Unbrutal Python programmers are encouraged to avoid the float type to manage money values, and use decimal instead: http://docs.python.org/library/decimal.html Bye, bearophile
[Andrei:] ... and nobody uses Python :o).
[Sean:] C/C++, Visual Basic, and SQL in my experience. Though in SQL they may use the 'money' type. Using double really isn't an issue so long as rounding is handled appropriately. It seems like a lot of the trouble stems from people thinking values must be naturally exactly represented by the storage type.

The semantics of floating point are complicated, and I think the human brain usually sees 'complicated' and 'unpredictable/unstable' as synonyms. As for me, I heard long ago that double wasn't used, and I think I believed it because I'd like to believe that someone out there is doing the responsible thing. Now that I have a bit more experience I can see that floating point is reliable, even if it is unintuitive. The instinct is to avoid it where correctness is important, but if you analyze the code you can predict a lot about what floating point will do, even if not always everything. If you can keep the error to the sub-penny level you should be fine with either approach.

But I think C taught a lot of people to be wary. C considers an amazingly large percentage of its allowed syntax to have undefined semantics: which direction integer division rounds, the sizes of the basic types, bit shifting by counts greater than or equal to the width of the type, etc. Almost anything that varies from CPU to CPU is not pinned down by C, with the result that you need to build a mini-jail in your mind to write portable code. Floating point is part of this, because many floating point expressions produce different results depending on choices made by the compiler while optimizing, e.g. when to truncate numbers from real to double. The number of things a language defines as 'do whatever is easy or quick for your platform' ends up being a coefficient of the difficulty of using that language to write (portably) correct code. As a result, almost all large C programs are non-portable, and I'm convinced the majority have undefined behavior if you take a narrow view of what C guarantees about the size of int, the order of side effects, and (if you really want to play hardball) the memory hierarchy in a multithreaded program.

Kevin
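A small illustration of the 'sub-penny error plus rounding' point (the numbers are chosen arbitrarily):

```d
import std.math : abs, round;

void main()
{
    // 0.1 has no exact binary representation, so ten of them don't sum to 1.0...
    double total = 0.0;
    foreach (i; 0 .. 10)
        total += 0.1;
    assert(total != 1.0);
    // ...but the error is far below a penny, so rounding recovers the answer.
    assert(abs(total - 1.0) < 1e-9);
    long cents = cast(long) round(total * 100);
    assert(cents == 100);
}
```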
Re: Uh... destructors?
== Quote from Stewart Gordon (smjg_1...@yahoo.com)'s article
On 23/02/2011 18:07, Ary Manzana wrote: On 2/22/11 10:36 AM, Simen Kjaeraas wrote:
[%u:] Well, the trouble is, pretty much all of these are invalid attributes: - static obviously makes no sense
[Simen:] And here is where you're wrong. You have defined a static destructor, which is called along with the module destructors as the program goes out of scope, rather than when your struct or class is destroyed.
[Ary:] This is why attributes that make no sense must be an error: you don't know if an attribute you put is being ignored by the compiler or not (like what has just happened here).
[Stewart:] Uh, that's a total non sequitur. The point Simen is making is that static _does_ make sense here. Stewart.

But if the compiler always rejected the nonsensical attributes, it would be clear that the ones that are not rejected have some meaning. The ambiguity he is talking about (I think) is between the two ideas "the compiler accepted this, but it's meaningless" and "the compiler accepted this, so it *must* be meaningful". If you can't be sure the second is true, then you don't realize there is a bug -- e.g. if you did not intend the destructor to be a class/module destructor.

Kevin
Re: Uh... destructors?
== Quote from bearophile (bearophileh...@lycos.com)'s article
... Currently you are able to write functions like:

pure bool randomPure() {
    int[] a1 = new int[1];
    int[] a2 = new int[2];
    return a1.ptr < a2.ptr;
}

Is it possible to fix this by disallowing using the value of a pointer, except allowing pointers to be compared for equality (only)? In other words, within a pure function, only 'is', '!is', ==, and != are permitted as tests of pointers; other comparisons and arithmetic are forbidden... this sort of matches the Java model, where you can't fiddle with references except to check if they are the same.

Kevin
Re: Uh... destructors?
== Quote from Kevin Bealer (kevindangerbea...@removedanger.gmail.com)'s article
== Quote from bearophile (bearophileh...@lycos.com)'s article
... Currently you are able to write functions like:

pure bool randomPure() {
    int[] a1 = new int[1];
    int[] a2 = new int[2];
    return a1.ptr < a2.ptr;
}

Is it possible to fix this by disallowing using the value of a pointer, except allowing pointers to be compared for equality (only)? In other words, within a pure function, only 'is', '!is', ==, and != are permitted as tests of pointers; other comparisons and arithmetic are forbidden... this sort of matches the Java model, where you can't fiddle with references except to check if they are the same. Kevin

I wanted to add something I just realized -- malloc is essentially pure if you can't compare the pointers it returns. Of course, if you can't do either casting or arithmetic on these pointers, I'm not sure how you would use malloc.

Kevin
Re: float equality
== Quote from Walter Bright (newshou...@digitalmars.com)'s article
Kevin Bealer wrote: ...
[Walter:] A reasonable way to do financial work is to use longs to represent pennies. After all, you don't have fractional cents in your accounts. Using floating point to represent money is a disaster in the making. ... There's just no getting around needing to understand how computer arithmetic works. ... I.e. you have to know what you're doing :-)

I'm sure that banks do what you are suggesting and have a policy (maybe a regulatorily required one) on who gets the fractional cents. This is easy because cents are a standard that everyone agrees is the 'bottom'. But there are lots of other cases that aren't standardized... e.g. is time represented in seconds, milliseconds, or nanoseconds? Java, Windows, and Linux use different versions. Some IBM computers measure time in 1/2^Nths of a second, where N is around 20 or 30. Is land measured in square feet or acres?

Whatever you pick as the bottom may not be good enough in the future. If you introduce more exact measurements, now you have mixed representations. If you had a rational number of acres, on the other hand, you could start doing measurements in square feet and representing them as x/(square feet in an acre). Mixed representations are trouble because they invite usage errors. Rational doesn't change the facts of life, but it simplifies some things because it makes the choice of base an arbitrary problem-domain kind of choice, whereas in fixed point it is very difficult to revisit this decision.

Every time I think about this, though, I think of classroom examples, so it's probably the case that my arguments are academic (in the bad sense)... and I should give it up. If this were more useful it would probably be used more.

Kevin
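For reference, the longs-as-pennies scheme Walter describes is about this simple (the prices here are made up):

```d
void main()
{
    long priceCents = 19_99;  // $19.99 stored as 1999 pennies
    long quantity = 3;
    // Integer math is exact: there is no representation error to round away.
    long totalCents = priceCents * quantity;
    assert(totalCents == 59_97);  // $59.97
}
```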
Re: float equality
== Quote from Jonathan M Davis (jmdavisp...@gmx.com)'s article
... It may be that you would still end up with situations where two values that you would think would be the same aren't, due to rounding error or whatnot. However, with a fixed point value, you wouldn't have the problem where a particular value could not be held in it even if it's within its range of precision. As I understand it, there are a number of values which cannot be held in a floating point and therefore end up being rounded up or down simply because of how floating points work, and not because the precision isn't high enough. It's definitely true, however, that using fractions would be much more accurate for a lot of stuff. That wouldn't be particularly efficient, though. Still, if you're doing a lot of math that needs to be accurate, that may be the way to go.
- Jonathan M Davis

Floating point numbers are a good compromise for a lot of purposes, but yes, they are limited. Here are some ways to open up the limits a bit... (First, some math for those who haven't gone down this path.)

The reason that fixed point does better than floating point is that fixed point classes use base ten and floating point uses base 2. In base two, only fractions whose denominator is a power of 2 (e.g. 3/16) can be represented exactly; all others can't. Fixed point solves this for numbers like .1 or .0003 because the denominator is a power of ten. Ten is 5*2, and the way it works out, any denominator whose only prime factors are 2 and 5 can be represented exactly. So you're good for .01 or .039028, because these are (1/100) and (39028/1000000). But neither fixed point nor floating point can represent something like 1/3 or 1/11 exactly. If you used base 3, then 1/3 would be perfectly representable but 1/2 would not be. I think this is why the ancients used 360 degrees in a circle... you can represent a lot of common fractions as parts of 360, including 1/2, 1/3, 1/4, 1/5, 1/6, 1/8. But still not 1/7 or 1/11.

How to solve this problem? You can define a type like this:

struct Rational {
    // represents numerator over denominator
    ... (various arithmetic methods and operators) ...
    long numerator;
    long denominator;
}

(Lots of people have written classes like this, maybe even for D. ;) If you've taken high school math and CS 101 you can write the methods here, and then you can represent any fraction. This is a better way to go for a lot of problems if you need exactness, but it's a bit slower.

This isn't perfect though... if you try to add the numbers 1/2, 1/3 ... 1/100, you will eventually find the denominator overflows -- you can check for this and 'reduce' the fraction (remove common divisors from top and bottom) periodically, but eventually you will get an irreducible fraction and you will need, again, to compromise and give away some precision. The simplest way to do this compromise is probably to divide the top and bottom number by 2, try to reduce again, and then try the operation again (repeat until the operation doesn't overflow). (Maybe there is a better way, though.) For some problems the precision loss will never happen -- for example, if you are adding dollars and cents represented as (x/100), then the denominator never needs to get bigger than 100. If you need to compute mortgage payments it might get a little bigger... but I'm not sure if it will ever overflow for this case.

Another source of compromise is when a number is really big or small; for example, a number like 2^200 or 1/2^200 can be represented in a 'double' but would overflow a Rational as defined above. You can delay the compromise a bit by using a struct like this:

struct Rational2 {
    // represents the number numerator / (denominator ^ exponent)
    // adjust integer types to taste
    long numerator;
    long denominator;
    long exponent;
}

This lets you go much further and represent truly astonishingly large and small numbers. It complicates the math in the arithmetic operators a bit, though. Eventually, even this version will lose precision, as the fraction still becomes irreducible if there is a unique set of prime factors in the denominator. So this can represent much larger ranges than double or real, but I think it still can't represent 1/2 * 1/3 * 1/1_000_000_000 exactly.

Now... what if you wanted to avoid the compromise completely? You could switch to this:

struct {
    BigInt numerator;
    BigInt denominator;
}

Bingo -- no compromise. Two downsides to this: 1. You trade off memory for accuracy, as the BigInts grow over time, assuming the fraction stays irreducible. 2. It's slower -- BigInt objects are essentially byte strings, and even things like addition require a function call and looping. This is a pretty flexible solution, though. However, numbers like pi and 'e' can't be represented as a rational number at all, so you are still stuck if you want to do logarithms or
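To make the first sketch concrete, here is a minimal compilable Rational along the lines described above (the names and the tiny operator set are my own; a real version needs overflow checks and the full set of operators):

```d
import std.numeric : gcd;

struct Rational
{
    long numerator;
    long denominator;

    this(long n, long d)
    {
        // Keep the fraction reduced so the denominator grows as slowly as possible.
        auto g = gcd(n < 0 ? -n : n, d);
        numerator = n / g;
        denominator = d / g;
    }

    Rational opBinary(string op : "+")(Rational rhs) const
    {
        return Rational(numerator * rhs.denominator + rhs.numerator * denominator,
                        denominator * rhs.denominator);
    }
}

void main()
{
    auto sum = Rational(1, 3) + Rational(1, 6);
    // 1/3 + 1/6 == 1/2, exactly -- no binary-representation error.
    assert(sum.numerator == 1 && sum.denominator == 2);
}
```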
Re: float equality
== Quote from Walter Bright (newshou...@digitalmars.com)'s article
Kevin Bealer wrote: You could switch to this: struct { BigInt numerator; BigInt denominator; }; Bingo -- no compromise.
[Walter:] It cannot represent irrational numbers accurately.

True, but I did mention this a few lines later.

Kevin
Re: float equality
== Quote from Walter Bright (newshou...@digitalmars.com)'s article
... I do understand that if you have a full symbolic representation, you can do so with zero losses. But Kevin's proposal was not that; it was for a ratio representation. All it represents symbolically is division. There are plenty of other operations.

I'm just answering the original poster's question. You're right, though -- it's not a complete numerical system (and I don't propose it for inclusion in the language, or even necessarily the library). I had two goals:

1. To solve the basic problem the original poster was asking about -- if you are working with simple decimals and arithmetic, you can get completely accurate representations this way. For some cases, like simple financial work, this might work really well, e.g. where float would not, because of float's slow leak of information with each operation. (I assume real professional financial work is already done using a (better) representation.)

2. To explain why the 'simple' task of representing something like .1 isn't as easy as it looks -- in other words, why the people who designed float weren't just brain dead. I think they really knew what they were doing, but it shocks most people at first that a modern computer can't do what they see as grade school arithmetic.

I think for some purposes, though, lossless domain-specific representations can be a good tool -- if you can represent a problem in a way that is lossless, you can maybe do better calculations over long series than working with 'double' and taking the accuracy hit. This is necessarily an application-specific technique, though.

Kevin
Re: Integer conversions too pedantic in 64-bit
== Quote from Daniel Gibson (metalcae...@gmail.com)'s article
It was not proposed to alter ulong (int64), but only a size_t equivalent. ;) And I agree that not having unsigned types (like in Java) just sucks. Wasn't Java even advertised as a programming language for network stuff? Quite ridiculous without unsigned types..
Cheers, - Daniel

Ah yes, but if you want to copy data quickly you want to use the efficient size for doing so. Since architectures vary, size_t (or the new name, if one is added) would seem to new users to be the natural choice for that size. So it becomes a likely error if it doesn't behave as expected.

My personal reaction to this thread is that I think most of the arguments of the people who want to change the name or add a new one are true -- but not sufficient to make it worthwhile. There is always some learning curve, and size_t is not that hard to learn or that hard to accept.

Kevin
Re: Integer conversions too pedantic in 64-bit
== Quote from spir (denis.s...@gmail.com)'s article
On 02/16/2011 03:07 AM, Jonathan M Davis wrote: On Tuesday, February 15, 2011 15:13:33 spir wrote: On 02/15/2011 11:24 PM, Jonathan M Davis wrote:
Is there some low level reason why size_t should be signed, or something I'm completely missing?
[spir:] My personal issue with unsigned ints in general as implemented in C-like languages is that the range of non-negative signed integers is half of the range of corresponding unsigned integers (for the same size). * practically: known issues, and bugs if not checked by the language * conceptually: contradicts the obvious idea that unsigned (aka naturals) is a subset of signed (aka integers)
[Jonathan:] It's inevitable in any systems language. What are you going to do, throw away a bit for unsigned integers? That's not acceptable for a systems language. On some level, you must live with the fact that you're running code on a specific machine with a specific set of constraints. Trying to do otherwise will pretty much always harm efficiency. True, there are common bugs that might be better prevented, but part of it ultimately comes down to the programmer having some clue as to what they're doing. On some level, we want to prevent common bugs, but the programmer can't have their hand held all the time either.
[spir:] I cannot prove it, but I really think you're wrong on that. First, the question of 1 bit. Think of this -- speaking of 64-bit size: * 99.999% of all uses of unsigned fit under 2^63 * To benefit from the last bit, you must need to store a value 2^63 <= v < 2^64 * Not only this, you must step on a case where /any/ possible value for v (depending on execution data) could be >= 2^63, but /all/ possible values for v are guaranteed < 2^64. This can only be a very small fraction of the cases where your value does not fit in 63 bits, don't you think? Has it ever happened to you (even in 32 bits)? Something like: what luck! This value would not (always) fit in 31 bits, but (due to this constraint) I can be sure it will fit in 32 bits (always, whatever input data it depends on). In fact, n bits do the job because (1) nearly all unsigned values are very small, and (2) the size used at a time covers the memory range at the same time. Upon efficiency, if unsigned is not a subset of signed, then at a low level you may be forced to add checks in numerous utility routines, the kind constantly used, everywhere one type may play with the other. I'm not sure where the gain is. Upon correctness, intuitively I guess (just a wild guess indeed) that if unsigned values form a subset of signed ones, programmers will more easily reason correctly about them. Now, I perfectly understand the sacrifice of one bit sounds like a sacrilege ;-) (*)
Denis
(*) But you know, when as a young guy you have coded for 8- and 16-bit machines, having 63 or 64...

If you write low level code, it happens all the time. For example, you can copy memory areas quickly on some machines by treating them as arrays of long and copying the values -- which requires the upper bit to be preserved. Or you compute a 64 bit hash value using an algorithm that is part of some standard protocol. Oops -- that requires an unsigned 64 bit number; the signed version would produce the wrong result. And since the standard expects normally behaving int64s, you are stuck -- you'd have to write a little class to simulate unsigned 64 bit math. E.g. a library that computes md5 sums. Not to mention all the code that uses 64 bit numbers as bit fields, where the different bits or sets of bits are really subfields of the total range of values.

What you are saying is true of high level code that models real life -- if the value is someone's salary or the number of toasters they are buying from a store, you are probably fine -- but a lot of low level software (ipv4 stacks, video encoders, databases, etc.) is based on designs that require numbers to behave a certain way, and losing a bit is going to be a pain. I've run into this with Java, which lacks unsigned types, and once you run into a case that needs that extra bit it gets annoying right quick.

Kevin
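A concrete instance of the 'standard 64 bit hash' case is FNV-1a, a widely used hash whose constants and intermediate values need all 64 unsigned bits:

```d
// FNV-1a: the multiply overflows into the top bit constantly, so a
// signed 64-bit type (as in Java) forces you to fake unsigned semantics.
ulong fnv1a(const(ubyte)[] data)
{
    ulong h = 0xcbf29ce484222325UL;  // FNV offset basis
    foreach (b; data)
    {
        h ^= b;
        h *= 0x100000001b3UL;        // FNV prime
    }
    return h;
}

void main()
{
    // Hashing no bytes leaves the offset basis untouched.
    assert(fnv1a([]) == 0xcbf29ce484222325UL);
}
```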
Re: is there any way to get a list of classes that inherit a class?
I don't know if you can find all of them easily, but you can find the instantiated ones by adding a line to the Foo constructor, as shown here. A few limits:

1. This doesn't report Bar itself, since a Bar object is never created; however, in a sense a 'Bar' object was created when Baz and Qux were created. Since you know how to get the parent of a type, you should be able to fix this if desired.

2. As mentioned, you can't get the non-instantiated classes this way -- it only detects classes as 'new' is called on them. By the way, this wouldn't work in C++, because in C++ object identity changes as the successive constructors are called -- it would just report Foo.

3. Of course, you could instead add a pure virtual function to the class...

testrtti.d:

import std.stdio;

int[string] fooTypes;

class Foo {
    this() { fooTypes[this.classinfo.name] = 1; }
}

class Bar : Foo { }
class Baz : Bar { }
class Qux : Baz { }

int main() {
    Foo a = new Foo;
    Foo b = new Qux;
    Bar f = new Baz;
    foreach (key, value; fooTypes) {
        writefln("foo subtype: %s", key);
    }
    return 0;
}

Output:

foo subtype: testrtti.Foo
foo subtype: testrtti.Baz
foo subtype: testrtti.Qux

Kevin
Re: tooling quality and some random rant
our famous Reddit trolls, that is retard = uriel = eternium = lurker In case anyone doubts gay's guess... for those who don't follow entertainment trivia, Alan Smithee is a pseudonym used by directors disowning a film (google it). So anyone using this name is effectively *claiming* to be an imposter. K
Re: tooling quality and some random rant
Sorry, this was a completely unintentional error -- I meant to say "in case anyone doubts Gary's post". Blame the lateness of the night and/or my annoyingly lossy wireless keyboard. Kevin
Re: What's C's biggest mistake?
bearophile Wrote:
Walter Bright: 3. The glaring fact that std::vector<char> and std::string are different suggests something is still wrong.
[bearophile:] In an array/vector you want O(1) access time to all items (ignoring RAM-cache access/transfer delays), while in a string with a variable-width Unicode encoding that can be hard to do. So they look like two different data structures. Bye, bearophile

Yeah, I think the charset thing was probably the main reason for the string/vector split -- that, and the desire to have special properties like conversion from char* that wouldn't be in vector. Using basic_string<T> with locales is something of a historical wart, because with Unicode, getting your charset from your locale is somewhat obsolete for general purpose computers. (Maybe very small profile systems will continue to use ascii or the code page of whatever culture built them.) But I don't think C++'s string can be made to index by character unless you use wchar_t for the T in basic_string<T>. I don't think string.size() is ever anything but a byte or wchar_t count.

Kevin
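D's string has the analogous property -- indexing and .length count code units, not characters (a quick sketch):

```d
void main()
{
    string s = "héllo";       // 'é' takes two UTF-8 code units
    assert(s.length == 6);    // .length counts code units (bytes)
    size_t chars = 0;
    foreach (dchar c; s)      // iterating by dchar decodes the UTF-8
        ++chars;
    assert(chars == 5);       // five characters
}
```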
Re: What's C's biggest mistake?
BCS Wrote:
Hello Walter, BCS wrote: I guess my point is that aside from VERY resource limited systems, almost no one will have C as their first choice. Even with those limited systems I'd bet that most people would rather be working in something else if they could. That said, there are many places where it ends up being the lingua franca.
[Walter:] I still think you're underestimating C's audience. Consider the Linux effort -- they can choose any implementation language they want, and they choose C. They aren't forced into C.
[BCS:] Yes, C has a wide audience (anyone who says differently is selling something). But I still think that there are very few cases where C will be chosen for reasons other than its purely technical merits. If C++/Java/C#/python/whatever would have done just as good a job in Linux as C, I'd almost bet that Linux wouldn't have been written in C. My point isn't that C is never the right choice (as that is clearly false), but that when C is chosen, it's (almost) always for technical reasons rather than aesthetic ones (where it is merely good enough).

I would say these are the technical merits of C that get it chosen these days:

1. The new code they're writing will be part of a large body of existing C code which they don't have time, permission, or inclination to convert to C++.

2. They need to be aware of every tiny low level detail anyway, so having the language do too many things for you is not the desired approach (security, O/S, and embedded work).

3. C has a real ABI on almost every platform; therefore, C is chosen for most inter-language work, such as writing python modules.

But some people really *are* choosing C for aesthetics. Linus Torvalds, bless his little world dominating heart, chose C for a normal app (git), and he cited that the existence of operator overloading in C++ is bad because it hides information -- e.g. in the general case you never know what an expression is actually doing. I think this can be seen as mainly an aesthetic choice. Avoiding a language because it *supports* information hiding (which is what I think operator overloading is) is not really an 'economic' tradeoff, since you could choose not to hide information by not using those features. He'd just rather not be in the vicinity of language features that make those kinds of choices, because they seem wrong to him (and because he wants to keep C++ies out of his code, I think). Some people want their language to have a WYSIWYG relationship with the generated assembler code (if I'm right, it does seem consistent with him being an OS developer).

I also know some scientists and mathematicians who use C rather than C++. I think the reason is that by using a simpler language they can know everything about the language. I think the sooner they can 'get the computer science stuff out of the way', the sooner they can focus on what they see as the domain issues. (I think once the program gets big enough, the CompSci aspects reassert themselves, and scalability and maintainability issues begin to bite you in the rear.)

Kevin
Re: Go rant
Don Wrote: retard wrote: ... I'd say it's easier. If you watch someone sorting some cards, they'll use either insertion sort or selection sort. Nobody should have ever heard of bubble sort, I'm pleased to hear some schools aren't mentioning it. Such a foolish algorithm. the bubble sort seems to have nothing to recommend it, except a catchy name - Knuth. (Non-software) people doing routine tasks often come up with better algorithms intuitively than my intuition expects them to. I think a lot of people would do even better than insertion with a deck of poker cards -- they might group cards by either suit or rank (or rank groups) (e.g. Hmm, I'll make three piles of 1-5, 6-10, and J-A), then order the buckets, then stick these ordered sets back together. If you think about it, this is a lot like a radix sort or a multi-pivot cousin of the quick sort. I think when you ask people to do a computer version of how they would sort, they do worse than this. It's so hard to communicate anything complicated to a computer that they instinctively dumb it down and do insertion or bubble. Insertion is what you would manually use when dealing with adding a few new elements to a short stack, whereas something like a bubble might be what you would use manually when it's hard to jump around among the elements of the sequence (e.g. if you have a bunch of see also references embedded in dictionary entries and you need to sort within the see-also 'chain'). That approach (kind of) matches the claims often given by computer programmers that bubble sort is good for linked lists and/or only fixing a few things that are out of place in a mostly sorted list. When you have a pupil that is as hard to talk to and unmotivated to learn as a computer, the first attempt is to try to teach it anything at all and call that a success if it works. A more eager human pupil tries to meet you half way, so you naturally respond and take pleasure in teaching them more and more advanced stuff. 
If you're really hard-headed and immune to tedium, you keep trying with even a super-dumb pupil like a CPU. You keep going in for the hard case so many times that you eventually major in CompSci out of habit. Kevin
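The card-pile strategy described above (make a few piles by range, order each pile, stack them back up) is essentially a bucket sort. A minimal sketch, assuming integer keys; the function name, bucket count, and key range are my own illustrative choices:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// The "three piles" idea as a bucket sort: distribute values into a fixed
// number of buckets by range, sort each bucket, then concatenate.
std::vector<int> bucketSort(const std::vector<int>& cards,
                            int minKey, int maxKey, int numBuckets)
{
    std::vector<std::vector<int>> buckets(numBuckets);
    const double span = double(maxKey - minKey + 1) / numBuckets;

    for (int c : cards) {
        int b = int((c - minKey) / span);   // which "pile" this card goes in
        buckets[b].push_back(c);
    }
    std::vector<int> result;
    for (auto& b : buckets) {
        std::sort(b.begin(), b.end());      // order each pile
        result.insert(result.end(), b.begin(), b.end());
    }
    return result;
}
```

With evenly distributed keys each bucket stays small, which is roughly why the manual version feels fast: most of the comparisons happen within tiny piles.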
Re: Go rant
dsimcha Wrote: == Quote from Kevin Bealer (kevinbea...@gmail.com)'s article (Non-software) people doing routine tasks often come up with better algorithms intuitively than my intuition expects them to. I think a lot of people would do even better than insertion with a deck of poker cards -- they might group cards by either suit or rank (or rank groups) (e.g. Hmm, I'll make three piles of 1-5, 6-10, and J-A), then order the buckets, then stick these ordered sets back together. If you think about it, this is a lot like a radix sort or a multi-pivot cousin of the quick sort. You mean a bucket sort? http://en.wikipedia.org/wiki/Bucket_sort More or less, though I think human beings use algorithms in a more artistic way than sticking to any one algorithm. I'm curious if the multi-pivot quicksort (I think everyone gets what I mean by this? Divide by more than one pivot on each pass? I can give details if you like ...) has been tried out much. It seems like it must have been, but it also seems like something that would have cache-awareness advantages that would not show up in the simplified comparison-counting way of thinking about efficiency. Kevin
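For concreteness, here is a hedged sketch of the "divide by more than one pivot on each pass" idea: a simplified dual-pivot quicksort, similar in spirit to the scheme Yaroslavskiy's version later brought to the JDK's primitive sorts. Two pivots p <= q split each pass into three regions (< p, between, > q); this is an illustration, not a tuned implementation:

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Dual-pivot quicksort over the inclusive range a[lo..hi].
void dualPivotSort(std::vector<int>& a, int lo, int hi)
{
    if (lo >= hi) return;
    if (a[lo] > a[hi]) std::swap(a[lo], a[hi]);
    int p = a[lo], q = a[hi];              // the two pivots, p <= q
    int lt = lo + 1, gt = hi - 1, i = lo + 1;
    while (i <= gt) {
        if (a[i] < p)       std::swap(a[i++], a[lt++]);   // goes left of p
        else if (a[i] > q)  std::swap(a[i], a[gt--]);     // goes right of q
        else                ++i;                          // stays in the middle
    }
    std::swap(a[lo], a[--lt]);             // drop pivots into final position
    std::swap(a[hi], a[++gt]);
    dualPivotSort(a, lo, lt - 1);          // region < p
    dualPivotSort(a, lt + 1, gt - 1);      // region between p and q
    dualPivotSort(a, gt + 1, hi);          // region > q
}
```

One pass over the data yields three subproblems instead of two, which is part of why the cache behavior differs from what plain comparison counting predicts.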
Re: Go rant
Rainer Deyke Wrote: Kevin Bealer wrote: I think a lot of people would do even better than insertion with a deck of poker cards -- they might group cards by either suit or rank (or rank groups) (e.g. Hmm, I'll make three piles of 1-5, 6-10, and J-A), then order the buckets, then stick these ordered sets back together. If you think about it, this is a lot like a radix sort or a multi-pivot cousin of the quick sort. When sorting a deck of cards, insertion sort performs in O(n log n), the same as the best-case performance of a quick sort. When sorting arrays, you can find the insertion point in O(log n), but the actual insertion takes O(n). With a deck of cards, you can perform the insertion in O(1). There is absolutely no point in using quick sort for a deck of cards. Only the non-comparison-based sorts (radix sort, bucket sort) can do better than insertion sort. -- Rainer Deyke - rain...@eldwood.com Right, if you mean strictly sorting a physical deck of cards by hand. Insertion sort isn't efficient in a computer because you need to find the insertion point quickly *and* insert quickly to keep the efficiency, but no simple data structure does both quickly. For linked lists, insert is quick (O(1)) but finding the point is slow (O(n), about n/2 steps on average); for arrays, finding is quick (O(log n) with binary search) but insert is slow (O(n) to shift elements over). Either choice sticks you with a slow operation. Kevin
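To make the array half of that trade-off concrete, here is a small binary insertion sort sketch: the search for the insertion point is O(log n), but making room still shifts O(n) elements, so the sort as a whole stays O(n^2):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Insertion sort over an array, using binary search for the insert point.
void binaryInsertionSort(std::vector<int>& a)
{
    for (size_t i = 1; i < a.size(); ++i) {
        int key = a[i];
        // O(log n): locate where key belongs in the sorted prefix a[0..i)
        auto pos = std::upper_bound(a.begin(), a.begin() + i, key);
        // O(n): shift everything after pos one slot right, then drop key in
        std::move_backward(pos, a.begin() + i, a.begin() + i + 1);
        *pos = key;
    }
}
```

The `move_backward` line is exactly the cost a physical card insertion doesn't pay: with real cards you just wedge the card in.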
Re: algorithms that take ranges by reference
Andrei Alexandrescu Wrote: The grand STL tradition is to _always_ take iterators by value. If iterators are computed, they are returned by the algorithm. My initial approach to defining std.algorithm was to continue that tradition for ranges: ranges are values. No algorithm currently takes a range by reference. There are a couple of simple functions that emphatically do take ref, namely popFrontN and popBackN in std.range. It is becoming, however, increasingly clear that there are algorithms that could and should manipulate ranges by reference. They might take and return values, but it's just too messy to do so. (Cue music for the improve built-in tuples choir.) A concrete case is text processing. Many contemporary libraries use index-based processing, but that's difficult when handling multibyte characters. To address that, bidirectional ranges are one correct way to go. Phobos defines a ByCodeUnit range that spans a string correctly, one dchar at a time. With that, you can write: ... Andrei I would vote yes -- I've used this technique (with my own character slice classes in C++) and it is a great idiom to work with. I think ranges (and slices) have some of the properties of each of pointers, containers, and streams. A stream is always a by-ref kind of thing unless you are in a language that needs monads etc. Let me suggest one more function I've found very useful: bool readUntil(R1, R2, R3)(ref R1 input, R2 delim, ref R3 result) { // Like findSkip, but returning the intervening text. } Then you can write something like: bool consumeQuotedString(R1, R2)(ref R1 text, ref R2 quoted) { if (skip(text, '"')) { if (! readUntil(text, '"', quoted)) { /* error */ } return true; } return false; } Kevin
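A C++ sketch of the readUntil idea, with my own names rather than anything from Phobos: a std::string_view taken by reference plays the role of a D range passed by ref, so consuming input visibly advances the caller's view:

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Consume input up to (and including) the first delimiter, storing the
// intervening text in result. Returns false, leaving input untouched,
// if the delimiter never appears.
bool readUntil(std::string_view& input, char delim, std::string& result)
{
    size_t pos = input.find(delim);
    if (pos == std::string_view::npos)
        return false;                            // delimiter never found
    result = std::string(input.substr(0, pos));  // the intervening text
    input.remove_prefix(pos + 1);                // consume text + delimiter
    return true;
}
```

Because the view is a by-ref "cursor" into the original string, a parser built from calls like this reads like stream code while the underlying data is never copied.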
Re: What's wrong with D's templates?
dsimcha Wrote: == Quote from Yigal Chripun (yigal...@gmail.com)'s article but even more frustrating is the fact that template compilation bugs will also happen at the client. There's a whole range of designs for this and related issues and IMO the C++ design is by far the worst of them all. Not to mention the fact that it isn't an orthogonal design (like many other features in C++). I'd much prefer a true generics design to be separated from compile-time execution of code with e.g. CTFE or AST macros, or other designs. Since generics work by basically casting stuff to Object (possibly boxing it) and casting back, I wonder if it would be easy to implement generics on top of templates through a minimal wrapper. The main uses for this would be reducing executable bloat (for those that insist that this matters in practice) and allowing virtual functions where templates can't be virtual. In C++ you could define MyObject and MyRef (a smart pointer to Object) types that implement the methods you need (like comparisons) as virtual functions that are either pure or throw exceptions. Then just define a non-template class that inherits from (or just aggregates and wraps) a vector<MyRef> and a map<MyRef, MyRef>. Now you can use this map and vector code to build whatever solution you need, using dynamic_cast<T> to ensure the types are what you want. If you want, you can throw a type-safe wrapper that uses dynamic_cast<T> around this; as long as all the methods of that class are inlined, it should add no extra bloat. Now you're back at the Java level of expressiveness. I wonder, though, if you actually save anything. Since vector<T>::operator[](size_t) is just an array index that will get inlined into your code, I think it should produce much *less* code bloat in your executable than a function call (to a virtual MyRef vector<MyRef>::operator[](size_t)) plus dynamic casts to figure out if all the types are castable. Movie Poster: Dr. Stroustrouplove: or How I Stopped Worrying and Learned to Love the Bloat.
As for mixing template code and client code, is it that big of a deal in practice? If you are making something big enough to be worth patenting, won't most of the heavy-lifting classes probably not be templates? Unless you are marketing template/container libraries specifically, I guess... Personally I'm reluctant to buy that sort of thing from someone if I *can't* see the source. If I need a class like vector, in practice I need to be able to dig into its methods to see what they do, e.g. is it really guaranteed that storage is contiguous, etc. On the other hand, if I were buying code to print a Word document to a PDF file, I could accept that I don't need to look into the code. But something like vector or map ends up getting so intimate with the rest of my design that I want to know how it works in detail just as an end-user. (Yeah, yeah, I know encapsulation says I shouldn't have to.) I think performance outweighs the needs of any particular business model, especially when you can do your own Object-based version if you need to. I agree there should be ways to hide the implementation from the client, but if templates aren't one of them, then just factor that into where and how you use templates and when you use OO and virtual instead. It's enough that it's possible and practical to hide your impl; you don't need *every* language feature to make that its #1 priority. The performance / impl-hiding conflict is a fundamental problem -- if the user's compiler can't see the template method definitions, then it can't optimize them very well. If it can, then the user can too. Any method of compiling them that preserves enough info for the compiler to work with will probably be pretty easily and cleanly byte-code-decompilable. Kevin
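As a hedged sketch of the Object-based container idea from the previous post (MyObject, MyRef, and TypedVec are illustrative names, not a real library): one shared vector<MyRef> instantiation does all the storage, and a fully inlined typed facade does the checked casts, so the type-safe layer adds essentially no code of its own:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// The shared "Object" root; comparisons etc. would be virtuals here.
struct MyObject {
    virtual ~MyObject() = default;
};
using MyRef = std::shared_ptr<MyObject>;

// An example payload type living in the MyObject hierarchy.
struct MyString : MyObject {
    std::string value;
    explicit MyString(std::string v) : value(std::move(v)) {}
};

// Type-safe facade: every method is a trivial inline forwarder, so the
// only real instantiation in the binary is vector<MyRef>.
template <class T>
class TypedVec {
    std::vector<MyRef> items_;               // the one shared representation
public:
    void push_back(std::shared_ptr<T> p) { items_.push_back(std::move(p)); }
    std::shared_ptr<T> at(size_t i) const {
        auto p = std::dynamic_pointer_cast<T>(items_.at(i));
        assert(p && "stored object was not of the expected type");
        return p;
    }
    size_t size() const { return items_.size(); }
};
```

This is the Java-generics shape built on one template instantiation; the trade-off discussed above is that every element access now pays a dynamic cast instead of an inlined array index.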
Re: What's wrong with D's templates?
yigal chripun Wrote: of the three above I think option 3 is the worst design and option 2 is my favorite design. I think that in reality you'll almost always want to define such an interface and I really can't think of any useful use cases for an unrestricted template parameter as in C++. I think there are some. In C++, using operator<< to write a type T to an output stream is common. Similarly, you could do something like dividing (x.size() + 0.0) / x.capacity() to find the ratio of internal fragmentation, assuming most of the containers in question have a capacity method. Another common example is creating a type like Optional<T>, which wraps up a boolean plus a T to allow you to have optional values, e.g. the "this value is not set" concept. template<class T> class Optional { T x; bool have; public: Optional(bool have1 = false) : have(have1) {} Optional(T x1) : x(x1), have(true) {} bool haveValue() { return have; } T get() { assert(have); return x; } }; Which is useful for passing around lists of configuration options etc. Essentially, any time you want to leverage a property common to a lot of different objects with the same API across all of them, but don't want to (or can't) build that property into an inheritance diagram. Often in C++ the reason you can't or don't add a virtual method (toString) to solve the same problem is either that (1) you don't have access to the base class code, or (2) it's a template and you can't have virtual templates. As Bill Clinton said, "It depends on what the meaning of IS-A is." Often a bunch of types have a kind of unrecognized IS-A relationship -- they all have some common characteristic that is not recognized as common (by the type system) until you come along to write code that deals with that characteristic. (But if the T code is changing, it's fragile to do this.) Kevin
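As a small illustration of the capacity-ratio example above: a function template that works on any type with size() and capacity() methods, with no shared base class or declared interface (the name fragmentation is my own):

```cpp
#include <cassert>
#include <vector>

// Works for std::vector, std::string, or any user type exposing
// size() and capacity() -- the "unrecognized IS-A" in action.
template <class C>
double fragmentation(const C& c)
{
    if (c.capacity() == 0) return 0.0;   // avoid dividing by zero
    return double(c.size()) / c.capacity();   // in-use fraction of storage
}
```

The template parameter is unrestricted: nothing in the type system says these containers are related, yet the code compiles against any of them that happens to fit.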
Re: Go rant
Walter Bright Wrote: dsimcha wrote: == Quote from Walter Bright (newshou...@digitalmars.com)'s article qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs) prominently featured in Haskell's official introduction page: http://www.haskell.org/haskellwiki/Introduction#Quicksort_in_Haskell Ain't that cool? But look closer. The advantage of quicksort is that it is an *in-place* sort. The Haskell one creates new arrays for every stage in the sort. I think this can be optimized out in theory, though I have no idea how often it is in practice. I'm very doubtful it can be optimized out in theory. I'll bet it isn't done in practice. Ironically, I think you could write a nearly identical approach in D. By using array slicing, you could get something like Haskell's elegant syntax but still be sorting the array in place. The other fundamental problem with the Haskell version is that it selects the first array element as the pivot. This is the worst choice, since arrays to be sorted tend to already be mostly sorted, and quicksort gives quadratic performance when sorting an already-sorted array with the first element as the pivot. True. All you'd need to remedy this, though, is a median-of-3 function. Then it's not looking so simple and elegant anymore :-)
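A hedged sketch of the "nearly identical approach" point, in C++ with pointer ranges standing in for D slices: std::partition plays the role of Haskell's two filters, but the recursion works on sub-ranges of the same array, in place, and the pivot is taken from the middle rather than the front:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// In-place quicksort over the half-open range [lo, hi).
void quicksort(int* lo, int* hi)
{
    if (hi - lo < 2) return;
    int pivot = lo[(hi - lo) / 2];         // middle element, not the first
    // Two in-place partitions mirror the two filters: < pivot, == pivot.
    int* mid1 = std::partition(lo, hi, [&](int x) { return x < pivot; });
    int* mid2 = std::partition(mid1, hi, [&](int x) { return x == pivot; });
    quicksort(lo, mid1);                   // "slice" of elements < pivot
    quicksort(mid2, hi);                   // "slice" of elements > pivot
}
```

The shape of the code is the same three-part split as the Haskell one-liner, but no new arrays are ever allocated: each recursive call just sees a narrower window onto the original storage.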
Re: D growing pains (was Re: The Demise of Dynamic Arrays?!)
Walter Bright Wrote: Kevin Bealer wrote: To smooth this out, it would help to have the best practices for doing common things in D (e.g. serialization, logging) somewhat documented for the consumption of non-experts. I wonder what a good way of doing this is? It's really impossible to predict what the best practices would be. Only time and usage will tell. The best practices for both C and C++ have evolved significantly over time. I think D2 is so different from D (and in some important ways, from everything out there) that a new usage style would have needed to be created even if D was already the lingua franca. The question that I find myself thinking is -- is chaos better or is it better to try to establish a particular style of programming (however simply or poorly) in the original books and resources that a programmer will find upon discovering D? Chaos has its appeal of course, but others would say that any initial style that will stick is better than no style. In other words, that you cannot evolve until you cohere. It's hard to choose... On the other hand I suppose Phobos is consistent enough with itself that if people follow that they will have a launch point. Kevin
D growing pains (was Re: The Demise of Dynamic Arrays?!)
retard Wrote: Tue, 15 Dec 2009 09:42:26 -0500, Steven Schveighoffer wrote: Most likely Walter won't even tell what kind of arrays D2 will have. Anything can happen. The final word is the undocumented executable you can download when the book hits stores. Even then dmd's behavior isn't the final word, it might as well be a bug. I find following D these days to be interesting but occasionally frustrating. It's a little like talking to a teenager who is growing so fast intellectually that his opinions change every day. You can't really argue with him because you don't know where he stands on anything if it's been more than a few days since the last conversation. But I think this is clearly a complaint that actually works out to an amazing compliment, e.g. that the language is still developing and maturing so much, and the changes seem well thought out, too. Congrats, Walter and Andrei! (If a teenager was already stuck in his ways mentally at 15 yrs old, it would be a much worse thing!) I think the new const regime, thread-local data as the default, easy code generation, opDispatch etc etc have enormous power and newness. For this reason I think that the stable set of best practices for working in D2 will take a while to settle down unless there is a lot of effort put into designing them somehow (if that's even possible). I find it hard to imagine what the average D programming style might look like a few years out in a 'typical work place'. Maybe this settling-out can't happen until a lot of D code is traded among both D experts and also more middle level developers. It seems like the best practices and expected usage for C++ and Java are often driven by the friction between people who are at different skill levels. The best (or just most common) practices seem to follow the don't surprise me principle, which probably doesn't coalesce until people have built up some common sets of expectations. 
Mid level developers (the rank and file of any language community) hate to work with code that is too clever to easily navigate through. I think best practices and style guidelines often tend (intentionally or not) to become a bulwark against too much invention and cleverness -- a trade-off of brain cycles to read the code versus expressiveness / compactness / performance / perfectionism. D's feature set has the ability to do enormously clever things. The rank and file maintenance programmer will probably prefer these things at least modularized and packed off into controlled sections of the code. This isn't a criticism; it's more like a housekeeping thing, e.g. Thomas Edison's friends probably preferred the chemical batteries and various sharp items to be stored in the cabinets when they stopped by for tea. In general, a house is better suited for company if the sharp and useful things are put away. Likewise a code-base is probably more suitable for visits by coders unfamiliar with its peculiarities when the use of advanced features is somehow kept under control or at least marked as special and surrounded by velvet ropes. To smooth this out, it would help to have the best practices for doing common things in D (e.g. serialization, logging) somewhat documented for the consumption of non-experts. I wonder what a good way of doing this would be? Kevin
private (aka no-inherit) implementations for public methods
I think I have a solution to some problems that occur in OO design. The problem occurs when a method can be implemented to provide great functionality for a specific class, but is not valid for its subclasses without extra work. The classic example in C++ would be serialize: void serialize(ostream&); A class can always implement serialize(), but its children will have to override the method with their own definition, which will probably call parent::serialize(), then go on to copy any added fields. If they forget, the class does not return the right data (it loses the extra fields). The solution: even though the method is public, make it possible to mark the implementation as non-inherited. In other words, B will inherit the signature for serialize() from A, but if the user attempts to call B::serialize() they had better implement it -- such a call will not resolve to A::serialize(). If B wants to just use A's version, it can call A.serialize() in its own method body. It's not a private method; it just requires developer attention one way or the other to derive from the class. I think this would make inheritance in OO design a lot less brittle. Note that constructors (in C++ anyway) already have this property -- they aren't inherited by default (except the default constructor). This is because methods like serialize, toString(), constructors, opCmp, etc. are 'holistic' in the sense that they address all elements of a class, and it is probably an error if they miss one. Other methods like increaseRefCount() or getName() are field-wise operations that deal with a few select member variables. Kevin
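C++ has no "public but non-inherited implementation" marker, but a pure virtual method in an interface approximates the intent: every concrete class must supply its own serialize(), and reusing the parent's work stays an explicit, visible call. A hedged sketch with illustrative names:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// The "holistic" operation every class must answer for itself.
struct Serializable {
    virtual void serialize(std::ostream& os) const = 0;
    virtual ~Serializable() = default;
};

struct A : Serializable {
    int x = 1;
    void serialize(std::ostream& os) const override { os << "x=" << x; }
};

struct B : A {
    int y = 2;
    // If B omitted this override, it would silently inherit A::serialize
    // and lose y -- exactly the bug the proposed "no-inherit" marker
    // would turn into a compile-time error.
    void serialize(std::ostream& os) const override {
        A::serialize(os);                  // deliberate reuse of A's part
        os << " y=" << y;
    }
};
```

The difference from the proposal is that here only the topmost pure virtual forces an implementation; once A provides one, B's override is again optional, which is the brittleness the no-inherit marker would fix.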