Re: Questions about Unicode, particularly Japanese
On 2010-06-08 23:16, Nick Sabalausky wrote:
> "Matti Niemenmaa" wrote in message news:hum6ft$2ja...@digitalmars.com...
>> On 2010-06-08 22:27, Nick Sabalausky wrote:
>>> 6. Are there other languages with similar things for which the answers
>>> to #3 and #4 are different? (And if so, how does Phobos/Tango handle
>>> it?)
>>
>> Factor has pretty good support for Unicode:
>> http://docs.factorcode.org/content/article-unicode.html
>
> Actually, I meant other human languages. Like, are there other combining
> characters for some language other than Japanese that are intended to be
> compared as unequal to their corresponding single-code-point version?

Ah, sorry for the misunderstanding. :-) I don't think so, no. The Unicode
FAQ at http://www.unicode.org/faq/normalization.html says "Programs should
always compare canonical-equivalent Unicode strings as equal".

> Any idea if "Ruby markup" has anything to do with the Ruby programming
> language? It's not clear from that Wikipedia article.

No, they're completely unrelated.
Re: Questions about Unicode, particularly Japanese
On 2010-06-08 22:27, Nick Sabalausky wrote:
> 1. Am I correct in all of that?

Yes. In particular, the three-byteness of CJK characters is an often-cited
reason to use UTF-16 instead of UTF-8.

> 2. Is there a proper way to encode that modifier character by itself? For
> instance, if you wanted to write "Japanese has a (the modifier by itself
> here) that changes a sound".

You can combine it with a space, but yes: that mark, called the dakuten or
voicing mark, can be encoded by itself as U+309B. I recommend
http://rishida.net/scripts/uniview/ for searching through Unicode.

> 3. A text editor, for instance, is intended to treat something like
> (U+305D, U+3099) as a single character, right?

Yes, I'd say so. I suppose it could allow for removing only the modifier
(or the modified), but that doesn't seem like it should be the default
behaviour.

> 4. When comparing strings, are (U+305E) and (U+305D, U+3099) intended to
> compare as equal?

Yes. You might want to read about equivalence and normalization in Unicode:
http://en.wikipedia.org/wiki/Unicode_equivalence

> 5. Does Phobos/Tango correctly abide by whatever the answer to #4 is?

AFAIK, neither supports normalization of any kind.

> 6. Are there other languages with similar things for which the answers to
> #3 and #4 are different? (And if so, how does Phobos/Tango handle it?)

Factor has pretty good support for Unicode:
http://docs.factorcode.org/content/article-unicode.html

> 7. I assume Unicode doesn't have any provisions for Furigana, right? I
> assume that would be outside the scope of Unicode, but I thought I'd ask.

There's:

    U+FFF9 INTERLINEAR ANNOTATION ANCHOR
    U+FFFA INTERLINEAR ANNOTATION SEPARATOR
    U+FFFB INTERLINEAR ANNOTATION TERMINATOR

But it's usually recommended to use some kind of ruby markup instead. See:
http://en.wikipedia.org/wiki/Ruby_character#Ruby_in_Unicode
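For illustration, the #4 comparison can be checked mechanically. A minimal
sketch, assuming present-day D2 and std.uni.normalize (neither Phobos nor
Tango had anything like it at the time of this thread):

    import std.uni : normalize, NFC;

    void main()
    {
        string precomposed = "\u305E";       // U+305E as one code point
        string decomposed  = "\u305D\u3099"; // U+305D plus combining dakuten

        // A raw code-unit comparison says the strings differ...
        assert(precomposed != decomposed);

        // ...but after canonical normalization they compare equal, as the
        // Unicode FAQ recommends.
        assert(normalize!NFC(precomposed) == normalize!NFC(decomposed));
    }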
Re: Perfect hashing for string switch
On 2010-01-27 15:17, bearophile wrote:
> BCS:
>> Have you compared it to a decision tree or lex-style state machine?
>
> I have now implemented that too, it was not an immediate thing to do (I
> have removed the versions 2 to 5 to reduce code size on codepad):
> http://codepad.org/zOmPeE13
>
> The results are good. Timings, ldc, seconds:
>
>   test1: 4.48  // normal string switch
>   test2: 2.98  // perfect hash
>   test3: 2.09
>   test4: 2.07
>   test5: 5.44  // AA. Tango AA opIn_r is bug-slow
>   test6: 1.18  // new result
>
> I hope this is enough. I have created that large finite state machine in
> D with a Python program :-)

Your test6 is invalid: it reads beyond the bounds of the given array. For
example, with the input "th" it will check whether the third character is
'i'; but the length of the array is only 2, so it shouldn't be doing that.
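The usual fix in a hand-rolled decision tree is to establish the length
before indexing. A hypothetical fragment (the codepad source is not
reproduced here) showing the guard that test6 is missing:

    bool startsWithThi(string s)
    {
        // Check the length first; only then is s[2] safe to read.
        return s.length >= 3 && s[0] == 't' && s[1] == 'h' && s[2] == 'i';
    }

    void main()
    {
        assert(!startsWithThi("th")); // never touches s[2] here
        assert(startsWithThi("this"));
    }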
Re: array operation a[] + b[] not implemented??
On 2010-01-18 00:42, Trass3r wrote:
> It is implemented in the runtime so why doesn't it work?
>
> /***
>  * Computes:
>  *     a[] = b[] + c[]
>  */
> T[] _arraySliceSliceAddSliceAssign_f(T[] a, T[] c, T[] b)
> ...
>
> void main()
> {
>     float[] a = [0.f, 0.5f, 1.0f, 1.5f, 2.0f, 2.5f];
>     float[] b = [0.f, 0.5f, 1.0f, 1.5f, 2.0f, 2.5f];
>     // float[] c = a[] + b[]; // <-- Array operation a[] + b[] not implemented
>     float[] d = a[] * 4.f + 1.f;
>     writeln(d); // <-- access violation when program is run
> }

This is Bug 3066: http://d.puremagic.com/issues/show_bug.cgi?id=3066

Your code is invalid: per the spec
(http://www.digitalmars.com/d/1.0/arrays.html), array operations are valid
only on the RHS of an assignment when there's a slice on the LHS. In your
case you're not performing an assignment at all, you're initializing, and
array operations are not valid initializers.

To fix, preallocate and then perform the array op:

    auto c = new float[a.length];
    c[] = a[] + b[];

    auto d = new float[a.length];
    d[] = a[] * 4.f + 1.f;
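Put together, a minimal self-contained version of the fix (assuming D2's
std.stdio for writeln):

    import std.stdio;

    void main()
    {
        float[] a = [0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 2.5f];
        float[] b = [0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 2.5f];

        // Preallocate the destinations, then assign into their slices:
        auto c = new float[a.length];
        c[] = a[] + b[];

        auto d = new float[a.length];
        d[] = a[] * 4.0f + 1.0f;

        writeln(c); // [0, 1, 2, 3, 4, 5]
        writeln(d); // [1, 3, 5, 7, 9, 11]
    }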
Re: Compiler: Size of generated executable file
On 2010-01-11 11:04, Walter Bright wrote:
> bearophile wrote:
>> I don't remember what --gc-sections is, but I guess it's something
>> different. The code removed during the LTO is for example unreachable
>> functions, or functions/methods that, once inlined, are called from
>> nowhere else, unused constants, etc. Here you can see an example on C
>> code (in D1 it's the same):
>> http://llvm.org/docs/LinkTimeOptimization.html
>> Anyway, currently the LDC project is mostly sleeping.
>
> Optlink does this too. It's old technology, been around since the 80's.
> Consider the following program:
>
>     int x;
>     void foo() { x++; }
>     int main() { return 0; }
>
> Compile it:
>
>     dmd foo -L/map
>
> Now look at the map file with:
>
>     grep foo foo.map
>     =
>     0004:0090  _D3foo12__ModuleInfoZ  00434090
>     0003:0004  _D3foo1xi              00433004
>     0003:0004  _D3foo1xi              00433004
>     0004:0090  _D3foo12__ModuleInfoZ  00434090
>     =
>
> and we see that _D3foo3fooFZv does not appear in it. Optlink does this
> by default, you don't even have to throw a switch.

_D3foo1xi, however, does appear in it, even though it's just as unused as
_D3foo3fooFZv. Why doesn't Optlink remove that? LLVM's LTO does.
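As an aside, since the quoted text wonders what --gc-sections is: it is
the GNU toolchain's version of the same dead-code stripping. It works per
section rather than per symbol, so each function and data item must first
be placed in its own section; roughly:

    gcc -ffunction-sections -fdata-sections foo.c -Wl,--gc-sections -o foo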
Re: Why not?
Uriel wrote:
> class Foo
> {
>     private Foo[] m_SomeData;
>
>     public this(int a, double b, string c) {}
>
>     public Foo append(Foo obj)
>     {
>         m_SomeData ~= obj;
>         return this;
>     }
> }
>
> void foo(Foo obj) {}
>
> void main()
> {
>     foo(1, 1.0, "1");
>
>     Foo obj = new Foo();
>     obj.append(1, 1.0, "1").append(2, 2.0, "2");
> }
>
> Why not implicitly cast these three parameters to a new Foo object? We
> know that foo should receive a Foo object, and we have a call with
> parameters which exactly match one of Foo's constructors. It could be
> nice syntactic sugar, and not very hard to implement, I think.

This feature already exists, you just need to declare append and foo a bit
differently:

    public Foo append(Foo obj...) { m_SomeData ~= obj; return this; }
    void foo(Foo obj...) {}
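Spelled out, a sketch of how that works (D's typesafe variadics: when the
last parameter is a class followed by '...', a call whose arguments match
one of the class's constructors constructs the instance implicitly). The
default constructor is added here because declaring any constructor
removes the implicit one:

    import std.stdio;

    class Foo
    {
        private Foo[] m_SomeData;

        public this() {}
        public this(int a, double b, string c) {}

        // append(1, 1.0, "1") implicitly becomes
        // append(new Foo(1, 1.0, "1")). Caveat: the spec allows the
        // implicit instance to be stack-allocated, so storing it away
        // as done here is risky in real code.
        public Foo append(Foo obj...)
        {
            m_SomeData ~= obj;
            return this;
        }
    }

    void foo(Foo obj...) {}

    void main()
    {
        foo(1, 1.0, "1");

        Foo obj = new Foo();
        obj.append(1, 1.0, "1").append(2, 2.0, "2");
    }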
Re: typedef: what's it good for?
Andrei Alexandrescu wrote:
> * typedef is hopelessly broken in very many ways
> * nobody noticed (i.e. no bugzilla reports), so probably nobody uses it

No Bugzilla reports? Here're just a few:

http://d.puremagic.com/issues/show_bug.cgi?id=632
http://d.puremagic.com/issues/show_bug.cgi?id=1335
http://d.puremagic.com/issues/show_bug.cgi?id=1344
http://d.puremagic.com/issues/show_bug.cgi?id=1595

I use typedefs of integral types in one project, mostly because I read the
D spec when I started out and thought it'd be a good idea, but because of
1335 and 1344 I eventually realized it wasn't such a good idea after all.
I do still use them, but I wouldn't miss them much if they were gone.
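For reference, a minimal sketch of the feature being discussed: a D1-style
typedef of an integral type, which creates a distinct type where alias
would only create a new name:

    typedef int UserId;

    void take(UserId id) {}
    void takeInt(int n) {}

    void main()
    {
        UserId id = cast(UserId) 42;
        take(id);    // exact type match
        takeInt(id); // fine: a typedef converts implicitly to its base type
        // The payoff: overloads and template instantiations on UserId
        // resolve separately from those on plain int.
    }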
Re: opPow, opDollar
Stewart Gordon wrote:
> Matti Niemenmaa wrote:
>> It's essentially because Haskell has separate type classes (kinda like
>> D interfaces... I won't go into that topic) for integers, fractional
>> numbers, and floating-point numbers. In D the types of those three
>> operators could be something like:
>>
>>     anyinteger    ^  (anyinteger    base, anyinteger exponent);
>>     anyfractional ^^ (anyfractional base, anyinteger exponent);
>>     anyfloating   ** (anyfloating   base, anyfloating exponent);
>
> You've merely expanded on what I'd already made out - it doesn't explain
> why these generic functions can't share the same name. Is it because
> Haskell doesn't support function overloading as D does, or for some
> other reason?

The former. Haskell does function overloading via type classes.

I think the reason why these functions can't have the same name is that
they should all have a single, well-defined type and value. If they're all
called 'pow', what is the type of pow? It can't have all three types at
once; that makes no sense. And what happens if I pass pow to a
higher-order function: which one does it end up calling? You'd need some
kind of notation to disambiguate. The developers of Haskell evidently
opted to simply force differently-typed values to have different names,
instead of being able to give them all the same name but then having to
qualify which one you mean whenever you use it. That'd pretty much amount
to them having different names anyway, I think.

Just to show that this quality of Haskell isn't very limiting in practice,
a somewhat tangential explanation of the way these exponentiation
functions are overloaded follows. The types of these functions in Haskell
are (read '::' as 'has type', the type after the last '->' as the return
value and the others as the parameters):

    (^)  :: (Num a, Integral b)        => a -> b -> a
    (^^) :: (Fractional a, Integral b) => a -> b -> a
    (**) :: (Floating a)               => a -> a -> a

The part before the '=>' is the class context, restricting the type
variables 'a' and 'b'. 'a' and 'b' can be any type at all, as long as they
satisfy the constraints. For instance, for (^), the base can be of any
numeric type, but the exponent must be integral, and the result is of the
same numeric type as the base. So when you're actually using the function,
you might be using it at any of the following types:

    (^) :: Integer -> Integer -> Integer
    (^) :: Float   -> Integer -> Float
    (^) :: Double  -> Int8    -> Double

As you can see, the functions are already overloaded, in a sense. What
Haskell does not support is 'overloading the implementation' the way
derivatives of C++ (or whatever language first came up with this) do: a
function cannot have different implementations for different types.
Instead, a type class defines certain methods that each type that is an
instance of it must implement. For example, (^) could be defined in terms
of (==), (*), and (-), like so:

    base ^ pow = if pow == 0 then 1 else base * (base ^ (pow-1))

(*) and (-) are methods of the Num class, and (==) belongs to a superclass
of Num, so we can infer the type of this as:

    (^) :: (Num a, Num b) => a -> b -> a

(The standard-library one restricts b to Integral, because this kind of
definition is obviously valid only for integer exponents.)

We now have a generic implementation of (^) that works for any two number
types. What we can't do is say that it should do something different for
certain types: its definition shows that it depends only on the methods
(==), (*), and (-), so if we want to change the behaviour of (^) we can do
so only by changing their behaviour. This is doable only by changing the
Num instance involved, which can only be done by changing the types in
question. The only things that can change their behaviour directly
depending on the types involved are class methods, which are defined
separately for each type. For instance, (-) is defined for Integers as
bignum subtraction, and (-) for Floats is some kind of built-in operation
which eventually compiles to an fsub on x86. In fact, (**) is a method of
the Floating class, and thus has a separate implementation for all
floating-point types.
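For contrast, a quick sketch of 'overloading the implementation' on the D
side, where one name can have a completely separate body per type, and the
argument types at each call site pick between them:

    import std.math : exp2, log2;

    // Two unrelated implementations sharing one name. Haskell's (^) has a
    // single body whose behaviour can vary only through the class methods
    // it calls; D simply dispatches on the parameter types.
    long pow(long base, uint exponent)
    {
        long result = 1;
        foreach (i; 0 .. exponent)
            result *= base;
        return result;
    }

    double pow(double base, double exponent)
    {
        return exp2(exponent * log2(base)); // valid for base > 0
    }

    void main()
    {
        assert(pow(2L, 10u) == 1024); // picks the integral version
        assert(pow(4.0, 0.5) == 2.0); // picks the floating-point version
    }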
Re: opPow, opDollar
Stewart Gordon wrote:
> Andrei Alexandrescu wrote:
>> Matti Niemenmaa wrote:
>>> Haskell has three exponentiation operators in the standard library:
>>> ^, ^^, and **. They are for non-negative integral exponents, integral
>>> exponents, and floating-point exponents respectively.
>>
>> I wonder whether that's an illustration of the power or of the failure
>> of function overloading. (Seriously.)
>
> I'm not sure either. I don't speak Haskell, but my guess is that ^ and
> ^^ were meant to cut out the confusion that would happen if
> Word32 ^ Word32 (what weird naming conventions Haskell has!) returned an
> integer type but Int32 ^ Int32 returned a floating point type. But why
> it needs yet another for floating-point exponents, I don't know. Maybe
> Haskell supports only IFTI rather than true function overloading.

It's essentially because Haskell has separate type classes (kinda like D
interfaces... I won't go into that topic) for integers, fractional
numbers, and floating-point numbers. In D the types of those three
operators could be something like:

    anyinteger    ^  (anyinteger    base, anyinteger exponent);
    anyfractional ^^ (anyfractional base, anyinteger exponent);
    anyfloating   ** (anyfloating   base, anyfloating exponent);

A noteworthy fractional is the Rational type, a ratio of two integral
values. Note that 0.5 is a valid Rational: it's 1/2. Note, still, that
0.5 ** 0.5 is no longer a valid Rational: it's the square root of 1/2.
This is why ^^ is separate: fractionals can be safely raised to integer
exponents, but if you take a fractional and raise it to a fractional
power, you might not get a fractional back.
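To make that concrete, a toy sketch (Rational here is hypothetical, not a
library type): raising a ratio to an integer exponent stays within the
type, which is exactly the guarantee that ^^ encodes:

    struct Rational { long num, den; }

    // (num/den) raised to an integer exponent: closed under the type.
    Rational rpow(Rational r, uint e)
    {
        auto acc = Rational(1, 1);
        foreach (i; 0 .. e)
        {
            acc.num *= r.num;
            acc.den *= r.den;
        }
        return acc;
    }

    void main()
    {
        assert(rpow(Rational(1, 2), 3) == Rational(1, 8)); // (1/2)^3 = 1/8
        // No rpow-like function can exist for a fractional exponent:
        // (1/2) ** (1/2) is the square root of 1/2, which is not a ratio
        // of integers.
    }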
Re: opPow, opDollar
Don wrote:
> Yes, ^^ hasn't been used for exponentiation before. Fortran used **
> because it had such a limited character set, but it's not really a
> natural choice; the more mathematically-oriented languages use ^.
> Obviously C-family languages don't have that possibility.

Haskell has three exponentiation operators in the standard library: ^, ^^,
and **. They are for non-negative integral exponents, integral exponents,
and floating-point exponents respectively.
Re: Number literals (Was: Re: Case Range Statement ..)
BCS wrote:
> Hello Andrei,
>
>> Derek Parnell wrote:
>>> On Tue, 07 Jul 2009 20:08:55 -0400, bearophile wrote:
>>>> Nick Sabalausky:
>>>>> why in the world is anyone defending the continued existence of
>>>>> "5." and ".5"?
>>>>
>>>> I'm for disallowing them; 5.0 and 0.5 are better. Anyone else
>>>> pro/against this idea?
>>>
>>> I would not complain if trailing dot and leading dot were disallowed.
>>
>> I think the question that should be asked is: would anyone complain if
>> they were kept? We have bigger rocks to move than that one.
>>
>> Andrei
>
> Dropping them makes lexing cleaner: strictly, "5..5" should lex as "5."
> ".5" based on maximal munch
> ( http://en.wikipedia.org/wiki/Maximal_munch ).

http://d.puremagic.com/issues/show_bug.cgi?id=1466
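Concretely, the case in question, as a sketch assuming a strict
maximal-munch lexer:

    void main()
    {
        int[] a = new int[10];
        // With "5." and ".5" in the grammar, maximal munch says "5..5"
        // tokenizes as the two float literals "5." and ".5", not as
        // 5 .. 5; lexers accept the slice below only by special-casing.
        auto s = a[5..5]; // an empty slice
    }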
Re: static this sucks, we should deprecate it
Steven Schveighoffer wrote:
> If we were importing compiled files (or even generated files), then the
> compiled file could have annotated the "static this" with the
> dependencies it has... I don't want to start another long thread on
> this, I understand Walter's "I want to use standard linkers" position.

I don't think that's an argument against this; you can always compile an
intermediate representation for purposes such as these in addition to the
standard object file. It's what the Haskell compiler GHC does, for
instance.
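A minimal reproduction of the problem such annotations would solve
(hypothetical module names): the two constructors below are independent,
but the runtime sees only the import graph, so it has to assume the worst.

    // a.d
    module a;
    import b;
    static this() { /* does not actually touch b */ }
    void main() {}

    // b.d
    module b;
    import a;
    static this() { /* does not actually touch a */ }

    // Running this aborts at startup with a cyclic-dependency error.
    // Dependency annotations in a compiled interface file would let the
    // runtime order the constructors, or skip ordering them entirely.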
Re: std.partition is fucked
Sean Kelly wrote:
> The sort I wrote for Tango uses the same basic heuristics, thanks to a
> ticket that either you or Stewart Gordon submitted long ago.

*Ahem*, I believe that http://www.dsource.org/projects/tango/ticket/571
was one of mine ;-)