Re: Questions about Unicode, particularly Japanese
On 2010-06-08 23:16, Nick Sabalausky wrote:
> "Matti Niemenmaa" wrote in message news:hum6ft$2ja...@digitalmars.com...
>> On 2010-06-08 22:27, Nick Sabalausky wrote:
>>> 6. Are there other languages with similar things for which the answers
>>> to #3 and #4 are different? (And if so, how does Phobos/Tango handle
>>> it?)
>>
>> Factor has pretty good support for Unicode:
>> http://docs.factorcode.org/content/article-unicode.html
>
> Actually, I meant other human languages. Like, are there other combining
> characters for some language other than Japanese that are intended to be
> compared as unequal to their corresponding single-code-point version?

Ah, sorry for the misunderstanding. :-) I don't think so, no. The Unicode
FAQ at http://www.unicode.org/faq/normalization.html says "Programs should
always compare canonical-equivalent Unicode strings as equal".

> Any idea if "Ruby markup" has anything to do with the Ruby programming
> language? It's not clear from that Wikipedia article.

No, they're completely unrelated.
Re: Questions about Unicode, particularly Japanese
On 2010-06-08 22:27, Nick Sabalausky wrote:
> 1. Am I correct in all of that?

Yes. In particular, the three-byteness of CJK characters is an often-cited
reason to use UTF-16 instead of UTF-8.

> 2. Is there a proper way to encode that modifier character by itself? For
> instance, if you wanted to write "Japanese has a (the modifier by itself
> here) that changes a sound".

You can combine it with a space, but yes: that mark, called the dakuten or
voicing mark, can be encoded by itself as U+309B. I recommend
http://rishida.net/scripts/uniview/ for searching through Unicode.

> 3. A text editor, for instance, is intended to treat something like
> (U+305D, U+3099) as a single character, right?

Yes, I'd say so. I suppose it could allow for removing only the modifier
(or the modified), but that doesn't seem like it should be the default
behaviour.

> 4. When comparing strings, are (U+305E) and (U+305D, U+3099) intended to
> compare as equal?

Yes. You might want to read about equivalence and normalization in Unicode:
http://en.wikipedia.org/wiki/Unicode_equivalence

> 5. Does Phobos/Tango correctly abide by whatever the answer to #4 is?

AFAIK, neither supports normalization of any kind.

> 6. Are there other languages with similar things for which the answers to
> #3 and #4 are different? (And if so, how does Phobos/Tango handle it?)

Factor has pretty good support for Unicode:
http://docs.factorcode.org/content/article-unicode.html

> 7. I assume Unicode doesn't have any provisions for Furigana, right? I
> assume that would be outside the scope of Unicode, but I thought I'd ask.

There's:

    U+FFF9 INTERLINEAR ANNOTATION ANCHOR
    U+FFFA INTERLINEAR ANNOTATION SEPARATOR
    U+FFFB INTERLINEAR ANNOTATION TERMINATOR

But it's usually recommended to use some kind of ruby markup instead. See:
http://en.wikipedia.org/wiki/Ruby_character#Ruby_in_Unicode
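For illustration, the #4 comparison can be checked mechanically. A minimal
sketch, assuming present-day D2 and std.uni.normalize (neither Phobos nor
Tango had anything like it at the time of this thread):

    import std.uni : normalize, NFC;

    void main()
    {
        string precomposed = "\u305E";       // U+305E as one code point
        string decomposed  = "\u305D\u3099"; // U+305D plus combining dakuten

        // A raw code-unit comparison says the strings differ...
        assert(precomposed != decomposed);

        // ...but after canonical normalization they compare equal, as the
        // Unicode FAQ recommends.
        assert(normalize!NFC(precomposed) == normalize!NFC(decomposed));
    }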
Re: Perfect hashing for string switch
On 2010-01-27 15:17, bearophile wrote:
> BCS:
>> Have you compared it to a decision tree or lex-style state machine?
>
> I have now implemented that too, it was not an immediate thing to do (I
> have removed the versions 2 to 5 to reduce code size on codepad):
> http://codepad.org/zOmPeE13
>
> The results are good. Timings, ldc, seconds:
>
>   test1: 4.48  // normal string switch
>   test2: 2.98  // perfect hash
>   test3: 2.09
>   test4: 2.07
>   test5: 5.44  // AA. Tango AA opIn_r is bug-slow
>   test6: 1.18  // new result
>
> I hope this is enough. I have created that large finite state machine in
> D with a Python program :-)

Your test6 is invalid: it reads beyond the bounds of the given array. For
example, with the input "th" it will check whether the third character is
'i'; but the length of the array is only 2, so it shouldn't be doing that.
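The usual fix in a hand-rolled decision tree is to establish the length
before indexing. A hypothetical fragment (the codepad source is not
reproduced here) showing the guard that test6 is missing:

    bool startsWithThi(string s)
    {
        // Check the length first; only then is s[2] safe to read.
        return s.length >= 3 && s[0] == 't' && s[1] == 'h' && s[2] == 'i';
    }

    void main()
    {
        assert(!startsWithThi("th")); // never touches s[2] here
        assert(startsWithThi("this"));
    }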
Re: array operation a[] + b[] not implemented??
On 2010-01-18 00:42, Trass3r wrote:
> It is implemented in the runtime so why doesn't it work?
>
> /***
>  * Computes:
>  *     a[] = b[] + c[]
>  */
> T[] _arraySliceSliceAddSliceAssign_f(T[] a, T[] c, T[] b)
> ...
>
> void main()
> {
>     float[] a = [0.f, 0.5f, 1.0f, 1.5f, 2.0f, 2.5f];
>     float[] b = [0.f, 0.5f, 1.0f, 1.5f, 2.0f, 2.5f];
>     // float[] c = a[] + b[]; // <-- Array operation a[] + b[] not implemented
>     float[] d = a[] * 4.f + 1.f;
>     writeln(d); // <-- access violation when program is run
> }

This is Bug 3066: http://d.puremagic.com/issues/show_bug.cgi?id=3066

Your code is invalid: per the spec
(http://www.digitalmars.com/d/1.0/arrays.html), array operations are valid
only on the RHS of an assignment when there's a slice on the LHS. In your
case you're not performing an assignment at all, you're initializing, and
array operations are not valid initializers.

To fix, preallocate and then perform the array op:

    auto c = new float[a.length];
    c[] = a[] + b[];

    auto d = new float[a.length];
    d[] = a[] * 4.f + 1.f;
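Put together, a minimal self-contained version of the fix (assuming D2's
std.stdio for writeln):

    import std.stdio;

    void main()
    {
        float[] a = [0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 2.5f];
        float[] b = [0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 2.5f];

        // Preallocate the destinations, then assign into their slices:
        auto c = new float[a.length];
        c[] = a[] + b[];

        auto d = new float[a.length];
        d[] = a[] * 4.0f + 1.0f;

        writeln(c); // [0, 1, 2, 3, 4, 5]
        writeln(d); // [1, 3, 5, 7, 9, 11]
    }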
Re: Compiler: Size of generated executable file
On 2010-01-11 11:04, Walter Bright wrote:
> bearophile wrote:
>> I don't remember what --gc-sections is, but I guess it's something
>> different. The code removed during the LTO is for example unreachable
>> functions, or functions/methods that, once inlined, are called from
>> nowhere else, unused constants, etc. Here you can see an example on C
>> code (in D1 it's the same):
>> http://llvm.org/docs/LinkTimeOptimization.html
>> Anyway, currently the LDC project is mostly sleeping.
>
> Optlink does this too. It's old technology, been around since the 80's.
> Consider the following program:
>
>     int x;
>     void foo() { x++; }
>     int main() { return 0; }
>
> Compile it:
>
>     dmd foo -L/map
>
> Now look at the map file with:
>
>     grep foo foo.map
>     =
>     0004:0090  _D3foo12__ModuleInfoZ  00434090
>     0003:0004  _D3foo1xi              00433004
>     0003:0004  _D3foo1xi              00433004
>     0004:0090  _D3foo12__ModuleInfoZ  00434090
>     =
>
> and we see that _D3foo3fooFZv does not appear in it. Optlink does this
> by default, you don't even have to throw a switch.

_D3foo1xi, however, does appear in it, even though it's just as unused as
_D3foo3fooFZv. Why doesn't Optlink remove that? LLVM's LTO does.
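As an aside, since the quoted text wonders what --gc-sections is: it is
the GNU toolchain's version of the same dead-code stripping. It works per
section rather than per symbol, so each function and data item must first
be placed in its own section; roughly:

    gcc -ffunction-sections -fdata-sections foo.c -Wl,--gc-sections -o foo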
Re: Why not?
Uriel wrote:
> class Foo
> {
>     private Foo[] m_SomeData;
>
>     public this(int a, double b, string c) {}
>
>     public Foo append(Foo obj)
>     {
>         m_SomeData ~= obj;
>         return this;
>     }
> }
>
> void foo(Foo obj) {}
>
> void main()
> {
>     foo(1, 1.0, "1");
>
>     Foo obj = new Foo();
>     obj.append(1, 1.0, "1").append(2, 2.0, "2");
> }
>
> Why not implicitly cast these three parameters to a new Foo object? We
> know that foo should receive a Foo object, and we have a call with
> parameters which exactly match one of Foo's constructors. It could be
> nice syntactic sugar, and not very hard to implement, I think.

This feature already exists, you just need to declare append and foo a bit
differently:

    public Foo append(Foo obj...) { m_SomeData ~= obj; return this; }
    void foo(Foo obj...) {}
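Spelled out, a sketch of how that works (D's typesafe variadics: when the
last parameter is a class followed by '...', a call whose arguments match
one of the class's constructors constructs the instance implicitly). The
default constructor is added here because declaring any constructor
removes the implicit one:

    import std.stdio;

    class Foo
    {
        private Foo[] m_SomeData;

        public this() {}
        public this(int a, double b, string c) {}

        // append(1, 1.0, "1") implicitly becomes
        // append(new Foo(1, 1.0, "1")). Caveat: the spec allows the
        // implicit instance to be stack-allocated, so storing it away
        // as done here is risky in real code.
        public Foo append(Foo obj...)
        {
            m_SomeData ~= obj;
            return this;
        }
    }

    void foo(Foo obj...) {}

    void main()
    {
        foo(1, 1.0, "1");

        Foo obj = new Foo();
        obj.append(1, 1.0, "1").append(2, 2.0, "2");
    }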
Re: typedef: what's it good for?
Andrei Alexandrescu wrote:
> * typedef is hopelessly broken in very many ways
> * nobody noticed (i.e. no bugzilla reports), so probably nobody uses it

No Bugzilla reports? Here're just a few:

http://d.puremagic.com/issues/show_bug.cgi?id=632
http://d.puremagic.com/issues/show_bug.cgi?id=1335
http://d.puremagic.com/issues/show_bug.cgi?id=1344
http://d.puremagic.com/issues/show_bug.cgi?id=1595

I use typedefs of integral types in one project, mostly because I read the
D spec when I started out and thought it'd be a good idea, but because of
1335 and 1344 I eventually realized it wasn't such a good idea after all.
I do still use them, but I wouldn't miss them much if they were gone.
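For reference, a minimal sketch of the feature being discussed: a D1-style
typedef of an integral type, which creates a distinct type where alias
would only create a new name:

    typedef int UserId;

    void take(UserId id) {}
    void takeInt(int n) {}

    void main()
    {
        UserId id = cast(UserId) 42;
        take(id);    // exact type match
        takeInt(id); // fine: a typedef converts implicitly to its base type
        // The payoff: overloads and template instantiations on UserId
        // resolve separately from those on plain int.
    }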
Re: opPow, opDollar
Stewart Gordon wrote:
> Matti Niemenmaa wrote:
>> It's essentially because Haskell has separate type classes (kinda like
>> D interfaces... I won't go into that topic) for integers, fractional
>> numbers, and floating-point numbers. In D the types of those three
>> operators could be something like:
>>
>>     anyinteger    ^  (anyinteger    base, anyinteger exponent);
>>     anyfractional ^^ (anyfractional base, anyinteger exponent);
>>     anyfloating   ** (anyfloating   base, anyfloating exponent);
>
> You've merely expanded on what I'd already made out - it doesn't explain
> why these generic functions can't share the same name. Is it because
> Haskell doesn't support function overloading as D does, or for some
> other reason?

The former. Haskell does function overloading via type classes.

I think the reason why these functions can't have the same name is that
they should all have a single, well-defined type and value. If they're all
called 'pow', what is the type of pow? It can't have all three types at
once; that makes no sense. And what happens if I pass pow to a
higher-order function: which one does it end up calling? You'd need some
kind of notation to disambiguate. The developers of Haskell evidently
opted to simply force differently-typed values to have different names,
instead of being able to give them all the same name but then having to
qualify which one you mean whenever you use it. That'd pretty much amount
to them having different names anyway, I think.

Just to show that this quality of Haskell isn't very limiting in practice,
a somewhat tangential explanation of the way these exponentiation
functions are overloaded follows. The types of these functions in Haskell
are (read '::' as 'has type', the type after the last '->' as the return
value and the others as the parameters):

    (^)  :: (Num a, Integral b)        => a -> b -> a
    (^^) :: (Fractional a, Integral b) => a -> b -> a
    (**) :: (Floating a)               => a -> a -> a

The part before the '=>' is the class context, restricting the type
variables 'a' and 'b'. 'a' and 'b' can be any type at all, as long as they
satisfy the constraints. For instance, for (^), the base can be of any
numeric type, but the exponent must be integral, and the result is of the
same numeric type as the base. So when you're actually using the function,
you might be using it at any of the following types:

    (^) :: Integer -> Integer -> Integer
    (^) :: Float   -> Integer -> Float
    (^) :: Double  -> Int8    -> Double

As you can see, the functions are already overloaded, in a sense. What
Haskell does not support is 'overloading the implementation' the way
derivatives of C++ (or whatever language first came up with this) do: a
function cannot have different implementations for different types.
Instead, a type class defines certain methods that each type that is an
instance of it must implement. For example, (^) could be defined in terms
of (==), (*), and (-), like so:

    base ^ pow = if pow == 0 then 1 else base * (base ^ (pow-1))

(*) and (-) are methods of the Num class, and (==) belongs to a superclass
of Num, so we can infer the type of this as:

    (^) :: (Num a, Num b) => a -> b -> a

(The standard-library one restricts b to Integral, because this kind of
definition is obviously valid only for integer exponents.)

We now have a generic implementation of (^) that works for any two number
types. What we can't do is say that it should do something different for
certain types: its definition shows that it depends only on the methods
(==), (*), and (-), so if we want to change the behaviour of (^) we can do
so only by changing their behaviour. This is doable only by changing the
Num instance involved, which can only be done by changing the types in
question. The only things that can change their behaviour directly
depending on the types involved are class methods, which are defined
separately for each type. For instance, (-) is defined for Integers as
bignum subtraction, and (-) for Floats is some kind of built-in operation
which eventually compiles to an fsub on x86. In fact, (**) is a method of
the Floating class, and thus has a separate implementation for all
floating-point types.
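For contrast, a quick sketch of 'overloading the implementation' on the D
side, where one name can have a completely separate body per type, and the
argument types at each call site pick between them:

    import std.math : exp2, log2;

    // Two unrelated implementations sharing one name. Haskell's (^) has a
    // single body whose behaviour can vary only through the class methods
    // it calls; D simply dispatches on the parameter types.
    long pow(long base, uint exponent)
    {
        long result = 1;
        foreach (i; 0 .. exponent)
            result *= base;
        return result;
    }

    double pow(double base, double exponent)
    {
        return exp2(exponent * log2(base)); // valid for base > 0
    }

    void main()
    {
        assert(pow(2L, 10u) == 1024); // picks the integral version
        assert(pow(4.0, 0.5) == 2.0); // picks the floating-point version
    }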
Re: opPow, opDollar
Stewart Gordon wrote:
> Andrei Alexandrescu wrote:
>> Matti Niemenmaa wrote:
>>> Haskell has three exponentiation operators in the standard library:
>>> ^, ^^, and **. They are for non-negative integral exponents, integral
>>> exponents, and floating-point exponents respectively.
>>
>> I wonder whether that's an illustration of the power or of the failure
>> of function overloading. (Seriously.)
>
> I'm not sure either. I don't speak Haskell, but my guess is that ^ and
> ^^ were meant to cut out the confusion that would happen if
> Word32 ^ Word32 (what weird naming conventions Haskell has!) returned an
> integer type but Int32 ^ Int32 returned a floating point type. But why
> it needs yet another for floating-point exponents, I don't know. Maybe
> Haskell supports only IFTI rather than true function overloading.

It's essentially because Haskell has separate type classes (kinda like D
interfaces... I won't go into that topic) for integers, fractional
numbers, and floating-point numbers. In D the types of those three
operators could be something like:

    anyinteger    ^  (anyinteger    base, anyinteger exponent);
    anyfractional ^^ (anyfractional base, anyinteger exponent);
    anyfloating   ** (anyfloating   base, anyfloating exponent);

A noteworthy fractional is the Rational type, a ratio of two integral
values. Note that 0.5 is a valid Rational: it's 1/2. Note, still, that
0.5 ** 0.5 is no longer a valid Rational: it's the square root of 1/2.
This is why ^^ is separate: fractionals can be safely raised to integer
exponents, but if you take a fractional and raise it to a fractional
power, you might not get a fractional back.
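To make that concrete, a toy sketch (Rational here is hypothetical, not a
library type): raising a ratio to an integer exponent stays within the
type, which is exactly the guarantee that ^^ encodes:

    struct Rational { long num, den; }

    // (num/den) raised to an integer exponent: closed under the type.
    Rational rpow(Rational r, uint e)
    {
        auto acc = Rational(1, 1);
        foreach (i; 0 .. e)
        {
            acc.num *= r.num;
            acc.den *= r.den;
        }
        return acc;
    }

    void main()
    {
        assert(rpow(Rational(1, 2), 3) == Rational(1, 8)); // (1/2)^3 = 1/8
        // No rpow-like function can exist for a fractional exponent:
        // (1/2) ** (1/2) is the square root of 1/2, which is not a ratio
        // of integers.
    }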
Re: opPow, opDollar
Don wrote:
> Yes, ^^ hasn't been used for exponentiation before. Fortran used **
> because it had such a limited character set, but it's not really a
> natural choice; the more mathematically-oriented languages use ^.
> Obviously C-family languages don't have that possibility.

Haskell has three exponentiation operators in the standard library: ^, ^^,
and **. They are for non-negative integral exponents, integral exponents,
and floating-point exponents respectively.
Re: Number literals (Was: Re: Case Range Statement ..)
BCS wrote:
> Hello Andrei,
>
>> Derek Parnell wrote:
>>> On Tue, 07 Jul 2009 20:08:55 -0400, bearophile wrote:
>>>> Nick Sabalausky:
>>>>> why in the world is anyone defending the continued existence of
>>>>> "5." and ".5"?
>>>>
>>>> I'm for disallowing them; 5.0 and 0.5 are better. Anyone else
>>>> pro/against this idea?
>>>
>>> I would not complain if trailing dot and leading dot were disallowed.
>>
>> I think the question that should be asked is: would anyone complain if
>> they were kept? We have bigger rocks to move than that one.
>>
>> Andrei
>
> Dropping them makes lexing cleaner: strictly, "5..5" should lex as "5."
> ".5" based on maximal munch
> ( http://en.wikipedia.org/wiki/Maximal_munch ).

http://d.puremagic.com/issues/show_bug.cgi?id=1466
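Concretely, the case in question, as a sketch assuming a strict
maximal-munch lexer:

    void main()
    {
        int[] a = new int[10];
        // With "5." and ".5" in the grammar, maximal munch says "5..5"
        // tokenizes as the two float literals "5." and ".5", not as
        // 5 .. 5; lexers accept the slice below only by special-casing.
        auto s = a[5..5]; // an empty slice
    }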
Re: static this sucks, we should deprecate it
Steven Schveighoffer wrote:
> If we were importing compiled files (or even generated files), then the
> compiled file could have annotated the "static this" with the
> dependencies it has... I don't want to start another long thread on
> this, I understand Walter's "I want to use standard linkers" position.

I don't think that's an argument against this; you can always compile an
intermediate representation for purposes such as these in addition to the
standard object file. It's what the Haskell compiler GHC does, for
instance.
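A minimal reproduction of the problem such annotations would solve
(hypothetical module names): the two constructors below are independent,
but the runtime sees only the import graph, so it has to assume the worst.

    // a.d
    module a;
    import b;
    static this() { /* does not actually touch b */ }
    void main() {}

    // b.d
    module b;
    import a;
    static this() { /* does not actually touch a */ }

    // Running this aborts at startup with a cyclic-dependency error.
    // Dependency annotations in a compiled interface file would let the
    // runtime order the constructors, or skip ordering them entirely.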
Re: std.partition is fucked
Sean Kelly wrote:
> The sort I wrote for Tango uses the same basic heuristics, thanks to a
> ticket that either you or Stewart Gordon submitted long ago.

*Ahem*, I believe that http://www.dsource.org/projects/tango/ticket/571
was one of mine ;-)