Re: Is @property implementable?

2011-03-04 Thread Kevin Bealer
== Quote from Jonathan M Davis (jmdavisp...@gmx.com)'s article
 But that's not how the function is written. The left parameter is the
 destination. If myString.strcpy(myString2) is confusing, I would expect
 strcpy(myString, myString2) to be just as confusing. I don't see how using the
 member function call syntax like that makes it any more confusing. Not to
 mention, functions with a source and destination like that end up in both
 orders all the time, so I don't think that you can generally expect it in one
 order or the other anyway.
 - Jonathan M Davis

I personally think strcpy got it wrong anyway -- everyone says "input and
output", not "output and input", so it's weird to have the output mentioned
first.  It's as if you wrote:

// everyone who uses this will get the argument order wrong at least once
int everythingNice(int spice, int sugar);

But that's a side note...

Personally, I think UFCS for anything but arrays is an engraved invitation to
function hijacking.  Imagine defining this in your code:

int find(T, U)(T a, U b)
{
    ...
}

class Foo {
    int find(double x, char[] y);
}

Foo x;
x.find(a, "hello"); // would this call the template or the method?

Whoops -- now every class, structure, and array in your program has a method
called find() that hijacks any arguments that don't exactly match an existing
function.  If you call x.find(), you can't know if you are getting the method
in the class or a free function that got sucked in because of this syntax.  Of
course, if you get the signature exactly right it probably prefers the method
over the free function.  Arrays are special because the only methods they have
are *built in* methods.  Thus, a basic understanding of the language makes it
possible to avoid the ambiguity.  But implementing UFCS for other types is a
problem.

If I do a.getIndex(5) and the compiler says there is no method getIndex on
class A, this is a gift -- it is almost always the case that this is an error.
I doubt I ever want getIndex(A, int) but wrote a.getIndex(5), or want
f(A, int) but wrote a.f(5).

If you want another method in your class, *add it to your class*.  The reason
it makes sense to have it for arrays is so that you can define common methods
across all your types -- like a serialize method or toString or deepCopy --
and know what these methods do.  Then you can use them in templates.  Arrays
can't be subclassed to add the methods, so you need a way to produce the same
syntax as a method call.
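
For instance, a small sketch of the array case (my example, using the
array-only UFCS that D currently implements):

import std.conv : to;
import std.stdio;

// A free function; for arrays, D lets you call it with method syntax,
// so template code can say x.serialize() whether x is an array or a class.
string serialize(int[] a)
{
    return to!string(a);
}

void main()
{
    int[] data = [1, 2, 3];
    writeln(data.serialize());  // identical to serialize(data)
}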

Since arrays can't be derived from, UFCS fills in a real gap and doesn't mask
existing or future methods unless the interface for arrays changes, which is
rare, we hope.  Allowing it for other types doesn't fill a gap, it creates an
overlap/ambiguity.  It doesn't let you do anything you can't already do.  It's
like a built-in alias this that you can't turn off and that matches across all
the free functions of the world, if I'm understanding the matching correctly.

In most cases you can also use free functions (for deepCopy etc.) to work with
both classes and arrays, but this doesn't work with virtual functions, so the
method syntax is nicer for that.  So UFCS is justified to make arrays 'look
like classes' to template code.  But in my opinion it's not justified to make
classes look like classes with one more method.

Kevin


Re: Appender and CTFE

2011-03-04 Thread Kevin Bealer
== Quote from Andrei Alexandrescu (seewebsiteforem...@erdani.org)'s article
 I see nothing wrong with the occasional forking conditioned by __ctfe.
 Even today, code may fork an optimized but nonportable implementation of
 some algorithm. The main requirement is that such forks are rare enough
 to not cause undue maintenance burden.

 Andrei

Regarding maintenance burden, it should be easy to test the correctness of
such code:

in a unit test:

  enum a = f(...);
  assert(a == f(...));
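
For instance, a minimal sketch (my own toy example) of a __ctfe fork together
with exactly that style of check:

uint triple(uint x)
{
    if (__ctfe)
        return x + x + x;  // simple path that CTFE can always handle
    else
        return x * 3;      // stand-in for an optimized runtime-only path
}

unittest
{
    enum ct = triple(14);       // forces the CTFE branch
    assert(ct == triple(14));   // compares it against the runtime branch
}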

Kevin


Re: std.parallelism: Call for benchmarks

2011-02-26 Thread Kevin Bealer
I would say look for small things people do which are CPU intensive.  One
example is sorting -- it can be parallelized fairly well, and yet sorting
10_000_000 elements still takes long enough to be worth parallelizing.
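
A minimal harness sketch of the kind of benchmark I mean (written against
current Phobos module names, so treat the details as illustrative):

import std.algorithm : map, sort;
import std.array : array;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.random : uniform;
import std.range : iota;
import std.stdio : writeln;

void main()
{
    // 10_000_000 random ints; time the serial sort as the baseline,
    // then swap in a parallel sort and compare.
    auto data = iota(10_000_000).map!(i => uniform(0, int.max)).array;
    auto sw = StopWatch(AutoStart.yes);
    sort(data);
    writeln("serial sort: ", sw.peek);
}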

If you want examples of a working parallel sort etc., you can look in the
'Futurism' library that I wrote when exploring futures for D a few years ago;
it's on dsource.  Feel free to use whatever you find in there... There might
be some other examples but I don't remember offhand.

Kevin


uniqueness propagation

2011-02-24 Thread Kevin Bealer
I think immutable could benefit from a Value Range Propagation-like uniqueness
logic:

string a;
char[] b;

string c = a ~ b; // result of ~ is always unique
string z = c.dup; // also any kind of .dup is definitely unique
char[] q = z.dup; // also okay -- assign to a non-immutable

The lines with dup or ~ could be considered legal, since these ops produce
definitely unique values.  A handful of specific actions known to produce
unique values would produce temporaries with a 'uniqueness' property, which
allows them to go either way, toward mutable or immutable.  This list includes
(malloc, new, ~, .dup) but maybe others.

Once an expression is assigned to a variable it becomes immutable or mutable
as per that variable's traits.
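
(Today the manual version of this is std.exception.assumeUnique, where the
programmer rather than the compiler vouches for uniqueness -- a small sketch:)

import std.exception : assumeUnique;

string makeLabel(char[] prefix, char[] suffix)
{
    char[] buf = prefix ~ suffix;  // result of ~ is a fresh allocation
    return assumeUnique(buf);      // unchecked claim; the proposal above
                                   // would let the compiler verify it
}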

As an extension, variables could be labeled as 'unique', which essentially
means they contain a value that is not shared.  Unique would only be legal in
cases where escape analysis can easily prove that the value does not get
copied into both immutable and mutable variables (see below about
portability).

You could label a function as producing unique values -- only legal if the
value returned is definitely unique.  Again, if the language cannot do escape
analysis then the user will be required to do a defensive dup.

Portability note: for portability reasons it is better to have well defined
escape analysis rules that can be implemented by any compiler -- so a smart
compiler might actually know there is no escape but still be required to
forbid a particular expression because it breaks the rules.

Kevin


Re: float equality

2011-02-23 Thread Kevin Bealer
== Quote from Sean Kelly (s...@invisibleduck.org)'s article
 Andrei Alexandrescu <seewebsiteforem...@erdani.org> wrote:
  On 2/21/11 6:08 PM, bearophile wrote:
  Andrei Alexandrescu:
 
  This is a long-standing myth. I worked on Wall Street and have friends
  who have been doing it for years. Everybody uses double.
 
  Unbrutal Python programmers are encouraged to avoid the float type to
  manage money values, and use decimal instead:
  http://docs.python.org/library/decimal.html
 
  Bye,
  bearophile
 
  ... and nobody uses Python :o).
 C/C++, Visual Basic, and SQL in my experience. Though in SQL they may use
 the 'money' type. Using double really isn't an issue so long as rounding is
 handled appropriately. It seems like a lot of the trouble stems from people
 thinking values must be naturally exactly represented by the storage type.

The semantics of floating point are complicated, and I think the human brain
usually sees 'complicated' and 'unpredictable/unstable' as synonyms.  As for
me, I heard long ago that double wasn't used, and I think I believed it
because I'd like to believe that someone out there is doing the responsible
thing.  Now that I have a bit more experience I can see that floating point is
reliable, even if it is unintuitive.  The instinct is to avoid it where
correctness is important, but if you analyze the code you can predict a lot
about what floating point will do, even if not always everything.  If you can
keep the error to the sub-penny level you should be fine with either approach.
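
(A tiny illustration of "reliable but unintuitive" -- my example, not from the
thread: the result below is completely deterministic, just not what grade
school arithmetic suggests.)

import std.stdio;

void main()
{
    double d = 0.1 + 0.2;
    writefln("%.17f", d);  // prints 0.30000000000000004
    assert(d != 0.3);      // exact equality fails, predictably
}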

But I think C taught a lot of people to be wary.  C considers an amazingly
large percentage of its allowed syntax to have undefined semantics: which
direction integers round, sizes of basic types, bit shifting by amounts of
the type's bit width or more, etc.  Almost anything that varies from CPU to
CPU is not defined by C, with the result that you need to build a mini-jail in
your mind to write portable code.  Floating point is part of this because many
floating point expressions produce different results depending on choices made
by the compiler while optimizing, e.g. when to narrow numbers from real to
double.

The number of things a language defines as 'do whatever is easy or quick for
your platform' ends up being a coefficient of the difficulty of using that
language to write (portably) correct code.

As a result, almost all large C programs are non-portable, and I'm convinced
the majority have undefined behavior if you take a narrow view of what C
guarantees about the size of int, the order of side effects, and (if you
really want to play hardball) the memory hierarchy in a multithreaded program.

Kevin


Re: Uh... destructors?

2011-02-23 Thread Kevin Bealer
== Quote from Stewart Gordon (smjg_1...@yahoo.com)'s article
 On 23/02/2011 18:07, Ary Manzana wrote:
  On 2/22/11 10:36 AM, Simen Kjaeraas wrote:
  %u Wrote:
  Well, the trouble is, pretty much all of these are invalid attributes:
 
  - static obviously makes no sense
 
  And here is where you're wrong. You have defined a static destructor,
  which is called with the module destructors as the program goes out of
  scope, rather than when your struct or class is destroyed.
 
  This is why attributes that make no sense must be an error: you don't
  know if an attribute you put is being ignored by the compiler or not
  (like what has just happened here).
 Uh, that's a total non sequitur.  The point Simen is making is that static
 _does_ make sense here.
 Stewart.

But if the compiler always rejected the nonsensical ones, it would be clear
that the ones that are not rejected have some meaning.  The ambiguity he is
talking about (I think) is between the two ideas "the compiler accepted this
but it's meaningless" and "the compiler accepted this so it *must* be
meaningful".  If you can't be sure the second is true, then you don't realize
there is a bug, e.g. if you did not intend the destructor to be a class/module
destructor.
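
(To make the trap concrete -- my sketch, not from the thread: both
declarations below compile, and the first is easy to write by accident.)

struct S
{
    static ~this() { /* a static destructor: runs at program shutdown */ }
    ~this() { /* the per-instance destructor */ }
}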

Kevin


Re: Uh... destructors?

2011-02-23 Thread Kevin Bealer
== Quote from bearophile (bearophileh...@lycos.com)'s article
...
 Currently you are able to write functions like:
 pure bool randomPure() {
     int[] a1 = new int[1];
     int[] a2 = new int[2];
     return a1.ptr < a2.ptr;
 }

Is it possible to fix this by disallowing use of the value of a pointer,
except to compare it for equality (only)?  In other words, within a pure
function, only 'is', '!is', ==, and != are permitted as tests of pointers,
and other arithmetic is forbidden... this sort of matches the Java model,
where you can't fiddle with references except to check whether they are the
same.

Kevin


Re: Uh... destructors?

2011-02-23 Thread Kevin Bealer
== Quote from Kevin Bealer (kevindangerbea...@removedanger.gmail.com)'s article
 == Quote from bearophile (bearophileh...@lycos.com)'s article
 ...
  Currently you are able to write functions like:
  pure bool randomPure() {
      int[] a1 = new int[1];
      int[] a2 = new int[2];
      return a1.ptr < a2.ptr;
  }
 Is it possible to fix this by disallowing use of the value of a pointer,
 except to compare it for equality (only)?  In other words, within a pure
 function, only 'is', '!is', ==, and != are permitted as tests of pointers,
 and other arithmetic is forbidden... this sort of matches the Java model,
 where you can't fiddle with references except to check whether they are the
 same.
 Kevin

I wanted to add something I just realized -- malloc is essentially pure if
you can't compare the pointers it returns.  Of course if you can't do either
casting or arithmetic on these pointers I'm not sure how to use malloc.

Kevin


Re: float equality

2011-02-21 Thread Kevin Bealer
== Quote from Walter Bright (newshou...@digitalmars.com)'s article
 Kevin Bealer wrote:
 A reasonable way to do financial work is to use longs to represent pennies.
 After all, you don't have fractional cents in your accounts.
 Using floating point to represent money is a disaster in the making.
...
 There's just no getting around needing to understand how computer arithmetic 
 works.
...
 I.e. you have to know what you're doing :-)

I'm sure that banks do what you are suggesting and have a policy (maybe a
regulatorily required one) on who gets the fractional cents.  This is easy
because cents are a standard that everyone agrees is the 'bottom'.

But there are lots of other cases that aren't standardized... i.e. is time
represented in seconds, milliseconds, nanoseconds?  Java, Windows and Linux use
different versions.  Some IBM computers measure time in 1/2^Nths of a second,
where N is around 20 or 30.  Is land measured in square feet or acres?

Whatever you pick as the bottom may not be good enough in the future.  If you
introduce more exact measurements, now you have mixed representations.  If you
had a rational number of acres, on the other hand, you could start doing
measurements in square feet and representing them as x/(square feet in an
acre).

Mixed representations are trouble because they invite usage errors.  Rational
doesn't change the facts of life, but it simplifies some things because it
makes the choice of base an arbitrary problem-domain kind of choice, whereas
in fixed point it is very difficult to revisit this decision.

Every time I think about this, though, I think of classroom examples, so it's
probably the case that my arguments are academic (in the bad sense)... and I
should give it up.  If this was more useful it would probably be used more.

Kevin


Re: float equality

2011-02-20 Thread Kevin Bealer
== Quote from Jonathan M Davis (jmdavisp...@gmx.com)'s article
...
 It may be that you would still end up with situations where two values that
 you would think would be the same aren't, due to rounding error or whatnot.
 However, with a fixed point value, you wouldn't have the problem where a
 particular value could not be held in it even if it's within its range of
 precision. As I understand it, there are a number of values which cannot be
 held in a floating point and therefore end up being rounded up or down
 simply because of how floating points work, and not because the precision
 isn't high enough.

 It's definitely true however, that using fractions would be much more
 accurate for a lot of stuff. That wouldn't be particularly efficient though.
 Still, if you're doing a lot of math that needs to be accurate, that may be
 the way to go.

 - Jonathan M Davis

Floating point numbers are a good compromise for a lot of purposes, but yeah,
they are limited.  Here are some ways to open up the limits a bit...

(First some math for those who haven't gone down this path...)

The reason that fixed point does better than floating is that fixed point
classes use base ten and floating point uses base 2.  In base two, all
fractions that have a denominator which is a power of 2 (e.g. 3/16) can be
represented exactly, and all others can't.

Fixed point solves this for numbers like .1 or .0003 because the denominator
is a power of ten.  Ten is 5*2, and the way it works is that any denominator
that is a power of two times a power of ten can be represented exactly.

So you're good for .01 or .039028 because these are (1/100) and
(39028/1000000).  But neither fixed point nor floating can represent something
like 1/3 or 1/11 exactly.

If you used base 3, then 1/3 would be perfectly representable but 1/2 would
not be.  I think this is why the ancients used 360 degrees in a circle... you
can represent a lot of common fractions as parts of 360, including 1/2, 1/3,
1/4, 1/5, 1/6, 1/8.  But still not 1/7 or 1/11.

How to solve this problem?

You can define a type like this:

struct Rational {
    // represents numerator over denominator
    ... (various arithmetic methods and operators) ...

    long numerator;
    long denominator;
}

(Lots of people have written types like this, maybe even for D. ;)

If you've taken high school math and CS 101 you can write the methods here,
and then you can represent any fraction.  This is a better way to go for a lot
of problems if you need exactness, but it's a bit slower.
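
To make that concrete, a minimal working sketch (my code; only + is shown, and
overflow handling is omitted):

import std.numeric : gcd;

struct Rational {
    long num;
    long den = 1;

    this(long n, long d)
    {
        assert(d > 0);
        immutable g = gcd(n < 0 ? -n : n, d);  // g >= 1 since d > 0
        num = n / g;
        den = d / g;
    }

    Rational opBinary(string op : "+")(Rational r) const
    {
        return Rational(num * r.den + r.num * den, den * r.den);
    }
}

unittest {
    assert(Rational(1, 2) + Rational(1, 3) == Rational(5, 6));  // held exactly
}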

This isn't perfect though... if you try to add the numbers 1/2, 1/3, ...,
1/100 you will eventually find the denominator overflows -- you can check for
this and 'reduce' the fraction (remove common divisors from top and bottom)
periodically, but eventually you will get an irreducible fraction and you will
need, again, to compromise and give away some precision.  The simplest way to
do this compromise is probably to divide the top and bottom number by 2, try
to reduce again, and then try the operation again (repeat until the operation
doesn't overflow).  (Maybe there is a better way though.)

For some problems the precision loss will never happen -- for example, if you
are adding dollars and cents represented as (x/100) then the denominator never
needs to get bigger than 100.  If you need to compute mortgage payments it
might get a little bigger... but I'm not sure if it will ever overflow for
this case.

Another source of compromise is when a number is really big or small; for
example, a number like 2^200 or 1/2^200 can be represented in a 'double' but
would overflow a Rational as defined above.

You can delay the compromise a bit by using a struct like this:

struct Rational2 {
    // represents the number numerator / (denominator ^ exponent)
    // adjust integer types to taste
    long numerator;
    long denominator;
    long exponent;
}

This lets you go much further, and represent truly astonishingly large and
small numbers.  It complicates the math in the arithmetic operators a bit,
though.  Eventually, even this version will lose precision, as the fraction
still becomes irreducible if there are a unique set of prime factors in the
denominator.  So this can represent much larger ranges than double or real,
but I think it still can't represent 1/2 * 1/3 * 1/1_000_000_000 exactly.

Now... what if you wanted to avoid the compromise completely?

You could switch to this:

struct {
    BigInt numerator;
    BigInt denominator;
}

Bingo -- no compromise.  Two downsides to this:

1. You trade off memory for accuracy as the BigInts grow over time, assuming
the fraction is irreducible.

2. It's slower -- BigInt objects are essentially byte strings, and even things
like addition require a function call and looping.

This is a pretty flexible solution though.  However, numbers like pi and 'e'
can't be represented as a rational number at all, so you are still stuck if
you want to do logarithms or ...

Re: float equality

2011-02-20 Thread Kevin Bealer
== Quote from Walter Bright (newshou...@digitalmars.com)'s article
 Kevin Bealer wrote:
  You could switch to this:
 
  struct {
      BigInt numerator;
      BigInt denominator;
  }
 
  Bingo -- no compromise.
 It cannot represent irrational numbers accurately.

True, but I did mention this a few lines later.

Kevin


Re: float equality

2011-02-20 Thread Kevin Bealer
== Quote from Walter Bright (newshou...@digitalmars.com)'s article
...
 I do understand that if you have a full symbolic representation, you can do so
 with zero losses. But Kevin's proposal was not that, it was for a ratio
 representation.

 All it represents symbolically is division. There are plenty of other 
 operations.

I'm just answering the original poster's question.  You're right though -- it's
not a complete numerical system (and I don't propose it for inclusion in the
language or even necessarily the library).

I had two goals:

1. To solve the basic problem the original poster was asking about -- if you
are working with simple decimals and arithmetic, you can get completely
accurate representations this way.  For some cases like simple financial work
this might work really well, e.g. where float would not because of the slow
leak of information with each operation.  (I assume real professional
financial work is already done using a (better) representation.)

2. To explain why the 'simple' task of representing something like .1 wasn't
as easy as it looks.  In other words, why the people who designed float
weren't just brain dead.  I think they really knew what they were doing, but
it shocks most people at first that a modern computer can't do what they see
as grade school arithmetic.

I think for some purposes, though, lossless domain-specific representations
can be a good tool -- if you can represent a problem in a way that is lossless
you can maybe do better calculations over long series than working with
'double' and taking the accuracy hit.  This is necessarily an
application-specific technique though.

Kevin



Re: Integer conversions too pedantic in 64-bit

2011-02-17 Thread Kevin Bealer
== Quote from Daniel Gibson (metalcae...@gmail.com)'s article
 It was not proposed to alter ulong (int64), but only a size_t equivalent. ;)
 And I agree that not having unsigned types (like in Java) just sucks.
 Wasn't Java even advertised as a programming language for network stuff?
 Quite ridiculous without unsigned types..
 Cheers,
 - Daniel

Ah yes, but if you want to copy data quickly you want to use the efficient
size for doing so.  Since architectures vary, size_t (or the new name, if one
is added) would seem to new users to be the natural choice for that size.  So
it becomes a likely error if it doesn't behave as expected.

My personal reaction to this thread is that I think most of the arguments of
the people who want to change the name or add a new one are true -- but not
sufficient to make it worthwhile.  There is always some learning curve, and
size_t is not that hard to learn or that hard to accept.
Kevin


Re: Integer conversions too pedantic in 64-bit

2011-02-16 Thread Kevin Bealer
== Quote from spir (denis.s...@gmail.com)'s article
 On 02/16/2011 03:07 AM, Jonathan M Davis wrote:
  On Tuesday, February 15, 2011 15:13:33 spir wrote:
  On 02/15/2011 11:24 PM, Jonathan M Davis wrote:
  Is there some low level reason why size_t should be signed or something
  I'm completely missing?
 
  My personal issue with unsigned ints in general as implemented in C-like
  languages is that the range of non-negative signed integers is half of
  the range of corresponding unsigned integers (for same size).
  * practically: known issues, and bugs if not checked by the language
  * conceptually: contradicts the obvious idea that unsigned (aka naturals)
  is a subset of signed (aka integers)
 
  It's inevitable in any systems language. What are you going to do, throw
  away a bit for unsigned integers? That's not acceptable for a systems
  language. On some level, you must live with the fact that you're running
  code on a specific machine with a specific set of constraints. Trying to
  do otherwise will pretty much always harm efficiency. True, there are
  common bugs that might be better prevented, but part of it ultimately
  comes down to the programmer having some clue as to what they're doing.
  On some level, we want to prevent common bugs, but the programmer can't
  have their hand held all the time either.
 I cannot prove it, but I really think you're wrong on that.
 First, the question of 1 bit. Think at this -- speaking of 64 bit size:
 * 99.999% of all uses of unsigned fit under 2^63
 * To benefit from the last bit, you must have the need to store a value
 2^63 <= v < 2^64
 * Not only this, you must step on a case where /any/ possible value for v
 (depending on execution data) could be >= 2^63, but /all/ possible values
 for v are guaranteed < 2^64
 This can only be a very small fraction of cases where your value does not
 fit in 63 bits, don't you think. Has it ever happened to you (even in 32
 bits)? Something like: what a luck! this value would not (always) fit in
 31 bits, but (due to this constraint), I can be sure it will fit in 32
 bits (always, whatever input data it depends on).
 In fact, n bits do the job because (1) nearly all unsigned values are very
 small (2) the size used at a time covers the memory range at the same time.
 Upon efficiency, if unsigned is not a subset of signed, then at a low level
 you may be forced to add checks in numerous utility routines, the kind
 constantly used, everywhere one type may play with the other. I'm not sure
 where the gain is.
 Upon correctness, intuitively I guess (just a wild guess indeed) if unsigned
 values form a subset of signed ones programmers will more easily reason
 correctly about them.
 Now, I perfectly understand the sacrifice of one bit sounds like a
 sacrilege ;-)
 (*)
 Denis
 (*) But you know, when as a young guy you have coded for 8- & 16-bit
 machines, having 63 or 64...

If you write low level code, it happens all the time.  For example, you can
copy memory areas quickly on some machines by treating them as arrays of long
and copying the values -- which requires the upper bit to be preserved.

Or you compute a 64 bit hash value using an algorithm that is part of some
standard protocol.  Oops -- it requires an unsigned 64 bit number, and the
signed version would produce the wrong result.  And since the standard expects
normally behaving int64's, you are stuck -- you'd have to write a little class
to simulate unsigned 64 bit math.  E.g. a library that computes md5 sums.
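
(A concrete illustration -- my example: a standard 64-bit hash like FNV-1a is
specified over unsigned 64-bit arithmetic, which D's ulong gives you directly;
without an unsigned type you would have to emulate the wraparound behavior.)

ulong fnv1a(const(ubyte)[] data)
{
    ulong h = 0xcbf29ce484222325;   // FNV-1a 64-bit offset basis
    foreach (b; data)
    {
        h ^= b;
        h *= 0x100000001b3;         // FNV-1a 64-bit prime; wraps mod 2^64
    }
    return h;
}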

Not to mention all the code that uses 64 bit numbers as bit fields, where the
different bits or sets of bits are really subfields of the total range of
values.

What you are saying is true of high level code that models real life -- if the
value is someone's salary or the number of toasters they are buying from a
store, you are probably fine -- but a lot of low level software (ipv4 stacks,
video encoders, databases, etc.) is based on designs that require numbers to
behave a certain way, and losing a bit is going to be a pain.

I've run into this with Java, which lacks unsigned types, and once you run
into a case that needs that extra bit it gets annoying right quick.

Kevin


Re: is there any way to get a list of classes that inherit a class?

2011-02-13 Thread Kevin Bealer
I don't know if you can find all of them easily, but you can find the
instantiated ones by adding a line to the Foo constructor as shown here.

A few notes:

1. This doesn't report Bar itself, since a Bar object is never created;
however, in a sense a 'Bar' object was created when Baz and Qux are created.
Since you know how to get the parent of a type, you should be able to fix this
if desired.

2. As mentioned, you can't get the non-instantiated classes this way -- it
only detects classes as 'new' is called on them.  By the way, this wouldn't
work in C++ because in C++ object identity changes as the successive
constructors are called -- it would just report Foo.

3. Of course you could instead add an abstract (in C++ terms, pure virtual)
function to the class...

testrtti.d:

import std.stdio;

int[string] fooTypes;

class Foo {
    this() { fooTypes[this.classinfo.name] = 1; }
}

class Bar : Foo {
}

class Baz : Bar {
}

class Qux : Baz {
}

int main()
{
    Foo a = new Foo;
    Foo b = new Qux;
    Bar f = new Baz;

    foreach (key, value; fooTypes) {
        writefln("foo subtype: %s", key);
    }

    return 0;
}

Output:

foo subtype: testrtti.Foo
foo subtype: testrtti.Baz
foo subtype: testrtti.Qux

Kevin


Re: tooling quality and some random rant

2011-02-13 Thread Kevin Bealer
 our famous Reddit trolls, that is retard = uriel = eternium = lurker

In case anyone doubts gay's guess... for those who don't follow entertainment
trivia, Alan Smithee is a pseudonym used by directors disowning a film (google
it).  So anyone using this name is actually effectively *claiming* to be an
imposter.

K


Re: tooling quality and some random rant

2011-02-13 Thread Kevin Bealer
Sorry, this was a completely unintentional error -- I meant to say "in case
anyone doubts Gary's post".  Blame the lateness of the night and/or my
annoyingly lossy wireless keyboard.

Kevin


Re: What's C's biggest mistake?

2010-01-02 Thread Kevin Bealer
bearophile Wrote:

 Walter Bright:
  3. The glaring fact that std::vector<char> and std::string are different
  suggests something is still wrong.
 
 In an array/vector you want O(1) access time to all items (ignoring
 RAM-cache access/transfer delays), while in a string with variable-width
 Unicode encoding that can be hard to do. So they look like two different
 data structures.
 
 Bye,
 bearophile

Yeah, I think the charset thing was probably the main reason for the
string/vector split, that and the desire to have special properties like
conversion from char* that wouldn't be in vector.  Using basic_string<T> with
locales is something of a historical wart, because with Unicode, getting your
charset from your locale is somewhat obsolete for general purpose computers.
(Maybe very small profile systems will continue to use ascii or the code page
of whatever culture built them.)

But I don't think C++'s string can be made to index by character unless you
use wchar_t for the T in basic_string<T>.  I don't think string.size() is ever
anything but a byte or wchar_t count.

Kevin



Re: What's C's biggest mistake?

2009-12-31 Thread Kevin Bealer
BCS Wrote:

 Hello Walter,
 
  BCS wrote:
  
  I guess my point is that aside from VERY resource limited systems,
  almost no one will have C as their first choice. Even with those
  limited systems I'd bet that most people would rather be working in
  something else if they could. That said, there are many places where
  it ends up being the lingua franca.
  
  I still think you're underestimating C's audience. Consider the Linux
  effort - they can choose any implementation language they want, and
  they choose C. They aren't forced into C.
  
 
 Yes, C has a wide audience (anyone who says differently is selling
 something).  But I still think that there are very few cases where C will
 be chosen for reasons other than its purely technical merits. If
 C++/Java/C#/python/whatever would have done just as good a job in Linux as
 C, I'd almost bet that Linux wouldn't have been written in C. My point
 isn't that C is never the right choice (as that is clearly false) but that
 when C is chosen, it's (almost) always for technical reasons rather than
 aesthetic ones (where it is merely good enough).

I would say these are the technical merits of C that get it chosen these days:

1. The new code they're writing will be part of a large body of existing C code 
which they don't have time, permission, or inclination to convert to C++.

2. They need to be aware of every tiny low level detail anyway, so having the 
language do too many things for you is not the desired approach (security, 
O/S and embedded work).

3. C has a real ABI on almost every platform; therefore, C is chosen for most 
inter-language work such as writing python modules.

But some people really *are* choosing C for aesthetics.  Linus Torvalds, bless
his little world dominating heart, chose C for a normal app (git), and he
cited the existence of operator overloading in C++ as bad because it hides
information -- e.g. in the general case you never know what an expression is
actually doing.

I think this can be seen as mainly an aesthetic choice.  Avoiding a language
because it *supports* information hiding (which is what I think operator
overloading is) is not really an 'economic' tradeoff, since you could choose
not to hide information by not using those features.  He'd just rather not be
in the vicinity of language features that make those kinds of choices because
they seem wrong to him (and because he wants to keep C++ies out of his code, I
think.)

Some people want their language to have a WYSIWYG relationship with the 
generated assembler code (if I'm right, it does seem consistent with him being 
an OS developer).

I also know some scientists and mathematicians who use C rather than C++.  I
think the reason is that by using a simpler language they can know everything
about the language.  I think the sooner they can 'get the computer science
stuff out of the way', the sooner they can focus on what they see as the
domain issues.  (I think once the program gets big enough, the CompSci aspects
reassert themselves, and scalability and maintainability issues begin to bite
you in the rear.)

Kevin



Re: Go rant

2009-12-30 Thread Kevin Bealer
Don Wrote:

 retard wrote:
 ...
 
 I'd say it's easier. If you watch someone sorting some cards, they'll
 use either insertion sort or selection sort. Nobody should have ever
 heard of bubble sort; I'm pleased to hear some schools aren't mentioning
 it. Such a foolish algorithm.
 
 "the bubble sort seems to have nothing to recommend it, except a catchy
 name" - Knuth.

(Non-software) people doing routine tasks often come up with better algorithms
intuitively than my intuition expects them to.

I think a lot of people would do even better than insertion with a deck of
poker cards -- they might group cards by either suit or rank (or rank groups)
(e.g. "Hmm, I'll make three piles of 1-5, 6-10, and J-A"), then order the
buckets, then stick these ordered sets back together.  If you think about it,
this is a lot like a radix sort or a multi-pivot cousin of the quick sort.
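
(A tiny D sketch of that "three piles" idea -- my illustration, with cards
encoded as 1-14 so that J-A are 11-14:)

import std.algorithm : sort;
import std.array : join;

int[] pileSort(int[] cards)
{
    int[][3] piles;
    foreach (c; cards)
        piles[c < 6 ? 0 : (c < 11 ? 1 : 2)] ~= c;  // piles: 1-5, 6-10, J-A
    foreach (p; piles[])
        sort(p);                                   // order each pile
    return join(piles[]);                          // stick them back together
}

unittest {
    assert(pileSort([13, 2, 7, 11, 4]) == [2, 4, 7, 11, 13]);
}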

I think when you ask people to do a computer version of how they would sort,
they do worse than this.  It's so hard to communicate anything complicated to
a computer that they instinctively dumb it down and do insertion or bubble.
Insertion is what you would manually use when dealing with adding a few new
elements to a short stack, whereas something like a bubble might be what you
would use manually when it's hard to jump around among the elements of the
sequence (e.g. if you have a bunch of "see also" references embedded in
dictionary entries and you need to sort within the see-also 'chain').  That
approach (kind of) matches the claims often given by computer programmers that
bubble sort is good for linked lists and/or only fixing a few things that are
out of place in a mostly sorted list.

When you have a pupil that is as hard to talk to and unmotivated to learn as a
computer, the first attempt is to try to teach it anything at all and call
that a success if it works.  A more eager human pupil tries to meet you half
way, so you naturally respond and take pleasure in teaching them more and more
advanced stuff.  If you're really hard headed and immune to tedium, you keep
trying with even a super-dumb pupil like a CPU.  You keep going in for the
hard case so many times that you eventually major in CompSci out of habit.

Kevin



Re: Go rant

2009-12-30 Thread Kevin Bealer
dsimcha Wrote:

 == Quote from Kevin Bealer (kevinbea...@gmail.com)'s article
  (Non-software) people doing routine tasks often come up with better
  algorithms intuitively than my intuition expects them to.
  I think a lot of people would do even better than insertion with a deck
  of poker cards -- they might group cards by either suit or rank (or rank
  groups) (e.g. "Hmm, I'll make three piles of 1-5, 6-10, and J-A"), then
  order the buckets, then stick these ordered sets back together.  If you
  think about it, this is a lot like a radix sort or a multi-pivot cousin
  of the quick sort.
 
 You mean a bucket sort?  http://en.wikipedia.org/wiki/Bucket_sort

More or less, though I think human beings use algorithms in a more artistic
way than sticking to any one algorithm.

I'm curious if the multi-pivot quicksort (I think everyone gets what I mean by
this?  Divide by more than one pivot on each pass?  I can give details if you
like...) has been tried out much.  It seems like it must have been, but it
also seems like something that would have cache-awareness advantages that
would not show up in the simplified comparison-counting way of thinking about
efficiency.

Kevin




Re: Go rant

2009-12-30 Thread Kevin Bealer
Rainer Deyke Wrote:

 Kevin Bealer wrote:
  I think a lot of people would do even better than insertion with a
  deck of poker cards -- they might group cards by either suit or rank
  (or rank groups) (e.g. "Hmm, I'll make three piles of 1-5, 6-10, and
  J-A"), then order the buckets, then stick these ordered sets back
  together.  If you think about it, this is a lot like a radix sort or
  a multi-pivot cousin of the quick sort.
 
 When sorting a deck of cards, insertion sort performs in O(n log n), the
 same as the best-case performance of a quick sort.  When sorting arrays,
 you can find the insertion point in O(log n), but the actual insertion
 takes O(n).  With a deck of cards, you can perform the insertion in
 O(1).  There is absolutely no point in using quick sort for a deck of
 cards.  Only the non-comparison-based sorts (radix sort, bucket sort)
 can do better than insertion sort.
 
 
 -- 
 Rainer Deyke - rain...@eldwood.com

Right, if you mean strictly sorting a physical deck of cards by hand.
Insertion sort isn't efficient in a computer because you need to find the
insertion point quickly *and* insert quickly to keep the efficiency, but no
standard data structure can really do both quickly.  For linked lists, insert
is quick (O(1)) but find is slow (O(n/2)); for arrays, find is quick (assuming
you use binary search, O(log n)) but insert is slow (O(n)).  Either choice
sticks you with a slow operation.

Kevin



Re: algorithms that take ranges by reference

2009-12-30 Thread Kevin Bealer
Andrei Alexandrescu Wrote:

 The grand STL tradition is to _always_ take iterators by value. If 
 iterators are computed, they are returned by the algorithm.
 
 My initial approach to defining std.algorithm was to continue that 
 tradition for ranges: ranges are values. No algorithm currently takes a 
 range by reference. There are a couple of simple functions that 
 emphatically do take ref, namely popFrontN and popBackN in std.range.
 
 It is becoming, however, increasingly clear that there are algorithms 
 that could and should manipulate ranges by reference. They might take 
 and return values, but it's just too messy to do so. (Cue music for the 
 improve built-in tuples choir.)
 
 A concrete case is text processing. Many contemporary libraries use 
 index-based processing, but that's difficult when handling multibyte 
 characters. To address that, bidirectional ranges are one correct way to 
 go. Phobos defines a ByCodeUnit range that spans a string correctly, one 
 dchar at a time. With that, you can write:
 
...
 Andrei

I would vote yes -- I've used this technique (with my own character slice
classes in C++) and it is a great idiom to work with.

I think ranges (and slices) have some of the properties from each of pointers,
containers, and streams.  A stream is always a by-ref kind of thing, unless
you are in a language that needs monads etc.

Let me suggest one more function I've found very useful:

bool readUntil(R1, R2, R3)(ref R1 input, R2 delim, ref R3 result)
{
    // Like findSkip, but returning the intervening text.
}

Then you can write something like:

bool consumeQuotedString(R1, R2)(ref R1 text, ref R2 quoted)
{
    if (skip(text, '"')) {
        if (!readUntil(text, '"', quoted)) {
            /* error */
        }
        return true;
    }
    return false;
}
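
(For concreteness, a string-specialized sketch of readUntil -- my
implementation; the fully generic range version takes more care:)

import std.string : indexOf;

bool readUntil(ref string input, char delim, ref string result)
{
    immutable i = indexOf(input, delim);
    if (i < 0)
        return false;
    result = input[0 .. i];     // the intervening text
    input = input[i + 1 .. $];  // consume it, plus the delimiter
    return true;
}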

Kevin



Re: What's wrong with D's templates?

2009-12-21 Thread Kevin Bealer
dsimcha Wrote:

 == Quote from Yigal Chripun (yigal...@gmail.com)'s article
  but even more frustrating is the fact that
  template compilation bugs will also happen at the client.
  There's a whole range of designs for this and related issues and IMO the
  C++ design is by far the worst of them all. not to mention the fact that
  it isn't an orthogonal design (like many other features in c++). I'd
  much prefer a true generics design to be separated from compile-time
  execution of code with e.g. CTFE or AST macros, or other designs.
 
 Since generics work by basically casting stuff to Object (possibly boxing it) 
 and
 casting back, I wonder if it would be easy to implement generics on top of
 templates through a minimal wrapper.  The main uses for this would be 
 executable
 bloat (for those that insist that this matters in practice) and allowing 
 virtual
 functions where templates can't be virtual.

In C++ you could define MyObject and MyRef (a smart pointer to MyObject) types
that implement the methods you need (like comparisons) as virtual functions
that are either pure or throw exceptions.  Then just define a non-template
class that inherits from (or just aggregates and wraps) a vector<MyRef> and
map<MyRef, MyRef>.  Now you can use this map and vector code to build whatever
solution you need, using dynamic_cast<T> to insure the types are what you
want.

If you want, you can throw a type safe wrapper that uses dynamic_cast<T>
around this; as long as all the methods of that class are inlined, it should
have no extra bloat.  Now you're back at the Java level of expressiveness.

I wonder, though, if you actually save anything.  Since
vector::operator[](size_t) is just an array index that will get inlined into
your code, I think it should have much *less* code bloat in your executable
than a function call (to virtual MyRef vector<MyRef>::operator[](size_t)) plus
dynamic casts to figure out if all the types are castable.

Movie Poster: "Dr. Stroustrouplove: or How I Stopped Worrying and Learned to
Love the Bloat."

As for mixing template code and client code, is it that big of a deal in
practice?  If you are making something big enough to be worth patenting, won't
most of the heavy lifting classes probably not be templates?  Unless you are
marketing template/container libraries specifically, I guess...

Personally I'm reluctant to buy that sort of thing from someone if I *can't*
see the source.  If I need a class like vector, in practice I need to be able
to dig into its methods to see what they do, e.g. is it really guaranteed that
storage is contiguous, etc.  On the other hand, if I was buying code to print
a Word document to a pdf file, I could accept that I don't need to look into
the code.  But something like vector or map ends up getting so intimate with
the rest of my design that I want to know how it works in detail just as an
end-user.  (Yeah, yeah, I know encapsulation says I shouldn't have to.)

I think performance outweighs the needs of any particular business model,
especially when you can do your own Object based version if you need to.  I
agree there should be ways to hide the implementation from the client, but if
templates aren't one of them, then just factor that into where and how you use
templates and when you use OO and virtual instead.  It's enough that it's
possible and practical to hide your impl; you don't need *every* language
feature to make that its #1 priority.

The performance / impl-hiding conflict is a fundamental problem -- if the
user's compiler can't see the template method definitions, then it can't
optimize them very well.  If it can, then the user can too.  Any method of
compiling them that preserves enough info for the compiler to work with will
probably be pretty easily and cleanly byte-code-decompilable.

Kevin



Re: What's wrong with D's templates?

2009-12-21 Thread Kevin Bealer
yigal chripun Wrote:

 of the three above I think option 3 is the worst design and option 2 is my
 favorite design. I think that in reality you'll almost always want to define
 such an interface, and I really can't think of any useful use cases for an
 unrestricted template parameter as in C++.

I think there are some.  In C++, using << to write a type T to an output
stream is common.  Similarly, you could do something like dividing
(x.size() + 0.0)/x.capacity() to find the ratio of internal fragmentation,
assuming most of the containers in question have a capacity method.  Another
common example is creating a type like Optional<T>, which wraps up a boolean
plus a T to allow you to have optional values, e.g. the "this value is not
set" concept.

template<class T> class Optional {
    T x;
    bool have;
public:
    Optional(bool have1 = false) : have(have1) {}
    Optional(T & x1) : x(x1), have(true) {}
    bool haveValue() { return have; }
    T & get() { assert(have); return x; }
};

Which is useful for passing around lists of configuration options etc.
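
(In D, std.typecons.Nullable plays the same role; a quick sketch:)

import std.typecons : Nullable;

unittest {
    Nullable!int port;   // the "this value is not set" state
    assert(port.isNull);
    port = 8080;
    assert(!port.isNull && port.get == 8080);
}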

Essentially, any time you want to leverage a property common to a lot of
different objects, with the same API across all of them, but don't want to (or
can't) build that property into an inheritance diagram.  Often in C++ the
reason you can't or don't add a virtual method (toString) to solve the same
problem is either that (1) you don't have access to the base class code, or
(2) it's a template and you can't have virtual templates.

As Bill Clinton said, "It depends on what the meaning of IS-A is."

Often a bunch of types have a kind of unrecognized IS-A relationship -- they
all have some common characteristic that is not recognized as common (by the
type system) until you come along to write code that deals with that
characteristic.  (But if the T code is changing, it's fragile to do this.)

Kevin



Re: Go rant

2009-12-21 Thread Kevin Bealer
Walter Bright Wrote:

 dsimcha wrote:
  == Quote from Walter Bright (newshou...@digitalmars.com)'s article
  qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)
  prominently featured in Haskell's official introduction page:
  http://www.haskell.org/haskellwiki/Introduction#Quicksort_in_Haskell
  Ain't that cool? But look closer. The advantage of quicksort is that it
  is an *in-place* sort. The Haskell one creates new arrays for every
  state in the sort.
  
  I think this can be optimized out in theory, though I have no idea how
  often it is in practice.
 
 I'm very doubtful it can be optimized out in theory. I'll bet it isn't
 done in practice.

Ironically, I think you could write a nearly identical approach in D.  By
using array slicing, you could get something like Haskell's elegant syntax
but still be sorting the array in-place.
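
(A hedged sketch of what I mean -- my code, keeping the Haskell version's
first-element pivot and leaning on std.algorithm for the partitioning:)

import std.algorithm : partition, swap;

void qsort(int[] a)
{
    if (a.length < 2) return;
    immutable pivot = a[0];
    swap(a[0], a[$ - 1]);                   // park the pivot at the end
    auto rest = partition!(x => x < pivot)(a[0 .. $ - 1]);
    immutable mid = a.length - 1 - rest.length;
    swap(a[mid], a[$ - 1]);                 // pivot into its final slot
    qsort(a[0 .. mid]);
    qsort(a[mid + 1 .. $]);
}

unittest {
    auto a = [3, 1, 4, 1, 5, 9, 2, 6];
    qsort(a);
    assert(a == [1, 1, 2, 3, 4, 5, 6, 9]);
}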

  The other fundamental problem with the Haskell
  version is that it selects the first array element as the pivot. This
  is the worst choice, since arrays to be sorted tend to already be mostly
  sorted, and quicksort gives quadratic performance to sort an already
  sorted array with the first element as the pivot.
  
  True.  All you'd need to remedy this, though, is a median of 3 function.
 
 Then it's not looking so simple and elegant anymore :-)



Re: D growing pains (was Re: The Demise of Dynamic Arrays?!)

2009-12-17 Thread Kevin Bealer
Walter Bright Wrote:

 Kevin Bealer wrote:
  To smooth this out, it would help to have the best practices for
  doing common things in D (e.g. serialization, logging) somewhat
  documented for the consumption of non-experts.  I wonder what a good
  way of doing this is?
 
 It's really impossible to predict what the best practices would be. Only 
 time and usage will tell. The best practices for both C and C++ have 
 evolved significantly over time.

I think D2 is so different from D (and in some important ways, from everything
out there) that a new usage style would have needed to be created even if D
were already the lingua franca.

The question that I find myself thinking about is -- is chaos better, or is it
better to try to establish a particular style of programming (however simply
or poorly) in the original books and resources that a programmer will find
upon discovering D?

Chaos has its appeal of course, but others would say that any initial style
that will stick is better than no style.  In other words, that you cannot
evolve until you cohere.  It's hard to choose...  On the other hand, I suppose
Phobos is consistent enough with itself that if people follow that they will
have a launch point.
Kevin



D growing pains (was Re: The Demise of Dynamic Arrays?!)

2009-12-15 Thread Kevin Bealer
retard Wrote:

 Tue, 15 Dec 2009 09:42:26 -0500, Steven Schveighoffer wrote:

 Most likely Walter won't even tell what kind of arrays D2 will have. 
 Anything can happen. The final word is the undocumented executable you 
 can download when the book hits stores. Even then dmd's behavior isn't 
 the final word, it might as well be a bug.

I find following D these days to be interesting but occasionally frustrating.
It's a little like talking to a teenager who is growing so fast intellectually
that his opinions change every day.  You can't really argue with him, because
you don't know where he stands on anything if it's been more than a few days
since the last conversation.

But I think this is clearly a complaint that actually works out to an amazing
compliment, e.g. that the language is still developing and maturing so much,
and the changes seem well thought out, too.  Congrats, Walter and Andrei!  (If
a teenager was already stuck in his ways mentally at 15 years old, it would be
a much worse thing!)

I think the new const regime, thread-local data as the default, easy code
generation, opDispatch, etc. have enormous power and newness.  For this reason
I think that a stable set of best practices for working in D2 will take a
while to settle down unless there is a lot of effort put into designing them
somehow (if that's even possible).  I find it hard to imagine what the average
D programming style might look like a few years out in a 'typical work place'.

Maybe this settling-out can't happen until a lot of D code is traded among
both D experts and also more middle level developers.  It seems like the best
practices and expected usage for C++ and Java are often driven by the friction
between people who are at different skill levels.  The best (or just most
common) practices seem to follow the "don't surprise me" principle, which
probably doesn't coalesce until people have built up some common sets of
expectations.  Mid level developers (the rank and file of any language
community) hate to work with code that is too clever to easily navigate
through.

I think best practices and style guidelines often tend (intentionally or not)
to become a bulwark against too much invention and cleverness -- a trade-off
of brain cycles to read the code versus expressiveness / compactness /
performance / perfectionism.  D's feature set has the ability to do enormously
clever things.  The rank and file maintenance programmer will probably prefer
these things at least modularized and packed off into controlled sections of
the code.

This isn't a criticism; it's more like a housekeeping thing, e.g. Thomas
Edison's friends probably preferred the chemical batteries and various sharp
items to be stored in the cabinets when they stopped by for tea.  In general,
a house is better suited for company if the sharp and useful things are put
away.  Likewise, a code-base is probably more suitable for visits by coders
unfamiliar with its peculiarities when the use of advanced features is somehow
kept under control, or at least marked as special and surrounded by velvet
ropes.

To smooth this out, it would help to have the best practices for doing common
things in D (e.g. serialization, logging) somewhat documented for the
consumption of non-experts.  I wonder what a good way of doing this is?

Kevin




private (aka no-inherit) implementations for public methods

2009-12-03 Thread Kevin Bealer
I think I have a solution to some problems that occur in OO design.  The
problem occurs when a method can be implemented that provides great
functionality for a specific class, but which is not valid for its subclasses
without extra work.  The classic example in C++ would be serialize:

void serialize(ostream &);

A class can always implement serialize(), but its children will have to
override the method with their own definition, which will probably call
parent::serialize(), then go on to copy any added fields.  If they forget, the
class does not return the right data (it loses the extra fields).
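
(A quick D sketch of the brittleness I mean -- my example: the subclass
forgets to extend serialize() and silently loses its added field.)

import std.conv : to;

class A {
    int x;
    string serialize() { return to!string(x); }
}

class B : A {
    int y;  // oops: the inherited serialize() silently drops y
}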

The solution: even though the method is public, make it possible to mark the
implementation as non-inherited.  In other words, B will inherit the signature
for serialize() from A, but if the user attempts to link against
B::serialize() they had better implement it -- such a call will not resolve to
A::serialize().

If B wants to just use A's version, it can just call A::serialize() in its
method body.  It's not a private method; it just requires developer attention
one way or the other to derive from the class.

I think this would make inheritance in OO design a lot less brittle.  Note
that constructors (in C++ anyway) already have this property -- they aren't
inherited by default (except the default constructor).

This is because methods like serialize, toString(), constructors, opCmp, etc.
are 'holistic' in the sense that they address all elements of a class, and it
is probably an error if they miss one.  Other methods like increaseRefCount()
or getName() are field-wise operations that deal with a few select member
variables.

Kevin