date:20100323


On 03/23/2010 09:12 PM, Steven Schveighoffer wrote:

> Andrei Alexandrescu Wrote:
>
> > I'd gladly reconsider E* getNext(), and I like it a lot, but that
> > doesn't accommodate ranges that want to return rvalues without
> > storing them (e.g. a range using getchar() as a back-end, and
> > generally streams that don't correspond to stuff stored in memory).
> > If it's not in memory, there's no pointer to it.
>
> Second, you *have* to read data into memory.  Even with the ranges as
> they currently are, you have to read into memory.  At least this is
> less awkward.


I agree. But it's one thing to read and pass along, and a different 
thing to read and keep in a buffer inside the range.



> Take for instance a line iterator.  You have to read enough to see
> the line terminator, but you most likely do not read *exactly* to the
> line terminator, so you just read in chunks until you get a line,
> then return the pointer to the data.  It works actually quite
> elegantly.


I disagree about the elegance part. If the range arrogates the right to 
use its own buffering, then when you decide you're done with that range 
and try to read some more from the stream, you discover data has been lost.


The Phobos file I/O functions all avoid doing any more buffering than 
the backing FILE* does. They achieve performance by locking the file 
once with flockfile/funlockfile and then using fgetc_unlocked().


This puts me in real trouble with the formatted reading functions (a la 
fscanf but generalized to all input ranges), which I'm gestating about. 
The problem with the current API is that if you call input.front(), it 
will call fgetc(). But then say I decide I'm done with the range, as is 
the case with e.g. reading an integer and stopping at the first 
non-digit. That non-digit character will be lost. So there's a need to 
say, hey, put this guy back because whoever reads after me will need to 
look at it. So I need a putBackFront() or something (which would call 
ungetc()). I wish things were simpler.
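
The put-back problem can be sketched directly against the C stdio bindings in core.stdc.stdio. This is only an illustration, not the Phobos formatted-reading API: `readUint` is a hypothetical helper, and it assumes non-negative decimal input with no overflow handling.

```d
import core.stdc.stdio : EOF, FILE, fgetc, ungetc;

/// Hypothetical helper: reads a non-negative decimal integer from `f`.
/// It stops at the first non-digit and pushes that character back with
/// ungetc(), so whoever reads from the stream next still sees it.
int readUint(FILE* f)
{
    int value = 0;
    for (;;)
    {
        int c = fgetc(f);
        if (c == EOF)
            break;
        if (c < '0' || c > '9')
        {
            ungetc(c, f); // give the delimiter back to the stream
            break;
        }
        value = value * 10 + (c - '0');
    }
    return value;
}
```

Without the ungetc() call the delimiter would be consumed and lost, which is exactly the situation a putBackFront()-style primitive would have to cover for ranges.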



> Third, the memory could be supplied by the caller.  For instance, if
> you wrote the function like this:
>
> E* getNext(E* buf = null);
>
> Then foreach could do something like this:
>
> foreach(e; streamrange)
>
> =>
>
> E _e;
> while(auto e = streamrange.getNext(&_e))
>
> To avoid heap allocation.  Of course, heap allocation would be the
> default if buf is null.
>
> Tango does this sort of trick quite often, and it makes the I/O code
> extremely fast.


The problem is that that speed doesn't translate very well to in-memory 
containers. For containers it's preferable to pass null so you get a 
pointer to the actual element; for streams it's preferable to not pass 
null. So it's difficult to write code that works well for both.
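
A minimal sketch of the caller-supplied-buffer variant, assuming a toy stream-like range. `Counter` and `sum` are hypothetical names invented for this illustration; nothing here is Tango or Phobos code.

```d
/// Hypothetical stream-like range producing the ints next .. limit-1.
struct Counter
{
    int next, limit;

    /// Returns null when exhausted. Otherwise the element is written into
    /// the caller's storage, or into fresh heap storage when buf is null.
    int* getNext(int* buf = null)
    {
        if (next >= limit)
            return null;
        if (buf is null)
            buf = new int; // default path: heap allocation
        *buf = next++;
        return buf;
    }
}

/// Sums all elements using a single stack slot, as in the lowered foreach.
int sum(Counter r)
{
    int total = 0;
    int slot;  // caller-owned storage, reused on every iteration
    int* e;
    while ((e = r.getNext(&slot)) !is null)
        total += *e;
    return total;
}
```

A container would instead ignore the buffer and return a pointer to the stored element, which is the mismatch described above: generic code cannot know which calling style is the cheap one.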



> Also, another thing to think about is we can generalize the return
> type to satisfying the condition:
>
> iff range is empty then cast(bool)range.getNext == false.
>
> This means as long as your range cannot return a null element for a
> non-empty return, it is OK not to use a pointer.  For example, the
> line iterator again... it can be written like:
>
> const(char)[] getNext()
>
> because you will only ever return a null const(char)[] when there is
> no data left.


I see, but if I'm looking for ints? I'll have to return a pointer - or a 
nullable or something.
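
For element types such as int, where every value (including 0) is legitimate, the "or a nullable" option could look roughly like this. It is a sketch using std.typecons.Nullable; `IntStream` is a hypothetical type made up for the example.

```d
import std.typecons : Nullable;

/// Hypothetical int source. 0 is a valid element, so cast(bool) on the
/// element itself cannot signal exhaustion.
struct IntStream
{
    int[] data;

    /// Returns a null Nullable when no data is left.
    Nullable!int getNext()
    {
        if (data.length == 0)
            return Nullable!int.init;
        int v = data[0];
        data = data[1 .. $];
        return Nullable!int(v);
    }
}
```

The caller tests `isNull` instead of relying on the truthiness of the element, at the cost of carrying an extra flag next to each returned value.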



> I don't think we should give up on trying to make a stream range that
> is not awkward, I really dislike the way today's input ranges map to
> streams.


Me too. Let's keep on looking, I have the feeling something good is 
just around the corner. But then I felt that way for a year :o).



Andrei

Re: Ranges and/versus iterators


On 03/23/2010 09:12 PM, Steven Schveighoffer wrote:

> Andrei Alexandrescu Wrote:
>
> > On 03/23/2010 03:46 PM, Steven Schveighoffer wrote:
> > > A while back, you identified one of the best interfaces for input ranges:
> > >
> > > E* getNext();
> > >
> > > Which allows for null returns when no data is left. The drawback is that
> > > E must be either referenced or allocated on the heap (providing storage
> > > to the function is an option). But the killer issue was that safeD would
> > > not allow it. However, in recent times, you have hinted that safeD may
> > > allow pointers, but disallow bad pointer operations. In light of this,
> > > can we reconsider this interface, or other alternatives using pointers?
> > >
> > > I've always felt that if we were to define ranges for streams in a
> > > non-awkward way, we would need an "all in one" operation, since not only
> > > does getting data from the range move the range, but checking for empty
> > > might also move the range (empty on a stream means you tried to read and
> > > got nothing).
> >
> > I'd gladly reconsider E* getNext(), and I like it a lot, but that
> > doesn't accommodate ranges that want to return rvalues without storing
> > them (e.g. a range using getchar() as a back-end, and generally streams
> > that don't correspond to stuff stored in memory). If it's not in memory,
> > there's no pointer to it.
>
> First, a range backed by getchar is about as useful as functional qsort ;)


Actually I need one. Think fscanf, i.e. unformat() for streams.

Andrei

Re: Go updates


On 24/03/2010 03:02, bearophile wrote:

> BLS:
> > So Go is just a pascalized C. who cares.
>
> :-) view of point your with agree fully don't I
>
> Bye,
> bearophile


Good point !!

Re: Ranges and/versus iterators

Andrei Alexandrescu Wrote:

> On 03/23/2010 03:46 PM, Steven Schveighoffer wrote:
> > A while back, you identified one of the best interfaces for input ranges:
> >
> > E* getNext();
> >
> > Which allows for null returns when no data is left. The drawback is that
> > E must be either referenced or allocated on the heap (providing storage
> > to the function is an option). But the killer issue was that safeD would
> > not allow it. However, in recent times, you have hinted that safeD may
> > allow pointers, but disallow bad pointer operations. In light of this,
> > can we reconsider this interface, or other alternatives using pointers?
> >
> > I've always felt that if we were to define ranges for streams in a
> > non-awkward way, we would need an "all in one" operation, since not only
> > does getting data from the range move the range, but checking for empty
> > might also move the range (empty on a stream means you tried to read and
> > got nothing).
> 
> I'd gladly reconsider E* getNext(), and I like it a lot, but that 
> doesn't accommodate ranges that want to return rvalues without storing 
> them (e.g. a range using getchar() as a back-end, and generally streams 
> that don't correspond to stuff stored in memory). If it's not in memory, 
> there's no pointer to it.

First, a range backed by getchar is about as useful as functional qsort ;)

Second, you *have* to read data into memory.  Even with the ranges as they 
currently are, you have to read into memory.  At least this is less awkward.

Take for instance a line iterator.  You have to read enough to see the line 
terminator, but you most likely do not read *exactly* to the line terminator, 
so you just read in chunks until you get a line, then return the pointer to the 
data.  It works actually quite elegantly.

Third, the memory could be supplied by the caller.  For instance, if you wrote 
the function like this:

E* getNext(E* buf = null);

Then foreach could do something like this:

foreach(e; streamrange)

=>

E _e;
while(auto e = streamrange.getNext(&_e))

To avoid heap allocation.  Of course, heap allocation would be the default if 
buf is null.

Tango does this sort of trick quite often, and it makes the I/O code extremely 
fast.

Also, another thing to think about is we can generalize the return type to 
satisfying the condition:

iff range is empty then cast(bool)range.getNext == false.

This means as long as your range cannot return a null element for a non-empty 
return, it is OK not to use a pointer.  For example, the line iterator again... 
it can be written like:

const(char)[] getNext()

because you will only ever return a null const(char)[] when there is no data 
left.

I don't think we should give up on trying to make a stream range that is not 
awkward, I really dislike the way today's input ranges map to streams.

-Steve

Re: Go updates


On 24/03/2010 03:02, bearophile wrote:

> BLS:
> > So Go is just a pascalized C. who cares.
>
> :-) view of point your with agree fully don't I
>
> Bye,
> bearophile

at least the Arabian guys should like it :) Bjoern

Re: Go updates

BLS:
> So Go is just a pascalized C. who cares.

:-) view of point your with agree fully don't I

Bye,
bearophile

Re: Go updates


On 24/03/2010 02:39, bearophile wrote:

> Thanks to being backed by Google Go seems to improve:
> http://blog.golang.org/2010/03/go-whats-new-in-march-2010.html
>
> >Go also now natively supports complex numbers.<
>
> While D2 will unsupport them, because D2 is probably flexible enough to not
> need to keep them as built-ins :-)
>
> >The syntax x[lo:] is now shorthand for x[lo:len(x)].<
>
> That's identical to the Python syntax. But the D version x[lo .. $] is
> acceptable.
>
> But there's a len() in my dlibs too. It helps me avoid writing "length" all
> the time and avoids my typos, and it can be used as a delegate too:
> map(&len, arr);
>
> This Go syntax is cute:
> Pointer to int: *int
> Array of ints: []int
> Array of pointer to ints: []*int
> Pointer to array of ints: *[]int
>
> In D it becomes:
> Pointer to int: int*
> Array of ints: int[]
> Array of pointer to ints: int*[]
> Pointer to array of ints: int[]*
>
> Here I think I like the Go version better :-(
>
> Bye,
> bearophile


D vs Go

I do not agree.
If we read D from RIGHT to LEFT, like
Pointer to array of ints:
int[]*

then we have
*   //pointer to
[]  // array  of
int

in Go
From LEFT to RIGHT
*
[]
int

So Go is just a pascalized C. who cares.
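
The right-to-left reading can be checked mechanically. The alias names below are invented for this sketch; the static asserts only confirm what each D type expression means.

```d
// Reading each D type right to left gives the English description.
alias PtrToInt      = int*;   // pointer to int
alias ArrOfInt      = int[];  // array of int
alias ArrOfPtrToInt = int*[]; // array of pointer to int
alias PtrToArrOfInt = int[]*; // pointer to array of int

// Dereferencing a pointer-to-array yields the array type.
static assert(is(typeof(*PtrToArrOfInt.init) == int[]));
// Indexing an array-of-pointers yields the pointer type.
static assert(is(typeof(ArrOfPtrToInt.init[0]) == int*));
```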

Go updates

Thanks to being backed by Google Go seems to improve:
http://blog.golang.org/2010/03/go-whats-new-in-march-2010.html

>Go also now natively supports complex numbers.<

While D2 will unsupport them, because D2 is probably flexible enough to not 
need to keep them as built-ins :-)
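
As a rough illustration of complex numbers as a library type rather than a built-in, here is a sketch using Phobos' std.complex. The `square` function is a hypothetical helper for the example.

```d
import std.complex : Complex, complex;

/// Hypothetical helper: squares a complex number with the library type.
Complex!double square(Complex!double z)
{
    return z * z;
}
```

For example, `square(complex(1.0, 2.0))` has real part -3 and imaginary part 4, since (1 + 2i)^2 = 1 - 4 + 4i.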


>The syntax x[lo:] is now shorthand for x[lo:len(x)].<

That's identical to the Python syntax. But the D version x[lo .. $] is 
acceptable.

But there's a len() in my dlibs too. It helps me avoid writing "length" all the 
time and avoids my typos, and it can be used as a delegate too:
map(&len, arr);
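
A small sketch of both points, using Phobos' std.algorithm rather than the dlibs `map`. The free function `len` here is a hypothetical stand-in written for this example, not the dlibs implementation.

```d
import std.algorithm : map;
import std.array : array;

/// Hypothetical free-function len(), in the spirit of the helper above.
size_t len(T)(T[] a)
{
    return a.length;
}

/// Maps len over an array of arrays.
size_t[] lengths(int[][] arrs)
{
    return arrs.map!(a => len(a)).array;
}

/// D's counterpart of Go's x[lo:]; `$` is the length of the sliced array.
int[] tail(int[] x, size_t lo)
{
    return x[lo .. $];
}
```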

This Go syntax is cute:
Pointer to int: *int
Array of ints: []int
Array of pointer to ints: []*int
Pointer to array of ints: *[]int

In D it becomes:
Pointer to int: int*
Array of ints: int[]
Array of pointer to ints: int*[]
Pointer to array of ints: int[]*

Here I think I like the Go version better :-(

Bye,
bearophile

Re: D2 std.container ? container update events..


On 24/03/2010 01:02, Andrei Alexandrescu wrote:

> On 03/23/2010 06:10 PM, BLS wrote:
> > On 24/03/2010 00:04, Andrei Alexandrescu wrote:
> > > On 03/23/2010 05:58 PM, BLS wrote:
> > > > Now How is it..
> > > > We had a long and interesting discussion about ranges vs java/C
> > > > whatever style(d) iterators. ... to bring in an idea.. what about
> > > > implementing update events for collections. IMHO this is a very
> > > > reasonable feature.
> > >
> > > I think it's a great feature to be implemented on top of basic
> > > containers.
> > >
> > > Andrei
> >
> > --On top of-- I guess it means inheritance Or do you think about
> > template mixins ..pattern wise ?
>
> Probably composition should suffice.
>
> Andrei


Thanks Andrei. Ahem, composition is a pattern. OK, I guess I have to 
wait for the sources.

Re: D2 std.container ? container update events..


On 03/23/2010 06:10 PM, BLS wrote:

> On 24/03/2010 00:04, Andrei Alexandrescu wrote:
> > On 03/23/2010 05:58 PM, BLS wrote:
> > > Now How is it..
> > > We had a long and interesting discussion about ranges vs java/C whatever
> > > style(d) iterators. ... to bring in an idea.. what about
> > > implementing update events for collections. IMHO this is a very
> > > reasonable feature.
> >
> > I think it's a great feature to be implemented on top of basic
> > containers.
> >
> > Andrei
>
> --On top of-- I guess it means inheritance Or do you think about
> template mixins ..pattern wise ?


Probably composition should suffice.

Andrei
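
One way composition might look, as a sketch only: a wrapper owns a plain array and fires callbacks on updates. `Observed`, `onInsert` and `onRemove` are hypothetical names; this is not the planned std.container design.

```d
/// Hypothetical wrapper that adds update events on top of a basic
/// container (here a plain array) by composition, not inheritance.
struct Observed(T)
{
    private T[] items;           // the wrapped container
    void delegate(T)[] onInsert; // listeners called after an insert
    void delegate(T)[] onRemove; // listeners called after a removal

    void insertBack(T value)
    {
        items ~= value;
        foreach (cb; onInsert)
            cb(value);
    }

    /// Removes the last element; returns false if there was none.
    bool removeBack()
    {
        if (items.length == 0)
            return false;
        T value = items[$ - 1];
        items = items[0 .. $ - 1];
        foreach (cb; onRemove)
            cb(value);
        return true;
    }

    size_t length() const { return items.length; }
}
```

A remove listener that appends each removed value to a second array gives a crude undo log, along the lines of the "UnDo container" idea raised at the start of the thread.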

Re: D2 std.container ? container update events..


On 24/03/2010 00:25, Robert Clipsham wrote:

> On 23/03/10 22:58, BLS wrote:
> > D is meanwhile celebrating its 10th(?) birthday and Phobos still lacks a
> > collection lib. Ouch
>
> Something I eagerly await :) I'm currently using tango's containers when
> I need them, porting them to D2... Fortunately they're all self
> contained so I didn't need to port the rest of tango to use them.



The latest release of Tango was indeed a step forward.
(I never understand what fixed sized vectors in 0.97 are good for.)

However, I am convinced that Andrei and Steven (Sometimes together maybe 
sometimes not ...but finally as a win) are brewing an outstanding 
collection library.


Once we have this lib, enhancing Phobos as well as porting C-ish 
software will be much easier. My 2 euro cents.

Re: Implicit enum conversions are a stupid PITA

Nick Sabalausky:
> It still bugs the hell out of me even today, but I've largely shut up about 
> it since Walter hasn't wanted to change it even though he seems to be the only 
> one who doesn't feel it's a bad idea (and it's not like it causes practical 
> problems when actually using the language...although I'm sure it must be a 
> big WTF for new and prospective D users).

Recently D2 has introduced the name "inout", which doesn't seem very linked to 
its semantic purpose. I think "auto_const", "auto const" or "autoconst" are 
better.
The recently introduced "auto ref" is clear, but I think "auto_ref" or 
"autoref" are better still.

Bye,
bearophile
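
For readers unfamiliar with the two features being named, a brief sketch of what they do in current D. `head` and `passedByRef` are hypothetical functions written for this illustration.

```d
/// inout: a single body serves mutable, const and immutable arguments,
/// and the return type carries the same qualifier as the argument.
inout(int)[] head(inout(int)[] a, size_t n)
{
    return a[0 .. (n < a.length ? n : a.length)];
}

/// auto ref: lvalues bind by reference, rvalues are passed by value.
bool passedByRef(T)(auto ref T x)
{
    return __traits(isRef, x);
}
```

Here `head` returns `int[]` for an `int[]` argument and `immutable(int)[]` for an `immutable(int)[]` argument, which is the "automatic const" behaviour the proposed names try to capture.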

Re: Associative Arrays need cleanout method or property to help


Michael Rynn wrote:

> I do not think this is true of current builtin AA,

Correct, it is not.

> and therefore not true
> of this enhanced implementation. I note the current builtin _aaGet will
> locate a new node by pointer, maybe do a rehash, and then return the node
> it previously relocated.  Relax, this AA has no fear of nodes moving
> around on rehash. They are just relinked. All done with pointers.


I was looking at:

http://www.dsource.org/projects/aa/browser/trunk/randAA/RandAA.d

which contains the data structures:

  K* _keys;
  V* vals;
  ubyte* flags;

i.e. 3 parallel arrays. I apologize since you are apparently not referring to 
that, but this:



http://www.dsource.org/projects/aa/browser/trunk/druntime/aaA.d

but that uses the data structure:

  aaNode *left_;
  aaNode *right_;
  static if (keyBig)
   hash_t hash_;

so I don't see where the linear congruent random probe thingy is. You 
*could* get rid of the left and right pointers and use linear congruent probing, 
and that might be worth a try.
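
For reference, linear congruential probing over a power-of-two table can be sketched as follows. This is not the RandAA or druntime code. The constants 5 and 1 are an assumption chosen to satisfy the Hull-Dobell conditions (increment odd, multiplier congruent to 1 mod 4), which guarantee that every slot is visited exactly once. `probeSequence` is a hypothetical name.

```d
/// Returns the order in which slots are probed, starting at `start`,
/// for a table whose size is a power of two: i -> (5*i + 1) mod size.
size_t[] probeSequence(size_t start, size_t size)
{
    assert(size > 0 && (size & (size - 1)) == 0,
           "size must be a power of two");
    auto seq = new size_t[](size);
    size_t i = start & (size - 1);
    foreach (ref slot; seq)
    {
        slot = i;
        i = (5 * i + 1) & (size - 1); // next probe position
    }
    return seq;
}
```

With such a sequence the table needs no left/right pointers per node: a lookup follows the probe order until it finds the key or an empty slot.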

Re: D2 std.container ? container update events..

2010-03-23 Thread Robert Clipsham


On 23/03/10 22:58, BLS wrote:

> D is meanwhile celebrating its 10th(?) birthday and Phobos still lacks a
> collection lib. Ouch


Something I eagerly await :) I'm currently using tango's containers when 
I need them, porting them to D2... Fortunately they're all self 
contained so I didn't need to port the rest of tango to use them.

Re: Implicit enum conversions are a stupid PITA

2010-03-23 Thread Nick Sabalausky

"yigal chripun"  wrote in message 
news:hobg4b$12e...@digitalmars.com...
>
> This also interacts with the crude hack of "this enum is actually a 
> constant".
> if you remove the implicit casts then how would you be able to do:
> void foo(int p);
> enum { bar = 4 }; // don't remember the exact syntax here
> foo(bar); // compile-error?!
>

AIUI, that style of enum is already considered different by the compiler 
anyway. Specifically, it doesn't create any new type, whereas the other 
type of enum creates a new semi-weak type. I don't think it would be too big 
of a step to go one step further and change "this kind of enum creates a new 
semi-weak type" to "this kind of enum creates a new strong type". But yeah, I 
absolutely agree that calling a manifest constant an "enum" is absurd. It 
still bugs the hell out of me even today, but I've largely shut up about it 
since Walter hasn't wanted to change it even though he seems to be the only 
one who doesn't feel it's a bad idea (and it's not like it causes practical 
problems when actually using the language...although I'm sure it must be a 
big WTF for new and prospective D users).


> I feel that enum needs to be re-designed. I think that C style "enums are 
> numbers" are *bad*, *wrong* designs that expose internal implementation 
> and the only valid design is that of Java 5.
>
> e.g.
> enum Color {blue, green}
> Color c = Color.blue;
> c++; // WTF?  should NOT compile
>
> A C style enum with values assigned is *not* an enumeration but rather a 
> set of meaningful integral values and should be represented as such.
>
> This was brought up many many times in the NG before and based on past 
> occurrences will most likely never change.

I would hate to see enums lose the concept of *having* a base type and base 
values because I do find that to be extremely useful (Haxe's enums don't 
have a base type and, from direct experience with them, I've found that to 
be a PITA too). But I feel very strongly that conversions both to and from 
the base type need to be explicit. In fact, that was one of the things that 
was bugging me about C/C++ even before I came across D. D improves the 
situation of course, but it's still only half-way.
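
The asymmetry under discussion can be stated as a few compile-time checks. This sketch describes D as it behaves today, to the best of my knowledge, not the stricter rule being proposed. `toBase` and `fromBase` are hypothetical helper names.

```d
enum Color { blue, green }

// Named enum -> base type converts implicitly (the behaviour under debate).
static assert(is(Color : int));
// Base type -> named enum does not; it needs an explicit cast.
static assert(!is(int : Color));

// An anonymous enum member is a plain manifest constant: no new type.
enum { bar = 4 }
static assert(is(typeof(bar) == int));

/// Explicit conversions in both directions, as argued for above.
int toBase(Color c) { return cast(int) c; }
Color fromBase(int n) { return cast(Color) n; }
```

Making the first static assert fail while keeping the anonymous form as a plain constant is essentially the "strong type" change suggested above.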

Re: D2 std.container ? container update events..


On 24/03/2010 00:04, Andrei Alexandrescu wrote:

> On 03/23/2010 05:58 PM, BLS wrote:
> > Now How is it..
> > We had a long and interesting discussion about ranges vs java/C whatever
> > style(d) iterators. ... to bring in an idea.. what about
> > implementing update events for collections. IMHO this is a very
> > reasonable feature.
>
> I think it's a great feature to be implemented on top of basic containers.
>
> Andrei


--On top of-- I guess it means inheritance  Or do you think about 
template mixins ..pattern wise ?

Re: storing the hash multiplier instead of the hash value



On 23-mar-10, at 23:07, Walter Bright wrote:


> Fawzi Mohamed wrote:
> > On 23-mar-10, at 19:04, Andrei Alexandrescu wrote:
> > > What I'm pushing for as of now is to move the associative array
> > > definition from opacity into templated goodies in object_.d.
> >
> > That would be nice; that is one of the main reasons Steven's
> > implementation is faster.
> > It would be nice if this were done by the compiler rewriting the
> > calls as calls to "normal" templates, i.e. to a specially named
> > templated struct (AArray for example), so that (syntactic sugar for
> > the name aside) it would be the same as a library implementation.
> >
> > This would have two advantages:
> > - easy to replace the library implementation, as it would be even
> >   less special
> > - easy to replace the usage in one piece of code with another
> >   implementation (well, truth be told, that is rather easy now too)
>
> D2 already allows this.


ok I did not know that was possible.

Re: D2 std.container ? container update events..


On 03/23/2010 05:58 PM, BLS wrote:

> Now How is it..
> We had a long and interesting discussion about ranges vs java/C whatever
> style(d) iterators. ... to bring in an idea.. what about
> implementing update events for collections. IMHO this is a very
> reasonable feature.


I think it's a great feature to be implemented on top of basic containers.

Andrei

Re: Ranges and/versus iterators


On 03/23/2010 05:41 PM, Fawzi Mohamed wrote:


> On 23-mar-10, at 21:34, Andrei Alexandrescu wrote:
> > [...]
> > We've discussed this extensively, and I lost sleep over this simple
> > matter more than once. The main problem with bool popFront(ref E) is
> > that it doesn't work meaningfully for containers that expose
> > references to their elements.
>
> yes I agree that it is a difficult problem, the single function works
> well in the basic iterator case, but does not generalize well to
> modifiable values.
> In most cases I resorted to returning pointers. The templates that
> generate opApply (still D1.0 ;) from that are smart enough to remove the
> pointer when possible as the ref already gives that.
> Still not perfect, as always there are tradeoffs...
>
> > The interface with front() leaves it to the range to return E or ref E.
> >
> > An alternative is this:
> >
> > bool empty();
> > ref E getNext(); // ref or no ref
> >
> > I'm thinking seriously of defining input ranges that way. The
> > underlying notion is that you always move forward - getting an element
> > is simultaneous with moving to the next.
>
> already better (for basic iterators), but still not reentrant, if
> another thread executes between empty and getNext...


It can't :o).


> anyway any choice has some drawbacks... I like bool next(ref T) because
> it works well also for streams... and somehow (using T* or not depending
> on the type) can accommodate all iteration needs.
> Not perfectly nice, but workable.


next(ref T) works well _only_ on streams. It works badly on containers.


Andrei
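
The difference between the two interfaces shows up as soon as the caller wants to modify elements in place. A minimal sketch, assuming a hypothetical `ArrayWalker` range over an array (not a Phobos type):

```d
/// Hypothetical range offering both interfaces discussed above.
struct ArrayWalker(E)
{
    E[] data;

    bool empty() const { return data.length == 0; }

    /// Returns the current element by reference and advances in one step.
    ref E getNext()
    {
        assert(!empty, "getNext() on an empty range");
        E* p = &data[0];
        data = data[1 .. $];
        return *p;
    }

    /// Single-call alternative: the element is copied into `dest`, so the
    /// caller cannot reach the stored element through it.
    bool next(ref E dest)
    {
        if (empty)
            return false;
        dest = data[0];
        data = data[1 .. $];
        return true;
    }
}

/// Doubles every element in place via the ref-returning interface.
void doubleAll(int[] a)
{
    auto r = ArrayWalker!int(a);
    while (!r.empty)
        r.getNext() *= 2;
}
```

With `next(ref E)` the same loop would only change the caller's copy, which is the sense in which it works badly on containers; for a stream there is no stored element to reference, so the copy costs nothing extra.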

Re: storing the hash multiplier instead of the hash value



On 23-mar-10, at 23:02, Walter Bright wrote:


> Fawzi Mohamed wrote:
> > I think that the public interface should be exposed (or re-exposed)
> > somewhere to the outside so that one can easily create efficient
> > hashes for user defined types.
> > For example it is nice to be able to chain hash functions
> > (something that the default one does not allow).
>
> Just overload the toHash() function for your user-defined type to be
> whatever you want it to be.


I know, maybe I have not expressed myself clearly, but this overridden  
function has to be written.
For objects combining various pieces, one has to create a unique hash  
from various pieces.
The functions I have defined in hash.d help in doing that in such a
way that changing a bit anywhere most likely changes the whole hash.
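
What "chaining" might look like in a hand-written toHash(), as a sketch only. `hashCombine` is a hypothetical helper with an FNV-style mixing step chosen for illustration; it is not one of the functions from hash.d.

```d
/// Hypothetical chaining step: folds the next piece into a running seed.
size_t hashCombine(size_t seed, size_t piece) pure nothrow @nogc @safe
{
    return (seed ^ piece) * 16777619u; // FNV-style multiply
}

struct Point
{
    int x, y;

    /// Builds one hash from several fields by chaining hashCombine.
    size_t toHash() const pure nothrow @nogc @safe
    {
        size_t h = 2166136261u;
        h = hashCombine(h, cast(size_t) x);
        h = hashCombine(h, cast(size_t) y);
        return h;
    }
}
```

Because each step feeds the previous result back in as the seed, swapping the fields gives a different hash, which a plain XOR of the field values would not. A type used as an associative array key would normally pair this with a matching opEquals.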

Re: Implicit enum conversions are a stupid PITA

2010-03-23 Thread yigal chripun

bearophile Wrote:

> yigal chripun:
> > This was brought up many many times in the NG before and based on past 
> > occurrences will most likely never change.
> 
> If I see some semantic holes I'd like to see them filled/fixed, when 
> possible. Keeping the muzzle doesn't improve the situation :-)
> 
> Bye,
> bearophile

I agree with you about the gaping semantic hole. All I'm saying is that after 
bringing this so many times to discussion before I lost hope that this design 
choice will ever be re-considered.

D2 std.container ? container update events..


Now How is it..
We had a long and interesting discussion about ranges vs java/C whatever 
style(d) iterators.  ... to bring in an idea.. what about
implementing update events for collections. IMHO this is a very 
reasonable feature.


(I am not talking about non-modifying events like cursor/snapshot 
creation events or cursor-moving events. Interesting though.)


sample: database update events will force to pull data when needed and 
not just in case.

Indeed this is a high end feature, so what about
create auto structures ..say.. backup structures on events..

Our container(T) implementation can be based on a linked list 
implementing a forward range and will feed our UnDo container(T) 
(based i.e. on remove events)


It would be very interesting to see how all these necessary D2 
components : ranges, std.algo, collection/container, aliases and 
delegates will come together.


...I think our node like stuff (the core data structure) should be 
implemented as structure, our container as class-interface pair where 
the interface describes at least the range. I agree with Steven and 
Andrei, hierarchic containers are over-estimated.


Bjoern
PS
D is meanwhile celebrating its 10th(?) birthday and Phobos still lacks a 
collection lib. Ouch

Re: Implicit enum conversions are a stupid PITA

2010-03-23 Thread yigal chripun

yigal chripun Wrote:

> A C style enum with values assigned is *not* an enumeration but rather a set 
> of meaningful integral values and should be represented as such.
> 

The above isn't accurate. I'll re-phrase:
The values assigned to the members of the enums are just properties of the 
members, they do not define their identity. 
void bar(int);
bar(Color.Red.rgb); // no-problem
bar(Color.Red); // compile-error
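
One way to approximate that design in D today is a struct whose instances play the role of the enum members. This is a sketch under that assumption; the layout and the names `value`, `Red` and `Blue` are invented for the example.

```d
/// Hypothetical Java-5-style enumeration: the numeric value is a property
/// of each member, not its identity, and never converts implicitly.
struct Color
{
    private int value;

    enum Red  = Color(0xFF0000);
    enum Blue = Color(0x0000FF);

    int rgb() const { return value; }
}

void bar(int) {}

// bar(Color.Red.rgb) is accepted, bar(Color.Red) is rejected.
static assert(__traits(compiles, bar(Color.Red.rgb)));
static assert(!__traits(compiles, bar(Color.Red)));
```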

Re: Ranges and/versus iterators



On 23-mar-10, at 21:51, Andrei Alexandrescu wrote:


> On 03/23/2010 03:46 PM, Steven Schveighoffer wrote:
> > On Tue, 23 Mar 2010 16:34:24 -0400, Andrei Alexandrescu wrote:
> > > [...]
> >
> > A while back, you identified one of the best interfaces for input ranges:
> >
> > E* getNext();
> >
> > Which allows for null returns when no data is left. The drawback is that
> > E must be either referenced or allocated on the heap (providing storage
> > to the function is an option). But the killer issue was that safeD would
> > not allow it. However, in recent times, you have hinted that safeD may
> > allow pointers, but disallow bad pointer operations. In light of this,
> > can we reconsider this interface, or other alternatives using pointers?
> >
> > I've always felt that if we were to define ranges for streams in a
> > non-awkward way, we would need an "all in one" operation, since not only
> > does getting data from the range move the range, but checking for empty
> > might also move the range (empty on a stream means you tried to read and
> > got nothing).


yes that also makes filters/combiners really nice to write.
the basic thing is that you have to return two things, 1. if there is  
more and 2. if yes the element.


> I'd gladly reconsider E* getNext(), and I like it a lot, but that
> doesn't accommodate ranges that want to return rvalues without
> storing them (e.g. a range using getchar() as a back-end, and
> generally streams that don't correspond to stuff stored in memory).
> If it's not in memory, there's no pointer to it.


E* getNext would probably also be workable, at the cost of storing one 
element. But then, as Andrei correctly points out, one still cannot 
expect the pointer to be valid after one iteration, so as soon as you 
don't have memory storage you lose thread safety...


Now reentrancy/thread safety is not necessarily the most important  
thing for iterators, but I like that my queues, sources of work can  
have the same interface.

Re: Implicit enum conversions are a stupid PITA

yigal chripun:
> This was brought up many many times in the NG before and based on past 
> occurrences will most likely never change.

If I see some semantic holes I'd like to see them filled/fixed, when possible. 
Keeping the muzzle doesn't improve the situation :-)

Bye,
bearophile

Re: Implicit enum conversions are a stupid PITA

2010-03-23 Thread yigal chripun

Nick Sabalausky Wrote:

> I'm bringing this over here from a couple separate threads over on "D.learn" 
> (My "D1: Overloading across modules" and bearophile's "Enum equality test").
> 
> Background summary:
> 
> bearophile:
> > I'm looking for D2 rough edges. I've found that this D2 code
> > compiles and doesn't assert at runtime:
> >
> > enum Foo { V1 = 10 }
> > void main() {
> >  assert(Foo.V1 == 10);
> > }
> >
> > But I think enums and integers are not the same type,
> > and I don't want to see D code that hard-codes comparisons
> > between enum instances and number literals, so I think an
> > equal between an enum and an int has to require a cast:
> >
> > assert(cast(int)(Foo.V1) == 10); // OK
> 
> He goes on to mention C++0x's "enum class" that, smartly, gets rid of that 
> implicit conversion nonsense.
> 
> To put it simply, I agree with this even on mere principle. I'm convinced 
> that the current D behavior is a blatant violation of strong-typing and 
> smacks way too much of C's so-called "type system".
> 
> But here's another reason to get rid of it that I, quite coincidentally, 
> stumbled upon right about the same time:
> 
> Me:
> > In D1, is there any reason I should be getting an error on this?:
> >
> > // module A:
> > enum FooA { fooA };
> > void bar(FooA x) {}
> >
> > // module B:
> > import A;
> > enum FooB { fooB };
> > void bar(FooB x) {}
> >
> > bar(FooB.fooB); // Error: A.bar conflicts with B.bar (WTF?)
> 
> In the resulting discussion (which included a really hackish workaround), it 
> was said that this is because of a rule (that I assume exists in D2 as well) 
> that basically goes "two functions from different modules are in conflict if 
> they have the same name." I assume (and very much hope) that the rule also 
> has a qualification "...but only if implicit conversion rules make it 
> possible for one to hijack the other".
> 
> It was said that this is to prevent a function call from getting hijacked by 
> merely importing a module (or making a change in an imported module). That I 
> can completely agree with. But I couldn't understand why this would cause 
> conflicts involving enums until I thought about implicit enum-to-base-type 
> conversion and came up with this scenario:
> 
> // Module Foo:
> enum Foo { foo }
> 
> // module A:
> import Foo;
> void bar(Foo x){}
> 
> // module B version 1:
> import Foo; // Note: A is not imported yet
> void bar(int x){}
> bar(Foo.foo); // Stupid crap that should never be allowed in the first place
> 
> // module B version 2:
> import Foo;
> import A; // <- This line added
> void bar(int x){}
> bar(Foo.foo); // Now that conflict error *cough* "helps".
> 
> So thanks to the useless and dangerous ability to implicitly convert an enum 
> to its base type, we can't have certain perfectly sensible cross-module 
> overloads.
> 
> Although, frankly, I *still* don't see why "bar(SomeEnum)" and 
> "bar(SomeOtherEnum)" should ever be in conflict (unless that's only D1, or 
> if implicit base-type-to-enum conversions are allowed (which would make 
> things even worse)).
> 
> 

This also interacts with the crude hack of "this enum is actually a constant". 
If you remove the implicit casts then how would you be able to do:
void foo(int p); 
enum { bar = 4 }; // don't remember the exact syntax here
foo(bar); // compile-error?!

I feel that enum needs to be re-designed. I think that C style "enums are 
numbers" are *bad*, *wrong* designs that expose internal implementation and the 
only valid design is that of Java 5.

e.g.
enum Color {blue, green}
Color c = Color.blue;
c++; // WTF?  should NOT compile

A C style enum with values assigned is *not* an enumeration but rather a set of 
meaningful integral values and should be represented as such.

This was brought up many many times in the NG before and based on past 
occurrences will most likely never change.

Re: Ranges and/versus iterators



On 23-mar-10, at 21:34, Andrei Alexandrescu wrote:


> [...]
> We've discussed this extensively, and I lost sleep over this simple
> matter more than once. The main problem with bool popFront(ref E) is
> that it doesn't work meaningfully for containers that expose
> references to their elements.


yes I agree that it is a difficult problem, the single function works  
well in the basic iterator case, but does not generalize well to  
modifiable values.
In most cases I resorted to returning pointers. The templates that 
generate opApply (still D1.0 ;) from that are smart enough to remove 
the pointer when possible as the ref already gives that.

Still not perfect, as always there are tradeoffs...

> The interface with front() leaves it to the range to return E or ref E.
>
> An alternative is this:
>
> bool empty();
> ref E getNext(); // ref or no ref
>
> I'm thinking seriously of defining input ranges that way. The
> underlying notion is that you always move forward - getting an
> element is simultaneous with moving to the next.


already better (for basic iterators), but still not reentrant, if  
another thread executes between empty and getNext...


anyway any choice has some drawbacks... I like bool next(ref T)  
because it works well also for streams... and somehow (using T* or not  
depending on the type) can accommodate all iteration needs.

Not perfectly nice, but workable.

Fawzi

Re: Implicit enum conversions are a stupid PITA

2010-03-23 Thread Marianne Gagnon

Nick Sabalausky Wrote:

> I'm bringing this over here from a couple separate threads over on "D.learn" 
> (My "D1: Overloading across modules" and bearophile's "Enum equality test").
> 
> Background summary:
> 
> bearophile:
> > I'm looking for D2 rough edges. I've found that this D2 code
> > compiles and doesn't assert at runtime:
> >
> > enum Foo { V1 = 10 }
> > void main() {
> >  assert(Foo.V1 == 10);
> > }
> >
> > But I think enums and integers are not the same type,
> > and I don't want to see D code that hard-codes comparisons
> > between enum instances and number literals, so I think an
> > equal between an enum and an int has to require a cast:
> >
> > assert(cast(int)(Foo.V1) == 10); // OK
> 
> He goes on to mention C++0x's "enum class" that, smartly, gets rid of that 
> implicit conversion nonsense.
> 
> To put it simply, I agree with this even on mere principle. I'm convinced 
> that the current D behavior is a blatant violation of strong-typing and 
> smacks way too much of C's so-called "type system".
> 
> But here's another reason to get rid of it that I, quite coincidentally, 
> stumbled upon right about the same time:
> 
> Me:
> > In D1, is there any reason I should be getting an error on this?:
> >
> > // module A:
> > enum FooA { fooA };
> > void bar(FooA x) {}
> >
> > // module B:
> > import A;
> > enum FooB { fooB };
> > void bar(FooB x) {}
> >
> > bar(FooB.fooB); // Error: A.bar conflicts with B.bar (WTF?)
> 
> In the resulting discussion (which included a really hackish workaround), it 
> was said that this is because of a rule (that I assume exists in D2 as well) 
> that basically goes "two functions from different modules are in conflict if 
> they have the same name." I assume (and very much hope) that the rule also 
> has a qualification "...but only if implicit conversion rules make it 
> possible for one to hijack the other".
> 
> It was said that this is to prevent a function call from getting hijacked by 
> merely importing a module (or making a change in an imported module). That I 
> can completely agree with. But I couldn't understand why this would cause 
> conflicts involving enums until I thought about implicit enum-to-base-type 
> conversion and came up with this scenario:
> 
> // Module Foo:
> enum Foo { foo }
> 
> // module A:
> import Foo;
> void bar(Foo x){}
> 
> // module B version 1:
> import Foo; // Note: A is not imported yet
> void bar(int x){}
> bar(Foo.foo); // Stupid crap that should never be allowed in the first place
> 
> // module B version 2:
> import Foo;
> import A; // <- This line added
> void bar(int x){}
> bar(Foo.foo); // Now that conflict error *cough* "helps".
> 
> So thanks to the useless and dangerous ability to implicitly convert an enum 
> to its base type, we can't have certain perfectly sensible cross-module 
> overloads.
> 
> Although, frankly, I *still* don't see why "bar(SomeEnum)" and 
> "bar(SomeOtherEnum)" should ever be in conflict (unless that's only D1, or 
> if implicit base-type-to-enum conversions are allowed (which would make 
> things even worse)).
> 
> 

Hum...

+1 

What can I add, you said it all ;)

Re: Summary on unit testing situation



On 23-mar-10, at 20:29, bearophile wrote:


Pelle M.:


I'm not sure I understand, could you explain?<


That was my best explanation, sorry.


I am not experienced with unittest frameworks, and would like to  
understand what the D system lacks.<


I think two times in the past I have written a list of those lacking  
things. To give a good answer to your question I have to write a  
lot, and it's not nice to write a lot when the words get ignored. So  
first devs have to agree that a problem exists, then later we can  
design things to improve the situation. Otherwise it's just a waste  
of my energy, like trying to talk in vacuum.


actually there are some hooks in tango (and I believe similarly in  
phobos) to do what you want


the module info contains the unittests and you can replace the default;  
for example, the unittester of tango looks like this:


// import all modules to be tested
import tango.io.Stdout;
import tango.core.Runtime;
import tango.core.stacktrace.TraceExceptions;

bool tangoUnitTester()
{
    uint countFailed = 0;
    uint countTotal = 1;
    Stdout ("NOTE: This is still fairly rudimentary, and will only report the").newline;
    Stdout ("first error per module.").newline;
    foreach ( m; ModuleInfo )  // _moduleinfo_array )
    {
        if ( m.unitTest) {
            Stdout.format ("{}. Executing unittests in '{}' ", countTotal, m.name).flush;
            countTotal++;
            try {
                m.unitTest();
            }
            catch (Exception e) {
                countFailed++;
                Stdout(" - Unittest failed.").newline;
                e.writeOut(delegate void(char[]s){ Stdout(s); });
                continue;
            }
            Stdout(" - Success.").newline;
        }
    }

    Stdout.format ("{} out of {} tests failed.", countFailed, countTotal - 1).newline;
    return true;
}

static this() {
    Runtime.moduleUnitTester( &tangoUnitTester );
}

void main() {}

one can do something fancier if he wants.
To really have all tests one would need to have an array (or iterator)  
in the module information instead of a single global unittest  
function. Alternatively one could pass some flags to the unittest  
function to control its execution.


Unit testing has to continue when tests fail. All code must be  
testable, compile-time code too. You need a way to assert that  
things go wrong too, like exceptions, asserts, compile-time asserts,  
etc when they are designed to. It's good to have a way to give a  
name to tests. And unit test systems enjoy some reflection to  
organize themselves, to attach tests to code automatically. During  
development you want to test only parts of the code, not the whole  
program. Unit testing OOP code has other needs, because in a test  
you may need to break data hiding of classes and structs. If you  
unit test hundreds of classes you soon find the necessity of  
something to help creation of fake testing objects. You need some  
tools for creating mock test objects (objects that simulate external  
resources). You need a help to perform performance tests, to print  
reports of the testing. You need layers of testing, slow tests and  
quick tests that you can run every few minutes or seconds of  
programming.
Generally the more the unit test system does automatically the  
better it is, because you want to write and use unit tests in the  
most fast way possible. Those things are useful, but putting most of  
those things inside a compiler is not a good idea.


I think that what you want is beyond normal requests, executing all  
tests, tests of one module, a single test, yes that should be  
relatively simple.
More complex test series/combination are probably better served by a  
specialized regression tester.


Actually I use a specialized tester that is parallel, and whose basic  
testing building block is a testing function in which the arguments  
for it are generated automatically (derived types have to implement  
generating functions).
This is a somewhat different way to look at tests than the usual one  
(inspired by Haskell's QuickCheck), but one that I prefer.
In the end the power is the same: instead of fixtures to prepare a  
test environment you can define a derived type whose generating  
function does the fixtures, and then have the tests as functions having  
that type as argument.


Test suites I normally organize like the package structure.

What I have, and something like it would also be useful in tango/ 
phobos, is a pre-written main-like function, so that one can easily  
create test suites for pieces of code.

I have for example
int mainTestFun(char[][] argStr,SingleRTest testSuite)
which can be used to create a unittester that recognizes flags to  
initialize it, perform subtests,...


Fawzi

Re: Associative Arrays need cleanout method or property to help

2010-03-23 Thread Michael Rynn

On Tue, 23 Mar 2010 13:41:56 -0700, Walter Bright wrote:

> Michael Rynn wrote:
>> On Mon, 22 Mar 2010 00:52:24 -0700, Walter Bright wrote:
>> 
>>> bearophile wrote:
>>>> A way to know the usage patterns of D AAs is to add instrumentation
>>>> in nonrelease mode, they can save few statistical data once in a
>>>> while in the install directory of D, and then the user can give this
>>>> little txt file to Walter to tune the language :-)
>>> There's no need to tune the language. The implementation and data
>>> structure of the AAs is completely opaque to the language. The
>>> implementation is in aaA.d. Feel free to try different implementations
>>> in it!
>> 
>> Well, I have.
>> 
>> See the code here :
>> http://www.dsource.org/projects/aa/browser/trunk/druntime/aaA.d
>> 
>> See the benchmarks and comments here :
>> http://www.dsource.org/projects/aa/wiki/DrunTimeAA.
>> 
>> The result squeezes some extra performance for integer or less sized
>> keys (about 20% faster for lookups).
>> 
>> Insertions and cleanups are faster too.
> 
> It's a nice implementation. There are some problems with it, though:
> 
> 1. Deleted entries cannot be returned to the garbage collector pool
> unless the entire hash table is removed.
> 
> 2. All the keys are stored in a single array, as are all the values.
> This requires that a contiguous chunk of memory can be found for the
> arrays when rehashing to a larger hashmap. This could make it very
> difficult for this to work when hash tables grow to be of significant
> size relative to all available memory. Note that the current
> implementation has a bucket array that has as many entries as the number
> of elements in the hashtable, but the element size is only a pointer,
> not the arbitrary size of a key or value.
> 

   The file aaA.d is not using the randomAA implementation. It's your very 
own original AA with modifications (Added mag wheels, turbo charged 
memory, specializations). 

So assertion 2 is not quite true, as this is still the same underlying 
algorithm as for the original AA.  I actually benchmarked the random AA, 
and tried to improve it and give it a fair chance, out of politeness and 
for comparison interest, to demonstrate that its performance was not as 
good as the others, and to demonstrate the builtin AA had further 
potential. I also compared the tango hashmap, which is very good. 

 Your hashtree approach is to have a separate node allocated for each 
Key-Value association.  So, algorithmically, the keys and values are not 
all stored in a single array, and do not actually require a 
contiguous large chunk of memory. There is however an intermediate heap 
between the AA and the garbage collector, which allocates 4K blocks of 
memory, and obviously, is unlikely to free any of those blocks until all 
nodes within are released from use, unless there are especially favourable 
insertion and deletion patterns. 

Now the memory chunker could be chucked out, just leaving the 
specialization for the integer sized keys, and that would settle 
assertion 1.  This would change only a few lines of code, and could 
actually be versioned.

What would also be lost is some degree of improved insertion and cleanup 
performance. And also some degree of that benchmark performance, which 
cheats, by the way, by looking up keys in the very same order that they 
were inserted, so that nodes stored in insertion order are likely to be 
grouped in memory together, and so benefit from CPU memory cache 
concurrency. I have not got around yet to changing it to a thoroughly 
stirred and mixed up sequence for the lookups, which should put figures 
up a bit.

> 3. Rehashing (required as the hashmap grows) means that the key/value
> arrays must be reallocated and copied. That's fine for the keys, but for
> the values this is a huge problem because the user may have dangling
> pointers into the values. (This is why Andrei started the thread
> entitled "An important potential change to the language: transitory
> ref".) I think this is currently the killer problem for randAA's
> approach.

I do not think this is true of current builtin AA, and therefore not true 
of this enhanced implementation. I note the current builtin _aaGet will 
locate a new node by pointer, maybe do a rehash, and then return the node 
it previously relocated.  Relax, this AA has no fear of nodes moving 
around on rehash. They are just relinked. All done with pointers. 
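
To illustrate the point about relinking, here is a minimal sketch of a chained hash table in which rehashing only rewires node pointers. It is not the code from aaA.d; ChainedMap, Node, getOrAdd and pointerSurvivesRehash are hypothetical names, and keys and values are plain ints to keep it short.

```d
// One separately allocated node per key/value pair.
struct Node
{
    int key;
    int value;
    Node* next;
}

struct ChainedMap
{
    Node*[] buckets; // only pointers live in the bucket array
    size_t count;

    static size_t slot(int key, size_t n) { return cast(size_t) key % n; }

    // Returns a pointer to the value for `key`, inserting 0 if absent.
    int* getOrAdd(int key)
    {
        if (buckets.length == 0)
            buckets = new Node*[4];
        for (auto cur = buckets[slot(key, buckets.length)]; cur !is null; cur = cur.next)
            if (cur.key == key)
                return &cur.value;
        if (count >= buckets.length)
            rehash(buckets.length * 2);
        immutable i = slot(key, buckets.length);
        auto node = new Node(key, 0, buckets[i]);
        buckets[i] = node;
        ++count;
        return &node.value;
    }

    // Relinks the existing nodes into a larger bucket array.
    // Nodes are never copied or moved, so value pointers stay valid.
    void rehash(size_t newSize)
    {
        auto fresh = new Node*[newSize];
        foreach (start; buckets)
        {
            auto node = start;
            while (node !is null)
            {
                auto following = node.next;
                immutable i = slot(node.key, newSize);
                node.next = fresh[i];
                fresh[i] = node;
                node = following;
            }
        }
        buckets = fresh;
    }
}

// Takes a pointer to a value, forces several rehashes, then checks it.
bool pointerSurvivesRehash()
{
    ChainedMap m;
    int* p = m.getOrAdd(1);
    *p = 42;
    foreach (k; 2 .. 100)
        m.getOrAdd(k);
    return p is m.getOrAdd(1) && *p == 42;
}
```

Only the bucket array is reallocated on growth, and it holds pointers rather than keys or values, which is the property the reply relies on.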

---
Michael Rynn

Re: storing the hash multiplier instead of the hash value


Andrei Alexandrescu wrote:
Walter is afraid that that's going to mean the balkanization of D - 
everyone will define their own object.d. That may be a risk, but I 
strongly believe it's a risk worth taking.


It would mean the end of precompiled libraries.

Re: storing the hash multiplier instead of the hash value


Fawzi Mohamed wrote:


On 23-mar-10, at 19:04, Andrei Alexandrescu wrote:

What I'm pushing for as of now is to move the associative array 
definition from opacity into templated goodies in object_.d.


that would be nice, that is one of the main reasons Steven's implementation 
is faster.
It would be nice if this would be done by the compiler as rewriting the 
calls as call to "normal" templates, i.e. to a specially named templated 
struct (AArray for example) so that (syntactic sugar for the name aside) 
that would be the same as a library implementation.

This would have two advantages:
- easy to replace the library implementation, as it would be even less 
special
- easy to replace the usage in one piece of code with another 
implementation (well truth to be told that is rather easy also now)


D2 already allows this.

Re: storing the hash multiplier instead of the hash value


Fawzi Mohamed wrote:
I think that the public interface should be exposed (or re-exposed) 
somewhere to the outside so that one can easily create efficient hashes 
for user defined types.
For example it is nice to be able to chain hash functions (something 
that the default one does not allow).


Just overload the toHash() function for your user-defined type to be whatever 
you want it to be.
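
As a rough sketch of that suggestion in current D2: a struct used as an associative array key can supply its own toHash(), paired with an opEquals() that agrees with it. Point and lookupDemo are hypothetical names, and the multiplier 31 is an arbitrary choice for the illustration, not anything taken from druntime.

```d
struct Point
{
    int x, y;

    // Custom hash that chains the two fields together.
    size_t toHash() const @safe pure nothrow
    {
        size_t h = 17;
        h = h * 31 + x;
        h = h * 31 + y;
        return h;
    }

    // Equality has to agree with toHash: equal keys hash equally.
    bool opEquals(ref const Point other) const @safe pure nothrow
    {
        return x == other.x && y == other.y;
    }
}

// Uses Point as a key of the built-in associative array.
int lookupDemo()
{
    int[Point] counts;
    counts[Point(1, 2)] = 10;
    counts[Point(1, 2)] += 5; // same key, same slot
    counts[Point(3, 4)] = 1;
    return counts[Point(1, 2)];
}
```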

Implicit enum conversions are a stupid PITA

2010-03-23 Thread Nick Sabalausky

I'm bringing this over here from a couple separate threads over on "D.learn" 
(My "D1: Overloading across modules" and bearophile's "Enum equality test").

Background summary:

bearophile:
> I'm looking for D2 rough edges. I've found that this D2 code
> compiles and doesn't assert at runtime:
>
> enum Foo { V1 = 10 }
> void main() {
>  assert(Foo.V1 == 10);
> }
>
> But I think enums and integers are not the same type,
> and I don't want to see D code that hard-codes comparisons
> between enum instances and number literals, so I think an
> equal between an enum and an int has to require a cast:
>
> assert(cast(int)(Foo.V1) == 10); // OK

He goes on to mention C++0x's "enum class" that, smartly, gets rid of that 
implicit conversion nonsense.

To put it simply, I agree with this even on mere principle. I'm convinced 
that the current D behavior is a blatant violation of strong-typing and 
smacks way too much of C's so-called "type system".

But here's another reason to get rid of it that I, quite coincidentally, 
stumbled upon right about the same time:

Me:
> In D1, is there any reason I should be getting an error on this?:
>
> // module A:
> enum FooA { fooA };
> void bar(FooA x) {}
>
> // module B:
> import A;
> enum FooB { fooB };
> void bar(FooB x) {}
>
> bar(FooB.fooB); // Error: A.bar conflicts with B.bar (WTF?)

In the resulting discussion (which included a really hackish workaround), it 
was said that this is because of a rule (that I assume exists in D2 as well) 
that basically goes "two functions from different modules are in conflict if 
they have the same name." I assume (and very much hope) that the rule also 
has a qualification "...but only if implicit conversion rules make it 
possible for one to hijack the other".

It was said that this is to prevent a function call from getting hijacked by 
merely importing a module (or making a change in an imported module). That I 
can completely agree with. But I couldn't understand why this would cause 
conflicts involving enums until I thought about implicit enum-to-base-type 
conversion and came up with this scenario:

// Module Foo:
enum Foo { foo }

// module A:
import Foo;
void bar(Foo x){}

// module B version 1:
import Foo; // Note: A is not imported yet
void bar(int x){}
bar(Foo.foo); // Stupid crap that should never be allowed in the first place

// module B version 2:
import Foo;
import A; // <- This line added
void bar(int x){}
bar(Foo.foo); // Now that conflict error *cough* "helps".

So thanks to the useless and dangerous ability to implicitly convert an enum 
to its base type, we can't have certain perfectly sensible cross-module 
overloads.

Although, frankly, I *still* don't see why "bar(SomeEnum)" and 
"bar(SomeOtherEnum)" should ever be in conflict (unless that's only D1, or 
if implicit base-type-to-enum conversions are allowed (which would make 
things even worse)).

Re: Ranges and/versus iterators

2010-03-23 Thread grauzone


Andrei Alexandrescu wrote:

On 03/23/2010 04:06 PM, grauzone wrote:

Steven Schveighoffer wrote:
A while back, you identified one of the best interfaces for input 
ranges:


E* getNext();

Which allows for null returns when no data is left. The drawback is
that E must be either referenced or allocated on the heap (providing
storage to the function is an option). But the killer issue was that
safeD would not allow it. However, in recent times, you have hinted



Nullable!(E) getNext(); ?


And if returning a reference...?


Extend auto ref to template parameters:

struct Nullable(auto ref T) { ... }

T would be actually a reference type if and only if you could return a 
reference to the variable the template parameter was inferred from from 
a SafeD function. Basically, the compiler would know that references to 
T can be passed around freely. (SafeD allows ref returns under 
circumstances.)


Not a solution I would prefer, but in the spirit of the design of D2 and 
SafeD in general.



Andrei

Re: Summary on unit testing situation

2010-03-23 Thread Pelle Månsson


On 03/23/2010 08:29 PM, bearophile wrote:

Pelle M.:


I'm not sure I understand, could you explain?<


That was my best explanation, sorry.



I am not experienced with unittest frameworks, and would like to understand what 
the D system lacks.<


I think two times in the past I have written a list of those lacking things. To 
give a good answer to your question I have to write a lot, and it's not nice to 
write a lot when the words get ignored. So first devs have to agree that a 
problem exists, then later we can design things to improve the situation. 
Otherwise it's just a waste of my energy, like trying to talk in vacuum.

Unit testing has to continue when tests fail. All code must be testable, 
compile-time code too. You need a way to assert that things go wrong too, like 
exceptions, asserts, compile-time asserts, etc when they are designed to. It's 
good to have a way to give a name to tests. And unit test systems enjoy some 
reflection to organize themselves, to attach tests to code automatically. 
During development you want to test only parts of the code, not the whole 
program. Unit testing OOP code has other needs, because in a test you may need 
to break data hiding of classes and structs. If you unit test hundreds of 
classes you soon find the necessity of something to help creation of fake 
testing objects. You need some tools for creating mock test objects (objects 
that simulate external resources). You need a help to perform performance 
tests, to print reports of the testing. You need layers of testing, slow tests 
and quick tests that you can run every few minutes or seconds of programming.
Generally the more the unit test system does automatically the better it is, 
because you want to write and use unit tests in the most fast way possible. 
Those things are useful, but putting most of those things inside a compiler is 
not a good idea.


The best thing you can do is to write some code in C#/Java/Python/etc and to 
add some unit tests, so you can learn what's useful and what is not. All unit 
test systems have some documentation, you can start reading that too. In two 
days you can learn more than I can ever tell you. If you don't try to use unit 
testing you probably will not be able to understand my words :-)

Bye,
bearophile


I see, and I think most of these problems are solvable within the 
language. For example, you could choose not to use asserts in unittests, 
and __traits should help in other cases.


Some of the problems may need a separate framework, so you are probably 
right about the need for improvement.

Re: Ranges and/versus iterators


On 03/23/2010 04:06 PM, grauzone wrote:

Steven Schveighoffer wrote:

A while back, you identified one of the best interfaces for input ranges:

E* getNext();

Which allows for null returns when no data is left. The drawback is
that E must be either referenced or allocated on the heap (providing
storage to the function is an option). But the killer issue was that
safeD would not allow it. However, in recent times, you have hinted



Nullable!(E) getNext(); ?


And if returning a reference...?

Andrei

Re: Ranges and/versus iterators

2010-03-23 Thread grauzone


Steven Schveighoffer wrote:

A while back, you identified one of the best interfaces for input ranges:

E* getNext();

Which allows for null returns when no data is left.  The drawback is 
that E must be either referenced or allocated on the heap (providing 
storage to the function is an option).  But the killer issue was that 
safeD would not allow it.  However, in recent times, you have hinted 



Nullable!(E) getNext(); ?
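
A minimal sketch of what the Nullable!(E) variant could look like, assuming the Nullable type from today's std.typecons. Countdown and drain are hypothetical names. The element is returned by value, so this form does not by itself answer the question of returning references.

```d
import std.typecons : Nullable, nullable;

// Hypothetical source that counts down from `remaining` to 1.
struct Countdown
{
    int remaining;

    // Returns the next value, or a null Nullable when nothing is left.
    Nullable!int getNext()
    {
        if (remaining == 0)
            return Nullable!int.init;
        return nullable(remaining--);
    }
}

// Collects every value produced by the source.
int[] drain(Countdown c)
{
    int[] result;
    for (auto item = c.getNext(); !item.isNull; item = c.getNext())
        result ~= item.get;
    return result;
}
```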

Re: Ranges and/versus iterators


On 03/23/2010 03:46 PM, Steven Schveighoffer wrote:

On Tue, 23 Mar 2010 16:34:24 -0400, Andrei Alexandrescu wrote:


On 03/23/2010 02:45 PM, Fawzi Mohamed wrote:

Andrei, as the topic just came up a comment on the range interface.
Just for plain forward iterators having

bool empty()
E front()
void popFront()

makes the interface non reentrant.
For that purpose having a single function is better.
I use

bool popFront(ref T t)
// returns true if there is a next element, and in that case returns it in t

this can be used by several consumers concurrently without problems and
creating filters, combiners,... is simple.
Another advantage is that a single object can implement several
iterators.
A disadvantage is that even if there is a single iterator D makes type
inference cumbersome, i.e. you cannot simply use auto, as in a loop you
have to declare the variable before using it as the loop is
T a;
while (it.popFront(a)){
//...
}


We've discussed this extensively, and I lost sleep over this simple
matter more than once. The main problem with bool popFront(ref E) is
that it doesn't work meaningfully for containers that expose
references to their elements.

The interface with front() leaves it to the range to return E or ref E.

An alternative is this:

bool empty();
ref E getNext(); // ref or no ref

I'm thinking seriously of defining input ranges that way. The
underlying notion is that you always move forward - getting an element
is simultaneous with moving to the next.


A while back, you identified one of the best interfaces for input ranges:

E* getNext();

Which allows for null returns when no data is left. The drawback is that
E must be either referenced or allocated on the heap (providing storage
to the function is an option). But the killer issue was that safeD would
not allow it. However, in recent times, you have hinted that safeD may
allow pointers, but disallow bad pointer operations. In light of this,
can we reconsider this interface, or other alternatives using pointers?

I've always felt that if we were to define ranges for streams in a
non-awkward way, we would need an "all in one" operation, since not only
does getting data from the range move the range, but checking for empty
might also move the range (empty on a stream means you tried to read and
got nothing).


I'd gladly reconsider E* getNext(), and I like it a lot, but that 
doesn't accommodate ranges that want to return rvalues without storing 
them (e.g. a range using getchar() as a back-end, and generally streams 
that don't correspond to stuff stored in memory). If it's not in memory, 
there's no pointer to it.



Andrei

Re: Summary on unit testing situation

2010-03-23 Thread Paul D. Anderson

bearophile Wrote:



> IV) Keep the built-in unit testing of D, keep them almost as simple as they 
> are now, but somehow add hooks and flexibility to allow to external D code to 
> refine *them* as much as needed (this "external" code can be a Phobos module, 
> or it can be a third-part library written by other people, or it can be born 
> as external lib and added to Phobos later, as it happens often in Python, 
> that's why they say it has "batteries included", such batteries often were 
> not born in the std library), so they can be used in professional situations 
> too. This will increase the complexity of the built-in unit tests, but 
> probably not much. It can increase the complexity of the compiler a little, 
> but I think this extra complexity (some reflection, maybe) can be then used 
> for other purposes too.
> 
 

> If you agree with me that the better solution is the IV, then those 
> hooks/reflection have to be designed first.
> 
> Bye,
> bearophile

I think your analysis is accurate. Having the simple unit testing built in is 
better than not having it at all. I use it as much as I can, but I haven't 
written complex applications -- just library modules, where it's perhaps more 
suitable.

I'm not having much luck at conceptualizing the hooks/reflection that you refer 
to. (Might just be having a slow day.) (Or, I might just be slow!) 

It seems like we need some of the xUnit kind of tools -- test suites, more 
elaborate assertions, test result reporting (not just halting), named tests, and 
so on.

What is needed to support that?

* More elaborate asserts can be built from the basic assert. A library of 
assert templates or functions doesn't need additional compiler support.

* Named tests are essential. I'm surprised names (and qualified names -- 
test.math.divide, etc.) aren't already available. So this would have to be a 
part of the package.

* Test suites would depend, I think, on having names available. Again, 
qualification may be necessary -- perhaps to include the module name.

* Test running needs to be extended. Running the tests before executing main is 
better than not running the tests. But, as you say, that's really only suitable 
for toy programs. We'd need some kind of control -- order of execution, action 
on failure, etc.

I don't know enough about unit testing or compiler writing to know how much 
work is involved, but it seems that just a few "small additions" would go a 
long way.
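
To make a couple of the points above concrete, here is a small library-only sketch in current D2: named tests, a check built without the built-in assert, and a runner that keeps going after a failure. NamedTest, Report, check, runAll and demo are all hypothetical names, and nothing here hooks into the compiler's unittest blocks.

```d
// A test with a qualified name and a body to run.
struct NamedTest
{
    string name;
    void function() run;
}

// Outcome of a run: how many passed and which names failed.
struct Report
{
    size_t passed;
    string[] failed;
}

// A check that throws an ordinary Exception instead of an AssertError,
// so a failure can be caught and the run can continue.
void check(bool cond, string msg = "check failed")
{
    if (!cond)
        throw new Exception(msg);
}

// Runs every test, recording failures instead of halting.
Report runAll(NamedTest[] tests)
{
    Report r;
    foreach (t; tests)
    {
        try
        {
            t.run();
            ++r.passed;
        }
        catch (Exception e)
        {
            r.failed ~= t.name;
        }
    }
    return r;
}

// Three sample tests; the middle one fails on purpose.
Report demo()
{
    return runAll([
        NamedTest("test.math.add", function() { check(1 + 1 == 2); }),
        NamedTest("test.math.divide", function() { check(7 / 2 == 4, "integer division truncates"); }),
        NamedTest("test.math.mul", function() { check(2 * 3 == 6); }),
    ]);
}
```

The third test still runs after the second one fails. What a library like this cannot do on its own is discover the tests automatically, which is where the hooks or reflection discussed in the thread would come in.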

Paul

Re: Ranges and/versus iterators


On 03/23/2010 06:58 AM, clueless bystander wrote:

Lars T. Kyllingstad Wrote:


clueless bystander wrote:

Watching D evolve from the outside there seems to be a lot of ongoing discussion
on this newsgroup about the D range idiom which is somehow opposed to conventional
thinking about iterators.

Can someone please explain in plain words just exactly what a range is and how
it differs from the iterator concept (if that's appropriate???) and what are the
benefits from a data modeling and access perspective.

Sure, I'm clueless, though suspect many other bystanders would appreciate a
succinct heads-up.

Thanks,
clueless bystander



I'm probably not the right person to answer your question, since I have
virtually no experience with C++ iterators.  Instead I'll just refer you
to Andrei's own article on the subject:

http://www.informit.com/articles/article.aspx?p=1407357

Please don't hesitate to ask again if it didn't clear things up for you. :)

-Lars


Yes, well, thanks again.  The first 7 pages seemed to have plausible arguments
but the going gets tough thereafter.  Maybe the reason ranges are not popular
is that they are hard to explain even though they might be simple and obvious
in hindsight.

Sigh,
c.b.


Thank you for your interest. The article 
(http://erdani.com/publications/on-iteration.html) is long because 
following my keynote talk at BoostCon 2009, I've received a flurry of 
emails asking me to flesh out the ranges design in greater detail and to 
better motivate them. That article not only describes the design, but it 
also explains the historical artifacts that led to today's imperfect 
state of affairs, motivates defining ranges with categories, and gives 
examples.


The price for such thoroughness is - sorry - size.

If you are familiar with STL iterators and GoF-style iterators (with 
hasmore/get/advance), ranges are so simple, they define themselves: a 
range is a GoF-style iterator that recognizes the necessity of STL's 
iterator categories (input, forward, bidirectional, and random). All the 
rest is aftermath.
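
For readers who prefer code, a minimal sketch of the simplest category, an input range, in current D2. Counter and sumRange are hypothetical names; isInputRange is the check provided by Phobos in std.range.primitives.

```d
import std.range.primitives : isInputRange;

// Hypothetical range producing current, current + 1, ..., limit - 1.
struct Counter
{
    int current, limit;

    @property bool empty() const { return current >= limit; } // "hasmore", inverted
    @property int front() const { return current; }           // "get"
    void popFront() { ++current; }                            // "advance"
}

// Counter satisfies the input range category.
static assert(isInputRange!Counter);

// Works with any input range whose elements can be added to an int.
int sumRange(R)(R r) if (isInputRange!R)
{
    int total = 0;
    foreach (x; r) // foreach is lowered to empty/front/popFront
        total += x;
    return total;
}
```

The stronger categories (forward, bidirectional, random access) add further primitives on top of these three.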



Andrei

Re: Ranges and/versus iterators

On Tue, 23 Mar 2010 16:34:24 -0400, Andrei Alexandrescu wrote:



On 03/23/2010 02:45 PM, Fawzi Mohamed wrote:

Andrei, as the topic just came up a comment on the range interface.
Just for plain forward iterators having

bool empty()
E front()
void popFront()

makes the interface non reentrant.
For that purpose having a single function is better.
I use

bool popFront(ref T t)
// returns true if there is a next element, and in that case returns it in t

this can be used by several consumers concurrently without problems and
creating filters, combiners,... is simple.
Another advantage is that a single object can implement several  
iterators.

A disadvantage is that even if there is a single iterator D makes type
inference cumbersome, i.e. you cannot simply use auto, as in a loop you
have to declare the variable before using it as the loop is
T a;
while (it.popFront(a)){
//...
}


We've discussed this extensively, and I lost sleep over this simple  
matter more than once. The main problem with bool popFront(ref E) is  
that it doesn't work meaningfully for containers that expose references  
to their elements.


The interface with front() leaves it to the range to return E or ref E.

An alternative is this:

bool empty();
ref E getNext(); // ref or no ref

I'm thinking seriously of defining input ranges that way. The underlying  
notion is that you always move forward - getting an element is  
simultaneous with moving to the next.


A while back, you identified one of the best interfaces for input ranges:

E* getNext();

Which allows for null returns when no data is left.  The drawback is that  
E must be either referenced or allocated on the heap (providing storage to  
the function is an option).  But the killer issue was that safeD would not  
allow it.  However, in recent times, you have hinted that safeD may allow  
pointers, but disallow bad pointer operations.  In light of this, can we  
reconsider this interface, or other alternatives using pointers?
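
A minimal sketch of the pointer-returning form for the easy case, where the elements already live in memory. PtrIter and doubled are hypothetical names, and the code uses raw pointers, so it says nothing about what safeD would or would not accept.

```d
// Hypothetical iterator handing out pointers into an existing array.
struct PtrIter(E)
{
    E[] data; // elements not yet handed out

    // Returns a pointer to the next element, or null when none is left.
    E* getNext()
    {
        if (data.length == 0)
            return null;
        E* p = &data[0];
        data = data[1 .. $];
        return p;
    }
}

// Doubles every element in place through the returned pointers.
int[] doubled(int[] xs)
{
    auto it = PtrIter!int(xs);
    for (auto p = it.getNext(); p !is null; p = it.getNext())
        *p *= 2; // writes through to the caller's array
    return xs;
}
```

A single call both advances and signals the end, and the caller can modify elements in place. A source that produces values on the fly would first have to store each one somewhere in order to return its address.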


I've always felt that if we were to define ranges for streams in a  
non-awkward way, we would need an "all in one" operation, since not only  
does getting data from the range move the range, but checking for empty  
might also move the range (empty on a stream means you tried to read and  
got nothing).


-Steve

Re: Summary on unit testing situation

2010-03-23 Thread Pelle Månsson


On 03/23/2010 08:10 PM, Trip Volpe wrote:

Trip Volpe Wrote:

...and if an assert fails in one test in a module, _all_ subsequent tests in 
that module will be aborted, even though this makes no sense.


Actually I said this wrong. It's worse than that: after one assert failure, 
_all_ further execution is aborted, meaning that even unit tests in _other_ 
modules will be prevented from running. And you can't change this behavior, 
even if you override the assert failure handler, since for some reason the 
compiler expects the handler to throw an AssertError, and if it doesn't, a 
segfault may result.



The solution to this would be not to use asserts in unittests.

Re: Associative Arrays need cleanout method or property to help


Michael Rynn wrote:

On Mon, 22 Mar 2010 00:52:24 -0700, Walter Bright wrote:


bearophile wrote:

A way to know the usage patterns of D AAs is to add instrumentation in
nonrelease mode, they can save few statistical data once in a while in
the install directory of D, and then the user can give this little txt
file to Walter to tune the language :-)

There's no need to tune the language. The implementation and data
structure of the AAs is completely opaque to the language. The
implementation is in aaA.d. Feel free to try different implementations
in it!


Well, I have.

See the code here : http://www.dsource.org/projects/aa/browser/trunk/druntime/aaA.d

See the benchmarks and comments here : http://www.dsource.org/projects/aa/wiki/DrunTimeAA.

The result squeezes some extra performance for integer or less sized keys 
(about 20% faster for lookups).


Insertions and cleanups are faster too.


It's a nice implementation. There are some problems with it, though:

1. Deleted entries cannot be returned to the garbage collector pool unless the 
entire hash table is removed.


2. All the keys are stored in a single array, as are all the values. This 
requires that a contiguous chunk of memory can be found for the arrays when 
rehashing to a larger hashmap. This could make it very difficult for this to 
work when hash tables grow to be of significant size relative to all available 
memory. Note that the current implementation has a bucket array that has as many 
entries as the number of elements in the hashtable, but the element size is only 
a pointer, not the arbitrary size of a key or value.


3. Rehashing (required as the hashmap grows) means that the key/value arrays 
must be reallocated and copied. That's fine for the keys, but for the values 
this is a huge problem because the user may have dangling pointers into the 
values. (This is why Andrei started the thread entitled "An important potential 
change to the language: transitory ref".) I think this is currently the killer 
problem for randAA's approach.

Re: storing the hash multiplier instead of the hash value


On 03/23/2010 02:34 PM, Fawzi Mohamed wrote:


On 23-mar-10, at 19:04, Andrei Alexandrescu wrote:


What I'm pushing for as of now is to move the associative array
definition from opacity into templated goodies in object_.d.


that would be nice, that is one of the main reasons Steven's implementation
is faster.
It would be nice if this would be done by the compiler as rewriting the
calls as call to "normal" templates, i.e. to a specially named templated
struct (AArray for example) so that (syntactic sugar for the name aside)
that would be the same as a library implementation.
This would have two advantages:
- easy to replace the library implementation, as it would be even less
special
- easy to replace the usage in one piece of code with another
implementation (well truth to be told that is rather easy also now)


I have much loftier goals, which scare Walter sometimes :o). In the long 
term, my plan is to allow object.d to decide on a number of fundamental 
program-wide choices, such as the use of mark-sweep GC versus reference 
counting GC. In the wake of my experience with D in heavy-duty programs, I 
reckon that there is a necessity to have deterministic memory management 
for a subset of applications.


Walter is afraid that that's going to mean the balkanization of D - 
everyone will define their own object.d. That may be a risk, but I 
strongly believe it's a risk worth taking.



Andrei

Re: Ranges and/versus iterators


On 03/23/2010 02:45 PM, Fawzi Mohamed wrote:

Andrei, as the topic just came up a comment on the range interface.
Just for plain forward iterators, having

bool empty()
E front()
void popFront()

makes the interface non reentrant.
For that purpose having a single function is better.
I use

bool popFront(ref T t)
// returns true if there is a next element, and in that case returns it in t

this can be used by several consumers concurrently without problems and
creating filters, combiners,... is simple.
Another advantage is that a single object can implement several iterators.
A disadvantage is that even if there is a single iterator D makes type
inference cumbersome, i.e. you cannot simply use auto, as in a loop you
have to declare the variable before using it as the loop is
T a;
while (it.popFront(a)){
//...
}


We've discussed this extensively, and I lost sleep over this simple 
matter more than once. The main problem with bool popFront(ref E) is 
that it doesn't work meaningfully for containers that expose references 
to their elements.


The interface with front() leaves it to the range to return E or ref E.

An alternative is this:

bool empty();
ref E getNext(); // ref or no ref

I'm thinking seriously of defining input ranges that way. The underlying 
notion is that you always move forward - getting an element is 
simultaneous with moving to the next.



Andrei
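
A minimal sketch of the empty()/getNext() pair described above, assuming an array-backed range (RefSource is a made-up name, not anything in Phobos). It shows why returning by ref matters for containers: the caller can modify the stored element in place, which a copy-out signature such as bool popFront(ref E) cannot offer.

```d
import std.stdio;

// Hypothetical range exposing references to a container's elements.
struct RefSource(E)
{
    private E[] data;

    bool empty() { return data.length == 0; }

    // Returns a reference to the current element and moves forward.
    ref E getNext()
    {
        E* p = &data[0];
        data = data[1 .. $];
        return *p;
    }
}

void main()
{
    int[] store = [1, 2, 3];
    auto r = RefSource!int(store);
    while (!r.empty())
        r.getNext() *= 10;   // writes through to the underlying array
    writeln(store);          // prints [10, 20, 30]
}
```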

Re: DMD 2.042 -- what happened to ModuleInfo.name?


Trip Volpe wrote:

FWIW, I found that if I just sneak into src/druntime/import/object.di and add
"@property string name();" to the ModuleInfo struct, everything works as
before -- the name property is defined in src/druntime/src/object_.d, so I'm
curious as to why it's excluded from the core.runtime-facing version.

Is there a particular reason for this?


It's an oversight. Sorry.

Re: Can we drop casts from float to bool?

2010-03-23 Thread grauzone


Don wrote:
In D some people have proposed to change the semantics of the "is" 
operator, to make it more useful and tidy, so if you want to know if x 
is a NaN you can then write if(x is nan).
That won't work in this case, unless you make nan a keyword. There are 
2^^62 different NaNs, so a bitwise comparison will not work.


There are other cases where if and is seem to do non-obvious things, 
that can lead to subtle bugs. If "x" is an array, if(x) checks if the 
array descriptor is null, and not whether x has length 0.


To be consistent, I see two choices how this could be fixed:
1. "if" is "intelligent", then both the float and the array case should 
be fixed to do what one would expect
2. just turn if(x) into if(x !is x.init) and make "is" a bitwise 
comparison even for floats


If way 2. is chosen, I don't think multiple NaN would be a problem. The 
programmer just has to be aware that NaN isn't a single value. "is" 
yields true if the value is exactly the same, and nothing else. In 
conclusion, if(x) would be about as useless for floats as it is for 
arrays. In any way, the situation wouldn't be worse than before, but at 
least more consistent.

Re: Ranges and/versus iterators


Andrei, as the topic just came up a comment on the range interface.
Just for plain forward iterators, having

bool empty()
E front()
void popFront()

makes the interface non reentrant.
For that purpose having a single function is better.
I use

bool popFront(ref T t)
	// returns true if there is a next element, and in that case returns it in t


this can be used by several consumers concurrently without problems  
and creating filters, combiners,... is simple.
Another advantage is that a single object can implement several  
iterators.
A disadvantage is that even if there is a single iterator D makes type  
inference cumbersome, i.e. you cannot simply use auto, as in a loop  
you have to declare the variable before using it as the loop is

T a;
while (it.popFront(a)){
//...
}
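
As an illustration only, the single-call protocol above might be implemented like this for a simple counter (Counter is a hypothetical type). The sketch is not synchronized; sharing one instance between threads would still need a lock or atomic operations inside popFront, which is easier to arrange with one call than with three.

```d
import std.stdio;

// Hypothetical iterator using the single-call protocol.
struct Counter
{
    private int next;
    private int limit;

    // Returns true and stores the element in t, or returns false
    // (leaving t untouched) when the sequence is exhausted.
    bool popFront(ref int t)
    {
        if (next >= limit)
            return false;
        t = next++;
        return true;
    }
}

void main()
{
    auto it = Counter(0, 3);
    int a;                    // must be declared up front, as noted above
    while (it.popFront(a))
        writeln(a);           // prints 0, 1, 2 on separate lines
}
```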

Re: Summary on unit testing situation

Pelle M.:

>I'm not sure I understand, could you explain?<

That was my best explanation, sorry.


>I am not experienced with unittest frameworks, and would like to understand 
>what the D system lacks.<

I think two times in the past I have written a list of those lacking things. To 
give a good answer to your question I have to write a lot, and it's not nice to 
write a lot when the words get ignored. So first devs have to agree that a 
problem exists, then later we can design things to improve the situation. 
Otherwise it's just a waste of my energy, like trying to talk in vacuum.

Unit testing has to continue when tests fail. All code must be testable, 
compile-time code too. You need a way to assert that things go wrong too, like 
exceptions, asserts, compile-time asserts, etc when they are designed to. It's 
good to have a way to give a name to tests. And unit test systems enjoy some 
reflection to organize themselves, to attach tests to code automatically. 
During development you want to test only parts of the code, not the whole 
program. Unit testing OOP code has other needs, because in a test you may need 
to break data hiding of classes and structs. If you unit test hundreds of 
classes you soon find the need for something to help create fake 
testing objects. You need some tools for creating mock test objects (objects 
that simulate external resources). You need help to run performance 
tests and to print reports of the testing. You need layers of testing: slow tests 
and quick tests that you can run every few minutes or seconds of programming. 
Generally the more the unit test system does automatically the better it is, 
because you want to write and use unit tests in the fastest way possible. 
Those things are useful, but putting most of those things inside a compiler is 
not a good idea.

The best thing you can do is to write some code in C#/Java/Python/etc and to 
add some unit tests, so you can learn what's useful and what is not. All unit 
test systems have some documentation, you can start reading that too. In two 
days you can learn more than I can ever tell you. If you don't try to use unit 
testing you probably will not be able to understand my words :-)

Bye,
bearophile

Re: storing the hash multiplier instead of the hash value



On 23-mar-10, at 19:04, Andrei Alexandrescu wrote:

What I'm pushing for as of now is to move the associative array  
definition from opacity into templated goodies in object_.d.


that would be nice, that is one of the main reasons Steven's implementation  
is faster.
It would be nice if this would be done by the compiler as rewriting  
the calls as call to "normal" templates, i.e. to a specially named  
templated struct (AArray for example) so that (syntactic sugar for the  
name aside) that would be the same as a library implementation.

This would have two advantages:
- easy to replace the library implementation, as it would be even less  
special
- easy to replace the usage in one piece of code with another  
implementation (well truth to be told that is rather easy also now)


Fawzi

Re: Summary on unit testing situation

2010-03-23 Thread Trip Volpe

Trip Volpe Wrote:
> ...and if an assert fails in one test in a module, _all_ subsequent tests in 
> that module will be aborted, even though this makes no sense.

Actually I said this wrong. It's worse than that: after one assert failure, 
_all_ further execution is aborted, meaning that even unit tests in _other_ 
modules will be prevented from running. And you can't change this behavior, 
even if you override the assert failure handler, since for some reason the 
compiler expects the handler to throw an AssertError, and if it doesn't, a 
segfault may result.

Re: Can we drop casts from float to bool?

2010-03-23 Thread Don

bearophile wrote:

Don:
This conversion seems to be confusing, bug prone, and not useful. Can we
just get rid of it?

So let me understand better the situation.
FP can be: various kinds of NaN, +0.0, -0.0, +infinite, -infinite, and normal
numbers.
The language designers have to decide what to do for those various values in an
instruction like if(x).
At the moment in D NaNs are true, +0.0 and -0.0 are false, and the other values
are true.

Even that's not completely clear. Read the bug report. Sometimes NaNs
are true, sometimes not!

Considering NaN as true looks like an arbitrary decision, but in general NaN
means not a value, so it's not zero nor different from zero. So probably
if(NaN) has to be a runtime error. I think that's what happens when NaNs are
set as signalling.
Your solution is to forbid implicit and explicit casts of floating points to boolean values, because the choice of the true/false values is arbitrary,

Yes.
if(x) normally checks to see if x is "null". But for FP, there are two
different forms of "null", and checking only one of them makes little sense.

and because I think you are saying that packing normal values and NaNs
as true is not meaningful in real programs.

No, I'm not saying that at all. I'm saying that this operation is
extremely subtle, and the syntax gives you a false sense of security.
BTW, once you change if(x) into if (x == 0), you may ask a second
question: "do I mean exact equality, or should this really be: if
(fabs(x)<= TOLERANCE)" ?

Disabling conversions from FP to bools seems like able to break generic code. If I have a function template that can accept an int or FP and contains an if(x) it can't compile if the type of x is a FP.

The current behaviour breaks generic code, too.
Then if x is a class, then if (x) is true if x has been initialized.
But if x is floating point, if (x) is true if x hasn't been initialized!
This is arguably the main use for `if (x)`.

Note that writing generic code that is correct for both FP and ints is
very difficult. There are so many cases where the same syntax has
different semantics.

This can be solved writing if(x==0) that works with both ints and FPs (and it's true for +0.0 and -0.0).

Exactly. I think this is almost always superior.

In D some people have proposed to change the semantics of the "is" operator, to
make it more useful and tidy, so if you want to know if x is a NaN you can then write
if(x is nan).
That won't work in this case, unless you make nan a keyword. There are
2^^62 different NaNs, so a bitwise comparison will not work.
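
To make the alternatives in this exchange concrete, here is a small sketch of the explicit tests being recommended instead of if(x). The helper names isZero and nearZero are invented for the example; isNaN and fabs are the std.math functions.

```d
import std.math : fabs, isNaN;
import std.stdio;

// Works for both integral and floating-point T; true for +0.0 and -0.0.
bool isZero(T)(T x) { return x == 0; }

// The "second question": approximate rather than exact equality.
bool nearZero(double x, double tolerance) { return fabs(x) <= tolerance; }

void main()
{
    assert(isZero(0));
    assert(isZero(+0.0));
    assert(isZero(-0.0));
    assert(!isZero(double.nan));   // NaN compares unequal to everything

    double uninitialized;          // floating-point default is NaN in D
    assert(isNaN(uninitialized));  // test for NaN by classification, not bits

    assert(nearZero(1e-12, 1e-9));
    assert(!nearZero(1e-3, 1e-9));
    writeln("ok");
}
```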

Re: Summary on unit testing situation

2010-03-23 Thread Trip Volpe

bearophile Wrote:
> Among those four solutions the one I like more is the 'IV'. Because it keeps 
> the work of developing the library out of the busy hands of Walter, but 
> produces something that can have nice enough syntax, with a not too much 
> complex compiler, and it probably allows for some future changes in how 
> people do tests. It can also allow to write both very simple unit tests for 
> novices or single-module programs as now, and professional/complex unit tests 
> for harder situations or larger projects.
> 
> If you agree with me that the better solution is the IV, then those 
> hooks/reflection have to be designed first.
> 

I definitely agree with this. The built-in unit tests are a great convenience, 
but the provided facilities just aren't good enough _as they are_ for actual 
use. Particular problems include the fact that all tests are anonymous, no 
detailed contextual feedback is provided, and if an assert fails in one test in 
a module, _all_ subsequent tests in that module will be aborted, even though 
this makes no sense.

I've actually been doing a bit of solution IV already for my own projects. 
Starting with the Runtime.moduleUnitTester() function, I've built a simple 
framework that runs all the unit tests in the project but keeps track of 
context as well, so every failure (checked with an accompanying set of 
non-throwing "expect" functions) is logged by module, specific test, source 
line, and nature of failure.

Just a few extra hooks provided to the programmer would enable the construction 
of some very useful unit testing on the basis of D's provided primitives (which 
I agree were an excellent idea). For one example, being able to name each test 
or otherwise associate it with some form of contextual or identifying 
information would be very nice.

I also posted some thoughts a while ago ( 
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=105682
 ) on how template alias parameters could be expanded to alias complete 
expressions to allow for automatic logging of the particular expression that 
caused an expectation failure. This is the sort of helpful diagnostic already 
provided by any serious C or C++ unit testing system, and it would be great to 
see it made possible in D.
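
A rough idea of what a non-throwing "expect" helper of the kind described above could look like. This is not the poster's framework; the names Failure, failures and expect are made up, and the only language feature relied on is that __FILE__ and __LINE__ used as default arguments are evaluated at the call site.

```d
import std.stdio;

// One record per failed expectation.
struct Failure
{
    string file;
    size_t line;
    string message;
}

Failure[] failures; // module-level, so thread-local by default in D2

// Records a failure instead of throwing, so later checks still run.
bool expect(bool condition, string message = "expectation failed",
            string file = __FILE__, size_t line = __LINE__)
{
    if (!condition)
        failures ~= Failure(file, line, message);
    return condition;
}

void main()
{
    expect(1 + 1 == 2, "arithmetic works");
    expect(2 * 2 == 5, "deliberately wrong");
    expect("abc".length == 3);

    foreach (f; failures)
        writefln("%s(%s): %s", f.file, f.line, f.message);
    writefln("%s failure(s)", failures.length);
}
```

Logging the failing expression text itself, as the post goes on to ask for, would need something beyond this, since a plain bool parameter has already lost the source expression.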

Re: Summary on unit testing situation

2010-03-23 Thread Bane

Pelle Månsson Wrote:

> I'm not sure I understand, could you explain?
> 
> I am not experienced with unittest frameworks, and would like to 
> understand what the D system lacks.
> 
> Thank you.


Me too.

Re: Summary on unit testing situation

2010-03-23 Thread Pelle Månsson


I'm not sure I understand, could you explain?

I am not experienced with unittest frameworks, and would like to 
understand what the D system lacks.


Thank you.

Re: Scope operator like in C++?

2010-03-23 Thread Trass3r

No, I didn't.  If you read my post, later I said that the compiler can  
do it for you, but it is not suitable for human consumption.  The header  
file generated has lots of formatting removed, and no comments.  It  
doesn't even always strip out the implementation.  The OP's intention  
was to maintain the header file with just the prototypes and docs so a  
user of his code would not have the implementations in the way of the  
documentation.


Ah indeed, missed that.

Re: storing the hash multiplier instead of the hash value


On 03/23/2010 12:08 PM, Fawzi Mohamed wrote:

On 22-mar-10, at 21:04, Andrei Alexandrescu wrote:

Better suggestions are always welcome. For integrals I'm unclear on
what we could use to make things better. (Clearly we could and should
get rid of the extraneous field.)


I like murmurhash, that is what I put into tango, see
http://www.dsource.org/projects/tango/browser/trunk/tango/core/rt/compiler/util/hash.d

all that file is based on freely available code and written by me, and I
give my code with whatever license would be needed.


Thanks, Fawzi, that's great.

I'm trying to collect evidence of MurmurHash's effectiveness versus 
Hsieh's SuperFastHash. I seemed to find some:


http://tanjent.livejournal.com/756623.html

I also found what seems to be a thorough benchmark:

http://www.strchr.com/hash_functions

Looks pretty solid to me. The benchmarks look only at strings, not at 
typical workloads on void* (e.g. arbitrary objects containing integers 
and pointers and whatnot). Hsieh's hash is smack in the middle, whereas 
Murmur2 is fourth, faster by 10%.



I think that the public interface should be exposed (or re-exposed)
somewhere to the outside so that one can easily create efficient hashes
for user defined types.
For example it is nice to be able to chain hash functions (something
that the default one does not allow).


What I'm pushing for as of now is to move the associative array 
definition from opacity into templated goodies in object_.d.



Andrei

Re: Ranges and/versus iterators

2010-03-23 Thread Igor Lesik

>Can someone please explain in plain words just exactly what a range is and how
>it differs from the iterator concept (if that's appropriate???) and what are 
>the benefits
>from a data modeling and access perspective.

Check Andrei's presentation "Iterators must go":
http://www.slideshare.net/rawwell/iteratorsmustgo

I am interested myself to know how ranges
are envisioned to be in D. Does the book talk
about ranges?

Re: storing the hash multiplier instead of the hash value



On 22-mar-10, at 21:04, Andrei Alexandrescu wrote:


On 03/22/2010 02:50 PM, Daniel Keep wrote:
Sadly, I lack the background to make any sort of objective judgement of
the hash function *itself*, so I'll just reiterate what I've heard
repeated to me over and over again by numerous people: D's builtin hash
function sucks like a black hole.


Then you might be glad to hear that Walter has recently improved the  
default hashtable implementation significantly (at least for heavy-duty  
applications).


We're using Paul Hsieh's hash function for strings and general  
memory, which is no slouch.


http://www.dsource.org/projects/druntime/browser/trunk/src/rt/util/hash.d

Better suggestions are always welcome. For integrals I'm unclear on  
what we could use to make things better. (Clearly we could and  
should get rid of the extraneous field.)


I like murmurhash, that is what I put into tango, see

http://www.dsource.org/projects/tango/browser/trunk/tango/core/rt/compiler/util/hash.d
all that file is based on freely available code and written by me, and  
I give my code with whatever license would be needed.
I think that the public interface should be exposed (or re-exposed)  
somewhere to the outside so that one can easily create efficient  
hashes for user defined types.
For example it is nice to be able to chain hash functions (something  
that the default one does not allow).


Fawzi
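
To show what "chaining" hash functions means in practice, here is a sketch in which the hash takes a seed, so the result for one field can be fed into the hash of the next. FNV-1a is used purely as a short stand-in; it is neither MurmurHash nor Hsieh's function, and fnv1a, Point and pointHash are names invented for the example.

```d
import std.stdio;

// 32-bit FNV-1a. The seed parameter is what makes chaining possible:
// pass the previous result instead of the default offset basis.
uint fnv1a(const(ubyte)[] data, uint seed = 2166136261u)
{
    uint h = seed;
    foreach (b; data)
    {
        h ^= b;
        h *= 16777619u;
    }
    return h;
}

struct Point { int x; int y; }

// Hash of a user-defined type built by chaining the hashes of its fields.
uint pointHash(ref const Point p)
{
    auto hx = fnv1a((cast(const(ubyte)*) &p.x)[0 .. int.sizeof]);
    return fnv1a((cast(const(ubyte)*) &p.y)[0 .. int.sizeof], hx);
}

void main()
{
    auto a = Point(1, 2);
    auto b = Point(2, 1);
    writefln("%08x %08x", pointHash(a), pointHash(b));
}
```

With this shape, hashing two pieces in sequence gives the same result as hashing their concatenation, which is the property a chainable public interface would expose.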

Summary on unit testing situation

I have already written one or two times about this topic, but I think 
summarizing the situation again a little can't hurt. Feel free to ignore this 
post.

1) D follows Walter's theory that programmers are often lazy or in a rush, they 
are often not trained to use unit tests (especially if they come from C or C++) 
and they don't like to learn to use too much complex things. So it's better to 
put in the D as simple as possible means to perform something useful, in this 
case to write unit testing. I was already "test-infected" before learning D, so 
I have used unit tests in D from almost day zero, and I have found them very 
easy to use, there's very little to learn, just to add some unittest{} spread 
in modules, filled with normal code and asserts, plus an argument -unittest for 
the compiler (but catching expected exceptions and testing expected 
compile-time errors is less easy. I have written a Throws!() function for the 
first, and I use is() for the second. And I add a comment that tells what I am 
testing inside a single unittest. Every thing has a separate unittest, to keep 
things a little more tidy). It can't be simpler than 
this. So I think Walter was right. And in future I hope to see D code in the 
wild that uses a good amount of unittests (but I think currently Phobos has not 
enough tests).

2) Dynamic languages perform less sanity tests on the code, so programmers are 
trained to write more unit tests. In Python/Ruby you must write unit tests, a 
good amount of them. In theory in a statically typed language like D you can 
avoid some tests because the type system catches some problems for you, saving 
you the time to write some of them. In practice most of the things you have to 
write in normal D unit tests are not enforced by normal type systems (even a 
type system like D2 one that's better than Java one). I have seen that the 
tests that I don't need to put in D unit tests (because the type system catches 
them) are only the very simple ones. All the other little more complex tests 
must be written in D unit tests too, as in Python. So the save in time is not 
much. I write about 2-2.5 lines of tests for every 1 line of code. In Python I 
write about 2.5-3 lines of code of tests for every 1 line of code. (But in 
Python I often use doctests that are a way to write tests 
that's even faster than D unittests).

3) The problem is that D unit tests are a toy. If you start writing programs 
composed by many modules you want more flexibility. I have written in the past 
some of the important things missing in D unit testing, and I don't repeat them 
here, ask me if you want another list. If you take a look at unit test systems 
in Java or Python or Ruby or C# you can see that D unit testing is not enough 
for a professional use, they are a toy. For example in the Python standard 
library there are two different (but they can be joined) unit test systems, and 
they are both quite more refined than the D one. And people often use a third 
external library that ties things together, like one called "nose".
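
For readers who have not used them, the built-in facility described in point 1 looks roughly like this. The function parsePositive is invented for the example, and assertThrown from std.exception (as found in current Phobos) stands in for the hand-written Throws!() helper mentioned above; the compile-time checks use is(typeof(...)) and __traits(compiles, ...).

```d
import std.conv : ConvException, to;
import std.exception : assertThrown;

// Hypothetical function under test.
int parsePositive(string s)
{
    int v = to!int(s);
    if (v <= 0)
        throw new Exception("not positive");
    return v;
}

// Compiled and run only when the compiler is given -unittest.
unittest
{
    // Ordinary behaviour.
    assert(parsePositive("42") == 42);

    // Expected run-time failures.
    assertThrown!ConvException(parsePositive("abc"));
    assertThrown!Exception(parsePositive("-1"));

    // Expected compile-time failures: a double is not a string.
    static assert(!__traits(compiles, parsePositive(3.14)));
    static assert(!is(typeof(parsePositive(3.14))));
}

void main() {}
```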

How to solve this situation? There can be various possibilities:
I) Remove the built-in unit testing of D, and wait for someone to write an 
external "professional" unit test system for D. This external unit test system 
can have not nice/clean syntax/semantics.
II) Keep the built-in unit testing of D, but essentially all serious future 
programmers will ignore them and use an external unit test system. This wastes 
code in the compiler (and information in the head of programmers, but not a 
lot, because the built-in ones are very simple to learn) and has the 
disadvantages of solution I too. D newbies will be advised to avoid 
built-in unit tests as soon as possible.
III) Keep the built-in unit testing of D, and improve it until it becomes fit 
for serious usage. This can make the compiler a bit too much complex. Walter 
has enough to do already with the core of the front-end. Developing and 
improving a serious unit test system is not too much hard, but it's a full job 
or almost full job. Another bad thing of this is that unit testing is not set 
in stone, in ten years someone can invent a better way to do them, at that 
point it will be hard to change the compiler to have the newer type of tests.
IV) Keep the built-in unit testing of D, keep them almost as simple as they are 
now, but somehow add hooks and flexibility to allow to external D code to 
refine *them* as much as needed (this "external" code can be a Phobos module, 
or it can be a third-party library written by other people, or it can be born as 
external lib and added to Phobos later, as it happens often in Python, that's 
why they say it has "batteries included", such batteries often were not born in 
the std library), so they can be used in professional situations too. This will 
increase the complexity of the built-in unit tests, but probably not much. It 
can increase the complexity of the compiler a little, b

Re: Ranges and/versus iterators

2010-03-23 Thread Jesse Phillips

I think a good way to learn ranges is by looking at the interface to them, 
compared to those used by other languages. For one thing, C++ iterators are way 
more complicated than what you will find in many other languages; for example 
Java's iterator interface[1] is similar to a D range[2].

In both languages the interface is 3 functions (there are many types of ranges 
which require more, but only one kind of iterator).

Java:
boolean hasNext()
E next()
void remove()

D:
bool empty()
E front()
void popFront()

These look almost identical but the semantics are very different. For example, 
hasNext() requires a look-ahead, while empty() does not. This is important 
since you may not be able to look ahead in a range.

next() performs the equivalent of calling front(); popFront(); And remove() has 
nothing to do with the iterator as it operates on the underlying collection.

Removing the look-ahead is probably one of the biggest improvements over Java's 
iterator.

1. http://java.sun.com/j2se/1.5.0/docs/api/java/util/Iterator.html
2. http://digitalmars.com/d/2.0/phobos/std_range.html#isInputRange
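
A minimal example of the three-function D interface listed above, in case it helps. Countdown is a made-up type; isInputRange is the Phobos template referenced in [2], and foreach accepts any type providing these three members.

```d
import std.range : isInputRange;
import std.stdio;

// Hypothetical range that counts down from n to 1.
struct Countdown
{
    private int n;

    @property bool empty() const { return n <= 0; }  // no look-ahead needed
    @property int front() const { return n; }        // current element
    void popFront() { --n; }                         // advance by one
}

// Compile-time check that the type satisfies the input range interface.
static assert(isInputRange!Countdown);

void main()
{
    foreach (x; Countdown(3))
        writeln(x);   // prints 3, 2, 1 on separate lines
}
```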

clueless bystander Wrote:

> Watching D evolve from the outside there seems to be a lot of ongoing 
> discussion
> on this newsgroup about the D range idiom which is somehow opposed to 
> conventional
> thinking about iterators.
> 
> Can someone please explain in plain words just exactly what a range is and how
> it differs from the iterator concept (if that's appropriate???) and what are 
> the benefits
> from a data modeling and access perspective.
> 
> Sure, I'm clueless, though suspect many other bystanders would appreciate a
> succinct heads-up.
> 
> Thanks,
> clueless bystander

Re: storing the hash multiplier instead of the hash value


On 03/23/2010 08:11 AM, Michael Rynn wrote:

On Mon, 22 Mar 2010 16:59:36 -0500, Andrei Alexandrescu wrote:


On 03/22/2010 04:03 PM, Andrei Alexandrescu wrote:

On 03/22/2010 03:36 PM, Walter Bright wrote:

Andrei Alexandrescu wrote:

Better suggestions are always welcome. For integrals I'm unclear on
what we could use to make things better. (Clearly we could and should
get rid of the extraneous field.)


Unfortunately, it won't be much of a win. Memory allocation is done in
buckets of size 16, 32, 64, etc. Reducing the node size for a
uint[uint] from 16 to 12 saves no memory at all.


As we discussed, if nodes are allocated in bunches, you could store 5
nodes in a 64-byte block instead of 4. That's more than a 25% increase
in effectiveness because the per-block bookkeeping is also slashed by
5.

Andrei


One more idea before I forget: the bunches approach requires using a
freelist for nodes that are available but not used. (Freelists are a
good idea whether or not bunches are used.)

One good possibility is to store the head of the freelist in a static
variable, such that e.g. all int[string] instances use the same
freelist. And there's no thread contention issue because each thread has
its own freelist.


Andrei


I just committed an aaA.d version that uses some heap node memory
management, although it's on a per class instance basis. Also it dispenses
with a separate hash storage for keys <= size_t.  Sharing would mean some
working out of which classes share the same sized node blocks.
Much easier to implement class sharing using templates.

See the code here : http://www.dsource.org/projects/aa/browser/trunk/druntime/aaA.d

See the benchmarks and comments here : http://www.dsource.org/projects/aa/wiki/DrunTimeAA.

The result squeezes some extra performance for integer or less sized keys
(about 20% faster for lookups).


For the druntime, the generic C interface constrains the kinds of
specialized AA versions that can be instantiated using run-time TypeInfo
and var-arged calls. Maybe a class / interface direct calls would work
better. Just imagine at runtime looking at the TypeInfo_AssociativeArray
and trying to work out exactly which template is going to be instantiated.

And with a low-code-size, one-or-few-versions-fit-all approach, that
performance sacrifice is unavoidable in a basic runtime library.

But in template land, options and combinations and gains abound.


Absolutely!

This is solid work. It would be interesting to assess how your 
implementation and the old built-in hashes compare with Walter's new 
implementation, which uses a singly-linked list instead of a tree. Do 
you have what it takes to check out and build druntime and phobos?



Andrei
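
The freelist idea quoted earlier in this message can be sketched as follows. This is only an illustration with invented names (Node, freeHead, allocNode, releaseNode), not the code in aaA.d, and it leaves out the "bunches" allocation entirely. It relies on the D2 rule that module-level variables are thread-local by default, which is why each thread gets its own list without locking.

```d
import std.stdio;

// Hypothetical hash-table node for a uint[uint]-style table.
struct Node
{
    uint key;
    uint value;
    Node* next;   // doubles as the freelist link while the node is unused
}

Node* freeHead;   // thread-local by default: one freelist per thread

// Take a node from the freelist if possible, otherwise ask the GC.
Node* allocNode(uint key, uint value)
{
    Node* n;
    if (freeHead !is null)
    {
        n = freeHead;
        freeHead = n.next;
    }
    else
    {
        n = new Node;
    }
    n.key = key;
    n.value = value;
    n.next = null;
    return n;
}

// Return a node to the freelist instead of leaving it to the collector.
void releaseNode(Node* n)
{
    n.next = freeHead;
    freeHead = n;
}

void main()
{
    auto a = allocNode(1, 10);
    releaseNode(a);
    auto b = allocNode(2, 20);
    writeln(a is b);   // prints true: the released node was reused
}
```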

Re: Scope operator like in C++?

2010-03-23 Thread Trass3r

If you want to provide a "header file" for this, you may do so with a di  
file (d interface).  [...]
The drawbacks of doing it this way are 1) you have to maintain two files  
instead of one, with almost identical contents


You forget the -H switch.
http://www.digitalmars.com/d/2.0/dmd-linux.html#interface_files

Re: Scope operator like in C++?


On Tue, 23 Mar 2010 10:08:08 -0400, Trass3r  wrote:

If you want to provide a "header file" for this, you may do so with a  
di file (d interface).  [...]
The drawbacks of doing it this way are 1) you have to maintain two  
files instead of one, with almost identical contents


You forget the -H switch.
http://www.digitalmars.com/d/2.0/dmd-linux.html#interface_files


No, I didn't.  If you read my post, later I said that the compiler can do  
it for you, but it is not suitable for human consumption.  The header file  
generated has lots of formatting removed, and no comments.  It doesn't  
even always strip out the implementation.  The OP's intention was to  
maintain the header file with just the prototypes and docs so a user of  
his code would not have the implementations in the way of the  
documentation.


-Steve

Re: Scope operator like in C++?

TimK:

> I have a little (private) project in C++ and was thinking that I should give D
> a chance (especially since it's not exactly new).

Welcome. I suggest you still use the D1 language, because it's simpler to
learn and its compilers have fewer bugs. The latest LDC and DMD are OK; I prefer
LDC on Ubuntu and DMD on Windows, plus something to automate building (because
the D1 compiler is not able to find modules by itself as Java does). Eventually
D2 will be debugged and refined enough to be used by novices too. Hopefully that
will not take too long. You can of course play with D2 too (DMD only), but
you may run into some bugs.


> So I have a class TPerson where the declaration is in the header file and the
> definition in the cpp-source file. Naturally a defined function looks
> something like this:
> 
> void TCharacter::DoSomething(int x){...}
> 
> I do this for several reasons:
> - The class would get really awkward to read through and understand, with all
> the code in it - so I usually just put the declaration and the documentation
> for public functions into it.
> - a programmer who just wants to understand the class doesn't care about the
> implementation.
> 
> Now I tried to make a class in D, too, following this style.
> Unfortunately that doesn't work and I couldn't find a working alternative in
> the documentation.
> 
> Is this not possible in D?

Steven Schveighoffer has already told you most things. Generally, when you
change languages it's better to use the new language's own idioms even if you
don't like or understand them yet, because a well-designed language is an
organic thing whose parts are designed to fit each other. So what's good and
bad is often language-specific.

I've read people call D a Java++, or describe it as a tidy C++, but in practice
there are several differences between D and C++; for example, the garbage
collector changes many things.

In D, classes and structs have their methods written inside. The indentation
helps the programmer see the grouping. In a module you can put related classes,
if they aren't too long.

Bye,
bearophile
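
[Editor's sketch] To make the inline style concrete, here is a minimal example of a class with its methods written inside and documented with Ddoc comments. The members of TPerson (name, growOlder) are hypothetical; only the class name comes from the thread. The -D and -H switches mentioned in the comment are the dmd options for generating documentation and a .di interface file.

```d
import std.stdio;

// Sketch only: the fields and methods are invented for illustration.
// Build docs with:        dmd -D person.d   (writes person.html)
// Generate a .di header:  dmd -H person.d   (writes person.di)

/**
 * A person with a name and an age.
 */
class TPerson
{
    private string name_;
    private int age_;

    /// Constructs a person with the given name and age.
    this(string name, int age)
    {
        name_ = name;
        age_ = age;
    }

    /// Returns: the person's name.
    string name() const { return name_; }

    /// Adds `years` to the person's age and returns the new age.
    int growOlder(int years)
    {
        age_ += years;
        return age_;
    }
}

void main()
{
    auto p = new TPerson("Tim", 30);
    writeln(p.name(), " is now ", p.growOlder(1));
}
```

A reader who only wants the interface can look at the generated documentation, while the source stays in a single file.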

Re: Scope operator like in C++?


On Tue, 23 Mar 2010 08:06:59 -0400, TimK wrote:

> == Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article
>> Second, GDC is horrifically old.  I'd recommend switching to ldc or the
>> latest dmd 1.
>
> Unfortunately I'm using Debian Stable 64 Bit right now and switching to
> either of those is out of the question (at least for now). Maybe I'll
> upgrade to Squeeze (and the LLVM compiler) soon.
> Anyway, thanks for the input, I'll keep that in mind if something won't
> work.


I think the latest ldc is 64-bit.  And there are a fair number of people
who use a 32-bit compiler on a 64-bit system, but I agree it's not savory.






>> The D compiler has a builtin documentation
>> generator, so someone trying to use your functions will want to turn to
>> that instead (much easier to read), and maintenance on two files that
>> have to always be in sync is sucky to say the least.
>> For an example of what really good generated docs look like, see tango's
>> docs: http://www.dsource.org/projects/tango/docs/current
>> My recommendation is to simply build your code inline, just like Java
>> and C#, and use the doc generation tools to document your functions.  It
>> will take some getting used to, but I think you will appreciate the
>> single-point-of-editing aspect of it.
>
> Oh, I didn't know that D had a builtin documentation generator... Well
> if the documentation will look like that then reading the class itself
> would become useless anyway.
> Guess I'll use that :)


Just a note: I think Tango uses dil (http://github.com/azizk/dil/, a D
compiler written in D; not complete, but the parser works) to generate the
docs, so in order to make yours look that good, you may have to do
something similar.  dmd also generates decent docs; I use it for
dcollections along with candydoc.  In general, I don't think there's any
excuse not to use a doc generator :)


-Steve

Re: Scope operator like in C++?

2010-03-23 Thread TimK

== Quote from Steven Schveighoffer (schvei...@yahoo.com)'s article
> Second, GDC is horrifically old.  I'd recommend switching to ldc or the
> latest dmd 1.

Unfortunately I'm using Debian Stable 64 Bit right now and switching to either
of those is out of the question (at least for now). Maybe I'll upgrade to
Squeeze (and the LLVM compiler) soon.
Anyway, thanks for the input, I'll keep that in mind if something won't work.


> The D compiler has a builtin documentation
> generator, so someone trying to use your functions will want to turn to
> that instead (much easier to read), and maintenance on two files that have
> to always be in sync is sucky to say the least.
> For an example of what really good generated docs look like, see tango's
> docs: http://www.dsource.org/projects/tango/docs/current
> My recommendation is to simply build your code inline, just like Java and
> C#, and use the doc generation tools to document your functions.  It will
> take some getting used to, but I think you will appreciate the
> single-point-of-editing aspect of it.

Oh, I didn't know that D had a builtin documentation generator... Well if the
documentation will look like that then reading the class itself would become
useless anyway.
Guess I'll use that :)


Thank you for your fast reply.

Regards,
Tim

Re: storing the hash multiplier instead of the hash value

2010-03-23 Thread Michael Rynn

On Mon, 22 Mar 2010 16:59:36 -0500, Andrei Alexandrescu wrote:

> On 03/22/2010 04:03 PM, Andrei Alexandrescu wrote:
>> On 03/22/2010 03:36 PM, Walter Bright wrote:
>>> Andrei Alexandrescu wrote:
>>>> Better suggestions are always welcome. For integrals I'm unclear on
>>>> what we could use to make things better. (Clearly we could and should
>>>> get rid of the extraneous field.)
>>>
>>> Unfortunately, it won't be much of a win. Memory allocation is done in
>>> buckets of size 16, 32, 64, etc. Reducing the node size for a
>>> uint[uint] from 16 to 12 saves no memory at all.
>>
>> As we discussed, if nodes are allocated in bunches, you could store 5
>> nodes in a 64-byte block instead of 4. That's more than a 25% increase
>> in effectiveness because the per-block bookkeeping is also slashed by
>> 5.
>>
>> Andrei
> 
> One more idea before I forget: the bunches approach requires using a
> freelist for nodes that are available but not used. (Freelists are a
> good idea whether or not bunches are used.)
> 
> One good possibility is to store the head of the freelist in a static
> variable, such that e.g. all int[string] instances use the same
> freelist. And there's no thread contention issue because each thread has
> its own freelist.
> 
> 
> Andrei

I just committed an aaA.d version that uses some heap node memory
management, although it's on a per class instance basis. It also dispenses
with separate hash storage for keys <= size_t.  Sharing would mean
working out which classes share the same sized node blocks.
It is much easier to implement class sharing using templates.

See the code here:
http://www.dsource.org/projects/aa/browser/trunk/druntime/aaA.d

See the benchmarks and comments here:
http://www.dsource.org/projects/aa/wiki/DrunTimeAA

The result squeezes some extra performance for integer or less sized keys
(about 20% faster for lookups).

In druntime, the generic C interface constrains the kinds of
specialized AA versions that can be instantiated using run-time TypeInfo
and variadic calls. Maybe direct class/interface calls would work
better. Just imagine looking at the TypeInfo_AssociativeArray at run time
and trying to work out exactly which template is going to be instantiated.

With a low-code-size, one-or-few-versions-fit-all approach, that
performance sacrifice is unavoidable in a basic runtime library.

But in template land, options and combinations and gains abound.
---
Michael Rynn
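
[Editor's sketch] The "bunches plus freelist" idea quoted above can be illustrated roughly as follows. This is not the code in aaA.d; the Node layout, blockSize, allocNode and freeNode are all hypothetical names chosen for the sketch. It relies on the D2 rule that module-level variables are thread-local by default, which is what makes a shared-per-type freelist free of thread contention.

```d
import std.stdio;

// Hypothetical node for the sketch; not the layout used in aaA.d.
struct Node
{
    Node* next;
    uint key;
    uint value;
}

// In D2, module-level and static variables are thread-local by default,
// so each thread gets its own freelist head and no locking is needed.
Node* freeHead;

enum blockSize = 64; // bytes per "bunch", as in the discussion

/// Takes a node from the freelist, refilling it from a new bunch when empty.
Node* allocNode()
{
    if (freeHead is null)
    {
        enum perBlock = blockSize / Node.sizeof;
        auto block = new Node[perBlock];
        foreach (ref node; block)
        {
            node.next = freeHead;
            freeHead = &node;
        }
    }
    auto n = freeHead;
    freeHead = n.next;
    n.next = null;
    return n;
}

/// Returns a node to the freelist for later reuse.
void freeNode(Node* n)
{
    n.next = freeHead;
    freeHead = n;
}

void main()
{
    // The arithmetic from the thread: a 64-byte block holds five
    // 12-byte nodes but only four 16-byte nodes.
    static assert(64 / 12 == 5 && 64 / 16 == 4);

    auto a = allocNode();
    freeNode(a);
    auto b = allocNode();
    assert(a is b); // the most recently freed node is handed out first
    writeln("nodes per block on this target: ", blockSize / Node.sizeof);
}
```

A per-instance freelist, as in the committed version, would move freeHead into the container; the module-level variable shown here corresponds to the shared-per-type variant Andrei describes.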

Re: Ranges and/versus iterators

2010-03-23 Thread clueless bystander

Lars T. Kyllingstad Wrote:

> clueless bystander wrote:
> > Watching D evolve from the outside there seems to be a lot of ongoing
> > discussion on this newsgroup about the D range idiom which is somehow
> > opposed to conventional thinking about iterators.
> > 
> > Can someone please explain in plain words just exactly what a range is
> > and how it differs from the iterator concept (if that's appropriate???)
> > and what are the benefits from a data modeling and access perspective.
> > 
> > Sure, I'm clueless, though suspect many other bystanders would appreciate a
> > succinct heads-up.
> > 
> > Thanks,
> > clueless bystander
> 
> I'm probably not the right person to answer your question, since I have 
> virtually no experience with C++ iterators.  Instead I'll just refer you 
> to Andrei's own article on the subject:
> 
>    http://www.informit.com/articles/article.aspx?p=1407357
> 
> Please don't hesitate to ask again if it didn't clear things up for you. :)
> 
> -Lars

Thanks Lars.  I'm not quite willing to accept that 15 pages is succinct but if
that's what it takes to build up the background then that's what it takes.
I'm up to page 3 now and, btw, like the way the author *does not* mince his
words:

"Such matters as a polynomial slowdown were too mundane to hinder the power of 
S-lists, so some functional programmers got imbued with an attitude of contempt 
toward arrays and associative arrays, data structures essential to many 
algorithms."

c.b.

Re: Associative Arrays need cleanout method or property to help

2010-03-23 Thread Michael Rynn

On Mon, 22 Mar 2010 00:52:24 -0700, Walter Bright wrote:

> bearophile wrote:
>> A way to know the usage patterns of D AAs is to add instrumentation in
>> nonrelease mode, they can save few statistical data once in a while in
>> the install directory of D, and then the user can give this little txt
>> file to Walter to tune the language :-)
> 
> There's no need to tune the language. The implementation and data
> structure of the AAs is completely opaque to the language. The
> implementation is in aaA.d. Feel free to try different implementations
> in it!

Well, I have.

See the code here:
http://www.dsource.org/projects/aa/browser/trunk/druntime/aaA.d

See the benchmarks and comments here:
http://www.dsource.org/projects/aa/wiki/DrunTimeAA

The result squeezes some extra performance for integer or less sized keys 
(about 20% faster for lookups).

Insertions and cleanups are faster too.

-- taf
Michael Rynn.

Re: storing the hash multiplier instead of the hash value

2010-03-23 Thread Leandro Lucarella

Andrei Alexandrescu, on March 22 at 15:04, you wrote to me:
> On 03/22/2010 02:50 PM, Daniel Keep wrote:
> >
> >How about just *fixing* the hashtable so it doesn't generate false
> >pointers in the first place?  Maybe I'm just strange, but that seems
> >like a more direct, efficacious solution...
> 
> This was of course the first thing Walter and I discussed. It turns
> out that that would necessitate precise GC, which we don't have yet.

Well... there is an implementation hanging in bugzilla...

-- 
Leandro Lucarella (AKA luca) http://llucax.com.ar/
--
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
--

Re: suspected ctfe bug

2010-03-23 Thread Zólyomi István

Steven Schveighoffer Wrote:
> I think it's a bug.  Please file  
> http://d.puremagic.com/issues/enter_bug.cgi

Currently it seems to be down, I'll try not to forget this later.

Istvan

Re: suspected ctfe bug

On Tue, 23 Mar 2010 03:25:25 -0400, Zólyomi István wrote:

> Hi,
>
> recently I've been experimenting with the metaprogramming capabilities
> of the new dmd 2.042 compiler. I think I've found a bug, but since I'm
> still not really experienced with D, please confirm if so. I've reduced
> the test case to the following code:
>
> import std.metastrings;
>
> // dummy ctfe-capable function needed to reproduce the error
> long parseLong(string timeStr) { return 42; }
>
> // ctfe-capable function to demonstrate that a value is calculated in
> // compile-time
> string longToStr(long val)
> {
>     if (val < 10) { return "" ~ cast(char)(val + '0'); }
>     else { return longToStr(val / 10) ~ longToStr(val % 10); }
> }
>
> void main(string[] args)
> {
>     const long mylong = parseLong("mystring");
>     pragma(msg, "fine ", longToStr(mylong) ); // compiles and prints
>     pragma(msg, "bug? ", std.metastrings.toStringNow!(mylong) ); // error
> }
>
> The compiler output is
>
> fine 42
> bug? c:\Program Files\.\src\phobos\std\metastrings.d(97): Error: expression mylong < 0L is not constant or does not evaluate to a bool
>
> Note that parseLong() is needed to reproduce the bug, if a constant is
> written instead of mylong, it works just fine. Am I missing something?



I think it's a bug.  Please file  
http://d.puremagic.com/issues/enter_bug.cgi


-Steve
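
[Editor's sketch] One thing worth trying in a case like this is declaring the value as an enum rather than a const local, since an enum initializer must be evaluated at compile time. The snippet below reuses the two functions from the report and avoids std.metastrings entirely; whether this sidesteps the reported error on dmd 2.042 has not been verified, so treat it as an assumption rather than a confirmed workaround.

```d
import std.stdio;

// dummy CTFE-capable function, as in the original report
long parseLong(string timeStr) { return 42; }

// CTFE-capable conversion, as in the original report
// (only handles non-negative values)
string longToStr(long val)
{
    if (val < 10) { return "" ~ cast(char)(val + '0'); }
    else { return longToStr(val / 10) ~ longToStr(val % 10); }
}

// An enum initializer is always evaluated at compile time,
// so mylong is a manifest constant here.
enum long mylong = parseLong("mystring");

// Checked by the compiler, not at run time.
static assert(longToStr(mylong) == "42");

void main()
{
    writeln(longToStr(mylong));
}
```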

Re: Ranges and/versus iterators

2010-03-23 Thread Lars T. Kyllingstad


clueless bystander wrote:
> Lars T. Kyllingstad Wrote:
>
>> clueless bystander wrote:
>>> Watching D evolve from the outside there seems to be a lot of ongoing
>>> discussion on this newsgroup about the D range idiom which is somehow
>>> opposed to conventional thinking about iterators.
>>>
>>> Can someone please explain in plain words just exactly what a range is
>>> and how it differs from the iterator concept (if that's appropriate???)
>>> and what are the benefits from a data modeling and access perspective.
>>>
>>> Sure, I'm clueless, though suspect many other bystanders would appreciate a
>>> succinct heads-up.
>>>
>>> Thanks,
>>> clueless bystander
>>
>> I'm probably not the right person to answer your question, since I have
>> virtually no experience with C++ iterators.  Instead I'll just refer you
>> to Andrei's own article on the subject:
>>
>>    http://www.informit.com/articles/article.aspx?p=1407357
>>
>> Please don't hesitate to ask again if it didn't clear things up for you. :)
>>
>> -Lars
>
> Yes, well, thanks again.  The first 7 pages seemed to have plausible arguments
> but the going gets tough thereafter.  Maybe the reason ranges are not popular
> is that they are hard to explain even though they might be simple and obvious
> in hindsight.
>
> Sigh,
> c.b.


The reason ranges are not popular is that they haven't had time to
become popular yet.  The range concept itself is rather new, not much
older than the article I referred you to, and AFAIK it's only been
implemented in the D2 standard library.


I'm sure there are several people on this forum who can give you a
satisfactory (and succinct) answer.  You may want to check back in a few
hours, when activity picks up.


FWIW, I don't think the concept of ranges is very hard to grasp.  I just 
haven't used them that much, that's why I don't want to be the one to 
answer your question (and, like I said before, I have never used C++ 
iterators so I can't compare them either).


-Lars
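
[Editor's sketch] For readers looking for the short version: an input range is a single object that knows both where it is and where it ends, whereas a C++ iterator is a position that needs a second iterator to mark the end. The example below uses the three input-range primitives as they are spelled in present-day Phobos (empty, front, popFront); the exact API was still under discussion at the time of this thread. Upto and sumAll are hypothetical names for the sketch.

```d
import std.range : isInputRange;
import std.stdio;

// Hypothetical example type: yields current, current + 1, ... up to
// but not including limit.
struct Upto
{
    int current;
    int limit;

    @property bool empty() const { return current >= limit; }
    @property int front() const { return current; }
    void popFront() { ++current; }
}

// Compile-time check that Upto satisfies the input range interface.
static assert(isInputRange!Upto);

/// Works on any input range of ints; no begin/end pair is passed around.
int sumAll(R)(R r)
{
    int total = 0;
    for (; !r.empty; r.popFront())
        total += r.front;
    return total;
}

void main()
{
    // foreach understands the three primitives directly.
    foreach (x; Upto(0, 3))
        writeln(x);

    writeln(sumAll(Upto(1, 5)));
}
```

Because the end is carried inside the range, an algorithm like sumAll cannot be handed a mismatched begin/end pair, which is one of the benefits the article argues for.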

Re: Ranges and/versus iterators

2010-03-23 Thread clueless bystander

Lars T. Kyllingstad Wrote:

> clueless bystander wrote:
> > Watching D evolve from the outside there seems to be a lot of ongoing
> > discussion on this newsgroup about the D range idiom which is somehow
> > opposed to conventional thinking about iterators.
> > 
> > Can someone please explain in plain words just exactly what a range is
> > and how it differs from the iterator concept (if that's appropriate???)
> > and what are the benefits from a data modeling and access perspective.
> > 
> > Sure, I'm clueless, though suspect many other bystanders would appreciate a
> > succinct heads-up.
> > 
> > Thanks,
> > clueless bystander
> 
> I'm probably not the right person to answer your question, since I have 
> virtually no experience with C++ iterators.  Instead I'll just refer you 
> to Andrei's own article on the subject:
> 
>    http://www.informit.com/articles/article.aspx?p=1407357
> 
> Please don't hesitate to ask again if it didn't clear things up for you. :)
> 
> -Lars

Yes, well, thanks again.  The first 7 pages seemed to have plausible arguments
but the going gets tough thereafter.  Maybe the reason ranges are not popular
is that they are hard to explain even though they might be simple and obvious
in hindsight.

Sigh,
c.b.

Re: Scope operator like in C++?