Re: std.random2

2013-01-09 Thread Fawzi Mohamed

On 01/08/13 22:23, Dmitry Olshansky wrote:

09-Jan-2013 00:38, H. S. Teoh wrote:

On Tue, Jan 08, 2013 at 07:46:46PM +0100, Joseph Rushton Wakeling wrote:

On 01/08/2013 07:38 PM, ixid wrote:

I imagine there has been some detailed discussion of the std.nameX
idea of libraries so forgive me if this has been discussed.


I appreciate your concern on this point, but I don't think it's the
right thing to focus on in this specific discussion.

What I really want address is: how do we get the design of std.random
_right_?

How we go about incorporating that new design into Phobos with
minimal hassle for users is a different issue and one we can face
when the time comes.


For one thing, I'd say definitely make RNGs have reference semantics.
Passing by value just doesn't make sense; for large generators it
consumes too much stack space and risks stack overflow, and in any case
copying RNGs unintentionally causes duplicated sequences, which is very
bad.

For those cases where you *want* to copy an RNG, it should be made into
a forward range and then you use .save explicitly.
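
To make the intended usage concrete, here is a minimal sketch (assuming
std.random's Mt19937, which is a forward range exposing .save):

import std.random : Mt19937;
import std.stdio : writeln;

void main()
{
    auto rng = Mt19937(42);
    auto copy = rng.save;   // explicit, intentional duplication
    rng.popFront();         // advances rng only
    writeln(rng.front);     // second value of the sequence
    writeln(copy.front);    // still the first value
}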

I wonder if it even makes sense to force RNGs to inherit from a common
base class, so that reference semantics are enforced even for
user-defined types. But this may be a bit too heavy-handed (and it will
alienate the no-GC crowd).


I'd push a well-working (from my POV) design of polymorphic adapters.
The idea is that wrapping is doable, but buying the speed back is an
unsolved problem. BTW there is one for input ranges - InputRangeObject.


Then each generator/distribution defines specific structs and there is 
one templated polymorphic wrapper that has a common base class.


IMO this gives the best of both worlds (this is how std.digest was 
designed) at no implementation cost.


The no-GC, performance and "just give me this one little PRNG" needs
are served with specific structs.


Then polymorphic behavior with a hot-swappable PRNG, "I'm coming from
Java/Ruby/..." etc. needs are served with a wrapper + base class or
interface. We may even provide per-struct aliases of the respective
wrapper to be more user-friendly.
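
A minimal sketch of what such a wrapper could look like (the names here
are invented for illustration, this is not actual Phobos code):

interface UniformRNG
{
    @property uint front();
    void popFront();
}

final class RandomObject(Engine) : UniformRNG
{
    private Engine engine;
    this(Engine e) { engine = e; }
    @property uint front() { return engine.front; }
    void popFront() { engine.popFront(); }
}

// per-struct alias of the respective wrapper, as suggested above:
// alias MtRandomObject = RandomObject!Mt19937;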


I think that is the correct approach, and it is what I wanted to write
yesterday, but I had no internet... a good idea is bound to crop up
again :).







Re: std.random2

2013-01-09 Thread Fawzi Mohamed

I wrote what became the tango random* modules.
In it I did use a struct based approach for the core thing, but the
actual generator is then a class.

This is because that way I was able to efficiently combine several
generators, and even add thread safety, while keeping the actual object
used a class.
In my opinion reference semantics are the correct semantics for a random
number generator, because normally you want random numbers, and repeating
the sequence (because you unwittingly copied the rng) is an error.
To repeat the sequence one should have functions to store the state and
reset with a stored state.
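
A minimal sketch of that layout (with a toy generator, not the actual
tango code):

// the core is a cheap struct that can be embedded and combined
struct RngCore
{
    uint state;
    uint next()
    {
        state = state * 1664525u + 1013904223u; // toy LCG step
        return state;
    }
}

// the object users pass around is a class: reference semantics, with
// explicit store/reset of the state instead of implicit copies
final class Rng
{
    private RngCore core;
    this(uint seed) { core.state = seed; }
    uint next() { return core.next(); }
    RngCore storeState() const { return core; }
    void resetState(RngCore s) { core = s; }
}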

I have implemented the gaussian distribution and a couple of others, and
also other generators, because I find the twister one very inflexible:
yes, one would like a large internal state to be able to generate more
random sequences and avoid collisions, but sometimes one doesn't want a
very large state, e.g. if one needs many different generators, or needs
to store and restart generators often.
From this discussion you might imagine that the default generator is not
twister based.

I did use a final class for performance reasons; one could create
a non-templatized superclass to let non-performance-sensitive users
avoid templatization.

I wrote everything for D1, and Tango-D2 has the D2 port, but most likely
some decisions should be revised for a real D2 rewrite.
I am willing to give my code under the Phobos license, so feel free to
borrow.

I wrote the code split up in several modules, with a main "exporter"
module for the normal user. I like this approach (and often also splitting
tests outside the main module so they aren't parsed in the normal case),
but that is obviously against the phobos philosophy.

ciao
Fawzi
On 01/08/13 18:52, Joseph Rushton Wakeling wrote:

Hello all,

Following discussion on the pull request adding normal random number 
generation to std.random [ 
https://github.com/D-Programming-Language/phobos/pull/1029 ], some 
issues were raised which are best discussed with the whole community.


The heart of this is the design of pseudo-random number generators 
(PRNGs) in Phobos.  Currently these are implemented as value types, 
which has a number of unpleasant effects:


   -- They're expensive to pass around (some PRNGs have a size of a MB
      or more)

   -- Passing by value is statistically unsafe, as it can result in
      identical random sequences being generated in different parts of
      the code.  This already affects at least one part of std.random
      itself: see,
      http://d.puremagic.com/issues/show_bug.cgi?id=8247
      http://forum.dlang.org/thread/oiczxzkzxketxitnc...@forum.dlang.org

   -- Simply passing by reference isn't an adequate solution, as there
      will be cases (as with RandomSample, detailed in bug 8247) where
      you have to store the RNG.  Storing e.g. a pointer or reference
      would be unsafe; the only adequate solution is that PRNGs be
      (safe) reference types.

monarch_dodra did some work on this which was set aside in the short 
term because it would (unavoidably) be a breaking change [ 
https://github.com/D-Programming-Language/phobos/pull/893 ].  To avoid 
this, the proposed solution is to create a std.random2.


However, these issues seem to me to have broader implications for the 
design of random-number functionality in Phobos, so if we're going to 
do a std.random2, it's worth mapping them out so as to get them right.


The most obvious (to me) is that these issues which apply to PRNGs 
apply equally well to random number distributions.  For example the 
Ziggurat algorithm requires storing several hundred constants, so 
passing by value is expensive; and several different algorithms 
generate and store multiple random variates at a time, so 
copying/passing by value will result in unintended correlations in 
sequences of variates.


This has further implications if for example we want to create a 
VariateGenerator (or perhaps in D-ish terms, a VariateRange) which 
couples a random distribution with a PRNG -- this is unlikely to work 
unless both the random distribution and the PRNG are reference types.


Finally, there are more general issues about how new functionality 
should be implemented.  C++11 is given as a model in the std.random 
documentation, but this is clearly a guide rather than something to 
copy blindly -- existing functions and structs already deviate from it 
in ways that reflect D's preference for ranges and its superior 
generics.  We need a clear picture of how to do this for functionality 
that has not yet been implemented.


For example: in the case of random distributions, the current example 
of uniform() offers only a function interface and therefore little 
guidance about how to create struct implementations for more complex 
algorithms which require persistent storage (e.g. Ziggurat or 
Box-Muller).  Should they follow C++11/Boost.Random in returning 
variates via opCall, or should they be coupled with a PRNG at 
construction time (as wi

Re: GSoC Mentor Summit Observations and D Marketing

2011-10-30 Thread Fawzi Mohamed
At the gsoc I was using something like this:

a better (simplified) C++:
- close to C but not 100% backward compatible, either it compiles or it gives 
an error
- easy to link C, partially possible to use C++ libs (no template 
instantiation)
- single inheritance + interfaces (and contracts)
- garbage collection (possible to avoid with some effort)
- better templates (template language close to normal D, constraints), CTFE and
mixins
- auto/foreach
- delegates

- immutable/const/pure if the discussions goes in that direction

and then going more in depth into the things that struck the listener.

I was also surprised by how many had already heard something about D; I even got
some questions about D1/D2 and tango/phobos.
My answers were something along these lines:
- D2 toolchain becoming now robust enough to be chosen for new projects (with 
gdc finally also 64 bits)
- D1 simpler, but misses some nice features of D2, not all D2 features are 
perfect, but the proposal is compelling
- it was difficult for the community to improve the lib & druntime -> tango;
personally I think it is the better library for D1 (I was contributing to it),
unfortunately not compatible with phobos. D2 has druntime, and no tango yet,
but if/when it comes it will likely use druntime.

Fawzi

Gsoc Mentor summit… and fortran ;)

2011-10-27 Thread Fawzi Mohamed
I came back from the Google Summer of Code mentor summit.
It was nice to see many people from other open source communities, and meet 
David face to face ;).

We did try to leave some notes to remember what we said, and for those who could
not attend; note taking wasn't the strong point of this conference, but it did
improve (at least my own note-taking activity did)…
Anyway if you are interested to have a glimpse on what was discussed you can go 
to http://gsoc-wiki.osuosl.org/index.php/2011

David and I obviously did try to show how nice D is, but we also saw the cool
stuff others are doing, and discussed both the practical and the more
philosophical aspects of open source.

For example, among the unexpectedly interesting stuff, in a discussion I had
with Tobias Burnus, who works on the fortran frontend, I realized that
intent(in) in fortran is very close to immutable (actually even stronger, as it
guarantees that the pointer will not escape, so the compiler is even allowed to
copy stuff on entry; this also holds for intent(inout), which has no real
corresponding thing in D).

intent(inout) x even guarantees that x=5; f(); assert(x==5); holds (f can
obviously also receive x as intent(in)).
Fortran does this to give the optimizer as much freedom as possible.
D doesn't have all that, but with immutable and pure it can use some of the
same optimizations.
Indeed it is possible that gdc could use some of the fortran annotations,
something that I promptly mailed Iain about.

Here the different philosophy is visible: D gives safe primitives and behavior,
and tries to optimize; fortran chooses the fast options, defines them as the way
things work, and makes it the programmer's job to use things right, something
that is simplified by the fact that fortran is typically threaded only through
OpenMP.

ciao
Fawzi

Re: Vote on std.regex (FReD)

2011-10-25 Thread Fawzi Mohamed
Yes

Fawzi


Re: Formal Review of std.regex (FReD)

2011-10-22 Thread Fawzi Mohamed

On Oct 22, 2011, at 12:05 PM, Dmitry Olshansky wrote:

> On 22.10.2011 20:56, Rainer Schuetze wrote:
>> […]
>> I think, both versions use implementation specifics, maybe there should
>> be a documented way to test for being initialized.
>> 
> 
> Definitely. How about adding an empty property + opCast to bool, with that 
> you'd get:
> if(!re)
> {
> //create re
> }

I think this is better, should one ever want to switch to a plain pointer…;
also, you need less thinking if it works like it does for classes.

> and a bit more verbose:
> if(re.empty)
> {
> //create re
> }



Re: The CAPI Manifesto

2011-10-22 Thread Fawzi Mohamed

On Oct 22, 2011, at 4:03 AM, Timon Gehr wrote:

> On 10/22/2011 04:33 AM, Walter Bright wrote:
>> On 10/21/2011 4:32 PM, Fawzi Mohamed wrote:
>>> 
>>> On Oct 21, 2011, at 4:20 PM, Fawzi Mohamed wrote:
>>> 
>>>> The main problem with this approach is how to support different
>>>> versions of
>>>> a library, or of OS. It quickly becomes difficult to support anything
>>>> but
>>>> the latest, or a fixed version. It works beautifully for mature libs.
>> 
>> Since github has excellent support for branches, I don't see why this is
>> a major problem.

Do you have a repo per library? If yes, then indeed it is feasible. I hadn't
thought about that.

>>>> I still cannot avoid thinking that a C frontend automatically
>>>> generating D
>>>> modules with the help of recipes would be a better way. It will need
>>>> some
>>>> manual intervention for "difficult" cases, mainly giving manual
>>>> translation
>>>> of some macros, but it should be small.
>>> 
>>> … and it seems that in the time I was offline others came up with the
>>> same
>>> idea...
>> 
>> It's an old idea. The trouble is, as always, the C preprocessor. I'm
>> currently converting the openssl .h files, and they are a zoo of
>> metaprogramming using C preprocessor macros.
>> 
>> People are going to demand perfect translation if it is automatic.

that was the reason I talked about recipes that can add manual fixes where 
needed (for selected macros).

>> 
>> The only way to do it is to work with the preprocessed output of the .h
>> file, and just forget about the preprocessor.
> 
> Another way is to replace the preprocessor with CTFE and string mixins. I 
> think that could be automated quite easily. (modulo the possibility of some 
> extremely heavy abuse on the C side that could make the other parts of the 
> translation a lot harder of course)

I think string mixins are an extremely ugly solution to this problem, and I
would try to avoid them, especially if they are used to represent a function
that should be inlined, and might be replaced by a normal function in a later
version.




Re: The CAPI Manifesto

2011-10-21 Thread Fawzi Mohamed

On Oct 21, 2011, at 4:20 PM, Fawzi Mohamed wrote:

> The main problem with this approach is how to support different versions of a 
> library, or of OS. It quickly becomes difficult to support anything but the 
> latest, or a fixed version.
> It works beautifully for mature libs.
> 
> I still cannot avoid thinking that a C frontend automatically generating D 
> modules with the help of recipes would be a better way.
> It will need some manual intervention for "difficult" cases, mainly giving 
> manual translation of some macros, but it should be small.

… and it seems that in the time I was offline others came up with the same 
idea...



Re: The CAPI Manifesto

2011-10-21 Thread Fawzi Mohamed
The main problem with this approach is how to support different versions of a 
library, or of OS. It quickly becomes difficult to support anything but the 
latest, or a fixed version.
It works beautifully for mature libs.

I still cannot avoid thinking that a C frontend automatically generating D 
modules with the help of recipes would be a better way.
It will need some manual intervention for "difficult" cases, mainly giving 
manual translation of some macros, but it should be small.

One would specify whether all the files correspond to modules, or whether there
are just some "main" directories/files.

Some things are easy:

#define a
enum { a = true }

#define b "xyz"
enum { b = "xyz" }

one could be tempted to replace
#ifdef x
with
static if (is(typeof(x)) && x)
and treat other #if in a similar way, but in D a static if must contain a full
statement, as its content must be syntactically valid, whereas the C
preprocessor does not have this limitation.
The way to work around this, if we create the headers on demand, is simple: we
evaluate all #if up front using the built-in definitions of the associated C
compiler (gcc -E -dD for example) and its default include paths (or directly
use the preprocessor, taking the #line file directives into account).

real macros are more tricky, for example one could do

#define isSmall(x) (x<2)
auto isSmall(T)(T x) {
    return x < 2;
}

#define c(x) { x , #x }
auto c(alias x)() {
    import std.typecons : tuple;
    return tuple(x, x.stringof);
}

thus c(t) has to become c!(t).

and maybe one has to provide some macro definitions by hand, but I guess such
cases are not that many.

In all this there is still a major pitfall: redefinitions of the same macro. It 
is not common, but it happens, and when it does everything breaks.
One could give different names for the clashing symbols, but it remains ugly.
Furthermore in D one cannot define the same interface to a C function twice and 
import it in the same scope through two modules, because it will clash, even if 
private.

This makes the whole thing more complicated, but I think that a few recipes
coding the exceptions (macro translations, macros/defs to suppress or rename)
should work pretty well.

One could analyze whether different "views" of the same include file are
compatible, and automatically check for double definitions.
It isn't an easy project, but it would be very useful if done correctly. I
remember talking about it with Lindquist quite some time ago…

Fawzi

Re: generative programming and debugging

2011-10-19 Thread Fawzi Mohamed
On Oct 19, 2011, at 11:07 AM, Gor Gyolchanyan wrote:

> Does anybody know a good way to (statically) debug generated code?
> Currently it takes lots of effort to determine the place where the
> code is being incorrectly generated.
my method is to try to have a single mixin rather than several (i.e. making
nested mixins simply concatenate the corresponding code).
So I can pragma(msg, myMixinFunction()); and possibly even replace the mixin
with the generated code when debugging it.

this is supposing you were referring to mixin problems; templates normally
behave better, but I tend to pass explicit template parameters to improve
their behavior in case of errors
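
For instance (a minimal sketch with a hypothetical generator function):

// hypothetical generator: builds the whole declaration as one string
string makeAccessor(string field)
{
    return "private int _" ~ field ~ ";\n"
         ~ "int " ~ field ~ "() { return _" ~ field ~ "; }\n";
}

struct S
{
    // print the generated code at compile time to inspect it...
    pragma(msg, makeAccessor("count"));
    // ...then mix it in; while debugging, the printed output can be
    // pasted here verbatim in place of the mixin
    mixin(makeAccessor("count"));
}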

Fawzi


Re: Selective and renamed imports

2011-10-19 Thread Fawzi Mohamed
I think that the treatment of normal and private imports in the proposed patch
is the correct one.
Imports should be hidden; this is also basically what LDC 1 does, as it is much
stricter about indirect imports.

Public imports on the other hand should be roughly equivalent to declaring the
things in the module that imports them.
To be more precise, they should be *exactly* equivalent to a private import +
aliases for all the imported things.

Being selective or renaming in my opinion should not be connected with outside 
visibility, that should be controlled with public/private.

There is indeed a difference between selective or renamed imports and plain
full-module imports: the former are imported with the goal of being used, thus
one should check for collisions.
The solution is to make

import x: y, a=z;
public import m1;
public import m2: t, t2=r;

which is equivalent to

private import x: y, a=z;
public import m1;
public import m2: t, t2=r;

equivalent to

import x;
private alias x.y y;
private alias x.z a;
import m1;
public alias m1.* *; // a shortcut meaning public aliases to all things declared in m1
import m2;
public alias m2.t t;
public alias m2.r t2;

and privately exported symbols should never be visible when imported again.

I think that this is a semantic that makes sense, and looks natural, and 
piggybacks on already defined concepts.

Fawzi

Gsoc Mentor summit

2011-10-17 Thread Fawzi Mohamed
Are there ideas on what it would be worth bringing/presenting at the Google
Summer of Code Mentor Summit (aside obviously from my experience with it)?
Any good D-connected (or maybe not) topic, or open-source-relevant topic...
Well, I will have to present it, so if I think that I cannot do it justice, I
might drop an otherwise good topic.
Anyway if someone has good ideas (and I suppose some will come up during the
conference itself), please tell me.
I will also try to give feedback here about it.

ciao
Fawzi

Re: Exchange of possible interest :o)

2011-10-06 Thread Fawzi Mohamed
really great news, I am looking forward to this.

Fawzi


Re: std.parallelism: VOTE IN THIS THREAD

2011-04-22 Thread Fawzi Mohamed

YES

it is a step in the right direction; I have some comments, but I will
put them in another thread


Re: GC for pure functions -- implementation ideas

2011-04-18 Thread Fawzi Mohamed

On 17-apr-11, at 21:44, Don wrote:


[...]
Basically, my contribution is this: the compiler can easily work  
out, for each function, whenever it has entered and exited a non- 
leaky pure function. It can make a call into the GC whenever this  
happens. This gives the GC many more potential strategies.


yes, more info is always better; I didn't want to diminish your work,
but to point toward a general improvement in the GC.
My fear is that the overhead of this approach will make it worthwhile
only for big allocations (getting rid of possible false pointers), and
thus only under programmer control.


Classifying the "allocation potential" of a function might help make
better automatic decisions.
That is difficult in general (one could, for example, flag an alloc in a
loop of unknown length, or with more than 4 iterations, as large,
something that would still miss recursive allocations), but it would be
useful, because functions that allocate little could do nothing, or use
a fixed stack-like heap, whereas those that allocate a lot could
checkpoint the heap (assuming a separate pool for each thread) or use a
new pool.


Fawzi


Re: GC for pure functions -- implementation ideas

2011-04-17 Thread Fawzi Mohamed


On 16-apr-11, at 22:49, Timon Gehr wrote:


[...]
The problem is, that inside a non-leaky pure function the general  
case for dynamic
allocations might be just as complicated as in other parts of the  
program.


indeed, this is exactly what I wanted to write: yes, in some cases one
can get away with a simple stack-like scheme or similar, but it breaks
down very quickly.
In fact GCs were introduced by functional languages because they are
pretty much needed for them; that alone should hint that functional or
pure languages are not intrinsically easier to collect for.


What can be useful is allowing one to add a set of pools that can then
be freed all at once.
Having several pools is also what is needed to remove the global lock
in malloc, so that is definitely the way to go imho.
Then one can give control of these extra pools to the programmer, so
that it is easy to use a special pool for a part of the program and
then release a lot of objects at once. Even then one should put quite
some thought into it (for example about static/global objects that
might be allocated for caching purposes).
A strictly pure function returning a value without pointers gives
guarantees, but as soon as some caching (even behind the scenes) goes
on, things will fail. If a separate pool is used consistently for
cached or shared objects, one should be able to allow even caching.
All this comes back again to having several pools, showing how useful
such a primitive is.


Fawzi



Re: Floating Point + Threads?

2011-04-16 Thread Fawzi Mohamed


On 16-apr-11, at 05:22, dsimcha wrote:

I'm trying to debug an extremely strange bug whose symptoms appear
in a std.parallelism example, though I'm not at all sure the root
cause is in std.parallelism.  The bug report is at
https://github.com/dsimcha/std.parallelism/issues/1#issuecomment-1011717 .


Basically, the example in question sums up all the elements of a  
lazy range (actually, std.algorithm.map) in parallel.  It uses  
taskPool.reduce, which divides the summation into work units to be  
executed in parallel.  When executed in parallel, the results of the  
summation are non-deterministic after about the 12th decimal place,  
even though all of the following properties are true:


1.  The work is divided into work units in a deterministic fashion.

2.  Within each work unit, the summation happens in a deterministic  
order.


3.  The final summation of the results of all the work units is done  
in a deterministic order.


4.  The smallest term in the summation is about 5e-10.  This means  
the difference across runs is about two orders of magnitude smaller  
than the smallest term.  It can't be a concurrency bug where some  
terms sometimes get skipped.


5.  The results for the individual tasks, not just the final  
summation, differ in the low-order bits.  Each task is executed in a  
single thread.


6.  The rounding mode is apparently the same in all of the threads.

7.  The bug appears even on machines with only one core, as long as  
the number of task pool threads is manually set to >0.  Since it's a  
single core machine, it can't be a low level memory model issue.


What could possibly cause such small, non-deterministic differences  
in floating point results, given everything above?  I'm just looking  
for suggestions here, as I don't even know where to start hunting  
for a bug like this.


It might be due to context switches between threads, which can push a
double out of the higher-precision 80-bit FPU registers and lose the
extra precision.
SSE, or float, should not have this problem. gcc has an option to
always store results in memory and avoid the extra precision;
maybe having such an option in dmd to debug these issues would be a
nice thing.
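
A small sketch illustrating the size of such an effect (this only
demonstrates extended vs. double precision accumulation on x86, where
real is the 80-bit type; it is not the original test case):

import std.stdio : writefln;

void main()
{
    real hi = 0.0L;   // 80-bit accumulator (x87 register width)
    double lo = 0.0;  // 64-bit accumulator (what a spill leaves you with)
    foreach (i; 1 .. 1_000_001)
    {
        immutable double term = 1.0 / i;
        hi += term;
        lo += term;
    }
    // the sums differ in the low-order bits, the same order of
    // magnitude as the run-to-run differences described above
    writefln("%.20g", cast(double) hi);
    writefln("%.20g", lo);
}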


Fawzi


Re: Floating Point + Threads?

2011-04-16 Thread Fawzi Mohamed


On 16-apr-11, at 09:41, Walter Bright wrote:


On 4/15/2011 8:40 PM, Andrei Alexandrescu wrote:

On 4/15/11 10:22 PM, dsimcha wrote:
I'm trying to debug an extremely strange bug whose symptoms appear  
in a
std.parallelism example, though I'm not at all sure the root cause  
is in

std.parallelism. The bug report is at
https://github.com/dsimcha/std.parallelism/issues/1#issuecomment-1011717 .


Does the scheduling affect the summation order?


That's a good thought. FP addition results can differ dramatically  
depending on associativity.


yes, one can avoid this by using a tree algorithm with a fixed
blocksize; then the results will be the same in both the single-threaded
and the parallel case.

Normally one uses atomic summation though.
In blip I put quite a bit of thought into tree-like algorithms and
their parallelization, exactly because they parallelize well and their
results are independent of the parallelization
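
A minimal sketch of such a fixed-blocksize tree summation (not blip's
actual code):

// the reduction tree depends only on the data length, not on how the
// work is scheduled, so serial and parallel runs produce identical sums
double treeSum(const(double)[] a, size_t blockSize = 256)
{
    if (a.length <= blockSize)
    {
        double s = 0.0;
        foreach (x; a) s += x; // fixed left-to-right order
        return s;
    }
    // split at a block-aligned midpoint; the two halves could be summed
    // by different tasks without changing the result
    immutable mid = ((a.length / 2 + blockSize - 1) / blockSize) * blockSize;
    return treeSum(a[0 .. mid], blockSize) + treeSum(a[mid .. $], blockSize);
}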


Fawzi


Re: GSoC 2011 update: we have 3 slots

2011-04-14 Thread Fawzi Mohamed


On 14-apr-11, at 00:36, Andrei Alexandrescu wrote:

Digital Mars has received 3 slots for GSoC 2011. That means we need  
to choose three student projects to go with.


We have enjoyed many strong applications and we have a great lineup  
of mentors, but Google is reluctant to allocate a lot of slots to  
first-time participants because historically Google has experienced  
a high rate of burn-out with new organizations. Most new  
participants have only received 2 slots, so this is in fact a vote  
of confidence for our fledgling participation.


So we need to prove ourselves by choosing three outstanding projects  
and by taking them to completion. I warmly thank all participants  
(students and mentors) and I hope they all will stick around and  
help even though it is impossible to select them all for formal roles.


I can only agree with Andrei: choosing just 3 projects will be hard,
and if you happen not to be among the 3 chosen, it doesn't
necessarily mean that your contribution wasn't good, or that you aren't
good, but simply that given the constraints we had, we judged another
project slightly better or slightly more important for our current
priorities.
It will be a tough decision, and we would for sure be honored and
happy if those that in the end are not selected still stick
around and contribute to the D community, and in any case we wish you
all the best.
And sorry if we make you jump through some hoops, but it is just that
it is difficult to decide, and we should choose projects that will
succeed, also to guarantee that Google will continue to support us.


Fawzi



Re: [OT] open-source license issues

2011-04-12 Thread Fawzi Mohamed

For my personal libs/programs I fully agree with spir:

1) attribution is a very light burden
2) it is nice, and somehow the right thing to do
3) it gives back at least a bit of advertisement to the stuff *you can  
use freely*


For those reasons I did release blip with an Apache 2.0 license; by
using it I can also easily integrate/use all kinds of free libraries,
but it stays free, and usable also in commercial contexts.

That said, I think that having phobos use the Boost license, aside
from "being nice for the user", has other subtle effects:
yes, it makes "standing on the shoulders of giants" more difficult,
because you cannot as easily use other libraries,
but exactly for that reason it forces one to rebuild things from
scratch (or almost).
For a *base* library this is not necessarily a bad thing: it reduces
dependencies, and might even give code that is more optimized.


I think that there is space in D for other libraries, libraries that  
use licenses like apache 2.0 or BSD.
Still I understand that the base library is boost licensed, it might  
not be my first choice for my own projects (I don't want to always
develop everything from scratch), but it is a clear choice, and a  
choice that has its own merits.


Fawzi

On 12-apr-11, at 15:34, spir wrote:


On 04/12/2011 11:55 AM, Daniel Gibson wrote:

On 12.04.2011 11:34, spir wrote:

On 04/12/2011 04:06 AM, Daniel Gibson wrote:

Well I'd always use PostgreSQL instead of MySQL when having the
choice, but
1. often MySQL needs to be used because it's already there
2. PostgreSQL uses the BSD-License which also isn't suitable for  
Phobos.


BTW: I think PHP has a native SQL driver (under their BSD-style PHP
license) - maybe that could be adapted to be used with D, if it's
written in C. This still couldn't be shipped with Phobos, but at  
least
there are no stupid restrictions on using it for commercial  
software.


I don't understand this story of shipping, either. It seems to me D's
style rather pushes to reuse libs (esp. written in C) that users (both
programmer & end-user) are forced to install anyway. Licenses that allow
reuse (and shipping) provided a copyright note is properly inserted do
not change anything for me.

Instead, I find this copyright note, not only *extremely* light, but
also fair, and even nice. In my view, people who do not agree with that
are the kinds who want to freely take from a community and give nothing
back in exchange (rather corporations in fact); not even attribute their
work to authors. Bad and sad :-(

Denis


Yeah this is all fine when you use a third-party lib, but IMHO for a
standard lib of a language such a copyright note shouldn't be necessary.

It's not like you have to know that phobos uses zlib, for example.
Sure it's nice if you add to your README "I used the D programming
language and its standard lib Phobos, which includes the zlib I used for
compression and SQLite for simple database stuff to create this", but it
shouldn't be necessary.


Right, I now understand the point. But still find it an extremely  
light constraint: just put an "attributions.txt" into your package.  
Compare this constraint eg with makefile issues ;-)


Denis
--
_
vita es estrany
spir.wikidot.com





Re: GSoC Proposals: Level of Detail

2011-04-08 Thread Fawzi Mohamed


On 8-apr-11, at 17:15, Andrei Alexandrescu wrote:


On 4/8/11 8:40 AM, dsimcha wrote:
I've been looking over some of the GSoC proposals and I've noticed
that most aren't very detailed. It seems most of the students have only a
very rough idea of what they want to do and plan on filling in the
details at the beginning of the project. I don't have experience with
GSoC and I'm trying to understand whether this is a problem or is what's
expected. How detailed are the proposals supposed to be?


I emailed all student proposing a project the following. After the  
email we got a lot of updates.



Andrei


Hello,


Apologies for the semi-automated email.

You should know that the deadline is only a few hours away - on the  
8th April at 19:00 UTC. Be careful! That may mean a different time  
at your location. Refer to this link:


http://www.timeanddate.com/worldclock/fixedtime.html?month=4&day=8&year=2011&hour=19&min=0&sec=0&p1=0

You should expect an interview during the application review period.
There is no need for special preparation. The interview consists of
a few simple questions and a couple of coding exercises. You should
have an Internet connection handy; the interview uses www.collabedit.com
for writing code. Phone is fine, Skype is preferable.


Below are a few tips regarding last-minute polishing of your  
application.


* Make sure you send our way a detailed overview of the project you  
are embarking on. A good overview should clarify that you have a  
good understanding of the problem domain and that you are capable of  
carrying the task through.


* Please mention your fluency in the D programming language.

* Specify a plan for your project, with deadlines and deliverables.  
Make sure it is something that you can realistically commit to.


* Mention how much time you realistically expect to spend on the  
project. If you plan to take a vacation or otherwise be unavailable  
for some time, please specify.


* Needless to say, it is in your best interest to be honest.

* Mention in brief, if you can, alternative topics/projects you  
might be working on. We have had quite a few overlapping  
applications - there are five proposals for containers, for example.  
We wouldn't want to let you compete and then choose the best  
implementation, so we will allow only 1-2 applications on  
containers. In case you are interested in containers, how  
comfortable are you with advanced containers - Bloom filters, tries,  
generalized suffix trees, skip lists...?


* At the same time, don't spread yourself too thin. A too broad  
application loses focus and enthusiasm for any one specific topic.


* Include anything that you believe is relevant to the project(s) of  
your choice: courses completed, grades, references, experience on  
similar projects. Feel free to paste your resume. Don't forget we  
start with knowing nothing about you.


* Above all, be honest about everything. This program is at Google's  
considerable expense, not to mention the time your mentors will  
invest. Above everything, the best outcome of this for you is  
establishing an excellent reputation with everybody involved.



Good luck!

Andrei


Excellent, I was thinking that an interview would be the best thing to  
evaluate the candidates.


Fawzi


Re: Jonas Drewsen has been accepted as a GSoC 2011 mentor for Digital Mars

2011-04-08 Thread Fawzi Mohamed


On 8-apr-11, at 15:01, dsimcha wrote:


On 4/8/2011 4:37 AM, Jacob Carlborg wrote:

On 2011-04-08 02:33, Andrei Alexandrescu wrote:

Jonas Drewsen has been accepted as a mentor for the Google Summer of
Code 2011 program for Digital Mars. He is particularly interested in
topics related to networking.

Please join me in congratulating and wishing the best to Jonas.

We have 18 student applications and only 6 mentors. Probably we won't be
able to accept all student applications, but we could definitely use
more mentors. Please apply:

http://d-programming-language.org/gsoc2011.html


Andrei


Do we need a 1:1 mapping between students and mentors? I thought
several students were interested in the same topics and perhaps could
share a mentor.



Possibly better question:  Would we even accept multiple students  
for the same project?


That is a good question, and I guess, excluding cases where an effort has
been made to reduce the overlap (containers), the answer will probably
be no.

Personally I will not have time to follow more than one student (I
might help out or something, but I can commit to only one student).


Fawzi


Re: GSoC Proposals: Level of Detail

2011-04-08 Thread Fawzi Mohamed


On 8-apr-11, at 15:40, dsimcha wrote:

I've been looking over some of the GSoC proposals and I've noticed  
that most aren't very detailed.  It seems most of the students have  
only a very rough idea of what they want to do and plan on filling  
in the details at the beginning of the project.  I don't have  
experience with GSoC and I'm trying to understand whether this is a  
problem or is what's expected.  How detailed are the proposals  
supposed to be?


I don't have experience with GSoC either, but I fear that the very
simple "I would like to work on X" proposals will have to be discarded.
As Andrei said, the students have to convince us that they can do the
project, and this means:

- good project
- knowledge of the field
- knowledge of the tools (D)
- skills
- motivation

Obviously projects don't have to be perfect, this is *before* doing  
it, but just saying "I would like to work on X" is not going to cut it.


Fawzi


GSoC and licenses

2011-04-07 Thread Fawzi Mohamed
During a google summer of code project you are supposed to produce  
your own code.


I also wasn't sure about the exact licensing needs, but as this is  
relevant for a project I tried to clarify the situation.


As far as I understood if your code has to become part of phobos then  
it should be released with the Boost license.


You cannot incorporate BSD code, or translate BSD code; to do that you
have to ask the original author for permission to use the Boost
license.


You can wrap a shared library and use it if that library is a separate  
package.


I hope this helps.

Fawzi


Re: GSoC 2011

2011-04-07 Thread Fawzi Mohamed


On 7-apr-11, at 09:05, Mihir Patil wrote:


Hi,
Thanks Andrei for the heads up. I read the discussion in the  
archives as you said.
I still did not find anyone talking about the garbage collection  
part. I have a decent enough knowledge about the theoretical aspects  
of it. And I am confident that I can implement it.

I have read the garbage collection page of D on the website.
If you can tell me if there is anything in particular required to be  
done in the current code, I can start with that. I'll also work on  
the proposal today.


The garbage collector is a challenging project (bugs are extremely ugly).
There are several things that can be improved, and also several ideas
floating around, in particular wrt. separate garbage collectors.
I have a good knowledge of that part of the code, and I would be
willing to mentor such a project; I think dsimcha had also expressed
interest, but I don't know what he had in mind/what he wanted to do.


I see the following tasks as very useful improvements that would
improve the current GC a lot, while still being feasible (and kind of
incremental):


1) remove the global lock from malloc
This entails having "local" (either to cpu/numa node or thread) pools,  
for at least some of the memory allocation


2) improve collection sequence, and guarantees for destructors (2  
phases collection)


3) improve the parallelization of the gc collection (this has some  
limits, but one can for sure use more than one thread)


3b) hierarchical (generational) collector, this is quite challenging,  
as we cannot have a moving collector, probably too much for such a  
project, unless you have already written GCs


3c) implement a concurrent GC always working in its own thread, one  
has to define partial "safe" points, and possibly waiting points, like  
3b I fear it might be too challenging for a GSoC.


4) use Luca's trick of forking to reduce pauses on linux; this gives
an almost concurrent GC taking advantage of the properties of fork. It
also needs safe points and waiting points, but its realization is
simpler.


5) separate GC for thread-local (non-shared) memory; this was
discussed much, but one has to be careful to guarantee that none of
that memory can be transferred between threads by anything; also, when
allocating, one has to know which kind of memory should be allocated.
Step number 1 will help in building something like this, but it needs
more compiler support and its usefulness is more limited, so I would
leave this for last.


you can contact me by email or on IRC to discuss the details

bye

Fawzi

Thanks,

On Thu, Apr 7, 2011 at 3:49 AM, Andrei Alexandrescu wrote:

On 4/6/11 3:21 PM, Mihir Patil wrote:
Hi,
I am Mihir. I am a Bachelors student of Computer Science. I am
interested in applying for the lexing and parsing project in D. I am
also interested in the garbage collection project. I am doing these
tasks as my term project this semester and would love if I get this
opportunity.
There are no mentors listed on the ideas page. So whom should I
contact or can anyone in the mailing list tell me some details about
the task?
Thank you.

This list is a good point of contact because most mentors and  
potential mentors are frequenting it.


We already have a strong candidature for a lexing and parsing  
project. Please look at discussions initiated by Luca Boasso in this  
forum.


That doesn't mean you can't define a complementary project.  
Alternatively, feel free to select another potential project or come  
up with your own.



Best regards,

Andrei



--
Mihir Patil
3rd year, BE (Computer Science),
Joint Co-ordinator, Computer Science Association
Birla Institute of Technology and Science,Pilani
+91-9772974127





Re: [GSOC] Database API draft proposal

2011-04-04 Thread Fawzi Mohamed


On 4-apr-11, at 02:01, Piotr Szturmaj wrote:


Fawzi Mohamed wrote:

[...]
I think that your responses are very relevant, as it seems to me that
your work is nice, and I find that if a GSoC is done in that direction
it should definitely work together with the good work that is already
done; let's not create multiple competing projects if people are
willing to work together.


I'm ready to cooperate :)


great :)


* support for static and dynamic types.
how access of dynamic and static types differs should be as little as
possible, and definitely the access one uses for dynamic types should
work without changes on static types


If you mean statically or dynamically typed data rows, then I can say my
DBRow supports both.


yes but as I said I find the support for dynamic data rows weak.


I've just added row["column"] bracket syntax for dynamic rows.


excellent; ideally that should work also for untyped rows, because one
wants to be able to switch to a typed Row without needing to change
one's code (and it should work exactly the same, so the typed rows will
need to wrap things in Variants when using that interface).



* class or struct for row object


I'm using struct, because I think a row received from the database is a
value type rather than a reference. If one selects rows from one table
then yes, it is possible to do some referencing based on the primary
key, but anyway I think updates should be done explicitly, because the
row could be deleted in the meantime. In more complex queries, not all
of the selected rows are materialized, i.e. they may be from computed
columns, view columns, aggregate functions and so on. Allocation
overhead is also lower for structs.


* support for table specific classes?


Table specific classes may be written by user and somehow wrap
underlying row type.


well with the current approach it is ugly, because your classes would be
another type; thus either you remove all typing or you can't have
generic functions accepting rows; everything has to be a template, and
to loop on a table or a row you always need a template.



Could you elaborate? I don't know what you mean.


Well I am not totally sure either; having the row handle the dynamic
case better is already a nice step forward. I still fear that we will
have problems at the ORM level; I am not 100% sure, and that is the
reason I would like to try to flesh out the ORM level a bit more.
I would like one to be able to loop on all the tables and for each one
get either the generic or the specialized object, depending on what is
needed.
If one wants to have business logic in the specialized objects, it
should be difficult to bypass them.
Maybe I am asking too much and the ORM level should never expose the
rows directly, because if we use structs we cannot have a common type
representing a generic row of a DB which might be specialized or not
(without major hacking).


* reference to description of the table (to be able to get also dynamic
types by column name, but avoid using too much memory for the structure)


My PostgreSQL client already supports that. Class PGCommand has a member
"fields", which contains information about the returned columns. You can
even check what columns will be returned from a query, before actually
executing it.


ok that is nice, and my point is that the type that the user sees by
default should automatically take advantage of that

* Nice way to define table structure, and what happens if the db has
another structure.


This is a problem for ORM, but at first, we need a standard query API.

This is a problem for ORM, but at first, we need standard query API.


I am not so sure about this; yes, these (also classes for tables) are
part of the ORM, but the normal users will more often be at the ORM
level I think, and how exactly we want things to look at the object
level can influence the choice of the best low-level interface.


For a "defined" or static DBRow, if it is used on a result which has a
different number of columns, or whose types aren't convertible to the
row fields, then it's an error. But if someone uses static fields, he
should also take care that the query result is consistent with those
fields.


For example, do we want lazy loading of an object from the db? If yes,
how do we represent it with the current Row objects?


* you want to support only access or also db creation and modification?


First, I'm preparing a base "traditional" API. Then I want to write
simple object-relational mapping. I've already written some code that
generates CREATE TABLE for structs at compile time. Static typing of
row fields is very helpful here.


Very good. I think that working on getting the API right there and
having it nice to use is important.
Maybe you are right and the current DBRow is indeed the best
abstraction, but I am not yet 100% sure; to me it looks like it isn't
the best end-user abstraction (but it might be an excellent low-level
object)



I

Re: [GSOC] Database API draft proposal

2011-04-03 Thread Fawzi Mohamed


On 3-apr-11, at 22:54, Daniel Gibson wrote:


On 03.04.2011 20:15, Fawzi Mohamed wrote:

On 3-apr-11, at 19:28, Piotr Szturmaj wrote:

* Nice way to define table structure, and what happens if the db has
another structure.

This is a problem for ORM, but at first, we need a standard query API.


I am not so sure about this; yes, these (also classes for tables) are
part of the ORM, but the normal users will more often be at the ORM
level I think, and how exactly we want things to look at the object
level can influence the choice of the best low-level interface.

* you want to support only access or also db creation and modification?


First, I'm preparing a base "traditional" API. Then I want to write simple
object-relational mapping. I've already written some code that generates
CREATE TABLE for structs at compile time. Static typing of row fields is
very helpful here.


Very good. I think that working on getting the API right there and
having it nice to use is important.
Maybe you are right and the current DBRow is indeed the best abstraction,
but I am not yet 100% sure; to me it looks like it isn't the best
end-user abstraction (but it might be an excellent low-level object)



I'd hate not having a rows-and-tables view onto the database.
An Object-Relational-Mapper is nice to have of course, but I agree with
Piotr that a traditional view onto the DB is a good start to build an ORM
on, and I think that the traditional view should also be available to the
user (it'll be there internally anyway, at least for traditional
relational databases).


I fully agree, I probably did not express myself clearly enough: a basic
table view is a must, but the ORM that one wants to realize might
influence how exactly the basic view looks.
For example it would be nice if a basic row would also somehow be the
basic object of the ORM, with a dynamic description, and automatically
specialized if the db description is available at compile time.
As I had said before, "the object level can influence the choice of the
best low level interface"; this does not imply that a lower level
interface is not needed :).


Also: How are you gonna write queries with only the ORM view? Parse your
own SQL-like syntax that uses the Object type? Or have the SQL operators
as methods? And then generate the appropriate SQL string?
What about differences in SQL syntax between different databases?
What about tweaks that may be possible when you write the SQL yourself
and not have it generated from your ORM?

No, being able to write the SQL queries yourself and having a "low level"
view (tables and rows, like it's saved in the DB) is quite important.


again I fully agree, but if we want to be able to store business logic
in objects that come from the database, being able to express them
easily (for example like Ruby does) can be very useful.
At the ORM level one should express at most simple queries; for more
complex stuff SQL is a must (there is no point in defining another DSL
when SQL is already one), but having special methods for common queries
can be useful to more easily support non-SQL dbs.


However: Since Piotr already seems to have much work done, maybe
Christian Manning could polish Piotr's work (if necessary) and create an
ORM on top of it?


if accepted, I definitely think that Piotr and Christian will have to
coordinate their work


Oh, and just an idea: Maybe something like LINQ is feasible for ORM? So
you can write a query that includes local containers/ranges, remote
databases (=> part of it will internally be translated to SQL) and maybe
even XML (but that could be added later once the std.xml replacement is
ready)?


well, simple queries; I'm not sure if a full LINQ implementation is too
much to ask, but simple queries should be feasible.


Fawzi



Re: [GSOC] Database API draft proposal

2011-04-03 Thread Fawzi Mohamed


On 3-apr-11, at 19:54, Piotr Szturmaj wrote:


Fawzi Mohamed wrote:


On 3-apr-11, at 18:37, Piotr Szturmaj wrote:


Fawzi Mohamed wrote:
I think that your project looks nice, but see some of the comments in my
other message.
I would for example consider separating the table definition from the row
object, and while your row object is really nice, often one has either a
single DB model, described in a few model files, or goes with a fully
dynamic model.
In large projects one does not/should not define RowTypes on the fly
everywhere in the code.


There's no need to declare all row types. DBRow supports both static
and dynamic models. For dynamic rows, DBRow uses Variant[] as its
underlying type. This is the previous sample code, but changed to use a
dynamic row:

auto cmd = new PGCommand(conn, "SELECT typname, typlen FROM pg_type");
auto result = cmd.executeQuery;

foreach (row; result)
{
    // here, row subtypes a Variant[]
    writeln(row[0], ", ", row[1]);
}

Btw. I've just updated the documentation, so you can take another look :)


Yes I saw that; that is exactly the reason I was talking about splitting
the table definition into another object, so that also in the dynamic
case one can use the column names (which normally are known, or can be
retrieved from the db schema).
That would only add a pointer to each row (to its description), and
would make it much nicer to use.
Your DBRow is very nice to use, and I like how it can accommodate both
types, but it degrades too much for dynamic types imho.

Ah, I see what you mean :) This is a yet-to-be-done feature :)

I assume you mean something like row["typname"]. Soon, I will add
support for this.


yes exactly, great


Re: [GSOC] Database API draft proposal

2011-04-03 Thread Fawzi Mohamed

On 3-apr-11, at 19:28, Piotr Szturmaj wrote:


Fawzi Mohamed wrote:

Looking more maybe I was a bit too harsh, if you define clearly the
goals of your API then yes it might be a good project.
The api doesn't have to be defined yet, but a more detailed  
definition

of its goals should be there, maybe with code example of some usages.
Questions that should be answered:


I know your response isn't to me, but please let me answer these
questions from my point of view, based on my recent work on ddb.


I think that your responses are very relevant, as it seems to me that
your work is nice, and I find that if a GSoC is done in that direction
it should definitely work together with the good work that is already
done; let's not create multiple competing projects if people are
willing to work together.



* support for static and dynamic types.
how access of dynamic and static types differs, should be as little  
as

possible, and definitely the access one uses for dynamic types should
work without changes on static types


If you mean statically or dynamically typed data rows, then I can say
my DBRow supports both.


yes but as I said I find the support for dynamic data rows weak.


* class or struct for row object


I'm using struct, because I think a row received from the database is a
value type rather than a reference. If one selects rows from one table
then yes, it is possible to do some referencing based on the primary
key, but anyway I think updates should be done explicitly, because the
row could be deleted in the meantime. In more complex queries, not all
of the selected rows are materialized, i.e. they may be from computed
columns, view columns, aggregate functions and so on. Allocation
overhead is also lower for structs.



* support for table specific classes?


Table specific classes may be written by user and somehow wrap  
underlying row type.


well with the current approach it is ugly, because your classes would be
another type; thus either you remove all typing or you can't have
generic functions accepting rows; everything has to be a template, and
to loop on a table or a row you always need a template.


* reference to description of the table (to be able to get also dynamic
types by column name, but avoid using too much memory for the structure)


My PostgreSQL client already supports that. Class PGCommand has a member
"fields", which contains information about the returned columns. You can
even check what columns will be returned from a query, before actually
executing it.


ok that is nice, and my point is that the type that the user sees by  
default should automatically take advantage of that


* Nice way to define table structure, and what happens if the db has
another structure.


This is a problem for ORM, but at first, we need a standard query API.


I am not so sure about this; yes, these (also classes for tables) are
part of the ORM, but the normal users will more often be at the ORM
level I think, and how exactly we want things to look at the object
level can influence the choice of the best low-level interface.


* you want to support only access or also db creation and modification?


First, I'm preparing a base "traditional" API. Then I want to write
simple object-relational mapping. I've already written some code that
generates CREATE TABLE for structs at compile time. Static typing of row
fields is very helpful here.


Very good. I think that working on getting the API right there and
having it nice to use is important.
Maybe you are right and the current DBRow is indeed the best
abstraction, but I am not yet 100% sure; to me it looks like it isn't
the best end-user abstraction (but it might be an excellent low-level
object)




Re: [GSOC] Database API draft proposal

2011-04-03 Thread Fawzi Mohamed


On 3-apr-11, at 18:37, Piotr Szturmaj wrote:


Fawzi Mohamed wrote:
I think that your project looks nice, but see some of the comments in my
other message.
I would for example consider separating the table definition from the row
object, and while your row object is really nice, often one has either a
single DB model, described in a few model files, or goes with a fully
dynamic model.
In large projects one does not/should not define RowTypes on the fly
everywhere in the code.


There's no need to declare all row types. DBRow supports both static
and dynamic models. For dynamic rows, DBRow uses Variant[] as its
underlying type. This is the previous sample code, but changed to use a
dynamic row:


auto cmd = new PGCommand(conn, "SELECT typname, typlen FROM pg_type");
auto result = cmd.executeQuery;

foreach (row; result)
{
   // here, row subtypes a Variant[]
   writeln(row[0], ", ", row[1]);
}

Btw. I've just updated documentation, so you can take another look :)


Yes I saw that; that is exactly why I was talking about splitting the
table definition into another object, so that also in the dynamic case
one can use the column names (which normally are known, or can be
retrieved from the db schema).
That would only add a pointer to each row (to its description), and
would make it much nicer to use.
Your DBRow is very nice to use, and I like how it can accommodate both
types, but it degrades too much for dynamic types imho.
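
To make this concrete, here is a minimal sketch (hypothetical names,
not ddb's actual API) of a dynamic row that shares one table
description, so access by column name works while the per-row cost
stays at a single reference:

import std.variant;

// built once per table/query; all rows just point to it
class TableDescription
{
    string[] columnNames;

    size_t indexOf(string name) const
    {
        foreach (i, n; columnNames)
            if (n == name)
                return i;
        assert(0, "unknown column: " ~ name);
    }
}

struct DynamicRow
{
    TableDescription desc; // one reference per row
    Variant[] values;

    Variant opIndex(size_t i) { return values[i]; }
    Variant opIndex(string name) { return values[desc.indexOf(name)]; }
}

With that, row["typname"] also works on a dynamic result, instead of
only row[0].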


Fawzi



Re: [GSOC] Database API draft proposal

2011-04-03 Thread Fawzi Mohamed


On 3-apr-11, at 16:52, Piotr Szturmaj wrote:


[...]
Thanks. At this time, you can write an interface for MySQL, SQLite  
or other relational databases, using the same DBRow struct. Naming  
of course may be changed to DataRow, Row or other, depending on the  
choice of community.


In regards of base interfaces like IConnection or (semi-)abstract  
class DBConnection, I think we should have common API for all  
clients, but only to some extent. There are many features available  
in some database servers, while not available in others, for example  
OIDs (object identifiers) are a fundamental thing in PostgreSQL, but  
they simply don't exist in MySQL. So, PGCommand would give you  
information on lastInsertedOID, while MySQLCommand would not.
This is also proven in ADO.NET, where each client is rarely used  
with common base interface, because it blocks many of its useful  
features.


I think the base interface should be defined only after some of the
most popular RDBMS clients are finished. Also, the interface should be
chosen to cover the most featured/advanced database client. This is
why I started with PostgreSQL, as it's the most powerful open-source
RDBMS. If the base interface covers it, it will also cover the less
powerful RDBMSes.


I think that your project looks nice, but see some of the comments in
my other message.
I would for example consider separating the table definition from the
row object, and while your row object is really nice, often one has
either a single DB model, described in a few model files, or goes with
a fully dynamic model.
In large projects one does not, and should not, define RowTypes on the
fly everywhere in the code.
So I would try to improve the way one describes a table, or a full  
database.
Your DBRow type is definitely nice, and is a good starting point, but  
there is definitely more work to do (not that you had said otherwise :).


Fawzi


Re: [GSOC] Database API draft proposal

2011-04-03 Thread Fawzi Mohamed


On 3-apr-11, at 16:44, Fawzi Mohamed wrote:


On 3-apr-11, at 15:59, Christian Manning wrote:


[...]
I was going to reply with a link to your work but you beat me to it.
I think it's a great design and incorporating it or something  
similar into the API may be the way to go.


Indeed ddb looks really nice (I hadn't looked at it yet); given it,
though, I have to agree that just adding MySQL support is too little
and not really innovative for 3 months of work...


Looking more, maybe I was a bit too harsh: if you clearly define the
goals of your API then yes, it might be a good project.
The API doesn't have to be defined yet, but a more detailed definition
of its goals should be there, maybe with code examples of some usages.
Questions that should be answered:


* support for static and dynamic types:
how access to dynamic and static types differs should be kept as small
as possible, and the access one uses for dynamic types should
definitely work without changes on static types (see the sketch below)

* class or struct for the row object
* support for table specific classes?
* a reference to the description of the table (to be able to get
dynamic types by column name too, but avoid using too much memory for
the structure)
* a nice way to define the table structure, and what happens if the db
has another structure

* do you want to support only access, or also db creation and modification?

I feel that these things should be addressed in a complete proposal,  
with possible answers that might be changed later on depending on how  
things actually go.
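
On the first question, D2's opDispatch already allows giving dynamic
rows the same field-access syntax as static ones; a minimal sketch
(hypothetical type, not part of ddb):

import std.variant;

// dynamic row whose fields are looked up by name at runtime;
// opDispatch turns row.typname into row.opDispatch!"typname"(),
// matching the syntax of a statically typed row
struct DynRow
{
    Variant[string] fields;

    Variant opDispatch(string name)()
    {
        return fields[name];
    }
}

void main()
{
    DynRow row;
    row.fields["typname"] = Variant("bool");
    assert(row.typname.get!string == "bool");
}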


Fawzi

Re: [GSOC] Database API draft proposal

2011-04-03 Thread Fawzi Mohamed

On 3-apr-11, at 15:59, Christian Manning wrote:


On 03/04/2011 14:42, Piotr Szturmaj wrote:

Fawzi Mohamed wrote:
Well the comments in there are what is important, and will need to  
be

specified better IMHO.

The most important part in my opinion is how one chooses to  
represent a

record.
A big design choice is if the various fields are defined at  
compile time

or at runtime.
Also how does one add special behavior to a record? Do you use a
subclass of the generic record type (as Ruby does, for example)?


I'm working on DB API for few months in my spare time. I'm delayed  
that

much by my other projects. Please take a look at my ideas:

http://github.com/pszturmaj/ddb

Documentation:
http://pszturmaj.github.com/ddb/db.html
http://pszturmaj.github.com/ddb/postgres.html

In my code, row is represented using struct DBRow!(Specs...).  
Fields may

be known at compile time or not. DBRow besides base types, may be
instantiated using structs, tuples or arrays. Untyped row (no compile
time information) is DBRow!(Variant[]).

Typed rows are very useful, for example you don't need to manually  
cast

columns to your types, it's done automatically, e.g.:

auto cmd = new PGCommand(conn, "SELECT typname, typlen FROM pg_type");

auto result = cmd.executeQuery!(string, "typName", int, "len");

foreach (row; result)
{
    // here, row (a DBRow) subtypes
    // a Tuple!(string, "typName", int, "len")
    writeln(row.typName, ", ", row.len);
}

What do you think? :)


I was going to reply with a link to your work but you beat me to it.
I think it's a great design and incorporating it or something  
similar into the API may be the way to go.


Indeed ddb looks really nice (I hadn't looked at it yet); given it,
though, I have to agree that just adding MySQL support is too little
and not really innovative for 3 months of work...


Fawzi


Re: [GSOC] Database API draft proposal

2011-04-03 Thread Fawzi Mohamed
Well the comments in there are what is important, and will need to be  
specified better IMHO.


The most important part in my opinion is how one chooses to represent  
a record.
A big design choice is if the various fields are defined at compile  
time or at runtime.
Also how does one add special behavior to a record? Do you use a
subclass of the generic record type (as Ruby does, for example)?


D2 adds some more methods to allow for generic accessors, so one can
have a dynamic implementation while still using static accessors.

Maybe one should allow for both dynamic records and static ones.
The efficient storage of results of a db query is an important point.

Are you aware of http://dsource.org/projects/ddbi for D1?

If one wants to have a nice efficient and well tested interface,  
supporting more than one DB then I think that there is enough work to  
do.


Fawzi
On 3-apr-11, at 14:33, Christian Manning wrote:


On 03/04/2011 13:10, spir wrote:

On 04/02/2011 10:03 PM, Christian Manning wrote:

I plan to have several interfaces in a database module which are  
then

implemented for specific DBMSs.
For example:

module database;

interface Connection {
//method definitions for connecting to databases go here.
}

Then in an implementation of MySQL for example:

module mysql;

import database;

class Connect : Connection {
//implement defined methods tailoring to MySQL.
}


I would recommend to use slightly longer names for generic  
interfaces,

eg "IConnection" or "DBConnection". Then, authors of libraries /
implementations for specific DBMS like MySQL can use the shorter  
ones,

eg "Connection", which will be all what library clients will see and
use. This also avoids the need for "lexical hacks" like "Connection"
versus "Connect".
What do you think?


When I was writing that it really didn't sit well and "DBConnection"  
in particular is a much better way of doing it to reduce some  
confusion there.



What goes in to these interfaces will be decided in conjunction with
the D
community so that there is minimal conflict and it will benefit as  
many
circumstances as possible. I believe this to be the best route to  
take

as I
cannot speak for everyone who will be using this.

Using the API created I plan to create an example implementation,
initially
wrapping around the MySQL C API. This will be a good starting point
for this
project and more can be created, time permitting.


I have no idea of the actual size of such an interface design, but I
doubt it can make you busy for 3 months full time, especially since
there are (probably good) precedents for other languages. Maybe the
example implementation should be specified as part of the project?


I'm aware that it wouldn't take 3 months, but I don't know how long  
it will take to have the API agreed upon so that there's a general  
consensus. Another way I could do it is to decide on the API myself  
and begin implementing DBMSs with it and then adapt to the ideas  
brought forth by the community. Then, everyone's happy, just in a  
different time frame. Though, if there are a lot of changes wanted  
I'd have to change all of my implementations depending on how far I  
am at the time. What do you think about that path?


Thanks for the feedback, it's much appreciated :)

Chris




Re: image processing in D

2011-04-03 Thread Fawzi Mohamed

On 2-apr-11, at 10:40, aman bansal wrote:

I was trying to chalk out a strategy for image processing in D. The
closest reference I found was the implementation of the Python Imaging
Library. It has modules for imaging, and input/output of the JPEG and
BMP file formats. The data structures used also are quite accurate. I
would like to ask developers what the possible problems are in
implementing image I/O in D along the lines of the Python Imaging
Library.


As I told you via email

there are two main things: one is reading/writing several formats. I
would say that you should support at least one standard (simple)
format natively (the simplest would be the netpbm format), but then
you could rely on libraries to support more.
Use of external libraries should be discussed with others also,  
because one should rely only on libraries that are widely available,  
cross-platform and with acceptable licensing.


About the image processing itself, you probably want to have a simple
flat representation of the image (as a 2d array), and then be able to
apply several operations on it.
General convolution is probably something you want to have; masked
operations might also be very useful.

Not sure about which other operations you want to support.
In D1 as part of blip I have implemented nearest neighbor convolution,  
it could be useful to you.
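
For the flavor of it, a general 3x3 convolution over a flat row-major
image might look like this (a minimal sketch, not blip's actual code;
borders are handled by clamping):

// general 3x3 convolution on a row-major image, clamping at borders
double[] convolve3x3(const(double)[] img, size_t w, size_t h,
                     const double[9] kernel)
{
    auto res = new double[img.length];
    foreach (y; 0 .. h)
        foreach (x; 0 .. w)
        {
            double acc = 0;
            foreach (ky; 0 .. 3)
                foreach (kx; 0 .. 3)
                {
                    auto sx = cast(ptrdiff_t)x + kx - 1;
                    auto sy = cast(ptrdiff_t)y + ky - 1;
                    // clamp the sample coordinates at the border
                    if (sx < 0) sx = 0;
                    if (sx >= cast(ptrdiff_t)w) sx = w - 1;
                    if (sy < 0) sy = 0;
                    if (sy >= cast(ptrdiff_t)h) sy = h - 1;
                    acc += kernel[ky * 3 + kx] * img[sy * w + sx];
                }
            res[y * w + x] = acc;
        }
    return res;
}

With a kernel of nine 1.0/9 entries this is a box blur; a masked
operation would follow the same pattern with an extra per-pixel mask
test.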


Not sure which bit depths you want to support; one can build a wrapper
to access an arbitrary bits-per-channel contiguous block of memory as
a 1d or 2d array (for example 4-channel 12-bit images). This can be
useful as a base type for operations on images that might have an
extended bit range.


is this for a GSoC project?

anyway, about the Python Imaging Library, I think that the high level
functionality is mostly ok, and probably you can lift some of the C
part, but a lot of it seems specific to the Python bindings; I am not
sure it is worth using.
PIL has BSD-style licensing; this might be an issue if you want to
have it in Phobos (Boost licensed).


PIL uses the following libraries:

   JPEG support: libjpeg (6a or 6b)
http://www.ijg.org
http://www.ijg.org/files/jpegsrc.v6b.tar.gz
ftp://ftp.uu.net/graphics/jpeg/

   PNG support: zlib (1.2.3 or later is recommended)
http://www.gzip.org/zlib/

   OpenType/TrueType support: freetype2 (2.3.9 or later is recommended)
http://www.freetype.org
http://freetype.sourceforge.net

   CMS support: littleCMS (1.1.5 or later is recommended)
http://www.littlecms.com/

They seem reasonable, so the first step would be to wrap their C
interfaces (where not already available, like zlib).
I thought there were some image processing projects in D1, but I
haven't found them just now.
Still, the licensing and support of these libraries should be checked,
and evaluated also by others.


Fawzi

Asynchronicity and more

2011-04-02 Thread Fawzi Mohamed
There are several difficult issues connected with asynchronicity, high
performance networking and related things.
I had to deal with them developing blip ( http://fawzi.github.com/blip ).
My goal with it was to have a good basis for my program dchem; as a
consequence it is not so optimized, in particular for non-recursive
tasks, and it is D1, but I think that the issues are generally relevant.


i/o and asynchronicity is a very important aspect, and one that will
tend to "pollute" many parts of the library and introduce dependencies
that are difficult to remove; thus those choices have to be made
carefully.


Overview:


Threads vs fibers:
---

* an issue not yet brought up is that threads wire some memory, and so
have an extra cost that fibers don't.
* the evaluation strategy of fibers can be chosen by the user; this is
relevant for recursive tasks where each task spawns other tasks:
breadth-first evaluation, as threads use, needs a *lot* more resources
than depth-first, by having many more tasks concurrently in evaluation.


Otherwise the relevant points already brought forth by others are:

- a context switch between fibers (assuming that memory is active) is
much faster
- context switches are chosen by the user in fibers (cooperative
multitasking); this allows one to choose the optimal point to switch,
but a "bad" fiber can ruin the response time of the others (see the
sketch below).
- D is not stackless (like Go for example), so each fiber needs to
have enough space for its stack (something that often is not so easy
to predict). This makes fibers still a bit costly if one really needs
a lot of them. 64 bit can help here, because hopefully the active part
is small and can be kept in RAM, even using a rather large virtual
space. Still, as Brad correctly said, for heavily uniform handling of
many tasks manual management (and using stateless functions as much as
possible) can be much more efficient.
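
For reference, cooperative switching with D2's core.thread.Fiber looks
like this (a minimal sketch; blip itself is D1/Tango, so this is not
blip code):

import core.thread;

void main()
{
    // the fiber decides where it yields: cooperative multitasking
    auto fib = new Fiber({
        foreach (i; 0 .. 3)
        {
            // ... do a slice of work ...
            Fiber.yield(); // hand control back to the caller
        }
    });

    while (fib.state != Fiber.State.TERM)
        fib.call(); // resume the fiber until it runs to completion
}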


Closures

When possible, delegates and function calls are the natural solution,
but for the low level (often used) operations structs and manual
memory handling for "closures" are a good choice, because one can
avoid the heap allocation connected with the automatic closure.
This approach cannot be avoided in D1, whereas D2 has the very useful
closures, but at low level their cost should be avoided when possible.
About using structs there are subtle issues that I think are connected
with optimizations of the compiler (I never really investigated them;
I always changed the code, or resorted to heap allocation).
The main issue is that the compiler would like to optimize as much as
possible, and to do so it normally assumes that the current thread is
the only user of the stack. If you pass stack stored structures to
other threads these assumptions aren't true anymore, so the memory of
a stack allocated struct might be reused even before the function
returns (unless I am mistaken and the ABI forbids it; in this case
tell me).
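
A minimal sketch (D2) of the struct-based approach: the captured state
lives in a stack struct and the delegate points into it, so taking the
delegate allocates nothing on the heap. The comments mark the caveat
discussed above.

int applyTwice(scope int delegate(int) dg, int v)
{
    return dg(dg(v));
}

struct Adder
{
    int offset; // the "captured" state, stored manually
    int call(int x) { return x + offset; }
}

void main()
{
    auto a = Adder(10); // lives on the stack: no heap allocation
    // &a.call is a delegate whose context pointer is &a; it is only
    // valid while a is alive, and must not be passed to other threads
    assert(applyTwice(&a.call, 1) == 21);
}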


Async i/o
--

* almost always i/o is much slower than the CPU, so an i/o operation
is bound to make the cpu wait, and one wants to use that wait
efficiently.
  - A very simple way is to just use blocking i/o, and have other
threads do other work.
  - async i/o allows overlap of several operations in a single thread.
  - for files there is an even more efficient way: communicating
sharing of the buffer with the kernel (aio_*)
  - an important issue is avoiding waste of cpu cycles while waiting;
to achieve this one can collect several waiting operations and use a
single thread to wait on all of them. select, poll and epoll allow
this, and increase the efficiency of several kinds of programs (see
the sketch below)
  - libev and libevent are cross platform libraries that can help
having an event based approach, taking care of checking a large number
of events and calling a user defined callback when they happen, in a
robust cross platform way
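
As a tiny illustration of the select approach with Phobos' std.socket
(a minimal sketch, assuming the sockets are already connected; error
handling and the actual protocol are omitted):

import std.socket;

// one thread waits on many sockets at once, instead of blocking
// on each read in turn
void pollOnce(Socket[] socks)
{
    auto readSet = new SocketSet();
    foreach (s; socks)
        readSet.add(s);

    // blocks until at least one socket is readable
    auto n = Socket.select(readSet, null, null);
    if (n <= 0)
        return;

    foreach (s; socks)
        if (readSet.isSet(s))
        {
            ubyte[1024] buf;
            auto got = s.receive(buf[]);
            // ... hand the got bytes from s off for processing ...
        }
}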


locks, semaphores

Locks and semaphores are the standard way to synchronize between
threads.
One has to be careful when mixing them with fiber scheduling, as one
can easily deadlock.


Hardware information
-
Efficient usage of computational resources depends also on being able
to identify the available hardware.
Don did quite some hacking to get useful information out of cpuinfo,
but if one is interested in more complex computers more info would be
nice.

I use hwloc for this purpose; it is cross platform and can be embedded.

Possible solutions
==

Async i/o can be presented as normal synchronous (blocking) i/o, but
this makes sense only if one has several objects waiting, or uses
fibers and executes other fibers while waiting.
How acceptable is it to rely on (and thus introduce a dependency on)
things like libev or hwloc?
For my purposes using them was ok, 

Re: Fawzi Mohamed has been accepted as a GSoC 2011 mentor for Digital Mars

2011-04-02 Thread Fawzi Mohamed


On 1-apr-11, at 20:13, Walter Bright wrote:


On 4/1/2011 10:29 AM, Andrei Alexandrescu wrote:
Fawzi Mohamed from Humboldt University has been accepted as a  
mentor for the
Google Summer of Code 2011 program for Digital Mars. He is  
particularly
interested in topics related to concurrency, parallelism, and  
garbage collection.


Great! Congrats, Fawzi!


thanks, I am clearly happy about this, looking forward to mentor a  
project, and making D even better :)


Fawzi


Re: GSoC-2011 project:: Containers

2011-03-30 Thread Fawzi Mohamed
I think that a doubly linked list is useful; actually one should
implement most things so that they can work on any object that has
prev and next pointers, and give a templated default list wrapper.
That is what I did for singly linked lists, and it works well.

Often one wants to avoid allocating lots of small wrappers...
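
A minimal sketch (hypothetical code, not dcollections) of the
intrusive approach: the list operations work on any type that already
carries prev/next pointers, so no per-element wrapper needs to be
allocated.

// intrusive doubly linked list: the elements themselves carry the links
struct DList(T)
    if (is(typeof(T.init.prev) == T*) && is(typeof(T.init.next) == T*))
{
    T* head;

    void pushFront(T* n)
    {
        n.prev = null;
        n.next = head;
        if (head !is null) head.prev = n;
        head = n;
    }

    void remove(T* n)
    {
        if (n.prev !is null) n.prev.next = n.next;
        else head = n.next;
        if (n.next !is null) n.next.prev = n.prev;
    }
}

struct Task
{
    Task* prev, next; // intrusive links: no separate node needed
    int id;
}

void main()
{
    DList!Task list;
    auto t = new Task(null, null, 42);
    list.pushFront(t);
    assert(list.head.id == 42);
    list.remove(t);
    assert(list.head is null);
}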

About the containers: I did propose the persistent ones because they
are useful, and currently there aren't any, whereas for the more
classic ones dcollections is there (even if not part of Phobos).


Fawzi
On 30-mar-11, at 01:55, Jonathan M Davis wrote:


On 2011-03-29 14:50, dsimcha wrote:

== Quote from Jonathan M Davis (jmdavisp...@gmx.com)'s article

The fancier stuff would be nice, but we don't even have a doubly- 
linked

list yet. We should get the simpler stuff sorted out before we get
particularly fancy, not to mention that it's usually the simple  
stuff

that gets heavily used.


For the most part I agree, but a doubly linked list might be **too**
simple. Linked lists are so trivial to implement that I'd tend to  
roll my

own that does exactly what I need with regard to additional behavior on
insertion, etc. rather than wrapping a library solution to get these
features.


A doubly-linked list is on the list of containers that every  
standard library
should have or it's likely to be considered lacking. I can  
understand rolling
your own for specific uses, but _I_ sure don't want to be doing that  
if I
don't have to. If I want a doubly-linked list, I want to be able to  
just
create a standard one and use it. C++, C#, and Java all have doubly- 
linked

lists in their standard libraries.

If no one else ever implements a doubly-linked list for Phobos, I'll  
probably
do it eventually simply because it's one of the containers that is  
on the

short list of containers that pretty much every standard library has.

- Jonathan M Davis




Re: GSoC-2011 project:: Containers

2011-03-29 Thread Fawzi Mohamed

On 29-mar-11, at 21:32, spir wrote:


On 03/29/2011 08:34 PM, Ishan Thilina wrote:

I can try to answer your questions, but I have not applied to be an
official mentor.  Just want to make that clear.

My previous message was "I would be a mentor for this, but  
(reasons why I

will not)"

Sorry if that is not what you read.

-Steve


That's ok, You have given me enough encouragement to carry on this  
project. :-)


-

The project idea page gives only a brief introduction to the project.
From the ongoing conversation in this mailing list I feel that
different people have different expectations from this project. But it
is hard to make everyone happy. (I myself, being new to D, will need
more time to get familiar with its advanced concepts.) So I want to
know: what are the containers that you hope I will implement in Phobos?


I have in mind a general Tree & Node, or TreeNode idea; especially  
implementing all common tree traversal schemes (including leaves  
only) and the background mechanisms to insert/remove/change given  
nodes and search/count/replace given values. It may not be used as  
is --because various tree kinds actually are very different-- but  
could be highly useful, I guess, either as template or as supertype  
to be derived from.

Comments? (I would probably do it anyway.)


I would very much like to have persistent containers too.
I have implemented a Batched array for example in D1, which can grow
on one side, where one can have persistent slices, and pointers to
elements remain fixed.
For structures that just grow, and for multithreaded programming
without (or with very limited) locking, these structures are very
useful.
Also variations of those presented in Purely Functional Data
Structures by Chris Okasaki would be nice to have.


Fawzi



Re: Please vote on std.datetime

2010-12-10 Thread Fawzi Mohamed


On 10-dic-10, at 21:14, Kagamin wrote:


Fawzi Mohamed Wrote:

Last thing, and something I would have done differently (as I said
already in the past): using doubles expressing numbers of seconds to
represent points in time, durations, and TimeOfDay. I know others
differ about this, but I really think that it is a very simple and
versatile type.


I actually have a problem with this format. I have an application that
works with messages. A message has a send date. The application was
written in Delphi, so it uses a double to represent DateTime. A
message can be signed, and the date can be included in the data to be
signed, so the application uses the double format for the sign buffer.
Then I have a .NET application that should interoperate with the
Delphi application, but you can't compute the double value from the
string representation of a DateTime in an interoperable way: the last
bit depends on the order of computations, and if you miscompute it,
the signatures will be incompatible.
When signing normally one sends also the original message (with date),  
so I guess I don't understand your example, sorry...


And again, I am *not* arguing for using double exclusively: just for
absolute times, as an option for durations, and for TimeOfDay


I think, the point in time should be long, and millisecond precision  
is enough. Any higher precision is a very special case.


but what is the problem if you do have it? That it is more difficult
to guess the exact value you will read? Why would any sane setting
need that?




Re: Please vote on std.datetime

2010-12-10 Thread Fawzi Mohamed


On 10-dic-10, at 20:07, Jonathan M Davis wrote:


On Friday, December 10, 2010 09:55:02 Fawzi Mohamed wrote:

On 10-dic-10, at 18:02, Jonathan M Davis wrote:

thanks for the answers


On Friday 10 December 2010 03:18:29 Fawzi Mohamed wrote:
Clock is used as a namespace of sorts specifically to make the code
clearer. You
can think of it as a sort of singleton which has the functions which
give you
the time from the system clock. I think that it improves useability.


having a separate module for it would give a similar effect


Having a separate module would defeat the point at least in part.  
The idea is

that you call Clock.currTime() rather than currTime(). You call
IRange.everyDayOfWeek!() instead of everyDayOfWeek!(). The benefit  
is in having
the extra name to make it clear what the function is associated  
with, what it's
doing. For instance, conceptually, you're getting the current time  
from the
system clock. This makes it explicit. If it were a separate module,  
then it
would normally just be imported and you'd call currTime() and  
everyDayOfWeek!()
and all the benefit of putting them in that separate namespace is  
lost.

import Clock=std.time.Clock;

Clock.currTime();



- I find that there is a loss of orthogonality between SysTime and
DateTime. For me there are a calendar dates, and absolute points in
time. To interconvert between the two one needs a timezone. I would
associate the timezone with the calendar date and *not* with the
absolute time.
I find that SysTime makes too much effort to be a calendar date
instead of a "point in time".
Also if one wants to use a point in time at low level it should be
"lean and mean", what is the timezone doing there?


I don't really get this. Date and DateTime (and thus TimeOfDay) is
intended for
calendar use. There is no time zone because you're not dealing with
exact times
which care about the time zone that they're in. They don't
necessarily have any
relation to UTC or local time. A lot of calendar stuff isn't going
to care one
whit about time zones.

SysTime is specifically supposed to handle the "system time." The
system
definitely cares about the time zone. You have a local time zone
that your system
is in. You potentially have to convert between time zones when
playing around
with time stamps and the like. It's when dealing with the system
time that
you're really going to care about time zones. So, SysTime includes a
time zone,
and it is the type to use when you care about the time zone. If you
really want
dealing with the system time to work correctly in the general case,
you need it
to have a time zone. I've run into a number of bugs at work
precisely because
time_t was passed around naked and constantly converted (which, on
top of being
bug-prone, _cannot_ work correctly due to DST). By having the time
in UTC
internally at all times and converting it as necessary to the time
zone that you
want, you avoid a _lot_ of problems with time.


I see two uses of time: one is calendar, the other a point in time.
A point in time needs only to know if other events are before or  
after

it, or how far they are.
It should definitely use a unique reference point (for example NSDate
uses 1 january 2001).
Using UTC is correct, I never argued for something else.
the thing is that a point in time doesn't *need* a timezone, it needs
just a reference point.

A timezone is needed to convert between a calendar date (TimeDate) and
a point in time.
So if one wants to store a timezone somewhere (and not use it just
when converting between point in time and calendar date), then I would
store it in the calendar date, because without it I cannot know which
absolute time it refers to; a calendar date is already larger, and the
extra storage is probably not something one would care about in
typical use of the calendar.


We're obviously thinking of very different use cases here. A SysTime  
is a specific
point in time. It's exact. You can't possibly mistake it for any  
other point in
time ever. It can effectively be displayed and manipulated in a  
particular time

zone, but it is an exact point in time with no ambiguity.


That is *exactly* my point: SysTime is a point in time, *not* a
calendar. The current API tries too hard to make it behave like a
calendar, and the presence of a timezone in there reflects this


DateTime, on the other hand, could represent over 24 different  
points in time
(and that's assuming that it doesn't hit a DST transition). It is  
not at all
exact. It's exact enough for manipulating general dates and times,  
but it's not
exact enough for doing anything that is concerned with being an  
unambiguous
point in time. Sure, it _could_ be made to have a time zone, but  
that would
complicate it further, and many uses wouldn't care at all. It's not  
associated
with machine/system time at all. It's a conceptua

Re: Please vote on std.datetime

2010-12-10 Thread Fawzi Mohamed


On 10-dic-10, at 18:02, Jonathan M Davis wrote:

thanks for the answers


On Friday 10 December 2010 03:18:29 Fawzi Mohamed wrote:

On 10-dic-10, at 01:26, Andrei Alexandrescu wrote:

Jonathan M. Davis has diligently worked on his std.datetime
proposal, and it has been through a few review cycles in this
newsgroup.

It's time to vote. Please vote for or against inclusion of datetime
into Phobos, along with your reasons.


Thank you,

Andrei


I think it is quite complete and usable, lot of work obviously went
into this,...
and clearly after the praise comes a "but"... so here are my comments
on it, some are just personal preferences

- I would split the unittests to a separate test module, I like  
having
unittests, but having many of them make the code more difficult for  
me
to skim through, and grasp, one test (example) can be useful, but  
lots

of the hide the code structure.
Maybe it is just my personal preference, but I like compact code,  
code

that can be read, and so many unittests stop the flow of my reading.


This has been discussed a time or two. D really isn't set up to work  
that way.
You can't just move them over because then those that need private  
access won't
work. The lack of named unittest blocks really hurts that as well.  
You _could_

turn them into mixins of some kind, but that could get quite messy.

But honestly, I find it _way_ easier to maintain the code with each  
unittest
block immediately following the function that it's testing. The  
interval code is
quite irritating precisely because I couldn't put the tests next to  
the code
(since it's templatized it just didn't work in that case). I agree  
that it does
harm your ability to skim through the code, but the ddoc html files  
let you skim
the API, and I really do think it's more maintainable this  
way. Besides,
if we really want to, we can change that sort of thing later.  
Exactly how the
unit tests are handled doesn't affect the public API or the general  
useability of

the module.


ok, sorry, I hadn't followed the discussion; as I said, that is just
my personal preference.





- I would split this into several modules
(Timezone,SysTime,TimeDate,Clock), and if you want a "helper" module
that make a public export.
Modules should be used to define modules/namespaces, using classes
seems a misuse to me (I am looking a Clock for example, which is a
separated functionality imho).


It was already discussed that it would be better as one module. We  
don't have
any kind of hard limit on the size of modules or anything like that,  
and it's

just simpler to have it in one module.


well, but with a public export you can easily have an exported module;
by separate compilation you might spare something, but again it is a
matter of style. I find that in D the main way to partition code is
modules.


Clock is used as a namespace of sorts specifically to make the code  
clearer. You
can think of it as a sort of singleton which has the functions which  
give you

the time from the system clock. I think that it improves useability.


having a separate module for it would give a similar effect

Similarly, IRange is there specifically to namespace the functions  
which generate
functions used to generate ranges. It makes the code clearer to make  
it clear

that the functions are generating range generative functions.

There were other classes used to namespace code, and it was rightly  
pointed out
that they were unneeded. However, I believe that in these two cases,  
it's a
definite useability improvement to have them. It makes code clearer  
and easier to

read.


if I am alone on this I will not argue, but I definitely have a
different style.



- I find that there is a loss of orthogonality between SysTime and
DateTime. For me there are a calendar dates, and absolute points in
time. To interconvert between the two one needs a timezone. I would
associate the timezone with the calendar date and *not* with the
absolute time.
I find that SysTime makes too much effort to be a calendar date
instead of a "point in time".
Also if one wants to use a point in time at low level it should be
"lean and mean", what is the timezone doing there?


I don't really get this. Date and DateTime (and thus TimeOfDay) is  
intended for
calendar use. There is no time zone because you're not dealing with  
exact times
which care about the time zone that they're in. They don't  
necessarily have any
relation to UTC or local time. A lot of calendar stuff isn't going  
to care one

whit about time zones.

SysTime is specifically supposed to handle the "system time." The  
system
definitely cares about the time zone. You have a local time zone  
that your system
is in. You potentially have to convert between time zones when  
playing around
with time stamps and the like. It's when dealing

Re: How convince computer teacher

2010-12-10 Thread Fawzi Mohamed


On 10-dic-10, at 03:38, torhu wrote:


On 09.12.2010 17:27, Ddev wrote:

hi community,
how do I convince my teacher to go with D?
After talking with my teacher, I do not think she finds D good,
because after 10 years it has not become the big one. She is very
skeptical about D. If I could convince my teacher it would be great;
maybe it would then be taught to her students :)

best regards


D is based on languages like C and Java, and is syntactically very  
similar to those.  So if you already know programming, D is probably  
very easy to learn.  Try to learn about programming in general, not  
about a specific language.  The language matters, but everything  
else is more important.


++

Now, to convince a teacher: insulting teachers that do not want to use
D is not a very good strategy. There are good reasons not to choose D,
and obscurity is for sure one of them; personally I find that the
advantages offset the disadvantages. You can try to show her how close
to C it is, while still being clean. TDPL is a nice book; you can also
give her that to look at.
And you can tell her that the language is close enough to other
languages that learning it will not be wasted even if they later use
another one.
The quick compiler, and the support of most C++ features but in a
clean way, can be good selling points.


Fawzi


Re: uniqueness

2010-12-10 Thread Fawzi Mohamed


On 10-dic-10, at 11:53, Don wrote:


Fawzi Mohamed wrote:

If one could declare return or out types as unique (note that  
unique is *not* part of the type, it is like the storage  
attributes), these methods could be implicitly castable to const or  
immutable, allowing nicer code.
Constructors *might* return unique objects (an object is unique  
only if all its references are to unique or immutable objects).
In several cases uniqueness could be checked by the compiler. I  
think that such a change would improve part of my code, removing  
the need for several spurious casts, while at the same time making  
the code safer.


Any mutable object returned from a strongly pure function, is  
guaranteed to be unique.


indeed, good catch; I was saying that in some occasions the compiler
can verify uniqueness, and that is indeed an important case.


But I don't understand whether you want to imply that uniqueness
should not be explicit, but just guaranteed to be detected and used in
some occasions, as in the case you gave.


Because any object builder (for example an array concatenation object)
cannot be pure, but can still return a unique object.
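
For reference, current D compilers already exploit the case Don
mentions: the result of a strongly pure function can only reference
memory the function allocated itself, so it converts to immutable
without a cast. A minimal sketch:

// naturals is strongly pure: it takes only a value parameter, so its
// mutable result cannot alias anything outside, i.e. it is unique
int[] naturals(int n) pure
{
    auto a = new int[n];
    foreach (i, ref x; a)
        x = cast(int)i;
    return a;
}

void main()
{
    immutable int[] b = naturals(3); // accepted: the result is unique
}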


about front_unique, there I thought harder about having an
implicit-only use of it, but there too I think that it is not such a
good idea


Re: Please vote on std.datetime

2010-12-10 Thread Fawzi Mohamed


On 10-dic-10, at 01:26, Andrei Alexandrescu wrote:

Jonathan M. Davis has diligently worked on his std.datetime  
proposal, and it has been through a few review cycles in this  
newsgroup.


It's time to vote. Please vote for or against inclusion of datetime  
into Phobos, along with your reasons.



Thank you,

Andrei


I think it is quite complete and usable; a lot of work obviously went
into this,...
and clearly after the praise comes a "but"... so here are my comments
on it, some are just personal preferences


- I would split the unittests into a separate test module. I like
having unittests, but having many of them makes the code more
difficult for me to skim through and grasp; one test (example) can be
useful, but lots of them hide the code structure.
Maybe it is just my personal preference, but I like compact code, code
that can be read, and so many unittests stop the flow of my reading.


- I would split this into several modules (Timezone, SysTime,
TimeDate, Clock), and if you want, a "helper" module that makes a
public export of them.
Modules should be used to define modules/namespaces; using classes for
that seems a misuse to me (I am looking at Clock for example, which is
a separate functionality imho).


- I find that there is a loss of orthogonality between SysTime and
DateTime. For me there are calendar dates, and absolute points in
time. To interconvert between the two one needs a timezone. I would
associate the timezone with the calendar date and *not* with the
absolute time.
I find that SysTime makes too much effort to be a calendar date
instead of a "point in time".
Also, if one wants to use a point in time at low level it should be
"lean and mean"; what is the timezone doing there?


Last thing, and something I would have done differently (as I said
already in the past): using doubles expressing numbers of seconds to
represent points in time, durations, and TimeOfDay. I know others
differ about this, but I really think that it is a very simple and
versatile type.


Fawzi


uniqueness

2010-12-10 Thread Fawzi Mohamed
It is nice that Michel Fortin made the effort to propose a patch  
trying to address the ability to rebind const objects.


Looking at the "uglification" of my code to support const, I saw that  
many cases I actually had a unique type, or partially unique type.


There are several examples of similar attempts, like linear types, or
various uniqueness type systems (more or less related to the Clean
example).
It is known that some uniqueness settings are equivalent to shared, so  
maybe using uniqueness with const might be meaningful.


This is a difficult topic, as pushing those concepts into the type  
system is always tricky, and the consequences of various choices are  
often non obvious, anyway here is what I thought.


If one could declare return or out types as unique (note that unique
is *not* part of the type; it is like the storage attributes), the
returned values could be implicitly castable to const or immutable,
allowing nicer code.


Constructors *might* return unique objects (an object is unique only  
if all its references are to unique or immutable objects).


In several cases uniqueness could be checked by the compiler. I think  
that such a change would improve part of my code, removing the need  
for several spurious casts, while at the same time making the code  
safer.


I did also think about having a front_unique attribute that could be
applied to any local variable or argument, that would make it tail
const in the sense discussed previously, and still implicitly castable
to full const.
In that case the situation is more complex (one should ensure that
local references cannot spill out, otherwise full const is needed; and
for immutable, making it immutable is "irreversible").
The front_unique property can almost always be checked by the
compiler, but activating it implicitly would have effects that would
probably be deemed surprising by the programmer (front_unique
immutable objects would be rebindable).
Thus I am not sold on front_unique, but I still find it interesting,
due to its relationship with tail const.


Fawzi



Re: tail const

2010-12-05 Thread Fawzi Mohamed


On 5-dic-10, at 00:39, Fawzi Mohamed wrote:


[...]
Thus both have their function, but in general I think that tail  
const might be even more important (if I had to choose one const/ 
immutable type I would choose the tail one).


This was meant a bit as a provocation, but as nobody reacted, and with
trolls roving around, I want to clarify.
The normal const obviously allows sharing, so it isn't a bad choice,
but it introduces more constraints than needed to simply share memory.
These constraints make life more difficult, so I wondered if choosing
tail const as the only const would work.
It has issues, but not as bad as one would think; applying tail const
to ref T is basically const on T. There are still issues, but I
thought let's throw that in and see what others think.


By the way, I think that one of the problems in having both const and
tail const is that one ends up expressing the constness implied by one
operator using the other operator; in this sense tail const might be
(slightly) better


Fawzi


Re: tail const

2010-12-04 Thread Fawzi Mohamed


On 4-dic-10, at 02:26, Steven Schveighoffer wrote:

On Fri, 03 Dec 2010 19:06:36 -0500, Andrei Alexandrescu > wrote:



On 12/3/10 5:17 PM, Steven Schveighoffer wrote:

On Thu, 02 Dec 2010 17:02:42 -0500, Michel Fortin
 wrote:


On 2010-12-02 16:14:58 -0500, "Steven Schveighoffer"
 said:


On Thu, 02 Dec 2010 07:09:27 -0500, Michel Fortin
 wrote:

My only concern with the "const(Object)ref" syntax is that we're
reusing 'ref' to denote an object reference with different
properties (rebindable, nullable) than what 'ref' currently  
stands

for. But it remains the best syntax I've seen so far.

Where it would be beneficial is in mimicking the tail-const
properties of arrays in generic ranges.
I have a container C, which defines a range over its elements R.
const(R) is not a usable range, because popFront cannot be  
const. So

now I need to define constR, which is identical to R, except the
front() function returns a const element.
So now, I need the same for immutable.
And now I need to triplicate all my functions which accept the
ranges, or return them.
And I can't use inout(R) as a return value for ranges.
If you can solve the general problem, and not just the class
tail-const, it would be hugely beneficial.
My thought was that a modifier on const itself could be stored  
in the
TypeInfo_Const as a boolean (tail or not), and the equivalent  
done in

dmd source itself.


I'm not sure I get the problem. Can you show me in code?


Here is an example range from dcollections (well, at least the  
pertinant

part), for a linked list:

[snip analysis]

@tail inout(Range) opSlice() inout
{
 ...
}


I was about to post a similar analysis, but my suggested conclusion  
is very different: in my humble opinion, we must make-do without  
tail const. We can't afford to inflict such complexity on our users.


BTW, even though I concede that my ideas are too complex to be worth
using, I don't agree we must "make-do" without tail-const.  We just
need to find a different way to solve the problem.  Let's talk about
how we could add some sort of custom implicit casting to the type
system.  And actually, we need implicit lvalue casting (because all
member functions have ref this).


I fully agree with this.
I will try to recap what I think is the most clean solution from the  
conceptual point of view, then maybe others have an idea on how to  
find a good solution that is not too difficult to implement, and  
doesn't break what was said in TDPL too much.


The current const implies that the references to that type have to be  
constant.


tail const has the recursive definition I had given: valueConst(T) is
- T for basic types, functions and delegates
- refConst(U)* if is(T U == U*)
- refConst(U)[] if is(T U == U[])
- V if is(T == struct), where V is a structure just like T, but where
each of its fields is tail const


tail const marks all that is copied when one assigns T v2 = v1; as
mutable.
Indeed to protect v1 it is not needed to protect the values that get
copied in the assignment; those values can be changed without changing
v1.
For this reason tail const is the weakest const that one can have for
the arguments of pure functions; ideally "in" should mean tail const
(thus in some way tail const comes from the protection implied by a
starting const).


any lvalue should by default be tail const if it was const, and tail
immutable if it was immutable, but be implicitly convertible to full
const/immutable.


in a way tail const is more fundamental as it is the least protection  
that one has to give to protect some data owned by others.


If I have a global variable the current const/immutable can guarantee  
that its value will not change, while tail immutable guarantees that  
one can safely point to that data as it won't be changed: that data  
can be shared safely (not necessarily by several threads, even simply  
by several objects).


Thus both have their function, but in general I think that tail const  
might be even more important (if I had to choose one const/immutable  
type I would choose the tail one).


Re: Logical const

2010-12-03 Thread Fawzi Mohamed


On 3-dic-10, at 17:23, Bruno Medeiros wrote:


On 03/12/2010 13:22, Steven Schveighoffer wrote:

On Fri, 03 Dec 2010 08:00:43 -0500, Bruno Medeiros
 wrote:


The above are not trivial differences, so I do not agree that it
constitutes full logical const, only a limited form of it. More
concretely, it doesn't constitute logical const in in the sense  
where

you can use that as argument to say "logical const already exists,
it's just clunky to use", so let's add it to the language formally.
Like if mutable members where just syntax sugar, or a betterment of
safety rules.


I disagree, I think it does prove logical const already exists. How  
do

you define logical const?



I define logical const as the ability to specify that operations on  
a given object reference will not modify the logical state of that  
object (through that reference), and the ability for the compiler to  
verify that statically.


No, for me the compiler *cannot* verify logical const. Logical const
can be verified only on some occasions: for example, a place where to
store the result of a suspended evaluation (what functional languages
call a thunk).
A dataflow variable, for example, in general cannot be verified by the
compiler, but should also be logically const.
Normally the user should use only suspended pure operations or
dataflow variables, so it is safe.
Still, in D I want to be able to *implement* them, and that cannot be
verified by the compiler.
As for why I want to be able to implement it in D, the reason (beyond
the fact that it is a system language) is that in some occasions one
can implement it much more efficiently than any generic
implementation: for example, if one knows that the value cannot be x,
then x can be directly used to mark the need to update it; and if one
knows the function and arguments to call, an explicit thunk (i.e. a
closure with heap allocation) can also be spared.


So for me it is ok that the hole to implement them is ugly (Haskell
for example has unsafePerformIO), but I want an officially sanctioned
hole.
I am actually against using "mutable"; if that solution should be
accepted then the name should look much worse, like unsafeValue or
something like that.

Casual use should be discouraged.

Logical state is defined (_not very precisely though_) as the data  
subset of an object which is relevant for opEquals calculations. So  
in that Matrix example the elements of the Matrix arrays are part of  
the logical state, the cached determinant is not.


No, for me logical const means that all methods applied to the object
will always return the same value; it is not connected with the data
stored, and that is exactly the reason one wants mutable values.
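
In code, the backdoor for the Matrix example above looks roughly like
this (a minimal sketch; the cast is only defensible if the instance is
known to live in mutable memory, and on a truly immutable Matrix it
would be undefined behavior):

struct Matrix
{
    double[4] data; // 2x2: the logical state
    private double detCache;
    private bool detValid = false;

    double det() const
    {
        if (!detValid)
        {
            // the ugly, to-be-officially-sanctioned hole: memoize
            // into the object despite the const method
            auto self = cast(Matrix*)&this;
            self.detCache = data[0] * data[3] - data[1] * data[2];
            self.detValid = true;
        }
        return detCache;
    }
}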


Mutable members is one way to implement support for logical const in  
a language. (There could be other ways.)


yes

In any case I find tail const (or weak const, as I prefer to call it:
simply const on the referred data, but not on what is always locally
stored on the stack) more important, and *that* can be enforced by the
compiler.

I think that "in" should mean weak const, not const.

Fawzi


Re: tail const

2010-12-02 Thread Fawzi Mohamed


On 2-dic-10, at 13:09, Michel Fortin wrote:


On 2010-12-02 05:57:18 -0500, Fawzi Mohamed  said:


well, as you are at it, I would argue a bit more on the syntax.
[...]
I suppose that will probably be considered too difficult to
implement, but I wanted to propose it again because I find that it is
the most clean solution conceptually.


It is significantly more complex, not only for the compiler but also  
for the one reading/writing the code, as you'd have to propagate  
that 'weak_const' as a new, distinct modifier for it to be usable  
across function calls. I don't think it's worth it really.


ok, eheh, I just realized that tail shared protection has exactly the
same constraints as weak const (or tail const), and also for that it
seems that the more complex struct case was scrapped, restricting it
to pointers, arrays and refs.


As for the syntax for classes, I feel "const(Object)ref" with the  
optional ref marker is easier to grasp than introducing a new  
concept called 'weak_const'. I welcome any suggestions, but my aim  
is to keep the changes as small and localized as possible in the  
compiler and follow as closely as possible existing language patterns.


My only concern with the "const(Object)ref" syntax is that we're  
reusing 'ref' to denote an object reference with different  
properties (rebindable, nullable) than what 'ref' currently stands  
for. But it remains the best syntax I've seen so far.


--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/





Re: Logical const

2010-12-02 Thread Fawzi Mohamed


On 1-dic-10, at 22:18, Steven Schveighoffer wrote:


On Wed, 01 Dec 2010 11:49:36 -0500, so  wrote:


On Wed, 01 Dec 2010 18:38:23 +0200, so  wrote:

Since i called it a bad design, i am entitled to introduce a  
better design.


interface renderer {
void draw(rect rects, size_t n);
}

class widget {
void draw(renderer r) { ... }
}


Pfft sorry for that abomination!

interface renderer {
void draw(rect[] rects);
}

class widget {
rect r;
window owner;
void draw(renderer) const { ... }
}


This requires you to store the widget-renderer relationship outside  
the widget.  Since each widget has exactly one location where it  
lives, this is awkward.  Much better to just store the relationship  
on each widget.


indeed, that is one of the main things that I want from logical const:
being able to store/memoize values in a structure, not outside it.
It is ok to have to jump through some hoops to get it, but it should
be possible (casting that is guaranteed to work in some clearly
defined circumstances would do it, for example).


Fawzi



Re: Logical const

2010-12-02 Thread Fawzi Mohamed


On 1-dic-10, at 04:52, Jesse Phillips wrote:


Fawzi Mohamed Wrote:


The thing is that a lazy structure is very useful in functional
programming.
A lazy evaluation is something that should be possible using pure and
immutable.
I find it jarring that to do that one has to avoid D pure and  
immutable.


Don't know what you mean by this.


a lazy list (for example one that lists all natural numbers) cannot be
immutable without the possibility of a backdoor, because otherwise all
the "next" elements would have to be set at creation time.
(Lazy structures can be seen as memoizing a value produced by a pure
function forever, i.e. never forgetting it.)
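
A minimal sketch (D2) of such a lazy list: the nodes behave as
constant values, yet the next field must stay mutable so that it can
be memoized on first access.

// The lazy list of all naturals: node n+1 is created only when first
// asked for, then remembered forever, so "next" cannot be immutable
// even though the observable values never change.
class Nat
{
    immutable int value;
    private Nat nextCache; // the mutable backdoor

    this(int v) { value = v; }

    Nat next()
    {
        if (nextCache is null)
            nextCache = new Nat(value + 1); // memoize
        return nextCache;
    }
}

void main()
{
    auto zero = new Nat(0);
    assert(zero.next.next.value == 2);
    assert(zero.next is zero.next); // memoized: same node both times
}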



To be able to safely use pure and immutable as I said one would need
some idioms that are guaranteed to be non optimized by the compiler.
for example casting a heap allocated type should be guaranteed to
remain modifiable behind the back:
auto t=new T;
auto t2=cast(immutable(typeof(t)))t;

auto tModif=cast(typeof(t))t2; // the compiler has not moved or
flagged the memory of t, so one can modify tModif.


This code is valid, the requirements placed on cast will not allow  
it to move the data. Even types declared to be immutable my be  
modifiable when cast to Unqual!(T), but the compiler can not  
guarantee these.


If I am wrong, please let me know why.


The code works now; I would like some assurance that
cast(immutable(T)) doesn't do fancy stuff (or some equivalent way to
ensure that *some* idiom will remain allowed).
If you think about it, already now with opCast you cannot really know
what opCast does, so a compiler would be allowed to return an
immutable copy, or (if it uses whole pages) mark the whole memory as
read only.



clearly this is unsafe and it is up to the implementer to make sure
that the object is really logically const
and no function will see the internal changes.


Yes, and I don't think compiler support adds any more guarantee than  
casting those you want to modify in a const function. This Mutable  
struct is supposed to help verify only modifiable data is cast:


https://gist.github.com/721066


your example has an error in parallel; this is a good example of why
the casting away should not be made convenient, and a user should just
use well tested library code (for example doing thunk evaluation) that
might be in a special type or mixed in.


You cannot have a separate flag and value without any synchronization,
as other threads could see their values in a different order: they
could see dirty=false, but still no determinant.


This is somehow related to dataflow variables that can be set several
times, but only to the same value (and indeed with a lazy list one can
allow two threads to calculate the next element, but then only one
should set it, using atomic ops to avoid collisions).
I have implemented dataflow variables (but using the blip
parallelization, which delays a task that waits and resumes it when
the value is ready, not with a thread lock) in


https://github.com/fawzi/blip/blob/master/blip/parallel/smp/DataFlowVar.d
using D1


I've taken many example use-cases for logical const and added them  
as unittests. I think it is fairly reasonable if I could just get an  
answer to my question about concurrency and declaring immutable types.



This is something that should be done sparingly, probably just in
library code implementing lazy evaluation or memoization (but code
that might be mixed in).


Could you give an example of how lazy evaluation is achieved by  
modifying state?


lazy structures need to modify their state (as I showed with the
linked list example); lazy evaluation alone does not need to modify
state (and is indeed possible in D), but the storing/memoizing of the
result needs it. To tell the truth, in D memoizing can be done with a
static variable, but think about making the singly linked list like
that, and you should immediately see that it gets *very* inefficient.


Re: tail const

2010-12-02 Thread Fawzi Mohamed


On 1-dic-10, at 12:10, spir wrote:


On Tue, 30 Nov 2010 23:57:04 +0100
Fawzi Mohamed  wrote:


Speaking about D mistakes Steve spoke about missing tail const.
I was thinking about this, and I fully agree that it is a hole.
I don't know if it was already discussed, but I was thinking that one
could introduce
*const T t1;
and
*immutable T t2;
with the following meaning:
const or immutable is applied on the "dereferenced" type. For classes
this would mean the object, not the pointer.
Thus for example if T is a class type one would be able to reassign  
t1

t1=t2;
but not to modify the content of t1 or t2 in any way.
One can also extend it to array types: if T is U[], then it would  
mean

const(U)[] or immutable(U)[], and to pointer types,
*const int* would then mean const(int)*.
For other types maybe the best solution would be to
drop the const/immutable for basic types, functions and delegates
	apply it to all fields of a struct (not sure how much work this  
would

be to implement)

This use of * should not introduce much ambiguity (a pointer is T*,
indeed also const*T would be almost as unambiguos).


It seems, as Andrei pointed (!) a few days ago, that the issue is  
purely syntactic. Namely, since class instance de/referencing is  
completely implicit _including_ in class definition and variable  
declaration, there is no way to denote constness on the pointer's  
target. As you show, we do not need any new lexical or syntactic  
artifact for arrays or explictely pointed thingies.
This is in contrast with languages in which class instance  
dereferencing is implicit, to access slots:

o.someAspect = 1
write(o.someComputedData());
o.doSomething();
but type definition is explicitely pointed.

As an example, defining a list/node type in Oberon would look like:
TYPE
Node = POINTER TO NodeData;
NodeData = RECORD
next: Node;
value   : int;
END ;
(Oberon records are a mix of D structs and classes: value types like  
structs, but with inheritance and runtime-type method dispatch.)
Then, if Oberon had 'const', it could use the same syntax as for  
pointed arrays, for instance, to apply const on either a "class  
instance" (pointer to object data) or to the data properly speaking.
I'm not advocating for having D define classes that way ;-) Not even  
that D allows it as an alternative, just to be able to apply const  
on elements of the target type (*). Rather, I mean that a seemingly  
good idea (implicit referencing of class defs) may lead to  
unexpected consequences in practical coding.


yes indeed, but I would say (as I have just argued in another post)
that having a weak const would be useful also for structs, at least
for the user (while maybe possible, it would be difficult for the user
to build a weak const for a struct, as it implies defining another
equivalent struct with different access qualifiers).





Denis

(*) For instance:
alias class(NodeData) Node;






Re: tail const

2010-12-02 Thread Fawzi Mohamed


On 1-dic-10, at 20:07, Michel Fortin wrote:

On 2010-12-01 09:37:08 -0500, Michel Fortin said:

On 2010-12-01 06:17:24 -0500, Jonathan M Davis said:
I proposed the following a while ago. First allow the class reference
to (optionally) be made explicit:

C a;     // mutable reference to mutable class
C ref b; // mutable reference to mutable class

And now you can apply tail-const to it:

const(C)ref c;  // mutable reference to const class
const(C ref) d; // const reference to const class
const(C) e;     // const reference to const class
The real issue is not syntax but getting it into the compiler.
Apparently, there are difficulties in implementing tail const in the
compiler which made Walter give up on it in the past. It should be
doable, but Walter is totally sick of the issue and doesn't want to
put the time in to do it - he has plenty on his plate as it is. So, if
it's going to be done, someone else has to step up to the plate and do
it. And with the general lack of dmd developers, that hasn't happened.
No one thus far has had both the inclination and the time.
Well... I just took a quick look at the problem from inside the
compiler. The issue is this: the compiler has a type hierarchy, and
TypeClass is one type in it. There is no separate type for a class
reference; it just uses TypeClass to designate a class reference, which
means that if your TypeClass has the const or immutable modifier, so
does your reference. So either we create a TypeClassRef to designate
the reference, or we add additional flags to TypeClass for the
reference's modifier; in either case many parts of the semantic
analysis have to be revised to take this into account.


Turns out there's a trick that makes it much simpler than I expected.
Patch coming soon. ;-)


great!

well, as you are at it I would argue a bit more about the syntax.
In my opinion it is more useful to have a weak_const (or @tail const,
or *const; I don't care so much about the syntax, but I do care about
the concept), like I sketched in my post, and not just to fix the
class issue.
Indeed, as I tried to argue, it is useful to have an easy way to say
"all my local stack memory might be modified, but not anything that it
refers to" (thus weak const).
This is the maximum modifiability that one can allow to the arguments
of pure functions, so a very useful level of protection.

weak_const can be defined recursively: weak_const T is
- const(T) if T is a class reference, as D has no rebinding of refs
  (otherwise it should protect only the object, not the rebinding of
  the ref)
- T if T is a basic type, function or delegate
- const(U)* if is(T U==U*)
- const(U)[] if is(T U==U[]) // this is a special case of the next
- WeakConst!(T) if is(T==struct), where WeakConst!(T) is a struct like
  T, but where all its fields are weak_const (i.e. apply weak_const
  recursively to the content of the struct)
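
A rough sketch of that mapping as a template (the name and the code are
mine, purely illustrative; note that the class case is exactly what
cannot be expressed in today's D, since const(T) on a class also
freezes the reference itself):

template WeakConst(T)
{
    static if (is(T U == U*))
        alias const(U)* WeakConst;
    else static if (is(T U == U[]))
        alias const(U)[] WeakConst;
    else static if (is(T == class))
        alias const(T) WeakConst;  // over-protective: also binds the ref
    else
        alias T WeakConst;         // basic types, functions, delegates
    // the struct case (field-by-field recursion) is omitted; building
    // an equivalent struct with requalified fields is the hard part
}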


Indeed the recursion on the struct is the most complex thing, and
might be an implementation challenge, but if doable it would be very
nice.
Basically one has to set a flag for all things that are local and
would not be affected by the weak const.


I suppose that this will probably be considered too difficult to
implement, but I wanted to propose it again because I find it the
conceptually cleanest solution.


Fawzi


Re: D's greatest mistakes

2010-11-30 Thread Fawzi Mohamed

On 30-nov-10, at 22:24, Steven Schveighoffer wrote:

On Tue, 30 Nov 2010 15:47:22 -0500, Andrei Alexandrescu wrote:



On 11/30/10 12:38 PM, Steven Schveighoffer wrote:



No, that's not what I'm saying. Creating a language-based tail-const
solution *unifies* all references, including any smart references you
can create. I can say tail-const anything and it always means the same
thing. It's another tool to allow creating of smart references.
Without this, we have to special case tail-const in all smart
reference types.

If anything Rebindable is a special case smart reference: it only
addresses class tail-const. The language solution addresses general
tail-const. E.g. how does Rebindable address tail-const ranges?

I see it 100% opposite from what you are saying. A library solution
looks to me like "look! we don't have to change the language to add
language features, all you need is this template that adds 10k of
bloat to your exe! And why do you need to pass them as parameters,
just create a new local variable? And why do you need it to work with
*all* reference types, there's other syntax for that!" All for the
sake of not changing the language, which I think is the more direct
and complete solution. I don't really understand the resistance.


Understanding the "resistance" is very simple. Right now a lot of
current readers cheerily ignore this thread. Also a lot of potential
users couldn't care any less. Once the feature is in the language, it
will affect them all.


It affects them all as much as Rebindable affects them all.  Either
way, it's something new they have to learn, whether it's a language
feature or a library feature.  This argument makes no sense.

Another way to look at it -- whichever solution is used is really
going to affect only library writers, not library users.  Whether the
user writes:

@tail const(C);

or

Rebindable!(const(C))

or something else is really irrelevant to them; what's relevant is
what hoops I as a library writer have to jump through to get this
damned thing to work seamlessly.


I mostly agree with Steve (I just made it through the thread); I also
find tail const very useful.
In fact I independently wrote a related post about tail const (but
with a different syntax).
The fact that tail const can be reasonably extended to any type, and
still guarantees purity, shows its usefulness.
Mostly it can be seen as a shortcut to declare a special kind of
constness; only for classes does tail const express something that
cannot be directly expressed with const().
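
For reference, the library route discussed above already exists today
as std.typecons.Rebindable, which covers exactly the class case; a
minimal usage sketch:

import std.typecons : Rebindable;

class C { int x; }

void main()
{
    Rebindable!(const C) r = new C;  // tail const for a class, via library
    r = new C;     // OK: the reference itself can be rebound
    // r.x = 1;    // error: the referenced object is const
}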








Re: Logical const

2010-11-30 Thread Fawzi Mohamed


On 30-nov-10, at 16:39, Andrei Alexandrescu wrote:


On 11/30/10 5:25 AM, Max Samukha wrote:

On 11/30/2010 02:35 AM, Walter Bright wrote:

Fawzi Mohamed wrote:

logical const is useful for lazy functions and memoization, and if
implemented correctly it is perfectly safe.
As I said in an older discussion, to have it with the current system
all that is needed is some guarantee that the compiler will not
disallow "unsafe" changes (by moving data to read only memory, for
example) in some cases.
For example casted mutable types, so that casting to mutable works.


D allows escape from the type system, but the programmer who does that
loses the guarantees, and it's up to him to ensure that the result
works.

String literals, for example, are often going to wind up in read only
memory.


The problem is that logical const has many perfectly valid use cases.
You cannot simply tell people: "Don't use it. It is a fraud". They
will still be using casts or not using D. As casting away const is
undefined behavior in D, the outcome will be every second non-trivial
D program relying on undefined behavior.

I'm not seeing half of non-trivial C++ programs using mutable.


The thing is that a lazy structure is very useful in functional
programming, and lazy evaluation is something that should be possible
using pure and immutable.

I find it jarring that to do that one has to avoid D's pure and
immutable.

To be able to safely use pure and immutable, as I said, one would need
some idioms that are guaranteed to be non-optimized by the compiler;
for example a cast on a heap allocated object should be guaranteed to
leave it modifiable behind the back:

auto t = new T;
auto t2 = cast(immutable(typeof(t)))t;

auto tModif = cast(typeof(t))t2; // the compiler has not moved or
                                 // flagged the memory of t, so one
                                 // can modify tModif

clearly this is unsafe, and it is up to the implementer to make sure
that the object is really logically const and that no function will
see the internal changes.

This is something that should be done sparingly, probably just in  
library code implementing lazy evaluation or memoization (but code  
that might be mixed in).
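
A sketch of what such library code could look like: a memoizing method
behind an immutable interface, built on exactly the cast idiom above
(names are mine; note that today this is formally undefined behavior,
which is why the guarantee asked for above matters):

class Fib
{
    private long[] cache;

    this()
    {
        cache = new long[64];   // no bounds handling, illustration only
        cache[] = -1;
        cache[0] = 0;
        cache[1] = 1;
    }

    long at(size_t n) immutable
    {
        auto self = cast(Fib) this;   // cast immutable away: logical const
        if (self.cache[n] < 0)
            self.cache[n] = at(n - 1) + at(n - 2);  // memoize behind the back
        return self.cache[n];
    }
}

Callers only ever observe a value that, once seen, never changes; that
is what makes the mutation logically const.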


tail const

2010-11-30 Thread Fawzi Mohamed

Speaking about D mistakes Steve spoke about missing tail const.
I was thinking about this, and I fully agree that it is a hole.
I don't know if it was already discussed, but I was thinking that one  
could introduce

*const T t1;
and
*immutable T t2;
with the following meaning:
const or immutable is applied on the "dereferenced" type. For classes  
this would mean the object, not the pointer.

Thus for example if T is a class type one would be able to reassign t1
t1=t2;
but not to modify the content of t1 or t2 in any way.
One can also extend it to array types: if T is U[], then it would mean
const(U)[] or immutable(U)[], and to pointer types:
*const int* would then mean const(int)*.
For other types maybe the best solution would be to
- drop the const/immutable for basic types, functions and delegates
- apply it to all fields of a struct (not sure how much work this
  would be to implement)


This use of * should not introduce much ambiguity (a pointer is T*;
indeed const*T would be almost as unambiguous).


One can see that this tail const is really a common kind of type;
indeed string is such a type (immutable(char)[] is tail-immutable
char[]), and a function can be pure even if its arguments are
*immutable, because any changes would be done to a local copy in the
function.
I think that these things point toward the usefulness of *const and
*immutable attributes.


Fawzi


Re: Logical const

2010-11-29 Thread Fawzi Mohamed


On 30-nov-10, at 00:04, Walter Bright wrote:


Steven Schveighoffer wrote:
On Mon, 29 Nov 2010 15:58:10 -0500, Walter Bright wrote:

Steven Schveighoffer wrote:
Having a logical const feature in D would not be a convention, it
would be enforced, as much as const is enforced.  I don't understand
why issues with C++ const or C++'s mutable feature have any bearing on
how a D logical const system would fare.  C++ const is not D const,
not even close.



Because people coming from C++ ask "why not do it like C++'s?"
I don't get it.  A way to make a field mutable in a transitively-const
system is syntactically similar to C++, but it's not the same.  Having
a logical-const feature in D does not devolve D's const into C++'s
const.  If anything it's just a political problem.


Having mutable members destroys any guarantees that const provides.  
That's not political.


And, I repeat, having a mutable type qualifier DOES NOT make logical  
const a language feature. This is why discussion and understanding  
of C++'s const system is so important - people impute  
characteristics into it that it simply does not have.


logical const is useful for lazy functions and memoization, and if
implemented correctly it is perfectly safe.
As I said in an older discussion, to have it with the current system
all that is needed is some guarantee that the compiler will not
disallow "unsafe" changes (by moving data to read only memory, for
example) in some cases.

For example casted mutable types, so that casting to mutable works.

Fawzi


a different kind of synchronized

2010-11-25 Thread Fawzi Mohamed

I have been thinking about this for some time.

When writing collections or similar I sometimes want the object to be
accessible from multiple threads, and sometimes from only one.

Obviously I want the version that uses a single thread to be efficient.

To easily write both these versions it would be really useful to be
able to activate/deactivate some synchronization with a flag.


To do this the synchronized statement is bad because it synchronizes  
the following code, and thus cannot be switched off without switching  
off the code inside it.


A better solution would be

synchronized(bla...);

which would mean synchronized starting here, i.e.

monitor(bla).lock();
scope(exit){ monitor(bla).unlock(); }

(only that getting the monitor is a bit more complicated).
Since in D ";" is not a valid statement, one would not have issues
with the usual synchronized statement.

The advantage is that with this you can easily do something like

static if (shouldLock) synchronized(this);

and thus easily write lock protected versions of an object.
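
Today one can approximate that pattern with scope(exit), at the cost of
an explicit mutex member; a minimal sketch (Collection and shouldLock
are illustrative names):

import core.sync.mutex : Mutex;

class Collection(bool shouldLock)
{
    static if (shouldLock) private Mutex mtx;
    private int[] data;

    this() { static if (shouldLock) mtx = new Mutex; }

    void add(int x)
    {
        // stand-in for the proposed
        // `static if (shouldLock) synchronized(this);`
        static if (shouldLock) { mtx.lock(); scope(exit) mtx.unlock(); }
        data ~= x;
    }
}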

Fawzi


Re: Faster uniform() in [0.0 - 1.0(

2010-11-23 Thread Fawzi Mohamed


On 23-nov-10, at 13:12, tn wrote:


Fawzi Mohamed Wrote:



On 23-nov-10, at 10:20, tn wrote:


bearophile Wrote:


Don:


Since the probability of actually generating a
zero is 1e-4000, it shouldn't affect the speed at all .


If bits in double have the same probability then I think there is a
much higher probability to hit a zero, about 1 in 2^^63, and I'm
not counting NaNs (but it's low enough to not change the substance
of what you have said).


For uniform distribution different bit combinations should have
different probabilities because floating point numbers have more
representable values close to zero. So for doubles the probability
should be about 1e-300 and for reals about 1e-4900.

But because uniform by default seems to use a 32 bit integer random
number generator, the probability is actually 2^^-32. And that is
actually verified: I generated 10 * 2^^32 samples of
uniform!"[]"(0.0, 1.0) and got 16 zeros which is close enough to
expected 10.

Of course 2^^-32 is still small enough to have no performance penalty
in practice.

-- tn


that is the reason I used a better generation algorithm in blip (and
tango) that guarantees the correct distribution, at the cost of being
slightly more costly, but then the basic generator is cheaper, and if
one needs maximum speed one can even use a cheaper source (from the
CMWC family) that still seems to pass all statistical tests.


A similar method would probably be nice also in phobos, if the speed
is almost the same.


Yes, I was thinking of porting my code to D2, but if someone else  
wants to do it...
please note that for double the speed will *not* be the same, because  
it always tries to guarantee that all bits of the mantissa are random,  
and with 52 or 63 bits this cannot be done with a single 32 bit random  
number.



The way I generate uniform numbers was shown to be better (and
detectably so) in the case of floats, when looking at the tails of
normal and other distributions generated from uniform numbers.
This is very relevant in some cases (for example if you are interested
in the probability of catastrophic events).

Fawzi


Just using 64 bit integers as source would be enough for almost(?) all
cases. At the current speed it would take thousands of years for one
modern computer to generate so many random numbers that better
resolution was justifiable. (And if one wants to measure the
probability of rare enough events, one should use more advanced
methods like importance sampling.)


I thought about directly having 64 bits as source, but the generators
I know were written to generate 32 bits at a time.
Probably one could modify CMWC to work natively with 64 bits, but it
should be done carefully.
So I simply decided to stick to 32 bits and generate two of them when
needed.
Note that my default sources are faster than the Twister (the one that
is used in phobos); I especially like CMWC (but the default combines
it with KISS for extra safety).
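
The two-draws idea for double, sketched (uniform32 stands for whatever
32 bit source is used, e.g. a CMWC generator; this is an illustration,
not the blip code):

double uniform01(scope uint delegate() uniform32)
{
    ulong bits = (cast(ulong) uniform32() << 32) | uniform32();
    bits >>= 12;                               // keep exactly 52 random bits
    return bits * (1.0 / 4503599627370496.0);  // * 2^-52, result in [0, 1)
}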



-- tn




Re: Faster uniform() in [0.0 - 1.0(

2010-11-23 Thread Fawzi Mohamed


On 23-nov-10, at 10:20, tn wrote:


bearophile Wrote:


Don:


Since the probability of actually generating a
zero is 1e-4000, it shouldn't affect the speed at all .


If bits in double have the same probability then I think there is a  
much higher probability to hit a zero, about 1 in 2^^63, and I'm  
not counting NaNs (but it's low enough to not change the substance  
of what you have said).


For uniform distribution different bit combinations should have  
different probabilities because floating point numbers have more  
representable values close to zero. So for doubles the probability  
should be about 1e-300 and for reals about 1e-4900.


But because uniform by default seems to use a 32 bit integer random  
number generator, the probability is actually 2^^-32. And that is  
actually verified: I generated 10 * 2^^32 samples of  
uniform!"[]"(0.0, 1.0) and got 16 zeros which is close enough to  
expected 10.


Of course 2^^-32 is still small enough to have no performance penalty
in practice.


-- tn


that is the reason I used a better generation algorithm in blip (and  
tango) that guarantees the correct distribution, at the cost of being  
slightly more costly, but then the basic generator is cheaper, and if  
one needs maximum speed one can even use a cheaper source (from the  
CMWC family) that still seems to pass all statistical tests.
The way I generate uniform numbers was shown to be better (and
detectably so) in the case of floats, when looking at the tails of
normal and other distributions generated from uniform numbers.
This is very relevant in some cases (for example if you are interested
in the probability of catastrophic events).


Fawzi


Re: Faster uniform() in [0.0 - 1.0(

2010-11-22 Thread Fawzi Mohamed


On 22-nov-10, at 16:11, tn wrote:


bearophile Wrote:

Some kinds of little D programs I write need a lot of random values,
and tests have shown me that std.random.uniform is slow.


So I have suggested to add a faster special case to generate a  
random double in [0.0, 1.0), see:

http://d.puremagic.com/issues/show_bug.cgi?id=5240

Bye,
bearophile



I did some testing with different combinations of types and boundary
types. The problem noticed is a bit different from the one bearophile
mentioned. Here is my test code:



import std.conv;
import std.date;
import std.random;
import std.stdio;

void test(T, string boundaries)() {
    void fun() {
        uniform!(boundaries, T, T)(cast(T)0, cast(T)1000);
    }
    writefln("%-8s %s  %6d", to!string(typeid(T)), boundaries,
             benchmark!fun(10_000_000)[0]);
}

void testBoundaries(T)() {
    test!(T, "[]")();
    test!(T, "[)")();
    test!(T, "(]")();
    test!(T, "()")();
    writeln();
}

void main() {
    testBoundaries!(int)();
    testBoundaries!(long)();
    testBoundaries!(float)();
    testBoundaries!(double)();
    testBoundaries!(real)();
}


And here are the results for 10 million calls of uniform (columns are:
type, boundaries, elapsed time):

int      []   271
int      [)   271
int      (]   283
int      ()   285

long     []   372
long     [)   399
long     (]   401
long     ()   397

float    []   286
float    [)   374
float    (]  5252
float    ()  5691

double   []   348
double   [)   573
double   (]  5319
double   ()  5875

real     []   434
real     [)   702
real     (]  2832
real     ()  3056


In my opinion floating point uniforms with (] or () as boundary types
are unacceptably slow. I had to use 1 - uniform!"[)"(0.0, 1.0) instead
of uniform!"(]"(0.0, 1.0) because of this issue. I would also expect
the versions using float and double to be faster than the version
using real.


-- tn
I suspect that the default random generator I have implemented (in
blip & tango) is faster than the phobos one. I did not try to support
all possibilities (with floats just [) and, with high probability, (),
though boundary values are possible due to rounding when using a non
0-1 range), but I took a lot of care to initialize *all* bits
uniformly.
The problem you describe looks like a bug though: if done correctly,
one should just add an if or two to the [] implementation to get ()
with very high probability.
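
That "if or two" fix, sketched: derive () by rejecting the rare
boundary hits of the closed-interval implementation, so the average
cost is one extra comparison (uniformClosed stands for the existing []
generator; illustration only):

double uniformOpen(scope double delegate() uniformClosed)
{
    double x = uniformClosed();      // in [0, 1]
    while (x == 0.0 || x == 1.0)     // hit only with probability ~2^-31
        x = uniformClosed();
    return x;
}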


Fawzi


Re: Principled method of lookup-or-insert in associative arrays?

2010-11-21 Thread Fawzi Mohamed

I was thinking about this, and revised my idea.
While I think that in general hiding idup is a bad practice, I think
that for AAs one could make the case that it might be acceptable.
idup is needed only when assigning a value using a key that isn't
immutable.
Now, this use in an AA is relatively rare: one normally wants to read
more than write, so some magic might be warranted.
If one has a use case where the idup really becomes an issue, he
should expend the extra thinking to avoid it.
The advantage of this is that idup is not needed for basic operations,
which I think is good, as one doesn't have to confront the const
complexity so early when using the language; const should be kind of
avoidable in several tasks if one doesn't care about it.
What I am still uneasy about in the original example is its reliance
on the automatic initialization to 0 for undefined keys: an AA
normally throws when the key is not present, it doesn't return the
default value, so why should updating be different?


Fawzi



Re: Principled method of lookup-or-insert in associative arrays?

2010-11-20 Thread Fawzi Mohamed


On 20-nov-10, at 09:07, Andrei Alexandrescu wrote:


TDPL has an example that can be reduced as follows:

void main() {
 uint[string] map;
 foreach (line; stdin.byLine()) {
   ++map[line];
 }
}



byLine reuses its buffer so it exposes it as char[]. Therefore,  
attempting to use map[line] will fail. The program compiled and did  
the wrong thing because of a bug.


The challenge is devising a means to make the program compile and  
work as expected. Looking up an uint[string] with a char[] should  
work, and if the char[] is to be inserted, a copy should be made  
(via .idup or to!string). The rule needs to be well defined and  
reasonably general.


I think that you basically stated the important rules: lookup and
update should use const(char[]); set (i.e. potentially adding a new
key) should use immutable(char[]).
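
Made concrete as hypothetical helper functions over a uint[string] map
(illustration only, not an actual AA interface):

uint* lookup(uint[string] map, in char[] key)
{
    // lookup never stores the key, so no copy is needed
    return cast(immutable(char)[]) key in map;
}

void set(ref uint[string] map, immutable(char)[] key, uint v)
{
    map[key] = v;   // insertion keeps the key: immutable, no hidden idup
}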




The effect is something like this:

void main() {
 uint[string] map;
 foreach (line; stdin.byLine()) {
   auto p = line in map;
   if (p) ++*p;
   else map[line.idup] = 1;
 }
}

Ideally the programmer would write the simple code (first snippet)  
and achieve the semantics of the more complex code (second snippet).  
Any ideas?


I would consider the first program as written invalid (at runtime),
because the initial value is not set, so you cannot update it with ++.

Also, a potentially hidden idup should not be added IMHO.
I don't find the second program so bad...

Fawzi



Re: Review: A new stab at a potential std.unittests

2010-11-20 Thread Fawzi Mohamed


On 19-nov-10, at 23:44, Sean Kelly wrote:


Leandro Lucarella Wrote:


Sean Kelly, el 19 de noviembre a las 14:59 me escribiste:


This should work:

void func(string x = __FILE__, T...)(T args);

D allows defaulted template arguments to occur before non- 
defaulted ones.


I wasn't aware that __FILE__ and __LINE__ expanded at the
instantiation point; that is nice, but it seems to be the case only
for D2 :(


Re: Shared pain

2010-11-19 Thread Fawzi Mohamed


On 19-nov-10, at 17:42, Steve Teale wrote:


On Fri, 19 Nov 2010 11:23:44 -0500, bearophile wrote:

Regarding the creation of immutable data structures, there is a
proposal that is probably able to remove some of the pain: the result
of strongly pure functions may become implicitly castable to immutable.

Bye,
bearophile


BP,

I admire your dedication to language theory and purity, but there are
many who'd translate that to impracticality and obscurity.

I came to D in the first place because I found it refreshingly clear
and easy to use after C++ and Java. But now it's getting painful.

I've bailed out several times, just keep coming back to see how it is
doing. I have a bunch of code that worked with D1 and D2 at one time.
I've given up on D1, since it is now obviously legacy, but even
without the complexity of supporting both, it's been real hard this
visit to get things working again.

What's your estimate of how long it will be before D is a stable
language?


well, D1 is pretty stable I think; if you are interested in stability
it is a good choice, and it has worked well for me.
This does not mean that I will not consider D2, but D1 is my main
workhorse.

Fawzi



Thanks
Steve






Re: D1 -> D2

2010-11-19 Thread Fawzi Mohamed


On 19-nov-10, at 08:44, Walter Bright wrote:


Fawzi Mohamed wrote:
I don't find a valid D1 expression to put in place of scope, or to
somehow hide it, i.e. how do you write something like

module t;
void f(scope void delegate() action){
    action();
}
void main(){
    f(scope delegate(){
        printf("bla\n");
    });
}

so that it is valid D1 and D2?


Just remove the 'scope'.

removing scope causes a heap allocation in D2 that I want to avoid.
Still, maybe you are right; I will use /+scope+/, so that one has
something working and can easily go to the efficient D2 version.


Re: DIP9 -- Redo toString API

2010-11-19 Thread Fawzi Mohamed


On 19-nov-10, at 11:13, Lars T. Kyllingstad wrote:


On Fri, 19 Nov 2010 10:22:29 +0100, Jacob Carlborg wrote:


On 2010-11-18 23:21, Steven Schveighoffer wrote:


I just created a new D Improvement Proposal to fix the toString
problem I brought up several posts ago.

See: http://prowiki.org/wiki4d/wiki.cgi?LanguageDevel/DIPs/DIP9

-Steve


Why do we have to remove toString, can't toString call writeTo and
behave as it does now?


Nobody's forcing anyone to remove toString(); the point is that Phobos
just won't be using it anymore.  Furthermore, std.conv.to!string()
should be defined to call writeTo(), so you won't have to define both
toString() and writeTo() for your types.

-Lars


I think that it is good: a breaking change, but good.

Having something like what I did with writeOut would minimize the
hassles, because then you have a uniform way to print out a type:

writeOut(sink, type, possiblyExtraArgs)

which works for everything: basic types, old style objects with
toString, ... It is useful for generic code.
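
A sketch of what such a writeOut could look like in D2 (the blip
version differs in details; this only shows the uniform-dispatch idea):

import std.conv : to;

void writeOut(T)(scope void delegate(const(char)[]) sink, T value)
{
    static if (is(typeof(value.toString(sink))))
        value.toString(sink);       // sink-based printing, as in DIP9
    else static if (is(typeof(sink(value.toString()))))
        sink(value.toString());     // old-style toString fallback
    else
        sink(to!string(value));     // basic types and everything else
}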


Re: DIP9 -- Redo toString API

2010-11-19 Thread Fawzi Mohamed


On 19-nov-10, at 10:22, Jacob Carlborg wrote:


On 2010-11-18 23:21, Steven Schveighoffer wrote:


I just created a new D Improvement Proposal to fix the toString
problem I brought up several posts ago.

See: http://prowiki.org/wiki4d/wiki.cgi?LanguageDevel/DIPs/DIP9

-Steve


Why do we have to remove toString, can't toString call writeTo and  
behave as it does now?


that is what I do in blip.
toString by default is

char[] toString(){
    return collectAppender(&desc);
}

I called the method desc because I saw it as a description of the
object, and if the object implements serialization then the
description is just a serialization to JSON format (both of those are
added by mixin printOut!();).




Re: D1 -> D2

2010-11-18 Thread Fawzi Mohamed


On 19-nov-10, at 08:08, Walter Bright wrote:


Fawzi Mohamed wrote:

From the discussion it seems that defining something like:

version(D_Version2){
    mixin(`
        template Const(T){
            alias const(T) Const;
        }
        template Immutable(T){
            alias immutable(T) Immutable;
        }
        immutable(T) Idup(T)(T val){
            return val.idup;
        }
        alias const(char)[] cstring;
    `);
} else {
    template Const(T){
        alias T Const;
    }
    template Immutable(T){
        alias T Immutable;
    }
    T Idup(T)(T val){
        return val.dup;
    }
    alias char[] string;
    alias char[] cstring;
}
could help a lot; later one can simply replace Const! with const and
Immutable! with immutable. The Idup replacement is more complicated,
but doable.


The problem with this approach is it requires the D1 compiler to be  
able to parse D2 syntax, including recognizing all of D2's keywords.


yes, you are right; I meant to put a mixin there but then I forgot it.

What is not so clear is how to cope with scope, because I have a lot
of delegates around that will need it.
For that a preprocessing step might really be the best thing, or does
someone know a smart way to cope with it?


Not sure what the issue is.


I don't find a valid D1 expression to put in place of scope, or to
somehow hide it, i.e. how do you write something like

module t;

void f(scope void delegate() action){
    action();
}

void main(){
    f(scope delegate(){
        printf("bla\n");
    });
}

so that it is valid D1 and D2?



Re: D1 -> D2

2010-11-18 Thread Fawzi Mohamed

From the discussion it seems that defining something like:

version(D_Version2){
    template Const(T){
        alias const(T) Const;
    }
    template Immutable(T){
        alias immutable(T) Immutable;
    }
    immutable(T) Idup(T)(T val){
        return val.idup;
    }
    alias const(char)[] cstring;
} else {
    template Const(T){
        alias T Const;
    }
    template Immutable(T){
        alias T Immutable;
    }
    T Idup(T)(T val){
        return val.dup;
    }
    alias char[] string;
    alias char[] cstring;
}

could help a lot; later one can simply replace Const! with const and
Immutable! with immutable. The Idup replacement is more complicated,
but doable.
What is not so clear is how to cope with scope, because I have a lot
of delegates around that will need it.
For that a preprocessing step might really be the best thing, or does
someone know a smart way to cope with it?


Re: it's time to change how things are printed

2010-11-18 Thread Fawzi Mohamed


On 18-nov-10, at 16:53, Steven Schveighoffer wrote:


On Thu, 18 Nov 2010 10:44:00 -0500, Nick Sabalausky  wrote:

I like it, *provided that* there's a quick-and-easy way to just get a
string when that's all you want. At the very least there should be a
standard sink function that's a default argument to toString that just
simply builds a string. What we definitely *don't* want is for the
user to ever have to write their own sink delegate just to get a
string (which I've had to do with Tango on occasion).


to!string(x);

(which will probably do the delegate/etc when x.toString is defined)


I don't know; I considered using the to!(T) conversion, but decided
against it in blip, because I preferred keeping to for exact
conversions and using another set of methods for string conversion
(which is special enough, sometimes used just for debugging, and not
invertible).


by the way, another nice effect of using a simple sink delegate is
that you can easily redeclare it at low level and get rid of
dependencies (well, maybe you suffer a bit converting basic types, but
it is doable), whereas using higher level streams is difficult in the
runtime (you easily get objects depending on them, forcing you to put
them in object.d).


Fawzi


Re: it's time to change how things are printed

2010-11-18 Thread Fawzi Mohamed

On 18-nov-10, at 17:01, Fawzi Mohamed wrote:


[...]
If you take a look at blip.


ehm, that was a leftover from my editing that I did not see because it
was outside the visible area in my mail program... just ignore it.

well, you *can* look at blip, but you get the point...


Re: it's time to change how things are printed

2010-11-18 Thread Fawzi Mohamed


On 18-nov-10, at 16:14, Steven Schveighoffer wrote:

A recent bug report reminded me of how horrible D is at printing  
custom types.


Consider a container type that contains 1000 elements, such as a  
linked list.  If you print this type, you would expect to get a  
printout similar to an array, i.e.:


[ 1 2 3 4 5 ... 1000 ]

If you do this:

writeln(mylist);

then what happens is, writeln calls mylist.toString(), and prints  
that string.


But inside mylist.toString, it likely does things like  
elem[0].toString() and concatenates all these together.  This  
results in at least 1000 + 1 heap allocations, to go along with 1000  
appends,  to create a string that will be sent to an output stream  
and *discarded*.


So the seemingly innocuous line writeln(mylist) is like attaching a  
boat anchor to your code performance.


There is a better way, as demonstrated by BigInt (whose author  
refuses to implement toString()):


void toString(scope void delegate(scope const(char)[] data), string format = null)


What does this do?  Well, now, writeln can define a delegate that  
takes a string and sends it to an output stream.  Now, there is no  
copying of data, no heap allocations, and no need to concatenate  
anything together!  Not only that, but it can be given an optional  
format specifier to control output when writefln is used.  Let's see  
how a linked list would implement this function (ignoring format for  
now):


void toString(scope void delegate(scope const(char)[] data) sink,
              string format = null)
{
    sink("[");
    foreach(elem; this)
    {
        sink(" ");
        elem.toString(sink);
    }
    sink(" ]");
}

It looks just about as simple as the equivalent function that would  
currently be necessary, except you have *no* heap allocations, there  
is a possibility for formatting, and D will be that much better  
performing.  Note that using a delegate allows much more natural  
code which requires recursion.


Should we create a DIP for this?  I'll volunteer to spearhead the  
effort if people are on board.


I agree wholeheartedly with this; I have always pushed in this
direction every time the subject came up.
In tango, for example, exceptions use this, also because I did not
want memory allocations when printing the stacktrace.

This is the way used in blip to output everything; I always felt bad
about allocating things on the heap.


- in object I look for a void desc(void delegate(const(char)[] data)
sink) method (well, D1, so scope is implied ;), optionally with extra
format arguments that don't have to be restricted to a simple string.

- I have implemented a writeOut templatized function to easily dump
out all kinds of objects to sinks or similar objects; with it you
write writeOut(sink,object,possiblyExtraArgs); // see blip.io.BasicIO


- I have defined a dumper object (just a struct) and a helper function  
for easy call chaining, so you can do

  dumper(sink)("bla:")(myObject)("\n");

- blip.container.GrowableArray completes the offer by giving an easy
way to collect the results, and has two helper functions:

/// collects what is appended by the appender in a single array and returns it;
/// if buf is provided the appender tries to use it (but allocates if extra space is needed)
T[] collectAppender(T)(void delegate(void delegate(T[])) appender, char[] buf=null){}

/// collects what is appended by the appender and adds it at once to the given sink
void sinkTogether(U,T)(U sink, void delegate(void delegate(T[])) appender, char[] buf=null){}


I find that such an approach works well, is not too intrusive, and is  
efficient.
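
For completeness, a sketch of such a call-chaining dumper (simplified;
the real one forwards to a writeOut like the one sketched earlier in
this digest and supports extra formatting arguments), so that the
dumper(sink)("bla:")(myObject)("\n") chain above works:

struct Dumper
{
    void delegate(const(char)[]) sink;

    Dumper opCall(T)(T value)
    {
        writeOut(sink, value);   // uniform printing through the sink
        return this;
    }
}

Dumper dumper(void delegate(const(char)[]) sink)
{
    Dumper d;
    d.sink = sink;
    return d;
}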


Fawzi

If you take a look at blip.




Re: D1 -> D2

2010-11-18 Thread Fawzi Mohamed


On 18-nov-10, at 12:07, Mike Parker wrote:


On 11/18/2010 7:51 PM, Fawzi Mohamed wrote:

Is there any "porting" guide around in a wiki? If not, a page where to
share the best tricks would be nice: "D1->D2 conversion tricks".

In the short term I don't think that going D2 only is really an option
for me, so how feasible is it to keep the code base compatible with
both D1 and D2?

I know that one can define some templates (for example Const(T), ...),
and maybe use mixins, but how much uglier does the code become as a
result?
I chose D to have cleaner code; I am not interested in losing all that
just to support both D1 and D2, then I prefer to wait and convert
everything at once.

Well, that is about it...

thanks

Fawzi


In maintaining Derelict, which is nothing more than a simple
collection of bindings to C libraries, I have had headaches keeping
compatibility between D1/D2. It's nothing that has been difficult to
solve, just ugly.

If something as simple as a C binding is uglified, I cringe at the
thought of maintaining something more complex. It's going to get very
ugly, very quickly. My attitude is that any future D projects I make
available will be D2 only. I just don't think it's worth being
compatible with both versions from a code maintenance perspective.


that was my feeling too, but I wanted some confirmation from people
who have actually done it.

This just reinforces my choice of staying D1 only for the moment...



D1 -> D2

2010-11-18 Thread Fawzi Mohamed

Is there any "porting" guide around in a wiki? If not, a page where to
share the best tricks would be nice: "D1->D2 conversion tricks".

In the short term I don't think that going D2 only is really an option
for me, so how feasible is it to keep the code base compatible with
both D1 and D2?

I know that one can define some templates (for example Const(T), ...),
and maybe use mixins, but how much uglier does the code become as a
result?
I chose D to have cleaner code; I am not interested in losing all that
just to support both D1 and D2, then I prefer to wait and convert
everything at once.


Well that is about it...

thanks

Fawzi


Re: D and multicore

2010-11-14 Thread Fawzi Mohamed


On 14-nov-10, at 02:44, Fawzi Mohamed wrote:


[...]
Sometimes the problem you have is not so costly that you need to
commit all resources to it; you just want to solve it efficiently,
and if possible take advantage of the parallelization.


In this case a good model is the actor model where objects  
communicate with messages to each other.
One can have thread objects with mailboxes and pattern matching to  
select the message, or objects with an interface and a remote  
procedure call to invoke them.
You can organize the network of messages in several ways, you can  
have a central server, and clients that connect, you can have a  
central database to communicate, you can have a peer to peer  
structure, you can have producer/consumer relationships.

Normally given a problem one can see how to partition it optimally.

I use blip.parallel.rpc to give that kind of messaging between  
objects.
Note that one has to think about failure of one part in this model: a
failure of one process should not necessarily stop all processes (in
some cases it might even go undetected).
At this level one could theoretically migrate processes/objects
automatically, but given that the latency increase can be very large
(~10^6) this automatic distribution is doable only for tasks that were
considered by the programmer; a fully automatic redistribution of any
object is not realistic.


I forgot to say: part of your problem may fit a simple parallelization
pattern, for example costly, basically independent tasks, for which
you might use a client/server approach, or a data parallel approach
with huge amounts of distributed data to which you want to apply
something like map/reduce.
In these cases the actor model might work well, allowing one to be
more dynamic about the work distribution than MPI (which has no nice
partial failure handling), while still using all system resources (in
fact I use it exactly for those reasons).
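
As an illustration of the actor pattern, in D2 one can use
std.concurrency (blip's rpc layer also works across processes, which
this small mailbox example does not show):

import std.concurrency;
import std.stdio;

void worker()
{
    bool done = false;
    while (!done)
    {
        receive(
            (int job, Tid caller) { send(caller, job * job); }, // requests
            (string cmd) { done = (cmd == "stop"); }            // control
        );
    }
}

void main()
{
    auto w = spawn(&worker);
    send(w, 7, thisTid);
    writeln(receiveOnly!int());   // prints 49
    send(w, "stop");
}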




Re: D and multicore

2010-11-13 Thread Fawzi Mohamed


On 13-nov-10, at 22:23, Gary Whatmore wrote:


parallel noob Wrote:


Hello

Intro: people with pseudonyms are often considered trolls here, but  
this is a really honest question by a sw engineer now writing  
mostly sequential web applications.  (I write "parallel" web apps,  
but the logic goes that you write sequential applications for each  
http query, the frontend distributes queries among backend http  
processes, and the database "magically" ensures proper locking.  
There's hardly ever any locks in my code.)


D is touted as the next gen of multicore languages. I pondered
between D and D.learn, where to ask this. It just strikes me odd
that there isn't any kind of documentation explaining how I should
write (parallel) code for multicore in D. If D is much different,
the general guidelines for PHP web applications, Java, or Erlang
might not work. From what I've gathered from these discussions,
there are:


- array operations and auto-parallelization of loops
- mmx/sse intrinsics via library
- transactional memory (requires hardware support? doesn't work?)
- "erlang style" concurrency? == process functions in Phobos 2?
- threads, locks, and synchronization primitives

Sean, sybrandy, don, fawzi, tobias, gary, dsimcha, bearophile,  
russel, trass3r, dennis, and simen clearly have ideas how to work  
with parallel problems.


A quick look at wikipedia gave
http://en.wikipedia.org/wiki/Parallel_computing and
http://en.wikipedia.org/wiki/Parallel_programming_model


I fail to map these concepts discussed here with the things listed  
on those pages. I found MPI, POSIX Threads, TBB, Erlang, OpenMP,  
and OpenCL there.


Sean mentioned:

"In the long term there may turn out to be better models, but I  
don't know of one today."


So he's basically saying that those others listed in the wikipedia  
pages are totally unsuitable for real world tasks? Only Erlang  
style message passing works?


The next machine I buy comes with 12 or 16 cores or even more --  
this one has 4 cores. The typical applications I use take advantage  
of 1-2 threads. For example a cd ripper starts a new process for  
each mp3 encoder. The program runs at most 3 threads (the gui, the  
mp3 encoder, the cd ripper). More and more applications run in the  
browser. The browser actively uses one thread + one thread per  
applet. I can't even utilize more than 50% of the power of the  
current gen!


The situation is different with GPUs. My Radeon 5970 has 3200  
cores. When the core count doubles, the FPS rating in games almost  
doubles. They definitely are not running Erlang style processes  
(one for GUI, one for sounds, one for physics, one for network).  
That would leave 3150 cores unused.


there are different kinds of parallel problems: some are trivially, or
almost trivially, parallel, others are less parallel.
Some tasks are very quick (one talks of micro parallelism), others are
much more coarse.

Typical code has a limited parallelization potential; the out of order
execution of modern processors tries to take advantage of it, but
normally having a lot of execution hardware is not useful because the
amount of instruction level parallelism (ILP) is limited.
There is an important exception: vector operations. Processors often
have vector hardware to execute them efficiently, and compilers
vectorize loops to take advantage of it.
Array operations are a class of operations (including vector
operations) that are often very parallel; if one for example wants to
apply a pure operation to each element of an array, that is trivially
parallel.
Data parallel languages are especially good at expressing this kind of
parallelism.
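
The trivially parallel case, sketched with dsimcha's std.parallelism
(later included in Phobos; D2, illustration only):

import std.math : sqrt;
import std.parallelism;

void main()
{
    auto a = new double[1_000_000];
    foreach (i, ref x; a) x = i;

    // every element is independent, so iterations can be spread
    // over all cores
    foreach (ref x; taskPool.parallel(a))
        x = sqrt(x);
}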


GPUs are optimized for graphical operations, which are mainly vector
and array operations, and thus expose a large amount of this kind of
parallelism. The same parallelism is present in some scientific
programs, and indeed GPUs (with CUDA or OpenCL) are increasingly used
for them.


The coarser levels of parallelization use other means.
In my opinion shared memory parallelization can be done efficiently if
one is able to treat independent, recursive tasks.
Recursive tasks (which come for example from divide & conquer
approaches, and can for example be used to perform array operations)
can be evaluated efficiently by evaluating subtasks first and stealing
supertasks, keeping the locality of processors into account (Cilk has
such an approach).


Independent tasks can be represented by threads, should be executed
fairly, and work well to represent a single interacting object or
different requests to a web server.
All OSes give support for this; as threads (unlike processes) share
memory, one has to take care that changes from one thread and another
are done in a meaningful way. To achieve this there are locks, atomic
ops, ...
Transactional memory works for changes that should be done atomically
(the big problem is that if something fails one has to undo everything
a

Re: Explicit Thread Local Heaps

2010-11-12 Thread Fawzi Mohamed

On 12-nov-10, at 16:36, dsimcha wrote:

There was some discussion around here a while back about the
possibility of using thread-local heaps in the standard GC.  This was
rejected largely because of the complexity it would add when casting
to shared/immutable.

I'm wondering if it would be a good idea to allow memory to be
explicitly allocated as thread-local through a separate GC.  Such a GC
would be designed from the ground up to assume thread-local data and
would never be used to allocate in standard Phobos or Druntime
functions.  It would simply be a Phobos module, something like
std.localgc.  The only way to use it would be to explicitly call
something like ThreadLocal.malloc, or pass it as a parameter to
something that needs an allocator.

The collector would (unsafely) assume that you always maintain at
least one pointer to all thread-locally allocated data on either the
relevant thread's stack, the thread-local heap or in thread-local
storage.  The global heap, __gshared storage and other threads' stacks
would not be scanned.

A major issue I see is interfacing such a GC with the regular GC such
that pointers from the thread-local memory to shared memory are dealt
with properly, without being excessively conservative.  The
thread-local GC would likely use core.stdc.malloc() to allocate large
blocks of memory, and would need a way to signal to the shared GC what
blocks might contain pointers without synchronizing on every update.

If this sounds like a good idea, maybe I'll start prototyping it.
Overall, the idea is that thread-local heaps are an optimization that
should be done explicitly when/if you need it, not something that
needs to be built deep into the language runtime.


In my code the lock during allocation is more of an issue than GC
scanning.
Having thread local (or better, numa node local) pools for the
allocation, with separate locks, would solve the main bottleneck.

I have always disliked extra memory hierarchies; I feel that their
benefit/complexity ratio is too small, but I might be wrong.
The problem you identified, pointers to "global" memory, is difficult
to solve in a way that really gives the local GC an advantage over a
good GC implementation that uses several pools, without burdening the
programmer.

Still, I imagine that having a localgc library implementation could be
useful to some.
I suspect that using it for general types that might allocate memory
on their own would be difficult, but as it would be used in special
cases, probably that isn't an issue.




Re: Memory Pools support in Phobos

2010-11-12 Thread Fawzi Mohamed
One structure that I found very useful in blip is a set of numa node
local caches that can be used for pools/freelists.
One can create such a "numa aware" pool with something like (see
blip.container.Pool & blip.container.Cache)

PoolI!(T) myPool = cachedPool(function T(PoolI!(T) p){
    auto res = new T;
    res.pool = p;
    return res;
});

One can also create such a pool with two functions addPool/rmPool that
do reference counting to allocate the pool if needed (this is used for
example in loops for the context structures).


This is still less efficient than it should be because D1 doesn't have
TLS support like D2 for the default cache storage, but it has the
correct scaling (another place where blip suffers due to this is the
current task, which also uses TLS).
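
A minimal sketch of a freelist-style pool behind such an interface
(method names are illustrative; the blip version adds the NUMA-local
caches, locking, and the addPool/rmPool reference counting):

class Pool(T)
{
    private T[] freeList;                    // returned objects, ready for reuse
    private T delegate(Pool!(T)) allocator;  // how to build a fresh object

    this(T delegate(Pool!(T)) alloc) { allocator = alloc; }

    T getObj()
    {
        if (freeList.length)
        {
            auto r = freeList[$ - 1];
            freeList = freeList[0 .. $ - 1];
            return r;
        }
        return allocator(this);              // miss: allocate a new one
    }

    void giveBack(T obj) { freeList ~= obj; }
}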


Fawzi
On 12-nov-10, at 13:49, bearophile wrote:


F. Almeida:

Regardless of the direction chosen for delete and clear(), Phobos  
is still lacking in functions that ease the management of the C heap.


I think Andrei agrees with you, he discussed this a bit in past.

I agree: adding a few manual memory management functions/structs to
Phobos will help lower level coding a lot. The good thing is that I
think only one or a few hundred lines of code may be enough to solve
most of this problem.


The data structures I'd like in this manual memory management Phobos  
module:


1) In my dlibs1 I have a "simple" struct that acts like a pool:

struct Foo {} // some struct
alias MemoryPool!Foo FooPool;

And then you may ask FooPool for a new Foo, clear them all, or
deallocate them all (if you want you may also add a freelist of Foo so
you may deallocate only one of them). Surely there are ways to design
something more general than this, but in lots of cases this was enough
for me, and I have seen programs become twice as fast just by using
this MemoryPool.


2) A second data structure that may be useful is a memory arena,  
where you allocate different sized objects.


3) A third data structure I'd like in Phobos is a hierarchical  
allocator, it's very easy to use, easy to implement, but it is  
powerful:

http://swapped.cc/halloc/

4) This old code by Walter, the mem.c/mem.h memory management  
utility is partially obsolete, but some of its ideas may be useful  
for people that use the C heap manually from D:

http://c.snippets.org/code/temp/snip-c.zip

5) This is a kind of hack, but it helps a little: a function that
carves a whole 2D matrix (a dynamic array of dynamic arrays) out of a
single C-heap allocated memory object (it has to respect the alignment
of its basic item, so sometimes it adds a bit of padding). This helps
reduce memory usage a little and sometimes increases performance
because of higher cache locality.


6) An unsafe growable dynamic array allocated backwards on the  
stack :-) This works, but you need to use it carefully, not defining  
new variables in the middle.


Bye,
bearophile




Re: Thoughts on parallel programming?

2010-11-12 Thread Fawzi Mohamed

On 11-nov-10, at 20:10, Russel Winder wrote:


On Thu, 2010-11-11 at 18:24 +0100, Tobias Pfaff wrote:
[ . . . ]

Unfortunately I only know about the standard stuff, OpenMP/OpenCL...
Speaking of which: Are there any attempts to support lightweight
multithreading in D, that is, something like OpenMP ?


I'd hardly call OpenMP lightweight.  I agree that, as a meta-notation
for directing the compiler how to insert appropriate code to force
multithreading of certain classes of code, using OpenMP generally
beats manual coding of the threads.  But OpenMP is very Fortran
oriented, even though it can be useful for C, and indeed C++ as well.

However, given things like Threading Building Blocks (TBB) and the
functional programming inspired techniques used by Chapel, OpenMP
increasingly looks like a "hack" rather than a solution.


I agree. I think that TBB offers primitives for many kinds of
parallelization, and is cleaner and more flexible than OpenMP, but in
my opinion it has a big weakness: it cannot cope well with independent
tasks. Coping well with both nested parallelism and independent tasks
is crucial for a generic solution that can be applied to several
problems.

This is missing, as far as I know, also from Chapel.
I think that a solution that copes well with both nested parallelism
and independent tasks is an excellent base on which to build almost
all other higher level parallelization schemes.
It is important to handle this centrally, because the number of
threads that one spawns should ideally stay limited to the number of
execution units.



Re: Thoughts on parallel programming?

2010-11-12 Thread Fawzi Mohamed


On 11-nov-10, at 20:41, Russel Winder wrote:


On Thu, 2010-11-11 at 15:16 +0100, Fawzi Mohamed wrote:
[ . . . ]
on this I am not so sure: heterogeneous clusters are more difficult to
program, and GPUs & co are slowly becoming more and more general
purpose.
Being able to take advantage of those is useful, but I am not
convinced they are necessarily the future.


The Intel roadmap is for processor chips that have a number of cores
with different architectures.  Heterogeneity is not going to be a
choice, it is going to be an imposition.  And this is at bus level,
not at cluster level.


Vector co-processors, yes, I see that, and short term the effect of
things like AMD Fusion (CPU/GPU merging).
Is this necessarily the future? I don't know; neither does Intel I
think, as they are still evaluating Larrabee.

But CPU/GPU will stay around for some time more, for sure.


[ . . . ]
yes, many core is the future, I agree on this, and also that a
distributed approach is the only way to scale to a really large number
of processors.
But distributed systems *are* more complex, so I think that for the
foreseeable future one will have a hybrid approach.

Hybrid is what I am saying is the future whether we like it or not.
SMP as the whole system is the past.




I disagree that distributed systems are more complex per se.  I
suspect comments are getting so general here that anything anyone
writes can be seen as both true and false simultaneously.  My
perception is that shared memory multithreading is less and less a
tool that applications programmers should be thinking in terms of.
Multiple processes with a hierarchy of communication costs is the
overarching architecture, with each process potentially being SMP or
CSP or . . .


I agree that on not too large shared memory machines a hierarchy of
tasks is the correct approach.
This is what I did in blip.parallel.smp. Using that one can have
fairly efficient automatic scheduling, and so forget most of the
complexities and the actual hardware configuration.



again, I am not sure the situation is as dire as you paint it; Linux
does quite well in the HPC field... but I agree that to be the ideal
OS for these architectures it will need more changes.


The Linux driver architecture is already creaking at the seams; it
implies a central monolithic approach to the operating system. This
falls down in a multiprocessor shared memory context. The fact that
the Top 500 generally use Linux is because it is the least worst
option. M$, despite throwing large amounts of money at the problem
(and indeed buying some very high profile names to try and do
something about the lack of traction), have failed to make any headway
in the HPC operating system stakes. Do you want to have to run a virus
checker on your HPC system?

My gut reaction is that we are going to see a rise of hypervisors as
per Tilera chips, at least in the short to medium term, simply as a
bridge from the OSes of now to the future. My guess is that L4
microkernels and/or nanokernels, exokernels, etc. will find a central
place in future systems. The problem to be solved is ensuring that the
appropriate ABI is available on the appropriate core at the
appropriate time. Mobility of ABI is the critical factor here.


yes, microkernels & co will be more and more important (but I wonder
how much this will be the case for the desktop).
ABI mobility? Not so sure; for HPC I can imagine having to compile for
different ABIs (but maybe that is what you mean by ABI mobility).



[ . . . ]

Whole array operations are useful, and when possible one gains much by
using them; unfortunately not all problems can be reduced to a few
large array operations, and data parallel languages are not the main
type of language for these reasons.


Agreed.  My point was that in 1960s code people explicitly handled
array operations using do loops because they had to.  Nowadays such
code is anathema to efficient execution.  My complaint here is that
people have put effort into compiler technology instead of rewriting
the codes in a better language and/or idiom.  Clearly whole array
operations only apply to algorithms that involve arrays!

[ . . . ]
well, whole array operations are a generalization of the SPMD
approach, so in this sense you said that that kind of approach will
have a future (but with a more difficult optimization, as the hardware
is more complex).


I guess this is where the PGAS people are challenging things.
Applications can be couched in terms of array algorithms which can be
scattered across distributed memory systems. Inappropriate operations
lead to huge inefficiencies, but handled correctly, code runs very
fast.

About MPI, I think that many don't see what MPI really does: MPI
offers a simplified parallel model.
The main weakness of this model is that it assumes some kind of
reliability, but then it offers a clear computational model with
processors ordered in

Re: Thoughts on parallel programming?

2010-11-12 Thread Fawzi Mohamed

On 12-nov-10, at 00:29, Tobias Pfaff wrote:


[...]
Well, I am looking for an easy & efficient way to perform parallel
numerical calculations on our 4-8 core machines. With C++, that's
OpenMP (or GPGPU stuff using CUDA/OpenCL) for us now. Maybe
lightweight was the wrong word, what I meant is that OpenMP is easy
to use, and efficient for the problems we are solving. There
actually might be better tools for that; honestly we didn't look
into that many options -- we are no HPC guys, 1000-cpu clusters are
not a relevant scenario and we are happy that we even started
parallelizing our code at all :)


Anyways, I was thinking about the logical thing to use in D for this
scenario. It's nothing super-fancy, in most cases just a parallel_for
will do, and sometimes a map/reduce operation...


If you use D1, blip.parallel.smp offers that, and it does scale well
to 4-8 cores.
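
blip's API aside, what later became D2's std.parallelism covers exactly
this parallel_for / map/reduce style; a minimal sketch in today's D2
(names and sizes are illustrative):

import std.parallelism;
import std.range : iota;

void main()
{
    // parallel_for: iterations are distributed over a thread pool
    auto squares = new double[1_000];
    foreach (i; parallel(iota(squares.length)))
        squares[i] = cast(double) i * i;

    // map/reduce: parallel sum of squares
    auto total = taskPool.reduce!"a + b"(
        taskPool.amap!"a * a"(iota(1_000)));
}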




Re: Thoughts on parallel programming?

2010-11-11 Thread Fawzi Mohamed


On 11-nov-10, at 15:16, Fawzi Mohamed wrote:


On 11-nov-10, at 09:58, Russel Winder wrote:


MPI and all the SPMD approaches have a severely limited future, but I
bet the HPC codes are still using Fortran and MPI in 50 years time.


well whole array operations are a generalization of the SPMD
approach, so in this sense you said that that kind of approach will
have a future (but with a more difficult optimization as the
hardware is more complex).


sorry, I translated that as SIMD, not SPMD, but the answer below still
holds in my opinion: if one has a complex parallel problem MPI is a
worthy contender, the thing is that on many occasions one doesn't need
all its power.
If a client/server, a distributed, or a map/reduce approach works, then
simpler and more flexible solutions are superior.
That (and its reliability problem, which PGAS also shares) is, in my
opinion, the reason MPI is not much used outside the computational
community.
Being able to tackle also MPMD in a good way can be useful, and that
is what the rpc level does between computers, and the event based
scheduling within a single computer (ensuring that one processor can
do meaningful work while the other waits).


About MPI I think that many don't see what MPI really does: MPI
offers a simplified parallel model.
The main weakness of this model is that it assumes some kind of
reliability, but then it offers a clear computational model with
processors ordered in a linear or higher-dimensional structure and
efficient collective communication primitives.
Yes, MPI is not the right choice for all problems, but when usable it
is very powerful, often superior to the alternatives, and
programming with it is *simpler* than thinking about a generic
distributed system.
So I think that for problems that are not trivially parallel, or
easily parallelizable, MPI will remain the best choice.




Re: Thoughts on parallel programming?

2010-11-11 Thread Fawzi Mohamed


On 11-nov-10, at 15:16, Fawzi Mohamed wrote:


On 11-nov-10, at 09:58, Russel Winder wrote:


On Thu, 2010-11-11 at 02:24 +, jfd wrote:
Any thoughts on parallel programming?  I was looking at something
about Chapel and X10 languages etc. for parallelism, and it looks
interesting.  I know that it is still an area of active research, and
it is not yet (far from?) done, but does anyone have thoughts on this
as a future direction?  Thank you.


I just finished reading "Parallel Programmability and the Chapel  
Language" by Chamberlain, Callahan and Zima.

A very nice read, and an overview of several languages and approaches.
Still I stand by my earlier view: an MPI-like approach is more
flexible, but indeed having a nice parallel implementation of
distributed arrays (which on MPI one can have using Global Arrays, for
example) can be very useful.
I think that a language like D can hide these behind wrapper objects,
and reach for these objects (which are not the only ones present in a
complex parallel program) an expressivity similar to Chapel, using the
approach I have in blip.
A direct implementation might be more efficient on shared memory
machines though.


Re: Thoughts on parallel programming?

2010-11-11 Thread Fawzi Mohamed
[...] it makes sense to distribute these objects on all processors
or none; I find the robust partitioning and collective
communication primitives of MPI superior to PGAS.
With enough effort you probably can get everything also from PGAS, but
then you lose all its simplicity.



The summary of the summary is:  programmers will either be developing
parallelism systems or they will be unemployed.


The situation is not so dire: some problems are trivially parallel, or
can be solved with simple parallel patterns, and others don't need to
be solved in parallel, as the sequential solution is fast enough; but
I do agree that being able to develop parallel systems is increasingly
important.

In fact it is something that I like to do, and have thought about a lot.
I did program parallel systems, and out of my experience I tried to
build something to do parallel programs "the way it should be", or at
least the way I would like it to be ;)


The result is what I did with blip, http://dsource.org/projects/blip .
I don't think that (excluding some simple examples) fully automatic
(transparent) parallelization is really feasible.
At some point being parallel is more complex, and it puts an extra
burden on the programmer.
Still, it is possible to have several levels of parallelization, and if
you write a fully parallel program it should still be possible to
use it relatively efficiently locally, but a local program will not
automatically become fully parallel.


What I did is a basic SMP parallelization for programs with shared
memory.
This level tries to schedule independent recursive tasks using all
processors as efficiently as possible (using the topology detected by
libhwloc).
It leverages an event based framework (libev) to avoid blocking while
waiting for external tasks.
The ability to describe complex asynchronous processes can be very
useful also to work with GPUs.
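
Not blip's API, but the same independent-recursive-tasks idea can be
sketched with D2's std.parallelism (the cutoff constant and names are
illustrative):

import std.parallelism;

ulong fibSeq(uint n) { return n < 2 ? n : fibSeq(n - 1) + fibSeq(n - 2); }

ulong fib(uint n)
{
    if (n < 20)
        return fibSeq(n);           // sequential cutoff bounds task overhead
    auto left = task!fib(n - 1);    // one branch becomes an independent task
    taskPool.put(left);             // hand it to the worker threads
    auto right = fib(n - 2);        // recurse on the other branch in-line
    return left.yieldForce + right; // join: wait for the spawned branch
}

void main()
{
    import std.stdio : writeln;
    writeln(fib(35));
}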


MPI parallelization is part of the hierarchy of parallelizations; for
the reasons I described before, it is wrapped so that on a single
processor one can use a "pseudo" MPI.


rpc (remote procedure call), which might be better described as
distributed objects, offers a server that can respond to external
requests at any moment, and the possibility to publish objects that
will then be identified by URLs.
These URLs can be used to create local proxies that call the remote
object and get results from it.

This can be done using MPI, or sockets directly.
If one uses sockets he has the whole flexibility (but also the whole
complexity) of a fully distributed system.
The basic building blocks of this can also be used in a distributed
protocol like distributed hash tables.


blip is available now, and works with OS X and Linux. It should be
possible to port it to Windows (both libhwloc and libev work on
Windows), but I didn't do it.
It needs D1 and tango; tango trunk can be compiled using the scripts
in blip/buildTango, and then programs using blip can be compiled more
easily with the dbuild script (which uses xfbuild behind the scenes).

I planned to make an official release this weekend, but you can look
already now, the code is all there...


Fawzi

-
Dr. Fawzi Mohamed,  Office: 3'322
Humboldt-Universitaet zu Berlin, Institut fuer Chemie
Post:   Unter den Linden 6, 10099, Berlin
Besucher/Pakete:Brook-Taylor-Str. 2, 12489 Berlin
Tel: +49 30 20 93 7140  Fax: +49 30 2093 7136
-






Re: why a part of D community do not want go to D2 ?

2010-11-07 Thread Fawzi Mohamed

On 7-nov-10, at 13:05, Moritz Warning wrote:


On Sat, 06 Nov 2010 23:11:59 +, bioinfornatics wrote:


hello,
I have a question (I would like to understand): there are many
important people of the D community who do not want to go to D2, why?

thanks for answer


- D2 trades simplicity for more features
- D2 has a lot of compiler bugs


Personally I am not "against" D2; in fact there are several features
that I would like to use (like real closures and template
constraints), and some that could be useful but are not critical
(const/pure), but for me D1 works better.
It is just about wanting to do a project *now*: D1 offers more
guarantees, like several working compilers, good support for 64 bit,
and relatively stable libraries (tango).
I know that when looking for a new language one wants to go for the
latest and greatest, but I think that a largish project has more
chances of success in D1 at the moment.
It does not mean that it is not possible to use D2 successfully --
others here are doing that, and with the recent focus of W & co on
stabilizing the toolchain and the general interest in D2 the situation
might change, but it is not there yet.


Fawzi



Re: Temporary suspension of disbelief (invariant)

2010-10-27 Thread Fawzi Mohamed


On 27-ott-10, at 22:41, Walter Bright wrote:


Fawzi Mohamed wrote:
the only thing is that the invariant is not checked *after* those
methods that destroy the object; it would be ok to require a special
name for that method (for example destroy, dealloc or similar) if
you want to avoid misuse.


There is a method that does that, it's called the destructor. An  
invariant is not checked after the destructor is run.


ok, I don't think that this is a major feature, or anything like that,
so I will stop arguing after this,

but I still believe that it would be useful.
The destructor is an example of a method that should not check the
invariant, but it does not address my use case, because one cannot
explicitly call a destructor, and there is an important difference
between a destructor called by the GC and a destructor called
explicitly: the one called explicitly can access the memory of the
object.

Not only that, it might be useful for some methods to delete the object
itself.
A classical example is reference counting: the last release (the one
that sets the refCount to 0) should be able to delete the object.
It is not natural to make this impossible, or to force all
reference-counted objects to have no invariant.
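
A minimal D2 sketch of that reference-counting situation (all names
are illustrative, with object.destroy standing in for D1's delete):

class Resource
{
    int refCount;
    this() { refCount = 1; }
    invariant() { assert(refCount > 0); }  // checked in non-release builds

    void retain() { ++refCount; }

    void release()
    {
        if (--refCount == 0)
            destroy(this);  // finalizes and resets the object to its init
                            // state, so the automatic invariant check when
                            // release() returns fails: exactly the problem
                            // described above
    }
}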


Re: Temporary suspension of disbelief (invariant)

2010-10-27 Thread Fawzi Mohamed


On 27-ott-10, at 20:57, Walter Bright wrote:


Fawzi Mohamed wrote:
well for methods that destroy the object I think that not checking  
the invariant is something reasonable, and not a hole that destroys  
the whole system. Being able to write methods that invalidate the  
object can be useful IMHO.
At the moment in those objects I simply have no invariant, and by  
the way, no I don't think that having the invalidating methods  
always external is really a good solution.
You can do without, but sometime an @invalidate property (or  
similar) has its uses.


I agree that having a method that destroys an object can be useful.  
Such classes can't have an invariant, however.


The problem with an @invalidate property is the user has to go  
through every declaration in the class looking for such a property  
(grep isn't sufficient because it can pick up false positives  
(nested functions) and can miss things inserted by templates and  
mixins). An invariant should be a guarantee, not "this is guaranteed  
unless someone somewhere violates it, which is perfectly legal".


It's the same problem that C++ "mutable" produces, and it's not  
worth it.



the only thing is that the invariant is not checked *after* those
methods that destroy the object; it would be ok to require a special
name for that method (for example destroy, dealloc or similar) if you
want to avoid misuse.
The next method called on the object would fail (as it should); I
often store invalid info on purpose before deleting, to increase the
probability of detecting such usages.

The invariant is still valid (and checked) before and after all other
methods.
It is even impossible to call the destroy method twice (as the check
at the beginning of it would fail).

So I think that everything would stay perfectly well defined.
I think it is a different situation than mutable.


Re: Temporary suspension of disbelief (invariant)

2010-10-27 Thread Fawzi Mohamed


On 27-ott-10, at 20:20, Walter Bright wrote:


Fawzi Mohamed wrote:
An issue I encountered in D1 with invariant is using delete: I have  
a method that deletes the current object, and obviously then any  
invariant would fail.

It would be nice to have a way to disable invariant in that case.
In D2, as delete is not allowed anymore (if I got it correctly)  
this is not a problem.


It's not an invariant if it only works some of the time.

It's like C++ and the mutable keyword, which allows an escape from  
the const rules. Which makes C++ const fairly useless.


A person should be able to see the invariant in a class declaration,  
and know that it offers a guarantee. He should not be required to  
read over everything else in the source code looking for the absence  
of a method that disables it.



well for methods that destroy the object I think that not checking the
invariant is something reasonable, and not a hole that destroys the
whole system. Being able to write methods that invalidate the object
can be useful IMHO.
At the moment in those objects I simply have no invariant, and by the
way, no, I don't think that having the invalidating methods always be
external is really a good solution.
You can do without, but sometimes an @invalidate property (or similar)
has its uses.


Fawzi


Re: Temporary suspension of disbelief (invariant)

2010-10-27 Thread Fawzi Mohamed


On 27-ott-10, at 18:59, Jonathan M Davis wrote:


On Wednesday, October 27, 2010 07:33:58 Fawzi Mohamed wrote:

An issue I encountered in D1 with invariant is using delete: I have a
method that deletes the current object, and obviously then any
invariant would fail.
It would be nice to have a way to disable invariant in that case.
In D2, as delete is not allowed anymore (if I got it correctly) this
is not a problem.


Except that calling a function on a deleted object would not be
desirable, so having the invariant fail at that point would be a
_good_ thing. If anything, you'd want _all_ function calls on a
deleted object to fail, invariant or no invariant, because none of
them should be happening in the first place.

clear() in D2 does (or at least did - I don't know what its current
state is) put an object in its default state prior to any constructor
call, which would likely violate any invariant, which, on the whole,
is a good thing as well.
However, there was discussion of making a cleared object have a nuked
vtbl, which would be even better, since then all function calls on it
would fail, period.


- Jonathan M Davis
...except that clear() (just like my dealloc methods) will at the end
call the invariant...

*that* is what I was talking about,
:)
Fawzi


Re: Temporary suspension of disbelief (invariant)

2010-10-27 Thread Fawzi Mohamed
An issue I encountered in D1 with invariant is using delete: I have a  
method that deletes the current object, and obviously then any  
invariant would fail.

It would be nice to have a way to disable invariant in that case.
In D2, as delete is not allowed anymore (if I got it correctly) this  
is not a problem.


Fawzi
On 27-ott-10, at 15:08, Stanislav Blinov wrote:


27.10.2010 4:00, bearophile wrote:
I have not asked this in the D.learn newsgroup because I think it  
may be a bit interesting for other general people too.


In D contract programming the invariant is not called before/after  
private methods. I think that in some cases you may want to disable  
the invariant even in some public methods. If your class/struct has  
a public method that invalidates the state of the class instance,  
and one public method that fixes the class instance state (I am  
thinking about certain data structures where in a first phase you  
may add many items, and then you ask the data structure to clean up  
itself and become coherent. This may avoid some computations), I  
have found this way to implement it:
If I get this right, then it is by design that your class may have
several general logical states: e.g. "initializing" and "coherent".
Given this, I don't see why you'd want to disable invariant checks
rather than modify those checks to validate the current logical
state. In fact, that "ghost" field simply serves as a flag for the
invariant, showing which logical state it should enforce. The fact
that states are 'logical', i.e. different while represented by the
same physical fields, doesn't always rule them out as, uh, class
states: you could as well have two separate inner classes that
perform initialization and polishing, each with its own invariant.
Then you could use those inner classes' private methods (without
triggering their invariant checks), but in the main class' invariant
perform an assert on them to ensure their state is valid.
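
A minimal D2 sketch of that flag-steered invariant (type and field
names are illustrative):

class Builder
{
    enum State { building, coherent }
    State state = State.building;   // the "ghost" flag
    int[] items;
    bool sorted;

    invariant()
    {
        // the flag selects which guarantee is enforced
        final switch (state)
        {
            case State.building: break;           // nothing promised yet
            case State.coherent: assert(sorted);  // full guarantee
        }
    }

    void add(int x) { items ~= x; sorted = false; }

    void finish()   // "clean up and become coherent"
    {
        import std.algorithm.sorting : sort;
        sort(items);
        sorted = true;
        state = State.coherent;
    }
}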


IMHO, proper invariants should be strict and act by the numbers at  
all times, otherwise there's little to no gain in using them at all.




Re: Possible bug in atomicOp

2010-10-26 Thread Fawzi Mohamed


On 26-ott-10, at 05:13, Sean Kelly wrote:


Don Wrote:


Wait a minute. x86 has no read-modify-write instructions for x87, or
for SSE. So I don't think it's possible to implement atomic
floating-point ops, apart from assignment.
(Of course, you can fake it with a mass of casts and a CAS, but that
doesn't seem helpful).
It should just be prevented, except possibly for assignment. (Is an
80-bit float assignment atomic? Maybe not, since it would need
special logic).


atomicOp does most of its RMW operations using a CAS loop, so I
think it should work.  The retry occurs when the memory location
being written to has changed since it was read, and that shouldn't
be any different for floating point values vs. integers.



I use atomic ops casting pointers to integer pointers
(http://github.com/fawzi/blip/blob/master/blip/sync/Atomic.d, which is
also the tango one), and I haven't encountered any problem yet.
I haven't checked in detail whether some x87 or SSE load/store might
potentially give problems, but as Sean says they should not: the value
should be transferred from register to register, not going through
memory; memory accesses are controlled by CAS, and memory barriers (if
used to separate subsequent "normal" ops) should be valid also for
x87/SSE.
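
For reference, the CAS-loop strategy looks like this with D2's
core.atomic (a sketch; atomicAdd is a made-up name, not a druntime API):

import core.atomic;

// read-modify-write on a shared float via a compare-and-swap loop
void atomicAdd(ref shared float x, float delta)
{
    float oldVal, newVal;
    do
    {
        oldVal = atomicLoad(x);   // snapshot the current value
        newVal = oldVal + delta;  // compute purely in registers
    }
    while (!cas(&x, oldVal, newVal));  // retry if x changed meanwhile
}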


Fawzi


Re: Q: What are the rules for emitting template code?

2010-10-25 Thread Fawzi Mohamed


On 25-ott-10, at 18:04, Steven Schveighoffer wrote:

On Sun, 24 Oct 2010 00:06:23 -0400, Austin Hastings wrote:



Howdy,

This is a bit involved, so bear with me.

Suppose I have a template, Decider(Args...) and some other  
templates, Option1(...), Option2(...), etc.


The job of Decider() is to decide, based on the given parameters,  
which of the possible OptionN templates to use.


Decider uses "static if" and some builtins and maybe some CTFE to  
determine its result.


Now:

How much can Decider ask of one of the Option templates without  
that template being expensively realized?


Alternatively:

What parts of an Option template have to be realized for Decider to  
do its job?


In particular:

If Decider uses Option1.sizeof, does any Option1 code get emitted?

If Decider uses some external function that makes use of type
aliases in Option1 (example: Option1() { alias byte value_t; }),
does any Option1 code get emitted?


If Decider uses some function defined inside the same module with  
Option1, but NOT inside of Option1, does any/all of the Option1  
code get emitted?


If Decider uses a static method of Option1, does any more of the  
Option1 code get emitted?




Obviously, I am trying to ... decide ... how to do compile time
selection. But I'm also just a tad curious about the internals of the
template engine.


One of the major problems with the template system IMO is that
compile-time templates (that is, templates that are only used at
compile time) are emitted into the executable, even though they are
not used.


Take for example isForwardRange!R.  A function like this:

void foo(R) if (isForwardRange!R)

is going to instantiate isForwardRange!R, which may instantiate
other templates to check whether isForwardRange is true.  But all
these things end up in the executable, even though they aren't used.


Now, if you are concerned about executable footprint, this problem I  
think will eventually be solved (not sure if there is a bug report,  
but I think I've brought it up before, and the consensus is that it  
should not end up in the exe).  If you are concerned that the  
runtime of the *compiler* might be too long, then I'm afraid you are  
just going to have to deal with it.  Everything in a compiled  
language is focused first on the resulting executable.  It's  
perfectly normal for a compiler to take extra time compiling to make  
the executable more efficient.


-Steve
I find that having no clear rule for where a template is emitted when
compiling several files at once is a larger problem.

I would much prefer to have something that follows rules like these.
Sort the modules by their imports:
- if a imports b and b does not import a: a > b
- if b imports a and a does not import b: b > a
- if a imports b and b imports a: order a, b using lexicographic
  ordering of their module names
- otherwise: neutral

When emitting a, don't emit any template that was instantiated by
imported modules that come before a, and *only* those: emit all
remaining instantiations that are required by a (a rough sketch of
the ordering is below).

Yes this is more complex, and emits a bit more than now, but it would
make incremental compilation using several files at once *so much*
easier.
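
A rough D sketch of that ordering rule (the Module type and its fields
are stand-ins for illustration, not compiler internals):

import std.algorithm.comparison : cmp;
import std.algorithm.searching : canFind;

struct Module
{
    string name;
    string[] imports;  // names of directly imported modules
}

// negative: a before b; positive: a after b; zero: no constraint
int emitOrder(const Module a, const Module b)
{
    immutable aImpB = a.imports.canFind(b.name);
    immutable bImpA = b.imports.canFind(a.name);
    if (aImpB && !bImpA) return 1;                  // a > b
    if (bImpA && !aImpB) return -1;                 // b > a
    if (aImpB && bImpA) return cmp(a.name, b.name); // mutual import: tiebreak
    return 0;                                       // unrelated: neutral
}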


Fawzi


Re: Linux Agora D thread

2010-10-22 Thread Fawzi Mohamed


On 22-ott-10, at 10:56, retard wrote:


[...]
What annoys me the most in pro-D articles is the author usually tries
to prove (in a naive way) that despite all the deficiencies the
language and tool chain is better even *now* than all of the
competition, or that the *potential* is so high that the only logical
conclusion is to move to D *now*. Clearly this isn't the case. These
kinds of articles give people the wrong impression. I'm just trying to
bring up the *pragmatic* point of view.

For instance, I'm starting the implementation of a 64-bit
systems/application programming project *now*. The implementation
phase will last N months (assume an optimistic waterfall process model
here). How many weeks/months must the N at least be to make D a
feasible option?


D1/tango is feasible now (using ldc)

A typical lead developer / project manager has to make decisions based
on some assumptions. E.g.

Platform        Implementation  Developer   Performance  Platform
                time            market      index        risk factor
---------------------------------------------------------------------
C/x64 Linux     12 months       good        100          medium
C++/x64 Linux   10 months       ok          110          high
Java/x64 JVM    8 months        excellent   80           low
C#/Windows 64   7 months        very good   85           low
Python/Linux    4-5 months      very good   30           low
D               12+ months?     very bad    80-115 ?     very high

The metrics are imaginary. The point was to show that language
goodness isn't a single scalar value.

Why I think the D platform's risk is so high is because the author
constantly refuses to give ANY estimates on feature schedules. There's
no up-to-date roadmap anywhere. The bugzilla voting system doesn't
work. Lots of production-ready core functionality is missing (for
example, how long has the D2 distribution had a commercial quality xml
framework?)


D1/tango also has a good xml parser


For example gcc has had 64-bit C/C++ support for quite a long time.
But it took several years to stabilize. The implementation of 64-bit
X-ray machine firmware in D cannot begin one week after 64-bit DMD is
announced.




Re: approxEqual() has fooled me for a long time...

2010-10-20 Thread Fawzi Mohamed


On 20-ott-10, at 23:28, Don wrote:


Fawzi Mohamed wrote:

On 20-ott-10, at 20:53, Don wrote:

Andrei Alexandrescu wrote:

On 10/20/10 10:52 CDT, Don wrote:
I don't think it's possible to have a sensible default for absolute
tolerance, because you never know what scale is important. You can do
a default for relative tolerance, because floating point numbers work
that way (eg, you can say they're equal if they differ in only the
last 4 bits, or if half of the mantissa bits are equal).

I would even think that the acceptable relative error is almost
always known at compile time, but the absolute error may not be.
I wonder if it could work to set either number, if zero, to the
smallest normalized value. Then proceed with the feqrel algorithm.
Would that work?

Andrei


feqrel actually treats zero fairly. There are exactly as many  
possible values almost equal to zero, as there are near any other  
number.
So in terms of the floating point number representation, the  
behaviour is perfect.


Thinking out loud here...

I think that you use absolute error to deal with the difference
between the computer's representation and the real world. You're
almost pretending that they are fixed point numbers.
Pretty much any real-world data set has a characteristic magnitude,
and anything which is more than (say) 10^^50 times smaller than the
average is probably equivalent to zero.
The thing is twofold: on the one hand, yes, numbers 10^^50 smaller
are not important, but the real problem is another: you will
probably add and subtract numbers of magnitude x, and on this
operation the *absolute* error is x*epsilon.
Note that the error is relative to the magnitude of the operands,
not of the result; it is really an absolute error.


You have just lost precision.
BTW -- I haven't yet worked out if we are disagreeing with each  
other, or not.


eheh, I think we both know the problems, and it is just a matter of
the kind of tests we do more often.
feqrel is a very important primitive, and it should be available; that
is what should have been used by Lars.
I often happen to test things where the result is a position, an
energy difference,... and in those cases I have a magnitude, and so
implicitly also an absolute error.
In any case all this discussion was useful, as it made me improve my
code by making the magnitude an explicit argument.


Fawzi



Re: approxEqual() has fooled me for a long time...

2010-10-20 Thread Fawzi Mohamed


On 20-ott-10, at 20:53, Don wrote:


Andrei Alexandrescu wrote:

On 10/20/10 10:52 CDT, Don wrote:

I don't think it's possible to have a sensible default for absolute
tolerance, because you never know what scale is important. You can do
a default for relative tolerance, because floating point numbers work
that way (eg, you can say they're equal if they differ in only the
last 4 bits, or if half of the mantissa bits are equal).

I would even think that the acceptable relative error is almost
always known at compile time, but the absolute error may not be.
I wonder if it could work to set either number, if zero, to the
smallest normalized value. Then proceed with the feqrel algorithm.
Would that work?

Andrei


feqrel actually treats zero fairly. There are exactly as many  
possible values almost equal to zero, as there are near any other  
number.
So in terms of the floating point number representation, the  
behaviour is perfect.


Thinking out loud here...

I think that you use absolute error to deal with the difference
between the computer's representation and the real world. You're
almost pretending that they are fixed point numbers.
Pretty much any real-world data set has a characteristic magnitude,
and anything which is more than (say) 10^^50 times smaller than the
average is probably equivalent to zero.


The thing is twofold: on the one hand, yes, numbers 10^^50 smaller are
not important, but the real problem is another: you will probably add
and subtract numbers of magnitude x, and on this operation the
*absolute* error is x*epsilon.

Note that the error is relative to the magnitude of the operands, not
of the result; it is really an absolute error.
Now the end result might have a relative error, but also an absolute
error whose size depends on the magnitude of the operands.
If the result is close to 0 the absolute error is likely to dominate,
and checking the relative error will fail.

This is the case for example for matrix multiplication.
In NArray I wanted to check the linear algebra routines with matrices
of random numbers; feqrel did fail too much for numbers close to 0.
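
A tiny D2 demonstration of why the relative check fails near 0 (values
chosen only to illustrate; runtime variables dodge constant folding):

import std.math : feqrel;
import std.stdio : writeln;

void main()
{
    double one = 1.0;
    double tiny = 3e-16;
    // a difference of magnitude-1 numbers: the absolute error stays
    // below double.epsilon, but relative to the tiny result it is huge
    double a = (one + tiny) - one;  // rounds to ~2.22e-16
    double b = tiny;                // the exact answer
    writeln(feqrel(a, b));          // only a bit or two agree
}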


Obviously the right thing, as Walter said, is to let the user choose
the magnitude of his results.
In the code I posted I simply chose 0.5^^(T.mant_dig/4), which is
smaller than 1 but not horribly so.
One can easily make that an input parameter (it is the shift parameter
in my code).


Fawzi


Re: approxEqual() has fooled me for a long time...

2010-10-20 Thread Fawzi Mohamed


On 20-ott-10, at 17:52, Don wrote:


Andrei Alexandrescu wrote:

On 10/20/10 5:32 CDT, Lars T. Kyllingstad wrote:
(This message was originally meant for the Phobos mailing list, but
for some reason I am currently unable to send messages to it*.
Anyway, it's probably worth making others aware of this as well.)

In my code, and in unittests in particular, I use
std.math.approxEqual() a lot to check the results of various
computations.  If I expect my result to be correct to within ten
significant digits, say, I'd write

  assert (approxEqual(result, expected, 1e-10));

Since results often span several orders of magnitude, I usually don't
care about the absolute error, so I just leave it unspecified.  So
far, so good, right?

NO!

I just discovered today that the default value for approxEqual's
absolute tolerance is 1e-5, and not zero as one would expect.  This
means that the following, quite unexpectedly, succeeds:

  assert (approxEqual(1e-10, 1e-20, 0.1));

This seems completely illogical to me, and I think it should be fixed
ASAP.  Any objections?

I wonder what would be a sensible default. If the default for
absolute error is zero, then you'd have an equally odd behavior for
very small numbers (and most notably zero). Essentially nothing
would be approximately zero.

Andrei


I don't think it's possible to have a sensible default for absolute  
tolerance, because you never know what scale is important. You can  
do a default for relative tolerance, because floating point numbers  
work that way (eg, you can say they're equal if they differ in only  
the last 4 bits, or if half of the mantissa bits are equal).


I would even think that the acceptable relative error is almost  
always known at compile time, but the absolute error may not be.


I had success in using (the very empirical)

/// feqrel version more forgiving close to 0.
/// If you sum values you cannot expect better than T.epsilon absolute error.
/// feqrel compares relative error, and close to 0 (where the density
/// of floats is high) it is much more stringent.
/// To guarantee T.epsilon absolute error one should use shift=1.0; here
/// we are more stringent and use T.mant_dig/4 digits more when close to 0.
int feqrel2(T)(T x, T y) {
    static if (isComplexType!(T)) {
        return min(feqrel2(x.re, y.re), feqrel2(x.im, y.im));
    } else {
        const T shift = ctfe_powI(0.5, T.mant_dig/4);
        if (x < 0) {
            return feqrel(x - shift, y - shift);
        } else {
            return feqrel(x + shift, y + shift);
        }
    }
}

(from blip.narray.NArrayBasicOps)
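
Usage would then look like this (a sketch relying on feqrel2 above;
the half-mantissa threshold is an arbitrary choice):

// accept results that agree to at least half of the mantissa bits
bool closeEnough(T)(T computed, T expected) {
    return feqrel2(computed, expected) > T.mant_dig / 2;
}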


Re: approxEqual() has fooled me for a long time...

2010-10-20 Thread Fawzi Mohamed


On 20-ott-10, at 13:59, Lars T. Kyllingstad wrote:


On Wed, 20 Oct 2010 13:33:49 +0200, Fawzi Mohamed wrote:


On 20-ott-10, at 13:18, Lars T. Kyllingstad wrote:


[...]
However, I, like most people, am a lot more used to thinking in terms
of digits than bits.  If I need my results to be correct to within 10
significant digits, say, how (if possible) would I use feqrel() to
ensure that?

feqrel(a,b)>33 // 10*log(10)/log(2)


...which would be the solution of

2^bits = 10^digits,

I guess.  Man, I've got to sit down and learn some more about FP
numbers one day.


yes, floating point uses base 2 numbers: mantissa + exponent, both
base 2.
feqrel gives you how many bits (i.e. base 2 digits) are equal in the
two numbers.

2^33=8589934592, which has 10 digits.
2^34 already has 11 digits, so having more than 33 binary digits in
common means having more than 10 base-10 digits in common.
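
Wrapped up as a helper, the rule of thumb looks like this (a sketch;
the function name agreeToDigits is made up):

import std.math : feqrel, LOG2T;  // LOG2T = log2(10)

/// true when a and b agree to at least `digits` significant decimal digits
bool agreeToDigits(double a, double b, int digits)
{
    // bits = digits * log2(10); e.g. 10 digits -> more than 33 bits
    return feqrel(a, b) > cast(int)(digits * LOG2T);
}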


Thanks!


you are welcome

Fawzi


-Lars






