On Thu, 2015-10-15 at 06:48 +0000, data pulverizer via Digitalmars-d-
learn wrote:
> A journey of a thousand miles ...


> I tried to start creating a data table type object by 
> investigating variantArray: 
> http://forum.dlang.org/thread/hhzavwrkbrkjzfohc...@forum.dlang.org
> but hit the snag that D is a static programming language and may not
> allow the kind of dynamic behaviour you need for creating data
> table-like objects.
> I envisage such an object as being composed of arrays of vectors 
> where each vector represents a column in a table as in R - easier 
> for model matrix creation. Some people believe that you should 
> work with arrays of tuple rows - which may be more big data 
> friendly. I am not overly wedded to either approach.
> Anyway it seems I have hit an inherent limitation in the 
> language. Correct me if I am wrong. The data frame needs to have 
> dynamic behaviour (bind rows and columns, return parts of 
> itself as a data table, etc.) and since D is a static language we 
> cannot do this.

Just because D doesn't have this now doesn't mean it cannot. C has no
such capability built in, yet R and Python provide it, even though R
and CPython are themselves written in C.
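
As a minimal sketch of the point above (not an existing library, and
not how Pandas or data.table do it): a column-oriented table whose
column element types are chosen at run time can be written in
statically typed D using Algebraic from std.variant. The names
DataTable, addColumn, rows and column are hypothetical, and the set of
allowed column types is an assumption made for illustration.

```d
import std.algorithm.searching : countUntil;
import std.exception : enforce;
import std.stdio : writeln;
import std.variant : Algebraic;

// One column is a homogeneous array; which array type it holds is
// decided at run time. The three allowed types are an assumption.
alias Column = Algebraic!(double[], long[], string[]);

// Hypothetical column-oriented table: named columns of equal length.
struct DataTable
{
    string[] names;
    Column[] columns;
    size_t nrows;

    // Bind a new column, checking its length against the table.
    void addColumn(T)(string name, T[] values)
    {
        enforce(columns.length == 0 || values.length == nrows,
                "column length mismatch");
        nrows = values.length;
        names ~= name;
        columns ~= Column(values);
    }

    // Return rows lo .. hi (hi exclusive) as a new table.
    DataTable rows(size_t lo, size_t hi)
    {
        enforce(lo <= hi && hi <= nrows, "row range out of bounds");
        DataTable result;
        result.names = names.dup;
        result.nrows = hi - lo;
        foreach (ref col; columns)
        {
            // peek returns a pointer to the payload, or null if the
            // column currently holds a different type.
            if (auto pd = col.peek!(double[]))
                result.columns ~= Column((*pd)[lo .. hi]);
            else if (auto pl = col.peek!(long[]))
                result.columns ~= Column((*pl)[lo .. hi]);
            else if (auto ps = col.peek!(string[]))
                result.columns ~= Column((*ps)[lo .. hi]);
        }
        return result;
    }

    // Typed access to a column; get throws if T does not match the
    // type stored at run time.
    T[] column(T)(string name)
    {
        auto i = names.countUntil(name);
        enforce(i >= 0, "no such column: " ~ name);
        return columns[i].get!(T[]);
    }
}

void main()
{
    DataTable t;
    t.addColumn("height", [1.5, 1.7, 1.6]);
    t.addColumn("name", ["ann", "bob", "cy"]);

    auto part = t.rows(1, 3);
    writeln(part.names);
    writeln(part.column!double("height"));
    writeln(part.column!string("name"));
}
```

The static type system is still in play here: the set of permitted
column types is fixed at compile time, and only the choice among them
is deferred to run time. Whether that is dynamic enough for a real
data frame is exactly the open question.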

Pandas data structures rely on the NumPy n-dimensional array
implementation; it is not beyond the bounds of possibility that that
data structure could be realized as a D module.

Is R's data.table written in R or in C? In either case, it is not
beyond the bounds of possibility that that data structure could be
realized as a D module.

The core issue is to have a seriously efficient n-dimensional array
that is amenable to data parallelism and is extensible. As far as I am
aware currently (I will investigate more) the NumPy array is a good
native code array, but has some issues with data parallelism and Pandas
has to do quite a lot of work to get the extensibility. I wonder how
the R data.table works.
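
To make "amenable to data parallelism" a little more concrete, here is
a rough sketch, assuming a two-dimensional array stored row-major in a
single flat buffer, with a row reduction spread over threads via
std.parallelism. Matrix and rowSums are hypothetical names; this says
nothing about how NumPy or data.table are implemented, and a serious
n-dimensional array would need strides, views and much more.

```d
import std.algorithm.iteration : sum;
import std.parallelism : parallel;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical 2-D array: one flat buffer in row-major order.
struct Matrix
{
    double[] data;
    size_t nrows, ncols;

    this(size_t nrows, size_t ncols)
    {
        this.nrows = nrows;
        this.ncols = ncols;
        data = new double[nrows * ncols];
        data[] = 0.0; // double.init is NaN, so zero explicitly
    }

    // Element access by (row, column).
    double opIndex(size_t r, size_t c) const
    {
        return data[r * ncols + c];
    }

    // A row is a contiguous slice of the buffer, so no copy is made.
    double[] row(size_t r)
    {
        return data[r * ncols .. (r + 1) * ncols];
    }
}

// Sum each row; rows are independent, so each iteration may run on a
// different worker thread and writes only its own result slot.
double[] rowSums(Matrix m)
{
    auto result = new double[m.nrows];
    foreach (r; parallel(iota(m.nrows)))
        result[r] = m.row(r).sum;
    return result;
}

void main()
{
    auto m = Matrix(2, 3);
    m.row(0)[] = [1.0, 2.0, 3.0];
    m.row(1)[] = [4.0, 5.0, 6.0];
    writeln(rowSums(m));
}
```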

I have this nagging feeling that, like NumPy, data.table seems a lot
better than it really is. In small experiments D (and Chapel even more
so) is hugely faster than Python/NumPy at things Python people think
NumPy is brilliant for. Expectations of Python programmers are set by
the scale of Python performance, so NumPy seems brilliant. Compared to
the scale set by D and Chapel, NumPy is very disappointing. I bet the
same is true of R (I have never really used R).

This is therefore an opportunity for D to step in. However it is a
journey of a thousand miles to get something production worthy.
Python/NumPy/Pandas have had a very large number of programmer hours
expended on them. Doing this poorly as a D module is likely worse
than not doing it at all.

Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.win...@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: rus...@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
