On Thu, 2015-10-15 at 06:48 +0000, data pulverizer via Digitalmars-d-learn wrote:
> […]
> A journey of a thousand miles ...
Exactly.

> I tried to start creating a data table type object by investigating
> variantArray:
> http://forum.dlang.org/thread/hhzavwrkbrkjzfohc...@forum.dlang.org
> but hit the snag that D is a static programming language and may not
> allow the kind of behaviour you need for creating data table-like
> objects.
>
> I envisage such an object as being composed of arrays of vectors,
> where each vector represents a column in a table, as in R - easier
> for model matrix creation. Some people believe that you should work
> with arrays of tuple rows - which may be more big-data friendly. I am
> not overly wedded to either approach.
>
> Anyway, it seems I have hit an inherent limitation in the language.
> Correct me if I am wrong. The data frame needs to have dynamic
> behaviour: bind rows and columns, return parts of itself as a data
> table, etc., and since D is a static language we cannot do this.

Just because D doesn't have this now doesn't mean it cannot. C doesn't
have such capability, but R and Python do, even though R and CPython are
just C code. Pandas' data structures rely on the NumPy n-dimensional
array implementation; it is not beyond the bounds of possibility that
that data structure could be realized as a D module. Is R's data.table
written in R or in C? In either case, it is not beyond the bounds of
possibility that it too could be realized as a D module.

The core issue is to have a seriously efficient n-dimensional array that
is amenable to data parallelism and is extensible. As far as I am aware
currently (I will investigate more), the NumPy array is a good
native-code array, but it has some issues with data parallelism, and
Pandas has to do quite a lot of work to get the extensibility. I wonder
how the R data.table works. I have this nagging feeling that, like
NumPy, data.table is not as good as it could be.
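To make the "static language" concern concrete: a minimal sketch of the variantArray idea from the quoted thread, using Phobos's std.variant.Variant to hold heterogeneously typed columns keyed by name. The Frame type and its method names here are hypothetical illustrations, not an existing library; a production data frame would need far more (row binding, slicing, efficient storage), as discussed below.

```d
import std.variant : Variant;
import std.stdio : writeln;

// Hypothetical minimal column store: each column is a typed dynamic
// array wrapped in a Variant, so columns of different element types
// can live in one associative array.
struct Frame
{
    Variant[string] columns;

    // Add (or replace) a column of any element type T.
    void bindColumn(T)(string name, T[] data)
    {
        columns[name] = Variant(data);
    }

    // Retrieve a column, asserting its element type at the call site.
    T[] column(T)(string name)
    {
        return columns[name].get!(T[]);
    }
}

void main()
{
    Frame f;
    f.bindColumn("id", [1, 2, 3]);
    f.bindColumn("score", [0.5, 0.25, 0.125]);
    writeln(f.column!int("id"));       // [1, 2, 3]
    writeln(f.column!double("score")); // [0.5, 0.25, 0.125]
}
```

The cost of this approach is that the element type must be named again on retrieval (and a wrong `get!` throws at run time), which is exactly the static-versus-dynamic tension raised above; templates and compile-time reflection can hide much of that boilerplate.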
From small experiments, D is (and Chapel is even more so) hugely faster
than Python/NumPy at the things Python people think NumPy is brilliant
for. Expectations of Python programmers are set by the scale of Python
performance, so NumPy seems brilliant. Compared to the scale set by D
and Chapel, NumPy is very disappointing. I bet the same is true of R (I
have never really used R).

This is therefore an opportunity for D to step in. However, it is a
journey of a thousand miles to get something production worthy.
Python/NumPy/Pandas have had a very large number of programmer hours
expended on them. Doing this poorly as a D module is likely worse than
not doing it at all.

-- 
Russel.
=============================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.win...@ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: rus...@winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder