On Wednesday, 14 October 2015 at 22:11:56 UTC, data pulverizer
wrote:
On Tuesday, 13 October 2015 at 23:26:14 UTC, Laeeth Isharc
wrote:
https://www.quora.com/Why-is-Python-so-popular-despite-being-so-slow
Andrei suggested posting more widely.
I am coming at D by way of R, C++, Python etc. so I speak as a
statistician who is interested in data science applications.
Welcome... Looks like we have similar interests.
On the deployment side, D needs to grow its big
data/noSQL infrastructure for a start, then hook into a whole
ecosystem of analytic tools in an easy and straightforward
manner. This will take a lot of work!
Indeed. The dlangscience project managed by John Colvin is very
interesting. It is not a pure stats project, but there will be
many shared areas of need. He has some very interesting ideas, and
being able to mix Python and D in a Jupyter notebook is rather
nice (you can do this already).
I believe it is easier and more effective to start on the
research side. D will need:
1. A data table structure like R's data.frame or data.table.
This is a dynamic data structure that represents a table that
can have lots of operations applied to it. It is the data
structure that separates R from most programming languages. It
is what pandas tries to emulate. This includes text-file and
database I/O, from MySQL and ODBC for a start.
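For what it's worth, the core of such a table fits in a few lines. This is a toy Python sketch of the column-oriented idea only (the class and method names are made up for illustration, not a proposal for a D API):

```python
# Toy sketch of a column-oriented data table, in the spirit of
# R's data.frame. All names here are hypothetical illustrations.
class DataTable:
    def __init__(self, **columns):
        self.columns = {name: list(vals) for name, vals in columns.items()}
        lengths = {len(v) for v in self.columns.values()}
        assert len(lengths) <= 1, "columns must share one length"

    def nrow(self):
        return len(next(iter(self.columns.values()), []))

    def where(self, pred):
        # Row-wise filter, roughly df[pred(df), ] in R.
        keep = [i for i in range(self.nrow())
                if pred({k: v[i] for k, v in self.columns.items()})]
        return DataTable(**{k: [v[i] for i in keep]
                            for k, v in self.columns.items()})

tbl = DataTable(x=[1, 2, 3, 4], y=[10.0, 20.0, 30.0, 40.0])
big = tbl.where(lambda row: row["x"] > 2)
```

The real design question is everything this sketch dodges: heterogeneous column types, views versus copies, and fast grouped operations.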
I fully agree, and have made a very simple start on this. See
github. It's usable for my needs as they stand, although far from
production ready or elegant. You can read and write to/from CSV
and HDF5. I guess MySQL and ODBC wouldn't be hard to add, but I
don't need them myself for now and won't have time to do them. If
I have space I may channel some resources in that direction some
time next year.
2. Formula class : the ability to talk about statistical models
using formulas e.g. y ~ x1 + x2 + x3 etc and then use these
formulas to generate model matrices for input into statistical
algorithms.
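To make the idea concrete, here is a toy Python sketch of the formula-to-model-matrix step. The parser handles only simple `y ~ x1 + x2`-style formulas, and every name here is hypothetical:

```python
def model_matrix(formula, data):
    # Parse a minimal "y ~ x1 + x2" formula (toy sketch only).
    lhs, rhs = [s.strip() for s in formula.split("~")]
    terms = [t.strip() for t in rhs.split("+")]
    n = len(data[lhs])
    # Intercept column plus one column per term, as R's
    # model.matrix() would produce for numeric predictors.
    X = [[1.0] + [float(data[t][i]) for t in terms] for i in range(n)]
    y = [float(v) for v in data[lhs]]
    return y, X

data = {"y": [1, 2], "x1": [3, 4], "x2": [5, 6]}
y, X = model_matrix("y ~ x1 + x2", data)
```

The hard parts a real implementation needs are interactions (`x1:x2`), transformations (`log(x1)`), and expanding factors into contrast columns.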
Sounds interesting. Take a look at Colvin's dlangscience draft
white paper, and see what you would add. It's a chance to shape
things whilst they are still fluid.
3. A solid interface to a big-data database that allows easy
movement between a D data table and the database.
Which ones do you have in mind for stats? The different choices
seem to serve quite different needs. And when you say big data,
how big do you typically mean?
4. Functional programming: especially around data table and
array structures. R's apply(), lapply(), tapply(), plyr and now
data.table(,, by = list()) provides powerful tools for data
manipulation.
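As a point of reference for what these primitives do, R's tapply (apply a function to values within groups) can be sketched in a few lines of Python (names hypothetical):

```python
from collections import defaultdict

def tapply(values, groups, fn):
    # Bucket `values` by the parallel `groups` vector, then apply
    # `fn` per group -- roughly R's tapply(values, groups, fn).
    buckets = defaultdict(list)
    for v, g in zip(values, groups):
        buckets[g].append(v)
    return {g: fn(vs) for g, vs in buckets.items()}

means = tapply([1, 2, 3, 4], ["a", "a", "b", "b"],
               lambda xs: sum(xs) / len(xs))
```

data.table's `by = list()` is essentially this, fused with column selection and done in-place for speed.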
Any thoughts on what the design should look like?
To an extent there is a balance between wanting to explore data
iteratively (when you don't know where you will end up), and
wanting to build a robust process for production. I have been
wondering myself about using LuaJIT to strap together D building
blocks for the exploration (and calling it based on a custom
console built around Adam Ruppe's terminal).
5. A factor data type: for categorical variables. This is easy
to implement! This ties into the creation of model matrices.
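For illustration, a factor and its dummy (treatment-contrast) encoding can be sketched in Python like so (all names are hypothetical, not a design proposal):

```python
def factor(values):
    # A factor stores the distinct sorted levels plus integer
    # codes into that level set, as R's factor() does.
    levels = sorted(set(values))
    codes = [levels.index(v) for v in values]
    return levels, codes

def dummy_columns(values):
    # Treatment-contrast dummies: one 0/1 column per non-baseline
    # level. This is how a factor enters a model matrix.
    levels, codes = factor(values)
    return [[1.0 if c == j else 0.0 for c in codes]
            for j in range(1, len(levels))]

levels, codes = factor(["b", "a", "b"])
```

Storing codes rather than strings is also what makes grouped operations on categorical columns cheap.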
6. Nullable types makes talking about missing data more
straightforward and gives you the opportunity to code them into
a set value in your analysis. D is streets ahead of Python
here, but this is built into R at a basic level.
So matrices with nullable types within? Is NaN enough for you?
If not, it could be quite expensive if the back end is C.
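A small Python sketch of the trade-off being discussed: NaN works as a missing-value marker for floating-point columns, but integer columns need an explicit validity mask or sentinel (this is an illustration, not a design proposal):

```python
import math

# NaN marks missingness in a float column for free.
float_col = [1.0, float("nan"), 3.0]
present = [v for v in float_col if not math.isnan(v)]

# Integers have no NaN, so one option is to pair the data with a
# validity mask -- 0 below is a real value, not "missing".
int_col = [1, 0, 3]
mask = [True, False, True]  # False marks a missing entry
valid = [v for v, ok in zip(int_col, mask) if ok]
```

The mask costs one extra bit (or byte) per element, which is the "quite expensive" part if the backing arrays are handed to a C library that knows nothing about it.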
If D can get points 1, 2, and 3, many people would be all over D,
because it is a fantastic programming language and is wicked
fast.
What do you like best about it ? And in your own domain, what
have the biggest payoffs been in practice?