My point is that the contemporary Data Science stack uses too many different languages, all the way from scripting (R, Python) to statically compiled C/C++ and sometimes Fortran (parts of R and some SciPy algorithms are Fortran), and even JVM-based Scala. This creates artificial barriers: data scientists play the Python/R game but struggle with Scala, while software engineers write pipelines in Spark/Scala but have no interest in R. Deploying to production often requires recoding from one language to another. I hope that as the field matures there will be more consolidation and unification across the language zoo. Language barriers in scientifically heavy fields are not healthy. In statistics, Python's statsmodels is a pale shadow of R's CRAN. The science community is split along language lines, which spreads already thin resources even further. --Leo
On Tuesday, July 16, 2019 at 4:45:39 PM UTC-4, Jesper Louis Andersen wrote:
> On Tue, Jul 16, 2019 at 7:18 PM Slonik Az <slon...@gmail.com> wrote:
>
>> A REPL in a statically AOT-compiled language is hard, yet Swift somehow managed to implement it.
>
> I must disagree. The technique is somewhat well known and has a long history; see e.g. various Common Lisp and Standard ML implementations. If you are willing to accept a hybrid of a byte-code interpreter with a native code compiler at your disposal, then OCaml and Haskell will suffice in addition. When a function is defined in the REPL, you just call the compiler and it emits assembly language. You then mark that region of memory as executable, and you jump to it when the function is invoked. In some cases a dispatch table is used so a function can be replaced post facto. The technique has fallen somewhat out of favor compared to the hybrid approaches, probably because modern computers are fast enough when you are exploring.
>
> In my experience, most data science is about processing of data so that it is suitable for doing science. Exploratory tools are good for understanding the model you are working in. However, real-world data processing can require you to work on several terabytes of data (or more!). There is a threshold where it starts becoming beneficial to optimize the processing pipeline, especially the pre-processing parts, and lower-level languages such as Go tend to fare really well here. These lower-level tools can then be hooked into e.g. R and Python, empowering the exploratory part of the system.
>
> Another important point is that modern computational kernels, for instance TensorFlow, are really compilers from a data-flow graph representation to highly optimized numerical routines, some of which execute on specialized numerical hardware (8-32 bit floating point SIMD hardware). You can define such a graph in Python, but then export it and use it in other systems and pipelines. As such, Python, your exploratory vehicle, provides a plug-in for a more low-level processing pipeline. This also allows part of the graph to run inside a mobile client. The plug-in model is also followed by parallel array processing languages, see e.g. Futhark (https://futhark-lang.org/); you embed your kernel in another system. If you read Michael Jones's post, there are important similarities.
>
> --
> J.
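
To make the "hook lower-level Go tools into R and Python" point concrete, here is a minimal sketch of a Go pre-processing routine exported as a C shared library; Python can load it with ctypes and R with dyn.load. The function name, the threshold parameter, and the library name are illustrative assumptions, not anything from this thread.

// prefilter.go
//
// Minimal sketch: a Go pre-processing step exposed to Python/R as a
// C shared library.
// Build: go build -buildmode=c-shared -o libprefilter.so prefilter.go
package main

import "C"

import "unsafe"

// FilterAboveThreshold compacts the first n elements of data in place,
// keeping only values strictly greater than threshold, and returns how
// many values were kept.
//
//export FilterAboveThreshold
func FilterAboveThreshold(data *C.double, n C.int, threshold C.double) C.int {
	// View the caller's C buffer as a Go slice without copying.
	buf := unsafe.Slice((*float64)(unsafe.Pointer(data)), int(n))
	kept := 0
	for _, v := range buf {
		if v > float64(threshold) {
			buf[kept] = v
			kept++
		}
	}
	return C.int(kept)
}

func main() {} // required for -buildmode=c-shared

The generated header and .so let the same hot loop be called from C, Python (ctypes.CDLL), or R (.C) without recoding it in each language, which is exactly the "empower the exploratory part" arrangement described above.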
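The "define the graph in Python, export it, use it elsewhere" workflow can likewise be driven from Go via the TensorFlow Go binding. A hedged sketch, assuming a SavedModel exported from Python into a directory named exported_model with the "serve" tag and operations named "input" and "output" (those names and shapes are assumptions for illustration):

package main

import (
	"fmt"
	"log"

	tf "github.com/tensorflow/tensorflow/tensorflow/go"
)

func main() {
	// Load a graph that was defined and exported from Python.
	model, err := tf.LoadSavedModel("exported_model", []string{"serve"}, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer model.Session.Close()

	// Feed one example; the operation names are placeholders for
	// whatever the exported graph actually uses.
	input, err := tf.NewTensor([][]float32{{1.0, 2.0, 3.0}})
	if err != nil {
		log.Fatal(err)
	}
	out, err := model.Session.Run(
		map[tf.Output]*tf.Tensor{
			model.Graph.Operation("input").Output(0): input,
		},
		[]tf.Output{model.Graph.Operation("output").Output(0)},
		nil,
	)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(out[0].Value())
}

Here Python stays the exploratory vehicle for building and training the graph, while the exported artifact is consumed by a Go service in the production pipeline, which is the plug-in arrangement Jesper describes.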