Re: Scala vs Python for ETL with Spark

2020-10-23 Thread Sofia’s World
Hey, my 2 cents on CI/CD for PySpark. You can leverage pytest + Holden Karau's Spark testing libs for CI, thus giving you `almost` the same functionality as Scala - I say almost, as in Scala you have nice and descriptive funcspecs. For me the choice is based on expertise. Having worked with teams which
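An editorial sketch of the pytest side of this, under stated assumptions: the transformation is kept as a plain function so a CI job can check it without a cluster. `dedupe_latest` and its (key, timestamp, value) record layout are hypothetical, and nothing here uses the Spark testing libraries mentioned above.

```python
def dedupe_latest(records):
    """Keep the latest record per key (hypothetical ETL step, pure Python)."""
    latest = {}
    for key, ts, value in records:
        # Replace the stored record only when this one is newer.
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, value)
    return sorted((k, ts, v) for k, (ts, v) in latest.items())


def test_dedupe_latest():
    # pytest collects functions named test_*; a bare assert is the check.
    rows = [("a", 1, "old"), ("a", 2, "new"), ("b", 1, "only")]
    assert dedupe_latest(rows) == [("a", 2, "new"), ("b", 1, "only")]
```

Running `pytest` on a file containing this would pick up `test_dedupe_latest`; the same function can then be applied inside the Spark job itself.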

Re: Scala vs Python for ETL with Spark

2020-10-23 Thread Mich Talebzadeh
Hi Wim, I think we are splitting the atom here but my inference to functionality was based on: 1. Spark is written in Scala, so knowing Scala programming language helps coders navigate into the source code, if something does not function as expected. 2. Given the framework using

Re: Scala vs Python for ETL with Spark

2020-10-23 Thread William R
It's really a very big discussion around PySpark vs Scala. I have a little experience of how we can automate CI/CD when it's a JVM-based language. I would like to take this as an opportunity to understand the end-to-end CI/CD flow for PySpark-based ETL pipelines. Could someone please

Re: Scala vs Python for ETL with Spark

2020-10-23 Thread Wim Van Leuven
I think Sean is right, but in your argumentation you mention that 'functionality is sacrificed in favour of the availability of resources'. That's where I disagree with you but agree with Sean. That is mostly not true. In your previous posts you also mentioned this. The only reason we sometimes

Re: Scala vs Python for ETL with Spark

2020-10-22 Thread Mich Talebzadeh
Thanks for the feedback Sean. Kind regards, Mich LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw * *Disclaimer:* Use it at your own risk. Any and all

Re: Scala vs Python for ETL with Spark

2020-10-22 Thread Sean Owen
I don't find this trolling; I agree with the observation that 'the skills you have' are a valid and important determiner of what tools you pick. I disagree that you just have to pick the optimal tool for everything. Sounds good until that comes in contact with the real world. For Spark, Python vs

Re: Scala vs Python for ETL with Spark

2020-10-22 Thread Gourav Sengupta
Hi Mich, this is turning into a troll now, can you please stop this? No one uses Scala where Python should be used, and no one uses Python where Scala should be used - it all depends on requirements. Everyone understands polyglot programming and how to use relevant technologies best to their

Re: Scala vs Python for ETL with Spark

2020-10-22 Thread Mich Talebzadeh
Today I had a discussion with a lead developer on a client site regarding Scala or PySpark with Spark. They were not doing data science and reluctantly agreed that PySpark was used for ETL. In mitigation he mentioned that in his team he is the only one that is an expert on Scala (his words) and

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Magnus Nilsson
Holy war is a bit dramatic don't you think?  The difference between Scala and Python will always be very relevant when choosing between Spark and Pyspark. I wouldn't call it irrelevant to the original question. br, molotch On Sat, 17 Oct 2020 at 16:57, "Yuri Oleynikov (‫יורי אולייניקוב‬‎)" <

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Magnus Nilsson
I'm sorry you were offended. I'm not an expert in Python and I wasn't trying to attack you personally. It's just an opinion about what makes a language better or worse, it's not the single source of truth. You don't have to take offense. In the end it's about context and what you're trying to

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Holden Karau
Scala and Python have their advantages and disadvantages with Spark. In my experience, when performance is super important you’ll end up needing to do some of your work in the JVM, but in many situations what matters is what your team and company are familiar with and the ecosystem of tooling

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Yuri Oleynikov (‫יורי אולייניקוב‬‎)
It seems that the thread has turned into a holy war that has nothing to do with the original question. If so, it’s super disappointing. Sent from my iPhone > On 17 Oct 2020, at 15:53, Molotch wrote: > > I would say the pros and cons of Python vs Scala is both down to Spark, the > languages in

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Sasha Kacanski
And you are an expert on Python! Idiomatic... Please do everyone a favor and stop commenting on things you have no idea about... I built ETL systems in Python that wiped Java commercial stacks left and right. PySpark was, is, and will be a second-class citizen in the Spark world. That has nothing to do with

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Molotch
I would say the pros and cons of Python vs Scala are down to Spark, the languages themselves, and what kind of data engineer you will get when you try to hire for the different solutions. With PySpark you get less functionality and increased complexity with the py4j Java interop compared

Re: Scala vs Python for ETL with Spark

2020-10-15 Thread Mich Talebzadeh
Hi, I spent a few days converting one of my Spark/Scala scripts to Python. It was interesting but at times looked like trench war. There is a lot of handy stuff in Scala like case classes for defining column headers etc that don't seem to be available in Python (possibly my lack of in-depth
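On the case-class point above: a rough Python counterpart, offered only as a sketch, is a `typing.NamedTuple` that names and types the columns in one place. The `Trade` record and its fields are hypothetical and not taken from the original script.

```python
from typing import NamedTuple


class Trade(NamedTuple):
    """Hypothetical record type; plays the role of a Scala case class."""
    ticker: str
    price: float
    volume: int


def column_names(record_type):
    """Return the column headers declared on a NamedTuple type."""
    return list(record_type._fields)


row = Trade("IBM", 125.5, 300)
print(column_names(Trade))  # column headers, in declaration order
print(row.price)            # attribute access, as with a case class field
```

As far as I know, PySpark's `createDataFrame` can infer column names from namedtuple rows, though that is not exercised here.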

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Mich Talebzadeh
Hi, With regard to your statement below ".technology choices are agnostic to use cases according to you" If I may say, I do not think that was the message implied. What was said was that in addition to "best technology fit" there are other factors "equally important" that need to be

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Gourav Sengupta
So Mich and rest, technology choices are agnostic to use cases according to you? This is interesting, really interesting. Perhaps I stand corrected. Regards, Gourav On Sun, Oct 11, 2020 at 5:00 PM Mich Talebzadeh wrote: > if we take Spark and its massive parallel processing and in-memory >

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Mich Talebzadeh
If we take Spark and its massive parallel processing and in-memory cache away, then one can argue anything can do the "ETL" job. Just write some Java/Scala/SQL/Perl/Python to read data from one DB and write it to another, often using JDBC connections. However, we all concur that may not be good

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread ayan guha
But when you have a fairly large volume of data, that is where Spark comes to the party. And I assume the requirement of using Spark is already established in the original question and the discussion is about Python vs Scala/Java. On Sun, 11 Oct 2020 at 10:51 pm, Sasha Kacanski wrote: > If org has

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Mich Talebzadeh
Thanks Ayan. I am not qualified to answer your first point. However, my experience with Spark with Scala or Spark with Python agrees with your assertion that use cases do not come into it. Most DEV/OPS work dealing with ETL is provided by service companies that have a workforce very familiar with

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread ayan guha
I have one observation: is "python udf is slow due to deserialization penalty" still relevant, even after Arrow is used for in-memory data management and such heavy investment from the Spark dev community in making pandas a first-class citizen, including UDFs? As I work with multiple clients, my experience is org

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Gourav Sengupta
Not quite sure how meaningful this discussion is, but in case someone is really faced with this query the question still is 'what is the use case'? I am just a bit confused by the one-size-fits-all deterministic approach here; I thought that those days were over almost 10 years ago. Regards Gourav

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Stephen Boesch
I agree with Wim's assessment of data engineering / ETL vs Data Science. I wrote pipelines/frameworks for large companies and scala was a much better choice. But for ad-hoc work interfacing directly with data science experiments pyspark presents less friction. On Sat, 10 Oct 2020 at 13:03, Mich

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Mich Talebzadeh
Many thanks everyone for their valuable contribution. We all started with Spark a few years ago where Scala was the talk of the town. I agree with the note that as long as Spark stayed niche and elite, then someone with Scala knowledge was attracting premiums. In fairness in 2014-2015, there was

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Jacek Pliszka
I would not leave it to data scientists unless they will maintain it. The key decision in cases I've seen was usually people cost/availability with ETL operations cost taken into account. Often the situation is that ETL cloud cost is small and you will not save much. Then it is just skills

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Jörn Franke
It really depends on what your data scientists talk. I don’t think it makes sense for ad hoc data science things to impose a language on them, but let them choose. For more complex AI engineering things you can though apply different standards and criteria. And then it really depends on

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Wim Van Leuven
Hey Mich, This is a very fair question .. I've seen many data engineering teams start out with Scala because technically it is the best choice for many given reasons and basically it is what Spark is. On the other hand, almost all use cases we see these days are data science use cases where

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Gourav Sengupta
What is the use case? Unless you have unlimited funding and time to waste you would usually start with that. Regards, Gourav On Fri, Oct 9, 2020 at 10:29 PM Russell Spitzer wrote: > Spark in Scala (or java) Is much more performant if you are using RDD's, > those operations basically force you

Re: Scala vs Python for ETL with Spark

2020-10-09 Thread Russell Spitzer
Spark in Scala (or Java) is much more performant if you are using RDDs; those operations basically force you to pass lambdas, hit serialization between Java and Python types and, yes, hit the Global Interpreter Lock. But none of those things apply to DataFrames, which will generate Java code

Re: Scala vs Python for ETL with Spark

2020-10-09 Thread Mich Talebzadeh
Thanks. So, ignoring Python lambdas, is it a matter of the individual's familiarity with the language that is the most important factor? Also I have noticed that Spark document preferences have been switched from Scala to Python as the first example. However, some code, for example JDBC calls, are the

Re: Scala vs Python for ETL with Spark

2020-10-09 Thread Russell Spitzer
As long as you don't use python lambdas in your Spark job there should be almost no difference between the Scala and Python dataframe code. Once you introduce python lambdas you will hit some significant serialization penalties as well as have to run actual work code in python. As long as no
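A toy model of the penalty described above, not Spark's actual code path: when a Python lambda is involved, rows are serialized across to a Python process, processed one by one in Python, and serialized back. Here `pickle` stands in for that boundary, and `apply_python_lambda` is a hypothetical helper written for this illustration.

```python
import pickle


def apply_python_lambda(rows, fn):
    """Simulate shipping rows to a Python worker and back (illustration only)."""
    payload = pickle.dumps(rows)        # engine side -> Python worker
    received = pickle.loads(payload)
    result = [fn(r) for r in received]  # the actual work runs in Python
    returned = pickle.loads(pickle.dumps(result))  # Python worker -> engine
    return returned, len(payload)


rows = [(i, float(i) * 1.5) for i in range(1000)]
out, bytes_moved = apply_python_lambda(rows, lambda r: (r[0], r[1] * 2))
print(len(out), bytes_moved)
```

Every row pays for two serialization trips plus interpreted per-row work, which is the overhead that DataFrame code without Python lambdas avoids.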

Re: Scala Vs Python

2016-09-06 Thread 刘虓
time:* Tuesday, Sep 6, 2016 8:07 AM > *To:* "darren"<dar...@ontrenet.com>; > *Cc:* "Mich Talebzadeh"<mich.talebza...@gmail.com>; "Jakob Odersky"< > ja...@odersky.com>; "ayan guha"<guha.a...@gmail.com>; "kant kodali"&

Re: Scala Vs Python

2016-09-06 Thread Leonard Cohen
akob Odersky"<ja...@odersky.com>; "ayan guha"<guha.a...@gmail.com>; "kant kodali"<kanth...@gmail.com>; "AssafMendelson"<assaf.mendel...@rsa.com>; "user"<user@spark.apache.org>; Subject: Re: Scala Vs Python On Thu, Sep 1, 20

Re: Scala Vs Python

2016-09-05 Thread Luciano Resende
On Thu, Sep 1, 2016 at 3:15 PM, darren wrote: > This topic is a concern for us as well. In the data science world no one > uses native scala or java by choice. It's R and Python. And python is > growing. Yet in spark, python is 3rd in line for feature support, if at all. > >

Re: Scala Vs Python

2016-09-05 Thread Gourav Sengupta
tion. I ran it on two different cluster configurations and ran it > several times to get some idea on the noise. > > Of course, the more complicated the UDF, the less the overhead affects you. > > Hope this helps. > > Assaf

RE: Scala Vs Python

2016-09-04 Thread AssafMendelson
Assaf From: ayan guha [mailto:guha.a...@gmail.com] Sent: Sunday, September 04, 2016 11:00 AM To: Mendelson, Assaf Cc: user Subject: Re: Scala Vs Python Hi This one is quite interesting. Is it possible to share few toy examples? On Sun, Sep 4, 2016 at 5:23 PM, AssafMendelson

Re: Scala Vs Python

2016-09-04 Thread Simon Edelhaus
>> wrap it to be accessible from python. >> >> >> >> >> >> *From:* ayan guha [mailto:[hidden email] >> <http:///user/SendEmail.jtp?type=node=27650=0>] >> *Sent:* Friday, September 02, 2016 12:21 AM >> *To:* kant kodali >> *Cc

Re: Scala Vs Python

2016-09-04 Thread ayan guha
d we write them a scala one and then > wrap it to be accessible from python. > > > > > > *From:* ayan guha [mailto:[hidden email] > <http:///user/SendEmail.jtp?type=node=27650=0>] > *Sent:* Friday, September 02, 2016 12:21 AM > *To:* kant kodali > *Cc:* Mendel

RE: Scala Vs Python

2016-09-04 Thread AssafMendelson
ask my team (which does the engineering) and we write them a scala one and then wrap it to be accessible from python. From: ayan guha [mailto:guha.a...@gmail.com] Sent: Friday, September 02, 2016 12:21 AM To: kant kodali Cc: Mendelson, Assaf; user Subject: Re: Scala Vs Python Thanks All

Re: Scala Vs Python

2016-09-02 Thread darren
ail.com>, AssafMendelson <assaf.mendel...@rsa.com>, user <user@spark.apache.org> Subject: Re: Scala Vs Python Whatever benefits you may accrue from the rapid prototyping and coding in Python, it will be offset against the time taken to convert it to run inside the JVM. This o

Re: Scala Vs Python

2016-09-02 Thread Mich Talebzadeh
No offence taken. Glad that it was rectified. Cheers, Dr Mich Talebzadeh

Re: Scala Vs Python

2016-09-02 Thread Nicholas Chammas
I apologize for my harsh tone. You are right, it was unnecessary and discourteous. On Fri, Sep 2, 2016 at 11:01 AM Mich Talebzadeh wrote: > Hi, > > You made such statement: > > "That's complete nonsense." > > That is a strong language and void of any courtesy. Only

Re: Scala Vs Python

2016-09-02 Thread Nicholas Chammas
You made a specific claim -- that Spark will move away from Python -- which I responded to with clear references and data. How on earth is that a "religious argument"? I'm not saying that Python is better than Scala or anything like that. I'm just addressing your specific claim about its future

Re: Scala Vs Python

2016-09-02 Thread andy petrella
looking at the examples, indeed they make nonsense :D On Fri, 2 Sep 2016 16:48 Mich Talebzadeh, wrote: > Right so. We are back into religious arguments. Best of luck > > > > Dr Mich Talebzadeh > > > > LinkedIn * >

Re: Scala Vs Python

2016-09-02 Thread Mich Talebzadeh
Right so. We are back into religious arguments. Best of luck. Dr Mich Talebzadeh

Re: Scala Vs Python

2016-09-02 Thread Nicholas Chammas
On Fri, Sep 2, 2016 at 3:58 AM Mich Talebzadeh wrote: > I believe as we progress in time Spark is going to move away from Python. If > you look at 2014 Databricks code examples, they were mostly in Python. Now > they are mostly in Scala for a reason. > That's complete

Re: Scala Vs Python

2016-09-02 Thread Sivakumaran S
Whatever benefits you may accrue from the rapid prototyping and coding in Python, it will be offset against the time taken to convert it to run inside the JVM. This of course depends on the complexity of the DAG. I guess it is a matter of language preference. Regards, Sivakumaran S > On

Re: Scala Vs Python

2016-09-02 Thread Mich Talebzadeh
From an outsider's point of view nobody likes change :) However, it appears to me that Scala is a rising star and if one learns it, it is another iron in the fire so to speak. I believe as we progress in time Spark is going to move away from Python. If you look at 2014 Databricks code examples,

Re: Scala Vs Python

2016-09-02 Thread Jakob Odersky
Forgot to answer your question about feature parity of Python w.r.t. Spark's different components I mostly work with scala so I can't say for sure but I think that all pre-2.0 features (that's basically everything except Structured Streaming) are on par. Structured Streaming is a pretty new

Re: Scala Vs Python

2016-09-02 Thread Jakob Odersky
As you point out, often the reason that Python support lags behind is that functionality is implemented in Scala, so the API in that language is "free" whereas Python support needs to be added explicitly. Nevertheless, Python bindings are an important part of Spark and are used by many people (this

RE: Scala Vs Python

2016-09-02 Thread Santoshakhilesh
[mailto:guha.a...@gmail.com] Sent: 02 September 2016 15:25 To: Tal Grynbaum Cc: darren; Mich Talebzadeh; Jakob Odersky; kant kodali; AssafMendelson; user Subject: Re: Scala Vs Python Tal: I think by nature of the project itself, Python APIs are developed after Scala and Java, and it is a fair

Re: Scala Vs Python

2016-09-02 Thread ayan guha
Tal: I think by nature of the project itself, Python APIs are developed after Scala and Java, and it is a fair trade off between speed of getting stuff to market. And more and more this discussion is progressing, I see not much issue in terms of feature parity. Coming back to performance, Darren

Re: Scala Vs Python

2016-09-02 Thread Tal Grynbaum
On Fri, Sep 2, 2016 at 1:15 AM, darren wrote: > This topic is a concern for us as well. In the data science world no one > uses native scala or java by choice. It's R and Python. And python is > growing. Yet in spark, python is 3rd in line for feature support, if at all. > >

Re: Scala Vs Python

2016-09-01 Thread ayan guha
>>>> assaf.mendel...@rsa.com wrote: >>>>>>> >>>>>>>> I believe this would greatly depend on your use case and your >>>>>>>> familiarity with the languages. >>>>>>>> >>>>>>>

Re: Scala Vs Python

2016-09-01 Thread Jakob Odersky
then the performance hit is practically nonexistent. >>>>>>> >>>>>>> Even if you need UDF, it is possible to write those in scala and >>>>>>> wrap them for python and still get away without the performance hit. >>>>>>> >>>>>>

Re: Scala Vs Python

2016-09-01 Thread Mich Talebzadeh
ave interfaces for UDAFs. >>>>>> >>>>>> >>>>>> >>>>>> I believe that if you have large structured data and do not generally >>>>>> need UDF/UDAF you can certainly work in python without losing too much. >>>>>>

Re: Scala Vs Python

2016-09-01 Thread darren
: Mich Talebzadeh <mich.talebza...@gmail.com> Date: 9/1/16 6:01 PM (GMT-05:00) To: Jakob Odersky <ja...@odersky.com> Cc: ayan guha <guha.a...@gmail.com>, kant kodali <kanth...@gmail.com>, AssafMendelson <assaf.mendel...@rsa.com>, user <user@spark.apache.org&

Re: Scala Vs Python

2016-09-01 Thread Peyman Mohajerian
>>>> *From:* ayan guha [mailto:[hidden email] >>>>> <http:///user/SendEmail.jtp?type=node=27637=0>] >>>>> *Sent:* Thursday, September 01, 2016 5:03 AM >>>>> *To:* user >>>>> *Subject:* Scala Vs Python >>>>> >>>&g

Re: Scala Vs Python

2016-09-01 Thread Mich Talebzadeh
>>>> Thought to ask (again and again) the question: While I am building any
>>>> production application, should I use Scala or Python?
>>>>
>>>> I have read many if not most articles but all seems pre-Spark 2.
>>>> Anything changed with Spark 2? Either pro-scala way or pro-python way?
>>>>
>>>> I am thinking performance, feature parity and future direction, not so
>>>> much in terms of skillset or ease of use.
>>>>
>>>> Or, if you think it is a moot point, please say so as well.
>>>>
>>>> Any real life example, production experience, anecdotes, personal
>>>> taste, profanity all are welcome :)
>>>>
>>>> --
>>>> Best Regards,
>>>> Ayan Guha
>>>>
>>>> --
>>>> View this message in context: RE: Scala Vs Python
>>>> <http://apache-spark-user-list.1001560.n3.nabble.com/RE-Scala-Vs-Python-tp27637.html>
>>>> Sent from the Apache Spark User List mailing list archive
>>>> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.
>>>
>>
>> --
>> Best Regards,
>> Ayan Guha

Re: Scala Vs Python

2016-09-01 Thread Jakob Odersky
e read many if not most articles but all seems pre-Spark 2. >>> Anything changed with Spark 2? Either pro-scala way or pro-python way? >>> >>> >>> >>> I am thinking performance, feature parity and future direction, not so >>> much in terms

Re: Scala Vs Python

2016-09-01 Thread ayan guha
rmance, feature parity and future direction, not so >> much in terms of skillset or ease of use. >> >> >> >> Or, if you think it is a moot point, please say so as well. >> >> >> >> Any real life example, production experience, anecdotes, personal taste, >

Re: Scala Vs Python

2016-09-01 Thread kant kodali
View this message in context: RE: Scala Vs Python Sent from the Apache Spark User List mailing list archive at Nabble.com.

RE: Scala Vs Python

2016-09-01 Thread AssafMendelson
point, please say so as well. Any real life example, production experience, anecdotes, personal taste, profanity all are welcome :) -- Best Regards, Ayan Guha -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/RE-Scala-Vs-Python-tp27637.html Sent from

RE: Scala Vs Python

2016-08-31 Thread Santoshakhilesh
Hi, I would prefer Scala if you are starting afresh, considering ease of use, features, performance and support. You will find numerous examples & support with Scala which might not be true for any other language. I had personally developed the first version of my App using

Re: Scala vs Python for Spark ecosystem

2016-04-20 Thread Jörn Franke
Python can access the JVM - this is how it interfaces with Spark. Some of the components do not have a wrapper for the corresponding Java API yet and thus are not accessible in Python. Same for Elasticsearch. You need to write a more or less simple wrapper. > On 20 Apr 2016, at 09:53,

Re: Scala vs Python for Spark ecosystem

2016-04-20 Thread kramer2...@126.com
I am using Python and Spark. I think one problem might be getting Spark to communicate with third-party products. For example, to combine Spark with Elasticsearch you have to use Java or Scala. Python is not supported -- View this message in context:

Re: Scala vs Python for Spark ecosystem

2016-04-20 Thread Zhang, Jingyu
GraphX does not support Python yet. http://spark.apache.org/docs/latest/graphx-programming-guide.html The workaround is to use GraphFrames (a third-party API), https://issues.apache.org/jira/browse/SPARK-3789 but some features in Python are not the same as in Scala,

Re: Scala vs Python for Spark ecosystem

2016-04-20 Thread sujeet jog
It depends on the trade-offs you wish to make. Python being an interpreted language, execution will be slower, but as it is a very widely used language, people can get hands-on quickly. Scala programs run in the Java environment, so it's obvious you will get good execution speed,

Re: Scala vs Python performance differences

2015-01-16 Thread philpearl
I was interested in this as I had some Spark code in Python that was too slow and wanted to know whether Scala would fix it for me. So I re-wrote my code in Scala. In my particular case the Scala version was 10 times faster. But I think that is because I did an awful lot of computation in my

Re: Scala vs Python performance differences

2015-01-16 Thread Davies Liu
Hey Phil, Thank you for sharing this. The result didn't surprise me a lot; it's normal to do the prototype in Python, and once it gets stable and you really need the performance, rewriting part of it in C or the whole of it in another language does make sense. It will not cost you much time. Davies On

Re: Scala vs Python performance differences

2014-11-12 Thread Andrew Ash
Jeremy, Did you complete this benchmark in a way that's shareable with those interested here? Andrew On Tue, Apr 15, 2014 at 2:50 PM, Nicholas Chammas nicholas.cham...@gmail.com wrote: I'd also be interested in seeing such a benchmark. On Tue, Apr 15, 2014 at 9:25 AM, Ian Ferreira

Re: Scala vs Python performance differences

2014-11-12 Thread Samarth Mailinglist
I was about to ask this question. On Wed, Nov 12, 2014 at 3:42 PM, Andrew Ash and...@andrewash.com wrote: Jeremy, Did you complete this benchmark in a way that's shareable with those interested here? Andrew On Tue, Apr 15, 2014 at 2:50 PM, Nicholas Chammas nicholas.cham...@gmail.com

Re: Scala vs Python performance differences

2014-04-15 Thread Ian Ferreira
This would be super useful. Thanks. On 4/15/14, 1:30 AM, Jeremy Freeman freeman.jer...@gmail.com wrote: Hi Andrew, I'm putting together some benchmarks for PySpark vs Scala. I'm focusing on ML algorithms, as I'm particularly curious about the relative performance of MLlib in Scala vs the Python

Re: Scala vs Python performance differences

2014-04-14 Thread Jeremy Freeman
Hi Andrew, I'm putting together some benchmarks for PySpark vs Scala. I'm focusing on ML algorithms, as I'm particularly curious about the relative performance of MLlib in Scala vs the Python MLlib API vs pure Python implementations. Will share real results as soon as I have them, but roughly,