Re: Scala vs Python for ETL with Spark

2020-10-23 Thread Sofia’s World
Hey, my 2 cents on CI/CD for PySpark. You can leverage pytest + Holden Karau's Spark testing libs for CI, thus giving you `almost` the same functionality as Scala - I say almost, as in Scala you have nice and descriptive funcspecs - For me the choice is based on expertise. Having worked with teams which

Re: Scala vs Python for ETL with Spark

2020-10-23 Thread Mich Talebzadeh
Some functionalities are not available in Python. I have seen this a few times in the Spark docs. There is an interesting write-up on this, although it does not touch on CI/CD aspects. Developing Apache Spark Applications: Scala vs. Python <https://www.pluralsight.com/blog/software-developm

Re: Scala vs Python for ETL with Spark

2020-10-23 Thread William R
It's really a very big discussion around PySpark vs Scala. I have a little experience of how we can automate CI/CD when it's a JVM-based language. I would like to take this as an opportunity to understand the end-to-end CI/CD flow for PySpark-based ETL pipelines. Could someone please

Re: Scala vs Python for ETL with Spark

2020-10-23 Thread Wim Van Leuven
I think Sean is right, but in your argumentation you mention that 'functionality is sacrificed in favour of the availability of resources'. That's where I disagree with you but agree with Sean. That is mostly not true. In your previous posts you also mentioned this. The only reason we sometimes

Re: Scala vs Python for ETL with Spark

2020-10-22 Thread Mich Talebzadeh
Thanks for the feedback Sean. Kind regards, Mich LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw * *Disclaimer:* Use it at your own risk. Any and all

Re: Scala vs Python for ETL with Spark

2020-10-22 Thread Sean Owen
I don't find this trolling; I agree with the observation that 'the skills you have' are a valid and important determiner of what tools you pick. I disagree that you just have to pick the optimal tool for everything. Sounds good until that comes in contact with the real world. For Spark, Python vs

Re: Scala vs Python for ETL with Spark

2020-10-22 Thread Gourav Sengupta
Hi Mich, this is turning into a troll now, can you please stop this? No one uses Scala where Python should be used, and no one uses Python where Scala should be used - it all depends on requirements. Everyone understands polyglot programming and how to use relevant technologies best to their

Re: Scala vs Python for ETL with Spark

2020-10-22 Thread Mich Talebzadeh
Today I had a discussion with a lead developer on a client site regarding Scala or PySpark with Spark. They were not doing data science and reluctantly agreed that PySpark was used for ETL. In mitigation he mentioned that in his team he is the only one that is an expert on Scala (his words) and

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Magnus Nilsson
Holy war is a bit dramatic don't you think?  The difference between Scala and Python will always be very relevant when choosing between Spark and Pyspark. I wouldn't call it irrelevant to the original question. br, molotch On Sat, 17 Oct 2020 at 16:57, "Yuri Oleynikov (‫יורי אולייניקוב‬‎)" <

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Magnus Nilsson
I'm sorry you were offended. I'm not an expert in Python and I wasn't trying to attack you personally. It's just an opinion about what makes a language better or worse; it's not the single source of truth. You don't have to take offense. In the end it's about context and what you're trying to

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Holden Karau
Scala and Python have their advantages and disadvantages with Spark. In my experience, when performance is super important you’ll end up needing to do some of your work in the JVM, but in many situations what matters most is what your team and company are familiar with and the ecosystem of tooling

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Yuri Oleynikov (‫יורי אולייניקוב‬‎)
It seems that the thread has turned into a holy war that has nothing to do with the original question. If it has, it’s super disappointing. Sent from iPhone > On 17 Oct 2020, at 15:53, Molotch wrote: > > I would say the pros and cons of Python vs Scala is both down to Spark, the > languages in

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Sasha Kacanski
And you are an expert on Python! Idiomatic... Please do everyone a favor and stop commenting on things you have no idea about... I build ETL systems in Python that wiped Java commercial stacks left and right. PySpark was and is and will be a second class citizen in the Spark world. That has nothing to do with

Re: Scala vs Python for ETL with Spark

2020-10-17 Thread Molotch
I would say the pros and cons of Python vs Scala are down to Spark, the languages in themselves, and what kind of data engineer you will get when you try to hire for the different solutions. With PySpark you get less functionality and increased complexity with the py4j Java interop compared

Re: Scala vs Python for ETL with Spark

2020-10-15 Thread Mich Talebzadeh
Hi, I spent a few days converting one of my Spark/Scala scripts to Python. It was interesting but at times looked like trench war. There is a lot of handy stuff in Scala like case classes for defining column headers etc that don't seem to be available in Python (possibly my lack of in-depth

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Mich Talebzadeh
Hi, With regard to your statement below ".technology choices are agnostic to use cases according to you" If I may say, I do not think that was the message implied. What was said was that in addition to "best technology fit" there are other factors "equally important" that need to be

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Gourav Sengupta
So Mich and rest, technology choices are agnostic to use cases according to you? This is interesting, really interesting. Perhaps I stand corrected. Regards, Gourav On Sun, Oct 11, 2020 at 5:00 PM Mich Talebzadeh wrote: > if we take Spark and its massive parallel processing and in-memory >

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Mich Talebzadeh
If we take Spark and its massive parallel processing and in-memory cache away, then one can argue anything can do the "ETL" job: just write some Java/Scala/SQL/Perl/Python to read data from one DB and write it to another, often using JDBC connections. However, we all concur that may not be good

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread ayan guha
But when you have a fairly large volume of data, that is where Spark comes to the party. And I assume the requirement of using Spark is already established in the original question and the discussion is about using Python vs Scala/Java. On Sun, 11 Oct 2020 at 10:51 pm, Sasha Kacanski wrote: > If org has

Re: Scala vs Python for ETL with Spark

2020-10-11 Thread Mich Talebzadeh
Thanks Ayan. I am not qualified to answer your first point. However, my experience with Spark with Scala or Spark with Python agrees with your assertion that use cases do not come into it. Most DEV/OPS work dealing with ETL is provided by service companies that have a workforce very familiar with

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread ayan guha
I have one observation: is "python udf is slow due to deserialization penalty" still relevant? Even after Arrow is used for in-memory data management, and with such heavy investment from the Spark dev community in making pandas a first class citizen, including UDFs? As I work with multiple clients, my exp is org

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Gourav Sengupta
Not quite sure how meaningful this discussion is, but in case someone is really faced with this query the question still is 'what is the use case'? I am just a bit confused by the one-size-fits-all deterministic approach here; I thought that those days were over almost 10 years ago. Regards Gourav

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Stephen Boesch
I agree with Wim's assessment of data engineering / ETL vs Data Science. I wrote pipelines/frameworks for large companies and scala was a much better choice. But for ad-hoc work interfacing directly with data science experiments pyspark presents less friction. On Sat, 10 Oct 2020 at 13:03, Mich

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Mich Talebzadeh
Many thanks everyone for their valuable contributions. We all started with Spark a few years ago when Scala was the talk of the town. I agree with the note that as long as Spark stayed niche and elite, someone with Scala knowledge was attracting premiums. In fairness in 2014-2015, there was

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Jacek Pliszka
I would not leave it to data scientists unless they will maintain it. The key decision in cases I've seen was usually people cost/availability with ETL operations cost taken into account. Often the situation is that ETL cloud cost is small and you will not save much. Then it is just skills

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Jörn Franke
It really depends on what language your data scientists speak. I don’t think it makes sense for ad hoc data science things to impose a language on them, but let them choose. For more complex AI engineering things you can, though, apply different standards and criteria. And then it really depends on

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Wim Van Leuven
people mostly do python. So, if you need those two worlds to collaborate and even hand over code, you don't want the ideological battle of Scala vs Python. We chose python for the sake of everybody speaking the same language. But it is true, if you do Spark DataFrames, because then PySpark is a thin layer

Re: Scala vs Python for ETL with Spark

2020-10-10 Thread Gourav Sengupta
What is the use case? Unless you have unlimited funding and time to waste you would usually start with that. Regards, Gourav On Fri, Oct 9, 2020 at 10:29 PM Russell Spitzer wrote: > Spark in Scala (or java) Is much more performant if you are using RDD's, > those operations basically force you

Re: Scala vs Python for ETL with Spark

2020-10-09 Thread Russell Spitzer
Spark in Scala (or Java) is much more performant if you are using RDDs: those operations basically force you to pass lambdas, hit serialization between Java and Python types and, yes, hit the Global Interpreter Lock. But none of those things apply to DataFrames, which will generate Java code

Re: Scala vs Python for ETL with Spark

2020-10-09 Thread Mich Talebzadeh
Thanks. So, ignoring Python lambdas, is it a matter of an individual's familiarity with the language that is the most important factor? Also I have noticed that the Spark documentation's preference has switched from Scala to Python as the first example. However, some code, for example JDBC calls, are the

Re: Scala vs Python for ETL with Spark

2020-10-09 Thread Russell Spitzer
As long as you don't use Python lambdas in your Spark job there should be almost no difference between the Scala and Python DataFrame code. Once you introduce Python lambdas you will hit some significant serialization penalties as well as having to run the actual work code in Python. As long as no
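To make the serialization point above concrete, here is a rough sketch in plain Python. It is not Spark code: `apply_python_lambda` is a hypothetical helper, and `pickle` merely stands in for whatever encoding the engine uses between the JVM and the Python worker. It only mimics the shape of the round trip (serialize a batch, run the lambda row by row in Python, serialize the results back), which is the extra work that built-in DataFrame expressions avoid.

```python
import pickle


def apply_python_lambda(rows, fn):
    """Mimic one batch crossing into a Python worker and back.

    Hypothetical helper for illustration only; not part of any Spark API.
    """
    payload = pickle.dumps(rows)            # engine side: serialize the batch
    received = pickle.loads(payload)        # worker side: deserialize it
    results = [fn(row) for row in received]  # the lambda runs row by row in Python
    returned = pickle.loads(pickle.dumps(results))  # results are shipped back
    return returned, len(payload)


rows = [(i, f"name-{i}") for i in range(1000)]
out, bytes_sent = apply_python_lambda(rows, lambda r: (r[0] * 2, r[1].upper()))
print(len(out), "rows processed;", bytes_sent, "bytes serialized on the way in")
```

Every batch pays for two encode/decode passes plus interpreted per-row work, so the overhead grows with data volume rather than being a fixed start-up cost.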

Scala vs Python for ETL with Spark

2020-10-09 Thread Mich Talebzadeh
I have come across occasions when the teams use Python with Spark for ETL, for example processing data from S3 buckets into Snowflake with Spark. The only reason I think they are choosing Python as opposed to Scala is because they are more familiar with Python. Since Spark is written in Scala,

Re: Scala Vs Python

2016-09-06 Thread 刘虓

Re: Scala Vs Python

2016-09-06 Thread Leonard Cohen

Re: Scala Vs Python

2016-09-05 Thread Luciano Resende
On Thu, Sep 1, 2016 at 3:15 PM, darren wrote: > This topic is a concern for us as well. In the data science world no one > uses native scala or java by choice. It's R and Python. And python is > growing. Yet in spark, python is 3rd in line for feature support, if at all. > >

Re: Scala Vs Python

2016-09-05 Thread Gourav Sengupta
tion. I ran it on two different cluster configurations and ran it > several times to get some idea on the noise. > > Of course, the more complicated the UDF, the less the overhead affects you. > > Hope this helps. > > Assaf

RE: Scala Vs Python

2016-09-04 Thread AssafMendelson
Assaf From: ayan guha [mailto:guha.a...@gmail.com] Sent: Sunday, September 04, 2016 11:00 AM To: Mendelson, Assaf Cc: user Subject: Re: Scala Vs Python Hi This one is quite interesting. Is it possible to share few toy examples? On Sun, Sep 4, 2016 at 5:23 PM, AssafMendelson

Re: Scala Vs Python

2016-09-04 Thread Simon Edelhaus
>> wrap it to be accessible from python. >> >> *From:* ayan guha [mailto:[hidden email]] >> *Sent:* Friday, September 02, 2016 12:21 AM >> *To:* kant kodali >> *Cc

Re: Scala Vs Python

2016-09-04 Thread ayan guha
d we write them a scala one and then > wrap it to be accessible from python. > > *From:* ayan guha [mailto:[hidden email]] > *Sent:* Friday, September 02, 2016 12:21 AM > *To:* kant kodali > *Cc:* Mendel

RE: Scala Vs Python

2016-09-04 Thread AssafMendelson
ask my team (which does the engineering) and we write them a scala one and then wrap it to be accessible from python. From: ayan guha [mailto:guha.a...@gmail.com] Sent: Friday, September 02, 2016 12:21 AM To: kant kodali Cc: Mendelson, Assaf; user Subject: Re: Scala Vs Python Thanks All

Re: Scala Vs Python

2016-09-02 Thread darren
Whatever benefits you may accrue from the rapid prototyping and coding in Python, it will be offset against the time taken to convert it to run inside the JVM. This o

Re: Scala Vs Python

2016-09-02 Thread Mich Talebzadeh
No offence taken. Glad that it was rectified. Cheers, Dr Mich Talebzadeh

Re: Scala Vs Python

2016-09-02 Thread Nicholas Chammas
I apologize for my harsh tone. You are right, it was unnecessary and discourteous. On Fri, Sep 2, 2016 at 11:01 AM Mich Talebzadeh wrote: > Hi, > > You made such statement: > > "That's complete nonsense." > > That is a strong language and void of any courtesy. Only

Re: Scala Vs Python

2016-09-02 Thread Nicholas Chammas
You made a specific claim -- that Spark will move away from Python -- which I responded to with clear references and data. How on earth is that a "religious argument"? I'm not saying that Python is better than Scala or anything like that. I'm just addressing your specific claim about its future

Re: Scala Vs Python

2016-09-02 Thread andy petrella
looking at the examples, indeed they make nonsense :D On Fri, 2 Sep 2016 16:48 Mich Talebzadeh, wrote: > Right so. We are back into religious arguments. Best of luck > > Dr Mich Talebzadeh

Re: Scala Vs Python

2016-09-02 Thread Mich Talebzadeh
Right so. We are back into religious arguments. Best of luck. Dr Mich Talebzadeh

Re: Scala Vs Python

2016-09-02 Thread Nicholas Chammas
On Fri, Sep 2, 2016 at 3:58 AM Mich Talebzadeh wrote: > I believe as we progress in time Spark is going to move away from Python. If > you look at 2014 Databricks code examples, they were mostly in Python. Now > they are mostly in Scala for a reason. > That's complete

Re: Scala Vs Python

2016-09-02 Thread Sivakumaran S
Whatever benefits you may accrue from the rapid prototyping and coding in Python, it will be offset against the time taken to convert it to run inside the JVM. This of course depends on the complexity of the DAG. I guess it is a matter of language preference. Regards, Sivakumaran S > On

Re: Scala Vs Python

2016-09-02 Thread Mich Talebzadeh
From an outsider's point of view nobody likes change :) However, it appears to me that Scala is a rising star and if one learns it, it is another iron in the fire so to speak. I believe as we progress in time Spark is going to move away from Python. If you look at 2014 Databricks code examples,

Re: Scala Vs Python

2016-09-02 Thread Jakob Odersky
Forgot to answer your question about feature parity of Python w.r.t. Spark's different components. I mostly work with Scala so I can't say for sure, but I think that all pre-2.0 features (that's basically everything except Structured Streaming) are on par. Structured Streaming is a pretty new

Re: Scala Vs Python

2016-09-02 Thread Jakob Odersky
As you point out, often the reason that Python support lags behind is that functionality is implemented in Scala, so the API in that language is "free" whereas Python support needs to be added explicitly. Nevertheless, Python bindings are an important part of Spark and are used by many people (this

RE: Scala Vs Python

2016-09-02 Thread Santoshakhilesh
Tal: I think by nature of the project itself, Python APIs are developed after Scala and Java, and it is a fair

Re: Scala Vs Python

2016-09-02 Thread ayan guha
Tal: I think by the nature of the project itself, Python APIs are developed after Scala and Java, and it is a fair trade-off against the speed of getting stuff to market. And the more this discussion progresses, the less of an issue I see in terms of feature parity. Coming back to performance, Darren

Re: Scala Vs Python

2016-09-02 Thread Tal Grynbaum
On Fri, Sep 2, 2016 at 1:15 AM, darren wrote: > This topic is a concern for us as well. In the data science world no one > uses native scala or java by choice. It's R and Python. And python is > growing. Yet in spark, python is 3rd in line for feature support, if at all. > >

Re: Scala Vs Python

2016-09-01 Thread ayan guha
On Fri, Sep 2, 2016 at 12:57 AM, kant kodali <kanth...@gmail.com> wrote: > c'mon man this is no Brainer..Dynamic Typed Languages for Large Code

Re: Scala Vs Python

2016-09-01 Thread Jakob Odersky
its ease of use for ML (that would be my best guess). On Wed, Aug 31, 2016 11:45 PM, AssafMendelson assaf.mendel...@rsa.com wrote:

Re: Scala Vs Python

2016-09-01 Thread Mich Talebzadeh
wrote: > I believe this would greatly depend on your use case and your familiarity with the languages. In general, scala would h

Re: Scala Vs Python

2016-09-01 Thread darren

Re: Scala Vs Python

2016-09-01 Thread Peyman Mohajerian
> not all interfaces are available in python. That said, if you are planning to use dataframes without any UDF then the performance hit is practically nonexistent. E

Re: Scala Vs Python

2016-09-01 Thread Mich Talebzadeh
t is possible to write those in scala and wrap them for python and still get away without the performance hit. Python does not have interfaces for UDAFs. I believe that if you have larg

Re: Scala Vs Python

2016-09-01 Thread Jakob Odersky
erformance hit. Python does not have interfaces for UDAFs. I believe that if you have large structured data and do not generally need UDF/UDAF you can certainly work in python without losing too much.

Re: Scala Vs Python

2016-09-01 Thread ayan guha
> I believe that if you have large structured data and do not generally need UDF/UDAF you can certainly work in python without losing too much.

Re: Scala Vs Python

2016-09-01 Thread kant kodali
not have interfaces for UDAFs. I believe that if you have large structured data and do not generally need UDF/UDAF you can certainly work in python without losing too much. From: ayan guha [mailto:[hidden email]] Sent: Thursday, September 01, 2016 5:03 AM To: user Subject: Scala Vs Python

RE: Scala Vs Python

2016-09-01 Thread AssafMendelson
can certainly work in python without losing too much. From: ayan guha [mailto:guha.a...@gmail.com] Sent: Thursday, September 01, 2016 5:03 AM To: user Subject: Scala Vs Python Hi Users Thought to ask (again and again) the question: While I am building any production application, should I use

RE: Scala Vs Python

2016-08-31 Thread Santoshakhilesh
ould prefer to use Scala any day for the very simple reason that I would get all the future features and optimizations out of the box and I need to type less ☺. Regards, Santosh Akhilesh From: ayan guha [mailto:guha.a...@gmail.com] Sent: 01 September 2016 11:03 To: user Subject: Scala Vs Python Hi U

Scala Vs Python

2016-08-31 Thread ayan guha
Hi Users, thought to ask (again and again) the question: while I am building any production application, should I use Scala or Python? I have read many if not most articles but all seem pre-Spark 2. Anything changed with Spark 2? Either pro-scala way or pro-python way? I am thinking

Re: Scala vs Python for Spark ecosystem

2016-04-20 Thread Jörn Franke
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Scala-vs-Python-for-Spark-ecosystem-tp26805p26806.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Scala vs Python for Spark ecosystem

2016-04-20 Thread kramer2...@126.com

Re: Scala vs Python for Spark ecosystem

2016-04-20 Thread Zhang, Jingyu
What will be the future of these 2 languages for the Spark ecosystem? Will Python cover everything Scala can in a short time period? What do you advise?

Re: Scala vs Python for Spark ecosystem

2016-04-20 Thread sujeet jog

Scala vs Python for Spark ecosystem

2016-04-20 Thread berkerkozan

Re: Scala vs Python performance differences

2015-01-16 Thread philpearl
Python.

Re: Scala vs Python performance differences

2015-01-16 Thread Davies Liu
data point comparing computations in Scala to computations in pure Python.

Re: Scala vs Python performance differences

2014-11-12 Thread Andrew Ash
- Jeremy Freeman, PhD Neuroscientist @thefreemanlab

Re: Scala vs Python performance differences

2014-11-12 Thread Samarth Mailinglist
correct, at least for some basic operations (e.g. textFile, count, reduce). -- Jeremy - Jeremy Freeman, PhD Neuroscientist @thefreemanlab

Re: Scala vs Python performance differences

2014-04-15 Thread Ian Ferreira
@thefreemanlab

Scala vs Python performance differences

2014-04-14 Thread Andrew Ash
Hi Spark users, I've always done all my Spark work in Scala, but occasionally people ask about Python and its performance impact vs the same algorithm implementation in Scala. Has anyone done tests to measure the difference? Anecdotally I've heard Python is a 40% slowdown but that's entirely

Re: Scala vs Python performance differences

2014-04-14 Thread Jeremy Freeman
, in our hands, that 40% number is ballpark correct, at least for some basic operations (e.g. textFile, count, reduce). -- Jeremy - Jeremy Freeman, PhD Neuroscientist @thefreemanlab