Re: Java vs. Scala for Spark

2015-09-09 Thread Cody Koeninger
Java 8 lambdas are broken to the point of near-uselessness (because of
checked exceptions and inability to close over non-final references).  I
wouldn't use them as a deciding factor in language choice.
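
To make the two complaints concrete, here is a minimal sketch in plain Java 8
(no Spark dependency). The class LambdaLimits and the helper parsePositive are
hypothetical names made up for illustration. It shows the usual workarounds:
wrapping a checked exception inside the lambda, and using a one-element array
because a lambda may only capture effectively final locals.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class LambdaLimits {
    // Hypothetical helper that declares a checked exception.
    static int parsePositive(String s) throws Exception {
        int n = Integer.parseInt(s.trim());
        if (n < 0) {
            throw new Exception("negative value: " + s);
        }
        return n;
    }

    static int sumParsed(List<String> inputs) {
        // Function.apply declares no checked exceptions, so the lambda has to
        // catch the checked exception and rethrow it as an unchecked one.
        Function<String, Integer> parse = s -> {
            try {
                return parsePositive(s);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        };

        // "int total = 0; inputs.forEach(s -> total += ...)" does not compile,
        // because captured locals must be effectively final. Mutating the
        // contents of a final one-element array is a common workaround.
        int[] total = {0};
        inputs.forEach(s -> total[0] += parse.apply(s));
        return total[0];
    }

    public static void main(String[] args) {
        System.out.println(sumParsed(Arrays.asList("1", " 2", "3"))); // prints 6
    }
}
```

Whether this amounts to "near-uselessness" is a judgment call; the sketch only
shows the ceremony involved.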

Any competent developer should be able to write reasonable Java-in-Scala
after a week and a read through "Scala for the Impatient".

On Tue, Sep 8, 2015 at 11:15 AM, Jerry Lam  wrote:

> Hi Bryan,
>
> I would choose a language based on the requirements. It does not make
> sense if you have a lot of dependencies that are java-based components and
> interoperability between java and scala is not always obvious.
>
> I agree with the above comments that Java is much more verbose than Scala
> in many cases if not all. However, I personally don't find the verbosity is
> a key factor in choosing a language. For the sake of argument, will you be
> discouraged if you need to write 3 lines of Java for 1 line of scala? I
> really don't care the number of lines as long as I can finish the task
> within a period of time.
>
> I believe, correct me if I'm wrong please, all spark functionalities you
> can find in Scala are also available in Java that includes the mllib,
> sparksql, streaming, etc. So you won't miss any features of spark by using
> Java.
>
> It seems the questions should be
> - what language do the developers are comfortable with?
> - what are the components in the system that will constraint the choice of
> the language?
>
> Best Regards,
>
> Jerry
>
> On Tue, Sep 8, 2015 at 11:59 AM, Dean Wampler 
> wrote:
>
>> It's true that Java 8 lambdas help. If you've read Learning Spark, where
>> they use Java 7, Python, and Scala for the examples, it really shows how
>> awful Java without lambdas is for Spark development.
>>
>> Still, there are several "power tools" in Scala I would sorely miss using
>> Java 8:
>>
>> 1. The REPL (interpreter): I do most of my work in the REPL, then move
>> the code to compiled code when I'm ready to turn it into a batch job. Even
>> better, use Spark Notebook! (Also on GitHub.)
>> 2. Tuples: It's just too convenient to use tuples for schemas, return
>> values from functions, etc., etc., etc.
>> 3. Pattern matching: This has no analog in Java, so it's hard to
>> appreciate it until you understand it, but see this example
>> for a taste of how concise it makes code!
>> 4. Type inference: Spark really shows its utility. It means a lot less
>> code to write, but you get the hints of what you just wrote!
>>
>> My $0.02.
>>
>> dean
>>
>>
>> Dean Wampler, Ph.D.
>> Author: Programming Scala, 2nd Edition (O'Reilly)
>> Typesafe
>> @deanwampler
>> http://polyglotprogramming.com
>>
>> On Tue, Sep 8, 2015 at 10:28 AM, Igor Berman 
>> wrote:
>>
>>> we are using java7..its much more verbose that java8 or scala examples
>>> in addition there sometimes libraries that has no java  api, so you need
>>> to write them by yourself(e.g. graphx)
>>> on the other hand, scala is not trivial language like java, so it
>>> depends on your team
>>>
>>> On 8 September 2015 at 17:44, Bryan Jeffrey 
>>> wrote:
>>>
>>>> Thank you for the quick responses.  It's useful to have some insight
>>>> from folks already extensively using Spark.
>>>>
>>>> Regards,
>>>>
>>>> Bryan Jeffrey
>>>>
>>>> On Tue, Sep 8, 2015 at 10:28 AM, Sean Owen wrote:
>>>>
>>>>> Why would Scala vs Java performance be different Ted? Relatively
>>>>> speaking there is almost no runtime difference; it's the same APIs or
>>>>> calls via a thin wrapper. Scala/Java vs Python is a different story.
>>>>>
>>>>> Java libraries can be used in Scala. Vice-versa too, though calling
>>>>> Scala-generated classes can be clunky in Java. What's your concern
>>>>> about interoperability Jeffrey?
>>>>>
>>>>> I disagree that Java 7 vs Scala usability is sooo different, but it's
>>>>> certainly much more natural to use Spark in Scala. Java 8 closes a lot
>>>>> of the usability gap with Scala, but not all of it. Enough that it's
>>>>> not crazy for a Java shop to stick to Java 8 + Spark and not be at a
>>>>> big disadvantage.
>>>>>
>>>>> The downsides of Scala IMHO are that it provides too much: lots of
>>>>> nice features (closures! superb collections!), lots of rope to hang
>>>>> yourself too (implicits sometimes!) and some WTF features (XML
>>>>> literals!) Learning the good useful bits of Scala isn't hard. You can
>>>>> always write Scala code as much like Java as you like, I find.
>>>>>
>>>>> Scala tooling is different from Java tooling; that's an
>>>>> underappreciated barrier. For example I think SBT is good for
>>>>> development, bad for general

Re: Java vs. Scala for Spark

2015-09-08 Thread Jonathan Coveney
It worked for Twitter!

Seriously though: Scala is much, much more pleasant. And Scala has a great
story for using Java libs. And since Spark is kind of framework-y (use its
scripts to submit, start up the REPL, etc.) the projects tend to be leaf
projects, so even in a big company that uses Java the cost of Scala is low
and fairly isolated. If you need to write large amounts of supporting
libraries, you are free to use Java or Scala as you see fit.

On Tuesday, September 8, 2015, Bryan Jeffrey wrote:

> All,
>
> We're looking at language choice in developing a simple streaming
> processing application in spark.  We've got a small set of example code
> built in Scala.  Articles like the following:
> http://www.bigdatatidbits.cc/2015/02/navigating-from-scala-to-spark-for.html
> would seem to indicate that Scala is great for use in distributed
> programming (including Spark).  However, there is a large group of folks
> that seem to feel that interoperability with other Java libraries is much
> to be desired, and that the cost of learning (yet another) language is
> quite high.
>
> Has anyone looked at Scala for Spark dev in an enterprise environment?
> What was the outcome?
>
> Regards,
>
> Bryan Jeffrey
>


Java vs. Scala for Spark

2015-09-08 Thread Bryan Jeffrey
All,

We're looking at language choice in developing a simple streaming
processing application in Spark.  We've got a small set of example code
built in Scala.  Articles like the following:
http://www.bigdatatidbits.cc/2015/02/navigating-from-scala-to-spark-for.html
would seem to indicate that Scala is great for use in distributed
programming (including Spark).  However, there is a large group of folks
who seem to feel that interoperability with other Java libraries leaves much
to be desired, and that the cost of learning (yet another) language is
quite high.

Has anyone looked at Scala for Spark dev in an enterprise environment?
What was the outcome?

Regards,

Bryan Jeffrey


Re: Java vs. Scala for Spark

2015-09-08 Thread Ted Yu
Performance wise, Scala is by far the best choice when you use Spark.

The cost of learning Scala is not negligible but not insurmountable either.

My personal opinion.

On Tue, Sep 8, 2015 at 6:50 AM, Bryan Jeffrey 
wrote:

> All,
>
> We're looking at language choice in developing a simple streaming
> processing application in spark.  We've got a small set of example code
> built in Scala.  Articles like the following:
> http://www.bigdatatidbits.cc/2015/02/navigating-from-scala-to-spark-for.html
> would seem to indicate that Scala is great for use in distributed
> programming (including Spark).  However, there is a large group of folks
> that seem to feel that interoperability with other Java libraries is much
> to be desired, and that the cost of learning (yet another) language is
> quite high.
>
> Has anyone looked at Scala for Spark dev in an enterprise environment?
> What was the outcome?
>
> Regards,
>
> Bryan Jeffrey
>


Re: Java vs. Scala for Spark

2015-09-08 Thread Bryan Jeffrey
Thank you for the quick responses.  It's useful to have some insight from
folks already extensively using Spark.

Regards,

Bryan Jeffrey

On Tue, Sep 8, 2015 at 10:28 AM, Sean Owen  wrote:

> Why would Scala vs Java performance be different Ted? Relatively
> speaking there is almost no runtime difference; it's the same APIs or
> calls via a thin wrapper. Scala/Java vs Python is a different story.
>
> Java libraries can be used in Scala. Vice-versa too, though calling
> Scala-generated classes can be clunky in Java. What's your concern
> about interoperability Jeffrey?
>
> I disagree that Java 7 vs Scala usability is sooo different, but it's
> certainly much more natural to use Spark in Scala. Java 8 closes a lot
> of the usability gap with Scala, but not all of it. Enough that it's
> not crazy for a Java shop to stick to Java 8 + Spark and not be at a
> big disadvantage.
>
> The downsides of Scala IMHO are that it provides too much: lots of
> nice features (closures! superb collections!), lots of rope to hang
> yourself too (implicits sometimes!) and some WTF features (XML
> literals!) Learning the good useful bits of Scala isn't hard. You can
> always write Scala code as much like Java as you like, I find.
>
> Scala tooling is different from Java tooling; that's an
> underappreciated barrier. For example I think SBT is good for
> development, bad for general project lifecycle management compared to
> Maven, but in any event still less developed. SBT/scalac are huge
> resource hogs, since so much of Scala is really implemented in the
> compiler; prepare to update your laptop to develop in Scala on your
> IDE of choice, and start to think about running long-running compile
> servers like we did in the year 2000.
>
> Still net-net I would choose Scala, FWIW.
>
> On Tue, Sep 8, 2015 at 3:07 PM, Ted Yu  wrote:
> > Performance wise, Scala is by far the best choice when you use Spark.
> >
> > The cost of learning Scala is not negligible but not insurmountable
> either.
> >
> > My personal opinion.
> >
> > On Tue, Sep 8, 2015 at 6:50 AM, Bryan Jeffrey 
> > wrote:
> >>
> >> All,
> >>
> >> We're looking at language choice in developing a simple streaming
> >> processing application in spark.  We've got a small set of example code
> >> built in Scala.  Articles like the following:
> >>
> http://www.bigdatatidbits.cc/2015/02/navigating-from-scala-to-spark-for.html
> >> would seem to indicate that Scala is great for use in distributed
> >> programming (including Spark).  However, there is a large group of folks
> >> that seem to feel that interoperability with other Java libraries is
> much to
> >> be desired, and that the cost of learning (yet another) language is
> quite
> >> high.
> >>
> >> Has anyone looked at Scala for Spark dev in an enterprise environment?
> >> What was the outcome?
> >>
> >> Regards,
> >>
> >> Bryan Jeffrey
> >
> >
>


Re: Java vs. Scala for Spark

2015-09-08 Thread Igor Berman
We are using Java 7. It's much more verbose than the Java 8 or Scala examples.
In addition, there are sometimes libraries that have no Java API, so you need
to write the wrappers yourself (e.g. GraphX).
On the other hand, Scala is not a trivial language like Java, so it depends
on your team.

On 8 September 2015 at 17:44, Bryan Jeffrey  wrote:

> Thank you for the quick responses.  It's useful to have some insight from
> folks already extensively using Spark.
>
> Regards,
>
> Bryan Jeffrey
>
> On Tue, Sep 8, 2015 at 10:28 AM, Sean Owen  wrote:
>
>> Why would Scala vs Java performance be different Ted? Relatively
>> speaking there is almost no runtime difference; it's the same APIs or
>> calls via a thin wrapper. Scala/Java vs Python is a different story.
>>
>> Java libraries can be used in Scala. Vice-versa too, though calling
>> Scala-generated classes can be clunky in Java. What's your concern
>> about interoperability Jeffrey?
>>
>> I disagree that Java 7 vs Scala usability is sooo different, but it's
>> certainly much more natural to use Spark in Scala. Java 8 closes a lot
>> of the usability gap with Scala, but not all of it. Enough that it's
>> not crazy for a Java shop to stick to Java 8 + Spark and not be at a
>> big disadvantage.
>>
>> The downsides of Scala IMHO are that it provides too much: lots of
>> nice features (closures! superb collections!), lots of rope to hang
>> yourself too (implicits sometimes!) and some WTF features (XML
>> literals!) Learning the good useful bits of Scala isn't hard. You can
>> always write Scala code as much like Java as you like, I find.
>>
>> Scala tooling is different from Java tooling; that's an
>> underappreciated barrier. For example I think SBT is good for
>> development, bad for general project lifecycle management compared to
>> Maven, but in any event still less developed. SBT/scalac are huge
>> resource hogs, since so much of Scala is really implemented in the
>> compiler; prepare to update your laptop to develop in Scala on your
>> IDE of choice, and start to think about running long-running compile
>> servers like we did in the year 2000.
>>
>> Still net-net I would choose Scala, FWIW.
>>
>> On Tue, Sep 8, 2015 at 3:07 PM, Ted Yu  wrote:
>> > Performance wise, Scala is by far the best choice when you use Spark.
>> >
>> > The cost of learning Scala is not negligible but not insurmountable
>> either.
>> >
>> > My personal opinion.
>> >
>> > On Tue, Sep 8, 2015 at 6:50 AM, Bryan Jeffrey 
>> > wrote:
>> >>
>> >> All,
>> >>
>> >> We're looking at language choice in developing a simple streaming
>> >> processing application in spark.  We've got a small set of example code
>> >> built in Scala.  Articles like the following:
>> >>
>> http://www.bigdatatidbits.cc/2015/02/navigating-from-scala-to-spark-for.html
>> >> would seem to indicate that Scala is great for use in distributed
>> >> programming (including Spark).  However, there is a large group of
>> folks
>> >> that seem to feel that interoperability with other Java libraries is
>> much to
>> >> be desired, and that the cost of learning (yet another) language is
>> quite
>> >> high.
>> >>
>> >> Has anyone looked at Scala for Spark dev in an enterprise environment?
>> >> What was the outcome?
>> >>
>> >> Regards,
>> >>
>> >> Bryan Jeffrey
>> >
>> >
>>
>
>


Re: Java vs. Scala for Spark

2015-09-08 Thread Dean Wampler
It's true that Java 8 lambdas help. If you've read Learning Spark, where
they use Java 7, Python, and Scala for the examples, it really shows how
awful Java without lambdas is for Spark development.
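
As a rough illustration of that gap, the sketch below writes the same
transformation twice: once as a Java 7 style anonymous inner class and once as
a Java 8 lambda. MapFn and map are hypothetical stand-ins for a single-method
function interface and a transformation that accepts it; they are not Spark's
API, and are used only to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class VerbosityDemo {
    // Hypothetical stand-in for a single-method function interface.
    interface MapFn<T, R> {
        R call(T value);
    }

    // Hypothetical stand-in for a "map" transformation over a collection.
    static <T, R> List<R> map(List<T> in, MapFn<T, R> fn) {
        List<R> out = new ArrayList<R>();
        for (T t : in) {
            out.add(fn.call(t));
        }
        return out;
    }

    // Java 7 style: every function is an anonymous inner class.
    static List<Integer> lengthsJava7(List<String> words) {
        return map(words, new MapFn<String, Integer>() {
            @Override
            public Integer call(String w) {
                return w.length();
            }
        });
    }

    // Java 8 style: the same function as a lambda.
    static List<Integer> lengthsJava8(List<String> words) {
        return map(words, w -> w.length());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("spark", "scala", "java");
        System.out.println(lengthsJava7(words)); // prints [5, 5, 4]
        System.out.println(lengthsJava8(words)); // prints [5, 5, 4]
    }
}
```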

Still, there are several "power tools" in Scala I would sorely miss using
Java 8:

1. The REPL (interpreter): I do most of my work in the REPL, then move the
code to compiled code when I'm ready to turn it into a batch job. Even
better, use Spark Notebook! (Also on GitHub.)
2. Tuples: It's just too convenient to use tuples for schemas, return
values from functions, etc., etc., etc.
3. Pattern matching: This has no analog in Java, so it's hard to appreciate
it until you understand it, but see this example
for a taste of how concise it makes code!
4. Type inference: Spark really shows its utility. It means a lot less code
to write, but you get the hints of what you just wrote!
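
For readers coming from the Java side, here is a small sketch of what the
tuple point looks like there. In Scala a method can simply return
(min, max); plain Java 8 has no tuple type, so a common workaround is to
borrow Map.Entry (or write a small holder class). TupleStandIn and minMax are
hypothetical names used only for this illustration.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;

class TupleStandIn {
    // Returns the (min, max) pair of a non-empty array, using Map.Entry as a
    // makeshift two-element tuple: key = min, value = max.
    static Map.Entry<Integer, Integer> minMax(int[] xs) {
        int min = xs[0];
        int max = xs[0];
        for (int x : xs) {
            min = Math.min(min, x);
            max = Math.max(max, x);
        }
        return new SimpleImmutableEntry<Integer, Integer>(min, max);
    }

    public static void main(String[] args) {
        Map.Entry<Integer, Integer> result = minMax(new int[] {3, 1, 2});
        // The caller has to remember that "key" means min and "value" means max.
        System.out.println("min=" + result.getKey() + " max=" + result.getValue());
    }
}
```

The accessor names getKey and getValue say nothing about what the two values
mean, which is part of why this feels clumsier than a real tuple.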

My $0.02.

dean


Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition (O'Reilly)
Typesafe
@deanwampler
http://polyglotprogramming.com

On Tue, Sep 8, 2015 at 10:28 AM, Igor Berman  wrote:

> we are using java7..its much more verbose that java8 or scala examples
> in addition there sometimes libraries that has no java  api, so you need
> to write them by yourself(e.g. graphx)
> on the other hand, scala is not trivial language like java, so it depends
> on your team
>
> On 8 September 2015 at 17:44, Bryan Jeffrey 
> wrote:
>
>> Thank you for the quick responses.  It's useful to have some insight from
>> folks already extensively using Spark.
>>
>> Regards,
>>
>> Bryan Jeffrey
>>
>> On Tue, Sep 8, 2015 at 10:28 AM, Sean Owen  wrote:
>>
>>> Why would Scala vs Java performance be different Ted? Relatively
>>> speaking there is almost no runtime difference; it's the same APIs or
>>> calls via a thin wrapper. Scala/Java vs Python is a different story.
>>>
>>> Java libraries can be used in Scala. Vice-versa too, though calling
>>> Scala-generated classes can be clunky in Java. What's your concern
>>> about interoperability Jeffrey?
>>>
>>> I disagree that Java 7 vs Scala usability is sooo different, but it's
>>> certainly much more natural to use Spark in Scala. Java 8 closes a lot
>>> of the usability gap with Scala, but not all of it. Enough that it's
>>> not crazy for a Java shop to stick to Java 8 + Spark and not be at a
>>> big disadvantage.
>>>
>>> The downsides of Scala IMHO are that it provides too much: lots of
>>> nice features (closures! superb collections!), lots of rope to hang
>>> yourself too (implicits sometimes!) and some WTF features (XML
>>> literals!) Learning the good useful bits of Scala isn't hard. You can
>>> always write Scala code as much like Java as you like, I find.
>>>
>>> Scala tooling is different from Java tooling; that's an
>>> underappreciated barrier. For example I think SBT is good for
>>> development, bad for general project lifecycle management compared to
>>> Maven, but in any event still less developed. SBT/scalac are huge
>>> resource hogs, since so much of Scala is really implemented in the
>>> compiler; prepare to update your laptop to develop in Scala on your
>>> IDE of choice, and start to think about running long-running compile
>>> servers like we did in the year 2000.
>>>
>>> Still net-net I would choose Scala, FWIW.
>>>
>>> On Tue, Sep 8, 2015 at 3:07 PM, Ted Yu  wrote:
>>> > Performance wise, Scala is by far the best choice when you use Spark.
>>> >
>>> > The cost of learning Scala is not negligible but not insurmountable
>>> either.
>>> >
>>> > My personal opinion.
>>> >
>>> > On Tue, Sep 8, 2015 at 6:50 AM, Bryan Jeffrey
>>> > wrote:
>>> >>
>>> >> All,
>>> >>
>>> >> We're looking at language choice in developing a simple streaming
>>> >> processing application in spark.  We've got a small set of example
>>> code
>>> >> built in Scala.  Articles like the following:
>>> >>
>>> http://www.bigdatatidbits.cc/2015/02/navigating-from-scala-to-spark-for.html
>>> >> would seem to indicate that Scala is great for use in distributed
>>> >> programming (including Spark).  However, there is a large group of
>>> folks
>>> >> that seem to feel that interoperability with other Java libraries is
>>> much to
>>> >> be desired, and that the cost of learning (yet another) language is
>>> quite
>>> >> high.
>>> >>
>>> >> Has anyone looked at Scala for Spark dev in an enterprise environment?
>>> >> What was the outcome?
>>> >>
>>> >> Regards,
>>> >>
>>> >> Bryan Jeffrey
>>> >
>>> >
>>>
>>
>>
>


Re: Java vs. Scala for Spark

2015-09-08 Thread Ted Yu
Sean:
w.r.t. performance, I meant Scala/Java vs Python.

Cheers

On Tue, Sep 8, 2015 at 7:28 AM, Sean Owen  wrote:

> Why would Scala vs Java performance be different Ted? Relatively
> speaking there is almost no runtime difference; it's the same APIs or
> calls via a thin wrapper. Scala/Java vs Python is a different story.
>
> Java libraries can be used in Scala. Vice-versa too, though calling
> Scala-generated classes can be clunky in Java. What's your concern
> about interoperability Jeffrey?
>
> I disagree that Java 7 vs Scala usability is sooo different, but it's
> certainly much more natural to use Spark in Scala. Java 8 closes a lot
> of the usability gap with Scala, but not all of it. Enough that it's
> not crazy for a Java shop to stick to Java 8 + Spark and not be at a
> big disadvantage.
>
> The downsides of Scala IMHO are that it provides too much: lots of
> nice features (closures! superb collections!), lots of rope to hang
> yourself too (implicits sometimes!) and some WTF features (XML
> literals!) Learning the good useful bits of Scala isn't hard. You can
> always write Scala code as much like Java as you like, I find.
>
> Scala tooling is different from Java tooling; that's an
> underappreciated barrier. For example I think SBT is good for
> development, bad for general project lifecycle management compared to
> Maven, but in any event still less developed. SBT/scalac are huge
> resource hogs, since so much of Scala is really implemented in the
> compiler; prepare to update your laptop to develop in Scala on your
> IDE of choice, and start to think about running long-running compile
> servers like we did in the year 2000.
>
> Still net-net I would choose Scala, FWIW.
>
> On Tue, Sep 8, 2015 at 3:07 PM, Ted Yu  wrote:
> > Performance wise, Scala is by far the best choice when you use Spark.
> >
> > The cost of learning Scala is not negligible but not insurmountable
> either.
> >
> > My personal opinion.
> >
> > On Tue, Sep 8, 2015 at 6:50 AM, Bryan Jeffrey 
> > wrote:
> >>
> >> All,
> >>
> >> We're looking at language choice in developing a simple streaming
> >> processing application in spark.  We've got a small set of example code
> >> built in Scala.  Articles like the following:
> >>
> http://www.bigdatatidbits.cc/2015/02/navigating-from-scala-to-spark-for.html
> >> would seem to indicate that Scala is great for use in distributed
> >> programming (including Spark).  However, there is a large group of folks
> >> that seem to feel that interoperability with other Java libraries is
> much to
> >> be desired, and that the cost of learning (yet another) language is
> quite
> >> high.
> >>
> >> Has anyone looked at Scala for Spark dev in an enterprise environment?
> >> What was the outcome?
> >>
> >> Regards,
> >>
> >> Bryan Jeffrey
> >
> >
>


Re: Java vs. Scala for Spark

2015-09-08 Thread Jerry Lam
Hi Bryan,

I would choose a language based on the requirements. Choosing Scala does not
make sense if you have a lot of dependencies that are Java-based components,
since interoperability between Java and Scala is not always obvious.

I agree with the above comments that Java is much more verbose than Scala
in many cases, if not all. However, I personally don't find verbosity to be
a key factor in choosing a language. For the sake of argument, will you be
discouraged if you need to write 3 lines of Java for 1 line of Scala? I
really don't care about the number of lines as long as I can finish the task
within a given period of time.

I believe, correct me if I'm wrong please, that all the Spark functionality
you can find in Scala is also available in Java, including MLlib,
Spark SQL, Streaming, etc. So you won't miss any features of Spark by using
Java.

It seems the questions should be:
- What language are the developers comfortable with?
- What components in the system will constrain the choice of
language?

Best Regards,

Jerry

On Tue, Sep 8, 2015 at 11:59 AM, Dean Wampler  wrote:

> It's true that Java 8 lambdas help. If you've read Learning Spark, where
> they use Java 7, Python, and Scala for the examples, it really shows how
> awful Java without lambdas is for Spark development.
>
> Still, there are several "power tools" in Scala I would sorely miss using
> Java 8:
>
> 1. The REPL (interpreter): I do most of my work in the REPL, then move the
> code to compiled code when I'm ready to turn it into a batch job. Even
> better, use Spark Notebook! (Also on GitHub.)
> 2. Tuples: It's just too convenient to use tuples for schemas, return
> values from functions, etc., etc., etc.
> 3. Pattern matching: This has no analog in Java, so it's hard to
> appreciate it until you understand it, but see this example
> for a taste of how concise it makes code!
> 4. Type inference: Spark really shows its utility. It means a lot less
> code to write, but you get the hints of what you just wrote!
>
> My $0.02.
>
> dean
>
>
> Dean Wampler, Ph.D.
> Author: Programming Scala, 2nd Edition (O'Reilly)
> Typesafe
> @deanwampler
> http://polyglotprogramming.com
>
> On Tue, Sep 8, 2015 at 10:28 AM, Igor Berman 
> wrote:
>
>> we are using java7..its much more verbose that java8 or scala examples
>> in addition there sometimes libraries that has no java  api, so you need
>> to write them by yourself(e.g. graphx)
>> on the other hand, scala is not trivial language like java, so it depends
>> on your team
>>
>> On 8 September 2015 at 17:44, Bryan Jeffrey 
>> wrote:
>>
>>> Thank you for the quick responses.  It's useful to have some insight
>>> from folks already extensively using Spark.
>>>
>>> Regards,
>>>
>>> Bryan Jeffrey
>>>
>>> On Tue, Sep 8, 2015 at 10:28 AM, Sean Owen  wrote:
>>>
>>>> Why would Scala vs Java performance be different Ted? Relatively
>>>> speaking there is almost no runtime difference; it's the same APIs or
>>>> calls via a thin wrapper. Scala/Java vs Python is a different story.
>>>>
>>>> Java libraries can be used in Scala. Vice-versa too, though calling
>>>> Scala-generated classes can be clunky in Java. What's your concern
>>>> about interoperability Jeffrey?
>>>>
>>>> I disagree that Java 7 vs Scala usability is sooo different, but it's
>>>> certainly much more natural to use Spark in Scala. Java 8 closes a lot
>>>> of the usability gap with Scala, but not all of it. Enough that it's
>>>> not crazy for a Java shop to stick to Java 8 + Spark and not be at a
>>>> big disadvantage.
>>>>
>>>> The downsides of Scala IMHO are that it provides too much: lots of
>>>> nice features (closures! superb collections!), lots of rope to hang
>>>> yourself too (implicits sometimes!) and some WTF features (XML
>>>> literals!) Learning the good useful bits of Scala isn't hard. You can
>>>> always write Scala code as much like Java as you like, I find.
>>>>
>>>> Scala tooling is different from Java tooling; that's an
>>>> underappreciated barrier. For example I think SBT is good for
>>>> development, bad for general project lifecycle management compared to
>>>> Maven, but in any event still less developed. SBT/scalac are huge
>>>> resource hogs, since so much of Scala is really implemented in the
>>>> compiler; prepare to update your laptop to develop in Scala on your
>>>> IDE of choice, and start to think about running long-running compile
>>>> servers like we did in the year 2000.
>>>>
>>>> Still net-net I would choose Scala, FWIW.
>>>>
>>>> On Tue, Sep 8, 2015 at 3:07 PM, Ted Yu wrote:
>>>> > Performance wise, Scala is by

Re: Java vs. Scala for Spark

2015-09-08 Thread Sean Owen
Why would Scala vs Java performance be different Ted? Relatively
speaking there is almost no runtime difference; it's the same APIs or
calls via a thin wrapper. Scala/Java vs Python is a different story.

Java libraries can be used in Scala. Vice-versa too, though calling
Scala-generated classes can be clunky in Java. What's your concern
about interoperability Jeffrey?

I disagree that Java 7 vs Scala usability is sooo different, but it's
certainly much more natural to use Spark in Scala. Java 8 closes a lot
of the usability gap with Scala, but not all of it. Enough that it's
not crazy for a Java shop to stick to Java 8 + Spark and not be at a
big disadvantage.

The downsides of Scala IMHO are that it provides too much: lots of
nice features (closures! superb collections!), lots of rope to hang
yourself too (implicits sometimes!) and some WTF features (XML
literals!) Learning the good useful bits of Scala isn't hard. You can
always write Scala code as much like Java as you like, I find.

Scala tooling is different from Java tooling; that's an
underappreciated barrier. For example I think SBT is good for
development, bad for general project lifecycle management compared to
Maven, but in any event still less developed. SBT/scalac are huge
resource hogs, since so much of Scala is really implemented in the
compiler; prepare to update your laptop to develop in Scala on your
IDE of choice, and start to think about running long-running compile
servers like we did in the year 2000.

Still net-net I would choose Scala, FWIW.

On Tue, Sep 8, 2015 at 3:07 PM, Ted Yu  wrote:
> Performance wise, Scala is by far the best choice when you use Spark.
>
> The cost of learning Scala is not negligible but not insurmountable either.
>
> My personal opinion.
>
> On Tue, Sep 8, 2015 at 6:50 AM, Bryan Jeffrey 
> wrote:
>>
>> All,
>>
>> We're looking at language choice in developing a simple streaming
>> processing application in spark.  We've got a small set of example code
>> built in Scala.  Articles like the following:
>> http://www.bigdatatidbits.cc/2015/02/navigating-from-scala-to-spark-for.html
>> would seem to indicate that Scala is great for use in distributed
>> programming (including Spark).  However, there is a large group of folks
>> that seem to feel that interoperability with other Java libraries is much to
>> be desired, and that the cost of learning (yet another) language is quite
>> high.
>>
>> Has anyone looked at Scala for Spark dev in an enterprise environment?
>> What was the outcome?
>>
>> Regards,
>>
>> Bryan Jeffrey
>
>

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org