Writing Spark Streaming Programs

2015-03-19 Thread James King
Hello All,

I'm using Spark for streaming but I'm unclear on which implementation
language to use: Java, Scala or Python.

I don't know anything about Python, am familiar with Scala, and have been
doing Java for a long time.

I think the above shouldn't influence my decision on which language to use,
because I believe the tool should fit the problem.

In terms of performance, Java and Scala are comparable. However, Java is OO
and Scala is FP; no idea what Python is.

If Scala is used without a consistent programming style, the code can
become unreadable, but I do like that it seems possible to do so much work
with so much less code; that's a strong selling point for me. It could also
be that the type of programming done in Spark is best implemented in Scala
as an FP language; not sure though.

The question I would like your help with is: are there any other
considerations I need to think about when deciding this? Are there any
recommendations you can make in this regard?

Regards
jk


Re: Writing Spark Streaming Programs

2015-03-19 Thread Gerard Maas
Try writing this Spark Streaming idiom in Java and you'll choose Scala soon
enough:

dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    // per-partition work goes here
  }
}
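For illustration only, here is a fuller sketch of how the idiom above might look in a complete Scala program. It is a sketch under assumptions, not anything posted in the thread: the socket source on localhost:9999, the local[2] master, the 1-second batch interval, and the ConsoleSink class and writePartition helper are all hypothetical stand-ins. The point it shows is that the function passed to foreachPartition runs once per partition, so a costly resource such as a connection can be created once per partition rather than once per record.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical sink standing in for a database or message-queue client.
class ConsoleSink {
  def send(record: String): Unit = println(record)
  def close(): Unit = ()
}

object ForeachPartitionSketch {
  // Writes every record of one partition through a single sink
  // and returns the number of records written.
  def writePartition(records: Iterator[String]): Int = {
    val sink = new ConsoleSink // created on the executor, once per partition
    try {
      var written = 0
      records.foreach { record =>
        sink.send(record)
        written += 1
      }
      written
    } finally {
      sink.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumed setup: local master with two threads, 1-second batches.
    val conf = new SparkConf().setMaster("local[2]").setAppName("ForeachPartitionSketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumed source: lines of text from a local socket.
    val dstream = ssc.socketTextStream("localhost", 9999)

    dstream.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        writePartition(partition)
        () // foreachPartition expects a function returning Unit
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Keeping the per-partition logic in a plain method such as writePartition also makes it callable on an ordinary Iterator, without starting a StreamingContext.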

When deciding between Java and Scala for Spark, IMHO Scala has the
upper hand. If you're concerned with readability, have a look at the Scala
coding style guide recently open sourced by Databricks:
https://github.com/databricks/scala-style-guide  (btw, I don't agree with a
good part of it, but I recognize that it can keep the most complex Scala
constructions out of your code)




Re: Writing Spark Streaming Programs

2015-03-19 Thread James King
Many thanks Gerard, this is very helpful. Cheers!


Re: Writing Spark Streaming Programs

2015-03-19 Thread Emre Sevinc
Hello James,

I've been working with Spark Streaming for the last 6 months, and I'm
coding in Java 7. Even though I haven't encountered any blocking issues
with that combination, I'd definitely pick Scala if the decision were up to
me.

I agree with Gerard and Charles on this one. If you can, go with Scala for
Spark Streaming applications.

Cheers,

Emre Sevinç
http://www.bigindustries.be/





Re: Writing Spark Streaming Programs

2015-03-19 Thread Charles Feduke
Scala is the language Spark itself is written in, so if you write your code
in Scala you can always take advantage of features introduced in a newer
version of Spark. (This is mostly true of Java as well, but it may take a
little more legwork if a Java-friendly adapter isn't available alongside
the new features.)

Scala is also OO; it's a hybrid functional/OO language.
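As a rough illustration of that hybrid nature, the sketch below mixes both styles in a few lines. All names in it (Shape, Rectangle, Circle, HybridSketch) are made up for the example and have nothing to do with Spark: the trait and case classes are the OO side, while the immutable list, the higher-order map and the pattern match are the FP side.

```scala
// OO side: an interface-like trait with concrete implementations.
trait Shape {
  def area: Double
}

case class Rectangle(width: Double, height: Double) extends Shape {
  def area: Double = width * height
}

case class Circle(radius: Double) extends Shape {
  def area: Double = math.Pi * radius * radius
}

object HybridSketch {
  // FP side: an immutable list transformed with a higher-order function.
  def totalArea(shapes: List[Shape]): Double = shapes.map(_.area).sum

  // FP side: pattern matching that destructures the case classes.
  def describe(shape: Shape): String = shape match {
    case Rectangle(w, h) => s"rectangle ${w} by ${h}"
    case Circle(r)       => s"circle of radius ${r}"
    case _               => "unknown shape"
  }

  def main(args: Array[String]): Unit = {
    val shapes = List(Rectangle(2.0, 3.0), Rectangle(1.0, 4.0))
    println(totalArea(shapes))
    shapes.map(describe).foreach(println)
  }
}
```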

Although much of my organization's codebase is written in Java and we've
recently transitioned to Java 8, I still write all of my Spark code in
Scala. (I also squeeze in Scala where I can in other parts of the
organization.) Additionally, I use both Python and R for local data
analysis, though I haven't used Python with Spark in production.


Re: Writing Spark Streaming Programs

2015-03-19 Thread Jeffrey Jedele
I second what has been said already.

We just built a streaming app in Java and I would definitely choose Scala
this time.

Regards,
Jeff
