Re: SparkR developer release

2014-01-16 Thread andy.petre...@gmail.com
Cool that's awesome and something I'll surely investigate in the coming weeks.
Great job!

Sent from my HTC

- Reply message -
From: Shivaram Venkataraman shiva...@eecs.berkeley.edu
To: dev@spark.incubator.apache.org, u...@spark.incubator.apache.org
Cc: Zongheng Yang zonghen...@gmail.com, Matthew L Massie mas...@berkeley.edu
Subject: SparkR developer release
Date: Thu, Jan 16, 2014 23:14


I'm happy to announce the developer preview of SparkR, an R frontend
for Spark. SparkR presents Spark's API in R and allows you to write
code in R and run the computation on a Spark cluster. You can try out
SparkR today by installing it from our github repo at
https://github.com/amplab-extras/SparkR-pkg .

Right now SparkR is available as a standalone package that can be
installed to run on an existing Spark installation. Note that SparkR
requires Spark >= 0.9 and the default build uses the recent 0.9
release candidate. In the future we will consider merging this with
Apache Spark.

More details about SparkR and examples of SparkR code can be found at
http://amplab-extras.github.io/SparkR-pkg. I would like to thank
Zongheng Yang, Matei Zaharia and Matt Massie for their contributions
and help in developing SparkR.

Comments and pull requests are welcome on github.

Thanks
Shivaram


Re: Option folding idiom

2013-12-27 Thread andy.petre...@gmail.com
What about cata()? 
*kidding*

Sent from my HTC

- Reply message -
From: Imran Rashid im...@quantifind.com
To: dev@spark.incubator.apache.org
Subject: Option folding idiom
Date: Fri, Dec 27, 2013 16:02


I'm also against option.fold (though I wouldn't say I really, really hate
this style of coding), for the readability reasons already mentioned.

I actually find myself pulling back from some scala-isms after having spent
some time with them, for readability / maintainability.




On Fri, Dec 27, 2013 at 2:57 AM, Christopher Nguyen c...@adatao.com wrote:

 I've learned and unlearned enough things to be careful when claiming
 something is more intuitive than another, since it's subject to prior
 knowledge. When I first encountered map().getOrElse() it wasn't any more
 intuitive than this fold()() syntax. Maybe the "OrElse" helps a bit, but
 the "get" in front of it confuses matters again (it sets one up to expect
 two things following, not one). Meanwhile, people coming from a
 data-structure-folding background would argue that fold()() is more
 intuitive.

 If the choice is among three alternatives (match, map().getOrElse(), and
 fold()()), and the goal is intuitively obvious syntax to the broadest
 audience, then match wins by a reasonably good distance, with the latter
 two about equal. This tie could be broken by the fact that more people by
 now know about getOrElse than fold, crossed with the fact that it probably
 isn't on the top of the Spark community's agenda to be avant garde on new
 Scala syntax.
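
 For concreteness, a minimal sketch of the three alternatives under
 discussion (the Option value and variable names here are illustrative):

   val maybeName: Option[String] = Some("spark")

   // 1. Pattern match: verbose, but obvious to newcomers.
   val a = maybeName match {
     case Some(n) => n.length
     case None    => 0
   }

   // 2. map().getOrElse(): the default value comes last.
   val b = maybeName.map(_.length).getOrElse(0)

   // 3. fold()(): the default value comes first, then the function.
   val c = maybeName.fold(0)(_.length)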



 --
 Christopher T. Nguyen
 Co-founder & CEO, Adatao http://adatao.com
 linkedin.com/in/ctnguyen



 On Thu, Dec 26, 2013 at 11:40 PM, Nick Pentreath
 nick.pentre...@gmail.com wrote:

  +1 for getOrElse
 
 
  When I was new to Scala I tended to use match almost like if/else
  statements with Option. These days I try to use map/flatMap instead and
  use getOrElse extensively, and I for one find it very intuitive.
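
  A minimal sketch of that map/flatMap style (both lookup functions here
  are hypothetical, for illustration only):

    // Two lookups that may each fail, so both return Option.
    def lookupUser(id: Int): Option[String] =
      if (id == 1) Some("matei") else None
    def lookupEmail(name: String): Option[String] =
      Some(name + "@example.com")

    // flatMap chains the Options; getOrElse supplies the default.
    val email = lookupUser(1).flatMap(lookupEmail).getOrElse("unknown")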
 
 
 
 
  I also agree that the fold syntax seems way less intuitive, and I
  certainly prefer readable Scala code to that which might be more
  idiomatic but which I honestly tend to find very inscrutable and hard
  to grok quickly.
  —
  Sent from Mailbox for iPhone
 
  On Fri, Dec 27, 2013 at 9:06 AM, Matei Zaharia matei.zaha...@gmail.com
  wrote:
 
   I agree about using getOrElse instead. In choosing which code style and
  idioms to use, my goal has always been to maximize the ease of *other
  developers* understanding the code, and most developers today still don’t
  know Scala. It’s fine to use maps or matches, because their meaning is
  obvious, but fold on Option is not obvious (even foreach is kind of weird
  for new people). In this case the benefit is so small that it doesn’t
  seem worth it.
   Note that if you use getOrElse, you can even throw exceptions in the
  “else” part if you’d like. (This is because Nothing is a subtype of every
  type in Scala.) So for example you can do val stuff =
  option.getOrElse(throw new Exception(“It wasn’t set”)). It looks a little
  weird, but note how the meaning is obvious even if you don’t know
  anything about the type system.
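
   As a minimal, self-contained illustration of that pattern:

     val option: Option[String] = None

     // `throw` has type Nothing, and Nothing is a subtype of every type,
     // so this expression still type-checks as a String.
     val stuff = option.getOrElse(throw new Exception("It wasn't set"))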
   Matei
   On Dec 27, 2013, at 12:12 AM, Kay Ousterhout k...@eecs.berkeley.edu
  wrote:
   I agree with what Reynold said -- there's not a big benefit in terms of
   lines of code (esp. compared to using getOrElse) and I think it hurts
   code readability.  One of the great things about the current Spark
   codebase is that it's very accessible for newcomers -- something that
   would be less true with this use of fold.
  
  
   On Thu, Dec 26, 2013 at 8:11 PM, Holden Karau hol...@pigscanfly.ca
  wrote:
  
   I personally agree with Evan in that I prefer map with getOrElse over
   fold with options (but that's just my personal preference) :)
  
  
   On Thu, Dec 26, 2013 at 7:58 PM, Reynold Xin r...@databricks.com
  wrote:
  
   I'm not strongly against Option.fold, but I find the readability
   getting worse for the use case you brought up.  For the use case of
   if/else, I find Option.fold pretty confusing because it reverses the
   order of Some vs None.
   Also, when code gets long, the lack of an obvious boundary (the only
   boundary is } {) with two closures is pretty confusing.
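
   A minimal sketch of the ordering issue described above (the values are
   illustrative):

     val opt: Option[Int] = Some(1)

     // With fold, the None (default) branch is written first and the Some
     // branch second -- the reverse of how a pattern match reads -- and
     // the only boundary between the two closures is "} {".
     val msg = opt.fold {
       "none branch"
     } { v =>
       "some branch: " + v
     }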
  
  
   On Thu, Dec 26, 2013 at 4:23 PM, Mark Hamstra m...@clearstorydata.com
   wrote:
  
   On the contrary, it is the completely natural place for the initial
   value of the accumulator, and provides the expected result of folding
   over an empty collection.
  
   scala> val l: List[Int] = List()
   l: List[Int] = List()

   scala> l.fold(42)(_ + _)
   res0: Int = 42

   scala> val o: Option[Int] = None
   o: Option[Int] = None

   scala> o.fold(42)(_ + 1)
   res1: Int = 42
  
  
   On Thu, Dec 26, 2013 at 5:51 PM, Evan Chan e...@ooyala.com wrote:
  
   +1 for using more functional idioms in general.
  
   That's a pretty clever use of `fold`, but putting the default
   

Re: Scala 2.10 Merge

2013-12-14 Thread andy.petre...@gmail.com
That's very good news!
Congrats

Sent from my HTC

- Reply message -
From: Sam Bessalah samkil...@gmail.com
To: dev@spark.incubator.apache.org
Subject: Scala 2.10 Merge
Date: Sat, Dec 14, 2013 11:03


Yes. Awesome.
Great job guys.

Sam Bessalah

 On Dec 14, 2013, at 9:59 AM, Patrick Wendell pwend...@gmail.com wrote:
 
 Alright I just merged this in - so Spark is officially Scala 2.10
 from here forward.
 
 For reference I cut a new branch called scala-2.9 with the commit
 immediately prior to the merge:
 https://git-wip-us.apache.org/repos/asf/incubator-spark/repo?p=incubator-spark.git;a=shortlog;h=refs/heads/scala-2.9
 
 - Patrick
 
 On Thu, Dec 12, 2013 at 8:26 PM, Patrick Wendell pwend...@gmail.com wrote:
 Hey Raymond,
 
 Let's move this discussion out of this thread and into the associated JIRA.
 I'll write up our current approach over there.
 
 https://spark-project.atlassian.net/browse/SPARK-995
 
 - Patrick
 
 
 On Thu, Dec 12, 2013 at 5:56 PM, Liu, Raymond raymond@intel.com wrote:
 
 Hi Patrick
 
So what's the plan for supporting YARN 2.2 in 0.9? As far as I can
 see, if you want to support both 2.2 and 2.0, then due to the protobuf
 version incompatibility issue you need two versions of akka anyway.
 
Akka 2.3-M1 looks like it has a few API changes; we could
 probably isolate the code like what we did on the yarn part of the API. I
 remember it was mentioned that using reflection for the different APIs is
 preferred. So the purpose of using reflection is to use one release bin jar
 to support both versions of Hadoop/Yarn at runtime, instead of building
 different bin jars at compile time?
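 
 A minimal sketch of that runtime-reflection idea (the trait and class
 names here are hypothetical, not Spark's actual shim layer):
 
   // A common interface compiled into the main jar.
   trait YarnShim { def submitApplication(): Unit }
 
   // Select the version-specific implementation by name at runtime, so
   // one binary jar can run against either Hadoop/YARN version.
   val shimClassName =
     if (sys.props.getOrElse("yarn.version", "2.0").startsWith("2.2"))
       "org.example.Yarn22Shim"
     else
       "org.example.Yarn20Shim"
   val shim = Class.forName(shimClassName)
     .newInstance()
     .asInstanceOf[YarnShim]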
 
 Then all code related to hadoop will also be built in separate
 modules for loading on demand? This sounds to me like it involves a lot of
 work. And you still need a shim layer and separate code for the different
 version APIs, which depend on different Akka versions, etc. That sounds
 like even stricter demands than our current approach on master, with a
 dynamic class loader in addition. And the problems we are facing now are
 still there?
 
 Best Regards,
 Raymond Liu
 
 -----Original Message-----
 From: Patrick Wendell [mailto:pwend...@gmail.com]
 Sent: Thursday, December 12, 2013 5:13 PM
 To: dev@spark.incubator.apache.org
 Subject: Re: Scala 2.10 Merge
 
 Also - the code is still there because of a recent merge that took in some
 newer changes... we'll be removing it for the final merge.
 
 
 On Thu, Dec 12, 2013 at 1:12 AM, Patrick Wendell pwend...@gmail.com
 wrote:
 
 Hey Raymond,
 
 This won't work because AFAIK akka 2.3-M1 is not binary compatible
 with akka 2.2.3 (right?). For all of the non-yarn 2.2 versions we need
 to still use the older protobuf library, so we'd need to support both.
 
 I'd also be concerned about having a reference to a non-released
 version of akka. Akka is the source of our hardest-to-find bugs and
 simultaneously trying to support 2.2.3 and 2.3-M1 is a bit daunting.
 Of course, if you are building off of master you can maintain a fork
 that uses this.
 
 - Patrick
 
 
 On Thu, Dec 12, 2013 at 12:42 AM, Liu, Raymond
 raymond@intel.comwrote:
 
 Hi Patrick
 
What does that mean for dropping YARN 2.2? It seems the code is still
 there. You mean that if we build upon 2.2 it will break and won't work,
 right?
 Since the home-made akka build on scala 2.10 is not there, in that
 case, can we just use akka 2.3-M1, which runs on protobuf 2.5, as a
 replacement?
 
 Best Regards,
 Raymond Liu
 
 
 -----Original Message-----
 From: Patrick Wendell [mailto:pwend...@gmail.com]
 Sent: Thursday, December 12, 2013 4:21 PM
 To: dev@spark.incubator.apache.org
 Subject: Scala 2.10 Merge
 
 Hi Developers,
 
 In the next few days we are planning to merge Scala 2.10 support into
 Spark. For those that haven't been following this, Prashant Sharma
 has been maintaining the scala-2.10 branch of Spark for several
 months. This branch is current with master and has been reviewed for
 merging:
 
 https://github.com/apache/incubator-spark/tree/scala-2.10
 
 Scala 2.10 support is one of the most requested features for Spark -
 it will be great to get this into Spark 0.9! Please note that *Scala
 2.10 is not binary compatible with Scala 2.9*. With that in mind, I
 wanted to give a few heads-up/requests to developers:
 
 If you are developing applications on top of Spark's master branch,
 those will need to migrate to Scala 2.10. You may want to download
 and test the current scala-2.10 branch in order to make sure you will
 be okay as Spark developments move forward. Of course, you can always
 stick with the current master commit and be fine (I'll cut a tag when
 we do the merge in order to delineate where the version changes).
 Please open new threads on the dev list to report and discuss any
 issues.
 
 This merge will temporarily drop support for YARN 2.2 on the master
 branch.
 This is because the workaround we used was only compiled for Scala 2.9.