Re: Scala API: simplifying common patterns

2016-02-08 Thread Reynold Xin
Can you create a pull request? It is difficult to know what's going on. On Mon, Feb 8, 2016 at 4:51 PM, sim wrote: > 24 test failures for sql/test: > https://gist.github.com/ssimeonov/89862967f87c5c497322

Re: spark on yarn wastes one box (or 1 GB on each box) for am container

2016-02-08 Thread Jonathan Kelly
Alex, That's a very good question that I've been trying to answer myself recently too. Since you've mentioned before that you're using EMR, I assume you're asking this because you've noticed this behavior on emr-4.3.0. In this release, we made some changes to the maximizeResourceAllocation

spark on yarn wastes one box (or 1 GB on each box) for am container

2016-02-08 Thread Alexander Pivovarov
Let's say that YARN has 53 GB of memory available on each slave, and the Spark AM container needs 896 MB (512 + 384). I see two options to configure Spark: 1. Configure Spark executors to use 52 GB and leave 1 GB on each box, so some box will also run the AM container. So, 1 GB of memory will not be used on all

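The 896 MB figure in Alexander Pivovarov's message is the default AM heap (512 MB) plus the minimum overhead (384 MB). The minimal Python sketch below reproduces that arithmetic. It assumes Spark 1.x defaults (`spark.yarn.am.memory` = 512m, `spark.yarn.am.memoryOverhead` = max(10% of AM memory, 384 MB)) and also assumes the YARN scheduler rounds each request up to a multiple of `yarn.scheduler.minimum-allocation-mb` (1024 MB by default). That rounding is one way an 896 MB request can end up occupying a full 1 GB. The helper names are hypothetical.

```python
import math

# Assumed Spark 1.x defaults for the YARN AM (client mode, nothing overridden).
MEMORY_OVERHEAD_FACTOR = 0.10
MEMORY_OVERHEAD_MIN_MB = 384


def am_request_mb(am_memory_mb=512):
    """Hypothetical helper: heap plus max(10% of heap, 384 MB), truncated to int."""
    overhead = max(int(am_memory_mb * MEMORY_OVERHEAD_FACTOR), MEMORY_OVERHEAD_MIN_MB)
    return am_memory_mb + overhead


def yarn_container_mb(request_mb, min_allocation_mb=1024):
    """Hypothetical helper: round up to a multiple of the minimum allocation (assumed scheduler behavior)."""
    return math.ceil(request_mb / min_allocation_mb) * min_allocation_mb


request = am_request_mb()             # 512 + 384
granted = yarn_container_mb(request)  # rounded up by the scheduler
node_mb = 53 * 1024                   # the 53 GB per node from the message above
print(request, granted, node_mb - granted)
```

Under these assumptions the sketch prints `896 1024 53248`: the AM takes a 1024 MB container, leaving 52 GB on that node for executors.
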
Re: spark on yarn wastes one box (or 1 GB on each box) for am container

2016-02-08 Thread Sean Owen
Typically YARN is there because you're mediating resource requests from things besides Spark, so yeah using every bit of the cluster is a little bit of a corner case. There's not a good answer if all your nodes are the same size. I think you can let YARN over-commit RAM though, and allocate more

Re: Long running Spark job on YARN throws "No AMRMToken"

2016-02-08 Thread Prabhu Joseph
+ Spark-Dev On Tue, Feb 9, 2016 at 10:04 AM, Prabhu Joseph wrote: > Hi All, > > A long-running Spark job on YARN throws the exception below after running for a few days. > > yarn.ApplicationMaster: Reporter thread fails 1 time(s) in a row. >

Re: Preserving partitioning with dataframe select

2016-02-08 Thread Matt Cheah
Interesting. I might be misinterpreting my Spark UI then, in terms of the number of stages I'm seeing in the job before and after I'm doing the pre-partitioning. That said, I was mostly thinking about this when reading through the code. In particular, under basicOperators.scala in

Welcoming two new committers

2016-02-08 Thread Matei Zaharia
Hi all, The PMC has recently added two new Spark committers -- Herman van Hovell and Wenchen Fan. Both have been heavily involved in Spark SQL and Tungsten, adding new features, optimizations and APIs. Please join me in welcoming Herman and Wenchen. Matei

Re: Welcoming two new committers

2016-02-08 Thread Ted Yu
Congratulations, Herman and Wenchen. On Mon, Feb 8, 2016 at 9:15 AM, Matei Zaharia wrote: > Hi all, > > The PMC has recently added two new Spark committers -- Herman van Hovell > and Wenchen Fan. Both have been heavily involved in Spark SQL and Tungsten, > adding new

[build system] brief downtime, 8am PST thursday feb 10th

2016-02-08 Thread shane knapp
happy monday! i will be bringing down jenkins and the workers thursday morning to upgrade docker on all of the workers from 1.5.0-1 to 1.7.1-2. as of december last year, docker 1.5 and older lost the ability to pull from the docker hub. since we're running centos 6.X on our workers, and can't

Re: Welcoming two new committers

2016-02-08 Thread Bhupendra Mishra
Congratulations to both, and welcome to the group. On Mon, Feb 8, 2016 at 10:45 PM, Matei Zaharia wrote: > Hi all, > > The PMC has recently added two new Spark committers -- Herman van Hovell > and Wenchen Fan. Both have been heavily involved in Spark SQL and Tungsten, >

Re: Welcoming two new committers

2016-02-08 Thread Corey Nolet
Congrats guys! On Mon, Feb 8, 2016 at 12:23 PM, Ted Yu wrote: > Congratulations, Herman and Wenchen. > > On Mon, Feb 8, 2016 at 9:15 AM, Matei Zaharia > wrote: > >> Hi all, >> >> The PMC has recently added two new Spark committers -- Herman van

Re: Welcoming two new committers

2016-02-08 Thread Luciano Resende
On Mon, Feb 8, 2016 at 9:15 AM, Matei Zaharia wrote: > Hi all, > > The PMC has recently added two new Spark committers -- Herman van Hovell > and Wenchen Fan. Both have been heavily involved in Spark SQL and Tungsten, > adding new features, optimizations and APIs. Please

Re: Welcoming two new committers

2016-02-08 Thread Dilip Biswal
Congratulations Wenchen and Herman !! Regards, Dilip Biswal Tel: 408-463-4980 dbis...@us.ibm.com From: Xiao Li To: Corey Nolet Cc: Ted Yu , Matei Zaharia , dev Date:

Re: Welcoming two new committers

2016-02-08 Thread Denny Lee
Awesome - congratulations Herman and Wenchen! On Mon, Feb 8, 2016 at 10:26 AM Dilip Biswal wrote: > Congratulations Wenchen and Herman !! > > Regards, > Dilip Biswal > Tel: 408-463-4980 > dbis...@us.ibm.com > > > > From:Xiao Li > To:

Re: Welcoming two new committers

2016-02-08 Thread Shixiong(Ryan) Zhu
Congrats!!! Herman and Wenchen!!! On Mon, Feb 8, 2016 at 10:44 AM, Luciano Resende wrote: > > > On Mon, Feb 8, 2016 at 9:15 AM, Matei Zaharia > wrote: > >> Hi all, >> >> The PMC has recently added two new Spark committers -- Herman van Hovell >>

Re: Welcoming two new committers

2016-02-08 Thread Xiao Li
Congratulations! Herman and Wenchen! I am just so happy for you! You absolutely deserve it! 2016-02-08 9:35 GMT-08:00 Corey Nolet : > Congrats guys! > > On Mon, Feb 8, 2016 at 12:23 PM, Ted Yu wrote: > >> Congratulations, Herman and Wenchen. >> >> On

Re: Welcoming two new committers

2016-02-08 Thread Ram Sriharsha
great job guys! congrats and welcome! On Mon, Feb 8, 2016 at 12:05 PM, Amit Chavan wrote: > Welcome. > > On Mon, Feb 8, 2016 at 2:50 PM, Suresh Thalamati < > suresh.thalam...@gmail.com> wrote: > >> Congratulations Herman and Wenchen! >> >> On Mon, Feb 8, 2016 at 10:59 AM,

Re: Welcoming two new committers

2016-02-08 Thread Andrew Or
Welcome! 2016-02-08 10:55 GMT-08:00 Bhupendra Mishra : > Congratulations to both. and welcome to group. > > On Mon, Feb 8, 2016 at 10:45 PM, Matei Zaharia > wrote: > >> Hi all, >> >> The PMC has recently added two new Spark committers --

Re: Welcoming two new committers

2016-02-08 Thread Suresh Thalamati
Congratulations Herman and Wenchen! On Mon, Feb 8, 2016 at 10:59 AM, Andrew Or wrote: > Welcome! > > 2016-02-08 10:55 GMT-08:00 Bhupendra Mishra : > >> Congratulations to both. and welcome to group. >> >> On Mon, Feb 8, 2016 at 10:45 PM, Matei

Re: Welcoming two new committers

2016-02-08 Thread Amit Chavan
Welcome. On Mon, Feb 8, 2016 at 2:50 PM, Suresh Thalamati wrote: > Congratulations Herman and Wenchen! > > On Mon, Feb 8, 2016 at 10:59 AM, Andrew Or wrote: > >> Welcome! >> >> 2016-02-08 10:55 GMT-08:00 Bhupendra Mishra

Spark in Production - Use Cases

2016-02-08 Thread Scott walent
Spark Summit East is just 10 days away and we are almost sold out! One of this year's highlights is a focus on how Spark is being used across businesses to solve both big and small data needs. Check out the full agenda here: https://spark-summit.org/east-2016/schedule/ Use "ApacheList" for 30%

Re: Welcoming two new committers

2016-02-08 Thread Joseph Bradley
Congrats & welcome! On Mon, Feb 8, 2016 at 12:19 PM, Ram Sriharsha wrote: > great job guys! congrats and welcome! > > On Mon, Feb 8, 2016 at 12:05 PM, Amit Chavan wrote: > >> Welcome. >> >> On Mon, Feb 8, 2016 at 2:50 PM, Suresh Thalamati < >>

Re: pyspark worker concurrency

2016-02-08 Thread Renyi Xiong
Never mind, I think PySpark is already doing async socket read/write, but on the Scala side, in PythonRDD.scala. On Sat, Feb 6, 2016 at 6:27 PM, Renyi Xiong wrote: > Hi, > > is it a good idea to have 2 threads in pyspark worker? - main thread responsible for receive and