Re: SBT doesn't pick resource file after clean

2016-05-20 Thread dhruve ashar
The issue is fixed. Here's an explanation which interested people can read through: For the earlier mail, the default resourceDirectory => core/src/main/resources didn't yield the expected result. By default all the static resources placed under this directory are picked up and included in the

Re: Quick question on spark performance

2016-05-20 Thread Yash Sharma
I am going with the default Java opts for EMR: -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'. The data is not partitioned. It's 6 TB data of
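These executor-side JVM options are typically supplied through `spark.executor.extraJavaOptions`; a hedged sketch (the flag values are the ones quoted above, but the surrounding spark-submit invocation is illustrative, not from the mail):

```shell
# Hedged sketch: passing the EMR-default GC options quoted above to
# Spark executors. The spark-submit wrapper itself is illustrative.
spark-submit \
  --conf "spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails \
    -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC \
    -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 \
    -XX:+CMSClassUnloadingEnabled" \
  ...
```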

Re: Quick question on spark performance

2016-05-20 Thread Yash Sharma
The median GC time is 1.3 mins for a median duration of 41 mins. What parameters can I tune to control GC? Other details: median peak execution memory of 13 G and input records of 2.3 gigs. 180-200 executors launched. - Thanks, via mobile, excuse brevity. On May 21, 2016 10:59 AM, "Reynold

Re: Quick question on spark performance

2016-05-20 Thread Ted Yu
Yash: Can you share the JVM parameters you used ? How many partitions are there in your data set ? Thanks On Fri, May 20, 2016 at 5:59 PM, Reynold Xin wrote: > It's probably due to GC. > > On Fri, May 20, 2016 at 5:54 PM, Yash Sharma wrote: > >> Hi

Re: Quick question on spark performance

2016-05-20 Thread Reynold Xin
It's probably due to GC. On Fri, May 20, 2016 at 5:54 PM, Yash Sharma wrote: > Hi All, > I am here to get some expert advice on a use case I am working on. > > Cluster & job details below - > > Data - 6 Tb > Cluster - EMR - 15 Nodes C3-8xLarge (shared by other MR apps) > >

Quick question on spark performance

2016-05-20 Thread Yash Sharma
Hi All, I am here to get some expert advice on a use case I am working on. Cluster & job details below - Data - 6 Tb Cluster - EMR - 15 Nodes C3-8xLarge (shared by other MR apps) Parameters- --executor-memory 10G \ --executor-cores 6 \ --conf spark.dynamicAllocation.enabled=true \ --conf
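The parameter list above is cut off in the archive; a hedged reconstruction of a submit command along those lines (the trailing `--conf` settings here are illustrative guesses, not the poster's actual values — `spark.shuffle.service.enabled` is shown because dynamic allocation on YARN generally requires the external shuffle service):

```shell
# Hedged sketch of the truncated spark-submit; only the first three
# options are from the mail, the rest are illustrative.
spark-submit \
  --executor-memory 10G \
  --executor-cores 6 \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.shuffle.service.enabled=true \
  ...
```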

Re: [vote] Apache Spark 2.0.0-preview release (rc1)

2016-05-20 Thread Ricardo Almeida
+1 Ricardo Almeida On 20 May 2016 at 18:33, Mark Hamstra wrote: > This isn't yet a release candidate since, as Reynold mentioned in his > opening post, preview releases are "not meant to be functional, i.e. they > can and highly likely will contain critical bugs

Re: SBT doesn't pick resource file after clean

2016-05-20 Thread Jakob Odersky
Ah, I think I see the issue. resourceManaged and core/src/resources aren't included in the classpath; to achieve that, you need to scope the setting to either "compile" or "test" (probably compile in your case). So, the simplest way to add the extra settings would be something like:
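The concrete settings Jakob refers to are truncated in the archive. As a hedged sketch of what scoping a resource generator to `Compile` looks like in sbt 0.13-era syntax (the generated file name here is illustrative, not from the original mail):

```scala
// build.sbt -- hedged sketch; the settings from the original mail were
// truncated, so the file name and contents below are illustrative.
// Scoping the generator to Compile puts its output on the compile
// (and hence runtime) classpath, so it survives a `clean`.
resourceGenerators in Compile += Def.task {
  val out = (resourceManaged in Compile).value / "build-info.properties"
  IO.write(out, s"version=${version.value}")
  Seq(out)
}.taskValue
```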

Re: [vote] Apache Spark 2.0.0-preview release (rc1)

2016-05-20 Thread Mark Hamstra
This isn't yet a release candidate since, as Reynold mentioned in his opening post, preview releases are "not meant to be functional, i.e. they can and highly likely will contain critical bugs or documentation errors." Once we're at the point where we expect there not to be such bugs and errors,

Re: Possible Hive problem with Spark 2.0.0 preview.

2016-05-20 Thread Doug Balog
Some more info, I'm still digging. I'm just trying to do `spark.table("db.table").count` from a spark-shell. "db.table" is just a Hive table. At commit b67668b this worked just fine and it returned the number of rows in db.table. Starting at ca99171 "[SPARK-15073][SQL] Hide SparkSession

Re: Spark driver and yarn behavior

2016-05-20 Thread Steve Loughran
On 20 May 2016, at 00:34, Shankar Venkataraman wrote: Thanks Luciano. The case we are seeing is different - the YARN resource manager is shutting down the container in which the executor is running since there does

Re: [vote] Apache Spark 2.0.0-preview release (rc1)

2016-05-20 Thread Ross Lawley
+1 Having an rc1 would help me get stable feedback on using my library with Spark, compared to relying on 2.0.0-SNAPSHOT. On Fri, 20 May 2016 at 05:57 Xiao Li wrote: > Changed my vote to +1. Thanks! > > 2016-05-19 13:28 GMT-07:00 Xiao Li : > >> Will

Re: Dataset reduceByKey

2016-05-20 Thread Reynold Xin
Andres - this is great feedback. Let me think about it a little bit more and reply later. On Thu, May 19, 2016 at 11:12 AM, Andres Perez wrote: > Hi all, > > We were in the process of porting an RDD program to one which uses > Datasets. Most things were easy to transition,
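The feedback itself is truncated, but the subject concerns expressing RDD-style `reduceByKey` against Datasets. A hedged sketch (not from the thread) of the usual Dataset-era equivalent, `groupByKey` followed by `reduceGroups`:

```scala
// Hedged sketch: the common Dataset analogue of RDD.reduceByKey.
// The sample data is illustrative.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("sketch").getOrCreate()
import spark.implicits._

val ds = Seq(("a", 1), ("a", 2), ("b", 3)).toDS()

// RDD style would be: rdd.reduceByKey(_ + _)
// Dataset style: group by the key, then reduce each group.
val reduced = ds
  .groupByKey(_._1)
  .reduceGroups((x, y) => (x._1, x._2 + y._2))
```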

Re: right outer joins on Datasets

2016-05-20 Thread Reynold Xin
I filed https://issues.apache.org/jira/browse/SPARK-15441 On Thu, May 19, 2016 at 8:48 AM, Andres Perez wrote: > Hi all, I'm getting some odd behavior when using the joinWith > functionality for Datasets. Here is a small test case: > > val left = List(("a", 1), ("a", 2),
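The test case in the quoted mail is truncated. A hedged reconstruction of the kind of `joinWith` call under discussion (the left side's values are from the mail; the right side and join condition are illustrative):

```scala
// Hedged sketch reconstructing the truncated test case; the right-hand
// Dataset here is illustrative, not from the original mail.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("sketch").getOrCreate()
import spark.implicits._

val left  = List(("a", 1), ("a", 2), ("b", 3)).toDS()
val right = List(("a", "x"), ("c", "y")).toDS()

// joinWith keeps both sides as typed values; with an outer join the
// unmatched side is expected to come back as null.
val joined = left.joinWith(right, left("_1") === right("_1"), "right_outer")
```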

Re: SparkR dataframe error

2016-05-20 Thread Kai Jiang
Cool. I will open a JIRA issue to track this. Thanks, Kai. On Thu, May 19, 2016 at 9:55 PM, Sun Rui wrote: > Kai, > You can simply ignore this test failure before it is fixed > > On May 20, 2016, at 12:54, Sun Rui wrote: > > Yes. I also met this