Re: Set up Scala 2.12 test build in Jenkins

2018-08-05 Thread Mridul Muralidharan
I agree, we should not work around the test case but rather understand and fix the root cause. Closure cleaner should have nulled out the references and allowed it to be serialized. Regards, Mridul On Sun, Aug 5, 2018 at 8:38 PM Wenchen Fan wrote: > > It seems to me that the closure cleaner

Re: [Proposal] New feature: reconfigurable number of partitions on stateful operators in Structured Streaming

2018-08-05 Thread Jungtaek Lim
Answering one of the missed questions: > I am not sure how were you planning to expose the state key groups at api level and if it would be transparent. I was thinking about introducing a new configuration: it may look like adding unnecessary configuration, but I thought it would help elasticity

Re: [DISCUSS][SQL] Control the number of output files

2018-08-05 Thread John Zhuge
Great help from the community! On Sun, Aug 5, 2018 at 6:17 PM Xiao Li wrote: > FYI, the new hints have been merged. They will be available in the > upcoming release (Spark 2.4). > > *John Zhuge*, thanks for your work! Really appreciate it! Please submit > more PRs and help the community improve

Re: [DISCUSS][SQL] Control the number of output files

2018-08-05 Thread John Zhuge
https://issues.apache.org/jira/browse/SPARK-24940 The PR has been merged to 2.4.0. On Sun, Aug 5, 2018 at 6:06 PM Koert Kuipers wrote: > lukas, > what is the jira ticket for this? i would like to follow its activity. > thanks! > koert > > On Wed, Jul 25, 2018 at 5:32 PM, lukas nalezenec

Re: Set up Scala 2.12 test build in Jenkins

2018-08-05 Thread Wenchen Fan
It seems to me that the closure cleaner fails to clean up something. The failed test case defines a serializable class inside the test case, and the class doesn't refer to anything in the outer class. Ideally it can be serialized after cleaning up the closure. This is somehow a very weird way to

Re: [DISCUSS][SQL] Control the number of output files

2018-08-05 Thread Xiao Li
FYI, the new hints have been merged. They will be available in the upcoming release (Spark 2.4). *John Zhuge*, thanks for your work! Really appreciate it! Please submit more PRs and help the community improve Spark. : ) Xiao 2018-08-05 21:06 GMT-04:00 Koert Kuipers : > lukas, > what is the

Re: [DISCUSS][SQL] Control the number of output files

2018-08-05 Thread Koert Kuipers
lukas, what is the jira ticket for this? i would like to follow its activity. thanks! koert On Wed, Jul 25, 2018 at 5:32 PM, lukas nalezenec wrote: > Hi, > Yes, this feature is planned - Spark should soon be able to repartition > output by size. > Lukas > > > On Wed, Jul 25, 2018 at 23:26, user

Re: Why is SQLImplicits an abstract class rather than a trait?

2018-08-05 Thread Jacek Laskowski
Hi Assaf, No idea (and I don't remember ever wondering about it before), but why not do this (untested): trait MySparkTestTrait { lazy val spark: SparkSession = SparkSession.builder().getOrCreate() // <-- you sure you don't need master? import spark.implicits._ } Wouldn't that import

Re: Set up Scala 2.12 test build in Jenkins

2018-08-05 Thread Stavros Kontopoulos
Makes sense, not sure if closure cleaning is related to the last one for example or others. The last one is a bit weird, unless I am missing something about the LegacyAccumulatorWrapper logic. Stavros On Sun, Aug 5, 2018 at 10:23 PM, Sean Owen wrote: > Yep that's what I did. There are more

Re: Set up Scala 2.12 test build in Jenkins

2018-08-05 Thread Sean Owen
Yep that's what I did. There are more failures with different resolutions. I'll open a JIRA and PR and ping you, to make sure that the changes are all reasonable, and not an artifact of missing something about closure cleaning in 2.12. In the meantime having a 2.12 build up and running for master

Re: Set up Scala 2.12 test build in Jenkins

2018-08-05 Thread Stavros Kontopoulos
Hi Sean, I ran a quick build, and the failing tests seem to be: - SPARK-17644: After one stage is aborted for too many failed attempts, subsequent stagesstill behave correctly on fetch failures *** FAILED *** A job with one fetch failure should eventually succeed (DAGSchedulerSuite.scala:2422)

Why is SQLImplicits an abstract class rather than a trait?

2018-08-05 Thread assaf.mendelson
Hi all, I have been playing a bit with SQLImplicits and noticed that it is an abstract class. I was wondering why that is. It has no constructor. Because it is an abstract class, a test trait cannot extend it and still be a trait. Consider the following: trait

Set up Scala 2.12 test build in Jenkins

2018-08-05 Thread Sean Owen
Shane et al - could we get a test job in Jenkins to test the Scala 2.12 build? I don't think I have the access or expertise for it, though I could probably copy and paste a job. I think we just need to clone the, say, master Maven Hadoop 2.7 job, and add two steps: run

Re: Am I crazy, or does the binary distro not have Kafka integration?

2018-08-05 Thread Sean Owen
Yes, it's a reasonable argument that putting N more external integration modules on the default spark-submit classpath might bring in more third-party dependencies that clash or something. I think the convenience factor isn't a big deal; users can also just write a dependency on said module in

Re: [Proposal] New feature: reconfigurable number of partitions on stateful operators in Structured Streaming

2018-08-05 Thread Jungtaek Lim
"coalesce" looks like it works: I misunderstood it as an efficient version of "repartition", which does shuffle, so I expected it would trigger a shuffle. My proposal would be covered by using "coalesce": thanks Joseph for the correction. Let me abandon the proposal. What we may still miss for now is