Re: sql compile failing with Zinc?

2018-08-14 Thread Steve Loughran
thanks. I'm launching zinc by hand, but then mvn is handing it off. Might be best to make the memory property configurable so that people can play with it themselves. On 14 Aug 2018, at 13:02, Sean Owen <sro...@gmail.com> wrote: If you're running zinc directly, you can give it more

Re: [VOTE] SPARK 2.3.2 (RC5)

2018-08-14 Thread Saisai Shao
There's still another one, SPARK-25114. I will wait several days in case other blockers come up. Thanks, Saisai. On Wed, Aug 15, 2018 at 10:19 AM, Wenchen Fan wrote: > SPARK-25051 is resolved, can we start a new RC? > > SPARK-16406 is an improvement, generally we should not backport. > > On Wed, Aug 15,

[SPARK-24771] Upgrade AVRO version from 1.7.7 to 1.8

2018-08-14 Thread Wenchen Fan
Hi all, We've upgraded Avro from 1.7 to 1.8, to support date/timestamp/decimal types in the newly added Avro data source in the coming Spark 2.4, and also to make Avro work with Parquet. Since Avro 1.8 is not binary compatible with Avro 1.7 (see https://issues.apache.org/jira/browse/AVRO-1502),

Re: [VOTE] SPARK 2.3.2 (RC5)

2018-08-14 Thread Wenchen Fan
SPARK-25051 is resolved, can we start a new RC? SPARK-16406 is an improvement, generally we should not backport. On Wed, Aug 15, 2018 at 5:16 AM Sean Owen wrote: > (We wouldn't consider lack of an improvement to block a maintenance > release. It's reasonable to raise this elsewhere as a big

Re: [Performance] Spark DataFrame is slow with wide data. Polynomial complexity on the number of columns is observed. Why?

2018-08-14 Thread antonkulaga
Is it not going to be backported to 2.3.2? I am totally blocked by this issue in one of my projects. -- Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/ - To unsubscribe e-mail:

Re: [VOTE] SPARK 2.3.2 (RC5)

2018-08-14 Thread Sean Owen
(We wouldn't consider lack of an improvement to block a maintenance release. It's reasonable to raise this elsewhere as a big nice to have on 2.3.x in general) On Tue, Aug 14, 2018, 4:13 PM antonkulaga wrote: > -1 as https://issues.apache.org/jira/browse/SPARK-16406 does not seem to > be >

Re: [VOTE] SPARK 2.3.2 (RC5)

2018-08-14 Thread antonkulaga
-1 as https://issues.apache.org/jira/browse/SPARK-16406 does not seem to be back-ported to 2.3.1 and it causes a lot of pain

Re: [DISCUSS] Handling correctness/data loss jiras

2018-08-14 Thread Imran Rashid
+1 on what we should do. On Mon, Aug 13, 2018 at 3:06 PM, Tom Graves wrote: > > > I mean, what are concrete steps beyond saying this is a problem? That's > the important thing to discuss. > > Sorry I'm a bit confused by your statement but also think I agree. I > started this thread for this

Re: sql compile failing with Zinc?

2018-08-14 Thread Marco Gaido
I am not sure; I managed to build successfully using the mvn in the distribution today. On Tue, Aug 14, 2018 at 22:02, Sean Owen wrote: > If you're running zinc directly, you can give it more memory with -J-Xmx2g > or whatever. If you're running ./build/mvn and letting it run

Re: sql compile failing with Zinc?

2018-08-14 Thread Sean Owen
If you're running zinc directly, you can give it more memory with -J-Xmx2g or whatever. If you're running ./build/mvn and letting it run zinc we might need to increase the memory that it requests in the script. On Tue, Aug 14, 2018 at 2:56 PM Steve Loughran wrote: > Is anyone else getting the

sql compile failing with Zinc?

2018-08-14 Thread Steve Loughran
Is anyone else getting the sql module maven build on master branch failing when you use zinc for incremental builds? [warn] ^ java.lang.OutOfMemoryError: GC overhead limit exceeded at scala.tools.nsc.backend.icode.GenICode$Scope.<init>(GenICode.scala:2225) at

Same code in DataFrameWriter.runCommand and Dataset.withAction?

2018-08-14 Thread Jacek Laskowski
Hi, I'm curious why Spark SQL uses two different methods for the seemingly very same code? * DataFrameWriter.runCommand --> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L663 * Dataset.withAction -->

[DISCUSS][SPARK-22674][PYTHON] Disabled _hack_namedtuple for picklable namedtuples

2018-08-14 Thread Sergei Lebedev
Hi all, Some time ago we discovered that PySpark patches collections.namedtuple to allow unpickling of namedtuples defined in the REPL on the executors. Side-effects of the patch include * hard-to-debug failures -- we originally came across this while investigating a TensorFlowOnSpark
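
For context on the message above, here is a minimal sketch of the standard pickle behavior that makes REPL-defined namedtuples awkward to ship between processes. It does not reproduce PySpark's patch. The module name `fake_driver_repl` and the helpers `define_on_driver` and `simulate_executor_unpickle` are hypothetical, introduced only for illustration: the stand-in module plays the role of a driver-side REPL session, and removing it from `sys.modules` imitates an executor that never saw the class definition.

```python
import pickle
import sys
import types
from collections import namedtuple

# Hypothetical module name standing in for a REPL session on the driver.
DRIVER_MODULE = "fake_driver_repl"


def define_on_driver():
    """Create a namedtuple class that lives only in the stand-in module."""
    mod = types.ModuleType(DRIVER_MODULE)
    sys.modules[DRIVER_MODULE] = mod
    point_cls = namedtuple("Point", ["x", "y"], module=DRIVER_MODULE)
    mod.Point = point_cls
    return point_cls


def simulate_executor_unpickle(payload):
    """Unpickle while the defining module is unavailable, as on another process."""
    saved = sys.modules.pop(DRIVER_MODULE, None)
    try:
        return pickle.loads(payload)
    except (ImportError, AttributeError) as exc:
        # pickle stores the class by reference (module + name), so the
        # lookup fails when the module cannot be imported.
        return exc
    finally:
        if saved is not None:
            sys.modules[DRIVER_MODULE] = saved


Point = define_on_driver()
payload = pickle.dumps(Point(1, 2))

same_process = pickle.loads(payload)                # class is resolvable here
other_process = simulate_executor_unpickle(payload)  # class is not resolvable

print(same_process)
print(type(other_process).__name__)
```

The payload carries only the module and class name plus the field values, not the class definition itself, which is why some extra mechanism is needed when the class exists only in an interactive session.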

Re: [VOTE] SPARK 2.3.2 (RC5)

2018-08-14 Thread Marco Gaido
-1, due to SPARK-25051. It is a regression and it is a correctness bug. In 2.3.0/2.3.1 an Analysis exception was thrown; 2.2.* works fine. I cannot reproduce the issue on current master, but I was able to using the prepared 2.3.2 release. On Tue, Aug 14, 2018 at 10:04, Saisai Shao wrote

[VOTE] SPARK 2.3.2 (RC5)

2018-08-14 Thread Saisai Shao
Please vote on releasing the following candidate as Apache Spark version 2.3.2. The vote is open until August 20 PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 2.3.2 [ ] -1 Do not release this package because ... To