Re: [pyspark] dataframe map_partition

2019-03-10 Thread Hyukjin Kwon
Because both dapply in R and Scalar Pandas UDF in Python are similar, and cover each other. FWIW, it somewhat sounds like SPARK-26413 and SPARK-26412. On Sat, Mar 9, 2019 at 12:32 PM, peng yu wrote: > Cool, thanks for letting me know, but why not support dapply >

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-10 Thread Wenchen Fan
Which version of Parquet has this bug? Maybe we can downgrade it. On Mon, Mar 11, 2019 at 10:34 AM Mark Hamstra wrote: > It worked in 2.3. We broke it with 2.4.0 and were informed of that > regression late in the 2.4.0 release process. Since we didn't fix it before > the 2.4.0 release, it

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-10 Thread Mark Hamstra
It worked in 2.3. We broke it with 2.4.0 and were informed of that regression late in the 2.4.0 release process. Since we didn't fix it before the 2.4.0 release, it should have been noted as a known issue. To now claim that there is no regression from 2.4.0 is a circular argument denying the

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-10 Thread Michael Heuer
I'm not saying that this issue should be a blocker for 2.4.1, rather I'm looking for help moving things along. I'm not a committer in any of the Spark, Parquet, or Avro projects. > On Mar 10, 2019, at 8:53 PM, Sean Owen wrote: > > From https://issues.apache.org/jira/browse/SPARK-25588, I'm

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-10 Thread Sean Owen
From https://issues.apache.org/jira/browse/SPARK-25588, I'm reading that: - this is a Parquet-Avro version conflict thing - a downstream app wants different versions of Parquet and Avro than Spark uses, which triggers it - it doesn't work in 2.4.0 It's not a regression from 2.4.0, which is the

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-10 Thread Mark Hamstra
Now wait... we created a regression in 2.4.0. Arguably, we should have blocked that release until we had a fix; but the issue came up late in the release process and it looks to me like there wasn't an adequate fix immediately available, so we did something bad and released 2.4.0 with a known

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-10 Thread DB Tsai
As we have many important fixes in the 2.4 branch which we want to release asap, and this is not a regression from Spark 2.4, 2.4.1 will not be blocked by this. Sincerely, DB Tsai -- Web: https://www.dbtsai.com PGP Key ID:

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-10 Thread Michael Heuer
Any chance we could get some movement on this for 2.4.1? https://issues.apache.org/jira/browse/SPARK-25588 https://github.com/apache/parquet-mr/pull/560 It would require a new Parquet release,