Re: [pyspark] dataframe map_partition

2019-03-08 Thread peng yu
Cool, thanks for letting me know. But why not support dapply (http://spark.apache.org/docs/2.0.0/api/R/dapply.html), as R does, so that we can just pass in a pandas DataFrame? On Fri, Mar 8, 2019 at 6:09 PM Li Jin wrote: > Hi, > > Pandas UDF supports input as struct type. However, note that it

Re: [pyspark] dataframe map_partition

2019-03-08 Thread Li Jin
Hi, Pandas UDF supports input as struct type. However, note that it will be turned into a Python dict because pandas itself does not have a native struct type. On Fri, Mar 8, 2019 at 2:55 PM peng yu wrote: > Yeah, that seems most likely i have wanted, does the scalar Pandas UDF > support input is a

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-08 Thread DB Tsai
Since I cannot find the `Preparing development version 2.4.2-SNAPSHOT` commit after the rc6 cut, it's very risky to fix the branch and do a force-push. I'll follow Marcelo's suggestion and cut an rc7. Thus, this vote fails. DB Tsai | Siri Open Source Technologies [not a

Re: [pyspark] dataframe map_partition

2019-03-08 Thread peng yu
Yeah, that seems most likely what I wanted. Does the scalar Pandas UDF support a StructType as input too? On Fri, Mar 8, 2019 at 2:25 PM Bryan Cutler wrote: > Hi Peng, > > I just added support for scalar Pandas UDF to return a StructType as a > Pandas DataFrame in

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-08 Thread DB Tsai
Okay, I see the problem. The rc6 tag is not in the 2.4 branch. It's very weird. It must have been overwritten by a force push. DB Tsai | Siri Open Source Technologies [not a contribution] | Apple, Inc > On Mar 8, 2019, at 11:39 AM, DB Tsai wrote: > > I was using `./do-release-docker.sh` to create

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-08 Thread Marcelo Vanzin
I'd be more comfortable with an rc7. Either that or manually fix the branch with a force push, but that's a bit risky; it's easy to mess up force pushes (if we can even do that?). It's very possible that there is a bug in the script; IIRC it should create the commits in the right branch when you

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-08 Thread DB Tsai
I was using `./do-release-docker.sh` to create a release. But since the gpg validation failed a couple of times when the script tried to publish the jars to Nexus, I re-ran the script multiple times without creating a new rc. I was wondering if the script will overwrite the v.2.4.1-rc6 tag instead

Re: [pyspark] dataframe map_partition

2019-03-08 Thread Bryan Cutler
Hi Peng, I just added support for scalar Pandas UDF to return a StructType as a Pandas DataFrame in https://issues.apache.org/jira/browse/SPARK-23836. Is that the functionality you are looking for? Bryan On Thu, Mar 7, 2019 at 1:13 PM peng yu wrote: > right now, i'm using the columns-at-a-time

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-08 Thread Marcelo Vanzin
I personally find it a little weird to not have the commit in branch-2.4. Not that this would happen, but if the v2.4.1-rc6 tag is overwritten (e.g. accidentally) then you lose the reference to that commit, and then the exact commit from which the rc was generated is lost. On Fri, Mar 8, 2019 at
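
The concern above can be illustrated with a toy model of ref reachability. This is only a sketch, not real Git: the commit names and the `parents` table are hypothetical, and it assumes the release commit is referenced by the tag alone, as the message describes. It shows that once the tag is pointed elsewhere, no remaining ref leads back to the commit the rc was built from.

```python
# Toy model of Git reachability; not real Git, all commit names are hypothetical.
# Each commit maps to its parent (None for the root commit).
parents = {
    "base": None,
    "branch-fix": "base",      # the change as it landed on branch-2.4
    "rc6-prep": "base",        # release commit referenced only by the tag
    "rc6-prep-rerun": "base",  # commit produced by a hypothetical script re-run
}


def reachable(refs):
    """Return every commit reachable from the given refs via parent links."""
    seen = set()
    for tip in refs.values():
        while tip is not None and tip not in seen:
            seen.add(tip)
            tip = parents[tip]
    return seen


refs = {"branch-2.4": "branch-fix", "v2.4.1-rc6": "rc6-prep"}
before = reachable(refs)

# Overwriting the tag points it at a different commit.
refs["v2.4.1-rc6"] = "rc6-prep-rerun"
after = reachable(refs)

# Commits that no ref can reach any more.
lost = before - after
print(sorted(lost))  # ['rc6-prep']
```

Had the release commit also been an ancestor of the branch tip, `lost` would be empty, which is the property the message argues for.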

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-08 Thread Sean Owen
That's weird. I see the commit but can't find it in the branch. Was it pushed, or lost in a force push of 2.4 along the way? The change is there, just under a different commit in the 2.4 branch. It doesn't necessarily invalidate the RC as it is a valid public tagged commit and all that. I just

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-03-08 Thread Jakub Wozniak
Hi, To me it is backwards compatible with older HBase versions. The code actually only falls back to the newer API on exception. It would be great if this gets in. Otherwise a setup with HBase 2 + Spark 2.4 gets a bit complicated as we are forced to use an older version of the HBase client

Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-03-08 Thread Jakub Wozniak
I guess it is that one: https://github.com/apache/spark/commit/dfed439e33b7bf224dd412b0960402068d961c7b#diff-9ebb59b7b008c694a8f583b94bd24e1d Cheers, Jakub On 7 Mar 2019, at 17:25, Sean Owen <sro...@gmail.com> wrote: Do you know what change fixed it? If it's not a regression from 2.4.0

Re: [VOTE] Release Apache Spark 2.4.1 (RC6)

2019-03-08 Thread Mihály Tóth
Hi, I am not sure how problematic it is, but v2.4.1-rc6 is not on branch-2.4. The release-related commits I have seen so far were also part of the branch. I guess the "Preparing Spark release v2.4.1-rc6" and "Preparing development version 2.4.2-SNAPSHOT" commits were simply not pushed to spark-2.4