Revisiting Python / pandas UDF (continues)

2019-12-04 Thread Hyukjin Kwon
Hi all, I would like to finish redesigning the Pandas UDFs in Spark 3.0. If you don't have any concerns in general about it (see https://issues.apache.org/jira/browse/SPARK-28264), I would like to start soon after addressing the existing comments. Please take a look and comment on the design

Re: Enabling fully disaggregated shuffle on Spark

2019-12-04 Thread bo yang
Thanks guys for the discussion in the email and also this afternoon! From our experience, we do not need to change the Spark DAG scheduler to implement a remote shuffle service. The current Spark shuffle manager interfaces are pretty good and easy to implement. But we do feel the need to modify

RE: Enabling fully disaggregated shuffle on Spark

2019-12-04 Thread Jia, Ke A
Hi Ben and Felix, This is Jia Ke from the Intel Big Data Team, and I'm also interested in this. Would you please add me to the invite? Thanks a lot. Best regards, Jia Ke From: Qi,He Sent: Thursday, December 05, 2019 11:12 AM To: Saisai Shao Cc: Liu,Linhong; Aniket Mokashi; Felix Cheung; Ben

Re: Enabling fully disaggregated shuffle on Spark

2019-12-04 Thread Qi,He
Hi Ben and Felix, This is Qi He from Baidu, on the same team as Linhong. I'm also interested in this. Would you please add me to the invite? Thanks a lot. Thanks, Qi, He From: Saisai Shao <sai.sai.s...@gmail.com> Date: Wednesday, December 4, 2019 at 5:57 PM To: Greg Lee <lihao...@gmail.com> Cc: "Liu,Linhong"

Re: [DISCUSS] Consistent relation resolution behavior in SparkSQL

2019-12-04 Thread Wenchen Fan
+1, I think it's good for both end-users and Spark developers: * for end-users, when they look up a table, they don't need to care which command triggers it, as the behavior is consistent in all places. * for Spark developers, we may simplify the code quite a bit. For now we have two code paths

Re: [DISCUSS] PostgreSQL dialect

2019-12-04 Thread Yuanjian Li
Thanks to all of you for joining the discussion. The PR is at https://github.com/apache/spark/pull/26763, and all the PRs related to the PostgreSQL dialect are linked in the description. I hope the authors can help with reviewing. Best, Yuanjian On Sun, Dec 1, 2019 at 7:24 PM Driesprong, Fokko wrote: > +1

Re: SQL test failures in PR builder?

2019-12-04 Thread Shane Knapp
++yin huai for more insight into the NewSparkPullRequestBuilder job... tbh, i never really understood (and still don't) the exact use for that job, except that it's triggered by https://spark-prs.appspot.com/ shane On Wed, Dec 4, 2019 at 3:34 PM Sean Owen wrote: > > BTW does anyone know why there

Re: SQL test failures in PR builder?

2019-12-04 Thread Sean Owen
BTW does anyone know why there are two PR builder jobs? I'm confused about why different ones would execute. Yes, I see NewSparkPullRequestBuilder failing on a variety of PRs. I don't think it has anything to do with Hive; these PRs touch different parts of the code, but none are related to this

Re: Enabling fully disaggregated shuffle on Spark

2019-12-04 Thread Ben Sidhom
Hey Imran (and everybody who made it to the sync today): Thanks for the comments. Responses below: Scheduling and re-executing tasks >> Allow coordination between the service and the Spark DAG scheduler as to >> whether a given block/partition needs to be recomputed when a task fails or >> when

Re: Enabling fully disaggregated shuffle on Spark

2019-12-04 Thread Imran Rashid
Hi Ben, in general everything you're proposing sounds reasonable. For me, at least, I'd need more details on most of the points before I fully understand them, but I'm definitely in favor of the general goal of making Spark support fully disaggregated shuffle. Of course, I also want to make

Re: SQL test failures in PR builder?

2019-12-04 Thread Dongjoon Hyun
Hi, Sean. It seems that there is no failure on your other SQL PR. https://github.com/apache/spark/pull/26748 Do the consecutive failures happen only in `NewSparkPullRequestBuilder`? Since `NewSparkPullRequestBuilder` is not the same as `SparkPullRequestBuilder`, there might be a root

SQL test failures in PR builder?

2019-12-04 Thread Sean Owen
I'm seeing consistent failures in the PR builder when touching SQL code: https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4960/testReport/ org.apache.spark.sql.hive.thriftserver.SparkMetadataOperationSuite.Spark's own GetSchemasOperation(SparkGetSchemasOperation) 14 ms 2

Re: Enabling fully disaggregated shuffle on Spark

2019-12-04 Thread Saisai Shao
Hi Ben and Felix, I'm also interested in this. Would you please add me to the invite? Thanks a lot. Best regards, Saisai On Mon, Dec 2, 2019 at 11:34 PM Greg Lee wrote: > Hi Felix & Ben, > > This is Li Hao from Baidu, same team with Linhong. > > As mentioned in Linhong’s email, independent disaggregated