Re: Spark 3.0 preview release on-going features discussion

Dongjoon Hyun Fri, 20 Sep 2019 09:56:15 -0700

Thank you for the summary, Xingbo.

I also agree with Sean, because I don't think those items should block the
3.0.0 preview release.
In particular, correctness issues should not be there.


Instead, could you summarize what we have as of now for 3.0.0 preview?

I believe JDK11 (SPARK-28684) and Hive 2.3.5 (SPARK-23710) will be in the
what-we-have list for 3.0.0 preview.

Bests,
Dongjoon.

On Fri, Sep 20, 2019 at 6:22 AM Sean Owen <[email protected]> wrote:

> Is this a list of items that might be focused on for the final 3.0
> release? At least, Scala 2.13 support shouldn't be on that list. The
> others look plausible, or are already done, but there are probably
> more.
>
> As for the 3.0 preview, I wouldn't necessarily block on any particular
> feature, though, yes, the more work that can go into important items
> between now and then, the better.
> I wouldn't necessarily present any list of things that will or might
> be in 3.0 with that preview; just list the things that are done, like
> JDK 11 support.
>
> On Fri, Sep 20, 2019 at 2:46 AM Xingbo Jiang <[email protected]>
> wrote:
> >
> > Hi all,
> >
> > Let's start a new thread to discuss the on-going features for the
> > Spark 3.0 preview release.
> >
> > Below is the feature list for the Spark 3.0 preview release. The list
> > was collected from previous discussions on the dev list.
> >
> > Follow-up of the shuffle+repartition correctness issue: support rolling back shuffle stages (https://issues.apache.org/jira/browse/SPARK-25341)
> > Upgrade the built-in Hive to 2.3.5 for hadoop-3.2 (https://issues.apache.org/jira/browse/SPARK-23710)
> > JDK 11 support (https://issues.apache.org/jira/browse/SPARK-28684)
> > Scala 2.13 support (https://issues.apache.org/jira/browse/SPARK-25075)
> > DataSourceV2 features:
> >   Enable file source v2 writers (https://issues.apache.org/jira/browse/SPARK-27589)
> >   CREATE TABLE USING with DataSourceV2
> >   New pushdown API for DataSourceV2
> >   Support DELETE/UPDATE/MERGE Operations in DataSourceV2 (https://issues.apache.org/jira/browse/SPARK-28303)
> >
> > Correctness issue: Stream-stream joins - left outer join gives inconsistent output (https://issues.apache.org/jira/browse/SPARK-26154)
> > Revisiting Python / pandas UDF (https://issues.apache.org/jira/browse/SPARK-28264)
> > Spark Graph (https://issues.apache.org/jira/browse/SPARK-25994)
> >
> > Features that are nice to have:
> >
> > Use remote storage for persisting shuffle data (https://issues.apache.org/jira/browse/SPARK-25299)
> > Spark + Hadoop + Parquet + Avro compatibility problems (https://issues.apache.org/jira/browse/SPARK-25588)
> > Introduce new option to Kafka source - specify timestamp to start and end offset (https://issues.apache.org/jira/browse/SPARK-26848)
> > Delete files after processing in structured streaming (https://issues.apache.org/jira/browse/SPARK-20568)
> >
> > I am proposing to cut the branch on October 15th. If a feature is
> > targeted at the 3.0 preview release, please prioritize the work and
> > finish it before that date. Note that Oct. 15th is not the code freeze
> > for Spark 3.0. That means the community will still work on features for
> > the upcoming Spark 3.0 release, even if they are not included in the
> > preview release. The goal of the preview release is to collect more
> > feedback from the community regarding the new 3.0 features and behavior
> > changes.
> >
> > Thanks!
