Thanks everyone! Let me first work on the list of features and major changes that have already been finished in the master branch.
Cheers!
Xingbo

Ryan Blue <rb...@netflix.com> wrote on Fri, Sep 20, 2019, 10:56 AM:

> I'm not sure that DSv2 list is accurate. We discussed this in the DSv2
> sync this week (just sent out the notes) and came up with these items:
>
> - Finish the TableProvider update to avoid another API change: pass all
>   table config from the metastore
> - Catalog behavior fix: https://issues.apache.org/jira/browse/SPARK-29014
> - Stats push-down fix: move push-down to the optimizer
> - Make DataFrameWriter compatible when updating a source from v1 to v2,
>   by adding extractCatalogName and extractIdentifier to TableProvider
>
> Some of the ideas that came up, like changing the pushdown API, were
> passed on because it is too close to the release to reasonably get the
> changes done without a serious delay (like the API changes just before
> the 2.4 release).
>
> On Fri, Sep 20, 2019 at 9:55 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>
>> Thank you for the summarization, Xingbo.
>>
>> I also agree with Sean, because I don't think those block the 3.0.0
>> preview release. In particular, correctness issues should not be there.
>>
>> Instead, could you summarize what we have as of now for the 3.0.0 preview?
>>
>> I believe JDK 11 (SPARK-28684) and Hive 2.3.5 (SPARK-23710) will be in
>> the what-we-have list for the 3.0.0 preview.
>>
>> Bests,
>> Dongjoon.
>>
>> On Fri, Sep 20, 2019 at 6:22 AM Sean Owen <sro...@gmail.com> wrote:
>>
>>> Is this a list of items that might be focused on for the final 3.0
>>> release? At least, Scala 2.13 support shouldn't be on that list. The
>>> others look plausible, or are already done, but there are probably more.
>>>
>>> As for the 3.0 preview, I wouldn't necessarily block on any particular
>>> feature, though, yes, the more work that can go into important items
>>> between now and then, the better. I wouldn't necessarily present any
>>> list of things that will or might be in 3.0 with that preview; just
>>> list the things that are done, like JDK 11 support.
>>>
>>> On Fri, Sep 20, 2019 at 2:46 AM Xingbo Jiang <jiangxb1...@gmail.com> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > Let's start a new thread to discuss the ongoing features for the
>>> > Spark 3.0 preview release.
>>> >
>>> > Below is the feature list for the Spark 3.0 preview release,
>>> > collected from previous discussions on the dev list:
>>> >
>>> > - Follow-up of the shuffle+repartition correctness issue: support
>>> >   rolling back shuffle stages (https://issues.apache.org/jira/browse/SPARK-25341)
>>> > - Upgrade the built-in Hive to 2.3.5 for hadoop-3.2
>>> >   (https://issues.apache.org/jira/browse/SPARK-23710)
>>> > - JDK 11 support (https://issues.apache.org/jira/browse/SPARK-28684)
>>> > - Scala 2.13 support (https://issues.apache.org/jira/browse/SPARK-25075)
>>> > - DataSourceV2 features:
>>> >   - Enable file source v2 writers (https://issues.apache.org/jira/browse/SPARK-27589)
>>> >   - CREATE TABLE USING with DataSourceV2
>>> >   - New pushdown API for DataSourceV2
>>> >   - Support DELETE/UPDATE/MERGE operations in DataSourceV2
>>> >     (https://issues.apache.org/jira/browse/SPARK-28303)
>>> > - Correctness issue: stream-stream left outer join gives inconsistent
>>> >   output (https://issues.apache.org/jira/browse/SPARK-26154)
>>> > - Revisiting Python / pandas UDFs (https://issues.apache.org/jira/browse/SPARK-28264)
>>> > - Spark Graph (https://issues.apache.org/jira/browse/SPARK-25994)
>>> >
>>> > Features that are nice to have:
>>> >
>>> > - Use remote storage for persisting shuffle data (https://issues.apache.org/jira/browse/SPARK-25299)
>>> > - Spark + Hadoop + Parquet + Avro compatibility problems (https://issues.apache.org/jira/browse/SPARK-25588)
>>> > - Introduce a new option to the Kafka source: specify timestamps for
>>> >   the start and end offsets (https://issues.apache.org/jira/browse/SPARK-26848)
>>> > - Delete files after processing in Structured Streaming (https://issues.apache.org/jira/browse/SPARK-20568)
>>> >
>>> > Here, I am proposing to cut the branch on October 15th. If your
>>> > features are targeting the 3.0 preview release, please prioritize the
>>> > work and finish it before that date. Note that Oct. 15th is not the
>>> > code freeze for Spark 3.0; the community will still work on features
>>> > for the upcoming Spark 3.0 release even if they are not included in
>>> > the preview release. The goal of the preview release is to collect
>>> > more feedback from the community regarding the new 3.0
>>> > features/behavior changes.
>>> >
>>> > Thanks!
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
> --
> Ryan Blue
> Software Engineer
> Netflix