I’m not sure that DSv2 list is accurate. We discussed this in the DSv2 sync
this week (just sent out the notes) and came up with these items:

   - Finish TableProvider update to avoid another API change: pass all
   table config from metastore
   - Catalog behavior fix: https://issues.apache.org/jira/browse/SPARK-29014
   - Stats push-down fix: move push-down to the optimizer
   - Make DataFrameWriter compatible when updating a source from v1 to v2,
   by adding extractCatalogName and extractIdentifier to TableProvider
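
To make that last item concrete, here is a rough sketch of what the
TableProvider additions could look like. This is only an illustration based
on the bullet above, not the agreed API: the trait name, the method
signatures, the Option return type, and the import package names are
assumptions and may not match where these pieces actually land.

    // Hypothetical sketch only -- names, signatures, and packages are
    // assumptions for illustration, not the proposed patch.
    import org.apache.spark.sql.connector.catalog.Identifier
    import org.apache.spark.sql.util.CaseInsensitiveStringMap

    trait TableProviderCatalogHints {
      // Name of the catalog that DataFrameWriter should resolve this source
      // against; None would mean "fall back to the default catalog".
      def extractCatalogName(options: CaseInsensitiveStringMap): Option[String]

      // Identifier (namespace + table name) the write should target, derived
      // from options such as "path" or "table".
      def extractIdentifier(options: CaseInsensitiveStringMap): Identifier
    }

With something like this, DataFrameWriter could resolve a v2 source through
the catalog/identifier path instead of the old v1 resolution, which is what
the compatibility item above is about.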

Some of the ideas that came up, like changing the pushdown API, were set
aside because it is too close to the release to reasonably get the changes
done without a serious delay (like the API changes just before the 2.4
release).

On Fri, Sep 20, 2019 at 9:55 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
wrote:

> Thank you for the summarization, Xingbo.
>
> I also agree with Sean, since I don't think those items block the 3.0.0
> preview release. In particular, the correctness issues should not be on
> that list.
>
> Instead, could you summarize what we have as of now for 3.0.0 preview?
>
> I believe JDK 11 (SPARK-28684) and Hive 2.3.5 (SPARK-23710) will be on the
> what-we-have list for the 3.0.0 preview.
>
> Bests,
> Dongjoon.
>
> On Fri, Sep 20, 2019 at 6:22 AM Sean Owen <sro...@gmail.com> wrote:
>
>> Is this a list of items that might be focused on for the final 3.0
>> release? At least, Scala 2.13 support shouldn't be on that list. The
>> others look plausible, or are already done, but there are probably
>> more.
>>
>> As for the 3.0 preview, I wouldn't necessarily block on any particular
>> feature, though, yes, the more work that can go into important items
>> between now and then, the better.
>> I wouldn't necessarily present any list of things that will or might
>> be in 3.0 with that preview; just list the things that are done, like
>> JDK 11 support.
>>
>> On Fri, Sep 20, 2019 at 2:46 AM Xingbo Jiang <jiangxb1...@gmail.com>
>> wrote:
>> >
>> > Hi all,
>> >
>> > Let's start a new thread to discuss the ongoing features for the Spark
>> > 3.0 preview release.
>> >
>> > Below is the feature list for the Spark 3.0 preview release. The list is
>> > collected from previous discussions on the dev list.
>> >
>> > - Followup of the shuffle+repartition correctness issue: support rolling
>> >   back shuffle stages (https://issues.apache.org/jira/browse/SPARK-25341)
>> > - Upgrade the built-in Hive to 2.3.5 for hadoop-3.2
>> >   (https://issues.apache.org/jira/browse/SPARK-23710)
>> > - JDK 11 support (https://issues.apache.org/jira/browse/SPARK-28684)
>> > - Scala 2.13 support (https://issues.apache.org/jira/browse/SPARK-25075)
>> > - DataSourceV2 features:
>> >   - Enable file source v2 writers
>> >     (https://issues.apache.org/jira/browse/SPARK-27589)
>> >   - CREATE TABLE USING with DataSourceV2
>> >   - New pushdown API for DataSourceV2
>> >   - Support DELETE/UPDATE/MERGE Operations in DataSourceV2
>> >     (https://issues.apache.org/jira/browse/SPARK-28303)
>> > - Correctness issue: Stream-stream joins - left outer join gives
>> >   inconsistent output (https://issues.apache.org/jira/browse/SPARK-26154)
>> > - Revisiting Python / pandas UDF
>> >   (https://issues.apache.org/jira/browse/SPARK-28264)
>> > - Spark Graph (https://issues.apache.org/jira/browse/SPARK-25994)
>> >
>> > Features that are nice to have:
>> >
>> > - Use remote storage for persisting shuffle data
>> >   (https://issues.apache.org/jira/browse/SPARK-25299)
>> > - Spark + Hadoop + Parquet + Avro compatibility problems
>> >   (https://issues.apache.org/jira/browse/SPARK-25588)
>> > - Introduce new option to Kafka source - specify timestamp to start and
>> >   end offset (https://issues.apache.org/jira/browse/SPARK-26848)
>> > - Delete files after processing in structured streaming
>> >   (https://issues.apache.org/jira/browse/SPARK-20568)
>> >
>> > Here, I am proposing to cut the branch on October 15th. If your features
>> > are targeting the 3.0 preview release, please prioritize the work and
>> > finish it before that date. Note that Oct. 15th is not the code freeze
>> > for Spark 3.0: the community will still work on features for the upcoming
>> > Spark 3.0 release even if they are not included in the preview release.
>> > The goal of the preview release is to collect more feedback from the
>> > community on the new 3.0 features and behavior changes.
>> >
>> > Thanks!
>>

-- 
Ryan Blue
Software Engineer
Netflix
