Re: code freeze and branch cut for Apache Spark 2.4

Wenchen Fan Mon, 30 Jul 2018 19:01:53 -0700

I went through the open JIRA tickets and here is a list that we should
consider for Spark 2.4:

*High Priority*:
SPARK-24374 <https://issues.apache.org/jira/browse/SPARK-24374>: Support
Barrier Execution Mode in Apache Spark
This one is critical to the Spark ecosystem for deep learning. Only a few
work items remain, and I think we should have it in Spark 2.4.

*Middle Priority*:
SPARK-23899 <https://issues.apache.org/jira/browse/SPARK-23899>: Built-in
SQL Function Improvement
We've already added a lot of built-in functions in this release, but there
are a few useful functions still in progress, like `array_except` and the
higher-order function `transform`. It would be great to get them into
Spark 2.4.

SPARK-14220 <https://issues.apache.org/jira/browse/SPARK-14220>: Build and
test Spark against Scala 2.12
This is very close to finished; it would be great to have it in Spark 2.4.

SPARK-4502 <https://issues.apache.org/jira/browse/SPARK-4502>: Spark SQL
reads unnecessary nested fields from Parquet
This one has been open for years (thanks for your patience, Michael!) and
is also close to finished. It would be great to have it in 2.4.

SPARK-24882 <https://issues.apache.org/jira/browse/SPARK-24882>: data
source v2 API improvement
This is to improve the data source v2 API based on what we learned during
this release. From the migration of existing sources and design of new
features, we found some problems in the API and want to address them. I
believe this should be the last significant API change to data source
v2, so great to have in Spark 2.4. I'll send a discuss email about it later.

SPARK-24252 <https://issues.apache.org/jira/browse/SPARK-24252>: Add
catalog support in Data Source V2
This is a very important feature for data source v2, and is currently being
discussed on the dev list.

SPARK-24768 <https://issues.apache.org/jira/browse/SPARK-24768>: Have a
built-in AVRO data source implementation
Most of it is done, but date/timestamp support is still missing. Great to
have in 2.4.

SPARK-23243 <https://issues.apache.org/jira/browse/SPARK-23243>:
Shuffle+Repartition on an RDD could lead to incorrect answers
This is a long-standing correctness bug; it would be great to have the fix
in 2.4.

There are some other important features, such as adaptive execution and
streaming SQL, that are not in the list because I don't think we can
finish them before 2.4.

Feel free to add more things if you think they are important to Spark 2.4
by replying to this email.

Thanks,
Wenchen

On Mon, Jul 30, 2018 at 11:00 PM Sean Owen <sro...@apache.org> wrote:

> In theory releases happen on a time-based cadence, so it's pretty much
> wrap up what's ready by the code freeze and ship it. In practice, the
> cadence slips frequently, and it's very much a negotiation about what
> features should push the code freeze out a few weeks every time. So, kind
> of a hybrid approach here that works OK.
>
> Certainly speak up if you think there's something that really needs to get
> into 2.4. This is that discuss thread.
>
> (BTW I updated the page you mention just yesterday, to reflect the plan
> suggested in this thread.)
>
> On Mon, Jul 30, 2018 at 9:51 AM Tom Graves <tgraves...@yahoo.com.invalid>
> wrote:
>
>> Shouldn't this be a discuss thread?
>>
>> I'm also happy to see more release managers and agree the time is getting
>> close, but we should see what features are in progress and see how close
>> things are and propose a date based on that. Cutting a branch too soon just
>> creates more work for committers to push to more branches.
>>
>> http://spark.apache.org/versioning-policy.html mentioned the code
>> freeze and release branch cut in mid-August.
>>
>>
>> Tom
>>
>>
