Re: code freeze and branch cut for Apache Spark 2.4

Tomasz Gawęda Tue, 31 Jul 2018 06:07:36 -0700

Hi,

what is the status of Continuous Processing + Aggregations? As far as I 
remember, Jose Torres said it should be easy to perform aggregations if 
coalesce(1) works. IIRC it's already merged to master.


Is this work in progress? If so, it would be great to have full 
aggregation/join support for CP in Spark 2.4.

Pozdrawiam / Best regards,

Tomek


On 2018-07-31 10:43, Petar Zečević wrote:
> This one is important to us: 
> https://issues.apache.org/jira/browse/SPARK-24020 (Sort-merge join inner 
> range optimization) but I think it could be useful to others too.
>
> It is finished and ready to be merged (it was ready at least a month ago).
>
> Do you think you could consider including it in 2.4?
>
> Petar
>
>
> Wenchen Fan wrote:
>
>> I went through the open JIRA tickets and here is a list that we should 
>> consider for Spark 2.4:
>>
>> High Priority:
>> SPARK-24374: Support Barrier Execution Mode in Apache Spark
>> This one is critical to the Spark ecosystem for deep learning. It has only a 
>> few remaining tasks, and I think we should have it in Spark 2.4.
>>
>> Middle Priority:
>> SPARK-23899: Built-in SQL Function Improvement
>> We've already added a lot of built-in functions in this release, but there 
>> are a few useful higher-order functions in progress, like `array_except`, 
>> `transform`, etc. It would be great if we could get them into Spark 2.4.
>>
>> SPARK-14220: Build and test Spark against Scala 2.12
>> Very close to finishing; great to have it in Spark 2.4.
>>
>> SPARK-4502: Spark SQL reads unnecessary nested fields from Parquet
>> This one has been open for years (thanks for your patience, Michael!) and is 
>> also close to finishing. Great to have it in 2.4.
>>
>> SPARK-24882: data source v2 API improvement
>> This is to improve the data source v2 API based on what we learned during 
>> this release. While migrating existing sources and designing new 
>> features, we found some problems in the API and want to address them. I 
>> believe this should be the last significant API change to data source v2, 
>> so it would be great to have it in Spark 2.4. I'll send a discuss email 
>> about it later.
>>
>> SPARK-24252: Add catalog support in Data Source V2
>> This is a very important feature for data source v2, and is currently being 
>> discussed on the dev list.
>>
>> SPARK-24768: Have a built-in AVRO data source implementation
>> Most of it is done, but date/timestamp support is still missing. Great to 
>> have in 2.4.
>>
>> SPARK-23243: Shuffle+Repartition on an RDD could lead to incorrect answers
>> This is a long-standing correctness bug, great to have in 2.4.
>>
>> There are some other important features, like adaptive execution and 
>> streaming SQL, that are not on the list, since I don't think we will be able 
>> to finish them before 2.4.
>>
>> Feel free to add more things by replying to this email if you think they 
>> are important for Spark 2.4.
>>
>> Thanks,
>> Wenchen
>>
>> On Mon, Jul 30, 2018 at 11:00 PM Sean Owen wrote:
>>
>>   In theory, releases happen on a time-based cadence, so it's pretty much: 
>> wrap up what's ready by the code freeze and ship it. In practice, the 
>> cadence slips frequently, and every time it's very much a negotiation about 
>> which features should push the code freeze out a few weeks. So it's kind 
>> of a hybrid approach that works OK.
>>
>>   Certainly speak up if you think there's something that really needs to get 
>> into 2.4. This is that discuss thread.
>>
>>   (BTW I updated the page you mention just yesterday, to reflect the plan 
>> suggested in this thread.)
>>
>>   On Mon, Jul 30, 2018 at 9:51 AM Tom Graves wrote:
>>
>>   Shouldn't this be a discuss thread?
>>
>>   I'm also happy to see more release managers, and I agree the time is getting 
>> close, but we should look at which features are in progress and how close 
>> they are, and propose a date based on that. Cutting a branch too soon just 
>> creates more work for committers, who then have to push to more branches.
>>
>>    http://spark.apache.org/versioning-policy.html mentioned the code freeze 
>> and release branch cut for mid-August.
>>
>>   Tom