Awesome Shane.

On Wed, Feb 5, 2020 at 7:29 AM, Xiao Li <lix...@databricks.com> wrote:

> Thank you, Shane!
>
> Xiao

On Tue, Feb 4, 2020 at 2:16 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> Thank you, Shane! :D
>
> Bests,
> Dongjoon

On Tue, Feb 4, 2020 at 13:28 shane knapp ☠ <skn...@berkeley.edu> wrote:

> All the 3.0 builds have been created and are currently churning away!
>
> (The failed builds were due to a silly bug in the build scripts sneaking its way back in, but that's resolved now.)
>
> shane
>
> --
> Shane Knapp
> Computer Guy / Voice of Reason
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu

On Sat, Feb 1, 2020 at 6:16 PM Reynold Xin <r...@databricks.com> wrote:

> Note that branch-3.0 was cut. Please focus on testing and polish, and let's get the release out!

On Wed, Jan 29, 2020 at 3:41 PM, Reynold Xin <r...@databricks.com> wrote:

> Just a reminder - code freeze is coming this Fri!
>
> There can always be exceptions, but those should be exceptions and discussed on a case-by-case basis rather than becoming the norm.

On Tue, Dec 24, 2019 at 4:55 PM, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote:

> Jan 31 sounds good to me.
>
> Just curious, do we allow any exceptions to the code freeze? One case that comes to mind: a feature with multiple subtasks, where some subtasks have been merged and the remaining subtask(s) are still in review. In that case, do we allow those subtasks a few more days to get reviewed and merged?
>
> Happy Holidays!
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)

On Wed, Dec 25, 2019 at 8:36 AM Takeshi Yamamuro <linguin....@gmail.com> wrote:

> Looks nice, happy holidays, all!
>
> Bests,
> Takeshi

On Wed, Dec 25, 2019 at 3:56 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> +1 for January 31st.
>
> Bests,
> Dongjoon.

On Tue, Dec 24, 2019 at 7:11 AM Xiao Li <lix...@databricks.com> wrote:

> Jan 31 is pretty reasonable. Happy Holidays!
>
> Xiao

On Tue, Dec 24, 2019 at 5:52 AM Sean Owen <sro...@gmail.com> wrote:

> Yep, it always happens. Is earlier realistic, like Jan 15? It's all arbitrary, but indeed this has been in progress for a while, and there's a downside to not releasing: the gap to 3.0 grows larger. On my end I don't know of anything that's holding up a release; is it basically DSv2?
>
> BTW, these are the items still targeted to 3.0.0, some of which may not have been legitimately tagged. It may be worth reviewing what's still open and necessary, and what should be untargeted:
>
> SPARK-29768 Nondeterministic expression fails column pruning
> SPARK-29345 Add an API that allows a user to define and observe arbitrary metrics on streaming queries
> SPARK-29348 Add observable metrics
> SPARK-29429 Support Prometheus monitoring natively
> SPARK-29577 Implement p-value simulation and unit tests for chi2 test
> SPARK-28900 Test Pyspark, SparkR on JDK 11 with run-tests
> SPARK-28883 Fix a flaky test: ThriftServerQueryTestSuite
> SPARK-28717 Update SQL ALTER TABLE RENAME to use TableCatalog API
> SPARK-28588 Build a SQL reference doc
> SPARK-28629 Capture the missing rules in HiveSessionStateBuilder
> SPARK-28684 Hive module support JDK 11
> SPARK-28548 explain() shows wrong result for persisted DataFrames after some operations
> SPARK-28264 Revisiting Python / pandas UDF
> SPARK-28301 Fix the behavior of table name resolution with multi-catalog
> SPARK-28155 Do not leak SaveMode to file source v2
> SPARK-28103 Cannot infer filters from union table with empty local relation table properly
> SPARK-27986 Support Aggregate Expressions with filter
> SPARK-28024 Incorrect numeric values when out of range
> SPARK-27936 Support local dependency uploading from --py-files
> SPARK-27780 Shuffle server & client should be versioned to enable smoother upgrade
> SPARK-27714 Support Join Reorder based on Genetic Algorithm when the # of joined tables > 12
> SPARK-27471 Reorganize public v2 catalog API
> SPARK-27520 Introduce a global config system to replace hadoopConfiguration
> SPARK-24625 Put all the backward compatible behavior change configs under spark.sql.legacy.*
> SPARK-24941 Add RDDBarrier.coalesce() function
> SPARK-25017 Add test suite for ContextBarrierState
> SPARK-25083 Remove the type erasure hack in data source scan
> SPARK-25383 Image data source supports sample pushdown
> SPARK-27272 Enable blacklisting of node/executor on fetch failures by default
> SPARK-27296 Efficient User Defined Aggregators
> SPARK-25128 Multiple simultaneous job submissions against k8s backend cause driver pods to hang
> SPARK-26664 Make DecimalType's minimum adjusted scale configurable
> SPARK-21559 Remove Mesos fine-grained mode
> SPARK-24942 Improve cluster resource management with jobs containing barrier stage
> SPARK-25914 Separate projection from grouping and aggregate in logical Aggregate
> SPARK-20964 Make some keywords reserved along with the ANSI/SQL standard
> SPARK-26221 Improve Spark SQL instrumentation and metrics
> SPARK-26425 Add more constraint checks in file streaming source to avoid checkpoint corruption
> SPARK-25843 Redesign rangeBetween API
> SPARK-25841 Redesign window function rangeBetween API
> SPARK-25752 Add trait to easily whitelist logical operators that produce named output from CleanupAliases
> SPARK-25640 Clarify/Improve EvalType for grouped aggregate and window aggregate
> SPARK-25531 New write APIs for data source v2
> SPARK-25547 Pluggable jdbc connection factory
> SPARK-20845 Support specification of column names in INSERT INTO
> SPARK-24724 Discuss necessary info and access in barrier mode + Kubernetes
> SPARK-24725 Discuss necessary info and access in barrier mode + Mesos
> SPARK-25074 Implement maxNumConcurrentTasks() in MesosFineGrainedSchedulerBackend
> SPARK-23710 Upgrade the built-in Hive to 2.3.5 for hadoop-3.2
> SPARK-25186 Stabilize Data Source V2 API
> SPARK-25376 Scenarios we should handle but missed in 2.4 for barrier execution mode
> SPARK-7768 Make user-defined type (UDT) API public
> SPARK-14922 Alter Table Drop Partition Using Predicate-based Partition Spec
> SPARK-15694 Implement ScriptTransformation in sql/core
> SPARK-18134 SQL: MapType in Group BY and Joins not working
> SPARK-19842 Informational Referential Integrity Constraints Support in Spark
> SPARK-22231 Support of map, filter, withColumn, dropColumn in nested list of structures
> SPARK-22386 Data Source V2 improvements
> SPARK-24723 Discuss necessary info and access in barrier mode + YARN

On Mon, Dec 23, 2019 at 5:48 PM Reynold Xin <r...@databricks.com> wrote:

> We've pushed out 3.0 multiple times. The latest release window documented on the website <http://spark.apache.org/versioning-policy.html> says we'd code freeze and cut branch-3.0 in early December. It looks like we are suffering a bit from the tragedy of the commons, in that nobody is pushing to get the release out. I understand that the natural tendency for each individual is to finish or extend the feature/bug that person has been working on. At some point we need to say "this is it" and get the release out. I'm happy to help drive this process.
>
> To be realistic, I don't think we should just code freeze *today*. Although we have updated the website, contributors have all been operating under the assumption that active development is still going on. I propose we *cut the branch on Jan 31, code freeze and switch over to bug-squashing mode, and try to get the 3.0 official release out in Q1*. That is, by default no new features can go into the branch starting Jan 31.
>
> What do you think?
>
> And happy holidays everybody.