Re: Spark 3.0 branch cut and code freeze on Jan 31?

Reynold Xin Sat, 01 Feb 2020 18:16:13 -0800

Note that branch-3.0 was cut. Please focus on testing and polish, and let's get
the release out!


On Wed, Jan 29, 2020 at 3:41 PM, Reynold Xin < r...@databricks.com > wrote:

> 
> Just a reminder: code freeze is coming this Fri!
> 
> 
> 
> There can always be exceptions, but those should be exceptions and
> discussed on a case by case basis rather than becoming the norm.
> 
> 
> 
> 
> 
> 
> On Tue, Dec 24, 2019 at 4:55 PM, Jungtaek Lim < kabhwan.opensou...@gmail.com > wrote:
> 
>> Jan 31 sounds good to me.
>> 
>> 
>> Just curious, do we allow any exceptions to the code freeze? One case that
>> comes to mind is a feature with multiple subtasks, where some subtasks
>> have been merged and others are still in review. In that case, do we
>> allow the remaining subtasks a few more days to be reviewed and merged?
>> 
>> 
>> Happy Holiday!
>> 
>> 
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>> 
>> On Wed, Dec 25, 2019 at 8:36 AM Takeshi Yamamuro < linguin....@gmail.com > wrote:
>> 
>> 
>>> Looks nice, happy holiday, all!
>>> 
>>> 
>>> Bests,
>>> Takeshi
>>> 
>>> On Wed, Dec 25, 2019 at 3:56 AM Dongjoon Hyun < dongjoon.h...@gmail.com > wrote:
>>> 
>>> 
>>>> +1 for January 31st.
>>>> 
>>>> 
>>>> Bests,
>>>> Dongjoon.
>>>> 
>>>> On Tue, Dec 24, 2019 at 7:11 AM Xiao Li < lix...@databricks.com > wrote:
>>>> 
>>>> 
>>>>> Jan 31 is pretty reasonable. Happy Holidays! 
>>>>> 
>>>>> 
>>>>> Xiao
>>>>> 
>>>>> On Tue, Dec 24, 2019 at 5:52 AM Sean Owen < sro...@gmail.com > wrote:
>>>>> 
>>>>> 
>>>>>> Yep, always happens. Is earlier realistic, like Jan 15? It's all arbitrary,
>>>>>> but indeed this has been in progress for a while, and there's a downside
>>>>>> to not releasing it, to making the gap to 3.0 larger.
>>>>>> On my end I don't know of anything that's holding up a release; is it
>>>>>> basically DSv2?
>>>>>> 
>>>>>> BTW these are the items still targeted to 3.0.0, some of which may not
>>>>>> have been legitimately tagged. It may be worth reviewing what's still open
>>>>>> and necessary, and what should be untargeted.
>>>>>> 
>>>>>> 
>>>>>> SPARK-29768 nondeterministic expression fails column pruning
>>>>>> SPARK-29345 Add an API that allows a user to define and observe arbitrary
>>>>>> metrics on streaming queries
>>>>>> SPARK-29348 Add observable metrics
>>>>>> SPARK-29429 Support Prometheus monitoring natively
>>>>>> SPARK-29577 Implement p-value simulation and unit tests for chi2 test
>>>>>> SPARK-28900 Test Pyspark, SparkR on JDK 11 with run-tests
>>>>>> SPARK-28883 Fix a flaky test: ThriftServerQueryTestSuite
>>>>>> SPARK-28717 Update SQL ALTER TABLE RENAME to use TableCatalog API
>>>>>> SPARK-28588 Build a SQL reference doc
>>>>>> SPARK-28629 Capture the missing rules in HiveSessionStateBuilder
>>>>>> SPARK-28684 Hive module support JDK 11
>>>>>> SPARK-28548 explain() shows wrong result for persisted DataFrames after
>>>>>> some operations
>>>>>> SPARK-28264 Revisiting Python / pandas UDF
>>>>>> SPARK-28301 fix the behavior of table name resolution with multi-catalog
>>>>>> SPARK-28155 do not leak SaveMode to file source v2
>>>>>> SPARK-28103 Cannot infer filters from union table with empty local
>>>>>> relation table properly
>>>>>> SPARK-27986 Support Aggregate Expressions with filter
>>>>>> SPARK-28024 Incorrect numeric values when out of range
>>>>>> SPARK-27936 Support local dependency uploading from --py-files
>>>>>> SPARK-27780 Shuffle server & client should be versioned to enable smoother
>>>>>> upgrade
>>>>>> SPARK-27714 Support Join Reorder based on Genetic Algorithm when the # of
>>>>>> joined tables > 12
>>>>>> SPARK-27471 Reorganize public v2 catalog API
>>>>>> SPARK-27520 Introduce a global config system to replace
>>>>>> hadoopConfiguration
>>>>>> SPARK-24625 put all the backward compatible behavior change configs under
>>>>>> spark.sql.legacy.*
>>>>>> SPARK-24941 Add RDDBarrier.coalesce() function
>>>>>> SPARK-25017 Add test suite for ContextBarrierState
>>>>>> SPARK-25083 remove the type erasure hack in data source scan
>>>>>> SPARK-25383 Image data source supports sample pushdown
>>>>>> SPARK-27272 Enable blacklisting of node/executor on fetch failures by
>>>>>> default
>>>>>> SPARK-27296 Efficient User Defined Aggregators
>>>>>> SPARK-25128 multiple simultaneous job submissions against k8s backend
>>>>>> cause driver pods to hang
>>>>>> SPARK-26664 Make DecimalType's minimum adjusted scale configurable
>>>>>> SPARK-21559 Remove Mesos fine-grained mode
>>>>>> SPARK-24942 Improve cluster resource management with jobs containing
>>>>>> barrier stage
>>>>>> SPARK-25914 Separate projection from grouping and aggregate in logical
>>>>>> Aggregate
>>>>>> SPARK-20964 Make some keywords reserved along with the ANSI/SQL standard
>>>>>> SPARK-26221 Improve Spark SQL instrumentation and metrics
>>>>>> SPARK-26425 Add more constraint checks in file streaming source to avoid
>>>>>> checkpoint corruption
>>>>>> SPARK-25843 Redesign rangeBetween API
>>>>>> SPARK-25841 Redesign window function rangeBetween API
>>>>>> SPARK-25752 Add trait to easily whitelist logical operators that produce
>>>>>> named output from CleanupAliases
>>>>>> SPARK-25640 Clarify/Improve EvalType for grouped aggregate and window
>>>>>> aggregate
>>>>>> SPARK-25531 new write APIs for data source v2
>>>>>> SPARK-25547 Pluggable jdbc connection factory
>>>>>> SPARK-20845 Support specification of column names in INSERT INTO
>>>>>> SPARK-24724 Discuss necessary info and access in barrier mode + Kubernetes
>>>>>> SPARK-24725 Discuss necessary info and access in barrier mode + Mesos
>>>>>> SPARK-25074 Implement maxNumConcurrentTasks() in
>>>>>> MesosFineGrainedSchedulerBackend
>>>>>> SPARK-23710 Upgrade the built-in Hive to 2.3.5 for hadoop-3.2
>>>>>> SPARK-25186 Stabilize Data Source V2 API
>>>>>> SPARK-25376 Scenarios we should handle but missed in 2.4 for barrier
>>>>>> execution mode
>>>>>> SPARK-7768 Make user-defined type (UDT) API public
>>>>>> SPARK-14922 Alter Table Drop Partition Using Predicate-based Partition
>>>>>> Spec
>>>>>> SPARK-15694 Implement ScriptTransformation in sql/core
>>>>>> SPARK-18134 SQL: MapType in Group BY and Joins not working
>>>>>> SPARK-19842 Informational Referential Integrity Constraints Support in
>>>>>> Spark
>>>>>> SPARK-22231 Support of map, filter, withColumn, dropColumn in nested list
>>>>>> of structures
>>>>>> SPARK-22386 Data Source V2 improvements
>>>>>> SPARK-24723 Discuss necessary info and access in barrier mode + YARN
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Mon, Dec 23, 2019 at 5:48 PM Reynold Xin < rxin@ databricks. com (
>>>>>> r...@databricks.com ) > wrote:
>>>>>> 
>>>>>> 
>>>>>>> We've pushed out 3.0 multiple times. The latest release window documented
>>>>>>> on the website ( http://spark.apache.org/versioning-policy.html ) says
>>>>>>> we'd code freeze and cut branch-3.0 early Dec. It looks like we are
>>>>>>> suffering a bit from the tragedy of the commons, that nobody is pushing
>>>>>>> for getting the release out. I understand the natural tendency for each
>>>>>>> individual is to finish or extend the feature/bug that the person has been
>>>>>>> working on. At some point we need to say "this is it" and get the release
>>>>>>> out. I'm happy to help drive this process.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> To be realistic, I don't think we should just code freeze *today*.
>>>>>>> Although we have updated the website, contributors have all been operating
>>>>>>> under the assumption that all active developments are still going on. I
>>>>>>> propose we *cut the branch on Jan 31, and code freeze and switch over
>>>>>>> to bug squashing mode, and try to get the 3.0 official release out in Q1*.
>>>>>>> That is, by default no new features can go into the branch starting Jan 31.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> What do you think?
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> And happy holidays everybody.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> 
>>> --
>>> ---
>>> Takeshi Yamamuro
>>> 
>> 
>> 
> 
>
