Re: Spark 3.0 branch cut and code freeze on Jan 31?

Takeshi Yamamuro Tue, 24 Dec 2019 15:37:22 -0800

Looks nice, happy holiday, all!

Bests,
Takeshi


On Wed, Dec 25, 2019 at 3:56 AM Dongjoon Hyun <dongjoon.h...@gmail.com>
wrote:

> +1 for January 31st.
>
> Bests,
> Dongjoon.
>
> On Tue, Dec 24, 2019 at 7:11 AM Xiao Li <lix...@databricks.com> wrote:
>
>> Jan 31 is pretty reasonable. Happy Holidays!
>>
>> Xiao
>>
>> On Tue, Dec 24, 2019 at 5:52 AM Sean Owen <sro...@gmail.com> wrote:
>>
>>> Yep, always happens. Is earlier realistic, like Jan 15? It's all
>>> arbitrary but indeed this has been in progress for a while, and there's a
>>> downside to not releasing it, to making the gap to 3.0 larger.
>>> On my end I don't know of anything that's holding up a release; is it
>>> basically DSv2?
>>>
>>> BTW these are the items still targeted to 3.0.0, some of which may not
>>> have been legitimately tagged. It may be worth reviewing what's still open
>>> and necessary, and what should be untargeted.
>>>
>>> SPARK-29768 nondeterministic expression fails column pruning
>>> SPARK-29345 Add an API that allows a user to define and observe
>>> arbitrary metrics on streaming queries
>>> SPARK-29348 Add observable metrics
>>> SPARK-29429 Support Prometheus monitoring natively
>>> SPARK-29577 Implement p-value simulation and unit tests for chi2 test
>>> SPARK-28900 Test Pyspark, SparkR on JDK 11 with run-tests
>>> SPARK-28883 Fix a flaky test: ThriftServerQueryTestSuite
>>> SPARK-28717 Update SQL ALTER TABLE RENAME to use TableCatalog API
>>> SPARK-28588 Build a SQL reference doc
>>> SPARK-28629 Capture the missing rules in HiveSessionStateBuilder
>>> SPARK-28684 Hive module support JDK 11
>>> SPARK-28548 explain() shows wrong result for persisted DataFrames after
>>> some operations
>>> SPARK-28264 Revisiting Python / pandas UDF
>>> SPARK-28301 fix the behavior of table name resolution with multi-catalog
>>> SPARK-28155 do not leak SaveMode to file source v2
>>> SPARK-28103 Cannot infer filters from union table with empty local
>>> relation table properly
>>> SPARK-27986 Support Aggregate Expressions with filter
>>> SPARK-28024 Incorrect numeric values when out of range
>>> SPARK-27936 Support local dependency uploading from --py-files
>>> SPARK-27780 Shuffle server & client should be versioned to enable
>>> smoother upgrade
>>> SPARK-27714 Support Join Reorder based on Genetic Algorithm when the #
>>> of joined tables > 12
>>> SPARK-27471 Reorganize public v2 catalog API
>>> SPARK-27520 Introduce a global config system to replace
>>> hadoopConfiguration
>>> SPARK-24625 put all the backward compatible behavior change configs
>>> under spark.sql.legacy.*
>>> SPARK-24941 Add RDDBarrier.coalesce() function
>>> SPARK-25017 Add test suite for ContextBarrierState
>>> SPARK-25083 remove the type erasure hack in data source scan
>>> SPARK-25383 Image data source supports sample pushdown
>>> SPARK-27272 Enable blacklisting of node/executor on fetch failures by
>>> default
>>> SPARK-27296 Efficient User Defined Aggregators
>>> SPARK-25128 multiple simultaneous job submissions against k8s backend
>>> cause driver pods to hang
>>> SPARK-26664 Make DecimalType's minimum adjusted scale configurable
>>> SPARK-21559 Remove Mesos fine-grained mode
>>> SPARK-24942 Improve cluster resource management with jobs containing
>>> barrier stage
>>> SPARK-25914 Separate projection from grouping and aggregate in logical
>>> Aggregate
>>> SPARK-20964 Make some keywords reserved along with the ANSI/SQL standard
>>> SPARK-26221 Improve Spark SQL instrumentation and metrics
>>> SPARK-26425 Add more constraint checks in file streaming source to avoid
>>> checkpoint corruption
>>> SPARK-25843 Redesign rangeBetween API
>>> SPARK-25841 Redesign window function rangeBetween API
>>> SPARK-25752 Add trait to easily whitelist logical operators that produce
>>> named output from CleanupAliases
>>> SPARK-25640 Clarify/Improve EvalType for grouped aggregate and window
>>> aggregate
>>> SPARK-25531 new write APIs for data source v2
>>> SPARK-25547 Pluggable jdbc connection factory
>>> SPARK-20845 Support specification of column names in INSERT INTO
>>> SPARK-24724 Discuss necessary info and access in barrier mode +
>>> Kubernetes
>>> SPARK-24725 Discuss necessary info and access in barrier mode + Mesos
>>> SPARK-25074 Implement maxNumConcurrentTasks() in
>>> MesosFineGrainedSchedulerBackend
>>> SPARK-23710 Upgrade the built-in Hive to 2.3.5 for hadoop-3.2
>>> SPARK-25186 Stabilize Data Source V2 API
>>> SPARK-25376 Scenarios we should handle but missed in 2.4 for barrier
>>> execution mode
>>> SPARK-7768 Make user-defined type (UDT) API public
>>> SPARK-14922 Alter Table Drop Partition Using Predicate-based Partition
>>> Spec
>>> SPARK-15694 Implement ScriptTransformation in sql/core
>>> SPARK-18134 SQL: MapType in Group BY and Joins not working
>>> SPARK-19842 Informational Referential Integrity Constraints Support in
>>> Spark
>>> SPARK-22231 Support of map, filter, withColumn, dropColumn in nested
>>> list of structures
>>> SPARK-22386 Data Source V2 improvements
>>> SPARK-24723 Discuss necessary info and access in barrier mode + YARN
>>>
>>>
>>> On Mon, Dec 23, 2019 at 5:48 PM Reynold Xin <r...@databricks.com> wrote:
>>>
>>>> We've pushed out 3.0 multiple times. The latest release window
>>>> documented on the website
>>>> <http://spark.apache.org/versioning-policy.html> says we'd code freeze
>>>> and cut branch-3.0 early Dec. It looks like we are suffering a bit from the
>>>> tragedy of the commons, that nobody is pushing for getting the release out.
>>>> I understand the natural tendency for each individual is to finish or
>>>> extend the feature/bug that the person has been working on. At some point
>>>> we need to say "this is it" and get the release out. I'm happy to help
>>>> drive this process.
>>>>
>>>> To be realistic, I don't think we should just code freeze *today*.
>>>> Although we have updated the website, contributors have all been operating
>>>> under the assumption that all active developments are still going on. I
>>>> propose we *cut the branch on Jan 31, and code freeze and switch
>>>> over to bug squashing mode, and try to get the 3.0 official release out in
>>>> Q1*. That is, by default no new features can go into the branch
>>>> starting Jan 31.
>>>>
>>>> What do you think?
>>>>
>>>> And happy holidays everybody.
>>>>
>>>>
>>>>
>>>>
>>
>>
>

-- 
---
Takeshi Yamamuro
