xy2953396112 commented on code in PR #110:
URL: https://github.com/apache/celeborn-website/pull/110#discussion_r2186691747


##########
docs/community/release_notes/release_note_0.6.0.md:
##########
@@ -0,0 +1,483 @@
+---
+hide:
+  - navigation
+
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      https://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+# Apache Celeborn™ 0.6.0 Release Notes
+
+## Highlight
+
+- Support Flink hybrid shuffle integration with Apache Celeborn
+- Support Spark 4.0.0
+- Support HARD_SPLIT in PushMergedData
+- Optimize skew partition logic for Reduce Mode to avoid sorting shuffle files
+- Refine the celeborn RESTful APIs and introduce Celeborn CLI for automation
+- Supporting worker Tags
+- CongestionController supports control traffic by user/worker traffic speed
+- Support read shuffle from S3
+- Support CppClient in Celeborn
+- HELM Chart Optimization
+
+### Improvement
+
+- [CELEBORN-2045] Add logger sinks to allow persist metrics data and avoid 
possible worker OOM
+- [CELEBORN-2046] Specify extractionDir of AsyncProfilerLoader with 
celeborn.worker.jvmProfiler.localDir
+- [CELEBORN-2027] Allow CelebornShuffleReader to decompress data on demand
+- [CELEBORN-2003] Add retry mechanism when completing S3 multipart upload
+- [CELEBORN-2018] Support min number of workers selected for shuffle
+- [CELEBORN-2006] LifecycleManager should avoid parsing shufflePartitionType 
every time
+- [CELEBORN-2007] Reduce PartitionLocation memory usage
+- [CELEBORN-2008] SlotsAllocator should select disks randomly in RoundRobin 
mode
+- [CELEBORN-1896] delete data from failed to fetch shuffles
+- [CELEBORN-2004] Filter empty partition before createIntputStream
+- [CELEBORN-2002][MASTER] Audit shuffle lifecycle in separate log file
+- [CELEBORN-1993] CelebornConf introduces celeborn.<module>.io.threads to 
specify number of threads used in the client thread pool
+- [CELEBORN-1995] Optimize memory usage for push failed batches
+- [CELEBORN-1994] Introduce disruptor dependency to support asynchronous 
logging of log4j2
+- [CELEBORN-1965] Rely on all default hadoop providers for S3 auth
+- [CELEBORN-1982] Slot Selection Perf Improvements
+- [CELEBORN-1961] Convert Resource.proto from Protocol Buffers version 2 to 
version 3
+- [CELEBORN-1931] use gather API for local flusher to optimize write io pattern
+- [CELEBORN-1916] Support Aliyun OSS Based on MPU Extension Interface
+- [CELEBORN-1925] Support Flink 2.0
+- [CELEBORN-1844][CIP-8] introduce tier writer proxy and simplify partition 
data writer
+- [CELEBORN-1921] Broadcast large GetReducerFileGroupResponse to prevent Spark 
driver network exhausted
+- [CELEBORN-1928][CIP-12] Support HARD_SPLIT in PushMergedData should support 
handle older worker success response
+- [CELEBORN-1930][CIP-12] Support HARD_SPLIT in PushMergedData should handle 
congestion control NPE issue
+- [CELEBORN-1577][PHASE2] QuotaManager should support interrupt shuffle
+- [CELEBORN-1856] Support stage-rerun when read partition by chunkOffsets when 
enable optimize skew partition read
+- [CELEBORN-1894] Allow skipping already read chunks during unreplicated 
shuffle read retried
+- [CELEBORN-1909] Support pre-run static code blocks of TransportMessages to 
improve performance of protobuf serialization
+- [CELEBORN-1911] Move multipart-uploader to 
multipart-uploader/multipart-uploader-s3 for extensibility
+- [CELEBORN-1910] Remove redundant synchronized of isTerminated in 
ThreadUtils#sameThreadExecutorService
+- [CELEBORN-1898] SparkOutOfMemoryError compatible with Spark 4.0 and 4.1
+- [CELEBORN-1897] Avoid calling toString for too long messages
+- [CELEBORN-1882] Support configuring the SSL handshake timeout for SSLHandler
+- [CELEBORN-1883] Replace HashSet with ConcurrentHashMap.newKeySet for 
ShuffleFileGroups
+- [CELEBORN-1858] Support DfsPartitionReader read partition by chunkOffsets 
when enable optimize skew partition read
+- [CELEBORN-1879] Ignore invalid chunk range generated by 
splitSkewedPartitionLocations
+- [CELEBORN-1857] Support LocalPartitionReader read partition by chunkOffsets 
when enable optimize skew partition read
+- [CELEBORN-1876] Log remote address on RPC exception for 
TransportRequestHandler
+- [CELEBORN-1865] Update master endpointRef when master leader is abnormal
+- [CELEBORN-1319] Optimize skew partition logic for Reduce Mode to avoid 
sorting shuffle files
+- [CELEBORN-1861] Support celeborn.worker.storage.baseDir.diskType option to 
specify disk type of base directory for worker
+- [CELEBORN-1757] Add retry when sending RPC to LifecycleManager
+- [CELEBORN-1859] DfsPartitionReader and LocalPartitionReader should reuse 
pbStreamHandlers get from BatchOpenStream request
+- [CELEBORN-1843] Optimize roundrobin for more balanced disk slot allocation
+- [CELEBORN-1854] Change receive revive request log level to debug
+- [CELEBORN-1841] Support custom implementation of EventExecutorChooser to 
avoid deadlock when calling await in EventLoop thread
+- [CELEBORN-1847][CIP-8] Introduce local and DFS tier writer
+- [CELEBORN-1850] Setup worker endpoint after initalizing controller
+- [CELEBORN-1838] Interrupt spark task should not report fetch failure
+- [CELEBORN-1835][CIP-8] Add tier writer base and memory tier writer
+- [CELEBORN-1829] Replace waitThreadPoll's thread pool with 
ScheduledExecutorService in Controller
+- [CELEBORN-1720] Prevent stage re-run if another task attempt is running or 
successful
+- [CELEBORN-1482][CIP-8] Add partition meta handler
+- [CELEBORN-1812] Distinguish sorting-file from sort-tasks waiting to be 
submitted
+- [CELEBORN-1737] Support build tez client package
+- [CELEBORN-1802] Fail the celeborn master/worker start if CELEBORN_CONF_DIR 
is not directory
+- [CELEBORN-1701][FOLLOWUP] Support stage rerun for shuffle data lost
+- [CELEBORN-1413] Support Spark 4.0
+- [CELEBORN-1753] Optimize the code for `exists` and `find` method
+- [CELEBORN-1777] Add `java.security.jgss/sun.security.krb5` to 
DEFAULT_MODULE_OPTIONS
+- [CELEBORN-1731] Support merged kv input for Tez
+- [CELEBORN-1732] Support unordered kv input for Tez
+- [CELEBORN-1733] Support ordered grouped kv input for Tez
+- [CELEBORN-1721][CIP-12] Support HARD_SPLIT in PushMergedData
+- [CELEBORN-1758] Remove the empty user resource consumption from worker 
heartbeat
+- [CELEBORN-1729] Support ordered KV output for Tez
+- [CELEBORN-1730] Support unordered KV output for Tez
+- [CELEBORN-1612] Add a basic reader writer class to Tez
+- [CELEBORN-1725] Optimize performance of handling `MapperEnd` RPC in 
`LifecycleManager`
+- [CELEBORN-1700] Flink supports fallback to vanilla Flink built-in shuffle 
implementation
+- [CELEBORN-1621][CIP-11] Predefined worker tags expr via dynamic configs
+- [CELEBORN-1545] Add Tez plugin skeleton and dag app master
+- [CELEBORN-1530] support MPU for S3
+- [CELEBORN-1618][CIP-11] Supporting tags via DB Config Service
+- [CELEBORN-1726] Update WorkerInfo when transition worker state
+- [CELEBORN-1715] RemoteShuffleMaster should check 
celeborn.client.push.replicate.enabled in constructor
+- [CELEBORN-1714] Optimize handleApplicationLost
+- [CELEBORN-1660] Cache available workers and only count the available workers 
device free capacity
+- [CELEBORN-1619][CIP-11] Integrate tags manager with config service
+- [CELEBORN-1071] Support stage rerun for shuffle data lost
+- [CELEBORN-1697] Improve ThreadStackTrace for thread dump
+- [CELEBORN-1671] CelebornShuffleReader will try replica if create client 
failed
+- [CELEBORN-1682] Add java tools.jar into classpath for JVM quake
+- [CELEBORN-1660] Using map for workers to find worker fast
+- [CELEBORN-1673] Support retry create client
+- [CELEBORN-1577][PHASE1] Storage quota should support interrupt shuffle
+- [CELEBORN-1642][CIP-11] Support multiple worker tags
+- [CELEBORN-1636] Client supports dynamic update of Worker resources on the 
server
+- [CELEBORN-1651] Support ratio threshold of unhealthy disks for excluding 
worker
+- [CELEBORN-1601] Support revise lost shuffles
+- [CELEBORN-1487][PHASE2] CongestionController support dynamic config
+- [CELEBORN-1648] Refine AppUniqueId with UUID suffix
+- [CELEBORN-1490][CIP-6] Introduce tier consumer for hybrid shuffle
+- [CELEBORN-1645] Introduce ShuffleFallbackPolicy to support custom 
implementation of shuffle fallback policy for 
CelebornShuffleFallbackPolicyRunner
+- [CELEBORN-1620][CIP-11] Support passing worker tags via RequestSlots message
+- [CELEBORN-1487][PHASE1] CongestionController support control traffic by 
user/worker traffic speed
+- [CELEBORN-1637] Enhance config to bypass memory check for partition file 
sorter
+- [CELEBORN-1638] Improve the slots allocator performance
+- [CELEBORN-1617][CIP-11] Support workers tags in FS Config Service
+- [CELEBORN-1615] Start the http server after all handlers added
+- [CELEBORN-1574] Speed up unregister shuffle by batch processing
+- [CELEBORN-1625] Add parameter `skipCompress` for pushOrMergeData
+- [CELEBORN-1597][CIP-11] Implement TagsManager
+- [CELEBORN-1490][CIP-6] Impl worker write process for Flink Hybrid Shuffle
+- [CELEBORN-1602] do hard split for push merged data RPC with disk full
+- [CELEBORN-1594] Refine dynamicConfig template and prevent NPE
+- [CELEBORN-1513] Support wildcard bind in dual stack environments
+- [CELEBORN-1587] Change to debug logging on client side for SortBasedPusher 
trigger push
+- [CELEBORN-1496] Differentiate map results with only different stageAttemptId
+- [CELEBORN-1583] MasterClient#sendMessageInner should throw Throwable for 
celeborn.masterClient.maxRetries is 0
+- [CELEBORN-1578] Make Worker#timer have thread name and daemon
+- [CELEBORN-1580] ReadBufferDispacther should notify exception to listener
+- [CELEBORN-1575] TimeSlidingHub should remove expire node when reading
+- [CELEBORN-1560] Remove usages of deprecated Files.createTempDir of Guava
+- [CELEBORN-1567] Support throw FetchFailedException when Data corruption 
detected
+- [CELEBORN-1568] Support worker retries in MiniCluster
+- [CELEBORN-1563] Log networkLocation in WorkerInfo
+- [CELEBORN-1531] Refactor self checks in master
+- [CELEBORN-1550] Add support of providing custom dynamic store backend 
implementation
+- [CELEBORN-1529] Read shuffle data from S3
+- [CELEBORN-1518] Add support for Apache Spark barrier stages
+- [CELEBORN-1542] Master supports to check the worker host pattern on worker 
registration
+- [CELEBORN-1535] Support to disable master workerUnavailableInfo expiration
+- [CELEBORN-1511] Add support for custom master endpoint resolver
+- [CELEBORN-1541] Enhance the readable address for internal port
+- [CELEBORN-1533] Log location when CelebornInputStream#fillBuffer fails
+- [CELEBORN-1524] Support IPv6 hostnames for Apache Ratis
+- [CELEBORN-1519] Do not update estimated partition size if it is unchanged
+- [CELEBORN-1521] Introduce celeborn-spi module for authentication extensions
+- [CELEBORN-1469] Support writing shuffle data to OSS(S3 only)
+- [CELEBORN-1446] Enable chunk prefetch when initialize CelebornInputStream
+- [CELEBORN-1509] Reply response without holding a lock
+- [CELEBORN-1543] Support Flink 1.20
+- [CELEBORN-1504] Support for Apache Flink 1.16
+- [CELEBORN-1500] Filter out empty InputStreams
+- [CELEBORN-1495] CelebornColumnDictionary supports dictionary of float and 
double column type
+- [CELEBORN-1494] Support IPv6 addresses in 
PbSerDeUtils.fromPackedPartitionLocations
+- [CELEBORN-1483] Add storage policy
+- [CELEBORN-1479] Report register shuffle failed reason in exception
+- [CELEBORN-1489] Update Flink support with authentication support
+- [CELEBORN-1485] Refactor addCounter, addGauge and addTimer of AbstractSource 
to reduce CPU utilization
+- [CELEBORN-1472] Reduce CongestionController#userBufferStatuses call times
+- [CELEBORN-1471] CelebornScalaObjectMapper supports configuring 
FAIL_ON_UNKNOWN_PROPERTIES to false
+
+### Flink hybrid shuffle
+- [CELEBORN-1867][FLINK] Fix flink client memory leak of 
TransportResponseHandler#outstandingRpcs for handling addCredit and 
notifyRequiredSegment response
+- [CELEBORN-1866][FLINK] Fix CelebornChannelBufferReader request more buffers 
than needed
+- [CELEBORN-1490][CIP-6] Support process large buffer in flink hybrid shuffle
+- [CELEBORN-1490][CIP-6] Add Flink hybrid shuffle doc
+- [CELEBORN-1490][CIP-6] Impl worker read process in Flink Hybrid Shuffle
+- [CELEBORN-1490][CIP-6] Introduce tier producer in celeborn flink client
+- [CELEBORN-1490][CIP-6] Introduce tier factory and master agent in flink 
hybrid shuffle
+- [CELEBORN-1490][CIP-6] Enrich register shuffle method
+- [CELEBORN-1490][CIP-6] Extends FileMeta to support hybrid shuffle
+- [CELEBORN-1490][CIP-6] Extends message to support hybrid shuffle
+
+### RESTful API and CLI
+- [CELEBORN-1056][FOLLOWUP] Support upsert and delete of dynamic configuration 
management
+- [CELEBORN-2020] Support http authentication for Celeborn CLI
+- [CELEBORN-1875] Support to get workers topology information with RESTful api
+- [CELEBORN-1797] Support to adjust the logger level with RESTful API during 
runtime
+- [CELEBORN-1750] Return struct worker resource consumption information with 
RESTful api
+- [CELEBORN-1707] Audit all RESTful api calls and use separate restAuditFile
+- [CELEBORN-1632] Support to apply ratis local raft_meta_conf command with 
RESTful api
+- [CELEBORN-1599] Container Info REST API
+- [CELEBORN-1630] Support to apply ratis peer operation with RESTful api
+- [CELEBORN-1631] Support to apply ratis snapshot operation with RESTful api
+- [CELEBORN-1633] Return more raft group information
+- [CELEBORN-1629] Support to apply ratis election operation with RESTful api
+- [CELEBORN-1609] Support SSL for celeborn RESTful service
+- [CELEBORN-1608] Reduce redundant response for RESTful api
+- [CELEBORN-1607] Enable `useEnumCaseInsensitive` for openapi-generator
+- [CELEBORN-1589] Ensure master is leader for some POST request APIs
+- [CELEBORN-1572] Celeborn CLI initial REST API support
+- [CELEBORN-1553] Using the request base url as swagger server
+- [CELEBORN-1537] Support to remove workers unavailable info with RESTful api
+- [CELEBORN-1546] Support authorization on swagger UI
+- [CELEBORN-1477] Using openapi-generator `apache-httpclient` library instead 
of `jersey2`
+- [CELEBORN-1493] Check admin privileges for http mutative requests
+- [CELEBORN-1477][CIP-9] Refine the celeborn RESTful APIs
+- [CELEBORN-1476] Enhance the RESTful response error msg
+- [CELEBORN-1475] Fix unknownExcludedWorkers filter for /exclude request
+- [CELEBORN-1318] Support celeborn http authentication
+
+### Monitoring
+- [CELEBORN-2024] Publish commit files fail count metrics
+- [CELEBORN-1892] Adding register with master fail count metric for worker
+- [CELEBORN-2005] Introduce numBytesIn, numBytesOut, numBytesInPerSecond, 
numBytesOutPerSecond metrics for RemoteShuffleServiceFactory
+- [CELEBORN-1800] Introduce ApplicationTotalCount and ApplicationFallbackCount 
metric to record the total and fallback count of application
+- [CELEBORN-1974] ApplicationId as metrics label should be behind a config flag
+- [CELEBORN-1968] Publish metric for unreleased partition location count when 
worker was gracefully shutdown
+- [CELEBORN-1977] Add help/type on prometheus exposed metrics
+- [CELEBORN-1831] Add ratis commitIndex metrics
+- [CELEBORN-1817] add committed file size metrics
+- [CELEBORN-1791] All NettyMemoryMetrics should register to source
+- [CELEBORN-1804] Shuffle environment metrics of RemoteShuffleEnvironment 
should use Shuffle.Remote metric group
+- [CELEBORN-1634][FOLLOWUP] Add rpc metrics into grafana dashboard
+- [CELEBORN-1766] Add detail metrics about fetch chunk
+- [CELEBORN-1756] Only gauge hdfs metrics if HDFS storage enabled to reduce 
metrics
+- [CELEBORN-1634] Support queue time/processing time metrics for rpc framework
+- [CELEBORN-1706] Use bytes(IEC) unit instead of bytes(SI) for size related 
metrics in prometheus dashboard
+- [CELEBORN-1685] ShuffleFallbackPolicy supports ShuffleFallbackCount metric
+- [CELEBORN-1680] Introduce ShuffleFallbackCount metrics
+- [CELEBORN-1444][FOLLOWUP] Add IsDecommissioningWorker to celeborn dashboard
+- [CELEBORN-1640] NettyMemoryMetrics supports numHeapArenas, numDirectArenas, 
tinyCacheSize, smallCacheSize, normalCacheSize, numThreadLocalCaches and 
chunkSize
+- [CELEBORN-1656] Remove duplicate UserProduceSpeed metrics
+- [CELEBORN-1627] Introduce `instance` variable for celeborn dashboard to 
filter metrics
+- [CELEBORN-1582] Publish metric for unreleased shuffle count when worker was 
decommissioned
+- [CELEBORN-1501] Introduce application dimension resource consumption metrics 
of Worker
+- [CELEBORN-1586] Add available workers Metrics
+- [CELEBORN-1581] Fix incorrect metrics of DeviceCelebornFreeBytes and 
DeviceCelebornTotalBytes
+- [CELEBORN-1491] introduce flusher working queue size metric
+
+### CPP Client
+- [CELEBORN-1978][CIP-14] Add code style checking for cppClient
+- [CELEBORN-1958][CIP-14] Add testsuite to test writing with javaClient and 
reading with cppClient
+- [CELEBORN-1874][CIP-14] Add CICD procedure in github action for cppClient
+- [CELEBORN-1932][CIP-14] Adapt java's serialization to support cpp 
serialization for GetReducerFileGroup/Response
+- [CELEBORN-1915][CIP-14] Add reader's ShuffleClient to cppClient
+- [CELEBORN-1906][CIP-14] Add CelebornInputStream to cppClient
+- [CELEBORN-1881][CIP-14] Add WorkerPartitionReader to cppClient
+- [CELEBORN-1871][CIP-14] Add NettyRpcEndpointRef to cppClient
+- [CELEBORN-1863][CIP-14] Add TransportClient to cppClient
+- [CELEBORN-1845][CIP-14] Add MessageDispatcher to cppClient
+- [CELEBORN-1836][CIP-14] Add Message to cppClient
+- [CELEBORN-1827][CIP-14] Add messageDecoder to cppClient
+- [CELEBORN-1821][CIP-14] Add controlMessages to cppClient
+- [CELEBORN-1819][CIP-14] Refactor cppClient with nested namespace
+- [CELEBORN-1814][CIP-14] Add transportMessage to cppClient
+- [CELEBORN-1809][CIP-14] Add partitionLocation to cppClient
+- [CELEBORN-1799][CIP-14] Add celebornConf to cppClient
+- [CELEBORN-1785][CIP-14] Add baseConf to cppClient
+- [CELEBORN-1772][CIP-14] Add memory module to cppClient
+- [CELEBORN-1761][CIP-14] Add cppProto to cppClient
+- [CELEBORN-1754][CIP-14] Add exceptions and checking utils to cppClient
+- [CELEBORN-1751][CIP-14] Add celebornException utils to cppClient
+- [CELEBORN-1740][CIP-14] Add stackTrace utils to cppClient
+- [CELEBORN-1741][CIP-14] Add processBase utils to cppClient
+- [CELEBORN-1724][CIP-14] Add environment setup tools for CppClient development
+
+### HELM Chart Optimization
+- [CELEBORN-2017][HELM] Add namespace to the metadata
+- [CELEBORN-1528][HELM] Use volume claim template to support various storage 
backend
+- [CELEBORN-1996][HELM] Rename volumes.{master,worker} to 
{master,worker}.volumes and {master.worker}.volumeMounts
+- [CELEBORN-1989][HELM] Split securityContext into master.podSecurityContext 
and worker.podSecurityContext
+- [CELEBORN-1988][HELM] Split hostNetwork into master.hostNetwork and 
worker.hostNetwork
+- [CELEBORN-1987][HELM] Split dnsPolicy into master.dnsPolicy and 
worker.dnsPolicy
+- [CELEBORN-1981][HELM] Rename masterReplicas and workerReplicas to 
master.replicas and worker.replicas
+- [CELEBORN-1985][HELM] Add new values master.envFrom and worker.envFrom
+- [CELEBORN-1986][HELM] Rename priorityClass.{master,worker} to 
{master,worker}.priorityClass
+- [CELEBORN-1980][HELM] Split environments into master.env and worker.env
+- [CELEBORN-1972][HELM] Rename affinity.{master,worker} to 
{master,worker}.affinity
+- [CELEBORN-1951][HELM] Rename resources.{master,worker} to 
{master,worker}.resoruces
+- [CELEBORN-1962][HELM] Split tolerations into master.tolerations and 
worker.tolerations
+- [CELEBORN-1955][HELM] Split nodeSelector into master.nodeSelector and 
worker.nodeSelector
+- [CELEBORN-1953][HELM] Split podAnnotations into master.annotations and 
worker.annotations
+- [CELEBORN-1954][HELM] Add a new value image.registry
+- [CELEBORN-1952][HELM] Define template helpers for master/worker respectively
+- [CELEBORN-1532][HELM] Read log4j2 and metrics configurations from file
+- [CELEBORN-1830] Chart statefulset resources key duplicate
+- [CELEBORN-1788] Add role and roleBinding helm charts
+- [CELEBORN-1786] add serviceAccount helm chart
+- [CELEBORN-1780] Add support for NodePort Service per Master replica
+- [CELEBORN-1552] automatically support prometheus to scrape metrics for helm 
chart
+
+### Stability and Bug Fix
+- [CELEBORN-1721][FOLLOWUP] Return softsplit if there is no hardsplit for 
pushMergeData
+- [CELEBORN-2043] Fix IndexOutOfBoundsException exception in 
getEvictedFileWriter
+- [CELEBORN-2040] Avoid throw FetchFailedException when 
GetReducerFileGroupResponse failed via broadcast
+- [CELEBORN-2042] Fix FetchFailure handling when TaskSetManager is not found
+- [CELEBORN-2033] updateProduceBytes should be called even if 
updateProduceBytes throws exception
+- [CELEBORN-2025] RpcFailure Scala 2.13 serialization is incompatible
+- [CELEBORN-2022] Spark4 Client should package commons-io
+- [CELEBORN-2023] Spark4 Client incompatible with isLocalMaster method
+- [CELEBORN-2009] Commit files request failure should exclude worker in 
LifecycleManager
+- [CELEBORN-2015] Retry IOException failures for RPC requests
+- [CELEBORN-1902] Read client throws PartitionConnectionException
+- [CELEBORN-2000] Ignore the getReducerFileGroup timeout before shuffle stage 
end
+- [CELEBORN-1960] Fix PauseSpentTime only append the interval check time
+- [CELEBORN-1998] RemoteShuffleEnvironment should not register 
InputChannelMetrics repeatedly
+- [CELEBORN-1999] OpenStreamTime should use requestId to record cost time
+- [CELEBORN-1855] LifecycleManager return appshuffleId for non barrier stage 
when fetch fail has been reported
+- [CELEBORN-1948] Fix the issue where replica may lose data when HARD_SPLIT 
occurs during handlePushMergeData
+- [CELEBORN-1912] Client should send heartbeat to worker for processing 
heartbeat to avoid reading idleness of worker which enables heartbeat
+- [CELEBORN-1992] Ensure hadoop FS are not closed by hadoop ShutdownHookManager
+- [CELEBORN-1919] Hardsplit batch tracking should be disabled when pushing 
only a single replica
+- [CELEBORN-1979] Change partition manager should respect the 
`celeborn.storage.availableTypes`
+- [CELEBORN-1983] Fix fetch fail not throw due to reach spark maxTaskFailures
+- [CELEBORN-1976] CommitHandler should use 
celeborn.client.rpc.commitFiles.askTimeout for timeout of doParallelCommitFiles
+- [CELEBORN-1966] Added fix to get userinfo in celeborn
+- [CELEBORN-1969] Remove celeborn.client.shuffle.mapPartition.split.enabled to 
enable shuffle partition split at default for MapPartition
+- [CELEBORN-1970] Use StatusCode.fromValue instead of Utils.toStatusCode
+- [CELEBORN-1646][FOLLOWUP] DeviceMonitor should notifyObserversOnError with 
CRITICAL_ERROR disk status for input/ouput error
+- [CELEBORN-1929] Avoid unnecessary buffer loss to get better buffer 
reusability
+- [CELEBORN-1918] Add batchOpenStream time to fetch wait time
+- [CELEBORN-1947] Reduce log for CelebornShuffleReader sleeping before 
inputStream ready
+- [CELEBORN-1923] Correct Celeborn available slots calculation logic
+- [CELEBORN-1914] incWriteTime when ShuffleWriter invoke pushGiantRecord
+- [CELEBORN-1822] Respond to RegisterShuffle with max epoch PartitionLocation 
to avoid revive
+- [CELEBORN-1899] Fix configuration bug in shuffle s3
+- [CELEBORN-1885] Fix nullptr exceptions in FetchChunk after worker restart
+- [CELEBORN-1889] Fix scala 2.13 complie error
+- [CELEBORN-1846] Fix the StreamHandler usage in fetching chunk when task 
attempt is odd
+- [CELEBORN-1792] MemoryManager resume should use pinnedDirectMemory instead 
of usedDirectMemory
+- [CELEBORN-1832] MapPartitionData should create fixed thread pool with 
registration of ThreadPoolSource
+- [CELEBORN-1820] Failing to write and flush StreamChunk data should be 
counted as FETCH_CHUNK_FAIL
+- [CELEBORN-1818] Fix incorrect timeout exception when waiting on no pending 
writes
+- [CELEBORN-1763] Fix DataPusher be blocked for a long time
+- [CELEBORN-1783] Fix Pending task in commitThreadPool wont be canceled
+- [CELEBORN-1782] Worker in congestion control should be in blacklist to avoid 
impact new shuffle
+- [CELEBORN-1510] Partial task unable to switch to the replica
+- [CELEBORN-1778] Fix commitInfo NPE and add assert in 
LifecycleManagerCommitFilesSuite
+- [CELEBORN-1670] Avoid swallowing InterruptedException in ShuffleClientImpl
+  Revert "- [CELEBORN-1376] Push data failed should always release request 
body"
+- [CELEBORN-1770] FlushNotifier should setException for all Throwables in 
Flusher
+- [CELEBORN-1769] Fix packed partition location cause 
GetReducerFileGroupResponse lose location
+- [CELEBORN-1767] Fix occasional errors in UT when creating workers
+- [CELEBORN-1765] Fix NPE when removeFileInfo in StorageManager
+- [CELEBORN-1760] OOM causes disk buffer unable to be released
+- [CELEBORN-1743] Resolve the metrics data interruption and the job failure 
caused by locked resources
+- [CELEBORN-1759] Fix reserve slots might lost partition location between 0.4 
client and 0.5 server
+- [CELEBORN-1749] Fix incorrect application diskBytesWritten metrics
+- [CELEBORN-1713] RpcTimeoutException should include RPC address in message
+- [CELEBORN-1728] Fix NPE when failing to connect to celeborn worker
+- [CELEBORN-1727] Correct the calculation of worker diskInfo actualUsableSpace
+- [CELEBORN-1718] Fix memory storage file won't hard split when memory file is 
full and worker has no disks
+- [CELEBORN-1705] Fix disk buffer size is negative issue
+- [CELEBORN-1686] Avoid return the same pushTaskQueue
+- [CELEBORN-1696] StorageManager#cleanFile should remove file info
+- [CELEBORN-1691] Fix the issue that upstream tasks don't rerun and the 
current task still retry when failed to decompress in flink
+- [CELEBORN-1693] Fix storageFetcherPool concurrent problem
+- [CELEBORN-1674] Fix reader thread name of MapPartitionData
+- [CELEBORN-1668] Fix NPE when handle closed file writers
+- [CELEBORN-1669] Fix NullPointerException for 
PartitionFilesSorter#updateSortedShuffleFiles after cleaning up expired shuffle 
key
+- [CELEBORN-1667] Fix NPE & LEAK occurring prior to worker registration
+- [CELEBORN-1664] Fix secret fetch failures after LEADER master failover
+- [CELEBORN-1665] CommitHandler should process CommitFilesResponse with 
COMMIT_FILE_EXCEPTION status
+- [CELEBORN-1663] Only register appShuffleDeterminate if stage using celeborn 
for shuffle
+- [CELEBORN-1661] Make sure that the sortedFilesDb is initialized successfully 
when worker enable graceful shutdown
+- [CELEBORN-1662] Handle PUSH_DATA_FAIL_PARTITION_NOT_FOUND in 
getPushDataFailCause
+- [CELEBORN-1655] Fix read buffer dispatcher thread terminate unexpectedly
+- [CELEBORN-1646] Catch exception of Files#getFileStore for DeviceMonitor and 
StorageManager for input/ouput error
+- [CELEBORN-1652] Throw TransportableError for failure of sending 
PbReadAddCredit to avoid flink task get stuck
+- [CELEBORN-1643] DataPusher handle InterruptedException
+- [CELEBORN-1579] Fix the memory leak of result partition
+- [CELEBORN-1564] Fix actualUsableSpace of offerSlotsLoadAware condition on 
diskInfo
+- [CELEBORN-1573] Change to debug logging on client side for reserve slots
+- [CELEBORN-1557] Fix totalSpace of DiskInfo for Master in HA mode
+- [CELEBORN-1549] Fix networkLocation persistence into Ratis
+- [CELEBORN-1558] Fix the incorrect decrement of pendingWrites in 
handlePushMergeData
+- [CELEBORN-1547] Worker#listTopDiskUseApps should return 
celeborn.metrics.app.topDiskUsage.count applications
+- [CELEBORN-1544] ShuffleWriter needs to call close finally to avoid memory 
leaks
+- [CELEBORN-1473] TransportClientFactory should register netty memory metric 
with source for shared pooled ByteBuf allocator
+- [CELEBORN-1522] Fix applicationId extraction from shuffle key
+- [CELEBORN-1516] DynamicConfigServiceFactory should support singleton
+- [CELEBORN-1515] SparkShuffleManager should set lifecycleManager to null 
after stopping lifecycleManager in Spark 2
+- [CELEBORN-1439] Fix revive logic bug which will casue data correctness issue 
and job failiure
+- [CELEBORN-1507] Prevent invalid Filegroups from being used
+- [CELEBORN-1478] Fix wrong use partitionId as shuffleId when readPartition
+
+### Build
+- [CELEBORN-2012] Add license for http5
+- [CELEBORN-1801] Remove out-of-dated flink 1.14 and 1.15
+- [CELEBORN-1746] Reduce the size of aws dependencies
+- [CELEBORN-1677][BUILD] Update SCM information for SBT build configuration
+- [CELEBORN-1649] Bumping up maven to 3.9.9
+- [CELEBORN-1658] Add Git Commit Info and Build JDK Spec to sbt Manifest
+- [CELEBORN-1659] Fix sbt make-distribution for cli
+- [CELEBORN-1616] Shade com.google.thirdparty to prevent dependency conflicts
+- [CELEBORN-1606] Generate dependencies-client-flink-1.16
+- [CELEBORN-1600] Enable check server dependencies
+- [CELEBORN-1556] Update Github actions to v4
+- [CELEBORN-1559] Fix make distribution script failed to recognize specific 
profile
+- [CELEBORN-1526] Fix MR plugin can not run on Hadoop 3.1.0
+- [CELEBORN-1527] Error prone plugin should exclude 
target/generated-sources/java of module
+- [CELEBORN-1505] Algin the celeborn server jackson dependency versions
+
+### Documentation
+- [CELEBORN-1870] Fix typos in in 'Developer' documents
+- [CELEBORN-1860] Remove unused celeborn.<module>.io.enableVerboseMetrics 
option
+- [CELEBORN-1823] Remove unused remote-shuffle.job.min.memory-per-partition 
and remote-shuffle.job.min.memory-per-gate
+- [CELEBORN-1811] Update default value for 
`celeborn.master.slot.assign.extraSlots`
+- [CELEBORN-1789][DOC] Document on Java Columnar Shuffle
+- [CELEBORN-1774] Update default value of celeborn.<module>.io.mode to whether 
epoll mode is available
+- [CELEBORN-1622][CIP-11] Adding documentation for Worker Tags feature
+- [CELEBORN-1755] Update doc to include S3 as one of storage layers
+- [CELEBORN-1752] Migration guide for unexpected shuffle RESTful api change 
since 0.5.0
+- [CELEBORN-1745] Remove application top disk usage code
+- [CELEBORN-1719] Introduce celeborn.client.spark.stageRerun.enabled with 
alternative celeborn.client.spark.fetch.throwsFetchFailure to enable spark 
stage rerun
+- [CELEBORN-1687] Highlight flink session cluster issue in doc
+- [CELEBORN-1684] Fix ambiguous client jar expression of document
+- [CELEBORN-1678] Add Celeborn CLI User guide in README
+- [CELEBORN-1635] Introduce Blaze support document
+  [CLEBORN-1555] Replace deprecated config celeborn.storage.activeTypes in 
docs and tests
+- [CELEBORN-1566] Update docs about using HDFS
+- [CELEBORN-1551] Fix wrong link in quota_management.md
+- [CELEBORN-1437][DOC] Merge METRICS.md into monitoring.md
+- [CELEBORN-1436][DOC] Move Rest API out from monitoring.md to webapi.md
+- [CELEBORN-1486] Introduce ClickHouse Backend in Gluten Support document
+- [CELEBORN-1466] Add local command in celeborn_ratis_shell.md
+
+### Dependencies
+- [CELEBORN-2030] Bump Spark from 3.5.5 to 3.5.6
+- [CELEBORN-2013] Upgrade scala binary version of spark-3.3, spark-3.4, 
spark-3.5 profile to 2.13.8
+- [CELEBORN-1895] Bump log4j2 version to 2.24.3
+- [CELEBORN-1890] Bump Spark from 3.5.4 to 3.5.5
+- [CELEBORN-1884] Bump rocksdbjni version from 9.5.2 to 9.10.0
+- [CELEBORN-1877] Bump zstd-jni version from 1.5.2-1 to 1.5.7-1
+- [CELEBORN-1872] Bump Flink from 1.19.1, 1.20.0 to 1.19.2, 1.20.1
+- [CELEBORN-1864] Bump Netty version from 4.1.115.Final to 4.1.118.Final
+- [CELEBORN-1862] Bump Ratis version from 3.1.2 to 3.1.3
+- [CELEBORN-1842] Bump ap-loader version from 3.0-8 to 3.0-9
+- [CELEBORN-1806] Bump Spark from 3.5.3 to 3.5.4
+- [CELEBORN-1712] Bump Netty version from 4.1.109.Final to 4.1.115.Final
+- [CELEBORN-1748] Deprecate identity provider configs tied with quota
+- [CELEBORN-1702] Bump Ratis version from 3.1.1 to 3.1.2
+- [CELEBORN-1708] Bump protobuf version from 3.21.7 to 3.25.5
+- [CELEBORN-1709] Bump jetty version from 9.4.52.v20230823 to 9.4.56.v20240826
+- [CELEBORN-1710] Bump commons-io version from 2.13.0 to 2.17.0
+- [CELEBORN-1672] Bump Spark from 3.4.3 to 3.4.4
+- [CELEBORN-1666] Bump scala-protoc from 1.0.6 to 1.0.7
+- [CELEBORN-1525] Bump Ratis version from 3.1.0 to 3.1.1
+- [CELEBORN-1613] Bump Spark from 3.5.2 to 3.5.3
+- [CELEBORN-1605] Bump commons-lang3 version from 3.13.0 to 3.17.0
+- [CELEBORN-1604] Bump rocksdbjni version from 8.11.3 to 9.5.2
+- [CELEBORN-1562] Bump Spark from 3.5.1 to 3.5.2
+- [CELEBORN-1512] Bump Flink from 1.19.0 to 1.19.1
+- [CELEBORN-1499] Bump Ratis version from 3.0.1 to 3.1.0
+
+## Credits
+
+Thanks to the following contributors who helped to review and commit to Apache 
Celeborn 0.6.0 version:
+
+| Contributors    |                 |                 |                 |      
           |                      |
+|-----------------|-----------------|-----------------|-----------------|-----------------|----------------------|
+| Aidar Bariev    | Amandeep Singh  | Aravind Patnam  | Arsen Gumin     | 
avishnus        | Biao Geng            |
+| Binjie Yang     | Björn Boschman  | Bowen Liang     | Cheng Pan       | 
Chongchen Chen  | Erik Fang            |
+| Fei Wang        | Fu Chen         | Guangwei Hong   | Haotian Cao     | He 
Zhao         | Jiaming Xie          |
+| Jianfu Li       | Jiashu Xiong    | Jinqian Fan     | Kerwin Zhang    | 
Keyong Zhou     | Kun Wan              |
+| Leo Li          | Lianne Li       | Madhukar525722  | Minchu Yang     | 
Mingxiao Feng   | Mridul Muralidharan  |
+| Nan Zhu         | Nicholas Jiang  | Nicolas Fraison | Pengqi Li       | 
Sanskar Modi    | Saurabh Dubey        |
+| Shaoyun Chen    | Shengjie Wang   | Shlomi Uubul    | Veli Yang       | 
Weijie Guo      | Xianming Lei         |
+| Xinyu Wang      | Xu Huang        | xy2953396112    | Yajun Gao       | 
Yanze Jiang     | Yi Chen              |
+| Yi Zhu          | Yuting Wang     | Yuxin Tan       | Zhao Zhao       | 
Zhengqi Zhang   | Zhentao Shuai        |
+| Ziyi Wu         |                 |                 |                 |      
           |                      |

Review Comment:
   Zhaohui Xu, Thank you.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@celeborn.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to