pan3793 commented on code in PR #36: URL: https://github.com/apache/incubator-celeborn-website/pull/36#discussion_r1479138082
##########
docs/community/release_notes/release_note_0.4.0.md:
##########
@@ -0,0 +1,268 @@
+---
+hide:
+  - navigation
+
+license: |
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements. See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License. You may obtain a copy of the License at
+  http://www.apache.org/licenses/LICENSE-2.0
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+---
+
+# Apache Celeborn(Incubating) 0.4.0 Release Notes
+
+## Highlight
+
+- Rerun Spark Stage for Celeborn Shuffle Fetch Failure
+- Added support for Hadoop MapReduce
+- Added support for Flink 1.18
+- Implemented JVM monitoring in Celeborn Worker using JVMQuake
+- Added support for SBT build system
+
+### IMPOROVEMENT
+
+- [CELEBORN-1052] Introduce dynamic ConfigService at SystemLevel and TenantLevel
+- [CELEBORN-977] Support RocksDB as recover DB backend
+- [CELEBORN-851] Mention Celeborn 0.4 server requires 0.3 or above clients
+- [CELEBORN-808] Remove unnecessary RssShuffleManager in 0.4.0
+- [CELEBORN-980] Asynchronously delete original files to fix `ReusedExchange` bug
+- [CELEBORN-1149] Improve replica selection when rack aware
+- [CELEBORN-448] Support exclude worker manually
+- [CELEBORN-1236][METRICS] Celeborn add metrics about thread pool
+- [CELEBORN-1242] Unify celeborn thread name format
+- [CELEBORN-1226][FOLLOWUP] Unify creation of thread using ThreadUtils
+- [CELEBORN-1226][BRANCH-0.4.0] Unify creation of thread using ThreadUtils (#2245)
+- [CELEBORN-1238] deviceCheckThreadPool is only initialized when diskCheck is enabled
+- [CELEBORN-1225][FOLLOWUP] Worker should build replicate factory to get client for sending replicate data
+- [CELEBORN-1233] Treat unfound PartitionLocation as failed in Controller#commitFiles
+- [CELEBORN-1218] Optimize dataPusher to get partitionLocationMap only once
+- [CELEBORN-1225] Worker should build replicate factory to get client for sending replicate data
+- [CELEBORN-1228] Format the timestamp when recording worker failure
+- [CELEBORN-1224] Make TransportMessage#type transient for backward compatibility
+- [CELEBORN-1219] takeBuffer() avoid checking source.metricsCollectCriticalEnabled twice
+- [CELEBORN-1220][IMPROVEMENT] Make trim logic more robust
+- [CELEBORN-1177] OpenStream should register stream via ChunkStreamManager to close stream for ReusedExchange
+- [CELEBORN-1217] Improve exception message of loadFileGroup for ShuffleClientImpl
+- [CELEBORN-1215] Introduce PausePushDataAndReplicateTime metric to record time for a worker to stop receiving pushData from clients and other workers
+- [CELEBORN-1216] Resolve error occurring during distribution creation with profile -Pspark-2.4
+- [CELEBORN-1214] Introduce WriteDataHardSplitCount metric to record HARD_SPLIT partitions of PushData and PushMergedData
+- [CELEBORN-891] Remove pipeline feature for sort based writer
+- [CELEBORN-1210] Fix potential memory leak in PartitionFilesCleaner
+- [CELEBORN-1100] Introduce ChunkStreamCount, OpenStreamFailCount metrics about opening stream of FetchHandler
+- [CELEBORN-1211] Add extension for celeborn shuffle handler
+- [CELEBORN-1201] Optimize memory usage of cache in partition sorter
+- [CELEBORN-1190][FOLLOWUP] Apply error prone patch and suppress some problems
+- [CELEBORN-1252] Fix resource consumption of worker does not update when update interval is greater than heartbeat interval
+- [CELEBORN-1253] Improve exception message of fetching chunk failure for WorkerPartitionReader
+- [CELEBORN-1246][FOLLOWUP] Introduce OpenStreamSuccessCount, FetchChunkSuccessCount and WriteDataSuccessCount metric in Grafana dashboard
+- [CELEBORN-1189] Introduce RunningApplicationCount metric and /applications API to record running applications of worker
+- [CELEBORN-1187][FOLLOWUP] Unify the size and file count of active shuffle metrics for master and worker
+- [CELEBORN-1187] Unify the size and file count of active shuffle metrics for master and worker
+- [CELEBORN-1196] Slots allocator will increment disk index repeatedly
+- [CELEBORN-1193] ResettableSlidingWindowReservoir should reset `full` to false
+- [CELEBORN-1188][TEST] Using JUnit function instead of java assert
+- [CELEBORN-1190] Apply error prone patch and suppress some problems
+- [CELEBORN-1036][FOLLOWUP] totalInflightReqs should decrement when batchIdSet contains the batchId to avoid duplicate caller of removeBatch
+- [CELEBORN-1150] support io encryption for spark
+- [CELEBORN-1176] Server side support for Sasl Auth
+- [CELEBORN-1180] Changed the version of Sasl Auth related config to 0.5
+- [CELEBORN-1157] Add client-side support for Sasl Authentication in the transport layer
+- [CELEBORN-1164] Introduce FetchChunkFailCount metric to expose the count of fetching chunk failed in current worker
+- [CELEBORN-1162][BUG] Fix refCnt 0 Exception in FetchHandler#handleChunkFetchRequest
+- [CELEBORN-1151] Request slots when register shuffle should filter the workers excluded by application
+- [CELEBORN-1152] fix GetShuffleId RPC NPE for empty shuffle
+- [CELEBORN-1147] Added a dedicated API for RPC messages which also accepts an RpcResponseCallback instance
+- [MINOR] Update log level of ChunkFetchSuccess failed for `FetchHandler#handleChunkFetchRequest` from error to warn
+- [CELEBORN-1127] Add JVM classloader metrics
+- [CELEBORN-1122] Metrics supports json format
+- [CELEBORN-1052][FOLLOWUP] Introduce dynamic ConfigService at SystemLevel and TenantLevel
+- [CELEBORN-1145] Separate clientPushBufferMaxSize from CelebornInputStreamImpl
+- [CELEBORN-1131] Add Client/Server bootstrap framework to transport layer
+- [CELEBORN-1142] clear shuffleIdCache in shutdown method of ShuffleClientImpl
+- [CELEBORN-1081][FOLLOWUP] Remove UNKNOWN_DISK and allocate all slots to disk
+- [CELEBORN-1125][FOLLOWUP] Add failureaccess shade
+- [CELEBORN-1140] Use try-with-resources to avoid FSDataInputStream not being closed
+- [CELEBORN-1135] Added tests for the RpcEnv and related classes
+- [CELEBORN-856] Add mapreduce integration test
+- [CELEBORN-1123] Support fallback to non-columnar shuffle for schema that cannot be obtained from shuffle dependency
+- [CELEBORN-1134] Celeborn Flink client should validate whether execution.batch-shuffle-mode is ALL_EXCHANGES_BLOCKING
+- [CELEBORN-1106] Ensure data is written into flush buffer before sending message to client
+- [CELEBORN-1110][FOLLOWUP] Support celeborn.worker.storage.disk.reserve.ratio to configure worker reserved ratio for each disk
+- [CELEBORN-1108][FOLLOWUP] Use rat plugin check Flink 1.18
+- [CELEBORN-1048][FOLLOWUP] MR module compile
+- [CELEBORN-1108] Rat plugin check for more modules
+- [CELEBORN-247][FOLLOWUP] Add metrics for each user's quota usage of Celeborn Worker
+- [CELEBORN-1095] Support configuration of fastest available XXHashFactory instance for checksum of Lz4Decompressor
+- [CELEBORN-1087] Remove SimpleStateMachineStorageUtil in master module
+- Revert "[CELEBORN-255] Add counter of outstandingFetches, outstanding…

Review Comment:
Such a revert should not be present in the release notes. Let's remove it and the reverted one.
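
If the list is assembled from commit titles, the suggestion above could be applied mechanically. The following is only a rough sketch, not part of this PR or of any Celeborn tooling: `revert_target` and `drop_reverts` are hypothetical helper names, the sample entries are made up, and it assumes every revert entry has the form `Revert "<original title>` with a title that may be truncated.

```python
REVERT_PREFIX = 'Revert "'


def revert_target(entry):
    """Return the (possibly truncated) title a revert entry refers to, else None."""
    text = entry.strip()
    if not text.startswith(REVERT_PREFIX):
        return None
    # Drop the closing quote and any trailing ellipsis left by truncation.
    title = text[len(REVERT_PREFIX):].rstrip('"')
    return title.rstrip("….").strip()


def drop_reverts(entries):
    """Remove revert entries and the entries they revert (matched by title prefix)."""
    targets = [t for t in map(revert_target, entries) if t]
    kept = []
    for entry in entries:
        if revert_target(entry) is not None:
            continue  # the revert entry itself
        if any(entry.strip().startswith(t) for t in targets):
            continue  # the entry that was reverted
        kept.append(entry)
    return kept


# Made-up sample entries, for illustration only.
entries = [
    "[EXAMPLE-1] Add counter of pending requests",
    "[EXAMPLE-2] Fix memory leak in cleaner",
    'Revert "[EXAMPLE-1] Add counter of pending…',
]
cleaned = drop_reverts(entries)
```

Matching by prefix is a deliberate simplification so that truncated revert titles still find their original entry. It can over-match when two titles share a long common prefix, so the result would still need a manual check.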