bhasudha commented on a change in pull request #2016:
URL: https://github.com/apache/hudi/pull/2016#discussion_r475646635



##########
File path: docs/_pages/releases.md
##########
@@ -5,6 +5,72 @@ layout: releases
 toc: true
 last_modified_at: 2020-05-28T08:40:00-07:00
 ---
+## [Release 0.6.0](https://github.com/apache/hudi/releases/tag/release-0.6.0) ([docs](/docs/0.6.0-quick-start-guide.html))
+
+### Download Information
+ * Source Release : [Apache Hudi 0.6.0 Source Release](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz) ([asc](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.asc), [sha512](https://downloads.apache.org/hudi/0.6.0/hudi-0.6.0.src.tgz.sha512))
+ * Apache Hudi jars corresponding to this release are available [here](https://repository.apache.org/#nexus-search;quick~hudi)
+
+### Migration Guide for this release
+ - With 0.6.0, Hudi is moving from a list based rollback to marker based rollbacks. To smoothly aid this transition,
+ a new property called `hoodie.table.version` has been added to the `hoodie.properties` file. Whenever Hudi is launched
+ with a newer table version, i.e. 1 (or when moving from pre-0.6.0 to 0.6.0), an upgrade step is executed automatically
+ to adhere to the marker based rollback. This automatic upgrade step happens just once per dataset, since
+ `hoodie.table.version` is updated in the properties file once the upgrade completes. A sketch of the resulting file
+ is shown after this list.
+ - Similarly, a command line tool for downgrading has been added, in case some users want to downgrade Hudi from
+ table version 1 to 0, i.e. move from Hudi 0.6.0 back to a pre-0.6.0 release.
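+
+A minimal sketch of what `hoodie.properties` might contain after the automatic upgrade (values shown are illustrative,
+not a complete listing):
+
+```properties
+# .hoodie/hoodie.properties (illustrative)
+hoodie.table.name=my_hudi_table
+hoodie.table.type=COPY_ON_WRITE
+# New in 0.6.0: table version; 1 means the one-time upgrade to
+# marker based rollbacks has already been applied.
+hoodie.table.version=1
+```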
+ 
+### Release Highlights
+
+#### Ingestion side improvements:
+  - Hudi now supports `Azure Data Lake Storage V2`, `Alluxio` and `Tencent Cloud Object Storage`.
+  - Add support for "bulk_insert" without converting to RDD. This has better 
performance compared to existing "bulk_insert".
+    This implementation uses Datasource for writing to storage with support 
for key generators to operate on Row 
+    (rather than HoodieRecords as per previous "bulk_insert") is added.
+  - # TODO Add more about bulk insert modes. 
+  - # TODO Add more on bootstrap.             
+  - In previous versions, auto clean ran synchronously after ingestion. Starting with 0.6.0, Hudi performs cleaning
+    and ingestion in parallel.
+  - Support async compaction for Spark streaming writes to Hudi tables. Previous versions supported only inline
+    compaction.
+  - Implemented rollbacks using marker files instead of relying on commit metadata. Please check the migration guide
+    for more details on this.
+  - A new InlineFileSystem has been added to support embedding any file format as an inline format within a regular
+    file.
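+
+A minimal sketch of the new Row-based "bulk_insert" through the Spark datasource (`df` and `basePath` are assumed to
+exist, the field names are illustrative, and `hoodie.datasource.write.row.writer.enable` is the option used here to
+toggle the Row writer path):
+
+```scala
+import org.apache.spark.sql.SaveMode
+
+// Bulk insert operating directly on Rows, skipping the RDD conversion
+df.write.format("hudi").
+  option("hoodie.table.name", "my_hudi_table").
+  option("hoodie.datasource.write.operation", "bulk_insert").
+  option("hoodie.datasource.write.row.writer.enable", "true").
+  option("hoodie.datasource.write.recordkey.field", "uuid").
+  option("hoodie.datasource.write.partitionpath.field", "partition").
+  option("hoodie.datasource.write.precombine.field", "ts").
+  mode(SaveMode.Append).
+  save(basePath)
+```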
+
+#### Query side improvements:
+  - Starting 0.6.0, snapshot queries are feasible via the Spark datasource (see the sketch after this list).
+  - In prior versions we only supported HoodieCombineHiveInputFormat for CopyOnWrite tables, to ensure that there is
+    a limit on the number of mappers spawned for any query. Hudi now supports Merge on Read tables as well, using
+    HoodieCombineInputFormat.
+  - Sped up Spark read queries by caching the metaclient in HoodieROPathFilter. This helps reduce listing related
+    overheads in S3 when filtering files for read-optimized queries.
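+
+A minimal sketch of a snapshot query through the Spark datasource (`basePath` is assumed to exist, and the glob depth
+depends on how the table is partitioned):
+
+```scala
+// Read the latest committed snapshot of the table
+val snapshotDF = spark.read.format("hudi").
+  option("hoodie.datasource.query.type", "snapshot").
+  load(basePath + "/*/*/*")
+snapshotDF.createOrReplaceTempView("hudi_snapshot")
+```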
+
+#### DeltaStreamer improvements:
+  - HoodieMultiDeltaStreamer: adds support for ingesting multiple Kafka streams in a single DeltaStreamer deployment.
+  - Added a new tool, InitialCheckPointProvider, to set checkpoints when migrating to DeltaStreamer after an initial
+    load of the table is complete.
+  - Added CSV source support.
+  - Added a chained transformer that can chain multiple transformers (see the sketch after this list).
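+
+A rough sketch of chaining transformers in a DeltaStreamer run (the `com.example.*` transformer classes are
+hypothetical, and chaining is expressed here as a comma separated `--transformer-class` list):
+
+```sh
+spark-submit \
+  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
+  hudi-utilities-bundle.jar \
+  --table-type COPY_ON_WRITE \
+  --source-class org.apache.hudi.utilities.sources.JsonKafkaSource \
+  --source-ordering-field ts \
+  --target-base-path /path/to/table \
+  --target-table my_table \
+  --transformer-class com.example.AddMetaColsTransformer,com.example.FilterRowsTransformer
+```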
+
+#### Indexing improvements:
+  - Added a new index, `HoodieSimpleIndex`, which joins incoming records with base files to index records.
+  - Added the ability to configure user defined indexes (a config sketch follows this list).
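+
+A config sketch for the new index options (the custom index class named below is hypothetical):
+
+```properties
+# Use the new simple index, which joins incoming records against base files
+hoodie.index.type=SIMPLE
+# Or plug in a user defined index implementation
+# hoodie.index.class=com.example.MyCustomIndex
+```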
+
+#### Key generation improvements:
+  - Introduced `CustomTimestampBasedKeyGenerator` to support complex keys as record key and custom partition paths.
+  - Support more time units and date/time formats in `TimestampBasedKeyGenerator` (a config sketch follows this list).
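+
+A config sketch for `TimestampBasedKeyGenerator` (the formats shown are illustrative; consult the configurations page
+for the full set of supported time units and formats):
+
+```properties
+hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.TimestampBasedKeyGenerator
+hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING
+hoodie.deltastreamer.keygen.timebased.input.dateformat=yyyy-MM-dd HH:mm:ss
+hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd
+```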
+
+#### Developer productivity and monitoring improvements:
+  - Spark DAGs are named to aid debuggability.
+  - Console, JMX, Prometheus and DataDog metric reporters have been added.
+  - Support pluggable metrics reporting by introducing a proper abstraction for user defined metrics (a config sketch
+    follows this list).
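+
+A config sketch enabling one of the reporters (JMX shown; the host/port values are illustrative and the user defined
+reporter class is hypothetical):
+
+```properties
+hoodie.metrics.on=true
+hoodie.metrics.reporter.type=JMX
+hoodie.metrics.jmx.host=localhost
+hoodie.metrics.jmx.port=9889
+# Or point to a user defined reporter implementation
+# hoodie.metrics.reporter.class=com.example.MyMetricsReporter
+```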
+
+#### CLI related features:
+  - Added support for deleting savepoints via the CLI.
+  - Added a new command, `export instants`, to export metadata of instants (a usage sketch follows).
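+
+A rough sketch of these commands at the hudi-cli prompt (the prompt and exact flags may vary; run `help` inside the
+CLI to confirm):
+
+```sh
+hudi->savepoint delete --commit 20200820091234
+hudi->export instants --localFolder /tmp/hudi-instants
+```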

Review comment:
       Let me add one line here and point to the migration section for more details, like I did in [line 34](https://github.com/apache/hudi/pull/2016/files#diff-21c3ed259536d942a5f57ecff7d2a17aR34).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

