[incubator-hudi] branch asf-site updated: Travis CI build asf-site
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new d0c3b9f Travis CI build asf-site d0c3b9f is described below commit d0c3b9fb1095eaae1cda48bbef0de32defdb04b8 Author: CI AuthorDate: Thu May 21 04:46:35 2020 + Travis CI build asf-site --- content/cn/docs/0.5.2-querying_data.html | 4 ++-- content/cn/docs/querying_data.html | 4 ++-- content/cn/docs/quick-start-guide.html | 2 ++ content/docs/0.5.2-querying_data.html| 4 ++-- content/docs/querying_data.html | 4 ++-- content/docs/quick-start-guide.html | 6 -- 6 files changed, 14 insertions(+), 10 deletions(-) diff --git a/content/cn/docs/0.5.2-querying_data.html b/content/cn/docs/0.5.2-querying_data.html index eeaf7a1..4ed98db 100644 --- a/content/cn/docs/0.5.2-querying_data.html +++ b/content/cn/docs/0.5.2-querying_data.html @@ -360,7 +360,7 @@ Presto - Impala(此功能还未正式发布) + Impala (3.4 or later) 读优化表 @@ -677,7 +677,7 @@ Upsert实用程序(HoodieDeltaStreamer Presto是一种常用的查询引擎,可提供交互式查询性能。 Hudi RO表可以在Presto中无缝查询。 这需要在整个安装过程中将hudi-presto-bundle jar放入presto_install/plugin/hive-hadoop2/中。 -Impala(此功能还未正式发布) +Impala (3.4 or later) 读优化表 diff --git a/content/cn/docs/querying_data.html b/content/cn/docs/querying_data.html index e33d18b..002c34c 100644 --- a/content/cn/docs/querying_data.html +++ b/content/cn/docs/querying_data.html @@ -360,7 +360,7 @@ Presto - Impala(此功能还未正式发布) + Impala (3.4 or later) 读优化表 @@ -677,7 +677,7 @@ Upsert实用程序(HoodieDeltaStreamer Presto是一种常用的查询引擎,可提供交互式查询性能。 Hudi RO表可以在Presto中无缝查询。 这需要在整个安装过程中将hudi-presto-bundle jar放入presto_install/plugin/hive-hadoop2/中。 -Impala(此功能还未正式发布) +Impala (3.4 or later) 读优化表 diff --git a/content/cn/docs/quick-start-guide.html b/content/cn/docs/quick-start-guide.html index 984639a..adc2bfc 100644 --- a/content/cn/docs/quick-start-guide.html +++ b/content/cn/docs/quick-start-guide.html @@ 
-410,6 +410,8 @@ read. format("org.apache.hudi"). load(basePath + "/*/*/*/*") +//load(basePath) 如果使用 "/partitionKey=partitionValue" 文件夹命名格式,Spark将自动识别分区信息 + roViewDF.registerTempTable("hudi_ro_table") spark.sql("select fare, begin_lon, begin_lat, ts from hudi_ro_table where fare 20.0").show() spark.sql("select _hoodie_commit_time, _hoodie_record_key, _hoodie_partition_path, rider, driver, fare from hudi_ro_table").show() diff --git a/content/docs/0.5.2-querying_data.html b/content/docs/0.5.2-querying_data.html index 859bd0c..4315d03 100644 --- a/content/docs/0.5.2-querying_data.html +++ b/content/docs/0.5.2-querying_data.html @@ -357,7 +357,7 @@ Presto - Impala (Not Officially Released) + Impala (3.4 or later) Snapshot Query @@ -672,7 +672,7 @@ Please refer to confi Presto is a popular query engine, providing interactive query performance. Presto currently supports snapshot queries on COPY_ON_WRITE and read optimized queries on MERGE_ON_READ Hudi tables. This requires the hudi-presto-bundle jar to be placed into presto_install/plugin/hive-hadoop2/, across the installation. -Impala (Not Officially Released) +Impala (3.4 or later) Snapshot Query diff --git a/content/docs/querying_data.html b/content/docs/querying_data.html index e8dbe1a..2cfa722 100644 --- a/content/docs/querying_data.html +++ b/content/docs/querying_data.html @@ -357,7 +357,7 @@ Presto - Impala (Not Officially Released) + Impala (3.4 or later) Snapshot Query @@ -672,7 +672,7 @@ Please refer to configurati Presto is a popular query engine, providing interactive query performance. Presto currently supports snapshot queries on COPY_ON_WRITE and read optimized queries on MERGE_ON_READ Hudi tables. This requires the hudi-presto-bundle jar to be placed into presto_install/plugin/hive-hadoop2/, across the installation. 
-Impala (Not Officially Released) +Impala (3.4 or later) Snapshot Query diff --git a/content/docs/quick-start-guide.html b/content/docs/quick-start-guide.html index fa00061..76f0967 100644 --- a/content/docs/quick-start-guide.html +++ b/content/docs/quick-start-guide.html @@ -446,7 +446,8 @@ Here we are using the default write operation : read. format("hudi"). load(basePath + "/*/*/*/*") -tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot") +//load(basePath) use "/partitionKey=partitionValue" folder structure for Spark auto partition discovery +tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot") spark.sql("select fare, begin_lon, begin_lat, ts from hudi_trips_snapshot where fare 20.0").show() spark.sql("select _hoodie_commit_time, _hoodie_record_key, _hoodie_partition_path, rider, driver, fare from hudi_trips_snapshot").show() @@ -637,7 +638,8 @@ Here we are
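The quick-start hunks above add a comment that `load(basePath)` works when the table's folders follow the "/partitionKey=partitionValue" naming that Spark's partition discovery understands. A minimal sketch of both read styles, assuming the spark-shell session from the quick start (`hudi-spark-bundle` on the classpath, table written under `basePath` as described there):

```scala
// spark-shell — assumes the quick-start session and table
val basePath = "file:///tmp/hudi_trips_cow"

// Glob path: one wildcard per partition level (region/country/city)
val tripsSnapshotDF = spark.read.format("hudi").load(basePath + "/*/*/*/*")

// If the table instead used "/partitionKey=partitionValue" folder naming,
// Spark could discover partitions from the base path alone:
// val tripsSnapshotDF = spark.read.format("hudi").load(basePath)

tripsSnapshotDF.createOrReplaceTempView("hudi_trips_snapshot")
spark.sql("select fare, begin_lon, begin_lat, ts from hudi_trips_snapshot where fare > 20.0").show()
```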
[incubator-hudi] branch asf-site updated: Travis CI build asf-site
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 4b419ad Travis CI build asf-site 4b419ad is described below commit 4b419adf1ec2faaafdf1319475363d9239ba612d Author: CI AuthorDate: Sat May 16 04:36:26 2020 + Travis CI build asf-site --- content/assets/css/main.css | 2 +- content/docs/powered_by.html | 34 ++ content/index.html | 82 ++-- 3 files changed, 76 insertions(+), 42 deletions(-) diff --git a/content/assets/css/main.css b/content/assets/css/main.css index 9922c4f..23f7302 100644 --- a/content/assets/css/main.css +++ b/content/assets/css/main.css @@ -1 +1 @@ -table{border-color:#1ab7ea !important}.page a{color:#3b9cba !important}.page__content{font-size:17px}.page__content.releases{font-size:17px}.page__footer{font-size:15px !important}.page__footer a{color:#3b9cba !important}.page__content .notice,.page__content .notice--primary,.page__content .notice--info,.page__content .notice--warning,.page__content .notice--success,.page__content .notice--danger{font-size:0.8em !important}.page__content table{font-size:0.8em !important}.page__content ta [...] +table{border-color:#1ab7ea !important}.page a{color:#3b9cba !important}.page__content{font-size:17px}.page__content.releases{font-size:17px}.page__footer{font-size:15px !important}.page__footer a{color:#3b9cba !important}.page__content .notice,.page__content .notice--primary,.page__content .notice--info,.page__content .notice--warning,.page__content .notice--success,.page__content .notice--danger{font-size:0.8em !important}.page__content table{font-size:0.8em !important}.page__content ta [...] 
diff --git a/content/docs/powered_by.html b/content/docs/powered_by.html index 8a7e363..f4ab420 100644 --- a/content/docs/powered_by.html +++ b/content/docs/powered_by.html @@ -425,6 +425,40 @@ December 2019, AWS re:Invent 2019, Las Vegas, NV, USA https://eng.uber.com/hoodie/;>“Hoodie: Uber Engineering’s Incremental Processing Framework on Hadoop” - Engineering Blog By Prasanna Rajaperumal +Powered by + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Back to top diff --git a/content/index.html b/content/index.html index f680ed7..d8296e3 100644 --- a/content/index.html +++ b/content/index.html @@ -163,47 +163,47 @@ - - - - Hudi Users - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Get Started - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
[incubator-hudi] branch asf-site updated: Travis CI build asf-site
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new a4eecda Travis CI build asf-site a4eecda is described below commit a4eecdafab8b516e572499afa4e9b3136b99a2d3 Author: CI AuthorDate: Wed May 13 01:51:11 2020 + Travis CI build asf-site --- content/cn/releases.html | 4 ++-- content/releases.html| 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/content/cn/releases.html b/content/cn/releases.html index a48f73c..2321f00 100644 --- a/content/cn/releases.html +++ b/content/cn/releases.html @@ -216,7 +216,7 @@ Download Information - Source Release : https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz;>Apache Hudi(incubating) 0.5.2-incubating Source Release (https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.asc;>asc, https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.sha512;>sha512) + Source Release : https://downloads.apache.org/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz;>Apache Hudi(incubating) 0.5.2-incubating Source Release (https://downloads.apache.org/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.asc;>asc, https://downloads.apache.org/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.sha512;>sha512) Apache Hudi (incubating) jars corresponding to this release is available https://repository.apache.org/#nexus-search;quick~hudi;>here @@ -251,7 +251,7 @@ Download Information - Source Release : https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz;>Apache Hudi(incubating) 0.5.1-incubating Source Release (https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc;>asc, 
https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512;>sha512) + Source Release : https://downloads.apache.org/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz;>Apache Hudi(incubating) 0.5.1-incubating Source Release (https://downloads.apache.org/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc;>asc, https://downloads.apache.org/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512;>sha512) Apache Hudi (incubating) jars corresponding to this release is available https://repository.apache.org/#nexus-search;quick~hudi;>here diff --git a/content/releases.html b/content/releases.html index 303ba95..d225795 100644 --- a/content/releases.html +++ b/content/releases.html @@ -223,7 +223,7 @@ Download Information - Source Release : https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz;>Apache Hudi(incubating) 0.5.2-incubating Source Release (https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.asc;>asc, https://www.apache.org/dist/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.sha512;>sha512) + Source Release : https://downloads.apache.org/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz;>Apache Hudi(incubating) 0.5.2-incubating Source Release (https://downloads.apache.org/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.asc;>asc, https://downloads.apache.org/incubator/hudi/0.5.2-incubating/hudi-0.5.2-incubating.src.tgz.sha512;>sha512) Apache Hudi (incubating) jars corresponding to this release is available https://repository.apache.org/#nexus-search;quick~hudi;>here @@ -258,7 +258,7 @@ Download Information - Source Release : https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz;>Apache Hudi(incubating) 0.5.1-incubating Source Release (https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc;>asc, 
https://www.apache.org/dist/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512;>sha512) + Source Release : https://downloads.apache.org/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz;>Apache Hudi(incubating) 0.5.1-incubating Source Release (https://downloads.apache.org/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.asc;>asc, https://downloads.apache.org/incubator/hudi/0.5.1-incubating/hudi-0.5.1-incubating.src.tgz.sha512;>sha512) Apache Hudi (incubating) jars corresponding to this release is available https://repository.apache.org/#nexus-search;quick~hudi;>here @@ -311,7 +311,7 @@ If you are using this feature, you need to relocate the avro dependencies in you Download Information - Source Release : https://www.apache.org/dist/incubator/hudi/0.5.0-incubating/hudi-0.5.0-incubating.src.tgz;>Apache Hudi(incubating) 0.5.0-incubating Source Release
[incubator-hudi] branch asf-site updated: Travis CI build asf-site
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new f1d9a80 Travis CI build asf-site f1d9a80 is described below commit f1d9a8088b8bc46242325dc1eb849ad96c240a60 Author: CI AuthorDate: Wed May 6 11:23:13 2020 + Travis CI build asf-site --- content/docs/use_cases.html | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/use_cases.html b/content/docs/use_cases.html index 640c785..01ea086 100644 --- a/content/docs/use_cases.html +++ b/content/docs/use_cases.html @@ -382,7 +382,7 @@ Unfortunately, in today’s post-mobile pre-IoT world, late data f In such cases, the only remedy to guarantee correctness is to https://falcon.apache.org/FalconDocumentation.html#Handling_late_input_data;>reprocess the last few hours worth of data, over and over again each hour, which can significantly hurt the efficiency across the entire ecosystem. For e.g., imagine reprocessing TBs worth of data every hour across hundreds of workflows. -Hudi comes to the rescue again, by providing a way to consume new data (including late data) from an upsteam Hudi table HU at a record granularity (not folders/partitions), +Hudi comes to the rescue again, by providing a way to consume new data (including late data) from an upstream Hudi table HU at a record granularity (not folders/partitions), apply the processing logic, and efficiently update/reconcile late data with a downstream Hudi table HD. Here, HU and HD can be continuously scheduled at a much more frequent schedule like 15 mins, providing an end-to-end latency of 30 mins at HD.
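The HU → HD pattern in the use_cases hunk above can be sketched with Hudi's datasource options. This is an illustrative sketch, not code from the docs: the table paths, checkpoint value and field names (`ts`, `uuid`) are assumptions, and it presumes a spark-shell session with the `hudi-spark-bundle` loaded.

```scala
// Incremental pull from upstream table HU, upsert into downstream table HD.
// Paths, checkpoint and field names below are hypothetical.
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._
import org.apache.spark.sql.SaveMode._

val huPath = "/data/tables/HU"
val hdPath = "/data/tables/HD"
val lastCommit = "20200506112313" // commit time checkpointed by the previous run

// 1. Consume only the records that changed in HU after the checkpoint
//    (record granularity, not folders/partitions)
val changes = spark.read.format("org.apache.hudi").
  option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_INCREMENTAL_OPT_VAL).
  option(BEGIN_INSTANTTIME_OPT_KEY, lastCommit).
  load(huPath)

// 2. Apply processing logic, then reconcile (upsert) the results into HD
changes.write.format("org.apache.hudi").
  option(OPERATION_OPT_KEY, UPSERT_OPERATION_OPT_VAL).
  option(PRECOMBINE_FIELD_OPT_KEY, "ts").
  option(RECORDKEY_FIELD_OPT_KEY, "uuid").
  option(TABLE_NAME, "HD").
  mode(Append).
  save(hdPath)
```

Scheduling both steps every 15 minutes, as the text suggests, yields the ~30 minute end-to-end latency at HD.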
[incubator-hudi] branch asf-site updated: Travis CI build asf-site
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 206e549 Travis CI build asf-site 206e549 is described below commit 206e549637101c6fe49aaed915c41f21ad884e81 Author: CI AuthorDate: Wed May 6 08:42:27 2020 + Travis CI build asf-site --- content/docs/comparison.html | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/content/docs/comparison.html b/content/docs/comparison.html index 3919cbd..223e9f7 100644 --- a/content/docs/comparison.html +++ b/content/docs/comparison.html @@ -349,7 +349,7 @@ Consequently, Kudu does not support incremental pulling (as of early 2017), some Kudu diverges from a distributed file system abstraction and HDFS altogether, with its own set of storage servers talking to each other via RAFT. Hudi, on the other hand, is designed to work with an underlying Hadoop compatible filesystem (HDFS, S3 or Ceph) and does not have its own fleet of storage servers, -instead relying on Apache Spark to do the heavy-lifting. Thu, Hudi can be scaled easily, just like other Spark jobs, while Kudu would require hardware +instead relying on Apache Spark to do the heavy-lifting. Thus, Hudi can be scaled easily, just like other Spark jobs, while Kudu would require hardware operational support, typical to datastores like HBase or Vertica. We have not, at this point, done any head-to-head benchmarks against Kudu (given RTTable is WIP). But, if we were to go with results shared by https://db-blog.web.cern.ch/blog/zbigniew-baranowski/2017-01-performance-comparison-different-file-formats-and-storage-engines;>CERN, we expect Hudi to be positioned at something that ingests parquet with superior performance.
[incubator-hudi] branch asf-site updated: Travis CI build asf-site
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 79fb998 Travis CI build asf-site 79fb998 is described below commit 79fb9989614909d23885760b253d446a38ea66b5 Author: CI AuthorDate: Mon May 4 00:20:21 2020 + Travis CI build asf-site --- content/docs/quick-start-guide.html | 269 +--- 1 file changed, 249 insertions(+), 20 deletions(-) diff --git a/content/docs/quick-start-guide.html b/content/docs/quick-start-guide.html index 8e40382..e8cbbf7 100644 --- a/content/docs/quick-start-guide.html +++ b/content/docs/quick-start-guide.html @@ -4,7 +4,7 @@ Quick-Start Guide - Apache Hudi - + @@ -13,7 +13,7 @@ https://hudi.apache.org/docs/quick-start-guide.html;> - + @@ -335,14 +335,29 @@ IN THIS PAGE - Setup spark-shell - Insert data - Query data - Update data - Incremental query - Point in time query - Delete data - Where to go from here? + Scala example + + Setup + Insert data + Query data + Update data + Incremental query + Point in time query + Delete data + + + Pyspark example + + Setup + Insert data + Query data + Update data + Incremental query + Point in time query + Delete data + Where to go from here? + + @@ -351,13 +366,15 @@ code snippets that allows you to insert and update a Hudi table of default table type: Copy on Write. After each write operation we will also show how to read the data both snapshot and incrementally. +Scala example -Setup spark-shell +Setup Hudi works with Spark-2.x versions. You can follow instructions https://spark.apache.org/downloads.html;>here for setting up spark. From the extracted directory run spark-shell with Hudi as: -spark-2.4.4-bin-hadoop2.7/bin/spark// spark-shell +spark-2.4.4-bin-hadoop2.7/bin/spark-shell \ --packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.1-incubating, [...] 
--conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' @@ -374,7 +391,8 @@ From the extracted directory run spark-shell with Hudi as: Setup table name, base path and a data generator to generate records for this guide. -import org.apache.hudi.QuickstartUtils._ +// spark-shell +import org.apache.hudi.QuickstartUtils._ import scala.collection.JavaConversions._ import org.apache.spark.sql.SaveMode._ import org.apache.hudi.DataSourceReadOptions._ @@ -393,7 +411,8 @@ can generate sample inserts and updates based on the the sample trip schema Generate some new trips, load them into a DataFrame and write the DataFrame into the Hudi table as below. -val inserts = convertToStringList(dataGen.generateInserts(10)) +// spark-shell +val inserts = convertToStringList(dataGen.generateInserts(10)) val df = spark.read.json(spark.sparkContext.parallelize(inserts, 2df.write.format("hudi"). options(getQuickstartWriteConfigs). @@ -418,7 +437,8 @@ Here we are using the default write operation : Load the data files into a DataFrame. -val tripsSnapshotDF = spark. +// spark-shell +val tripsSnapshotDF = spark. read. format("hudi"). load(basePath + "/*/*/*/*") @@ -437,7 +457,8 @@ Refer to Table types and queriesThis is similar to inserting new data. Generate updates to existing trips using the data generator, load into a DataFrame and write DataFrame into the hudi table. -val updates = convertToStringList(dataGen.generateUpdates(10)) +// spark-shell +val updates = convertToStringList(dataGen.generateUpdates(10)) val df = spark.read.json(spark.sparkContext.parallelize(updates, 2df.write.format("hudi"). options(getQuickstartWriteConfigs). @@ -459,7 +480,8 @@ denoted by the timestamp. Look for changes in _h This can be achieved using Hudi’s incremental querying and providing a begin time from which changes need to be streamed. We do not need to specify endTime, if we want all changes after the given commit (as is the common case). 
-// reload data +// spark-shell +// reload data spark. read. format("hudi"). @@ -487,7 +509,8 @@ feature is that it now lets you author streaming pipelines on batch data. Lets look at how to query data as of a specific time. The specific time can be represented by pointing endTime to a specific commit time and beginTime to “000” (denoting earliest possible commit time). -val beginTime = "000" // Represents all commits this time. +// spark-shell +val beginTime = "000" // Represents all commits this time. val endTime = commits(commits.length - 2) // commit time we are interested in //incrementally query data @@ -497,13 +520,14 @@ specific commit time and beginTime to “000” (denoting earliest possible comm
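The incremental and point-in-time queries the hunks above describe can be sketched as follows, assuming the quick-start spark-shell session (with `basePath` set and `hudi_trips_snapshot` registered as shown earlier); the option names are those of the 0.5.x datasource:

```scala
// spark-shell — point-in-time query: beginTime "000", endTime a specific commit
import org.apache.hudi.DataSourceReadOptions._

val commits = spark.sql("select distinct(_hoodie_commit_time) as commitTime from hudi_trips_snapshot order by commitTime").
  map(k => k.getString(0)).take(50)
val beginTime = "000"                     // represents earliest possible commit time
val endTime = commits(commits.length - 2) // commit time we are interested in

val tripsPointInTimeDF = spark.read.format("hudi").
  option(QUERY_TYPE_OPT_KEY, QUERY_TYPE_INCREMENTAL_OPT_VAL).
  option(BEGIN_INSTANTTIME_OPT_KEY, beginTime).
  option(END_INSTANTTIME_OPT_KEY, endTime).
  load(basePath)
tripsPointInTimeDF.createOrReplaceTempView("hudi_trips_point_in_time")
spark.sql("select `_hoodie_commit_time`, fare, begin_lon, begin_lat, ts from hudi_trips_point_in_time where fare > 20.0").show()
```

Dropping the `END_INSTANTTIME_OPT_KEY` option gives the plain incremental query: all changes after `beginTime`, which is the common streaming-pipeline case.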
[incubator-hudi] branch asf-site updated: Travis CI build asf-site
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 609d5bf Travis CI build asf-site 609d5bf is described below commit 609d5bf8c3d0a1f4461ff2e4aa548daceedd11d2 Author: CI AuthorDate: Sat Apr 25 13:14:10 2020 + Travis CI build asf-site --- content/assets/js/lunr/lunr-store.js | 2 +- content/cn/docs/quick-start-guide.html | 8 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/content/assets/js/lunr/lunr-store.js b/content/assets/js/lunr/lunr-store.js index 1d0335b..f690419 100644 --- a/content/assets/js/lunr/lunr-store.js +++ b/content/assets/js/lunr/lunr-store.js @@ -545,7 +545,7 @@ var store = [{ "url": "https://hudi.apache.org/docs/oss_hoodie.html;, "teaser":"https://hudi.apache.org/assets/images/500x300.png"},{ "title": "Quick-Start Guide", - "excerpt":"本指南通过使用spark-shell简要介绍了Hudi功能。使用Spark数据源,我们将通过代码段展示如何插入和更新的Hudi默认存储类型数据集: 写时复制。每次写操作之后,我们还将展示如何读取快照和增量读取数据。 设置spark-shell Hudi适用于Spark-2.x版本。您可以按照此处的说明设置spark。 在提取的目录中,使用spark-shell运行Hudi: bin/spark-shell --packages org.apache.hudi:hudi-spark-bundle:0.5.0-incubating --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' 设置表名、基本路径和数据生成器来为本指南生成记录。 import org.apache.hudi.QuickstartUtils._ import scala.collection.JavaConversions._ import org.apache.spark.sql. [...] + "excerpt":"本指南通过使用spark-shell简要介绍了Hudi功能。使用Spark数据源,我们将通过代码段展示如何插入和更新Hudi的默认存储类型数据集: 写时复制。每次写操作之后,我们还将展示如何读取快照和增量数据。 设置spark-shell Hudi适用于Spark-2.x版本。您可以按照此处的说明设置spark。 在提取的目录中,使用spark-shell运行Hudi: bin/spark-shell --packages org.apache.hudi:hudi-spark-bundle:0.5.0-incubating --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' 设置表名、基本路径和数据生成器来为本指南生成记录。 import org.apache.hudi.QuickstartUtils._ import scala.collection.JavaConversions._ import org.apache.spark.sql.Sa [...] 
"tags": [], "url": "https://hudi.apache.org/cn/docs/quick-start-guide.html;, "teaser":"https://hudi.apache.org/assets/images/500x300.png"},{ diff --git a/content/cn/docs/quick-start-guide.html b/content/cn/docs/quick-start-guide.html index 3dcd47b..f1bd106 100644 --- a/content/cn/docs/quick-start-guide.html +++ b/content/cn/docs/quick-start-guide.html @@ -4,7 +4,7 @@ Quick-Start Guide - Apache Hudi - + @@ -13,7 +13,7 @@ https://hudi.apache.org/cn/docs/quick-start-guide.html;> - + @@ -346,8 +346,8 @@ - 本指南通过使用spark-shell简要介绍了Hudi功能。使用Spark数据源,我们将通过代码段展示如何插入和更新的Hudi默认存储类型数据集: -写时复制。每次写操作之后,我们还将展示如何读取快照和增量读取数据。 + 本指南通过使用spark-shell简要介绍了Hudi功能。使用Spark数据源,我们将通过代码段展示如何插入和更新Hudi的默认存储类型数据集: +写时复制。每次写操作之后,我们还将展示如何读取快照和增量数据。 设置spark-shell Hudi适用于Spark-2.x版本。您可以按照https://spark.apache.org/downloads.html;>此处的说明设置spark。
[incubator-hudi] branch asf-site updated: Travis CI build asf-site
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 4254e60 Travis CI build asf-site 4254e60 is described below commit 4254e606504a6b0351176f8bb59f9a830d4b66e6 Author: CI AuthorDate: Wed Apr 22 15:53:20 2020 + Travis CI build asf-site --- content/assets/js/lunr/lunr-store.js | 2 +- content/cn/docs/writing_data.html| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/content/assets/js/lunr/lunr-store.js b/content/assets/js/lunr/lunr-store.js index 70f1edb..1d0335b 100644 --- a/content/assets/js/lunr/lunr-store.js +++ b/content/assets/js/lunr/lunr-store.js @@ -600,7 +600,7 @@ var store = [{ "url": "https://hudi.apache.org/docs/concepts.html;, "teaser":"https://hudi.apache.org/assets/images/500x300.png"},{ "title": "写入 Hudi 数据集", -"excerpt":"这一节我们将介绍使用DeltaStreamer工具从外部源甚至其他Hudi数据集摄取新更改的方法, 以及通过使用Hudi数据源的upserts加快大型Spark作业的方法。 对于此类数据集,我们可以使用各种查询引擎查询它们。 写操作 在此之前,了解Hudi数据源及delta streamer工具提供的三种不同的写操作以及如何最佳利用它们可能会有所帮助。 这些操作可以在针对数据集发出的每个提交/增量提交中进行选择/更改。 UPSERT(插入更新) :这是默认操作,在该操作中,通过查找索引,首先将输入记录标记为插入或更新。 在运行启发式方法以确定如何最好地将这些记录放到存储上,如优化文件大小之类后,这些记录最终会被写入。 对于诸如数据库更改捕获之类的用例,建议该操作,因为输入几乎肯定包含更新。 INSERT(插入) :就使用启发式方法确定文件大小� �言,此操作与插入更新(UPSERT)非常相似,但此操作完全跳过了索引查找步骤。 因此,对于日志重复数据删除等用例(结合下面提到的过滤重复项的选项),它可以比插入更新快得多。 插入也适用于这种用 [...] +"excerpt":"这一节我们将介绍使用DeltaStreamer工具从外部源甚至其他Hudi数据集摄取新更改的方法, 以及通过使用Hudi数据源的upserts加快大型Spark作业的方法。 对于此类数据集,我们可以使用各种查询引擎查询它们。 写操作 在此之前,了解Hudi数据源及delta streamer工具提供的三种不同的写操作以及如何最佳利用它们可能会有所帮助。 这些操作可以在针对数据集发出的每个提交/增量提交中进行选择/更改。 UPSERT(插入更新) :这是默认操作,在该操作中,通过查找索引,首先将输入记录标记为插入或更新。 在运行启发式方法以确定如何最好地将这些记录放到存储上,如优化文件大小之后,这些记录最终会被写入。 对于诸如数据库更改捕获之类的用例,建议该操作,因为输入几乎肯定包含更新。 INSERT(插入) :就使用启发式方法确定文件大小而� �,此操作与插入更新(UPSERT)非常相似,但此操作完全跳过了索引查找步骤。 因此,对于日志重复数据删除等用例(结合下面提到的过滤重复项的选项),它可以比插入更新快得多。 插入也适用于这种用例 [...] 
"tags": [], "url": "https://hudi.apache.org/cn/docs/writing_data.html;, "teaser":"https://hudi.apache.org/assets/images/500x300.png"},{ diff --git a/content/cn/docs/writing_data.html b/content/cn/docs/writing_data.html index 24ed3d0..c7f7903 100644 --- a/content/cn/docs/writing_data.html +++ b/content/cn/docs/writing_data.html @@ -356,7 +356,7 @@ UPSERT(插入更新) :这是默认操作,在该操作中,通过查找索引,首先将输入记录标记为插入或更新。 - 在运行启发式方法以确定如何最好地将这些记录放到存储上,如优化文件大小之类后,这些记录最终会被写入。 + 在运行启发式方法以确定如何最好地将这些记录放到存储上,如优化文件大小之后,这些记录最终会被写入。 对于诸如数据库更改捕获之类的用例,建议该操作,因为输入几乎肯定包含更新。 INSERT(插入) :就使用启发式方法确定文件大小而言,此操作与插入更新(UPSERT)非常相似,但此操作完全跳过了索引查找步骤。 因此,对于日志重复数据删除等用例(结合下面提到的过滤重复项的选项),它可以比插入更新快得多。
[incubator-hudi] branch asf-site updated: Travis CI build asf-site
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 46fb6ee Travis CI build asf-site 46fb6ee is described below commit 46fb6ee01d6a099d593671a83b6ccf27b9240934 Author: CI AuthorDate: Tue Apr 21 12:22:55 2020 + Travis CI build asf-site --- content/cn/docs/quick-start-guide.html | 2 +- content/docs/oss_hoodie.html | 14 -- content/docs/quick-start-guide.html| 2 +- content/sitemap.xml| 4 ++-- 4 files changed, 16 insertions(+), 6 deletions(-) diff --git a/content/cn/docs/quick-start-guide.html b/content/cn/docs/quick-start-guide.html index 02c11d2..3dcd47b 100644 --- a/content/cn/docs/quick-start-guide.html +++ b/content/cn/docs/quick-start-guide.html @@ -391,7 +391,7 @@ mode(Overwrite)覆盖并重新创建数据集(如果已经存在)。 您可以检查在/tmp/hudi_cow_table/region/country/city/下生成的数据。我们提供了一个记录键 -(schema中的uuid),分区字段(region/county/city)和组合逻辑(schema中的ts) +(schema中的uuid),分区字段(region/country/city)和组合逻辑(schema中的ts) 以确保行程记录在每个分区中都是唯一的。更多信息请参阅 https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#FAQ-HowdoImodelthedatastoredinHudi;>对Hudi中的数据进行建模, 有关将数据提取到Hudi中的方法的信息,请参阅写入Hudi数据集。 diff --git a/content/docs/oss_hoodie.html b/content/docs/oss_hoodie.html index b998d3c..baf5a36 100644 --- a/content/docs/oss_hoodie.html +++ b/content/docs/oss_hoodie.html @@ -19,7 +19,7 @@ - + @@ -383,13 +383,23 @@ Aliyun OSS Libs -Aliyun hadoop libraries jars to add to our pom.xml. +Aliyun hadoop libraries jars to add to our pom.xml. Since hadoop-aliyun depends on the version of hadoop 2.9.1+, you need to use the version of hadoop 2.9.1 or later. 
<dependency> <groupId>org.apache.hadoop</groupId> <artifactId>hadoop-aliyun</artifactId> <version>3.2.1</version> </dependency> +<dependency> +<groupId>com.aliyun.oss</groupId> +<artifactId>aliyun-sdk-oss</artifactId> +<version>3.8.1</version> +</dependency> +<dependency> +<groupId>org.jdom</groupId> +<artifactId>jdom</artifactId> +<version>1.1</version> +</dependency> diff --git a/content/docs/quick-start-guide.html b/content/docs/quick-start-guide.html index 0d86d95..8e40382 100644 --- a/content/docs/quick-start-guide.html +++ b/content/docs/quick-start-guide.html @@ -407,7 +407,7 @@ can generate sample inserts and updates based on the sample trip schema mode(Overwrite) overwrites and recreates the table if it already exists. You can check the data generated under /tmp/hudi_trips_cow/region/country/city/. We provided a record key -(uuid in https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L58;>schema), partition field (region/county/city) and combine logic (ts in +(uuid in https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L58;>schema), partition field (region/country/city) and combine logic (ts in https://github.com/apache/incubator-hudi/blob/master/hudi-spark/src/main/java/org/apache/hudi/QuickstartUtils.java#L58;>schema) to ensure trip records are unique within each partition. For more info, refer to https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=113709185#FAQ-HowdoImodelthedatastoredinHudi;>Modeling data stored in Hudi and for info on ways to ingest data into Hudi, refer to Writing Hudi Tables.
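The "record key (uuid), partition field (region/country/city), combine logic (ts)" trio mentioned in the quick-start hunk above is configured on every write. A sketch matching the sample trip schema, assuming the quick-start spark-shell session (`df`, `tableName` and `basePath` as set up there; `getQuickstartWriteConfigs` comes from `QuickstartUtils` as in the guide):

```scala
// spark-shell — the write options behind the quick start's uniqueness guarantee
import org.apache.hudi.QuickstartUtils._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._
import org.apache.spark.sql.SaveMode._

df.write.format("hudi").
  options(getQuickstartWriteConfigs).
  option(PRECOMBINE_FIELD_OPT_KEY, "ts").               // combine logic: keep the record with the latest ts
  option(RECORDKEY_FIELD_OPT_KEY, "uuid").              // record key: unique within each partition
  option(PARTITIONPATH_FIELD_OPT_KEY, "partitionpath"). // region/country/city folders
  option(TABLE_NAME, tableName).
  mode(Overwrite).                                      // recreates the table if it already exists
  save(basePath)
```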
diff --git a/content/sitemap.xml b/content/sitemap.xml index 79ddedc..e42df8c 100644 --- a/content/sitemap.xml +++ b/content/sitemap.xml @@ -430,11 +430,11 @@ https://hudi.apache.org/docs/oss_hoodie.html -2020-04-12T16:50:50-04:00 +2020-04-21T18:50:50-04:00 https://hudi.apache.org/docs/oss_hoodie.html -2020-04-12T17:23:24-04:00 +2020-04-21T17:38:24-04:00 https://hudi.apache.org/cn/docs/quick-start-guide.html
[incubator-hudi] branch asf-site updated: Travis CI build asf-site
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 9b2b8d4 Travis CI build asf-site 9b2b8d4 is described below commit 9b2b8d4fff6b6181a49a71c3799ed50fc0ef6bf5 Author: CI AuthorDate: Mon Apr 20 23:20:22 2020 + Travis CI build asf-site --- content/cn/docs/docker_demo.html | 2 +- content/docs/docker_demo.html| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/content/cn/docs/docker_demo.html b/content/cn/docs/docker_demo.html index 42abb0d..c74a272 100644 --- a/content/cn/docs/docker_demo.html +++ b/content/cn/docs/docker_demo.html @@ -455,7 +455,7 @@ This should pull the docker images from docker hub and setup docker cluster. HDFS Services (NameNode, DataNode) Spark Master and Worker Hive Services (Metastore, HiveServer2 along with PostgresDB) - Kafka Broker and a Zookeeper Node (Kakfa will be used as upstream source for the demo) + Kafka Broker and a Zookeeper Node (Kafka will be used as upstream source for the demo) Adhoc containers to run Hudi/Hive CLI commands diff --git a/content/docs/docker_demo.html b/content/docs/docker_demo.html index e08bfc4..c167fec 100644 --- a/content/docs/docker_demo.html +++ b/content/docs/docker_demo.html @@ -460,7 +460,7 @@ This should pull the docker images from docker hub and setup docker cluster. HDFS Services (NameNode, DataNode) Spark Master and Worker Hive Services (Metastore, HiveServer2 along with PostgresDB) - Kafka Broker and a Zookeeper Node (Kakfa will be used as upstream source for the demo) + Kafka Broker and a Zookeeper Node (Kafka will be used as upstream source for the demo) Adhoc containers to run Hudi/Hive CLI commands
[incubator-hudi] branch asf-site updated: Travis CI build asf-site
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new abb90a58 Travis CI build asf-site abb90a58 is described below commit abb90a584bf640a3904c606d344a29e6e967f9f7 Author: CI AuthorDate: Thu Apr 16 18:43:21 2020 + Travis CI build asf-site --- content/cn/contributing.html | 2 +- content/contributing.html| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/content/cn/contributing.html b/content/cn/contributing.html index e053949..5d01a27 100644 --- a/content/cn/contributing.html +++ b/content/cn/contributing.html @@ -340,7 +340,7 @@ open source license https://www.apache.org/legal/resolved.html#criteria For involved changes, it’s best to also run the entire integration test suite using mvn clean install For website changes, please build the site locally test navigation, formatting links thoroughly If your code change changes some aspect of documentation (e.g new config, default value change), -please ensure there is another PR to https://github.com/apache/incubator-hudi/blob/asf-site/docs/README.md;>update the docs as well. +please ensure there is another PR to https://github.com/apache/incubator-hudi/tree/asf-site/README.md;>update the docs as well. 
Sending a Pull Request diff --git a/content/contributing.html b/content/contributing.html index cb1fc65..f0931a9 100644 --- a/content/contributing.html +++ b/content/contributing.html @@ -340,7 +340,7 @@ open source license https://www.apache.org/legal/resolved.html#criteria For involved changes, it’s best to also run the entire integration test suite using mvn clean install For website changes, please build the site locally test navigation, formatting links thoroughly If your code change changes some aspect of documentation (e.g new config, default value change), -please ensure there is another PR to https://github.com/apache/incubator-hudi/blob/asf-site/docs/README.md;>update the docs as well. +please ensure there is another PR to https://github.com/apache/incubator-hudi/tree/asf-site/README.md;>update the docs as well. Sending a Pull Request
[incubator-hudi] branch asf-site updated: Travis CI build asf-site
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 93eef62 Travis CI build asf-site 93eef62 is described below commit 93eef6216fcb1fda0c2d238a21f3ca5500cf2b37 Author: CI AuthorDate: Sun Apr 12 12:27:07 2020 + Travis CI build asf-site --- content/assets/js/lunr/lunr-store.js | 10 + content/docs/oss_hoodie.html | 439 +++ content/sitemap.xml | 8 + 3 files changed, 457 insertions(+) diff --git a/content/assets/js/lunr/lunr-store.js b/content/assets/js/lunr/lunr-store.js index 9351ab6..70f1edb 100644 --- a/content/assets/js/lunr/lunr-store.js +++ b/content/assets/js/lunr/lunr-store.js @@ -534,6 +534,16 @@ var store = [{ "tags": [], "url": "https://hudi.apache.org/docs/docker_demo.html;, "teaser":"https://hudi.apache.org/assets/images/500x300.png"},{ +"title": "OSS Filesystem", +"excerpt":"这个页面描述了如何让你的Hudi spark任务使用Aliyun OSS存储。 Aliyun OSS 部署 为了让Hudi使用OSS,需要增加两部分的配置: 为Hidi增加Aliyun OSS的相关配置 增加Jar包的MVN依赖 Aliyun OSS 相关的配置 新增下面的配置到你的Hudi能访问的core-site.xml文件。使用你的OSS bucket name替换掉fs.defaultFS,使用OSS endpoint地址替换fs.oss.endpoint,使用OSS的key和secret分别替换fs.oss.accessKeyId和fs.oss.accessKeySecret。主要Hudi就能读写相应的bucket。 property namefs.defaultFS/name valueoss://bucketname//value /property property namefs.oss.e [...] +"tags": [], +"url": "https://hudi.apache.org/docs/oss_hoodie.html;, +"teaser":"https://hudi.apache.org/assets/images/500x300.png"},{ +"title": "OSS Filesystem", +"excerpt":"In this page, we explain how to get your Hudi spark job to store into Aliyun OSS. 
Aliyun OSS configs There are two configurations required for Hudi-OSS compatibility: Adding Aliyun OSS Credentials for Hudi Adding required Jars to classpath Aliyun OSS Credentials Add the required configs in your core-site.xml from...","categories": [], +"tags": [], +"url": "https://hudi.apache.org/docs/oss_hoodie.html;, +"teaser":"https://hudi.apache.org/assets/images/500x300.png"},{ "title": "Quick-Start Guide", "excerpt":"本指南通过使用spark-shell简要介绍了Hudi功能。使用Spark数据源，我们将通过代码段展示如何插入和更新的Hudi默认存储类型数据集: 写时复制。每次写操作之后，我们还将展示如何读取快照和增量读取数据。 设置spark-shell Hudi适用于Spark-2.x版本。您可以按照此处的说明设置spark。 在提取的目录中，使用spark-shell运行Hudi: bin/spark-shell --packages org.apache.hudi:hudi-spark-bundle:0.5.0-incubating --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' 设置表名、基本路径和数据生成器来为本指南生成记录。 import org.apache.hudi.QuickstartUtils._ import scala.collection.JavaConversions._ import org.apache.spark.sql. [...] "tags": [], diff --git a/content/docs/oss_hoodie.html b/content/docs/oss_hoodie.html new file mode 100644 index 000..b998d3c --- /dev/null +++ b/content/docs/oss_hoodie.html @@ -0,0 +1,439 @@ + + + + + +OSS Filesystem - Apache Hudi
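The lunr-store excerpt above summarizes the new OSS Filesystem page: Hudi needs Aliyun OSS credentials in core-site.xml plus the OSS jars on the classpath, replacing fs.defaultFS with your bucket, fs.oss.endpoint with your OSS endpoint, and fs.oss.accessKeyId / fs.oss.accessKeySecret with your credentials. A minimal sketch of those core-site.xml additions, using the property names from the excerpt; the bucket, endpoint, and key values below are placeholders:

```xml
<!-- core-site.xml additions for Hudi on Aliyun OSS; all values are placeholders -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>oss://your-bucket/</value>
  </property>
  <property>
    <name>fs.oss.endpoint</name>
    <value>oss-cn-hangzhou.aliyuncs.com</value>
  </property>
  <property>
    <name>fs.oss.accessKeyId</name>
    <value>YOUR_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.oss.accessKeySecret</name>
    <value>YOUR_ACCESS_KEY_SECRET</value>
  </property>
</configuration>
```

The excerpt truncates before listing the Maven dependencies; consult the published oss_hoodie.html page itself for the exact jar coordinates the second half requires.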
[incubator-hudi] branch asf-site updated: Travis CI build asf-site
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new d99609b Travis CI build asf-site d99609b is described below commit d99609b344844dae0ecddbfc8ac56244a8f345ca Author: CI AuthorDate: Wed Apr 8 09:04:51 2020 + Travis CI build asf-site --- content/cn/community.html | 28 content/community.html| 8 +++- 2 files changed, 27 insertions(+), 9 deletions(-) diff --git a/content/cn/community.html b/content/cn/community.html index 55c85e2..cc54301 100644 --- a/content/cn/community.html +++ b/content/cn/community.html @@ -292,7 +292,7 @@ Committers are chosen by a majority vote of the Apache Hudi https://www https://avatars.githubusercontent.com/bhasudha; style="max-width: 100px" alt="bhasudha" align="middle" /> https://github.com/bhasudha;>Bhavani Sudha - Committer + PPMC, Committer bhavanisudha @@ -308,12 +308,6 @@ Committers are chosen by a majority vote of the Apache Hudi https://www kishoreg - https://avatars.githubusercontent.com/leesf; style="max-width: 100px" alt="leesf" align="middle" /> - https://github.com/leesf;>Shaofeng Li - Committer - leesf - - https://avatars.githubusercontent.com/lresende; style="max-width: 100px" alt="lresende" align="middle" /> https://github.com/lresende;>Luciano Resende PPMC, Committer @@ -332,6 +326,18 @@ Committers are chosen by a majority vote of the Apache Hudi https://www prasanna + https://avatars.githubusercontent.com/leesf; style="max-width: 100px" alt="leesf" align="middle" /> + https://github.com/leesf;>Shaofeng Li + PPMC, Committer + leesf + + + https://avatars.githubusercontent.com/nsivabalan; style="max-width: 100px" alt="nsivabalan" align="middle" /> + https://github.com/nsivabalan;>Sivabalan Narayanan + Committer + sivabalan + + https://avatars.githubusercontent.com/smarthi; style="max-width: 100px" alt="smarthi" 
align="middle" /> https://github.com/smarthi;>Suneel Marthi PPMC, Committer @@ -352,7 +358,7 @@ Committers are chosen by a majority vote of the Apache Hudi https://www https://avatars.githubusercontent.com/yanghua; style="max-width: 100px" alt="yanghua" /> https://github.com/yanghua;>vinoyang - Committer + PPMC, Committer vinoyang @@ -361,6 +367,12 @@ Committers are chosen by a majority vote of the Apache Hudi https://www PPMC, Committer zqureshi + + https://avatars.githubusercontent.com/lamber-ken; alt="lamber-ken" style="max-width: 100px;" align="middle" /> + https://github.com/lamber-ken;>lamber-ken + Committer + lamberken + diff --git a/content/community.html b/content/community.html index 83983f8..d83bebd 100644 --- a/content/community.html +++ b/content/community.html @@ -292,7 +292,7 @@ Committers are chosen by a majority vote of the Apache Hudi https://www https://avatars.githubusercontent.com/bhasudha; style="max-width: 100px" alt="bhasudha" align="middle" /> https://github.com/bhasudha;>Bhavani Sudha - Committer + PPMC, Committer bhavanisudha @@ -367,6 +367,12 @@ Committers are chosen by a majority vote of the Apache Hudi https://www PPMC, Committer zqureshi + + https://avatars.githubusercontent.com/lamber-ken; alt="lamber-ken" style="max-width: 100px;" align="middle" /> + https://github.com/lamber-ken;>lamber-ken + Committer + lamberken +
[incubator-hudi] branch asf-site updated: Travis CI build asf-site
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 96f3d74 Travis CI build asf-site 96f3d74 is described below commit 96f3d746d491c06e67c494985452fd95d0b831ee Author: CI AuthorDate: Sun Mar 29 03:30:47 2020 + Travis CI build asf-site --- test-content/assets/js/lunr/lunr-store.js | 4 +- test-content/cn/docs/0.5.2-querying_data.html | 102 +- test-content/cn/docs/querying_data.html | 102 +- test-content/docs/0.5.2-querying_data.html| 3 +- test-content/docs/querying_data.html | 3 +- 5 files changed, 204 insertions(+), 10 deletions(-) diff --git a/test-content/assets/js/lunr/lunr-store.js b/test-content/assets/js/lunr/lunr-store.js index 1077cf3..9351ab6 100644 --- a/test-content/assets/js/lunr/lunr-store.js +++ b/test-content/assets/js/lunr/lunr-store.js @@ -435,7 +435,7 @@ var store = [{ "url": "https://hudi.apache.org/docs/0.5.2-writing_data.html;, "teaser":"https://hudi.apache.org/assets/images/500x300.png"},{ "title": "查询 Hudi 数据集", -"excerpt":"从概念上讲，Hudi物理存储一次数据到DFS上，同时在其上提供三个逻辑视图，如之前所述。 数据集同步到Hive Metastore后，它将提供由Hudi的自定义输入格式支持的Hive外部表。一旦提供了适当的Hudi捆绑包， 就可以通过Hive、Spark和Presto之类的常用查询引擎来查询数据集。 具体来说，在写入过程中传递了两个由table name命名的Hive表。 例如，如果table name = hudi_tbl，我们得到 hudi_tbl 实现了由 HoodieParquetInputFormat 支持的数据集的读优化视图，从而提供了纯列式数据。 hudi_tbl_rt 实现了由 HoodieParquetRealtimeInputFormat 支持的数据集的实时视图，从而提供了基础数据和日志数据的合并视图。 如概念部分所述，增量处理所需要的 一个关键原语是增量拉取(以从数据集中获取更改流/日志)。您可以增量提取Hudi数据集，这意味着自指定的即时时间起， 您可以只获得全部更新和新行。 这与插入更新一起使用，对于构建某 [...]
+"excerpt":"从概念上讲，Hudi物理存储一次数据到DFS上，同时在其上提供三个逻辑视图，如之前所述。 数据集同步到Hive Metastore后，它将提供由Hudi的自定义输入格式支持的Hive外部表。一旦提供了适当的Hudi捆绑包， 就可以通过Hive、Spark和Presto之类的常用查询引擎来查询数据集。 具体来说，在写入过程中传递了两个由table name命名的Hive表。 例如，如果table name = hudi_tbl，我们得到 hudi_tbl 实现了由 HoodieParquetInputFormat 支持的数据集的读优化视图，从而提供了纯列式数据。 hudi_tbl_rt 实现了由 HoodieParquetRealtimeInputFormat 支持的数据集的实时视图，从而提供了基础数据和日志数据的合并视图。 如概念部分所述，增量处理所需要的 一个关键原语是增量拉取(以从数据集中获取更改流/日志)。您可以增量提取Hudi数据集，这意味着自指定的即时时间起， 您可以只获得全部更新和新行。 这与插入更新一起使用，对于构建某 [...] "tags": [], "url": "https://hudi.apache.org/cn/docs/0.5.2-querying_data.html;, "teaser":"https://hudi.apache.org/assets/images/500x300.png"},{ @@ -600,7 +600,7 @@ var store = [{ "url": "https://hudi.apache.org/docs/writing_data.html;, "teaser":"https://hudi.apache.org/assets/images/500x300.png"},{ "title": "查询 Hudi 数据集", -"excerpt":"从概念上讲，Hudi物理存储一次数据到DFS上，同时在其上提供三个逻辑视图，如之前所述。 数据集同步到Hive Metastore后，它将提供由Hudi的自定义输入格式支持的Hive外部表。一旦提供了适当的Hudi捆绑包， 就可以通过Hive、Spark和Presto之类的常用查询引擎来查询数据集。 具体来说，在写入过程中传递了两个由table name命名的Hive表。 例如，如果table name = hudi_tbl，我们得到 hudi_tbl 实现了由 HoodieParquetInputFormat 支持的数据集的读优化视图，从而提供了纯列式数据。 hudi_tbl_rt 实现了由 HoodieParquetRealtimeInputFormat 支持的数据集的实时视图，从而提供了基础数据和日志数据的合并视图。 如概念部分所述，增量处理所需要的 一个关键原语是增量拉取(以从数据集中获取更改流/日志)。您可以增量提取Hudi数据集，这意味着自指定的即时时间起， 您可以只获得全部更新和新行。 这与插入更新一起使用，对于构建某 [...] +"excerpt":"从概念上讲，Hudi物理存储一次数据到DFS上，同时在其上提供三个逻辑视图，如之前所述。 数据集同步到Hive Metastore后，它将提供由Hudi的自定义输入格式支持的Hive外部表。一旦提供了适当的Hudi捆绑包， 就可以通过Hive、Spark和Presto之类的常用查询引擎来查询数据集。 具体来说，在写入过程中传递了两个由table name命名的Hive表。 例如，如果table name = hudi_tbl，我们得到 hudi_tbl 实现了由 HoodieParquetInputFormat 支持的数据集的读优化视图，从而提供了纯列式数据。 hudi_tbl_rt 实现了由 HoodieParquetRealtimeInputFormat 支持的数据集的实时视图，从而提供了基础数据和日志数据的合并视图。 如概念部分所述，增量处理所需要的 一个关键原语是增量拉取(以从数据集中获取更改流/日志)。您可以增量提取Hudi数据集，这意味着自指定的即时时间起， 您可以只获得全部更新和新行。 这与插入更新一起使用，对于构建某 [...]
"tags": [], "url": "https://hudi.apache.org/cn/docs/querying_data.html;, "teaser":"https://hudi.apache.org/assets/images/500x300.png"},{ diff --git a/test-content/cn/docs/0.5.2-querying_data.html b/test-content/cn/docs/0.5.2-querying_data.html index 0f4a441..5d337d6 100644 --- a/test-content/cn/docs/0.5.2-querying_data.html +++ b/test-content/cn/docs/0.5.2-querying_data.html @@ -335,6 +335,12 @@ IN THIS PAGE + 查询引擎支持列表 + + 读优化表 + 实时表 + + Hive 读优化表 @@ -352,7 +358,7 @@ Presto Impala(此功能还未正式发布) - 读优化表 + 读优化表 @@ -377,6 +383,94 @@ 并与其他表(数据集/维度)结合以写出增量到目标Hudi数据集。增量视图是通过查询上表之一实现的,并具有特殊配置, 该特殊配置指示查询计划仅需要从数据集中获取增量数据。 +查询引擎支持列表 + +下面的表格展示了各查询引擎是否支持Hudi格式 + +读优化表 + + + + + 查询引擎 + 实时视图 + 增量拉取 + + + + + Hive + Y + Y + + + Spark SQL + Y + Y + + + Spark Datasource + Y + Y + + + Presto + Y + N + + + Impala + Y + N + + + + +实时表 + + + + + 查询引擎 + 实时视图 + 增量拉取 + 读优化表 + + + + + Hive + Y + Y + Y + + + Spark SQL + Y + Y + Y + +
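The excerpts and support matrix above revolve around 增量拉取 (incremental pull): reading only the rows updated or inserted after a given instant from the hudi_tbl / hudi_tbl_rt tables. A spark-shell sketch of such a read, assuming the 0.5.x-era datasource option keys (hoodie.datasource.view.type, hoodie.datasource.read.begin.instanttime); the table path and begin instant are placeholders, and the exact keys should be verified against DataSourceReadOptions in the release you actually run:

```scala
// Incremental pull sketch for spark-shell, where `spark` is the
// session the shell provides. Option keys are assumed from the
// 0.5.x documentation; path and instant below are placeholders.
val beginInstant = "20200301000000" // only commits after this instant are returned

val incremental = spark.read
  .format("org.apache.hudi")
  .option("hoodie.datasource.view.type", "incremental")
  .option("hoodie.datasource.read.begin.instanttime", beginInstant)
  .load("/path/to/hudi_tbl")

incremental.show() // updates and new rows committed since beginInstant
```

This is the datasource counterpart of the "增量拉取" column in the matrix: engines marked N there (Presto, Impala) can only see full snapshots of the table.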
[incubator-hudi] branch asf-site updated: Travis CI build asf-site
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 6712fb2 Travis CI build asf-site 6712fb2 is described below commit 6712fb2ee460ccd98ff6869f6de22a3ac6961819 Author: CI AuthorDate: Thu Mar 26 02:07:43 2020 + Travis CI build asf-site --- test-content/cn/releases.html | 2 +- test-content/releases.html| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/test-content/cn/releases.html b/test-content/cn/releases.html index ed77b36..a48f73c 100644 --- a/test-content/cn/releases.html +++ b/test-content/cn/releases.html @@ -245,7 +245,7 @@ Raw Release Notes -The raw release notes are available https://issues.apache.org/jira/projects/HUDI/versions/12346606#release-report-tab-body;>here +The raw release notes are available https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822version=12346606;>here https://github.com/apache/incubator-hudi/releases/tag/release-0.5.1-incubating;>Release 0.5.1-incubating (docs) diff --git a/test-content/releases.html b/test-content/releases.html index 9f97bff..303ba95 100644 --- a/test-content/releases.html +++ b/test-content/releases.html @@ -252,7 +252,7 @@ Raw Release Notes -The raw release notes are available https://issues.apache.org/jira/projects/HUDI/versions/12346606#release-report-tab-body;>here +The raw release notes are available https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822version=12346606;>here https://github.com/apache/incubator-hudi/releases/tag/release-0.5.1-incubating;>Release 0.5.1-incubating (docs)
[incubator-hudi] branch asf-site updated: Travis CI build asf-site
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new 3353180 Travis CI build asf-site 3353180 is described below commit 33531804037eb843d4939b5f8c3779aaf0fc5a25 Author: CI AuthorDate: Wed Mar 25 23:44:20 2020 + Travis CI build asf-site --- test-content/cn/docs/powered_by.html | 2 +- test-content/docs/powered_by.html| 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/test-content/cn/docs/powered_by.html b/test-content/cn/docs/powered_by.html index 5866edd..f852d64 100644 --- a/test-content/cn/docs/powered_by.html +++ b/test-content/cn/docs/powered_by.html @@ -351,7 +351,7 @@ Hudi还支持几个增量的Hive ETL管道,并且目前已集成到Uber的数 Yields.io -Yields.io是第一个使用AI在企业范围内进行自动模型验证和实时监控的金融科技平台。他们的数据湖由Hudi管理,他们还积极使用Hudi为增量式、跨语言/平台机器学习构建基础架构。 +https://www.yields.io/Blog/Apache-Hudi-at-Yields;>Yields.io是第一个使用AI在企业范围内进行自动模型验证和实时监控的金融科技平台。他们的数据湖由Hudi管理,他们还积极使用Hudi为增量式、跨语言/平台机器学习构建基础架构。 Yotpo diff --git a/test-content/docs/powered_by.html b/test-content/docs/powered_by.html index 86e6f32..e5b52e3 100644 --- a/test-content/docs/powered_by.html +++ b/test-content/docs/powered_by.html @@ -355,7 +355,7 @@ offering, providing means for AWS users to perform record-level updates/deletes Yields.io -Yields.io is the first FinTech platform that uses AI for automated model validation and real-time monitoring on an enterprise-wide scale. Their data lake is managed by Hudi. They are also actively building their infrastructure for incremental, cross language/platform machine learning using Hudi. +Yields.io is the first FinTech platform that uses AI for automated model validation and real-time monitoring on an enterprise-wide scale. Their https://www.yields.io/Blog/Apache-Hudi-at-Yields;>data lake is managed by Hudi. 
They are also actively building their infrastructure for incremental, cross language/platform machine learning using Hudi. Yotpo
[incubator-hudi] branch asf-site updated: Travis CI build asf-site
This is an automated email from the ASF dual-hosted git repository. vinoth pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/incubator-hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new cfb5b00 Travis CI build asf-site cfb5b00 is described below commit cfb5b0005f2907bd95d2c1f04915c1a74c054e15 Author: CI AuthorDate: Wed Mar 25 08:43:29 2020 + Travis CI build asf-site --- test-content/cn/docs/powered_by.html | 13 + test-content/docs/powered_by.html| 6 ++ 2 files changed, 19 insertions(+) diff --git a/test-content/cn/docs/powered_by.html b/test-content/cn/docs/powered_by.html index b695608..5866edd 100644 --- a/test-content/cn/docs/powered_by.html +++ b/test-content/cn/docs/powered_by.html @@ -391,6 +391,19 @@ June 2019, SF Big Analytics Meetup, San Mateo, CA https://docs.google.com/presentation/d/1FHhsvh70ZP6xXlHdVsAI0g__B_6Mpto5KQFlZ0b8-mM;>“Apache Hudi (Incubating) - The Past, Present and Future Of Efficient Data Lake Architectures” - By Vinoth Chandar Balaji Varadarajan September 2019, ApacheCon NA 19, Las Vegas, NV, USA + +https://www.portal.reinvent.awsevents.com/connect/sessionDetail.ww?SESSION_ID=98662csrftkn=YS67-AG7B-QIAV-ZZBK-E6TT-MD4Q-1HEP-747P;>“Insert, upsert, and delete data in Amazon S3 using Amazon EMR” - By Paul Codding Vinoth Chandar +December 2019, AWS re:Invent 2019, Las Vegas, NV, USA + + +https://www.slideshare.net/SyedKather/building-robust-cdc-pipeline-with-apache-hudi-and-debezium;>“Building Robust CDC Pipeline With Apache Hudi And Debezium” - By Pratyaksh, Purushotham, Syed and Shaik December 2019, Hadoop Summit Bangalore, India + + +https://drive.google.com/open?id=1dmH2kWJF69PNdifPp37QBgjivOHaSLDn;>“Using Apache Hudi to build the next-generation data lake and its application in medical big data” - By JingHuang Leesf March 2020, Apache Hudi Apache Kylin Online Meetup, China + + +https://drive.google.com/open?id=1Pk_WdFxfEZxMMfAOn0R8-m3ALkcN6G9e;>“Building a near real-time, 
high-performance data warehouse based on Apache Hudi and Apache Kylin” - By ShaoFeng Shi March 2020, Apache Hudi Apache Kylin Online Meetup, China + 文章 diff --git a/test-content/docs/powered_by.html b/test-content/docs/powered_by.html index b1dda5a..86e6f32 100644 --- a/test-content/docs/powered_by.html +++ b/test-content/docs/powered_by.html @@ -406,6 +406,12 @@ December 2019, AWS re:Invent 2019, Las Vegas, NV, USA https://www.slideshare.net/SyedKather/building-robust-cdc-pipeline-with-apache-hudi-and-debezium;>“Building Robust CDC Pipeline With Apache Hudi And Debezium” - By Pratyaksh, Purushotham, Syed and Shaik December 2019, Hadoop Summit Bangalore, India + +https://drive.google.com/open?id=1dmH2kWJF69PNdifPp37QBgjivOHaSLDn;>“Using Apache Hudi to build the next-generation data lake and its application in medical big data” - By JingHuang Leesf March 2020, Apache Hudi Apache Kylin Online Meetup, China + + +https://drive.google.com/open?id=1Pk_WdFxfEZxMMfAOn0R8-m3ALkcN6G9e;>“Building a near real-time, high-performance data warehouse based on Apache Hudi and Apache Kylin” - By ShaoFeng Shi March 2020, Apache Hudi Apache Kylin Online Meetup, China + Articles