Re: [PR] DOCS- update blogs for new content [hudi]
bhasudha commented on code in PR #10383:
URL: https://github.com/apache/hudi/pull/10383#discussion_r1435780600

##
website/blog/2023-12-01-Getting-started-with-Apache-Hudi.mdx:
##
@@ -0,0 +1,20 @@
+---
+title: "Getting started with Apache Hudi"
+excerpt: "Getting started with Apache Hudi"
+author: DataCouch
+category: blog
+image: /assets/images/blog/2023-12-01-Getting-started-with-Apache-Hudi.png
+tags:
+- apache hudi
+- spark

Review Comment:
apache spark

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
Re: [PR] DOCS- update blogs for new content [hudi]
bhasudha commented on code in PR #10383:
URL: https://github.com/apache/hudi/pull/10383#discussion_r1435780699

##
website/blog/2023-12-06-Apache-Hudi-From-Zero-To-One-blog-7.mdx:
##
@@ -0,0 +1,26 @@
+---
+title: "Apache Hudi: From Zero To One (7/10)"
+excerpt: "Concurrently run writers and table services"
+author: Shiyan Xu
+category: blog
+image: /assets/images/blog/2023-12-06-Apache-Hudi-From-Zero-To-One-blog-7.png
+tags:
+- blog
+- apache hudi
+- concurrency
+- dataumagic

Review Comment:
dataumagic -> datumagic
Re: [PR] DOCS- update blogs for new content [hudi]
bhasudha commented on code in PR #10383:
URL: https://github.com/apache/hudi/pull/10383#discussion_r1435780255

##
website/blog/2023-11-28-Apache-Hudi-Part-1-History-Getting-Started.mdx:
##
@@ -0,0 +1,21 @@
+---
+title: "Apache Hudi (Part 1): History, Getting Started"
+excerpt: "Apache Hudi (Part 1): History, Getting Started"
+author: Dipankar Mazumdar
+category: blog
+image: /assets/images/blog/2023-11-28-Apache-Hudi-Part-1-History-Getting-Started.png

Review Comment:
Looks like the image file is missing from the commit.
Re: [PR] DOCS- update blogs for new content [hudi]
bhasudha commented on code in PR #10383:
URL: https://github.com/apache/hudi/pull/10383#discussion_r1435778261

##
website/blog/2023-11-22-Introducing-Apache-Hudi-support-with-AWS-Glue-crawlers.mdx:
##
@@ -0,0 +1,16 @@
+---
+title: "Introducing Apache Hudi support with AWS Glue crawlers"
+excerpt: "Introducing Apache Hudi support with AWS Glue crawlers"
+author: Noritaka Sekiyama, Kyle Duong, Sandeep Adwankar
+category: blog
+image: /assets/images/blog/2023-11-22-Introducing-Apache-Hudi-support-with-AWS-Glue-crawlers.png
+tags:
+- apache hudi
+- aws

Review Comment:
Since there is an `aws glue` tag, we can remove this one.
Re: [PR] DOCS- update blogs for new content [hudi]
bhasudha commented on code in PR #10383:
URL: https://github.com/apache/hudi/pull/10383#discussion_r1435778120

##
website/blog/2023-11-19-Hudi-Streamer-DeltaStreamer-Hands-On-Guide-Local-Ingestion-from-Parquet-Source.mdx:
##
@@ -0,0 +1,20 @@
+---
+title: "Hudi Streamer (Delta Streamer) Hands-On Guide: Local Ingestion from Parquet Source"
+excerpt: "Hudi Streamer (Delta Streamer) Hands-On Guide: Local Ingestion from Parquet Source"
+author: Soumil Shah
+category: blog
+image: /assets/images/blog/2023-11-19-Hudi-Streamer-DeltaStreamer-Hands-On-Guide-Local-Ingestion-from-Parquet-Source.png
+tags:
+- apache hudi
+- hudi streamer
+- how-to
+- parquet

Review Comment:
apache parquet
Re: [PR] DOCS- update blogs for new content [hudi]
bhasudha commented on code in PR #10383:
URL: https://github.com/apache/hudi/pull/10383#discussion_r1435777975

##
website/blog/2023-11-14-What-is-an-Open-Table-Format-and-Why-to-use-one.mdx:
##
@@ -0,0 +1,19 @@
+---
+title: "What is an Open Table Format? & Why to use one?"

Review Comment:
Let's remove this one.
Re: [PR] DOCS- update blogs for new content [hudi]
bhasudha commented on code in PR #10383:
URL: https://github.com/apache/hudi/pull/10383#discussion_r1435777964

##
website/blog/2023-11-14-What-is-an-Open-Table-Format-and-Why-to-use-one.mdx:
##
@@ -0,0 +1,19 @@
+---
+title: "What is an Open Table Format? & Why to use one?"

Review Comment:
Skip this. It doesn't talk much about Hudi.
Re: [PR] DOCS- update blogs for new content [hudi]
bhasudha commented on code in PR #10383:
URL: https://github.com/apache/hudi/pull/10383#discussion_r1435777589

##
website/blog/2023-11-13-Apache-Hudi-From-Zero-To-One-blog-6.mdx:
##
@@ -0,0 +1,27 @@
+---
+title: "Apache Hudi: From Zero To One (6/10)"
+excerpt: "Demystify clustering and space-filling curves"
+author: Shiyan Xu
+category: blog
+image: /assets/images/blog/2023-11-16-Apache-Hudi-From-Zero-To-One-blog-6.png

Review Comment:
2023-11-13 for the image name?
Re: [PR] DOCS- update blogs for new content [hudi]
bhasudha commented on code in PR #10383:
URL: https://github.com/apache/hudi/pull/10383#discussion_r1435776395

##
website/blog/2023-09-22-Exploring-the-Architecture-of-Apache-Iceberg-Delta-Lake-and-Apache-Hudi.mdx:
##
@@ -0,0 +1,21 @@
+---
+title: "Exploring the Architecture of Apache Iceberg, Delta Lake, and Apache Hudi"
+excerpt: "Exploring the Architecture of Apache Iceberg, Delta Lake, and Apache Hudi"
+author: Alex Merced
+category: blog
+image: /assets/images/blog/2023-09-22-Exploring-the-Architecture-of-Apache-Iceberg-Delta-Lake-and-Apache-Hudi.png
+tags:
+- apache hudi
+- apache iceberg
+- blog
+- apache hudi

Review Comment:
repeated?
Re: [PR] [HUDI-7140] [DNM] Trial Patch to test CI run [hudi]
hudi-bot commented on PR #10176:
URL: https://github.com/apache/hudi/pull/10176#issuecomment-1868450660

## CI report:

* d1a43dc3694b6a51aa830fe2b78340503c6909b5 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21688)

Bot commands
@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
Re: [PR] DOCS- update blogs for new content [hudi]
bhasudha commented on code in PR #10383:
URL: https://github.com/apache/hudi/pull/10383#discussion_r1435775000

##
website/blog/2023-09-06-Apache-Hudi-From-Zero-To-One-blog-2.mdx:
##
@@ -0,0 +1,31 @@
+---
+title: "Apache Hudi: From Zero To One (2/10)"
+excerpt: "Dive into read operation flow and query types"
+author: Shiyan Xu
+category: blog
+image: /assets/images/blog/2023-09-06-Apache-Hudi-From-Zero-To-One-blog-2.png
+tags:
+- blog
+- apache hudi
+- query types

Review Comment:
`query types` -> `queries`
Re: [PR] DOCS- update blogs for new content [hudi]
bhasudha commented on code in PR #10383:
URL: https://github.com/apache/hudi/pull/10383#discussion_r1435774973

##
website/blog/2023-09-06-Apache-Hudi-From-Zero-To-One-blog-2.mdx:
##
@@ -0,0 +1,31 @@
+---
+title: "Apache Hudi: From Zero To One (2/10)"
+excerpt: "Dive into read operation flow and query types"
+author: Shiyan Xu
+category: blog
+image: /assets/images/blog/2023-09-06-Apache-Hudi-From-Zero-To-One-blog-2.png
+tags:
+- blog
+- apache hudi
+- query types
+- read operations

Review Comment:
Change to `reads`. Let's not introduce different forms of the same word as separate tags; let's stick to tags we have already used. Please always check the full list of existing tags to find the closest one, and only add a new tag when absolutely needed.
Re: [PR] DOCS- update blogs for new content [hudi]
bhasudha commented on code in PR #10383:
URL: https://github.com/apache/hudi/pull/10383#discussion_r1435774727

##
website/blog/2023-09-06-Apache-Hudi-From-Zero-To-One-blog-2.mdx:
##
@@ -0,0 +1,31 @@
+---
+title: "Apache Hudi: From Zero To One (2/10)"
+excerpt: "Dive into read operation flow and query types"
+author: Shiyan Xu
+category: blog
+image: /assets/images/blog/2023-09-06-Apache-Hudi-From-Zero-To-One-blog-2.png
+tags:
+- blog
+- apache hudi
+- query types
+- read operations
+- datumagic
+- spark

Review Comment:
qualify fully -> `apache spark`
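The tagging rules in these review comments (reuse an existing tag instead of a new variant of the same word, fully qualify product names, no duplicates, no bare `aws`) are mechanical enough to check before opening a PR. The following is a minimal, hypothetical Java sketch: the class name and the alias table are assumptions that only contain the substitutions requested in this thread, not the site's official tag list.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical front-matter tag checker; not part of the Hudi website tooling. */
final class TagLint {

  // Assumed alias table: variant or unqualified tag -> preferred existing tag.
  static final Map<String, String> PREFERRED = Map.of(
      "spark", "apache spark",
      "hive", "apache hive",
      "parquet", "apache parquet",
      "dataumagic", "datumagic",
      "query types", "queries",
      "read operations", "reads");

  // Tags to avoid that have no single automatic replacement (e.g. `aws` -> `aws emr`? `aws glue`?).
  static final Set<String> AMBIGUOUS = Set.of("aws");

  /** Lowercases, swaps aliases for preferred tags, and drops duplicates while keeping order. */
  static List<String> normalize(List<String> tags) {
    Set<String> seen = new LinkedHashSet<>();
    for (String tag : tags) {
      String t = tag.trim().toLowerCase(Locale.ROOT);
      seen.add(PREFERRED.getOrDefault(t, t));
    }
    return new ArrayList<>(seen);
  }

  /** Returns the tags a human still has to replace with a fully qualified one. */
  static List<String> needsReview(List<String> tags) {
    List<String> out = new ArrayList<>();
    for (String tag : tags) {
      if (AMBIGUOUS.contains(tag.trim().toLowerCase(Locale.ROOT))) {
        out.add(tag);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> tags = List.of("blog", "apache hudi", "query types",
        "read operations", "datumagic", "spark", "apache hudi");
    System.out.println(normalize(tags));
    System.out.println(needsReview(List.of("apache hudi", "aws")));
  }
}
```

Keeping the ambiguous tags as a separate report, rather than guessing a replacement, follows the reviewer's point that the right qualified tag (for example `aws emr` versus `aws glue`) depends on the post.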
Re: [PR] DOCS- update blogs for new content [hudi]
bhasudha commented on code in PR #10383:
URL: https://github.com/apache/hudi/pull/10383#discussion_r1435770831

##
website/blog/2022-12-09-Apache-Hudi-2022-A-year-in-Review.md:
##

Review Comment:
I was mistaken. It seems this blog already exists under the 2022-12-29 date: https://hudi.apache.org/blog/2022/12/29/Apache-Hudi-2022-A-Year-In-Review
Re: [PR] [HUDI-7140] [DNM] Trial Patch to test CI run [hudi]
hudi-bot commented on PR #10176:
URL: https://github.com/apache/hudi/pull/10176#issuecomment-1868430646

## CI report:

* 73914cebbda35a22a2ede05065732c6bc9e03448 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21635)
* d1a43dc3694b6a51aa830fe2b78340503c6909b5 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21688)

Bot commands
@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
svn commit: r66288 - in /dev/hudi/hudi-0.14.1-rc1: hudi-0.14.1-rc1.src.tgz hudi-0.14.1-rc1.src.tgz.asc hudi-0.14.1-rc1.src.tgz.sha512
Author: sivabalan
Date: Sun Dec 24 04:25:05 2023
New Revision: 66288

Log:
Adding rc1 source release

Modified:
    dev/hudi/hudi-0.14.1-rc1/hudi-0.14.1-rc1.src.tgz
    dev/hudi/hudi-0.14.1-rc1/hudi-0.14.1-rc1.src.tgz.asc
    dev/hudi/hudi-0.14.1-rc1/hudi-0.14.1-rc1.src.tgz.sha512

Modified: dev/hudi/hudi-0.14.1-rc1/hudi-0.14.1-rc1.src.tgz
==============================================================================
Binary files - no diff available.

Modified: dev/hudi/hudi-0.14.1-rc1/hudi-0.14.1-rc1.src.tgz.asc
==============================================================================
--- dev/hudi/hudi-0.14.1-rc1/hudi-0.14.1-rc1.src.tgz.asc (original)
+++ dev/hudi/hudi-0.14.1-rc1/hudi-0.14.1-rc1.src.tgz.asc Sun Dec 24 04:25:05 2023
@@ -1,16 +1,16 @@
 -----BEGIN PGP SIGNATURE-----
 
-iQIzBAABCAAdFiEErNUqBmM9s7LH0OpWQsotPtWJUSIFAmWD5yoACgkQQsotPtWJ
-USL7Hg//VhxqeYYSIHoJbPb0VLgVy+MPd9QlKzhT507Mcey6+93Y3gFefIlnoocv
-r69krqu/uY4CrEPTSZRNzlBp0ChOm4NUa8Ws6sYVNQ/3ihBrmbgIzxmJV9w8e2Yd
-EDeD/a30G3ebeoqPSoOwzBQP6YpyYiJFL9l3sFjptB7IdmzjxZVyVQixDPJoBDT/
-F80Hg0DexK2ZUtTtZcRuwVMnfbDtQDMAhR5FnkIhOtn+7tsifLtlt2KiRCYrLc6L
-vL+zZiPU/iq5nPgltErmJKHaaX/3QzCsx++QqQuoMnIqsE7LBmmrw6jANTv4Cwna
-eDPMoBHZoC+eAJR7WdQOsYGNe/lGvYiKy68gkRqqhbHE/NDWhZ1Kvx4GwGSpAsAv
-GWekBlh+2rx3wFTurmLpzQ9w8mxfy3vJqRxWPtOaEuxhsF3D/BpQoJc8oxoeVl9d
-r4M30Xrn3s6wWW8P8DbdEbJr6K6xwQh5WSU2+s3IEDRvbqHrGXw4Rkngj1QQuGze
-hbh+EQLIV8J+COEvfvEgaM8iX6obLEsHISqNZCCJOkojujzZHdVBPYYeGjvC5cLi
-cxIF+JG47Q1TxVrt//lKVMw4R5BFseJcG8R3BWBeFvo6bdt1ZKscCQuRKJb6eNjf
-mUgQYPcTKozCratbifopMZ+8O98mKlWeO7wQCZjCS1sYfw8ABsU=
-=tLVq
+iQIzBAABCAAdFiEErNUqBmM9s7LH0OpWQsotPtWJUSIFAmWHogYACgkQQsotPtWJ
+USLssg/+NNPHJhgP7vf49irD8rL1z5cb22iaegnRLdxg3PYdHHHILSuruF6E8+iL
+4T83MNuKFNrWAtWO6SyHAyTebjri+9dxmtqzqyksjLf2qF5s8opTMStLMoVLksMu
+tQalrmPmkIxLsHpmD62xgxeKvP4jMM/lKZtmD6mlK1pLFCFF/DGZ2hfk5pnit6KO
+2zU7l2dFHjNF+4/WZStzX80fFUAGDuCkZAfwQSxaMKRGTcb+kiM3FgMfCvh4O/hx
+siS9EX1x78cLhUymihohkmswrfz6hJc6ykD8Jm5DAvnl2oLbNzyi3NR5JAZRe3bT
++MxF7TsmFCHnRVIBWgYQZ1FjMMavosWaSrN9I1eq6NEnY5xaIdif+w4n81XEJYxb
+Vecrm0ZlTSrCS2ydVoNbZVy0EraOxlMLkPubz6XOezQVmREV05xJIX9RVf8WTtAt
+tkBsskKTMYNjJpr3rjfn1YsgpiqvFn0d5UhQ/vPE8cJ5TGGzDscLxrpSViLNSG4d
+UW3cWfl0QCnqbhXhc4PjdF9+bDzVkT1y3bHrJ1oYbVisIj3Q8YGX1mQSy0t/N+Ky
+ESySh31dofmT3CVARzSWbTfyK53oTsZYDb+BWBWUziectgue36tlEw6Gr00BcPwr
+k1p6gYL/CFvDPZJK1JMzy7KVF+CVYABRLtiaKL/WcGd2H6jyEsY=
+=4T0B
 -----END PGP SIGNATURE-----

Modified: dev/hudi/hudi-0.14.1-rc1/hudi-0.14.1-rc1.src.tgz.sha512
==============================================================================
--- dev/hudi/hudi-0.14.1-rc1/hudi-0.14.1-rc1.src.tgz.sha512 (original)
+++ dev/hudi/hudi-0.14.1-rc1/hudi-0.14.1-rc1.src.tgz.sha512 Sun Dec 24 04:25:05 2023
@@ -1 +1 @@
-ca9facc49f462008a84bd6ceb6ae8170a10f49d8b0af6fa4aa8058676fabf77d8931005c6cb56be86ca941567b1ce1d551fd1d06e72d526ce7e8ab26a3d59b3b hudi-0.14.1-rc1.src.tgz
+4940fe3c108f9899a3fa1da543990fe88254b158c104d09d9eec86bf69375a4a29909c2cb6d377dcb070242021f87237cd232c6dfd27c6247135cdb912626e42 hudi-0.14.1-rc1.src.tgz
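This commit republishes the rc1 source tarball together with a detached signature (`.asc`) and a `.sha512` file whose single line is `<hex digest> hudi-0.14.1-rc1.src.tgz`. The following is a minimal Java sketch of the checksum half of a release check. It assumes Java 17+ (for `java.util.HexFormat`) and that the `.sha512` file sits next to the tarball. The class name is hypothetical, and the PGP signature still has to be checked separately (for example with `gpg --verify`).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Locale;

/** Hypothetical helper: compares a file's SHA-512 with the digest in its .sha512 file. */
final class Sha512Check {

  /** Streams the file through SHA-512 and returns the lowercase hex digest. */
  static String sha512Hex(Path file) throws IOException, NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("SHA-512");
    try (InputStream in = Files.newInputStream(file)) {
      byte[] buf = new byte[8192];
      int n;
      while ((n = in.read(buf)) != -1) {
        md.update(buf, 0, n);
      }
    }
    return HexFormat.of().formatHex(md.digest());
  }

  /** The expected digest is the first whitespace-separated token of the .sha512 file. */
  static String expectedDigest(Path sha512File) throws IOException {
    String content = Files.readString(sha512File).trim();
    return content.split("\\s+")[0].toLowerCase(Locale.ROOT);
  }

  public static void main(String[] args) throws Exception {
    Path tarball = Path.of(args[0]);              // e.g. hudi-0.14.1-rc1.src.tgz
    Path checksum = Path.of(args[0] + ".sha512"); // published alongside the tarball
    boolean ok = sha512Hex(tarball).equals(expectedDigest(checksum));
    System.out.println(ok ? "checksum OK" : "checksum MISMATCH");
    System.exit(ok ? 0 : 1);
  }
}
```

Hashing in 8 KiB chunks keeps memory flat regardless of tarball size. Lowercasing the published digest makes the comparison tolerant of upper-case hex.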
Re: [PR] [HUDI-7140] [DNM] Trial Patch to test CI run [hudi]
hudi-bot commented on PR #10176:
URL: https://github.com/apache/hudi/pull/10176#issuecomment-1868429824

## CI report:

* 73914cebbda35a22a2ede05065732c6bc9e03448 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=21635)
* d1a43dc3694b6a51aa830fe2b78340503c6909b5 UNKNOWN

Bot commands
@hudi-bot supports the following commands:

- `@hudi-bot run azure` re-run the last Azure build
(hudi) annotated tag release-0.14.1-rc1 updated (52309055f0c -> e3990f4860d)
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to annotated tag release-0.14.1-rc1
in repository https://gitbox.apache.org/repos/asf/hudi.git


*** WARNING: tag release-0.14.1-rc1 was modified! ***

    from 52309055f0c (commit)
      to e3990f4860d (tag)
 tagging 52309055f0ccac2f860c9f784e0610095f7d5d1d (commit)
 replaces release-0.14.0
      by sivabalan
      on Sat Dec 23 19:51:29 2023 -0800

- Log -----------------------------------------------------------------
0.14.1
-----BEGIN PGP SIGNATURE-----

iQIzBAABCAAdFiEErNUqBmM9s7LH0OpWQsotPtWJUSIFAmWHqsEACgkQQsotPtWJ
USLdow//Xv5WthkSvB0lXewJCzx9BLhYQF3bSjzx42OXyAs4exThiQF+F8CD7Ny+
HCLc0lP2CcE1w4P2Fd2uz+aZD3fAMasRWyyM+dH3zpbGKtpfHq3WG7fBLxCxw0eH
naBZOaT19IW0jleASlcKu4UVGVQGQGFmk8U3gSQxkraoUneMMuVKLl98KNpJ3YS1
PshXuZv/CFLybEQcbf0h0/PcexLs4SiGqxiKG79bqaGH6gROmwv+po+5EgZfU0ej
q4NKHL7UVYbndgFciz+JUZPlMT/N+wOK4ygR7WPTZ2pEdrvInfhU3MJDojbRdDum
JcXrAaPau5PsDElolTGhH1+rCQ0JBa0G/Sdf2SRAYNNUym4BbJEDOTnMm4ZfVYdZ
MQB3+zGwMXztzbiLKi05jLOR4sYxLD4FVcV2oowrqUP9JMZekGBoOuuAc/spzjsj
mj3/NA54hEA14g6Duy9ln9v6GOFsP1MQV7eMYV1H9mcbMqg8tGPol5lLheUqdOy7
avp572XnAEJC+YgyOXXN8Wk2cDelAouB7CiVP4qzAHA6qX7bxuTen1ppitE/O1Vb
jV+JcbqQH3cBmr2akTWEkTmf1oxPcJEAa6yi0XEPDDeMJZ7MpPqdo2+LTN/jiuA7
00wjVdaA/Uj3h0rejkCyNnp/jz0JBj/0y8YrIe2h7vyweJnO8jg=
=cXcF
-----END PGP SIGNATURE-----
-----------------------------------------------------------------------


No new revisions were added by this update.

Summary of changes:
(hudi) annotated tag release-0.14.1-rc1 deleted (was 4e883eb3881)
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a change to annotated tag release-0.14.1-rc1
in repository https://gitbox.apache.org/repos/asf/hudi.git


*** WARNING: tag release-0.14.1-rc1 was deleted! ***

     tag was 4e883eb3881

The revisions that were on this annotated tag are still contained in
other references; therefore, this change does not discard any commits
from the repository.
(hudi) branch release-0.14.1 updated: Revert "Add cachedSchema per batch, fix idempotency with getSourceSchema calls"
This is an automated email from the ASF dual-hosted git repository.

sivabalan pushed a commit to branch release-0.14.1
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/release-0.14.1 by this push:
     new 52309055f0c Revert "Add cachedSchema per batch, fix idempotency with getSourceSchema calls"
52309055f0c is described below

commit 52309055f0ccac2f860c9f784e0610095f7d5d1d
Author: sivabalan
AuthorDate: Sat Dec 23 18:59:55 2023 -0800

    Revert "Add cachedSchema per batch, fix idempotency with getSourceSchema calls"

    This reverts commit dff42eb468cafe43e9208c0ae738c91184ded673.
---
 .../utilities/schema/FilebasedSchemaProvider.java  | 29 +
 .../hudi/utilities/schema/SchemaProvider.java      |  5 ---
 .../utilities/schema/SchemaRegistryProvider.java   | 36 +-
 .../apache/hudi/utilities/streamer/StreamSync.java |  5 +--
 .../schema/TestSchemaRegistryProvider.java         | 20
 5 files changed, 16 insertions(+), 79 deletions(-)

diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/FilebasedSchemaProvider.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/FilebasedSchemaProvider.java
index 9dbf66325d7..3ca97b01f95 100644
--- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/FilebasedSchemaProvider.java
+++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/FilebasedSchemaProvider.java
@@ -45,11 +45,6 @@ public class FilebasedSchemaProvider extends SchemaProvider {
 
   private final FileSystem fs;
 
-  private final String sourceFile;
-  private final String targetFile;
-  private final boolean shouldSanitize;
-  private final String invalidCharMask;
-
   protected Schema sourceSchema;
 
   protected Schema targetSchema;
@@ -57,21 +52,18 @@ public class FilebasedSchemaProvider extends SchemaProvider {
   public FilebasedSchemaProvider(TypedProperties props, JavaSparkContext jssc) {
     super(props, jssc);
     checkRequiredConfigProperties(props, Collections.singletonList(FilebasedSchemaProviderConfig.SOURCE_SCHEMA_FILE));
-    this.sourceFile = getStringWithAltKeys(props, FilebasedSchemaProviderConfig.SOURCE_SCHEMA_FILE);
-    this.targetFile = getStringWithAltKeys(props, FilebasedSchemaProviderConfig.TARGET_SCHEMA_FILE, sourceFile);
-    this.shouldSanitize = SanitizationUtils.shouldSanitize(props);
-    this.invalidCharMask = SanitizationUtils.getInvalidCharMask(props);
+    String sourceFile = getStringWithAltKeys(props, FilebasedSchemaProviderConfig.SOURCE_SCHEMA_FILE);
+    boolean shouldSanitize = SanitizationUtils.shouldSanitize(props);
+    String invalidCharMask = SanitizationUtils.getInvalidCharMask(props);
     this.fs = FSUtils.getFs(sourceFile, jssc.hadoopConfiguration(), true);
-    this.sourceSchema = parseSchema(this.sourceFile);
+    this.sourceSchema = readAvroSchemaFromFile(sourceFile, this.fs, shouldSanitize, invalidCharMask);
     if (containsConfigProperty(props, FilebasedSchemaProviderConfig.TARGET_SCHEMA_FILE)) {
-      this.targetSchema = parseSchema(this.targetFile);
+      this.targetSchema = readAvroSchemaFromFile(
+          getStringWithAltKeys(props, FilebasedSchemaProviderConfig.TARGET_SCHEMA_FILE),
+          this.fs, shouldSanitize, invalidCharMask);
     }
   }
 
-  private Schema parseSchema(String schemaFile) {
-    return readAvroSchemaFromFile(schemaFile, this.fs, shouldSanitize, invalidCharMask);
-  }
-
   @Override
   public Schema getSourceSchema() {
     return sourceSchema;
@@ -95,11 +87,4 @@ public class FilebasedSchemaProvider extends SchemaProvider {
     }
     return SanitizationUtils.parseAvroSchema(schemaStr, sanitizeSchema, invalidCharMask);
   }
-
-  // Per write batch, refresh the schemas from the file
-  @Override
-  public void refresh() {
-    this.sourceSchema = parseSchema(this.sourceFile);
-    this.targetSchema = parseSchema(this.targetFile);
-  }
 }
diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/SchemaProvider.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/SchemaProvider.java
index 5c8ca8f6c1b..2410798d355 100644
--- a/hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/SchemaProvider.java
+++ b/hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/SchemaProvider.java
@@ -56,9 +56,4 @@ public abstract class SchemaProvider implements Serializable {
     // by default, use source schema as target for hoodie table as well
     return getSourceSchema();
   }
-
-  //every schema provider has the ability to refresh itself, which will mean something different per provider.
-  public void refresh() {
-
-  }
 }
diff --git a/hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/SchemaRegistryProvider.java b/hudi-utilities/src/main/java/org/apache/hudi/utilities/schema/SchemaRegistryProvider.java
index f31e867e96e..c3541e6aab0 100644
---
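The revert removes the `refresh()` hook, whose deleted comments say it re-read the schemas "per write batch". `FilebasedSchemaProvider` goes back to reading its schema files once, in the constructor. The following is a dependency-free, hypothetical Java sketch of the two behaviours. All class names are illustrative, schemas are plain strings rather than Avro `Schema` objects, a `Supplier` stands in for reading the schema file, and the per-batch caller (in Hudi, presumably `StreamSync`) is assumed rather than shown.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Demo: the schema "file" changes between two write batches. */
final class SchemaRefreshDemo {
  public static void main(String[] args) {
    AtomicInteger version = new AtomicInteger(1);
    Supplier<String> schemaFileReader = () -> "schema-v" + version.get();

    SketchSchemaProvider readOnce = new ReadOnceProvider(schemaFileReader);
    SketchSchemaProvider refreshing = new RefreshingProvider(schemaFileReader);

    version.set(2);        // the schema file is edited between batches
    readOnce.refresh();    // default no-op: still sees v1
    refreshing.refresh();  // assumed start-of-batch call: now sees v2
    System.out.println(readOnce.getSourceSchema() + " " + refreshing.getSourceSchema());
  }
}

/** Hypothetical stand-in for SchemaProvider. */
abstract class SketchSchemaProvider {
  abstract String getSourceSchema();

  // Mirrors the removed SchemaProvider.refresh(): does nothing unless overridden.
  void refresh() { }
}

/** Behaviour restored by the revert: read the schema once, at construction time. */
final class ReadOnceProvider extends SketchSchemaProvider {
  private final String sourceSchema;

  ReadOnceProvider(Supplier<String> schemaFileReader) {
    this.sourceSchema = schemaFileReader.get();
  }

  @Override
  String getSourceSchema() { return sourceSchema; }
}

/** Behaviour that was reverted: keep the reader so each batch can re-read the file. */
final class RefreshingProvider extends SketchSchemaProvider {
  private final Supplier<String> schemaFileReader;
  private String sourceSchema;

  RefreshingProvider(Supplier<String> schemaFileReader) {
    this.schemaFileReader = schemaFileReader;
    this.sourceSchema = schemaFileReader.get();
  }

  // Returns the cached value, so repeated calls within one batch agree.
  @Override
  String getSourceSchema() { return sourceSchema; }

  @Override
  void refresh() { this.sourceSchema = schemaFileReader.get(); }
}
```

The trade-off the sketch illustrates: caching per batch keeps `getSourceSchema()` idempotent within a batch while still picking up file edits. The read-once version is simpler but only sees edits after a restart.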
Re: [PR] [HUDI-3016][RFC-43] Proposal to implement Table Service Manager [hudi]
zyclove commented on PR #4309:
URL: https://github.com/apache/hudi/pull/4309#issuecomment-1868420871

@xushiyan @yuzhaojing @danny0405 Hi, can version 1.0 support this feature? It is very much needed, so please help push it forward.
Re: [I] [SUPPORT] Bulk insert does not handle hoodie metrics [hudi]
parisni commented on issue #10395:
URL: https://github.com/apache/hudi/issues/10395#issuecomment-1868406313

So I took a look at the code, and I don't see any reason why the bulk_insert operation would not report metrics. Like the other operations, it goes through this path:
https://github.com/apache/hudi/blob/c2da8aaa5fadb1b3984f6fde2a034c806b501fc5/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/DefaultSource.scala#L146

which, among other things, reports the metrics:
https://github.com/apache/hudi/blob/c2da8aaa5fadb1b3984f6fde2a034c806b501fc5/hudi-client/hudi-client-common/src/main/java/org/apache/hudi/metrics/Metrics.java#L118

I can also confirm that when I set `hoodie.metrics.reporter.type=CONSOLE`, bulk_insert logs the metrics. The problem only appears when Datadog is the reporter: it works fine with every operation except bulk_insert.
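To reproduce the comparison described in this comment, the reporter type is the only thing that should differ between two runs. The following is a minimal, hypothetical Java sketch of the write options. `hoodie.metrics.reporter.type` comes from the comment, and `hoodie.metrics.on` and `hoodie.datasource.write.operation` are standard Hudi write configs. The table name, record key, and Datadog-specific settings (credentials and so on) are deliberately left out and must be added for a real run.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for the CONSOLE vs DATADOG comparison; not part of Hudi. */
final class MetricsRepro {

  /** Write options for a bulk_insert with metrics enabled and the given reporter type. */
  static Map<String, String> bulkInsertMetricsOptions(String reporterType) {
    Map<String, String> opts = new LinkedHashMap<>();
    opts.put("hoodie.datasource.write.operation", "bulk_insert");
    opts.put("hoodie.metrics.on", "true");
    opts.put("hoodie.metrics.reporter.type", reporterType);
    return opts;
  }

  public static void main(String[] args) {
    // Per the report: CONSOLE logs bulk_insert metrics, DATADOG does not receive them.
    for (String type : new String[] {"CONSOLE", "DATADOG"}) {
      System.out.println(bulkInsertMetricsOptions(type));
    }
  }
}
```

The resulting map can be passed to Spark's `DataFrameWriter.options(java.util.Map)` before `save(...)`, keeping the reporter as the only variable between the two runs.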
(hudi) branch asf-site updated: DOCS-added-video-content (#10385)
This is an automated email from the ASF dual-hosted git repository.

bhavanisudha pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/hudi.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 1c88cf3e397 DOCS-added-video-content (#10385)
1c88cf3e397 is described below

commit 1c88cf3e39746f03cf6db35c115a5feb716cd53e
Author: nadine farah
AuthorDate: Sat Dec 23 08:45:22 2023 -0800

    DOCS-added-video-content (#10385)

    * initial commit for content video add

    added soumil's videos

    updated tags and added videos

    updated author tags

    * More fixes
    - change mdx to md format
    - fix tag inconsistencies
    - delete irrelevant guides
    - change thumbnails

    ---------

    Co-authored-by: Bhavani Sudha Saktheeswaran <2179254+bhasu...@users.noreply.github.com>
---
 README.md | 4 +++-
 ...-Serverless-Architecture-in-Hudi-Data-Lakes.png | Bin 0 -> 247259 bytes
 ...Guide-Local-Ingestion-from-Parquet-Source-1.png | Bin 0 -> 122866 bytes
 ...-On-Guide-Local-Ingestion-from-CSV-Source-2.png | Bin 0 -> 122377 bytes
 ...bles-using-Hudi-MultiTable-Delta-Streamer-3.png | Bin 0 -> 119607 bytes
 ...l-from-Postgres-to-Hudi-using-deltastreamer.png | Bin 0 -> 842209 bytes
 ...mer-in-continous-Mode-and-SQL-transformer-5.png | Bin 0 -> 127113 bytes
 ...ngest-data-from-Kafka-Topic-Hands-on-Labs-6.png | Bin 0 -> 122382 bytes
 .../video_blogs/2023-11-24-hudi-table-types.png | Bin 0 -> 323601 bytes
 ...zium-kafka-schema-registry-deltastreamer-7a.png | Bin 0 -> 230759 bytes
 ...zium-kafka-schema-registry-deltastreamer-7b.png | Bin 0 -> 55192 bytes
 ...tadata-table-Record-Level-Index-HBase-Index.png | Bin 0 -> 114468 bytes
 ...Streamer-in-Continuous-Mode-Hands-on-Labs-8.png | Bin 0 -> 126485 bytes
 ...ache-Hudi-DeltaStreamer-with-Hands-on-Lab-9.png | Bin 0 -> 380565 bytes
 ...-in-Incremental-Fashion-Bronze-to-Silver-10.png | Bin 0 -> 407136 bytes
 ...r-on-Local-Machine-for-Begineers-Easy-Setup.png | Bin 0 -> 124505 bytes
 ...ift-Server-and-Hudi-with-Beeline-in-Minutes.png | Bin 0 -> 125195 bytes
 ...ng-and-AvroKafkaSource-Consumption-11-Guide.png | Bin 0 -> 390151 bytes
 ...Data-using-Hue-and-Presto-CLI-Hands-on-Labs.png | Bin 0 -> 124038 bytes
 ...0-14-and-RLI-on-AWS-Glue-Step-by-Step-Guide.png | Bin 0 -> 123948 bytes
 ...d_Real_Time_Apache_Hudi_Transaction_Datalake.md | 2 +-
 ...eltastreamer_and_AWS_DMS_Hands_on_Lab_Part_1.md | 2 +-
 ...eltastreamer_and_AWS_DMS_Hands_on_Lab_Part_2.md | 2 +-
 ...eltastreamer_and_AWS_DMS_Hands_on_Lab_Part_3.md | 2 +-
 ...eltastreamer_and_AWS_DMS_Hands_on_Lab_Part_4.md | 2 +-
 ...eltastreamer_and_AWS_DMS_Hands_on_Lab_Part_5.md | 2 +-
 ...R_Serverless_Hands_on_Lab_step_by_step_guide.md | 2 +-
 ...t_Driven_Approach_using_Lambdas_Event_Bridge.md | 2 +-
 ...t_Apache_Hudi_Transformers_with_Hands_on_Lab.md | 2 +-
 ...c-Data-Platforms-Like-a-Pro-Final-Part-Demo.md} | 1 -
 ...m-Data-Processing-with-Python-Hands-on-Labs.md} | 0
 ...d-Apache-Flink-Hands-on-Guide-for-Beginners.md} | 0
 ...n-S3-with-Apache-Flink-CDC-Connector-Python.md} | 0
 ...ional-Datalakes-on-S3-using-PyFLink-Locally.md} | 0
 ...nerating-Primary-Keys-for-Modern-Data-Lakes.md} | 0
 ...h-DynamoDB-for-Faster-Commit-Time-Retrieval.md} | 0
 ...16-Hudi-0-14-0-Deep-Dive-Record-Level-Index.md} | 0
 ...-Course-for-beginner-Operations-Type-Part-5.md} | 0
 ...r-Data-Lake-using-Elastic-Search-and-Kibana.md} | 0
 ...our-Medallion-Architecture-with-Apache-Hudi.md} | 0
 ...g-Serverless-Architecture-in-Hudi-Data-Lakes.md | 15 +++
 ...-Guide-Local-Ingestion-from-Parquet-Source-1.md | 18 ++
 ...s-On-Guide-Local-Ingestion-from-CSV-Source-2.md | 20
 ...ables-using-Hudi-MultiTable-Delta-Streamer-3.md | 16
 ...ll-from-Postgres-to-Hudi-using-deltastreamer.md | 16
 ...amer-in-continous-Mode-and-SQL-transformer-5.md | 17 +
 ...ingest-data-from-Kafka-Topic-Hands-on-Labs-6.md | 19 +++
 website/videoBlog/2023-11-24-hudi-table-types.md | 16
 ...ezium-kafka-schema-registry-deltastreamer-7a.md | 21 +
 ...ezium-kafka-schema-registry-deltastreamer-7b.md | 20
 ...etadata-table-Record-Level-Index-HBase-Index.md | 17 +
 ...aStreamer-in-Continuous-Mode-Hands-on-Labs-8.md | 19 +++
 ...pache-Hudi-DeltaStreamer-with-Hands-on-Lab-9.md | 17 +
 ...e-in-Incremental-Fashion-Bronze-to-Silver-10.md | 20
 ...er-on-Local-Machine-for-Begineers-Easy-Setup.md | 19 +++
 ...rift-Server-and-Hudi-with-Beeline-in-Minutes.md | 18 ++
 ...ing-and-AvroKafkaSource-Consumption-11-Guide.md | 20
 ...-Data-using-Hue-and-Presto-CLI-Hands-on-Labs.md | 19 +++
Re: [PR] DOCS-added-video-content [hudi]
bhasudha merged PR #10385:
URL: https://github.com/apache/hudi/pull/10385
Re: [PR] DOCS-added-video-content [hudi]
bhasudha commented on code in PR #10385:
URL: https://github.com/apache/hudi/pull/10385#discussion_r1435645589

##
website/videoBlog/2023-12-19-How-to-Use-Apache-Hudi-0-14-and-RLI-on-AWS-Glue-Step-by-Step-Guide.mdx:
##
@@ -0,0 +1,18 @@
+---
+title: "How to Use Apache Hudi 0.14 and RLI (record level index) on AWS Glue Step by Step Guide"
+last_modified_at: 2023-12-20T16:54:38.964863-07:00
+authors:
+- name: Soumil Shah
+category: blog
+image: /assets/images/video_blogs/2023-12-19-How-to-Use-Apache-Hudi-0-14-and-RLI-on-AWS-Glue-Step-by-Step-Guide.png
+navigate: "https://www.youtube.com/watch?v=HJ6QQN408AE"
+tags:
+- guide
+- beginner
+- record level index

Review Comment:
Add the `indexing` tag.
Re: [PR] DOCS-added-video-content [hudi]
bhasudha commented on code in PR #10385:
URL: https://github.com/apache/hudi/pull/10385#discussion_r1435645468

##
website/videoBlog/2023-12-12-Apache-Hudi-DeltaStreamer-in-Action-Python-Publishing-and-AvroKafkaSource-Consumption-11-Guide.mdx:
##
@@ -0,0 +1,18 @@
+---
+title: "Apache Hudi Delta Streamer in Action: Python Publishing and AvroKafkaSource Consumption (#11 Guide)"
+last_modified_at: 2023-12-20T16:54:38.964863-07:00
+authors:
+- name: Soumil Shah
+category: blog
+image: /assets/images/video_blogs/2023-12-12-Apache-Hudi-DeltaStreamer-in-Action-Python-Publishing-and-AvroKafkaSource-Consumption-11-Guide.png
+navigate: "https://www.youtube.com/watch?v=FSpt4jSH_O0"
+tags:
+- guide
+- beginner
+- deltastreamer
+- AvroKafkaSource

Review Comment:
Avoid class names as tags; they are implementation details.

##
website/videoBlog/2023-12-16-Learn-How-to-Setup-Hudi-on-EMR-with-Hive-and-Query-Data-using-Hue-and-Presto-CLI-Hands-on-Labs.mdx:
##
@@ -0,0 +1,21 @@
+---
+title: "Learn How to Setup Hudi on EMR with Hive and Query Data using Hue and Presto CLI Hands on Labs"
+last_modified_at: 2023-12-20T16:54:38.964863-07:00
+authors:
+- name: Soumil Shah
+category: blog
+image: /assets/images/video_blogs/2023-12-16-Learn-How-to-Setup-Hudi-on-EMR-with-Hive-and-Query-Data-using-Hue-and-Presto-CLI-Hands-on-Labs.png
+navigate: "https://www.youtube.com/watch?v=oav6aEldk1o"
+tags:
+- guide
+- aws
+- beginner
+- hive

Review Comment:
apache hive
Re: [PR] DOCS-added-video-content [hudi]
bhasudha commented on code in PR #10385: URL: https://github.com/apache/hudi/pull/10385#discussion_r1435645350 ## website/videoBlog/2023-12-11-Simplifying-Big-Data-Setting-Up-SparkSQL-Hive-Thrift-Server-and-Hudi-with-Beeline-in-Minutes.mdx: ## @@ -0,0 +1,18 @@ +--- +title: "Simplifying Big Data: Setting Up Spark SQL, Hive Thrift Server, and Hudi with Beeline in Minutes" +last_modified_at: 2023-12-20T16:54:38.964863-07:00 +authors: +- name: Soumil Shah +category: blog +image: /assets/images/video_blogs/2023-12-11-Simplifying-Big-Data-Setting-Up-SparkSQL-Hive-Thrift-Server-and-Hudi-with-Beeline-in-Minutes.png +navigate: "https://www.youtube.com/watch?v=lCorHcx2mvc" +tags: +- guide +- aws Review Comment: Avoid the plain `aws` tag and use fully qualified tags instead, for example `aws emr`.
Re: [PR] DOCS-added-video-content [hudi]
bhasudha commented on code in PR #10385: URL: https://github.com/apache/hudi/pull/10385#discussion_r1435645073 ## website/videoBlog/2023-12-09-Learn-How-to-use-DBT-with-Spark-and-Thrift-Server-on-Local-Machine-for-Begineers-Easy-Setup.mdx: ## @@ -0,0 +1,18 @@ +--- +title: "Learn How to use DBT with Spark and Thrift Server on Local Machine for Begineers Easy Setup" +last_modified_at: 2023-12-20T16:54:38.964863-07:00 +authors: +- name: Soumil Shah +category: blog +image: /assets/images/video_blogs/2023-12-09-Learn-How-to-use-DBT-with-Spark-and-Thrift-Server-on-Local-Machine-for-Begineers-Easy-Setup.png +navigate: "https://www.youtube.com/watch?v=k1HSFPlunlM" +tags: +- guide +- beginner +- spark Review Comment: Tag with fully qualified names, e.g. `apache spark`.
Re: [PR] DOCS-added-video-content [hudi]
bhasudha commented on code in PR #10385: URL: https://github.com/apache/hudi/pull/10385#discussion_r1435644838 ## website/videoBlog/2023-11-21-RFC-14-Step-by-Step-Guide-for-Incremental-Data-Pull-from-Postgres-to-Hudi-using-deltastreamer.mdx: ## @@ -0,0 +1,16 @@ +--- +title: "RFC-14: Step-by-Step Guide for Incremental Data Pull from Postgres to Hudi using DeltaStreamer (#4)" +last_modified_at: 2023-12-20T16:54:38.964863-07:00 +authors: +- name: Soumil Shah +category: blog +image: /assets/images/video_blogs/2023-11-21-RFC-14-Step-by-Step-Guide-for-Incremental-Data-Pull-from-Postgres-to-Hudi-using-deltastreamer.png +navigate: "https://www.youtube.com/watch?v=kqQ0SVwfBig" +tags: +- guide +- beginner +- deltastreamer Review Comment: Let's remember to add the `hudi streamer` tag wherever we add `deltastreamer`. For reference, DeltaStreamer was renamed to Hudi Streamer in recent releases.
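The tagging conventions requested in the review comments above (fully qualified names such as `apache spark` and `apache hive`, no class names like `AvroKafkaSource`, no plain `aws`, and a `hudi streamer` tag alongside `deltastreamer`) could be checked before a PR is opened. The following is a minimal, hypothetical sketch, not tooling that exists in the Hudi repository: `review_tags` and its rule tables are illustrative names, and it assumes the frontmatter `tags` have already been parsed into a list of strings.

```python
import re

# Hypothetical rule tables distilled from the review comments; not an official Hudi list.
QUALIFIED_NAMES = {"spark": "apache spark", "hive": "apache hive"}
# DeltaStreamer was renamed to Hudi Streamer, so one tag implies the other.
COMPANION_TAGS = {"deltastreamer": "hudi streamer"}
# Tags considered too broad on their own (e.g. prefer "aws emr" over "aws").
TOO_BROAD = {"aws"}
# Heuristic: CamelCase with at least one inner capital, e.g. "AvroKafkaSource".
CLASS_NAME = re.compile(r"^[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]*)+$")


def review_tags(tags):
    """Return (suggested_tags, warnings) for a list of frontmatter tags."""
    suggested, warnings = [], []
    for tag in tags:
        if CLASS_NAME.match(tag):
            warnings.append(f"{tag!r} looks like a class name; tags should avoid implementation details")
            continue
        key = tag.strip().lower()
        if key in TOO_BROAD:
            warnings.append(f"{tag!r} is too broad; use a qualified tag such as 'aws emr'")
            continue
        normalized = QUALIFIED_NAMES.get(key, key)
        if normalized not in suggested:
            suggested.append(normalized)
    # Add companion tags only after every original tag has been normalized.
    for tag, companion in COMPANION_TAGS.items():
        if tag in suggested and companion not in suggested:
            suggested.append(companion)
    return suggested, warnings


if __name__ == "__main__":
    # Tags from the AvroKafkaSource video post reviewed above.
    print(review_tags(["guide", "beginner", "deltastreamer", "AvroKafkaSource"]))
```

The class-name check is only a CamelCase heuristic, so a legitimate tag written in CamelCase would also be flagged; the warnings are meant as prompts for a human reviewer, not as automatic rejections.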
Re: [PR] DOCS-added-video-content [hudi]
bhasudha commented on code in PR #10385: URL: https://github.com/apache/hudi/pull/10385#discussion_r1435644596 ## website/static/assets/images/video_blogs/2023-12-11-Simplifying-Big-Data-Setting-Up-SparkSQL-Hive-Thrift-Server-and-Hudi-with-Beeline-in-Minutes.png: ## Review Comment: Let's avoid these types of thumbnails. They don't convey very well what the video guide is about.
Re: [PR] DOCS-added-video-content [hudi]
bhasudha commented on code in PR #10385: URL: https://github.com/apache/hudi/pull/10385#discussion_r1435644467 ## website/static/assets/images/video_blogs/2023-11-19-Hudi-Streamer-Hands-On-Guide-Local-Ingestion-from-Parquet-Source-1.png: ## Review Comment: Let's avoid these types of thumbnails. They don't convey very well what the video guide is about. ## website/static/assets/images/video_blogs/2023-11-20-Hudi-Streamer-Hands-On-Guide-Local-Ingestion-from-CSV-Source-2.png: ## Review Comment: Let's avoid these types of thumbnails. They don't convey very well what the video guide is about.
Re: [I] [SUPPORT] Seeking Assistance with Hudi Integration Issue in Spark Thrift Server and DBT [hudi]
soumilshah1995 closed issue #10287: [SUPPORT] Seeking Assistance with Hudi Integration Issue in Spark Thrift Server and DBT URL: https://github.com/apache/hudi/issues/10287
Re: [I] [SUPPORT] Seeking Assistance with Hudi Integration Issue in Spark Thrift Server and DBT [hudi]
soumilshah1995 commented on issue #10287: URL: https://github.com/apache/hudi/issues/10287#issuecomment-1868324146 ![Screenshot 2023-12-23 at 11 13 29 AM](https://github.com/apache/hudi/assets/39345855/1029e731-be52-4ff8-81b8-1753c342de44) I will be creating YouTube videos for this, which will help everyone.
(hudi) branch asf-site updated: DOC-added talks/presentations (#10399)
This is an automated email from the ASF dual-hosted git repository. bhavanisudha pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/hudi.git The following commit(s) were added to refs/heads/asf-site by this push: new ca9d62964af DOC-added talks/presentations (#10399) ca9d62964af is described below commit ca9d62964af76d4b1f529e48c0feaebc27ebd0ee Author: nadine farah AuthorDate: Sat Dec 23 06:37:16 2023 -0800 DOC-added talks/presentations (#10399) * added talks/presentations added talks from open source data summit that were hudi focused * More fixes Fixed order of talks, added conference names and fixed some links - Co-authored-by: Bhavani Sudha Saktheeswaran <2179254+bhasu...@users.noreply.github.com> --- README.md | 15 + website/src/pages/talks.md | 84 ++ 2 files changed, 70 insertions(+), 29 deletions(-) diff --git a/README.md b/README.md index 9415e9fff12..a4ae4b3d96c 100644 --- a/README.md +++ b/README.md @@ -157,6 +157,21 @@ Example: When you change any file in `versioned_docs/version-0.7.0/`, it will on ## Configs Configs can be automatically updated by following these steps documented at ../hudi-utils/README.md +## Talks + +When adding a talk, please follow these guidelines. + +1. Ensure the entry is of the format + "[Title](Hyperlink to video/resources)" - By , , . , . +2. Please ensure the talks are in chronological order. +3. Try to add links to videos and slide decks when possible. If they are not available in same page, feel free to add + [Slides](Slides link) towards the end like for example: + +:::note + ["Hoodie: An Open Source Incremental Processing Framework From Uber"](http://www.dataengconf.com/hoodie-an-open-source-incremental-processing-framework-from-uber) - By Vinoth Chandar. 
+ Apr 2017, DataEngConf, San Francisco, CA [Slides](https://www.slideshare.net/vinothchandar/hoodie-dataengconf-2017) [Video](https://www.youtube.com/watch?v=7Wudjc-v7CA) +::: + ## Blogs When adding a new blog, please follow these guidelines. diff --git a/website/src/pages/talks.md b/website/src/pages/talks.md index bea37571700..ddcfacb5fad 100644 --- a/website/src/pages/talks.md +++ b/website/src/pages/talks.md @@ -49,59 +49,85 @@ last_modified_at: 2019-12-31T15:59:57-04:00 18. ["Next Generation Data lakes using Apache Hudi"](https://docs.google.com/presentation/d/1y-ryRwCdTbqQHGr_bn3lxM_B8L1L5nsZOIXlJsDl_wU/edit?usp=sharing) - By Balaji Varadarajan and Sivabalan Narayanan, Sep 2020, ["ApacheCon"](https://www.apachecon.com/) -19. ["Building Large-Scale, Transactional Data Lakes using Apache Hudi"](https://www.dbta.com/DataSummit/Fall2020/Agenda.aspx) - By Nishith Agarwal, Data Summit 2020 +19. ["Apache Hudi on Amazon EMR"](https://pages.awscloud.com/rs/112-TZM-766/images/EV_analytics-sprint-week-apache-hundi-amazon-emr_Sep-2020.pdf) - By the AWS team. September 2020 -20. ["Landing practice of Apache Hudi in T3go"](https://drive.google.com/file/d/1ULVPkjynaw-07wsutLcZm-4rVXf8E8N8/view?usp=sharing) - By VinoYang and XianghuWang, November 2020, Qcon. +20. ["Building Large-Scale, Transactional Data Lakes using Apache Hudi"](https://www.dbta.com/DataSummit/Fall2020/Agenda.aspx) - By Nishith Agarwal, Data Summit 2020 -21. ["Meetup talk by Nishith Agarwal"](https://www.meetup.com/UberEvents/events/274924537/) - Uber Data Platforms Meetup, Dec 2020 +21. ["Landing practice of Apache Hudi in T3go"](https://drive.google.com/file/d/1ULVPkjynaw-07wsutLcZm-4rVXf8E8N8/view?usp=sharing) - By VinoYang and XianghuWang, November 2020, Qcon. -22. 
["Apache Hudi learning series: Understanding Hudi internals"](https://www.slideshare.net/NishithAgarwal3/hudi-architecture-fundamentals-and-capabilities) - By Abhishek Modi, Balajee Nagasubramaniam, Prashant Wason, Satish Kotha, Nishith Agarwal, Feb 2021, Uber Meetup +22. ["Meetup talk by Nishith Agarwal"](https://www.meetup.com/UberEvents/events/274924537/) - Uber Data Platforms Meetup, Dec 2020 -23. ["Apache Hudi Meetup at Uber with talks from AWS, CityStorageSystems & Uber"](https://youtu.be/iXBInMLbjo0) - By Udit Mehrotra, Wenning Ding (AWS), Alexander Filipchik (CityStorageSystems), Prashant Wason, Satish Kotha (Uber), Feb 2021 +23. ["Apache Hudi learning series: Understanding Hudi internals"](https://www.slideshare.net/NishithAgarwal3/hudi-architecture-fundamentals-and-capabilities) - By Abhishek Modi, Balajee Nagasubramaniam, Prashant Wason, Satish Kotha, Nishith Agarwal, Feb 2021, Uber Meetup -24. ["Apache Hudi: The Streaming Data Lake Platform"](https://docs.google.com/presentation/d/1lVpbYV7qytAZPdwx4X9DD9ii0qFh7n9WGKJ0XQ4VpIs/edit?usp=sharing) - By Nishith Agarwal, Sivabalan Narayanan, +24. ["Apache Hudi Meetup at Uber with talks from AWS, CityStorageSystems & Uber"](https://youtu.be/iXBInMLbjo0) - By Udit Mehrotra, Wenning
Re: [PR] DOC-added talks/presentations [hudi]
bhasudha merged PR #10399: URL: https://github.com/apache/hudi/pull/10399
Re: [PR] DOC-added talks/presentations [hudi]
bhasudha commented on PR #10399: URL: https://github.com/apache/hudi/pull/10399#issuecomment-1868304376 @nfarah86 Thanks for the PR. I reviewed it and fixed a few things. For future reference, please ensure that: - the talks are in chronological order - we mention the conference name and stick to the format. I added these guidelines to README.md as well for reference.
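The first guideline, keeping talks in chronological order, can be spot-checked mechanically. Below is a rough sketch, assuming each entry in `website/src/pages/talks.md` carries at most one month-and-year date such as `Sep 2020` or `September 2020`; `entry_date` and `out_of_order` are hypothetical helpers, and entries that give only a year (e.g. "Data Summit 2020") are skipped rather than guessed.

```python
import re

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

# Matches "Sep 2020", "September 2020", "Feb 2021", ... (assumed date style of the entries).
DATE = re.compile(r"\b(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?,?\s+(\d{4})\b")


def entry_date(line):
    """Return (year, month) for the first month-year date in a talk entry, or None."""
    m = DATE.search(line)
    if m is None:
        return None
    month = MONTHS.index(m.group(1).lower()) + 1
    return int(m.group(2)), month


def out_of_order(entries):
    """Return indices of entries dated earlier than the latest date seen before them."""
    bad = []
    latest = None
    for i, line in enumerate(entries):
        d = entry_date(line)
        if d is None:
            continue  # undated entries are skipped, not guessed
        if latest is not None and d < latest:
            bad.append(i)
        else:
            latest = d
    return bad


if __name__ == "__main__":
    # Two entries from the talks.md diff, deliberately swapped for illustration.
    talks = [
        '"Meetup talk by Nishith Agarwal" - Uber Data Platforms Meetup, Dec 2020',
        '"Landing practice of Apache Hudi in T3go" - By VinoYang and XianghuWang, November 2020, Qcon.',
    ]
    print(out_of_order(talks))
```

Because it only reports positions where the date goes backwards relative to the latest date seen so far, the flagged index may be the entry after the one that is actually misplaced.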
Re: [PR] [MINOR] Fixed unit tests [hudi]
geserdugarov commented on PR #10362: URL: https://github.com/apache/hudi/pull/10362#issuecomment-1868303781 I don't understand what is happening with CI. I've changed 2 unit tests: - `TestJavaHoodieBackedMetadata`, from `hudi-client/hudi-java-client`, - `TestHoodieDeltaStreamer`, from `hudi-utilities`. Both are Java tests. ### Azure CI `hudi-client/hudi-java-client` is not included in the Azure CI. `hudi-utilities` is included in the Azure CI in the `UT FT other modules` job at the `UT other modules` stage. So `TestHoodieDeltaStreamer` is the only test that could break the Azure CI. But the last log from the `UT other modules` stage is > [INFO] Running org.apache.hudi.utilities.sources.TestSqlSource before > This job was abandoned. We have detected that logs from the agent may have not finished uploading. We have included our in-memory record of all log lines uploaded before we lost contact with the agent: My change in this test couldn't break it this way; it could only cause a test failure. Maybe my PR changes the test ordering, and the unit test run hangs in `@AfterAll/Each` of some test class or in `@BeforeAll/Each` of another one. But I couldn't reproduce the problem locally: this part of the CI job passes without any problem on my machine. ### GitHub Actions My change in `TestJavaHoodieBackedMetadata` from `hudi-client/hudi-java-client` should affect only the `test-hudi-hadoop-mr-and-hudi-java-client` job, not `test-spark`. And I see that `test-hudi-hadoop-mr-and-hudi-java-client` is OK, but there are hangs in `test-spark` and a failure in the `TestDataSourceForBootstrap` Scala test after > 2023-12-23T04:01:07.0996155Z 4017081 [Executor task launch worker for task 372] ERROR org.apache.spark.executor.Executor [] - Exception in task 0.0 in stage 133.0 (TID 372) 2023-12-23T04:01:07.0997116Z java.lang.OutOfMemoryError: GC overhead limit exceeded @danny0405, @yihua Could you please give me any suggestions on what else I can try?