[DISCUSS] Regarding nightly builds
Hello all,

Should we have nightly builds? That way we can point users to those builds for the latest features, instead of being blocked on the next release. It would also give us early feedback on new features and fixes, in case further improvements are needed. Does anyone know if and how other Apache projects handle nightly builds?

Thanks,
Sudha
IllegalStateException: Hudi File Id (...) has more than 1 pending compactions. Hudi 0.5.3 + S3
Hi Team,

We are trying to run incremental updates to our MoR Hudi table on S3, and it looks like the table inevitably gets corrupted after 20-30 commits. We do an initial data import and enable incremental upserts, then verify that the tables are readable by running:

    hive> select * from table_name_ro limit 1;

After letting incremental upserts run for several hours, the select query above starts throwing exceptions like:

    Failed with exception java.io.IOException:java.lang.IllegalStateException: Hudi File Id (HoodieFileGroupId{partitionPath='983', fileId='8e9fde92-7515-4f89-a667-ce5c1087e60c-0'}) has more than 1 pending compactions.

Checking the compactions mentioned in the exception message via hudi-cli does indeed verify that the file id is present in both compactions.

The upsert settings that we use are:

    hudiOptions = Map[String, String](
      HoodieWriteConfig.TABLE_NAME -> inputTableName,
      "hoodie.consistency.check.enabled" -> "true",
      "hoodie.compact.inline.max.delta.commits" -> "30",
      "hoodie.compact.inline" -> "true",
      "hoodie.clean.automatic" -> "true",
      "hoodie.cleaner.commits.retained" -> "1000",
      "hoodie.keep.min.commits" -> "1001",
      "hoodie.keep.max.commits" -> "1050",
      DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> "MERGE_ON_READ",
      DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> primaryKeys,
      DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY -> classOf[ComplexKeyGenerator].getName,
      DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "partition_val_str",
      DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> sortKeys,
      DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
      DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> inputTableName,
      DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "partition_val_str",
      DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[MultiPartKeysValueExtractor].getName,
      DataSourceWriteOptions.HIVE_URL_OPT_KEY -> s"jdbc:hive2://$hiveServer2URI:1"
    )

Any suggestions on what can cause this issue, or on how to debug it, would help a lot.
Thank you, Anton Zuyeu
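[Not part of the original thread: a minimal plain-Scala sketch, with no Spark or Hudi dependencies, that checks one documented relationship between the cleaner and archival settings quoted above — `hoodie.keep.min.commits` should be greater than `hoodie.cleaner.commits.retained`, and `hoodie.keep.max.commits` at least `hoodie.keep.min.commits`. The object and method names are hypothetical; the string keys and values mirror the message's settings, which do satisfy the invariant, so the cause likely lies elsewhere.]

```scala
// Hypothetical sanity check for the archival/cleaner settings in the message.
// Plain Scala; the hoodie.* string keys mirror the options quoted above.
object CompactionConfigCheck {
  val hudiOptions: Map[String, String] = Map(
    "hoodie.compact.inline" -> "true",
    "hoodie.compact.inline.max.delta.commits" -> "30",
    "hoodie.clean.automatic" -> "true",
    "hoodie.cleaner.commits.retained" -> "1000",
    "hoodie.keep.min.commits" -> "1001",
    "hoodie.keep.max.commits" -> "1050"
  )

  // Invariant (as documented for Hudi's archival and cleaning services):
  // the archiver must keep more commits than the cleaner retains, and the
  // archival window must be non-empty (max >= min).
  def isConsistent(opts: Map[String, String]): Boolean = {
    val retained = opts("hoodie.cleaner.commits.retained").toInt
    val keepMin  = opts("hoodie.keep.min.commits").toInt
    val keepMax  = opts("hoodie.keep.max.commits").toInt
    keepMin > retained && keepMax >= keepMin
  }

  def main(args: Array[String]): Unit =
    println(s"options consistent: ${isConsistent(hudiOptions)}")  // options consistent: true
}
```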
Re: [ANNOUNCE] Apache Hudi 0.5.3 released
Great job. Thanks Siva for driving this to completion.
-Sudha

On Thu, Jun 18, 2020 at 4:36 AM Vinoth Chandar wrote:
> Thanks for all the great work! Onto 0.6.0 now!
>
> On Thu, Jun 18, 2020 at 4:06 AM leesf wrote:
> > Great, thanks siva and sudha!
> >
> > On Thu, Jun 18, 2020 at 2:16 PM, vino yang wrote:
> > > Great job! Thanks for your hard work, Siva and Sudha!
> > > Best, Vino
> > >
> > > On Thu, Jun 18, 2020 at 11:09 AM, nishith agarwal wrote:
> > > > Great job Siva and Sudha, thanks for driving this!
> > > > -Nishith
> > > >
> > > > On Wed, Jun 17, 2020 at 7:16 PM wrote:
> > > > > Super news :) The very first release after graduation. Awesome job Siva and Sudha for spearheading the release of 0.5.3.
> > > > > Balaji.V
> > > > > Sent from Yahoo Mail for iPhone
> > > > >
> > > > > On Wednesday, June 17, 2020, 5:50 PM, Sivabalan <n.siv...@gmail.com> wrote:
> > > > >
> > > > > The Apache Hudi community is pleased to announce the release of Apache Hudi 0.5.3.
> > > > >
> > > > > Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes and Incrementals. Apache Hudi manages storage of large analytical datasets on DFS (cloud stores, HDFS, or any Hadoop FileSystem compatible storage) and provides the ability to update/delete records as well as capture changes.
> > > > >
> > > > > 0.5.3 is a bug-fix release and is the first release after graduating as a TLP. It includes more than 35 resolved issues, comprising general improvements and bug fixes. Hudi 0.5.3 enables the Embedded Timeline Server and Incremental Cleaning by default for both delta-streamer and Spark datasource writes. Apart from multiple bug fixes, this release also improves write performance, for example by avoiding unnecessary loading of data after writes and by improving parallelism when searching for existing files while writing new records.
> > > > >
> > > > > For details on how to use Hudi, please look at the quick start page located at https://hudi.apache.org/docs/quick-start-guide.html
> > > > >
> > > > > If you'd like to download the source release, you can find it here: https://github.com/apache/hudi/releases/tag/release-0.5.3
> > > > >
> > > > > You can read more about the release (including release notes) here: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12348256
> > > > >
> > > > > We would like to thank all contributors, the community, and the Apache Software Foundation for enabling this release, and we look forward to continued collaboration. We welcome your help and feedback. For more information on how to report problems, and to get involved, visit the project website at: http://hudi.apache.org/
> > > > >
> > > > > Kind regards,
> > > > > Sivabalan Narayanan (Hudi 0.5.3 Release Manager)
> > > > > On behalf of the Apache Hudi
Re: [ANNOUNCE] Apache Hudi 0.5.3 released
Thanks for all the great work! Onto 0.6.0 now!

On Thu, Jun 18, 2020 at 4:06 AM leesf wrote:
> Great, thanks siva and sudha!
Re: [ANNOUNCE] Apache Hudi 0.5.3 released
Great, thanks siva and sudha!

On Thu, Jun 18, 2020 at 2:16 PM, vino yang wrote:
> Great job! Thanks for your hard work, Siva and Sudha!
> Best, Vino