[DISCUSS] Regarding nightly builds

2020-06-18 Thread Bhavani Sudha
Hello all,

Should we have nightly builds, so that we can point users to those builds
for the latest features instead of having them blocked on the next release?
This would also give us early feedback on new features and fixes, in case
further improvements are needed. Does anyone know if and how other Apache
projects handle nightly builds?

Thanks,
Sudha


IllegalStateException: Hudi File Id (...) has more than 1 pending compactions. Hudi 0.5.3 + S3

2020-06-18 Thread Zuyeu, Anton
Hi Team,

We are trying to run incremental updates to our MoR Hudi table on S3, and it 
looks like the table inevitably gets corrupted after 20-30 commits. We do an 
initial data import and enable incremental upserts, then we verify that the 
tables are readable by running:
hive> select * from table_name_ro limit 1;

but after letting the incremental upserts run for several hours, the select 
query above starts throwing exceptions like:
Failed with exception java.io.IOException:java.lang.IllegalStateException: Hudi 
File Id (HoodieFileGroupId{partitionPath='983', 
fileId='8e9fde92-7515-4f89-a667-ce5c1087e60c-0'}) has more than 1 pending 
compactions.

Checking the compactions mentioned in the exception message via hudi-cli does 
indeed confirm that the file id is present in both pending compactions. The 
upsert settings that we use are:
val hudiOptions = Map[String, String](
  HoodieWriteConfig.TABLE_NAME -> inputTableName,
  // S3 consistency check, inline compaction and cleaning/archival settings
  "hoodie.consistency.check.enabled" -> "true",
  "hoodie.compact.inline.max.delta.commits" -> "30",
  "hoodie.compact.inline" -> "true",
  "hoodie.clean.automatic" -> "true",
  "hoodie.cleaner.commits.retained" -> "1000",
  "hoodie.keep.min.commits" -> "1001",
  "hoodie.keep.max.commits" -> "1050",
  // MERGE_ON_READ table with a complex record key and precombine field
  DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> "MERGE_ON_READ",
  DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> primaryKeys,
  DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY -> classOf[ComplexKeyGenerator].getName,
  DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "partition_val_str",
  DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> sortKeys,
  // Hive sync settings
  DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
  DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> inputTableName,
  DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "partition_val_str",
  DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[MultiPartKeysValueExtractor].getName,
  DataSourceWriteOptions.HIVE_URL_OPT_KEY -> s"jdbc:hive2://$hiveServer2URI:1"
)
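
A sketch of how these options are applied for the incremental upserts (the
DataFrame name inputDf and the table base path basePath below are
placeholders, not our actual values):

import org.apache.spark.sql.SaveMode

// hudiOptions is the map shown above; upsert is the default write operation
// for the Hudi datasource, so no explicit operation override is set here.
inputDf.write
  .format("org.apache.hudi")
  .options(hudiOptions)
  .mode(SaveMode.Append)
  .save(basePath)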

Any suggestions on what could cause this issue, or how we might debug it, 
would help a lot.

Thank you,
Anton Zuyeu


Re: [ANNOUNCE] Apache Hudi 0.5.3 released

2020-06-18 Thread Bhavani Sudha
Great job. Thanks Siva for driving this to completion.

-Sudha

On Thu, Jun 18, 2020 at 4:36 AM Vinoth Chandar wrote:

> Thanks for all the great work!
> Onto 0.6.0 now!
>
> On Thu, Jun 18, 2020 at 4:06 AM leesf wrote:
>
> > Great, thanks siva and sudha!
> >
> > On Thu, Jun 18, 2020 at 2:16 PM vino yang wrote:
> >
> > > Great job!
> > >
> > > Thanks for your hard work, Siva and Sudha!
> > >
> > > Best,
> > > Vino
> > >
> > > On Thu, Jun 18, 2020 at 11:09 AM nishith agarwal wrote:
> > >
> > > > Great job Siva and Sudha, thanks for driving this!
> > > >
> > > > -Nishith
> > > >
> > > > On Wed, Jun 17, 2020 at 7:16 PM  wrote:
> > > >
> > > > > Super news :)  The very first release after graduation. Awesome job
> > > > > Siva and Sudha for spearheading the release of 0.5.3.
> > > > > Balaji.V
> > > > >
> > > > > Sent from Yahoo Mail for iPhone
> > > > >
> > > > > On Wednesday, June 17, 2020, 5:50 PM, Sivabalan <n.siv...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > The Apache Hudi community is pleased to announce the release of
> > > > > Apache Hudi 0.5.3.
> > > > >
> > > > > Apache Hudi (pronounced Hoodie) stands for Hadoop Upserts Deletes
> > > > > and Incrementals. Apache Hudi manages storage of large analytical
> > > > > datasets on DFS (cloud stores, HDFS, or any Hadoop FileSystem
> > > > > compatible storage) and provides the ability to update/delete
> > > > > records as well as capture changes.
> > > > >
> > > > > 0.5.3 is a bug-fix release and the first release after graduating
> > > > > as a TLP. It includes more than 35 resolved issues, comprising
> > > > > general improvements and bug fixes. Hudi 0.5.3 enables the Embedded
> > > > > Timeline Server and Incremental Cleaning by default for both
> > > > > delta-streamer and Spark datasource writes. Apart from multiple bug
> > > > > fixes, this release also improves write performance, for example by
> > > > > avoiding unnecessary loading of data after writes and by improving
> > > > > parallelism while searching for existing files when writing new
> > > > > records.
> > > > >
> > > > > For details on how to use Hudi, please look at the quick start page
> > > > > located at https://hudi.apache.org/docs/quick-start-guide.html
> > > > >
> > > > > If you'd like to download the source release, you can find it here:
> > > > > https://github.com/apache/hudi/releases/tag/release-0.5.3
> > > > >
> > > > > You can read more about the release (including release notes) here:
> > > > > https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12348256
> > > > >
> > > > > We would like to thank all contributors, the community, and the
> > > > > Apache Software Foundation for enabling this release, and we look
> > > > > forward to continued collaboration. We welcome your help and
> > > > > feedback. For more information on how to report problems, and to
> > > > > get involved, visit the project website at: http://hudi.apache.org/
> > > > >
> > > > > Kind regards,
> > > > >
> > > > > Sivabalan Narayanan (Hudi 0.5.3 Release Manager)
> > > > >
> > > > > On behalf of the Apache Hudi


Re: [ANNOUNCE] Apache Hudi 0.5.3 released

2020-06-18 Thread Vinoth Chandar
Thanks for all the great work!
Onto 0.6.0 now!


Re: [ANNOUNCE] Apache Hudi 0.5.3 released

2020-06-18 Thread leesf
Great, thanks siva and sudha!
