[VOTE] Release 0.6.0, release candidate #1
Hi everyone,

Please review and vote on the release candidate #1 for the version 0.6.0, as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:

* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be deployed to dist.apache.org [2], which are signed with the key with fingerprint 7F66CD4CE990983A284672293224F200E1FC2172 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-0.6.0-rc1" [5].

The vote will be open for at least 72 hours. It is adopted by majority approval, with at least 3 PMC affirmative votes.

Thanks,
Release Manager

[1] https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12322822&version=12346663
[2] https://dist.apache.org/repos/dist/dev/hudi/hudi-0.6.0-rc1/
[3] https://dist.apache.org/repos/dist/release/hudi/KEYS
[4] https://repository.apache.org/content/repositories/orgapachehudi-1025/
[5] https://github.com/apache/hudi/tree/release-0.6.0-rc1
Re: Request Contributor Access to JIRA
Hi Jack,

Done, and welcome to the Hudi community!

Best,
Vino

On Thu, Aug 20, 2020 at 8:34 AM, Jack Ye wrote:

> Hi,
>
> I would like to request contributor access to the Hudi JIRA, my username is
> jackye.
>
> Thank you very much,
>
> Best,
> Jack Ye
Request Contributor Access to JIRA
Hi, I would like to request contributor access to the Hudi JIRA, my username is jackye. Thank you very much, Best, Jack Ye
Re: [DISCUSS] Release 0.6.0 timelines
Hi Allen,

Yes, it's a serialization (runtime) issue. I am working on fixing it.

On Wed, Aug 19, 2020 at 7:04 PM Sivabalan wrote:

> That would be of great help Allen. Much appreciated.
>
> On Wed, Aug 19, 2020 at 9:30 AM Allen Underwood wrote:
>
> > Thanks Sivabalan,
> >
> > That's definitely the issue I had to resolve when introducing Joda time.
> > I'll have a look back at my code to see how I got around it. If I remember
> > correctly, in the original code, the DateTime object was being saved as an
> > instance variable on the object and I had to do things differently in the
> > Joda time approach - but it turned out not to be a big deal because the
> > instance variable wasn't needed.
> >
> > I'll see if I can get a PR in to help fix this.
> >
> > The worst part is no unit tests catch this - it's a runtime issue I assume
> > due to how Spark serializes state.
> >
> > Thanks,
> >
> > Allen
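Note for readers of this thread: the problem described here (a formatter held as an instance variable on an object that Spark ships to executors) is a common Java serialization pitfall. The sketch below is not the Hudi fix and not the TimestampKeyGenerator code. It is a minimal stand-alone illustration that swaps in java.time.format.DateTimeFormatter for the Joda formatter so that it needs nothing beyond the JDK; the class DateKeyFormatter and its methods are hypothetical names. It assumes the formatter itself cannot be serialized, so only the pattern string is kept as serialized state and the formatter is rebuilt lazily in a transient field. The roundTrip helper imitates shipping the object with plain Java serialization, which is the step an ordinary unit test never exercises.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical class for illustration only; not part of Hudi.
final class DateKeyFormatter implements Serializable {
    private static final long serialVersionUID = 1L;

    // Serialized state: just the pattern string.
    private final String pattern;

    // Skipped by Java serialization; null again after deserialization.
    private transient DateTimeFormatter formatter;

    DateKeyFormatter(String pattern) {
        this.pattern = pattern;
    }

    // Rebuilds the formatter on first use, including after a round trip.
    private DateTimeFormatter formatter() {
        if (formatter == null) {
            formatter = DateTimeFormatter.ofPattern(pattern, Locale.ROOT);
        }
        return formatter;
    }

    String format(LocalDate date) {
        return formatter().format(date);
    }

    // Serializes and deserializes the object in memory, as a stand-in for
    // what happens when a closure is shipped to another JVM.
    static DateKeyFormatter roundTrip(DateKeyFormatter in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream is =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (DateKeyFormatter) is.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        DateKeyFormatter copy = roundTrip(new DateKeyFormatter("yyyy/MM/dd"));
        System.out.println(copy.format(LocalDate.of(2020, 8, 19)));
    }
}
```

A test that pushes the object through roundTrip before using it is one way to catch this class of bug without a cluster; removing the transient keyword from the field should make the round trip fail, assuming the formatter type is not Serializable.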
Re: HUDI Read | Leverage Partitions
I am so sorry to bother you. It worked; there was a typo. My apologies.

On Wed, Aug 19, 2020 at 7:01 PM tanu dua wrote:

> Hi Gary,
> I am getting an exception while loading HUDI tables using glob path. Does
> it work ? Have someone tried it ? If I use without {} it works
> Caused by: org.apache.spark.sql.AnalysisException: Path does not exist:
> file:/C:/Hudi/data/co/A/2019/{3,4};
> at
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:552)
> at
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:545)
Re: [DISCUSS] Release 0.6.0 timelines
That would be of great help Allen. Much appreciated.

On Wed, Aug 19, 2020 at 9:30 AM Allen Underwood wrote:

> Thanks Sivabalan,
>
> That's definitely the issue I had to resolve when introducing Joda time.
> I'll have a look back at my code to see how I got around it. If I remember
> correctly, in the original code, the DateTime object was being saved as an
> instance variable on the object and I had to do things differently in the
> Joda time approach - but it turned out not to be a big deal because the
> instance variable wasn't needed.
>
> I'll see if I can get a PR in to help fix this.
>
> The worst part is no unit tests catch this - it's a runtime issue I assume
> due to how Spark serializes state.
>
> Thanks,
>
> Allen
Re: HUDI Read | Leverage Partitions
Hi Gary,

I am getting an exception while loading HUDI tables using a glob path. Does it work? Has anyone tried it? If I load without {} it works.

Caused by: org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:/Hudi/data/co/A/2019/{3,4};
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:552)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:545)

On Tue, Jun 30, 2020 at 7:39 PM Tanuj wrote:

> Thanks a lot. I understand now.
>
> On 2020/06/27 02:45:52, Gary Li wrote:
> > Hi,
> >
> > If you use year=xxx/month=xxx folder structure, you can use
> > Dataset df= spark.read().format("hudi").schema(schema).load(+).
> > Without a glob postfix, Spark can automatically load the partition
> > information, just like regular parquet files.
> > https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#partition-discovery
> >
> > If you use something like 2020/06, you may need to build the glob string
> > and add it to the load() to skip the unnecessary partitions. e.g.
> > .load(++"2020/{05,06}")
> >
> > Or list one parquet file from different partitions and use a map function
> > to load 1 row from each path with a limit clause.
> >
> > On Fri, Jun 26, 2020 at 8:33 AM Tanuj wrote:
> >
> > > Hi,
> > > We have created a table with partition depth of 2 as year/month. We need
> > > to read data from HUDI in Spark Streaming layer where we get the batch data
> > > of say 10 rows which we need to use to read from HUDI. We are reading it
> > > like -
> > >
> > > // Read from HUDI
> > > Dataset df= spark.read().format("hudi").schema(schema).load(++"/*/*")
> > >
> > > //Apply filter
> > > df=df.filter(df.col("year").isin().filter(df.col("month").isin()).filter(df.col("id").isin());
> > >
> > > Is it the best way to read the data ? Will HUDI take care of just reading
> > > from the partitions or we need to take care of ? For eg. If I need to read
> > > just 1 row we can build the full path and then read which will read the
> > > parquet file from that partition quickly but here our requirement is to
> > > read data from multiple partitions.
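Note for readers of this thread: the glob-building step Gary mentions can be sketched in plain Java as follows. The class name PartitionGlob, the method forPartitions, and the example paths are hypothetical and are not part of Hudi or Spark; the sketch only assembles the string that would be handed to load(). It assumes the folder names are passed in exactly as they exist on disk (for example "05" rather than "5"), since a glob that matches no directory may well surface as the "Path does not exist" error quoted above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hudi or Spark.
final class PartitionGlob {
    private PartitionGlob() {}

    /**
     * Joins a table base path, a year folder, and one or more month folders
     * into a single glob such as "/data/tbl/2020/{05,06}".
     */
    static String forPartitions(String basePath, String year, List<String> months) {
        if (months.isEmpty()) {
            throw new IllegalArgumentException("at least one month folder is required");
        }
        // Drop a trailing slash so the result never contains "//".
        String base = basePath.endsWith("/")
                ? basePath.substring(0, basePath.length() - 1)
                : basePath;
        // Braces are only needed when there are alternatives to choose from.
        String monthPart = months.size() == 1
                ? months.get(0)
                : "{" + String.join(",", months) + "}";
        return base + "/" + year + "/" + monthPart;
    }

    public static void main(String[] args) {
        String glob = forPartitions("/data/hudi/my_table", "2020", Arrays.asList("05", "06"));
        System.out.println(glob);
        // Hypothetical use with the reader shown in this thread:
        // spark.read().format("hudi").load(glob);
    }
}
```

Taking the folder names as strings rather than numbers is deliberate: it avoids guessing whether the table was written with zero-padded months.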
Re: [DISCUSS] Release 0.6.0 timelines
Thanks Sivabalan, That's definitely the issue I had to resolve when introducing Joda time. I'll have a look back at my code to see how I got around it. If I remember correctly, in the original code, the DateTime object was being saved as an instance variable on the object and I had to do things differently in the Joda time approach - but it turned out not to be a big deal because the instance variable wasn't needed. I'll see if I can get a PR in to help fix this. The worst part is no unit tests catch this - it's a runtime issue I assume due to how Spark serializes state. Thanks, Allen On Wed, Aug 19, 2020 at 9:23 AM Sivabalan wrote: > I am not sure if all findings so far have been documented here. but this is > the ticket AFSIK: https://issues.apache.org/jira/browse/HUDI-1177 > > On Wed, Aug 19, 2020 at 9:15 AM Allen Underwood > wrote: > > > Just out of curiosity - what's the blocker - you have an issue? I had > > originally done the code to make that work. > > > > On Wed, Aug 19, 2020 at 12:51 AM Vinoth Chandar > wrote: > > > >> We still have 1 blocker issue from TimestampKeyGenerator issue with joda > >> DateTimeFormatter. Sudha (RM) and Pratyaksh are going to look into this. > >> > >> In the meantime, here's the progress, plans around testing so far. If > >> folks > >> in the community can help test the release branch in the next couple of > >> days, it would be of great help! > >> > >> > >>- Testing plan here: Stability tested by balaji running spark > >> streaming, > >>Correctness (Nishith testing on release branch, Udit to help with EMR > >> setup) > >>- Testing plan continued: Spark datasource MOR, .. Marker based > >>rollback, upgrade-downgrade (vinoth is on this) > >>- Testing plan : bootstrap (balaji tested for hive), udit has it > >> working > >>for Presto in EMR. > >>- Testing plan: bulk_insert v2 (vinoth tested for correctness, file > >>sizes. 
), performance (tested on a cluster, microbenchmarks) > >> > >> > >> There will also be some blogs/docs to explain new stuff. > >> > >>- Blogs to explain new stuff: sort modes (vinoth), simple index > >>(vinoth), bootstrap (balaji), multi delta-streamer (pratyaksh), Spark > >>Streaming/DeltaStreamer continuous mode blog (balaji) > >>- Docs: All missing docs will be added by Sudha. > >> > >> > >> > >> > >> On Tue, Aug 18, 2020 at 1:58 AM Bhavani Sudha > >> wrote: > >> > >> > Quick update on the RC. > >> > > >> > Found a build issue when building scala 2.12 and sent a PR for that - > >> > https://github.com/apache/hudi/pull/1976 . Working on resolving this > in > >> > the > >> > release branch and updating RC. Will update soon. > >> > > >> > Thanks, > >> > Sudha > >> > > >> > On Fri, Aug 14, 2020 at 5:56 PM Vinoth Chandar > >> wrote: > >> > > >> > > Thanks Sudha! This is means master is now open for regular PRs. > Thanks > >> > for > >> > > your patience, everyone. > >> > > > >> > > On Fri, Aug 14, 2020 at 3:51 PM Bhavani Sudha < > >> bhavanisud...@gmail.com> > >> > > wrote: > >> > > > >> > > > Hello all, > >> > > > > >> > > > We have cut the release branch - > >> > > > https://github.com/apache/hudi/tree/release-0.6.0 . Since it is > >> > already > >> > > > Friday, we will be sending the release candidate early next week > >> (after > >> > > > some testing). > >> > > > > >> > > > Happy Friday! > >> > > > > >> > > > Thanks, > >> > > > Sudha > >> > > > > >> > > > On Wed, Aug 12, 2020 at 3:56 PM vbal...@apache.org < > >> vbal...@apache.org > >> > > > >> > > > wrote: > >> > > > > >> > > > > > >> > > > > Hi Folks, > >> > > > > We are continuing to work on CI stabilization and will cut the > >> > release > >> > > > > once we stabilize the builds hopefully tonight/tomorrow. 
> >> > > > > Thanks,Balaji.V > >> > > > > On Tuesday, August 11, 2020, 09:15:05 PM PDT, Vinoth > Chandar < > >> > > > > vin...@apache.org> wrote: > >> > > > > > >> > > > > Hello all, > >> > > > > > >> > > > > Update on this. We have landed most of the blockers for the > 0.6.0 > >> > > release > >> > > > > and I am currently working on the last major blocker, HUDI-1013. > >> > > > > We are working through some unexpected CI flakiness. We hope to > >> > > stabilize > >> > > > > master, cut the RC, and then open up master for regular PR > merges. > >> > > > > ETA for this is tomorrow night PST (Aug 12, PST). > >> > > > > > >> > > > > We will keep this thread posted! > >> > > > > > >> > > > > Thanks > >> > > > > Vinoth > >> > > > > > >> > > > > On Tue, Aug 4, 2020 at 9:47 PM Vinoth Chandar < > vin...@apache.org> > >> > > wrote: > >> > > > > > >> > > > > > Small correction: > >> > > > > > > >> > > > > > >> Vinoth working on code review, tests for PR 1876, > >> > > > > > This is landed! > >> > > > > > > >> > > > > > > >> > > > > > On Tue, Aug 4, 2020 at 9:44 PM Bhavani Sudha < > >> > > bhavanisud...@gmail.com> > >> > > > > > wrote: > >> > > > > > > >> > > > > >> Hello all, > >> > > > > >> > >> > > > > >> We are targeting the end of
Re: [DISCUSS] Release 0.6.0 timelines
I am not sure if all findings so far have been documented here, but this is the ticket AFAIK: https://issues.apache.org/jira/browse/HUDI-1177 On Wed, Aug 19, 2020 at 9:15 AM Allen Underwood wrote: > Just out of curiosity - what's the blocker - you have an issue? I had > originally done the code to make that work. > > On Wed, Aug 19, 2020 at 12:51 AM Vinoth Chandar wrote: > >> We still have 1 blocker issue from TimestampKeyGenerator issue with joda >> DateTimeFormatter. Sudha (RM) and Pratyaksh are going to look into this. >> >> In the meantime, here's the progress, plans around testing so far. If >> folks >> in the community can help test the release branch in the next couple of >> days, it would be of great help! >> >> >>- Testing plan here: Stability tested by balaji running spark >> streaming, >>Correctness (Nishith testing on release branch, Udit to help with EMR >> setup) >>- Testing plan continued: Spark datasource MOR, .. Marker based >>rollback, upgrade-downgrade (vinoth is on this) >>- Testing plan : bootstrap (balaji tested for hive), udit has it >> working >>for Presto in EMR. >>- Testing plan: bulk_insert v2 (vinoth tested for correctness, file >>sizes. ), performance (tested on a cluster, microbenchmarks) >> >> >> There will also be some blogs/docs to explain new stuff. >> >>- Blogs to explain new stuff: sort modes (vinoth), simple index >>(vinoth), bootstrap (balaji), multi delta-streamer (pratyaksh), Spark >>Streaming/DeltaStreamer continuous mode blog (balaji) >>- Docs: All missing docs will be added by Sudha. >> >> >> >> >> On Tue, Aug 18, 2020 at 1:58 AM Bhavani Sudha >> wrote: >> >> > Quick update on the RC. >> > >> > Found a build issue when building scala 2.12 and sent a PR for that - >> > https://github.com/apache/hudi/pull/1976 . Working on resolving this in >> > the >> > release branch and updating RC. Will update soon. >> > >> > Thanks, >> > Sudha >> > >> > On Fri, Aug 14, 2020 at 5:56 PM Vinoth Chandar >> wrote: >> > >> > > Thanks Sudha! 
This is means master is now open for regular PRs. Thanks >> > for >> > > your patience, everyone. >> > > >> > > On Fri, Aug 14, 2020 at 3:51 PM Bhavani Sudha < >> bhavanisud...@gmail.com> >> > > wrote: >> > > >> > > > Hello all, >> > > > >> > > > We have cut the release branch - >> > > > https://github.com/apache/hudi/tree/release-0.6.0 . Since it is >> > already >> > > > Friday, we will be sending the release candidate early next week >> (after >> > > > some testing). >> > > > >> > > > Happy Friday! >> > > > >> > > > Thanks, >> > > > Sudha >> > > > >> > > > On Wed, Aug 12, 2020 at 3:56 PM vbal...@apache.org < >> vbal...@apache.org >> > > >> > > > wrote: >> > > > >> > > > > >> > > > > Hi Folks, >> > > > > We are continuing to work on CI stabilization and will cut the >> > release >> > > > > once we stabilize the builds hopefully tonight/tomorrow. >> > > > > Thanks,Balaji.V >> > > > > On Tuesday, August 11, 2020, 09:15:05 PM PDT, Vinoth Chandar < >> > > > > vin...@apache.org> wrote: >> > > > > >> > > > > Hello all, >> > > > > >> > > > > Update on this. We have landed most of the blockers for the 0.6.0 >> > > release >> > > > > and I am currently working on the last major blocker, HUDI-1013. >> > > > > We are working through some unexpected CI flakiness. We hope to >> > > stabilize >> > > > > master, cut the RC, and then open up master for regular PR merges. >> > > > > ETA for this is tomorrow night PST (Aug 12, PST). >> > > > > >> > > > > We will keep this thread posted! >> > > > > >> > > > > Thanks >> > > > > Vinoth >> > > > > >> > > > > On Tue, Aug 4, 2020 at 9:47 PM Vinoth Chandar >> > > wrote: >> > > > > >> > > > > > Small correction: >> > > > > > >> > > > > > >> Vinoth working on code review, tests for PR 1876, >> > > > > > This is landed! 
>> > > > > > >> > > > > > >> > > > > > On Tue, Aug 4, 2020 at 9:44 PM Bhavani Sudha < >> > > bhavanisud...@gmail.com> >> > > > > > wrote: >> > > > > > >> > > > > >> Hello all, >> > > > > >> >> > > > > >> We are targeting the end of this week to cut RC. Here is an >> update >> > > of >> > > > > >> where >> > > > > >> we are at release blockers. >> > > > > >> >> > > > > >> 0.6.0 Release blocker status (board >> > > > > >> < >> > > > > >> >> > > > > >> > > > >> > > >> > >> https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=397&projectKey=HUDI&view=detail&selectedIssue=HUDI-69 >> > > > > >> >) >> > > > > >> , >> > > > > >> >> > > > > >>- Spark Datasource/MOR >> > https://github.com/apache/hudi/pull/1848 >> > > > > needs >> > > > > >> to >> > > > > >>be tested by gary/balaji (About to land) >> > > > > >>- Hive Sync restructuring (Review done, about to land) >> > > > > >>- Bootstrap >> > > > > >> - Vinoth working on code review, tests for PR 1876, >> > > > > >> - then udit will rework PR 1702 (In Code review) >> > > > > >> - then we will review, land PR 1870, 1869 >> > > > > >>- Bulk insert V2 PR 1834, lowe
Re: [DISCUSS] Release 0.6.0 timelines
Just out of curiosity - what's the blocker - you have an issue? I had originally done the code to make that work. On Wed, Aug 19, 2020 at 12:51 AM Vinoth Chandar wrote: > We still have 1 blocker issue from TimestampKeyGenerator issue with joda > DateTimeFormatter. Sudha (RM) and Pratyaksh are going to look into this. > > In the meantime, here's the progress, plans around testing so far. If folks > in the community can help test the release branch in the next couple of > days, it would be of great help! > > >- Testing plan here: Stability tested by balaji running spark streaming, >Correctness (Nishith testing on release branch, Udit to help with EMR > setup) >- Testing plan continued: Spark datasource MOR, .. Marker based >rollback, upgrade-downgrade (vinoth is on this) >- Testing plan : bootstrap (balaji tested for hive), udit has it working >for Presto in EMR. >- Testing plan: bulk_insert v2 (vinoth tested for correctness, file >sizes. ), performance (tested on a cluster, microbenchmarks) > > > There will also be some blogs/docs to explain new stuff. > >- Blogs to explain new stuff: sort modes (vinoth), simple index >(vinoth), bootstrap (balaji), multi delta-streamer (pratyaksh), Spark >Streaming/DeltaStreamer continuous mode blog (balaji) >- Docs: All missing docs will be added by Sudha. > > > > > On Tue, Aug 18, 2020 at 1:58 AM Bhavani Sudha > wrote: > > > Quick update on the RC. > > > > Found a build issue when building scala 2.12 and sent a PR for that - > > https://github.com/apache/hudi/pull/1976 . Working on resolving this in > > the > > release branch and updating RC. Will update soon. > > > > Thanks, > > Sudha > > > > On Fri, Aug 14, 2020 at 5:56 PM Vinoth Chandar > wrote: > > > > > Thanks Sudha! This is means master is now open for regular PRs. Thanks > > for > > > your patience, everyone. 
> > > > > > On Fri, Aug 14, 2020 at 3:51 PM Bhavani Sudha > > > > wrote: > > > > > > > Hello all, > > > > > > > > We have cut the release branch - > > > > https://github.com/apache/hudi/tree/release-0.6.0 . Since it is > > already > > > > Friday, we will be sending the release candidate early next week > (after > > > > some testing). > > > > > > > > Happy Friday! > > > > > > > > Thanks, > > > > Sudha > > > > > > > > On Wed, Aug 12, 2020 at 3:56 PM vbal...@apache.org < > vbal...@apache.org > > > > > > > wrote: > > > > > > > > > > > > > > Hi Folks, > > > > > We are continuing to work on CI stabilization and will cut the > > release > > > > > once we stabilize the builds hopefully tonight/tomorrow. > > > > > Thanks,Balaji.V > > > > > On Tuesday, August 11, 2020, 09:15:05 PM PDT, Vinoth Chandar < > > > > > vin...@apache.org> wrote: > > > > > > > > > > Hello all, > > > > > > > > > > Update on this. We have landed most of the blockers for the 0.6.0 > > > release > > > > > and I am currently working on the last major blocker, HUDI-1013. > > > > > We are working through some unexpected CI flakiness. We hope to > > > stabilize > > > > > master, cut the RC, and then open up master for regular PR merges. > > > > > ETA for this is tomorrow night PST (Aug 12, PST). > > > > > > > > > > We will keep this thread posted! > > > > > > > > > > Thanks > > > > > Vinoth > > > > > > > > > > On Tue, Aug 4, 2020 at 9:47 PM Vinoth Chandar > > > wrote: > > > > > > > > > > > Small correction: > > > > > > > > > > > > >> Vinoth working on code review, tests for PR 1876, > > > > > > This is landed! > > > > > > > > > > > > > > > > > > On Tue, Aug 4, 2020 at 9:44 PM Bhavani Sudha < > > > bhavanisud...@gmail.com> > > > > > > wrote: > > > > > > > > > > > >> Hello all, > > > > > >> > > > > > >> We are targeting the end of this week to cut RC. Here is an > update > > > of > > > > > >> where > > > > > >> we are at release blockers. 
> > > > > >> > > > > > >> 0.6.0 Release blocker status (board > > > > > >> < > > > > > >> > > > > > > > > > > > > > > > https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=397&projectKey=HUDI&view=detail&selectedIssue=HUDI-69 > > > > > >> >) > > > > > >> , > > > > > >> > > > > > >>- Spark Datasource/MOR > > https://github.com/apache/hudi/pull/1848 > > > > > needs > > > > > >> to > > > > > >>be tested by gary/balaji (About to land) > > > > > >>- Hive Sync restructuring (Review done, about to land) > > > > > >>- Bootstrap > > > > > >> - Vinoth working on code review, tests for PR 1876, > > > > > >> - then udit will rework PR 1702 (In Code review) > > > > > >> - then we will review, land PR 1870, 1869 > > > > > >>- Bulk insert V2 PR 1834, lower risk, independent PR, well > > tested > > > > > >> already > > > > > >> - Dependent PR 1149 to be landed, > > > > > >> - and modes to be respected in V2 impl as well (At risk) > > > > > >>- Upgrade Downgrade Hooks, PR 1858 : (In Code review) > > > > > >>- HUDI-1054- Marker list perf improvement, Udit has a PR out > > > > > >>- HUDI-115 : Overwrite with...
Re: JIRA contributor permission
Hi Guoguang,

Done, and welcome to the Hudi community!

Best,
Vino

On Wed, Aug 19, 2020 at 6:47 PM, Guoguang.Wang wrote:

> Hi,
> I want to contribute to Apache Hudi.
> Would you please give me the contributor permission?
> My JIRA ID is guoguang.wang
JIRA contributor permission
Hi, I want to contribute to Apache Hudi. Would you please give me the contributor permission? My JIRA ID is guoguang.wang
Re: Re: [DISCUSS] Release 0.6.0 timelines
Hi, vc

I also want to do some tests against the release branch.

957029...@qq.com

From: Vinoth Chandar
Date: 2020-08-19 12:51
To: dev
Subject: Re: [DISCUSS] Release 0.6.0 timelines

We still have 1 blocker issue: the TimestampKeyGenerator issue with joda
DateTimeFormatter. Sudha (RM) and Pratyaksh are going to look into this.

In the meantime, here's the progress and the plans around testing so far.
If folks in the community can help test the release branch in the next
couple of days, it would be of great help!

- Testing plan here: Stability tested by balaji running spark streaming,
  Correctness (Nishith testing on release branch, Udit to help with EMR setup)
- Testing plan continued: Spark datasource MOR, .. Marker based rollback,
  upgrade-downgrade (vinoth is on this)
- Testing plan: bootstrap (balaji tested for hive), udit has it working for
  Presto in EMR.
- Testing plan: bulk_insert v2 (vinoth tested for correctness, file sizes),
  performance (tested on a cluster, microbenchmarks)

There will also be some blogs/docs to explain new stuff.

- Blogs to explain new stuff: sort modes (vinoth), simple index (vinoth),
  bootstrap (balaji), multi delta-streamer (pratyaksh), Spark
  Streaming/DeltaStreamer continuous mode blog (balaji)
- Docs: All missing docs will be added by Sudha.

On Tue, Aug 18, 2020 at 1:58 AM Bhavani Sudha wrote:

> Quick update on the RC.
>
> Found a build issue when building scala 2.12 and sent a PR for that -
> https://github.com/apache/hudi/pull/1976 . Working on resolving this in
> the release branch and updating RC. Will update soon.
>
> Thanks,
> Sudha
>
> On Fri, Aug 14, 2020 at 5:56 PM Vinoth Chandar wrote:
>
> > Thanks Sudha! This means master is now open for regular PRs. Thanks for
> > your patience, everyone.
> >
> > On Fri, Aug 14, 2020 at 3:51 PM Bhavani Sudha wrote:
> >
> > > Hello all,
> > >
> > > We have cut the release branch -
> > > https://github.com/apache/hudi/tree/release-0.6.0 . Since it is already
> > > Friday, we will be sending the release candidate early next week (after
> > > some testing).
> > >
> > > Happy Friday!
> > >
> > > Thanks,
> > > Sudha
> > >
> > > On Wed, Aug 12, 2020 at 3:56 PM vbal...@apache.org wrote:
> > >
> > > > Hi Folks,
> > > > We are continuing to work on CI stabilization and will cut the release
> > > > once we stabilize the builds hopefully tonight/tomorrow.
> > > > Thanks, Balaji.V
> > > >
> > > > On Tuesday, August 11, 2020, 09:15:05 PM PDT, Vinoth Chandar
> > > > <vin...@apache.org> wrote:
> > > >
> > > > Hello all,
> > > >
> > > > Update on this. We have landed most of the blockers for the 0.6.0
> > > > release and I am currently working on the last major blocker,
> > > > HUDI-1013.
> > > > We are working through some unexpected CI flakiness. We hope to
> > > > stabilize master, cut the RC, and then open up master for regular PR
> > > > merges.
> > > > ETA for this is tomorrow night PST (Aug 12, PST).
> > > >
> > > > We will keep this thread posted!
> > > >
> > > > Thanks
> > > > Vinoth
> > > >
> > > > On Tue, Aug 4, 2020 at 9:47 PM Vinoth Chandar wrote:
> > > >
> > > > > Small correction:
> > > > >
> > > > > >> Vinoth working on code review, tests for PR 1876,
> > > > > This is landed!
> > > > >
> > > > > On Tue, Aug 4, 2020 at 9:44 PM Bhavani Sudha
> > > > > <bhavanisud...@gmail.com> wrote:
> > > > >
> > > > >> Hello all,
> > > > >>
> > > > >> We are targeting the end of this week to cut RC. Here is an update
> > > > >> of where we are at release blockers.
> > > > >>
> > > > >> 0.6.0 Release blocker status (board
> > > > >> <https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=397&projectKey=HUDI&view=detail&selectedIssue=HUDI-69>),
> > > > >>
> > > > >> - Spark Datasource/MOR https://github.com/apache/hudi/pull/1848
> > > > >>   needs to be tested by gary/balaji (About to land)
> > > > >> - Hive Sync restructuring (Review done, about to land)
> > > > >> - Bootstrap
> > > > >>    - Vinoth working on code review, tests for PR 1876,
> > > > >>    - then udit will rework PR 1702 (In Code review)
> > > > >>    - then we will review, land PR 1870, 1869
> > > > >> - Bulk insert V2 PR 1834, lower risk, independent PR, well tested
> > > > >>   already
> > > > >>    - Dependent PR 1149 to be landed,
> > > > >>    - and modes to be respected in V2 impl as well (At risk)
> > > > >> - Upgrade Downgrade Hooks, PR 1858: (In Code review)
> > > > >> - HUDI-1054: Marker list perf improvement, Udit has a PR out
> > > > >> - HUDI-115: Overwrite with... ordering issue, Sudha has a PR
> > > > >>   nearing landing
> > > > >> - HUDI-1098: Marker file issue with non-existent files. (In Code
> > > > >>   review)
> > > > >> - Spark Streaming + Async Compaction, test complete, code review
> > > > >>   comments and l