[jira] [Updated] (HUDI-2619) Make table services work with Dataset

2021-11-03 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2619:
-
Fix Version/s: (was: 0.10.0)

> Make table services work with Dataset
> --
>
> Key: HUDI-2619
> URL: https://issues.apache.org/jira/browse/HUDI-2619
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: Raymond Xu
>Assignee: Raymond Xu
>Priority: Blocker
>
> Clustering, Compaction, and Cleaning should also work with Dataset



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-1885) Support Delete/Update Non-Pk Table

2021-11-03 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-1885:
-
Labels: sev:critical  (was: )

> Support Delete/Update Non-Pk Table
> --
>
> Key: HUDI-1885
> URL: https://issues.apache.org/jira/browse/HUDI-1885
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: Raymond Xu
>Priority: Blocker
>  Labels: sev:critical
> Fix For: 0.10.0
>
>
> Allow to delete/update a non-pk table.
> {code:sql}
> create table h0 (
>   id int,
>   name string,
>   price double
> ) using hudi;
> delete from h0 where id = 10;
> update h0 set price = 10 where id = 12;
> {code}





[jira] [Updated] (HUDI-2234) MERGE INTO works only ON primary key

2021-11-03 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-2234:
-
Labels: sev:critical  (was: )

> MERGE INTO works only ON primary key
> 
>
> Key: HUDI-2234
> URL: https://issues.apache.org/jira/browse/HUDI-2234
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Sagar Sumit
>Assignee: Raymond Xu
>Priority: Blocker
>  Labels: sev:critical
> Fix For: 0.10.0
>
>
> {code:sql}
> drop table if exists hudi_gh_ext_fixed;
> create table hudi_gh_ext_fixed (id int, name string, price double, ts long) 
> using hudi options(primaryKey = 'id', precombineField = 'ts') location 
> 'file:///tmp/hudi-h4-fixed';
> insert into hudi_gh_ext_fixed values(3, 'AMZN', 300, 120);
> insert into hudi_gh_ext_fixed values(2, 'UBER', 300, 120);
> insert into hudi_gh_ext_fixed values(4, 'GOOG', 300, 120);
> update hudi_gh_ext_fixed set price = 150.0 where name = 'UBER';
> drop table if exists hudi_fixed;
> create table hudi_fixed (id int, name string, price double, ts long) using 
> hudi options(primaryKey = 'id', precombineField = 'ts') partitioned by (ts) 
> location 'file:///tmp/hudi-h4-part-fixed';
> insert into hudi_fixed values(2, 'UBER', 200, 120);
> MERGE INTO hudi_fixed 
> USING (select id, name, price, ts from hudi_gh_ext_fixed) updates
> ON hudi_fixed.name = updates.name
> WHEN MATCHED THEN
>   UPDATE SET *
> WHEN NOT MATCHED
>   THEN INSERT *;
> -- java.lang.IllegalArgumentException: Merge Key[name] is not Equal to the 
> defined primary key[id] in table hudi_fixed
> --at 
> org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.buildMergeIntoConfig(MergeIntoHoodieTableCommand.scala:425)
> --at 
> org.apache.spark.sql.hudi.command.MergeIntoHoodieTableCommand.run(MergeIntoHoodieTableCommand.scala:146)
> --at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
> --at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
> --at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
> --at 
> org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:229)
> {code}





[jira] [Closed] (HUDI-1869) Upgrading Spark3 To 3.1

2021-11-03 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu closed HUDI-1869.


> Upgrading Spark3 To 3.1
> ---
>
> Key: HUDI-1869
> URL: https://issues.apache.org/jira/browse/HUDI-1869
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: Yann Byron
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Spark 3.1 changed the behavior of some internal classes and interfaces in
> both the spark-sql and spark-core modules.
> Currently, Hudi does not compile against Spark 3.1. We need to add SQL
> support for Spark 3.1.





[GitHub] [hudi] hudi-bot edited a comment on pull request #3519: [DO NOT MERGE] 0.9.0 release patch for flink

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3519:
URL: https://github.com/apache/hudi/pull/3519#issuecomment-903204631


   
   ## CI report:
   
   * d022aa7a5bd94492c7c3e96dc5b1288268520087 UNKNOWN
   * b7c5b8c9b25eecca9b2d2f5da7dadb4f32aaf3e6 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2994)
 
   * a654fcd138e63ccd728ad5fa0237563742b392fe UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3912: [HUDI-2665] Fix overflow of huge log file in HoodieLogFormatWriter

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3912:
URL: https://github.com/apache/hudi/pull/3912#issuecomment-958682811


   
   ## CI report:
   
   * abbd66373198288c79bd9cde7b9d30c769c1dce3 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3094)
 
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3519: [DO NOT MERGE] 0.9.0 release patch for flink

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3519:
URL: https://github.com/apache/hudi/pull/3519#issuecomment-903204631


   
   ## CI report:
   
   * d022aa7a5bd94492c7c3e96dc5b1288268520087 UNKNOWN
   * a654fcd138e63ccd728ad5fa0237563742b392fe Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3096)
 
   
   






[GitHub] [hudi] guanziyue commented on a change in pull request #3912: [HUDI-2665] Fix overflow of huge log file in HoodieLogFormatWriter

2021-11-03 Thread GitBox


guanziyue commented on a change in pull request #3912:
URL: https://github.com/apache/hudi/pull/3912#discussion_r741670205



##
File path: 
hudi-common/src/main/java/org/apache/hudi/common/table/log/HoodieLogFormatWriter.java
##
@@ -148,10 +148,11 @@ public AppendResult appendBlocks(List<HoodieLogBlock> blocks) throws IOException
 HoodieLogFormat.LogFormatVersion currentLogFormatVersion =
 new HoodieLogFormatVersion(HoodieLogFormat.CURRENT_VERSION);
 
-FSDataOutputStream outputStream = getOutputStream();
-long startPos = outputStream.getPos();
+FSDataOutputStream originalOutputStream = getOutputStream();

Review comment:
   Ummm. Yes, we could add a test that writes a huge log block as a check, but 
it may hurt UT performance a lot. I'm not sure a UT is warranted for such a 
rework of existing logic. Anyway, I'm glad to add one if it is required.








[GitHub] [hudi] manojpec commented on pull request #3871: [HUDI-2593][WIP] Enabling virtual keys for the metadata table

2021-11-03 Thread GitBox


manojpec commented on pull request #3871:
URL: https://github.com/apache/hudi/pull/3871#issuecomment-958713288


   > But partition paths for the metadata table are hardcoded. Can that be 
helpful? Removing the fields will save a lot of storage space for the record 
level index.
   
   @prashantwason So far we only have the `files` partition under the metadata 
table, but we are planning to bring in more partitions for storing other 
indices, so the assumption of a single partition for the metadata table will 
not hold for long. That said, removing the 5 meta fields from each record by 
enabling virtual keys would definitely save a lot of space. For now, we either 
have to improve the current metadata schema or infer the partition path from 
other cues.






[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #3808: [HUDI-2560] introduce id_based schema to support full schema evolution.

2021-11-03 Thread GitBox


xiarixiaoyao commented on a change in pull request #3808:
URL: https://github.com/apache/hudi/pull/3808#discussion_r740693786



##
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/AbstractHoodieWriteClient.java
##
@@ -1078,4 +1088,138 @@ public void close() {
 this.heartbeatClient.stop();
 this.txnManager.close();
   }
+
+  /**
+   * add columns to table.
+   *
+   * @param colName col name to be added. If adding a col to a nested field, 
the full name should be specified
+   * @param schema col type to be added.
+   * @param doc col doc to be added.
+   * @param position col position to be added
+   * @param positionType col position change type. now support three change 
types: first/after/before
+   */
+  public void addCol(String colName, Schema schema, String doc, String 
position, TableChange.ColumnPositionChange.ColumnPositionType positionType) {
+Pair<InternalSchema, HoodieTableMetaClient> pair = 
getInternalSchemaAndMetaClient();
+InternalSchema newSchema = SchemaChangePersistHelper
+.applyAddChange(pair.getLeft(), colName, 
AvroInternalSchemaConverter.convertToField(schema), doc, position, 
positionType);
+commitTableChange(newSchema, pair.getRight());
+  }
+
+  public void addCol(String colName, Schema schema) {
+addCol(colName, schema, null, null, null);

Review comment:
   This is because engines such as Hive / Spark / MySQL use a string as the 
parameter, e.g. `alter table xxx add columns(name string **after**/**before** 
id)`. I think it will be better to use String as the param.
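To illustrate the string-based API shape being argued for, here is a minimal standalone sketch. The local enum and the parsePosition helper are hypothetical stand-ins for illustration only, not Hudi's actual TableChange.ColumnPositionChange.ColumnPositionType:

```java
// Hypothetical sketch: map SQL-style position keywords to an enum, so callers
// can pass the same string they would write in DDL. The enum and helper here
// are stand-ins, not Hudi's real API.
public class ColumnPositionDemo {

    enum ColumnPositionType { FIRST, AFTER, BEFORE }

    // Accepts "first"/"after"/"before" (case-insensitive), as in
    // ALTER TABLE xxx ADD COLUMNS (name string AFTER id).
    static ColumnPositionType parsePosition(String keyword) {
        switch (keyword.toLowerCase()) {
            case "first":  return ColumnPositionType.FIRST;
            case "after":  return ColumnPositionType.AFTER;
            case "before": return ColumnPositionType.BEFORE;
            default:
                throw new IllegalArgumentException("Unknown position: " + keyword);
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePosition("after")); // AFTER
    }
}
```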








[GitHub] [hudi] Kavin88 opened a new issue #3913: [SUPPORT] Hudi deltastreamer deployment Model

2021-11-03 Thread GitBox


Kavin88 opened a new issue #3913:
URL: https://github.com/apache/hudi/issues/3913


   What is the deployment model to follow for the Hudi DeltaStreamer? I could 
not find any references other than direct spark-submit. Can it be invoked from 
a .py Python file via spark-submit? The property and config files would be 
different for each table.
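For reference, a common DeltaStreamer deployment is one spark-submit invocation per table, each pointing at its own properties file. The sketch below is a rough template, not a verified command: the bundle jar path, source class, target paths, and property-file locations are placeholders to adapt to your environment and Hudi version.

```sh
# Hypothetical sketch: one spark-submit per table, each with its own --props
# file. Jar path and property values are placeholders; check the flags against
# your Hudi version before use.
spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  /path/to/hudi-utilities-bundle.jar \
  --table-type COPY_ON_WRITE \
  --source-class org.apache.hudi.utilities.sources.JsonDFSSource \
  --source-ordering-field ts \
  --target-base-path s3://bucket/tables/table_a \
  --target-table table_a \
  --props /configs/table_a.properties
```

A wrapper (shell or Python driving spark-submit) can loop over per-table property files to launch each job.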






[GitHub] [hudi] vinothchandar opened a new pull request #3914: [DOCS] Add a page to share community sync details

2021-11-03 Thread GitBox


vinothchandar opened a new pull request #3914:
URL: https://github.com/apache/hudi/pull/3914


- Added a new "Community" top nav
- Separated general community info from dev/contribution info
- Added details for community call and also office hours
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3614: [HUDI-2370] Supports data encryption

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-914114290


   
   ## CI report:
   
   * a3677e66a1fb13c1a91d6beb977b00ddfdd6a51e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3089)
 
   * 624c64a620b67825474733e9b056ca275b4c01e2 UNKNOWN
   
   






[GitHub] [hudi] dongkelun commented on pull request #3700: [HUDI-2471] Add support ignoring case in merge into

2021-11-03 Thread GitBox


dongkelun commented on pull request #3700:
URL: https://github.com/apache/hudi/pull/3700#issuecomment-958744191


   @hudi-bot run azure






[GitHub] [hudi] hudi-bot edited a comment on pull request #3614: [HUDI-2370] Supports data encryption

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-914114290


   
   ## CI report:
   
   * a3677e66a1fb13c1a91d6beb977b00ddfdd6a51e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3089)
 
   * 624c64a620b67825474733e9b056ca275b4c01e2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3097)
 
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3700: [HUDI-2471] Add support ignoring case in merge into

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3700:
URL: https://github.com/apache/hudi/pull/3700#issuecomment-924523979


   
   ## CI report:
   
   * f4c30a0554777e8c871d1aeee0ede783e709c6ee Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3041)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3098)
 
   
   






[GitHub] [hudi] codope commented on a change in pull request #3914: [DOCS] Add a page to share community sync details

2021-11-03 Thread GitBox


codope commented on a change in pull request #3914:
URL: https://github.com/apache/hudi/pull/3914#discussion_r741709193



##
File path: website/community/syncs.md
##
@@ -0,0 +1,41 @@
+---
+sidebar_position: 2
+title: "Community Syncs"

Review comment:
   Does "Meetups" sound better than "Syncs"?

##
File path: website/community/syncs.md
##
@@ -0,0 +1,41 @@
+---
+sidebar_position: 2
+title: "Community Syncs"
+toc: true
+last_modified_at: 2020-09-01T15:59:57-04:00
+---
+
+# Community Syncs
+
+We have set up the following regular syncs for community users and developers 
to meet, interact, and exchange ideas.
+Meetings will be recorded and made available on a best-effort basis.
+
+## Monthly Community Call
+
+Every month on the Last Wed, 07:00 AM Pacific Time (US and Canada)([translate 
to other time 
zones](https://www.worldtimebuddy.com/?qm=1&lid=5341145,5128581,1264527,1796236&h=5341145&date=2021-11-24&sln=7-8&hf=1))
+
+**Typical agenda**
+
+*   \[15 mins\] Progress updates & Plans(PMC member)

Review comment:
   missing space after Plans: `...Plans(PMC member)`








[jira] [Updated] (HUDI-2584) Test Bloom filter based out of metadata table.

2021-11-03 Thread Vinoth Chandar (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinoth Chandar updated HUDI-2584:
-
Labels: release-blocker  (was: release-blocker sev:critical)

> Test Bloom filter based out of metadata table. 
> ---
>
> Key: HUDI-2584
> URL: https://issues.apache.org/jira/browse/HUDI-2584
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: sivabalan narayanan
>Assignee: Manoj Govindassamy
>Priority: Blocker
>  Labels: release-blocker
> Fix For: 0.10.0
>
>
> Test Bloom filter based out of metadata table.
>  





[GitHub] [hudi] codope commented on pull request #3799: [HUDI-2491] hoodie.datasource.hive_sync.mode=hms mode is supported in…

2021-11-03 Thread GitBox


codope commented on pull request #3799:
URL: https://github.com/apache/hudi/pull/3799#issuecomment-958779489


   @hudi-bot run azure






[GitHub] [hudi] zhangyue19921010 removed a comment on pull request #3897: [HUDI-2658] When disable auto clean, do not check if MIN_COMMITS_TO_KEEP was larger CLEANER_COMMITS_RETAINED or not.

2021-11-03 Thread GitBox


zhangyue19921010 removed a comment on pull request #3897:
URL: https://github.com/apache/hudi/pull/3897#issuecomment-957143790


   @hudi-bot run azure






[GitHub] [hudi] zhangyue19921010 commented on pull request #3897: [HUDI-2658] When disable auto clean, do not check if MIN_COMMITS_TO_KEEP was larger CLEANER_COMMITS_RETAINED or not.

2021-11-03 Thread GitBox


zhangyue19921010 commented on pull request #3897:
URL: https://github.com/apache/hudi/pull/3897#issuecomment-958782891


   @hudi-bot run azure






[GitHub] [hudi] zhangyue19921010 commented on pull request #3765: [HUDI-2533] New option for hoodieClusteringJob to check, rollback and re-execute the last failed clustering job

2021-11-03 Thread GitBox


zhangyue19921010 commented on pull request #3765:
URL: https://github.com/apache/hudi/pull/3765#issuecomment-958784469


   @hudi-bot run azure






[GitHub] [hudi] zhangyue19921010 removed a comment on pull request #3765: [HUDI-2533] New option for hoodieClusteringJob to check, rollback and re-execute the last failed clustering job

2021-11-03 Thread GitBox


zhangyue19921010 removed a comment on pull request #3765:
URL: https://github.com/apache/hudi/pull/3765#issuecomment-957144562


   @hudi-bot run azure






[GitHub] [hudi] hudi-bot edited a comment on pull request #3614: [HUDI-2370] Supports data encryption

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-914114290


   
   ## CI report:
   
   * 624c64a620b67825474733e9b056ca275b4c01e2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3097)
 
   * ce26ca89a86e91f41bc38dee2acebdc6b65cde06 UNKNOWN
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3700: [HUDI-2471] Add support ignoring case in merge into

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3700:
URL: https://github.com/apache/hudi/pull/3700#issuecomment-924523979


   
   ## CI report:
   
   * f4c30a0554777e8c871d1aeee0ede783e709c6ee Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3041)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3098)
 
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3765: [HUDI-2533] New option for hoodieClusteringJob to check, rollback and re-execute the last failed clustering job

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3765:
URL: https://github.com/apache/hudi/pull/3765#issuecomment-938488397


   
   ## CI report:
   
   * 12e84c0d582fb0a958a352df5e72301828cbc6a6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3003)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3050)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3100)
 
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3808: [HUDI-2560] introduce id_based schema to support full schema evolution.

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3808:
URL: https://github.com/apache/hudi/pull/3808#issuecomment-944105314


   
   ## CI report:
   
   * 56e39326f5995cce4f17d8175fe8315ca58d17fc Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2662)
 
   * 440a128f83166e148f59de87784e86a61d521047 UNKNOWN
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3799: [HUDI-2491] hoodie.datasource.hive_sync.mode=hms mode is supported in…

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3799:
URL: https://github.com/apache/hudi/pull/3799#issuecomment-943176004


   
   ## CI report:
   
   * aa02b3508fee06bf0f3fd03b65d016eaeb9e4a65 UNKNOWN
   * e98d19ea99ead03b9360e04b1d006a67cf68a285 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2858)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2970)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3101)
 
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3897: [HUDI-2658] When disable auto clean, do not check if MIN_COMMITS_TO_KEEP was larger CLEANER_COMMITS_RETAINED or not.

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3897:
URL: https://github.com/apache/hudi/pull/3897#issuecomment-955953458


   
   ## CI report:
   
   * 3b77c9e8123fa8af3c0dc9aca8c3ec1568c9bd7c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3002)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3004)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3031)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3048)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3103)
 
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3614: [HUDI-2370] Supports data encryption

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-914114290


   
   ## CI report:
   
   * 624c64a620b67825474733e9b056ca275b4c01e2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3097)
 
   * ce26ca89a86e91f41bc38dee2acebdc6b65cde06 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3099)
 
   
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3808: [HUDI-2560] introduce id_based schema to support full schema evolution.

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3808:
URL: https://github.com/apache/hudi/pull/3808#issuecomment-944105314


   
   ## CI report:
   
   * 56e39326f5995cce4f17d8175fe8315ca58d17fc Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2662)
 
   * 440a128f83166e148f59de87784e86a61d521047 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3102)
 
   
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3614: [HUDI-2370] Supports data encryption

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-914114290


   
   ## CI report:
   
   * 624c64a620b67825474733e9b056ca275b4c01e2 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3097)
 
   * ce26ca89a86e91f41bc38dee2acebdc6b65cde06 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3099)
 
   * 8bd01e8237b9c449d5cc532a2b2a6e6837e487cf UNKNOWN
   
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3614: [HUDI-2370] Supports data encryption

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-914114290


   
   ## CI report:
   
   * ce26ca89a86e91f41bc38dee2acebdc6b65cde06 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3099)
 
   * 8bd01e8237b9c449d5cc532a2b2a6e6837e487cf UNKNOWN
   
   
   






[jira] [Resolved] (HUDI-2582) Fix concurrent key generations for bulk insert row writer path

2021-11-03 Thread yao.zhou (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yao.zhou resolved HUDI-2582.

Resolution: Fixed

> Fix concurrent key generations for bulk insert row writer path 
> ---
>
> Key: HUDI-2582
> URL: https://issues.apache.org/jira/browse/HUDI-2582
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: sivabalan narayanan
>Assignee: yao.zhou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> The bulk insert row writer path registers key generators as UDFs, but uses 
> static names. So, if concurrent table writes run, they might collide. 
> https://github.com/apache/hudi/issues/3759



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
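The collision mode described above can be sketched outside Spark: a process-wide function registry keyed by name behaves like a last-writer-wins map. The registry, UDF names, and generator strings below are illustrative assumptions, not Hudi's actual code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the collision: a shared registry keyed by UDF name,
// as Spark's function registry is. Names and values are made up for illustration.
public class UdfCollisionSketch {
    static final Map<String, String> REGISTRY = new ConcurrentHashMap<>();

    static void register(String udfName, String keyGenForTable) {
        REGISTRY.put(udfName, keyGenForTable); // last writer wins
    }

    public static void main(String[] args) {
        register("hudi_key_gen", "keyGenFor(table_a)"); // job A, static name
        register("hudi_key_gen", "keyGenFor(table_b)"); // job B, same static name
        // Job A now resolves job B's generator:
        System.out.println(REGISTRY.get("hudi_key_gen")); // keyGenFor(table_b)

        // A table-scoped name avoids the collision:
        register("hudi_key_gen_table_a", "keyGenFor(table_a)");
        register("hudi_key_gen_table_b", "keyGenFor(table_b)");
        System.out.println(REGISTRY.get("hudi_key_gen_table_a")); // keyGenFor(table_a)
    }
}
```

Scoping the registered name per table (or per write) is one way to keep concurrent runs from stepping on each other.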


[jira] [Created] (HUDI-2677) Add DFS based message queue for flink writer

2021-11-03 Thread Danny Chen (Jira)
Danny Chen created HUDI-2677:


 Summary: Add DFS based message queue for flink writer
 Key: HUDI-2677
 URL: https://issues.apache.org/jira/browse/HUDI-2677
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.10.0








[GitHub] [hudi] danny0405 opened a new pull request #3915: [HUDI-2677] Add DFS based message queue for flink writer

2021-11-03 Thread GitBox


danny0405 opened a new pull request #3915:
URL: https://github.com/apache/hudi/pull/3915


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[jira] [Updated] (HUDI-2677) Add DFS based message queue for flink writer

2021-11-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2677:
-
Labels: pull-request-available  (was: )

> Add DFS based message queue for flink writer
> 
>
> Key: HUDI-2677
> URL: https://issues.apache.org/jira/browse/HUDI-2677
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[jira] [Created] (HUDI-2678) flink writer writes huge log file

2021-11-03 Thread Danny Chen (Jira)
Danny Chen created HUDI-2678:


 Summary: flink writer writes huge log file
 Key: HUDI-2678
 URL: https://issues.apache.org/jira/browse/HUDI-2678
 Project: Apache Hudi
  Issue Type: Bug
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.10.0








[GitHub] [hudi] danny0405 opened a new pull request #3916: [HUDI-2678] flink writer writes huge log file

2021-11-03 Thread GitBox


danny0405 opened a new pull request #3916:
URL: https://github.com/apache/hudi/pull/3916


   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[jira] [Updated] (HUDI-2678) flink writer writes huge log file

2021-11-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2678:
-
Labels: pull-request-available  (was: )

> flink writer writes huge log file
> -
>
> Key: HUDI-2678
> URL: https://issues.apache.org/jira/browse/HUDI-2678
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[GitHub] [hudi] liujinhui1994 commented on pull request #3614: [HUDI-2370] Supports data encryption

2021-11-03 Thread GitBox


liujinhui1994 commented on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-958882525


   @hudi-bot run azure






[jira] [Commented] (HUDI-2674) hudi hive reader should not print read values

2021-11-03 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437923#comment-17437923
 ] 

sivabalan narayanan commented on HUDI-2674:
---

Fixed via master 
[https://github.com/apache/hudi/commit/5517d292f917821879d41af515f4ed7331d54ba2]

 

> hudi hive reader should not print read values
> -
>
> Key: HUDI-2674
> URL: https://issues.apache.org/jira/browse/HUDI-2674
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Affects Versions: 0.9.0
> Environment: hudi 0.9.0
> hive 3.1.1
> hadoop 3.1.1
>Reporter: tao meng
>Assignee: tao meng
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Currently, when we use Hive to query a Hudi table and set 
> hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;
> all read values are printed. This can lead to performance and data security 
> problems, for example:
> xxx 20:10:45,045 | INFO | main | Reading from record reader | 
> HoodieCombineRealtimeRecordReader.java:69
> xx 20:10:45,045 | INFO | main | "values_0.158268513314199_10": 
> \{"value0":"20211102192749","type0":"Text","value1":"null","type1":"unknown","value2":"null","type2":"unknown","value3":"null","type3":"unknown","value4":"null","type4":"unknown","value5":"16","type5":"IntWritable","value6":"16jack","type6":"Text","value7":"null","type7":"unknown","value8":"null","type8":"unknown","value9":"null","type9":"unknown"}
>  | HoodieCombineRealtimeRecordReader.java:70
> xxx 20:10:45,045 | INFO | main | Reading from record reader | 
> HoodieCombineRealtimeRecordReader.java:69
> xxx 20:10:45,045 | INFO | main | "values_0.16924293134429924_10": 
> \{"value0":"20211102192749","type0":"Text","value1":"null","type1":"unknown","value2":"null","type2":"unknown","value3":"null","type3":"unknown","value4":"null","type4":"unknown","value5":"96","type5":"IntWritable","value6":"96jack","type6":"Text","value7":"null","type7":"unknown","value8":"null","type8":"unknown","value9":"null","type9":"unknown"}
>  | HoodieCombineRealtimeRecordReader.java:70
> 2021-11-02 20:10:45,045 | INFO | main | Reading from record reader | 
> HoodieCombineRealtimeRecordReader.java:69



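A common remediation for this class of issue is to never log record values at INFO and to guard any value dump behind a finer level, so production runs neither pay the formatting cost nor leak data. This is a generic sketch of that pattern, not the actual Hudi patch:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Generic sketch: log the cheap, value-free message at INFO, and emit record
// values only when a finer level is explicitly enabled.
public class GuardedValueLogging {
    static final Logger LOG = Logger.getLogger("HoodieCombineRealtimeRecordReader");

    static void logRecord(Object value) {
        LOG.info("Reading from record reader");
        if (LOG.isLoggable(Level.FINE)) {  // skipped under the default INFO level
            LOG.fine("record value: " + value);
        }
    }

    public static void main(String[] args) {
        logRecord("{\"value0\":\"20211102192749\"}"); // the value is not printed at INFO
    }
}
```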


[jira] [Updated] (HUDI-2674) hudi hive reader should not print read values

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2674:
--
Status: In Progress  (was: Open)

> hudi hive reader should not print read values
> -
>
> Key: HUDI-2674
> URL: https://issues.apache.org/jira/browse/HUDI-2674
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Affects Versions: 0.9.0
> Environment: hudi 0.9.0
> hive 3.1.1
> hadoop 3.1.1
>Reporter: tao meng
>Assignee: tao meng
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Currently, when we use Hive to query a Hudi table and set 
> hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;
> all read values are printed. This can lead to performance and data security 
> problems, for example:
> xxx 20:10:45,045 | INFO | main | Reading from record reader | 
> HoodieCombineRealtimeRecordReader.java:69
> xx 20:10:45,045 | INFO | main | "values_0.158268513314199_10": 
> \{"value0":"20211102192749","type0":"Text","value1":"null","type1":"unknown","value2":"null","type2":"unknown","value3":"null","type3":"unknown","value4":"null","type4":"unknown","value5":"16","type5":"IntWritable","value6":"16jack","type6":"Text","value7":"null","type7":"unknown","value8":"null","type8":"unknown","value9":"null","type9":"unknown"}
>  | HoodieCombineRealtimeRecordReader.java:70
> xxx 20:10:45,045 | INFO | main | Reading from record reader | 
> HoodieCombineRealtimeRecordReader.java:69
> xxx 20:10:45,045 | INFO | main | "values_0.16924293134429924_10": 
> \{"value0":"20211102192749","type0":"Text","value1":"null","type1":"unknown","value2":"null","type2":"unknown","value3":"null","type3":"unknown","value4":"null","type4":"unknown","value5":"96","type5":"IntWritable","value6":"96jack","type6":"Text","value7":"null","type7":"unknown","value8":"null","type8":"unknown","value9":"null","type9":"unknown"}
>  | HoodieCombineRealtimeRecordReader.java:70
> 2021-11-02 20:10:45,045 | INFO | main | Reading from record reader | 
> HoodieCombineRealtimeRecordReader.java:69





[jira] [Resolved] (HUDI-2674) hudi hive reader should not print read values

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-2674.
---
Resolution: Fixed

> hudi hive reader should not print read values
> -
>
> Key: HUDI-2674
> URL: https://issues.apache.org/jira/browse/HUDI-2674
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Hive Integration
>Affects Versions: 0.9.0
> Environment: hudi 0.9.0
> hive 3.1.1
> hadoop 3.1.1
>Reporter: tao meng
>Assignee: tao meng
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Currently, when we use Hive to query a Hudi table and set 
> hive.input.format=org.apache.hudi.hadoop.hive.HoodieCombineHiveInputFormat;
> all read values are printed. This can lead to performance and data security 
> problems, for example:
> xxx 20:10:45,045 | INFO | main | Reading from record reader | 
> HoodieCombineRealtimeRecordReader.java:69
> xx 20:10:45,045 | INFO | main | "values_0.158268513314199_10": 
> \{"value0":"20211102192749","type0":"Text","value1":"null","type1":"unknown","value2":"null","type2":"unknown","value3":"null","type3":"unknown","value4":"null","type4":"unknown","value5":"16","type5":"IntWritable","value6":"16jack","type6":"Text","value7":"null","type7":"unknown","value8":"null","type8":"unknown","value9":"null","type9":"unknown"}
>  | HoodieCombineRealtimeRecordReader.java:70
> xxx 20:10:45,045 | INFO | main | Reading from record reader | 
> HoodieCombineRealtimeRecordReader.java:69
> xxx 20:10:45,045 | INFO | main | "values_0.16924293134429924_10": 
> \{"value0":"20211102192749","type0":"Text","value1":"null","type1":"unknown","value2":"null","type2":"unknown","value3":"null","type3":"unknown","value4":"null","type4":"unknown","value5":"96","type5":"IntWritable","value6":"96jack","type6":"Text","value7":"null","type7":"unknown","value8":"null","type8":"unknown","value9":"null","type9":"unknown"}
>  | HoodieCombineRealtimeRecordReader.java:70
> 2021-11-02 20:10:45,045 | INFO | main | Reading from record reader | 
> HoodieCombineRealtimeRecordReader.java:69





[jira] [Resolved] (HUDI-2538) Persist configs to hoodie.properties on the first write

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-2538.
---
Resolution: Fixed

> Persist configs to hoodie.properties on the first write
> ---
>
> Key: HUDI-2538
> URL: https://issues.apache.org/jira/browse/HUDI-2538
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: Yann Byron
>Assignee: Yann Byron
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Some configs, like `keygenerator.class`, `hive_style_partitioning`, 
> `partitionpath.urlencode`, should be persisted to hoodie.properties when data 
> is written for the first time. Otherwise, inconsistent behavior can occur. 
> Subsequent write operations then do not need to provide these configs. If the 
> provided configs don't match the existing ones, an exception should be raised. 
> This is also useful for resolving some of the keyGenerator discrepancies 
> between the DataFrame writer and SQL.



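The persist-then-validate behavior described above can be sketched with plain `java.util.Properties`; the file handling and property name below are assumptions for illustration, not Hudi's implementation:

```java
import java.io.InputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Sketch only: persist table-defining configs on the first write, and on later
// writes fail fast if the caller supplies a conflicting value.
public class PersistConfigSketch {
    static void persistOrValidate(Path propsFile, Properties incoming) throws IOException {
        if (Files.exists(propsFile)) {
            Properties existing = new Properties();
            try (InputStream in = Files.newInputStream(propsFile)) {
                existing.load(in);
            }
            for (String key : incoming.stringPropertyNames()) {
                String old = existing.getProperty(key);
                if (old != null && !old.equals(incoming.getProperty(key))) {
                    throw new IllegalStateException(
                        "Config " + key + " conflicts with persisted value " + old);
                }
            }
        } else {
            try (OutputStream out = Files.newOutputStream(propsFile)) {
                incoming.store(out, "persisted on first write");
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path f = Files.createTempFile("hoodie", ".properties");
        Files.delete(f); // simulate a table that has never been written
        Properties p = new Properties();
        p.setProperty("hoodie.datasource.write.hive_style_partitioning", "true");
        persistOrValidate(f, p);  // first write: persisted
        persistOrValidate(f, p);  // same configs: accepted
        p.setProperty("hoodie.datasource.write.hive_style_partitioning", "false");
        try {
            persistOrValidate(f, p);  // conflicting value: rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```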


[jira] [Updated] (HUDI-2538) Persist configs to hoodie.properties on the first write

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2538:
--
Status: In Progress  (was: Open)

> Persist configs to hoodie.properties on the first write
> ---
>
> Key: HUDI-2538
> URL: https://issues.apache.org/jira/browse/HUDI-2538
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: Yann Byron
>Assignee: Yann Byron
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Some configs, like `keygenerator.class`, `hive_style_partitioning`, 
> `partitionpath.urlencode`, should be persisted to hoodie.properties when data 
> is written for the first time. Otherwise, inconsistent behavior can occur. 
> Subsequent write operations then do not need to provide these configs. If the 
> provided configs don't match the existing ones, an exception should be raised. 
> This is also useful for resolving some of the keyGenerator discrepancies 
> between the DataFrame writer and SQL.





[jira] [Reopened] (HUDI-1869) Upgrading Spark3 To 3.1

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reopened HUDI-1869:
---

> Upgrading Spark3 To 3.1
> ---
>
> Key: HUDI-1869
> URL: https://issues.apache.org/jira/browse/HUDI-1869
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: Yann Byron
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Spark 3.1 has changed the behavior of some internal classes and interfaces in 
> both the spark-sql and spark-core modules.
> Currently Hudi does not compile against Spark 3.1. We need to add SQL support 
> for Spark 3.1.





[GitHub] [hudi] hudi-bot edited a comment on pull request #3614: [HUDI-2370] Supports data encryption

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-914114290


   
   ## CI report:
   
   * ce26ca89a86e91f41bc38dee2acebdc6b65cde06 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3099)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3104)
 
   * 8bd01e8237b9c449d5cc532a2b2a6e6837e487cf UNKNOWN
   
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3765: [HUDI-2533] New option for hoodieClusteringJob to check, rollback and re-execute the last failed clustering job

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3765:
URL: https://github.com/apache/hudi/pull/3765#issuecomment-938488397


   
   ## CI report:
   
   * 12e84c0d582fb0a958a352df5e72301828cbc6a6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3003)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3050)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3100)
 
   
   
   






[GitHub] [hudi] hudi-bot commented on pull request #3916: [HUDI-2678] flink writer writes huge log file

2021-11-03 Thread GitBox


hudi-bot commented on pull request #3916:
URL: https://github.com/apache/hudi/pull/3916#issuecomment-958913129


   
   ## CI report:
   
   * 28737b38dbae3ff8543c8ed66e88258c69c98e66 UNKNOWN
   
   
   






[GitHub] [hudi] hudi-bot commented on pull request #3915: [HUDI-2677] Add DFS based message queue for flink writer

2021-11-03 Thread GitBox


hudi-bot commented on pull request #3915:
URL: https://github.com/apache/hudi/pull/3915#issuecomment-958913033


   
   ## CI report:
   
   * 9c7de7527a24bfe2f0c00138113276c88be63fbf UNKNOWN
   
   
   






[jira] [Resolved] (HUDI-1869) Upgrading Spark3 To 3.1

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-1869.
---
Resolution: Fixed

[~biyan900...@gmail.com] [~rxu]: something to keep in mind: do not "close" the 
Jiras. We always have to "resolve" them, so that we can keep track of what gets 
merged. "Closed" implies invalid or something along those lines. 

> Upgrading Spark3 To 3.1
> ---
>
> Key: HUDI-1869
> URL: https://issues.apache.org/jira/browse/HUDI-1869
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: pengzhiwei
>Assignee: Yann Byron
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Spark 3.1 has changed the behavior of some internal classes and interfaces in 
> both the spark-sql and spark-core modules.
> Currently Hudi does not compile against Spark 3.1. We need to add SQL support 
> for Spark 3.1.





[jira] [Updated] (HUDI-2515) Add close when producing records failed

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2515:
--
Status: Closed  (was: Patch Available)

> Add close when producing records failed
> ---
>
> Key: HUDI-2515
> URL: https://issues.apache.org/jira/browse/HUDI-2515
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available, sev:critical
> Fix For: 0.10.0
>
>
> Currently, when producing records fails, the connection is not closed. If a 
> large number of clients hit such exceptions, the unclosed connections lead to 
> a large number of CLOSE_WAIT sockets, eventually occupying almost all of the 
> server's ports, so clients writing data normally cannot obtain port resources 
> and fail. This affects the resources of the entire server.
> The detail exceptions:
>  
> {code:java}
> 2021-10-02 10:48:27,335 ERROR [pool-525-thread-1] 
> o.a.h.c.u.queue.BoundedInMemoryExecutor error producing records
> org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema mismatch: 
> Avro field 'TRANSFER_RESULT' not found
>  at 
> org.apache.parquet.avro.AvroRecordConverter.getAvroField(AvroRecordConverter.java:225)
>  at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:130)
>  at 
> org.apache.parquet.avro.AvroRecordConverter.(AvroRecordConverter.java:95)
>  at 
> org.apache.parquet.avro.AvroRecordMaterializer.(AvroRecordMaterializer.java:33)
>  at 
> org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:138)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:183)
>  at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
>  at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
>  at 
> org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
>  at 
> org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2021-10-02 10:48:28,324 ERROR [pool-525-thread-2] 
> o.a.h.c.u.queue.BoundedInMemoryExecutor error consuming records
> org.apache.hudi.exception.HoodieException: operation has failed
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:247)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:226)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:277)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748){code}
>  
>  
>  
> {code:java}
> netstat - nlp | grep 1019 | wc -l
> 1456
> tcp 1 0 ip:42280 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:46370 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:54822 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:51444 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:40062 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:34848 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:40574 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:55540 ip-:1019 CLOSE_WAIT
> tcp 0 0 ip:46554 ip-:1019 ESTABLISHED
> tcp 1 0 ip:37418 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:44476 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:40656 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:41044 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:36310 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:58766 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:39426 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:51552 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:32822 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:50938 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:60448 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:47028 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:49492 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:45274 ip-:1019 CLOSE_WAIT
> 

[jira] [Resolved] (HUDI-2515) Add close when producing records failed

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-2515.
---
Resolution: Fixed

> Add close when producing records failed
> ---
>
> Key: HUDI-2515
> URL: https://issues.apache.org/jira/browse/HUDI-2515
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available, sev:critical
> Fix For: 0.10.0
>
>
> For now, when producing records fails, the connection is not closed. If a 
> large number of clients hit such exceptions, the unclosed connections pile up 
> as CLOSE_WAIT sockets until almost all ports of the server are occupied, and 
> clients writing data normally cannot obtain port resources and fail. This 
> affects the resources of the entire server.
> The detail exceptions:
>  
> {code:java}
> 2021-10-02 10:48:27,335 ERROR [pool-525-thread-1] 
> o.a.h.c.u.queue.BoundedInMemoryExecutor error producing records
> org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema mismatch: 
> Avro field 'TRANSFER_RESULT' not found
>  at 
> org.apache.parquet.avro.AvroRecordConverter.getAvroField(AvroRecordConverter.java:225)
>  at 
> org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:130)
>  at 
> org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:95)
>  at 
> org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
>  at 
> org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:138)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:183)
>  at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
>  at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
>  at 
> org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
>  at 
> org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2021-10-02 10:48:28,324 ERROR [pool-525-thread-2] 
> o.a.h.c.u.queue.BoundedInMemoryExecutor error consuming records
> org.apache.hudi.exception.HoodieException: operation has failed
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:247)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:226)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:277)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748){code}
>  
>  
>  
> {code:java}
> netstat -nlp | grep 1019 | wc -l
> 1456
> tcp 1 0 ip:42280 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:46370 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:54822 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:51444 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:40062 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:34848 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:40574 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:55540 ip-:1019 CLOSE_WAIT
> tcp 0 0 ip:46554 ip-:1019 ESTABLISHED
> tcp 1 0 ip:37418 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:44476 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:40656 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:41044 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:36310 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:58766 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:39426 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:51552 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:32822 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:50938 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:60448 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:47028 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:49492 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:45274 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:38500 ip
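
The fix described above ("add close when producing records failed") boils down to releasing the underlying reader on the failure path as well as the success path. A minimal sketch of that pattern, with illustrative names (RecordProducer is hypothetical, not the actual Hudi class):

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a queue producer that holds a connection/reader.
class RecordProducer implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    int closeCount = 0; // counts how many times resources were actually released

    void produce() throws Exception {
        try {
            // Simulate the quoted failure mode while reading records.
            throw new Exception("Parquet/Avro schema mismatch");
        } finally {
            close(); // close on the failure path too, so no socket lingers in CLOSE_WAIT
        }
    }

    @Override
    public void close() {
        // Idempotent: safe to call from both produce() and an outer cleanup path.
        if (closed.compareAndSet(false, true)) {
            closeCount++; // stands in for reader.close() / socket release
        }
    }
}
```

Without the `finally`, an exception thrown while iterating the reader leaves its connection open, which is exactly the CLOSE_WAIT accumulation shown in the quoted netstat output.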

[jira] [Reopened] (HUDI-2515) Add close when producing records failed

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reopened HUDI-2515:
---

> Add close when producing records failed
> ---
>
> Key: HUDI-2515
> URL: https://issues.apache.org/jira/browse/HUDI-2515
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: 董可伦
>Assignee: 董可伦
>Priority: Major
>  Labels: pull-request-available, sev:critical
> Fix For: 0.10.0
>
>
> For now, when producing records fails, the connection is not closed. If a 
> large number of clients hit such exceptions, the unclosed connections pile up 
> as CLOSE_WAIT sockets until almost all ports of the server are occupied, and 
> clients writing data normally cannot obtain port resources and fail. This 
> affects the resources of the entire server.
> The detail exceptions:
>  
> {code:java}
> 2021-10-02 10:48:27,335 ERROR [pool-525-thread-1] 
> o.a.h.c.u.queue.BoundedInMemoryExecutor error producing records
> org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema mismatch: 
> Avro field 'TRANSFER_RESULT' not found
>  at 
> org.apache.parquet.avro.AvroRecordConverter.getAvroField(AvroRecordConverter.java:225)
>  at 
> org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:130)
>  at 
> org.apache.parquet.avro.AvroRecordConverter.<init>(AvroRecordConverter.java:95)
>  at 
> org.apache.parquet.avro.AvroRecordMaterializer.<init>(AvroRecordMaterializer.java:33)
>  at 
> org.apache.parquet.avro.AvroReadSupport.prepareForRead(AvroReadSupport.java:138)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.initialize(InternalParquetRecordReader.java:183)
>  at org.apache.parquet.hadoop.ParquetReader.initReader(ParquetReader.java:156)
>  at org.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:135)
>  at 
> org.apache.hudi.common.util.ParquetReaderIterator.hasNext(ParquetReaderIterator.java:49)
>  at 
> org.apache.hudi.common.util.queue.IteratorBasedQueueProducer.produce(IteratorBasedQueueProducer.java:45)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$0(BoundedInMemoryExecutor.java:92)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2021-10-02 10:48:28,324 ERROR [pool-525-thread-2] 
> o.a.h.c.u.queue.BoundedInMemoryExecutor error consuming records
> org.apache.hudi.exception.HoodieException: operation has failed
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.throwExceptionIfFailed(BoundedInMemoryQueue.java:247)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.readNextRecord(BoundedInMemoryQueue.java:226)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue.access$100(BoundedInMemoryQueue.java:52)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueue$QueueIterator.hasNext(BoundedInMemoryQueue.java:277)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:36)
>  at 
> org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748){code}
>  
>  
>  
> {code:java}
> netstat -nlp | grep 1019 | wc -l
> 1456
> tcp 1 0 ip:42280 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:46370 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:54822 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:51444 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:40062 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:34848 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:40574 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:55540 ip-:1019 CLOSE_WAIT
> tcp 0 0 ip:46554 ip-:1019 ESTABLISHED
> tcp 1 0 ip:37418 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:44476 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:40656 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:41044 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:36310 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:58766 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:39426 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:51552 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:32822 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:50938 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:60448 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:47028 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:49492 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:45274 ip-:1019 CLOSE_WAIT
> tcp 1 0 ip:38500 ip-:1019 CLOSE_WAIT
> tc

[GitHub] [hudi] hudi-bot edited a comment on pull request #3614: [HUDI-2370] Supports data encryption

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-914114290


   
   ## CI report:
   
   * ce26ca89a86e91f41bc38dee2acebdc6b65cde06 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3099)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3104)
 
   * 8bd01e8237b9c449d5cc532a2b2a6e6837e487cf Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3105)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3799: [HUDI-2491] hoodie.datasource.hive_sync.mode=hms mode is supported in…

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3799:
URL: https://github.com/apache/hudi/pull/3799#issuecomment-943176004


   
   ## CI report:
   
   * aa02b3508fee06bf0f3fd03b65d016eaeb9e4a65 UNKNOWN
   * e98d19ea99ead03b9360e04b1d006a67cf68a285 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2858)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2970)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3101)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3915: [HUDI-2677] Add DFS based message queue for flink writer

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3915:
URL: https://github.com/apache/hudi/pull/3915#issuecomment-958913033


   
   ## CI report:
   
   * 9c7de7527a24bfe2f0c00138113276c88be63fbf Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3106)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Reopened] (HUDI-2643) Remove duplicated hbase-common with tests classifier exists in bundles

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reopened HUDI-2643:
---

> Remove duplicated hbase-common with tests classifier exists in bundles
> --
>
> Key: HUDI-2643
> URL: https://issues.apache.org/jira/browse/HUDI-2643
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3916: [HUDI-2678] flink writer writes huge log file

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3916:
URL: https://github.com/apache/hudi/pull/3916#issuecomment-958913129


   
   ## CI report:
   
   * 28737b38dbae3ff8543c8ed66e88258c69c98e66 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3107)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Resolved] (HUDI-2643) Remove duplicated hbase-common with tests classifier exists in bundles

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-2643.
---
Resolution: Fixed

[~yanghua]: do remember to "resolve" jiras rather than "close" them. "close" is 
meant for issues that are invalid or were not worked on; only "resolved" issues 
show up in the release notes, I believe.

> Remove duplicated hbase-common with tests classifier exists in bundles
> --
>
> Key: HUDI-2643
> URL: https://issues.apache.org/jira/browse/HUDI-2643
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] nsivabalan commented on pull request #3884: [HUDI-1295] Hash ID generator util for Hudi table columns, partition and files

2021-11-03 Thread GitBox


nsivabalan commented on pull request #3884:
URL: https://github.com/apache/hudi/pull/3884#issuecomment-958925548


   @manojpec : for sub-tasks, maybe you can create a new jira rather than 
referring to a large one. For example, instead of HUDI-1295, we could have 
created another sub-task. Something to keep in mind for the future. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Reopened] (HUDI-1500) Support incrementally reading clustering commit via Spark Datasource/DeltaStreamer

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reopened HUDI-1500:
---

> Support incrementally reading clustering  commit via Spark 
> Datasource/DeltaStreamer
> ---
>
> Key: HUDI-1500
> URL: https://issues.apache.org/jira/browse/HUDI-1500
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: DeltaStreamer, Spark Integration
>Reporter: liwei
>Assignee: Sagar Sumit
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Currently, DeltaSync.readFromSource() cannot read the last instant when it is 
> a replace commit, such as one produced by clustering.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1500) Support incrementally reading clustering commit via Spark Datasource/DeltaStreamer

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-1500.
---
Resolution: Fixed

> Support incrementally reading clustering  commit via Spark 
> Datasource/DeltaStreamer
> ---
>
> Key: HUDI-1500
> URL: https://issues.apache.org/jira/browse/HUDI-1500
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: DeltaStreamer, Spark Integration
>Reporter: liwei
>Assignee: Sagar Sumit
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Currently, DeltaSync.readFromSource() cannot read the last instant when it is 
> a replace commit, such as one produced by clustering.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2031) JVM occasionally crashes during compaction when spark speculative execution is enabled

2021-11-03 Thread ZiyueGuan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437937#comment-17437937
 ] 

ZiyueGuan commented on HUDI-2031:
-

Does anyone know the root cause of this problem? Curious about why an interrupt 
could lead to problems in Parquet.

> JVM occasionally crashes during compaction when spark speculative execution 
> is enabled
> --
>
> Key: HUDI-2031
> URL: https://issues.apache.org/jira/browse/HUDI-2031
> Project: Apache Hudi
>  Issue Type: Bug
>Affects Versions: 0.8.0
>Reporter: Rong Ma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>
> This can happen when speculative execution is triggered. The duplicated 
> tasks are expected to terminate normally, but sometimes they cannot, and they 
> then cause the JVM to crash.
>  
>  From executor logs:
> {quote}ERROR [Executor task launch worker for task 6828] HoodieMergeHandle: 
> Error writing record  HoodieRecord{key=HoodieKey
> { recordKey=45246275517 partitionPath=2021-06-13}, currentLocation='null', 
> newLocation='null'}ERROR [Executor task launch worker for task 6828] 
> HoodieMergeHandle: Error writing record  HoodieRecord\{key=HoodieKey { 
> recordKey=45246275517 partitionPath=2021-06-13}
> , currentLocation='null', 
> newLocation='null'}java.lang.IllegalArgumentException: You cannot call 
> toBytes() more than once without calling reset() at 
> org.apache.parquet.Preconditions.checkArgument(Preconditions.java:53) at 
> org.apache.parquet.column.values.rle.RunLengthBitPackingHybridEncoder.toBytes(RunLengthBitPackingHybridEncoder.java:254)
>  at 
> org.apache.parquet.column.values.rle.RunLengthBitPackingHybridValuesWriter.getBytes(RunLengthBitPackingHybridValuesWriter.java:65)
>  at 
> org.apache.parquet.column.impl.ColumnWriterV1.writePage(ColumnWriterV1.java:148)
>  at 
> org.apache.parquet.column.impl.ColumnWriterV1.accountForValueWritten(ColumnWriterV1.java:106)
>  at 
> org.apache.parquet.column.impl.ColumnWriterV1.write(ColumnWriterV1.java:200) 
> at 
> org.apache.parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.addBinary(MessageColumnIO.java:469)
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(AvroWriteSupport.java:346)
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:278)
>  at 
> org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:191)
>  at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:165) 
> at 
> org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
>  at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:299) at 
> org.apache.hudi.io.storage.HoodieParquetWriter.writeAvroWithMetadata(HoodieParquetWriter.java:83)
>  at 
> org.apache.hudi.io.HoodieMergeHandle.writeRecord(HoodieMergeHandle.java:252) 
> at org.apache.hudi.io.HoodieMergeHandle.close(HoodieMergeHandle.java:336) at 
> org.apache.hudi.table.action.commit.SparkMergeHelper.runMerge(SparkMergeHelper.java:107)
>  at 
> org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdateInternal(HoodieSparkCopyOnWriteTable.java:199)
>  at 
> org.apache.hudi.table.HoodieSparkCopyOnWriteTable.handleUpdate(HoodieSparkCopyOnWriteTable.java:190)
>  at 
> org.apache.hudi.table.action.compact.HoodieSparkMergeOnReadTableCompactor.compact(HoodieSparkMergeOnReadTableCompactor.java:154)
>  at 
> org.apache.hudi.table.action.compact.HoodieSparkMergeOnReadTableCompactor.lambda$compact$9ec9d4c7$1(HoodieSparkMergeOnReadTableCompactor.java:105)
>  at 
> org.apache.spark.api.java.JavaPairRDD$.$anonfun$toScalaFunction$1(JavaPairRDD.scala:1041)
>  at scala.collection.Iterator$$anon$10.next(Iterator.scala:459) at 
> scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484) at 
> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490) at 
> org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
>  at 
> org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
>  at 
> org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1388)
>  at 
> org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
>  at 
> org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362) 
> at 
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
>  at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360) at 
> org.apache.spark.rdd.RDD.iterator(RDD.scala:311) at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at 
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349) at 
> org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
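
The "toBytes() more than once" failure above is consistent with a writer's final page being flushed twice when a speculative task is interrupted and its close path re-entered. One common guard is an idempotent close, sketched here with an illustrative class name (MergeHandleLike is hypothetical, not the actual HoodieMergeHandle):

```java
// Hypothetical sketch: make close() idempotent so a retried or interrupted
// speculative task cannot flush the same Parquet page twice.
class MergeHandleLike {
    private boolean closed = false;
    int flushCount = 0; // stands in for writing the final Parquet page

    synchronized void close() {
        if (closed) {
            return; // already flushed; a second close must be a no-op
        }
        closed = true;
        flushCount++;
    }
}
```

This does not explain why the interrupt corrupts Parquet's internal encoder state in the first place (the question asked above), but it prevents the duplicate-flush symptom.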

[jira] [Reopened] (HUDI-2502) Refactor index in hudi-client module

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reopened HUDI-2502:
---

> Refactor index in hudi-client module
> 
>
> Key: HUDI-2502
> URL: https://issues.apache.org/jira/browse/HUDI-2502
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2502) Refactor index in hudi-client module

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-2502.
---
Resolution: Fixed

[~guoyihua]: something to keep in mind: for merged PRs, we should always 
"resolve" the Jira rather than "close" it. "close" is meant for issues that are 
invalid or were not worked on. 

> Refactor index in hudi-client module
> 
>
> Key: HUDI-2502
> URL: https://issues.apache.org/jira/browse/HUDI-2502
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-1475) Fix documentation of preCombine to clarify when this API is used by Hudi

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-1475.
---
Resolution: Fixed

[~Pratyaksh]: something to keep in mind: for merged PRs, we should always 
"resolve" the Jira rather than "close" it. "close" is meant for issues that are 
invalid or were not worked on.

> Fix documentation of preCombine to clarify when this API is used by Hudi 
> -
>
> Key: HUDI-1475
> URL: https://issues.apache.org/jira/browse/HUDI-1475
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Docs
>Affects Versions: 0.9.0
>Reporter: Balaji Varadarajan
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: pull-request-available, user-support-issues
> Fix For: 0.10.0
>
>
> We need to fix the Javadoc of preCombine in HoodieRecordPayload to clarify 
> that this method pre-merges as-yet-unmerged records (e.g. from compaction) 
> with incoming records, before the merge with the existing record in the 
> dataset.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
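
The contract being documented can be illustrated with a minimal sketch (OrderedPayload is illustrative, not an actual HoodieRecordPayload implementation): among not-yet-merged records sharing a key, preCombine picks a winner, and only that winner is later merged with the record already in the dataset.

```java
// Hypothetical payload ordered by a precombine value (e.g. an event timestamp).
class OrderedPayload {
    final long orderingVal;
    final String data;

    OrderedPayload(long orderingVal, String data) {
        this.orderingVal = orderingVal;
        this.data = data;
    }

    // Called on two unmerged records with the same key (e.g. an incoming
    // record and an unmerged log record during compaction); the record with
    // the higher ordering value wins.
    OrderedPayload preCombine(OrderedPayload other) {
        return other.orderingVal > this.orderingVal ? other : this;
    }
}
```
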


[jira] [Reopened] (HUDI-2614) Remove duplicated hadoop-hdfs with tests classifier exists in bundles

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reopened HUDI-2614:
---

> Remove duplicated hadoop-hdfs with tests classifier exists in bundles
> -
>
> Key: HUDI-2614
> URL: https://issues.apache.org/jira/browse/HUDI-2614
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HUDI-1475) Fix documentation of preCombine to clarify when this API is used by Hudi

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reopened HUDI-1475:
---

> Fix documentation of preCombine to clarify when this API is used by Hudi 
> -
>
> Key: HUDI-1475
> URL: https://issues.apache.org/jira/browse/HUDI-1475
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Docs
>Affects Versions: 0.9.0
>Reporter: Balaji Varadarajan
>Assignee: Pratyaksh Sharma
>Priority: Major
>  Labels: pull-request-available, user-support-issues
> Fix For: 0.10.0
>
>
> We need to fix the Javadoc of preCombine in HoodieRecordPayload to clarify 
> that this method pre-merges as-yet-unmerged records (e.g. from compaction) 
> with incoming records, before the merge with the existing record in the 
> dataset.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2614) Remove duplicated hadoop-hdfs with tests classifier exists in bundles

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-2614.
---
Resolution: Fixed

> Remove duplicated hadoop-hdfs with tests classifier exists in bundles
> -
>
> Key: HUDI-2614
> URL: https://issues.apache.org/jira/browse/HUDI-2614
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3765: [HUDI-2533] New option for hoodieClusteringJob to check, rollback and re-execute the last failed clustering job

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3765:
URL: https://github.com/apache/hudi/pull/3765#issuecomment-938488397


   
   ## CI report:
   
   * 12e84c0d582fb0a958a352df5e72301828cbc6a6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3003)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3050)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3100)
 
   * 17d7c8e5206b0b61a52b094695bcdc47bdc9588d UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3897: [HUDI-2658] When disable auto clean, do not check if MIN_COMMITS_TO_KEEP was larger CLEANER_COMMITS_RETAINED or not.

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3897:
URL: https://github.com/apache/hudi/pull/3897#issuecomment-955953458


   
   ## CI report:
   
   * 3b77c9e8123fa8af3c0dc9aca8c3ec1568c9bd7c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3002)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3004)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3031)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3048)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3103)
 
   * 1e6c7fedbef87cab4843eab4831202db437118e0 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Reopened] (HUDI-2600) Remove duplicated hadoop-common with tests classifier exists in bundles

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reopened HUDI-2600:
---

> Remove duplicated hadoop-common with tests classifier exists in bundles
> ---
>
> Key: HUDI-2600
> URL: https://issues.apache.org/jira/browse/HUDI-2600
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Release & Administrative
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We found many duplicated dependencies in the generated dependency list, 
> `hadoop-common` is one of them:
> {code:java}
> hadoop-common/org.apache.hadoop/2.7.3//hadoop-common-2.7.3.jar
> hadoop-common/org.apache.hadoop/2.7.3/tests/hadoop-common-2.7.3-tests.jar
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2600) Remove duplicated hadoop-common with tests classifier exists in bundles

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-2600.
---
Resolution: Fixed

> Remove duplicated hadoop-common with tests classifier exists in bundles
> ---
>
> Key: HUDI-2600
> URL: https://issues.apache.org/jira/browse/HUDI-2600
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Release & Administrative
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> We found many duplicated dependencies in the generated dependency list, 
> `hadoop-common` is one of them:
> {code:java}
> hadoop-common/org.apache.hadoop/2.7.3//hadoop-common-2.7.3.jar
> hadoop-common/org.apache.hadoop/2.7.3/tests/hadoop-common-2.7.3-tests.jar
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2077) Flaky test: TestHoodieDeltaStreamer

2021-11-03 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437941#comment-17437941
 ] 

sivabalan narayanan commented on HUDI-2077:
---

[~xushiyan] [~codope]: can we close this out, or do we have anything more pending?

> Flaky test: TestHoodieDeltaStreamer
> ---
>
> Key: HUDI-2077
> URL: https://issues.apache.org/jira/browse/HUDI-2077
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: Raymond Xu
>Assignee: Sagar Sumit
>Priority: Critical
>  Labels: pull-request-available
> Attachments: 28.txt, hudi_2077_schema_mismatch.txt
>
>
> {code:java}
>  [INFO] Results:8520[INFO] 8521[ERROR] Errors: 8522[ERROR]   
> TestHoodieDeltaStreamer.testUpsertsMORContinuousModeWithMultipleWriters:716->testUpsertsContinuousModeWithMultipleWriters:831->runJobsInParallel:940
>  » Execution{code}
>  Search "testUpsertsMORContinuousModeWithMultipleWriters" in the log file for 
> details.
> {quote} 
> 1730667 [pool-1461-thread-1] WARN 
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer - Got error :
>  org.apache.hudi.exception.HoodieIOException: Could not check if 
> hdfs://localhost:4/user/vsts/continuous_mor_mulitwriter is a valid table 
>  at 
> org.apache.hudi.exception.TableNotFoundException.checkTableValidity(TableNotFoundException.java:59)
>  
>  at 
> org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:112)
>  
>  at 
> org.apache.hudi.common.table.HoodieTableMetaClient.<init>(HoodieTableMetaClient.java:73)
>  
>  at 
> org.apache.hudi.common.table.HoodieTableMetaClient$Builder.build(HoodieTableMetaClient.java:606)
>  
>  at 
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer$TestHelpers.assertAtleastNDeltaCommitsAfterCommit(TestHoodieDeltaStreamer.java:322)
>  
>  at 
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer.lambda$runJobsInParallel$8(TestHoodieDeltaStreamer.java:906)
>  
>  at 
> org.apache.hudi.utilities.functional.TestHoodieDeltaStreamer$TestHelpers.lambda$waitTillCondition$0(TestHoodieDeltaStreamer.java:347)
>  
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) 
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  
>  at java.lang.Thread.run(Thread.java:748) 
>  Caused by: java.net.ConnectException: Call From fv-az238-328/10.1.0.24 to 
> localhost:4 failed on connection exception: java.net.ConnectException: 
> Connection refused; For more details see: 
> [http://wiki.apache.org/hadoop/ConnectionRefused]
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HUDI-2592) NumberFormatException: Zero length BigInteger when write.precombine.field is decimal type

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reopened HUDI-2592:
---

> NumberFormatException: Zero length BigInteger when write.precombine.field is 
> decimal type
> -
>
> Key: HUDI-2592
> URL: https://issues.apache.org/jira/browse/HUDI-2592
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: Matrix42
>Assignee: Matrix42
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0, 0.11.0
>
>
> When write.precombine.field is of decimal type, the decimal is written as an empty 
> byte array, and reading it throws NumberFormatException: Zero length 
> BigInteger, as below:
> {code:java}
> 2021-10-20 17:14:03
> java.lang.NumberFormatException: Zero length BigInteger
> at java.math.BigInteger.<init>(BigInteger.java:302)
> at 
> org.apache.flink.table.data.DecimalData.fromUnscaledBytes(DecimalData.java:223)
> at 
> org.apache.flink.connectors.hudi.util.AvroToRowDataConverters.lambda$createDecimalConverter$4dc14f00$1(AvroToRowDataConverters.java:158)
> at 
> org.apache.flink.connectors.hudi.util.AvroToRowDataConverters.lambda$createNullableConverter$4568343a$1(AvroToRowDataConverters.java:94)
> at 
> org.apache.flink.connectors.hudi.util.AvroToRowDataConverters.lambda$createRowConverter$68595fbd$1(AvroToRowDataConverters.java:75)
> at 
> org.apache.flink.connectors.hudi.table.format.mor.MergeOnReadInputFormat$1.hasNext(MergeOnReadInputFormat.java:300)
> at 
> org.apache.flink.connectors.hudi.table.format.mor.MergeOnReadInputFormat$LogFileOnlyIterator.reachedEnd(MergeOnReadInputFormat.java:362)
> at 
> org.apache.flink.connectors.hudi.table.format.mor.MergeOnReadInputFormat.reachedEnd(MergeOnReadInputFormat.java:202)
> at 
> org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:90)
> at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
> at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
> at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:213)
> {code}
> Analysis:
>  
> HoodieAvroUtils.getNestedFieldVal will be invoked to extract the precombine field.
> It will then invoke convertValueForAvroLogicalTypes. When the field is of decimal 
> type, the ByteBuffer will be consumed, so we should rewind it.
> {code:java}
> private static Object convertValueForAvroLogicalTypes(Schema fieldSchema, 
> Object fieldValue) {
>   if (fieldSchema.getLogicalType() == LogicalTypes.date()) {
> return LocalDate.ofEpochDay(Long.parseLong(fieldValue.toString()));
>   } else if (fieldSchema.getLogicalType() instanceof LogicalTypes.Decimal) {
> Decimal dc = (Decimal) fieldSchema.getLogicalType();
> DecimalConversion decimalConversion = new DecimalConversion();
> if (fieldSchema.getType() == Schema.Type.FIXED) {
>   return decimalConversion.fromFixed((GenericFixed) fieldValue, 
> fieldSchema,
>   LogicalTypes.decimal(dc.getPrecision(), dc.getScale()));
> } else if (fieldSchema.getType() == Schema.Type.BYTES) {
>   
> // this method will consume the ByteBuffer
>   return decimalConversion.fromBytes((ByteBuffer) fieldValue, fieldSchema,
>   LogicalTypes.decimal(dc.getPrecision(), dc.getScale()));
> }
>   }
>   return fieldValue;
> }
> {code}
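The rewind fix described above can be sketched in isolation as follows. This is a minimal standalone sketch, not Hudi's actual patch; `fromBytes` here is a simplified stand-in for Avro's DecimalConversion.fromBytes, which consumes the buffer in the same way.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

public class RewindDemo {

    // Simplified stand-in for Avro's DecimalConversion.fromBytes: it reads all
    // remaining bytes, advancing the buffer's position to its limit.
    static BigDecimal fromBytes(ByteBuffer buffer, int scale) {
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    // Convert, then rewind so a later reader sees the same bytes again. Without
    // the rewind, a second read finds zero remaining bytes, and
    // `new BigInteger(new byte[0])` throws "Zero length BigInteger".
    static BigDecimal convertAndRewind(ByteBuffer buffer, int scale) {
        BigDecimal result = fromBytes(buffer, scale);
        buffer.rewind();
        return result;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new BigDecimal("123.45").unscaledValue().toByteArray());
        BigDecimal first = convertAndRewind(buf, 2);
        BigDecimal second = convertAndRewind(buf, 2); // succeeds thanks to rewind
        System.out.println(first + " " + second); // 123.45 123.45
    }
}
```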



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HUDI-2501) Refactor compaction actions in hudi-client module

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reopened HUDI-2501:
---

> Refactor compaction actions in hudi-client module
> -
>
> Key: HUDI-2501
> URL: https://issues.apache.org/jira/browse/HUDI-2501
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2501) Refactor compaction actions in hudi-client module

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-2501.
---
Resolution: Fixed

> Refactor compaction actions in hudi-client module
> -
>
> Key: HUDI-2501
> URL: https://issues.apache.org/jira/browse/HUDI-2501
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2592) NumberFormatException: Zero length BigInteger when write.precombine.field is decimal type

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-2592.
---
Resolution: Fixed

Something to keep in mind: for merged PRs, we should always "resolve" the Jira 
and not "close" it. "Close" is reserved for issues that are invalid or will not be 
worked on.

> NumberFormatException: Zero length BigInteger when write.precombine.field is 
> decimal type
> -
>
> Key: HUDI-2592
> URL: https://issues.apache.org/jira/browse/HUDI-2592
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: Matrix42
>Assignee: Matrix42
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0, 0.11.0
>
>
> When write.precombine.field is of decimal type, the decimal is written as an empty 
> byte array, and reading it throws NumberFormatException: Zero length 
> BigInteger, as below:
> {code:java}
> 2021-10-20 17:14:03
> java.lang.NumberFormatException: Zero length BigInteger
> at java.math.BigInteger.<init>(BigInteger.java:302)
> at 
> org.apache.flink.table.data.DecimalData.fromUnscaledBytes(DecimalData.java:223)
> at 
> org.apache.flink.connectors.hudi.util.AvroToRowDataConverters.lambda$createDecimalConverter$4dc14f00$1(AvroToRowDataConverters.java:158)
> at 
> org.apache.flink.connectors.hudi.util.AvroToRowDataConverters.lambda$createNullableConverter$4568343a$1(AvroToRowDataConverters.java:94)
> at 
> org.apache.flink.connectors.hudi.util.AvroToRowDataConverters.lambda$createRowConverter$68595fbd$1(AvroToRowDataConverters.java:75)
> at 
> org.apache.flink.connectors.hudi.table.format.mor.MergeOnReadInputFormat$1.hasNext(MergeOnReadInputFormat.java:300)
> at 
> org.apache.flink.connectors.hudi.table.format.mor.MergeOnReadInputFormat$LogFileOnlyIterator.reachedEnd(MergeOnReadInputFormat.java:362)
> at 
> org.apache.flink.connectors.hudi.table.format.mor.MergeOnReadInputFormat.reachedEnd(MergeOnReadInputFormat.java:202)
> at 
> org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:90)
> at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
> at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
> at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:213)
> {code}
> Analysis:
>  
> HoodieAvroUtils.getNestedFieldVal will be invoked to extract the precombine field.
> It will then invoke convertValueForAvroLogicalTypes. When the field is of decimal 
> type, the ByteBuffer will be consumed, so we should rewind it.
> {code:java}
> private static Object convertValueForAvroLogicalTypes(Schema fieldSchema, 
> Object fieldValue) {
>   if (fieldSchema.getLogicalType() == LogicalTypes.date()) {
> return LocalDate.ofEpochDay(Long.parseLong(fieldValue.toString()));
>   } else if (fieldSchema.getLogicalType() instanceof LogicalTypes.Decimal) {
> Decimal dc = (Decimal) fieldSchema.getLogicalType();
> DecimalConversion decimalConversion = new DecimalConversion();
> if (fieldSchema.getType() == Schema.Type.FIXED) {
>   return decimalConversion.fromFixed((GenericFixed) fieldValue, 
> fieldSchema,
>   LogicalTypes.decimal(dc.getPrecision(), dc.getScale()));
> } else if (fieldSchema.getType() == Schema.Type.BYTES) {
>   
> // this method will consume the ByteBuffer
>   return decimalConversion.fromBytes((ByteBuffer) fieldValue, fieldSchema,
>   LogicalTypes.decimal(dc.getPrecision(), dc.getScale()));
> }
>   }
>   return fieldValue;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2592) NumberFormatException: Zero length BigInteger when write.precombine.field is decimal type

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2592:
--
Fix Version/s: (was: 0.11.0)

> NumberFormatException: Zero length BigInteger when write.precombine.field is 
> decimal type
> -
>
> Key: HUDI-2592
> URL: https://issues.apache.org/jira/browse/HUDI-2592
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Common Core
>Reporter: Matrix42
>Assignee: Matrix42
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> When write.precombine.field is of decimal type, the decimal is written as an empty 
> byte array, and reading it throws NumberFormatException: Zero length 
> BigInteger, as below:
> {code:java}
> 2021-10-20 17:14:03
> java.lang.NumberFormatException: Zero length BigInteger
> at java.math.BigInteger.<init>(BigInteger.java:302)
> at 
> org.apache.flink.table.data.DecimalData.fromUnscaledBytes(DecimalData.java:223)
> at 
> org.apache.flink.connectors.hudi.util.AvroToRowDataConverters.lambda$createDecimalConverter$4dc14f00$1(AvroToRowDataConverters.java:158)
> at 
> org.apache.flink.connectors.hudi.util.AvroToRowDataConverters.lambda$createNullableConverter$4568343a$1(AvroToRowDataConverters.java:94)
> at 
> org.apache.flink.connectors.hudi.util.AvroToRowDataConverters.lambda$createRowConverter$68595fbd$1(AvroToRowDataConverters.java:75)
> at 
> org.apache.flink.connectors.hudi.table.format.mor.MergeOnReadInputFormat$1.hasNext(MergeOnReadInputFormat.java:300)
> at 
> org.apache.flink.connectors.hudi.table.format.mor.MergeOnReadInputFormat$LogFileOnlyIterator.reachedEnd(MergeOnReadInputFormat.java:362)
> at 
> org.apache.flink.connectors.hudi.table.format.mor.MergeOnReadInputFormat.reachedEnd(MergeOnReadInputFormat.java:202)
> at 
> org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:90)
> at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:100)
> at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:63)
> at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:213)
> {code}
> Analysis:
>  
> HoodieAvroUtils.getNestedFieldVal will be invoked to extract the precombine field.
> It will then invoke convertValueForAvroLogicalTypes. When the field is of decimal 
> type, the ByteBuffer will be consumed, so we should rewind it.
> {code:java}
> private static Object convertValueForAvroLogicalTypes(Schema fieldSchema, 
> Object fieldValue) {
>   if (fieldSchema.getLogicalType() == LogicalTypes.date()) {
> return LocalDate.ofEpochDay(Long.parseLong(fieldValue.toString()));
>   } else if (fieldSchema.getLogicalType() instanceof LogicalTypes.Decimal) {
> Decimal dc = (Decimal) fieldSchema.getLogicalType();
> DecimalConversion decimalConversion = new DecimalConversion();
> if (fieldSchema.getType() == Schema.Type.FIXED) {
>   return decimalConversion.fromFixed((GenericFixed) fieldValue, 
> fieldSchema,
>   LogicalTypes.decimal(dc.getPrecision(), dc.getScale()));
> } else if (fieldSchema.getType() == Schema.Type.BYTES) {
>   
> // this method will consume the ByteBuffer
>   return decimalConversion.fromBytes((ByteBuffer) fieldValue, fieldSchema,
>   LogicalTypes.decimal(dc.getPrecision(), dc.getScale()));
> }
>   }
>   return fieldValue;
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [hudi] hudi-bot edited a comment on pull request #3765: [HUDI-2533] New option for hoodieClusteringJob to check, rollback and re-execute the last failed clustering job

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3765:
URL: https://github.com/apache/hudi/pull/3765#issuecomment-938488397


   
   ## CI report:
   
   * 12e84c0d582fb0a958a352df5e72301828cbc6a6 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3003)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3050)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3100)
 
   * 17d7c8e5206b0b61a52b094695bcdc47bdc9588d Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3108)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3897: [HUDI-2658] When disable auto clean, do not check if MIN_COMMITS_TO_KEEP was larger CLEANER_COMMITS_RETAINED or not.

2021-11-03 Thread GitBox


hudi-bot edited a comment on pull request #3897:
URL: https://github.com/apache/hudi/pull/3897#issuecomment-955953458


   
   ## CI report:
   
   * 3b77c9e8123fa8af3c0dc9aca8c3ec1568c9bd7c Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3002)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3004)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3031)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3048)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3103)
 
   * 1e6c7fedbef87cab4843eab4831202db437118e0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=3109)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Reopened] (HUDI-2507) Generate more dependency list file for other bundles

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reopened HUDI-2507:
---

> Generate more dependency list file for other bundles
> 
>
> Key: HUDI-2507
> URL: https://issues.apache.org/jira/browse/HUDI-2507
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Usability
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2507) Generate more dependency list file for other bundles

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-2507.
---
Resolution: Fixed

> Generate more dependency list file for other bundles
> 
>
> Key: HUDI-2507
> URL: https://issues.apache.org/jira/browse/HUDI-2507
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Usability
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2482) Support drop partitions SQL

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-2482.
---
Resolution: Fixed

> Support drop partitions SQL
> ---
>
> Key: HUDI-2482
> URL: https://issues.apache.org/jira/browse/HUDI-2482
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Yann Byron
>Assignee: Yann Byron
>Priority: Major
>  Labels: features, pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HUDI-2482) Support drop partitions SQL

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reopened HUDI-2482:
---

> Support drop partitions SQL
> ---
>
> Key: HUDI-2482
> URL: https://issues.apache.org/jira/browse/HUDI-2482
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Yann Byron
>Assignee: Yann Byron
>Priority: Major
>  Labels: features, pull-request-available
> Fix For: 0.10.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2561) Avoid using InetAddress.getLocalHost() when logging info messages

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-2561.
---
Fix Version/s: 0.10.0
   Resolution: Fixed

> Avoid using InetAddress.getLocalHost() when logging info messages
> -
>
> Key: HUDI-2561
> URL: https://issues.apache.org/jira/browse/HUDI-2561
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> InetAddress.getLocalHost() can take as much as 30+ seconds if the network 
> configuration is not done right. This might be due to the local hostname's IPv6 
> address missing from the /etc/hosts file, or to network configs slowing down 
> IPv6 name resolution. If this API is used to log verbose messages, 
> especially in a hot code path, it can cause order-of-magnitude slowness in 
> the overall task. 
> Sample test case showing this slowness: 
> TestCleaner#testBulkInsertAndCleanByVersions
> Since we can't guarantee correct network settings for the local hostname in 
> all setups, it's better to avoid InetAddress.getLocalHost() wherever 
> possible, especially when logging verbose status or debug details. 
>  
> Here are the codes that are currently using InetAddress.getLocalHost():
>  # BitCaskDiskMap (this is in the hot code path)
>  # HoodieWithTimelineServer 
>  
>  
>  
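A common mitigation for this kind of slowness is to resolve the local hostname once and reuse the result. The sketch below shows the general technique only; the class and method names are illustrative, not Hudi's actual fix.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostNameCache {

    // Cached result of the first (potentially slow) resolution.
    private static volatile String cachedHostName;

    // Resolve the local host name once and reuse it; getLocalHost() can block
    // for tens of seconds under misconfigured DNS/IPv6 setups, so it must not
    // be called on every log statement in a hot code path.
    public static String hostName() {
        String name = cachedHostName;
        if (name == null) {
            synchronized (HostNameCache.class) {
                name = cachedHostName;
                if (name == null) {
                    try {
                        name = InetAddress.getLocalHost().getHostName();
                    } catch (UnknownHostException e) {
                        // Fall back rather than fail hot-path logging.
                        name = "unknown-host";
                    }
                    cachedHostName = name;
                }
            }
        }
        return name;
    }

    public static void main(String[] args) {
        // Only the first call may be slow; subsequent calls return the cache.
        System.out.println(hostName());
    }
}
```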



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2561) Avoid using InetAddress.getLocalHost() when logging info messages

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2561:
--
Status: In Progress  (was: Open)

> Avoid using InetAddress.getLocalHost() when logging info messages
> -
>
> Key: HUDI-2561
> URL: https://issues.apache.org/jira/browse/HUDI-2561
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
>Priority: Minor
>  Labels: pull-request-available
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> InetAddress.getLocalHost() can take as much as 30+ seconds if the network 
> configuration is not done right. This might be due to the local hostname's IPv6 
> address missing from the /etc/hosts file, or to network configs slowing down 
> IPv6 name resolution. If this API is used to log verbose messages, 
> especially in a hot code path, it can cause order-of-magnitude slowness in 
> the overall task. 
> Sample test case showing this slowness: 
> TestCleaner#testBulkInsertAndCleanByVersions
> Since we can't guarantee correct network settings for the local hostname in 
> all setups, it's better to avoid InetAddress.getLocalHost() wherever 
> possible, especially when logging verbose status or debug details. 
>  
> Here are the codes that are currently using InetAddress.getLocalHost():
>  # BitCaskDiskMap (this is in the hot code path)
>  # HoodieWithTimelineServer 
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2435) Tuning clustering job handle errors

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-2435.
---
  Assignee: Satish Kotha
Resolution: Fixed

> Tuning clustering job handle errors
> ---
>
> Key: HUDI-2435
> URL: https://issues.apache.org/jira/browse/HUDI-2435
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Assignee: Satish Kotha
>Priority: Major
>  Labels: pull-request-available
>
> Before a clustering job / async clustering finishes, Hudi performs an error 
> check using the JavaRDD writeResponse.
> This is a Spark collect action; if an executor crashes and the 
> cache of the JavaRDD is lost, the collect action will 
> trigger a full recompute and create unexpected marker files or data files.
> We should use the Option commitMetadata for the error-handling 
> action instead.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HUDI-2435) Tuning clustering job handle errors

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2435:
--
Status: In Progress  (was: Open)

> Tuning clustering job handle errors
> ---
>
> Key: HUDI-2435
> URL: https://issues.apache.org/jira/browse/HUDI-2435
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Yue Zhang
>Priority: Major
>  Labels: pull-request-available
>
> Before a clustering job / async clustering finishes, Hudi performs an error 
> check using the JavaRDD writeResponse.
> This is a Spark collect action; if an executor crashes and the 
> cache of the JavaRDD is lost, the collect action will 
> trigger a full recompute and create unexpected marker files or data files.
> We should use the Option commitMetadata for the error-handling 
> action instead.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Reopened] (HUDI-2496) Inserts are precombined even with dedup disabled

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reopened HUDI-2496:
---

> Inserts are precombined even with dedup disabled
> 
>
> Key: HUDI-2496
> URL: https://issues.apache.org/jira/browse/HUDI-2496
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Writer Core
>Reporter: Sagar Sumit
>Assignee: Helias Antoniou
>Priority: Critical
>  Labels: pull-request-available, sev:critical
> Fix For: 0.10.0
>
>
> Original GH issue https://github.com/apache/hudi/issues/3709
> Test case by [~xushiyan] : [https://github.com/apache/hudi/pull/3723/files]
> RCA by [~shivnarayan] :
> Within HoodieMergeHandle, we use a HashMap to store incoming records, where 
> the keys are record keys.
> So in the 1st batch duplicates remain intact, but in the 2nd 
> batch only unique records are considered before being concatenated with the 1st batch.
>  
> [https://github.com/apache/hudi/blob/36be28712196ff4427c41b0aa885c7fcd7356d7f/hudi-[]…]-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
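The dedup effect of keying a map by record key can be illustrated with a minimal sketch. The names below are illustrative, not Hudi's actual HoodieMergeHandle code: a map keyed by record key keeps at most one value per key, so duplicates in the incoming batch are silently dropped.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DedupByKeyDemo {

    // Keying a map by record key keeps only one value per key: a later record
    // with a duplicate key silently overwrites the earlier one.
    static Map<String, String> keyByRecordKey(List<String[]> incoming) {
        Map<String, String> keyed = new LinkedHashMap<>();
        for (String[] record : incoming) {
            keyed.put(record[0], record[1]); // record[0] = key, record[1] = value
        }
        return keyed;
    }

    public static void main(String[] args) {
        List<String[]> batch = List.of(
                new String[]{"k1", "v1"},
                new String[]{"k1", "v2"}, // duplicate record key
                new String[]{"k2", "v3"});
        // 3 incoming records collapse to 2 map entries: the duplicate is lost.
        System.out.println(keyByRecordKey(batch).size()); // 2
    }
}
```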



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HUDI-2496) Inserts are precombined even with dedup disabled

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-2496.
---
Resolution: Fixed

Something to keep in mind: for merged PRs, we should always "resolve" the Jira 
and not "close" it. "Close" is reserved for issues that are invalid or will not be 
worked on.

> Inserts are precombined even with dedup disabled
> 
>
> Key: HUDI-2496
> URL: https://issues.apache.org/jira/browse/HUDI-2496
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Writer Core
>Reporter: Sagar Sumit
>Assignee: Helias Antoniou
>Priority: Critical
>  Labels: pull-request-available, sev:critical
> Fix For: 0.10.0
>
>
> Original GH issue https://github.com/apache/hudi/issues/3709
> Test case by [~xushiyan] : [https://github.com/apache/hudi/pull/3723/files]
> RCA by [~shivnarayan] :
> Within HoodieMergeHandle, we use a HashMap to store incoming records, where 
> the keys are record keys.
> So in the 1st batch duplicates remain intact, but in the 2nd 
> batch only unique records are considered before being concatenated with the 1st batch.
>  
> [https://github.com/apache/hudi/blob/36be28712196ff4427c41b0aa885c7fcd7356d7f/hudi-[]…]-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HUDI-2496) Inserts are precombined even with dedup disabled

2021-11-03 Thread sivabalan narayanan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17437947#comment-17437947
 ] 

sivabalan narayanan commented on HUDI-2496:
---

Fixed via master: 
https://github.com/apache/hudi/commit/ceace1c653a3ce3c97e6ee5a244d71ff1806be4f

> Inserts are precombined even with dedup disabled
> 
>
> Key: HUDI-2496
> URL: https://issues.apache.org/jira/browse/HUDI-2496
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: Writer Core
>Reporter: Sagar Sumit
>Assignee: Helias Antoniou
>Priority: Critical
>  Labels: pull-request-available, sev:critical
> Fix For: 0.10.0
>
>
> Original GH issue https://github.com/apache/hudi/issues/3709
> Test case by [~xushiyan] : [https://github.com/apache/hudi/pull/3723/files]
> RCA by [~shivnarayan] :
> Within HoodieMergeHandle, we use a HashMap to store incoming records, where 
> the keys are record keys.
> So in the 1st batch duplicates remain intact, but in the 2nd 
> batch only unique records are considered before being concatenated with the 1st batch.
>  
> [https://github.com/apache/hudi/blob/36be28712196ff4427c41b0aa885c7fcd7356d7f/hudi-[]…]-common/src/main/java/org/apache/hudi/io/HoodieMergeHandle.java
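
The key-based dedup described in the RCA can be sketched as follows. This is a hypothetical, simplified illustration, not the actual HoodieMergeHandle code; the `key:value` string record format and the `merge` helper are invented for the example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeDedupSketch {

    // Simplified stand-in for the merge path: records from storage (1st batch)
    // are appended as-is, while incoming records (2nd batch) first pass through
    // a map keyed by record key, so duplicate keys collapse to one entry.
    public static List<String> merge(List<String> firstBatch, List<String> secondBatch) {
        Map<String, String> incoming = new LinkedHashMap<>();
        for (String rec : secondBatch) {
            String key = rec.split(":")[0];  // record key is the part before ':'
            incoming.put(key, rec);          // a later duplicate overwrites the earlier one
        }
        List<String> merged = new ArrayList<>(firstBatch); // duplicates survive here
        merged.addAll(incoming.values());                  // only unique keys survive here
        return merged;
    }

    public static void main(String[] args) {
        // Both k1 records from the 1st batch are kept,
        // but the duplicate k2 in the 2nd batch collapses.
        System.out.println(merge(
            Arrays.asList("k1:a", "k1:a2"),
            Arrays.asList("k2:b", "k2:b2", "k3:c")));
        // [k1:a, k1:a2, k2:b2, k3:c]
    }
}
```

This mirrors why dedup-disabled inserts still look "precombined": the map keyed by record key silently drops duplicates from the incoming batch.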





[jira] [Updated] (HUDI-2530) Add async compaction support to integ test suite infra

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2530:
--
Status: Closed  (was: Patch Available)

> Add async compaction support to integ test suite infra
> --
>
> Key: HUDI-2530
> URL: https://issues.apache.org/jira/browse/HUDI-2530
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>
> Add async compaction support to integ test suite infra





[jira] [Reopened] (HUDI-2530) Add async compaction support to integ test suite infra

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reopened HUDI-2530:
---

> Add async compaction support to integ test suite infra
> --
>
> Key: HUDI-2530
> URL: https://issues.apache.org/jira/browse/HUDI-2530
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Testing
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: pull-request-available
>
> Add async compaction support to integ test suite infra





[jira] [Updated] (HUDI-2513) Refactor upgrade and downgrade in hudi-client module

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2513:
--
Status: In Progress  (was: Open)

> Refactor upgrade and downgrade in hudi-client module
> 
>
> Key: HUDI-2513
> URL: https://issues.apache.org/jira/browse/HUDI-2513
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[jira] [Reopened] (HUDI-2456) Support show partitions SQL

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reopened HUDI-2456:
---

> Support show partitions SQL
> ---
>
> Key: HUDI-2456
> URL: https://issues.apache.org/jira/browse/HUDI-2456
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Spark Integration
>Reporter: Yann Byron
>Assignee: Yann Byron
>Priority: Major
>  Labels: features, pull-request-available
> Fix For: 0.10.0
>
>
> Spark SQL supports the following syntax to show a Hudi table's partitions.
> {code:java}
> SHOW PARTITIONS tableIdentifier partitionSpec?{code}
>  
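
For illustration, usage against the `h0` table from HUDI-1885 might look like the following. This is a hypothetical sketch: the partition column `dt` is invented for the example, and the output shape depends on how the table is actually partitioned:

```sql
-- list all partitions of a partitioned Hudi table
SHOW PARTITIONS h0;

-- narrow the listing with the optional partitionSpec
SHOW PARTITIONS h0 PARTITION (dt='2021-01-01');
```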





[jira] [Resolved] (HUDI-2513) Refactor upgrade and downgrade in hudi-client module

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-2513.
---
Resolution: Fixed

> Refactor upgrade and downgrade in hudi-client module
> 
>
> Key: HUDI-2513
> URL: https://issues.apache.org/jira/browse/HUDI-2513
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Code Cleanup
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[jira] [Resolved] (HUDI-2440) Add dependency change diff script for dependency governance

2021-11-03 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan resolved HUDI-2440.
---
Resolution: Fixed

> Add dependency change diff script for dependency governance
> --
>
> Key: HUDI-2440
> URL: https://issues.apache.org/jira/browse/HUDI-2440
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: Usability, Utilities
>Reporter: vinoyang
>Assignee: vinoyang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Currently, hudi's dependency management is chaotic, e.g. for 
> `hudi-spark-bundle_2.11`, the dependency list is here:
> {code:java}
> HikariCP/2.5.1//HikariCP-2.5.1.jar
> ST4/4.0.4//ST4-4.0.4.jar
> aircompressor/0.15//aircompressor-0.15.jar
> annotations/17.0.0//annotations-17.0.0.jar
> ant-launcher/1.9.1//ant-launcher-1.9.1.jar
> ant/1.6.5//ant-1.6.5.jar
> ant/1.9.1//ant-1.9.1.jar
> antlr-runtime/3.5.2//antlr-runtime-3.5.2.jar
> aopalliance/1.0//aopalliance-1.0.jar
> apache-curator/2.7.1//apache-curator-2.7.1.pom
> apacheds-i18n/2.0.0-M15//apacheds-i18n-2.0.0-M15.jar
> apacheds-kerberos-codec/2.0.0-M15//apacheds-kerberos-codec-2.0.0-M15.jar
> api-asn1-api/1.0.0-M20//api-asn1-api-1.0.0-M20.jar
> api-util/1.0.0-M20//api-util-1.0.0-M20.jar
> asm/3.1//asm-3.1.jar
> avatica-metrics/1.8.0//avatica-metrics-1.8.0.jar
> avatica/1.8.0//avatica-1.8.0.jar
> avro/1.8.2//avro-1.8.2.jar
> bonecp/0.8.0.RELEASE//bonecp-0.8.0.RELEASE.jar
> calcite-core/1.10.0//calcite-core-1.10.0.jar
> calcite-druid/1.10.0//calcite-druid-1.10.0.jar
> calcite-linq4j/1.10.0//calcite-linq4j-1.10.0.jar
> commons-beanutils-core/1.8.0//commons-beanutils-core-1.8.0.jar
> commons-beanutils/1.7.0//commons-beanutils-1.7.0.jar
> commons-cli/1.2//commons-cli-1.2.jar
> commons-codec/1.4//commons-codec-1.4.jar
> commons-collections/3.2.2//commons-collections-3.2.2.jar
> commons-compiler/2.7.6//commons-compiler-2.7.6.jar
> commons-compress/1.9//commons-compress-1.9.jar
> commons-configuration/1.6//commons-configuration-1.6.jar
> commons-daemon/1.0.13//commons-daemon-1.0.13.jar
> commons-dbcp/1.4//commons-dbcp-1.4.jar
> commons-digester/1.8//commons-digester-1.8.jar
> commons-el/1.0//commons-el-1.0.jar
> commons-httpclient/3.1//commons-httpclient-3.1.jar
> commons-io/2.4//commons-io-2.4.jar
> commons-lang/2.6//commons-lang-2.6.jar
> commons-lang3/3.1//commons-lang3-3.1.jar
> commons-logging/1.2//commons-logging-1.2.jar
> commons-math/2.2//commons-math-2.2.jar
> commons-math3/3.1.1//commons-math3-3.1.1.jar
> commons-net/3.1//commons-net-3.1.jar
> commons-pool/1.5.4//commons-pool-1.5.4.jar
> curator-client/2.7.1//curator-client-2.7.1.jar
> curator-framework/2.7.1//curator-framework-2.7.1.jar
> curator-recipes/2.7.1//curator-recipes-2.7.1.jar
> datanucleus-api-jdo/4.2.4//datanucleus-api-jdo-4.2.4.jar
> datanucleus-core/4.1.17//datanucleus-core-4.1.17.jar
> datanucleus-rdbms/4.1.19//datanucleus-rdbms-4.1.19.jar
> derby/10.10.2.0//derby-10.10.2.0.jar
> disruptor/3.3.0//disruptor-3.3.0.jar
> dropwizard-metrics-hadoop-metrics2-reporter/0.1.2//dropwizard-metrics-hadoop-metrics2-reporter-0.1.2.jar
> eigenbase-properties/1.1.5//eigenbase-properties-1.1.5.jar
> fastutil/7.0.13//fastutil-7.0.13.jar
> findbugs-annotations/1.3.9-1//findbugs-annotations-1.3.9-1.jar
> fluent-hc/4.4.1//fluent-hc-4.4.1.jar
> groovy-all/2.4.4//groovy-all-2.4.4.jar
> gson/2.3.1//gson-2.3.1.jar
> guava/14.0.1//guava-14.0.1.jar
> guice-assistedinject/3.0//guice-assistedinject-3.0.jar
> guice-servlet/3.0//guice-servlet-3.0.jar
> guice/3.0//guice-3.0.jar
> hadoop-annotations/2.7.3//hadoop-annotations-2.7.3.jar
> hadoop-auth/2.7.3//hadoop-auth-2.7.3.jar
> hadoop-client/2.7.3//hadoop-client-2.7.3.jar
> hadoop-common/2.7.3//hadoop-common-2.7.3.jar
> hadoop-common/2.7.3/tests/hadoop-common-2.7.3-tests.jar
> hadoop-hdfs/2.7.3//hadoop-hdfs-2.7.3.jar
> hadoop-hdfs/2.7.3/tests/hadoop-hdfs-2.7.3-tests.jar
> hadoop-mapreduce-client-app/2.7.3//hadoop-mapreduce-client-app-2.7.3.jar
> hadoop-mapreduce-client-common/2.7.3//hadoop-mapreduce-client-common-2.7.3.jar
> hadoop-mapreduce-client-core/2.7.3//hadoop-mapreduce-client-core-2.7.3.jar
> hadoop-mapreduce-client-jobclient/2.7.3//hadoop-mapreduce-client-jobclient-2.7.3.jar
> hadoop-mapreduce-client-shuffle/2.7.3//hadoop-mapreduce-client-shuffle-2.7.3.jar
> hadoop-yarn-api/2.7.3//hadoop-yarn-api-2.7.3.jar
> hadoop-yarn-client/2.7.3//hadoop-yarn-client-2.7.3.jar
> hadoop-yarn-common/2.7.3//hadoop-yarn-common-2.7.3.jar
> hadoop-yarn-registry/2.7.1//hadoop-yarn-registry-2.7.1.jar
> hadoop-yarn-server-applicationhistoryservice/2.7.2//hadoop-yarn-server-applicationhistoryservice-2.7.2.jar
> hadoop-yarn-server-common/2.7.2//hadoop-yarn-server-common-2.7.2.jar
> hadoop-yarn-server-resourcemanager/2.7.2//hadoop-yarn-server-resourcemana
