[jira] [Resolved] (HUDI-1820) Remove legacy code for Flink writer

2021-10-14 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-1820.
--
Resolution: Invalid

The legacy code has already been removed.

> Remove legacy code for Flink writer
> ---
>
> Key: HUDI-1820
> URL: https://issues.apache.org/jira/browse/HUDI-1820
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
> Fix For: 0.10.0
>
>
> Remove legacy code to avoid confusion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Assigned] (HUDI-2559) Ensure unique timestamps are generated for commit times with concurrent writers

2021-10-14 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-2559:
-

Assignee: sivabalan narayanan

> Ensure unique timestamps are generated for commit times with concurrent 
> writers
> ---
>
> Key: HUDI-2559
> URL: https://issues.apache.org/jira/browse/HUDI-2559
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>
> Ensure unique timestamps are generated for commit times with concurrent 
> writers.
> This is the piece of code in HoodieActiveTimeline that creates a new commit
> time.
> {code:java}
> public static String createNewInstantTime(long milliseconds) {
>   return lastInstantTime.updateAndGet((oldVal) -> {
>     String newCommitTime;
>     do {
>       newCommitTime = HoodieActiveTimeline.COMMIT_FORMATTER.format(new Date(System.currentTimeMillis() + milliseconds));
>     } while (HoodieTimeline.compareTimestamps(newCommitTime, LESSER_THAN_OR_EQUALS, oldVal));
>     return newCommitTime;
>   });
> }
> {code}
> There is a chance that a DeltaStreamer and a concurrent Spark datasource writer
> get the same timestamp, and one of them fails.
> Related GitHub issues and Jiras:
> [https://github.com/apache/hudi/issues/3782]
> https://issues.apache.org/jira/browse/HUDI-2549





[jira] [Created] (HUDI-2559) Ensure unique timestamps are generated for commit times with concurrent writers

2021-10-14 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-2559:
-

 Summary: Ensure unique timestamps are generated for commit times 
with concurrent writers
 Key: HUDI-2559
 URL: https://issues.apache.org/jira/browse/HUDI-2559
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: sivabalan narayanan


Ensure unique timestamps are generated for commit times with concurrent writers.

This is the piece of code in HoodieActiveTimeline that creates a new commit
time.
{code:java}
public static String createNewInstantTime(long milliseconds) {
  return lastInstantTime.updateAndGet((oldVal) -> {
    String newCommitTime;
    do {
      newCommitTime = HoodieActiveTimeline.COMMIT_FORMATTER.format(new Date(System.currentTimeMillis() + milliseconds));
    } while (HoodieTimeline.compareTimestamps(newCommitTime, LESSER_THAN_OR_EQUALS, oldVal));
    return newCommitTime;
  });
}
{code}
There is a chance that a DeltaStreamer and a concurrent Spark datasource writer
get the same timestamp, and one of them fails.

Related GitHub issues and Jiras:

[https://github.com/apache/hudi/issues/3782]

https://issues.apache.org/jira/browse/HUDI-2549
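The guard in the snippet above only orders instant times handed out through the same `lastInstantTime`, which is process-local state, so two separate writer JVMs reading the clock within the same second can still produce the same value. The minimal Java sketch below reproduces that under stated assumptions: `Writer` is a hypothetical stand-in for one writer process, the wall clock is simulated (starting at 2021-10-14T00:00:00Z and advancing 1 ms per read), and the commit formatter is assumed to use second granularity (`yyyyMMddHHmmss`, UTC). It is an illustration, not Hudi's implementation.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.LongSupplier;

public final class InstantTimeCollision {

  // Assumed second-granularity instant format, rendered in UTC so the demo is deterministic.
  static String format(long epochMillis) {
    // New instance per call: SimpleDateFormat is not thread-safe.
    SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmss");
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
    return fmt.format(new Date(epochMillis));
  }

  /** Hypothetical stand-in for one writer process; each process has its own lastInstantTime. */
  static final class Writer {
    private final AtomicReference<String> lastInstantTime = new AtomicReference<>("0"); // illustrative sentinel
    private final LongSupplier clock;

    Writer(LongSupplier clock) {
      this.clock = clock;
    }

    String createNewInstantTime() {
      // Same shape as the snippet: keep reading the clock until the formatted time
      // is strictly greater than the last instant handed out by THIS writer only.
      return lastInstantTime.updateAndGet(oldVal -> {
        String newCommitTime;
        do {
          newCommitTime = format(clock.getAsLong());
        } while (newCommitTime.compareTo(oldVal) <= 0);
        return newCommitTime;
      });
    }
  }

  public static void main(String[] args) {
    // Simulated wall clock shared by both writers: 2021-10-14T00:00:00Z, advancing 1 ms per read.
    AtomicLong wallClock = new AtomicLong(1_634_169_600_000L);
    LongSupplier clock = wallClock::getAndIncrement;

    Writer deltaStreamer = new Writer(clock);
    Writer sparkDsWriter = new Writer(clock);

    String a = deltaStreamer.createNewInstantTime();
    String b = sparkDsWriter.createNewInstantTime();
    System.out.println(a + " / " + b + " collide: " + a.equals(b)); // 20211014000000 / 20211014000000 collide: true

    // Within one writer the guard does work: the next instant is strictly later.
    String next = deltaStreamer.createNewInstantTime();
    System.out.println(a + " -> " + next); // 20211014000000 -> 20211014000001
  }
}
```

Both writers read the clock within the same second and receive `20211014000000`; the second call on the same writer spins until the simulated clock reaches the next second, which matches the per-process behavior described in this issue.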

 

 

 





[jira] [Updated] (HUDI-2558) Clustering w/ sort columns with null values fails

2021-10-14 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2558:
--
Fix Version/s: 0.10.0

> Clustering w/ sort columns with null values fails
> -
>
> Key: HUDI-2558
> URL: https://issues.apache.org/jira/browse/HUDI-2558
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.10.0
>
>
> https://github.com/apache/hudi/issues/3766





[jira] [Assigned] (HUDI-2558) Clustering w/ sort columns with null values fails

2021-10-14 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan reassigned HUDI-2558:
-

Assignee: sivabalan narayanan

> Clustering w/ sort columns with null values fails
> -
>
> Key: HUDI-2558
> URL: https://issues.apache.org/jira/browse/HUDI-2558
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>
> https://github.com/apache/hudi/issues/3766





[jira] [Created] (HUDI-2558) Clustering w/ sort columns with null values fails

2021-10-14 Thread sivabalan narayanan (Jira)
sivabalan narayanan created HUDI-2558:
-

 Summary: Clustering w/ sort columns with null values fails
 Key: HUDI-2558
 URL: https://issues.apache.org/jira/browse/HUDI-2558
 Project: Apache Hudi
  Issue Type: Improvement
  Components: Writer Core
Reporter: sivabalan narayanan


https://github.com/apache/hudi/issues/3766





[jira] [Updated] (HUDI-2558) Clustering w/ sort columns with null values fails

2021-10-14 Thread sivabalan narayanan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan updated HUDI-2558:
--
Labels: sev:high user-support-issues  (was: )

> Clustering w/ sort columns with null values fails
> -
>
> Key: HUDI-2558
> URL: https://issues.apache.org/jira/browse/HUDI-2558
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: Writer Core
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
>  Labels: sev:high, user-support-issues
> Fix For: 0.10.0
>
>
> https://github.com/apache/hudi/issues/3766





[GitHub] [hudi] hudi-bot edited a comment on pull request #3794: [HUDI-2553] Metadata table compaction trigger max delta commits default config (re-enable)

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3794:
URL: https://github.com/apache/hudi/pull/3794#issuecomment-942793927


   
   ## CI report:
   
   * 31852dac3234f80b094392197a34ac5704f2e784 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2631)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2653)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] manojpec commented on pull request #3794: [HUDI-2553] Metadata table compaction trigger max delta commits default config (re-enable)

2021-10-14 Thread GitBox


manojpec commented on pull request #3794:
URL: https://github.com/apache/hudi/pull/3794#issuecomment-943505633


   @hudi-bot run azure






[jira] [Updated] (HUDI-2557) Shade javax.servlet for flink bundle jar

2021-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2557:
-
Labels: pull-request-available  (was: )

> Shade javax.servlet for flink bundle jar
> 
>
> Key: HUDI-2557
> URL: https://issues.apache.org/jira/browse/HUDI-2557
> Project: Apache Hudi
>  Issue Type: Test
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[GitHub] [hudi] yiduwangkai commented on pull request #3798: [HUDI-2557] Shade javax.servlet for flink bundle jar

2021-10-14 Thread GitBox


yiduwangkai commented on pull request #3798:
URL: https://github.com/apache/hudi/pull/3798#issuecomment-943494883


   @danny0405 Sorry, I don't know how to avoid submitting code that someone else has already submitted.






[GitHub] [hudi] hudi-bot edited a comment on pull request #3799: [HUDI-2491] hoodie.datasource.hive_sync.mode=hms mode is supported in…

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3799:
URL: https://github.com/apache/hudi/pull/3799#issuecomment-943176004


   
   ## CI report:
   
   * f44907a941b5b61e642abb5783f70fe8830fe6a6 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2644)
 
   * a8b2f1a63fc3cb1f4fe99495070d1d160bba4031 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2652)
 
   
   




[GitHub] [hudi] hudi-bot edited a comment on pull request #3799: [HUDI-2491] hoodie.datasource.hive_sync.mode=hms mode is supported in…

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3799:
URL: https://github.com/apache/hudi/pull/3799#issuecomment-943176004


   
   ## CI report:
   
   * f44907a941b5b61e642abb5783f70fe8830fe6a6 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2644)
 
   * a8b2f1a63fc3cb1f4fe99495070d1d160bba4031 UNKNOWN
   
   




[GitHub] [hudi] leesf commented on a change in pull request #3754: [HUDI-2482] support 'drop partition' sql

2021-10-14 Thread GitBox


leesf commented on a change in pull request #3754:
URL: https://github.com/apache/hudi/pull/3754#discussion_r729045601



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/HoodieSqlUtils.scala
##
@@ -92,7 +93,45 @@ object HoodieSqlUtils extends SparkAdapterSupport {
       properties.putAll((spark.sessionState.conf.getAllConfs ++ table.storage.properties).asJava)
       HoodieMetadataConfig.newBuilder.fromProperties(properties).build()
     }
-    FSUtils.getAllPartitionPaths(sparkEngine, metadataConfig, HoodieSqlUtils.getTableLocation(table, spark)).asScala
+    FSUtils.getAllPartitionPaths(sparkEngine, metadataConfig, getTableLocation(table, spark)).asScala
+  }
+
+  /**
+   * This method is used to compatible with the old non-hive-styled partition table.
+   * By default we enable the "hoodie.datasource.write.hive_style_partitioning"
+   * when writing data to hudi table by spark sql by default.
+   * If the exist table is a non-hive-styled partitioned table, we should
+   * disable the "hoodie.datasource.write.hive_style_partitioning" when
+   * merge or update the table. Or else, we will get an incorrect merge result
+   * as the partition path mismatch.
+   */
+  def isHiveStylePartitionPartitioning(partitionPaths: Seq[String], table: CatalogTable): Boolean = {

Review comment:
   rename to isHiveStyledPartitioning








[GitHub] [hudi] leesf commented on a change in pull request #3754: [HUDI-2482] support 'drop partition' sql

2021-10-14 Thread GitBox


leesf commented on a change in pull request #3754:
URL: https://github.com/apache/hudi/pull/3754#discussion_r729045075



##
File path: 
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/HoodieSqlUtils.scala
##
@@ -92,7 +93,45 @@ object HoodieSqlUtils extends SparkAdapterSupport {
       properties.putAll((spark.sessionState.conf.getAllConfs ++ table.storage.properties).asJava)
       HoodieMetadataConfig.newBuilder.fromProperties(properties).build()
     }
-    FSUtils.getAllPartitionPaths(sparkEngine, metadataConfig, HoodieSqlUtils.getTableLocation(table, spark)).asScala
+    FSUtils.getAllPartitionPaths(sparkEngine, metadataConfig, getTableLocation(table, spark)).asScala
+  }
+
+  /**
+   * This method is used to compatible with the old non-hive-styled partition table.
+   * By default we enable the "hoodie.datasource.write.hive_style_partitioning"
+   * when writing data to hudi table by spark sql by default.
+   * If the exist table is a non-hive-styled partitioned table, we should
+   * disable the "hoodie.datasource.write.hive_style_partitioning" when
+   * merge or update the table. Or else, we will get an incorrect merge result
+   * as the partition path mismatch.
+   */
+  def isHiveStylePartitionPartitioning(partitionPaths: Seq[String], table: CatalogTable): Boolean = {
+    if (table.partitionColumnNames.nonEmpty) {
+      val isHiveStylePartitionPath = (path: String) => {
+        val fragments = path.split("/")
+        if (fragments.size != table.partitionColumnNames.size) {
+          false
+        } else {
+          fragments.zip(table.partitionColumnNames).forall {
+            case (pathFragment, partitionColumn) => pathFragment.startsWith(s"$partitionColumn=")
+          }
+        }
+      }
+      partitionPaths.forall(isHiveStylePartitionPath)
+    } else {
+      true
Review comment:
   Does this mean that if it is not a partitioned table, we treat it as hive-style partitioned?
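For readers following the review question, the path check in the hunk above can be sketched in Java. This is a hand translation under assumptions (plain `List<String>` inputs in place of Spark's `CatalogTable`; the class and method names are hypothetical), not the PR's code. The empty-column branch returns `true`, which is exactly the behavior the comment asks about.

```java
import java.util.List;

public final class HiveStylePartitionCheck {

  /**
   * Hypothetical Java translation of the Scala check: every partition path must have one
   * fragment per partition column, and each fragment must start with "<column>=".
   */
  static boolean isHiveStylePartitioning(List<String> partitionPaths, List<String> partitionColumns) {
    if (partitionColumns.isEmpty()) {
      // Mirrors the `else true` branch: a non-partitioned table counts as hive-style.
      return true;
    }
    for (String path : partitionPaths) {
      String[] fragments = path.split("/");
      if (fragments.length != partitionColumns.size()) {
        return false; // depth does not match the number of partition columns
      }
      for (int i = 0; i < fragments.length; i++) {
        if (!fragments[i].startsWith(partitionColumns.get(i) + "=")) {
          return false; // e.g. "2021" instead of "year=2021"
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<String> cols = List.of("year", "month");
    System.out.println(isHiveStylePartitioning(List.of("year=2021/month=10"), cols)); // true
    System.out.println(isHiveStylePartitioning(List.of("2021/10"), cols));            // false
    System.out.println(isHiveStylePartitioning(List.of("anything"), List.of()));      // true
  }
}
```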








[GitHub] [hudi] hudi-bot edited a comment on pull request #3802: [HUDI-1500] Support replace commit in DeltaSync with commit metadata preserved

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3802:
URL: https://github.com/apache/hudi/pull/3802#issuecomment-943342747


   
   ## CI report:
   
   * a6459139223bd70e665424b1ae2b1b9a8f08b5c0 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2651)
 
   
   




[GitHub] [hudi] codope commented on a change in pull request #3799: [HUDI-2491] hoodie.datasource.hive_sync.mode=hms mode is supported in…

2021-10-14 Thread GitBox


codope commented on a change in pull request #3799:
URL: https://github.com/apache/hudi/pull/3799#discussion_r728979206



##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/BootstrapExecutor.java
##
@@ -159,10 +159,14 @@ public void execute() throws IOException {
    * Sync to Hive.
    */
   private void syncHive() {
-    if (cfg.enableHiveSync) {
+    if (cfg.enableHiveSync || cfg.enableHiveSync) {

Review comment:
   Redundant condition. Did you mean to add some other condition?

##
File path: 
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncConfig.java
##
@@ -49,6 +49,9 @@
   @Parameter(names = {"--jdbc-url"}, description = "Hive jdbc connect url")
   public String jdbcUrl;
 
+  @Parameter(names = {"--metastore-uris"}, description = "Hive metastore uris")
+  public String metastoreUris;

Review comment:
   This should be used somewhere, right? I mean in HoodieHiveClient or HiveSyncTool. I don't see it being used anywhere in the `hudi-hive-sync` module.

##
File path: 
hudi-utilities/src/main/java/org/apache/hudi/utilities/deltastreamer/DeltaSync.java
##
@@ -618,9 +618,10 @@ private void syncMeta(HoodieDeltaStreamerMetrics metrics) {
 
   public void syncHive() {
     HiveSyncConfig hiveSyncConfig = DataSourceUtils.buildHiveSyncConfig(props, cfg.targetBasePath, cfg.baseFileFormat);
-    LOG.info("Syncing target hoodie table with hive table(" + hiveSyncConfig.tableName + "). Hive metastore URL :"
-        + hiveSyncConfig.jdbcUrl + ", basePath :" + cfg.targetBasePath);
     HiveConf hiveConf = new HiveConf(conf, HiveConf.class);
+    if (!DataSourceWriteOptions.METASTORE_URIS().defaultValue().equals(hiveSyncConfig.metastoreUris)) {

Review comment:
   Why do we need to check this? Why not simply set whatever the user has passed, irrespective of the default value?








[GitHub] [hudi] hudi-bot edited a comment on pull request #3802: [HUDI-1500] Support replace commit in DeltaSync with commit metadata preserved

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3802:
URL: https://github.com/apache/hudi/pull/3802#issuecomment-943342747


   
   ## CI report:
   
   * a6459139223bd70e665424b1ae2b1b9a8f08b5c0 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2651)
 
   
   




[GitHub] [hudi] hudi-bot commented on pull request #3802: [HUDI-1500] Support replace commit in DeltaSync with commit metadata preserved

2021-10-14 Thread GitBox


hudi-bot commented on pull request #3802:
URL: https://github.com/apache/hudi/pull/3802#issuecomment-943342747


   
   ## CI report:
   
   * a6459139223bd70e665424b1ae2b1b9a8f08b5c0 UNKNOWN
   
   




[jira] [Commented] (HUDI-2549) Exceptions when using second writer into Hudi table managed by DeltaStreamer

2021-10-14 Thread Dave Hagman (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428798#comment-17428798
 ] 

Dave Hagman commented on HUDI-2549:
---

OK, so you only ran one iteration of each (one commit from each)? This issue only
appears for me after multiple commits from each writer, which better matches a
real-world use case (since DeltaStreamer usually runs continuously).

> Exceptions when using second writer into Hudi table managed by DeltaStreamer
> 
>
> Key: HUDI-2549
> URL: https://issues.apache.org/jira/browse/HUDI-2549
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: DeltaStreamer, Spark Integration, Writer Core
>Reporter: Dave Hagman
>Assignee: Dave Hagman
>Priority: Critical
>  Labels: multi-writer, sev:critical
> Fix For: 0.10.0
>
>
> When running the DeltaStreamer along with a second Spark datasource writer
> (with [ZK-based OCC enabled|https://hudi.apache.org/docs/concurrency_control#enabling-multi-writing]),
> we receive the following exception, which halts the Spark datasource writer.
> This occurs following warnings of timeline inconsistencies:
>  
> {code:java}
> 21/10/07 17:10:05 INFO TransactionManager: Transaction ending with transaction owner Option{val=[==>20211007170717__commit__INFLIGHT]}
> 21/10/07 17:10:05 INFO ZookeeperBasedLockProvider: RELEASING lock atZkBasePath = /events/test/mwc/v1, lock key = events_mwc_test_v1
> 21/10/07 17:10:05 INFO ZookeeperBasedLockProvider: RELEASED lock atZkBasePath = /events/test/mwc/v1, lock key = events_mwc_test_v1
> 21/10/07 17:10:05 INFO TransactionManager: Transaction ended
> Exception in thread "main" java.lang.IllegalArgumentException
>   at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:31)
>   at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:414)
>   at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:395)
>   at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.saveAsComplete(HoodieActiveTimeline.java:153)
>   at org.apache.hudi.client.AbstractHoodieWriteClient.commit(AbstractHoodieWriteClient.java:218)
>   at org.apache.hudi.client.AbstractHoodieWriteClient.commitStats(AbstractHoodieWriteClient.java:190)
>   at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:124)
>   at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:617)
>   at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:274)
>   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:164)
>   at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:46)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>   at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:90)
>   at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:185)
>   at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:223)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:220)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:181)
>   at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:134)
>   at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:133)
>   at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
>   at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
>   at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
>   at org.apache.spark.sql.execution.SQLExecution$.executeQuery$1(SQLExecution.scala:110)
>   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:135)
>   at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:107)
>   at org.apache.spark.sql.execution.SQLExecution$.withTracker(SQLExecution.scala:232)
>   at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:135)
>   

[jira] [Updated] (HUDI-1500) Support incrementally reading clustering commit via Spark Datasource/DeltaStreamer

2021-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-1500:
-
Labels: pull-request-available  (was: )

> Support incrementally reading clustering commit via Spark
> Datasource/DeltaStreamer
> ---
>
> Key: HUDI-1500
> URL: https://issues.apache.org/jira/browse/HUDI-1500
> Project: Apache Hudi
>  Issue Type: Sub-task
>  Components: DeltaStreamer, Spark Integration
>Reporter: liwei
>Assignee: Sagar Sumit
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Currently, DeltaSync.readFromSource() cannot read the last instant when it is a
> replace commit (e.g. one produced by clustering).





[GitHub] [hudi] codope opened a new pull request #3802: [HUDI-1500] Support replace commit in DeltaSync with commit metadata preserved

2021-10-14 Thread GitBox


codope opened a new pull request #3802:
URL: https://github.com/apache/hudi/pull/3802


   ## What is the purpose of the pull request
   
   This PR fixes [HUDI-1500](https://issues.apache.org/jira/browse/HUDI-1500) for DeltaStreamer. For the Spark datasource, it was fixed by #3139.
   
   ## Brief change log
   
   * Enable commit metadata preservation by default.
   * Remove the filter of replace commits in DeltaSync.
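To illustrate the effect of the second change-log item, here is a toy Java sketch. It assumes a hypothetical `TimelineInstant` type holding a timestamp and an action string (`commit`, `deltacommit`, `replacecommit`); it is not DeltaSync's actual code, only a demonstration of how excluding `replacecommit` from an action filter changes which instant is considered the latest.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

public final class LastInstantFilter {

  /** Hypothetical minimal stand-in for a completed timeline instant. */
  static final class TimelineInstant {
    final String timestamp;
    final String action;

    TimelineInstant(String timestamp, String action) {
      this.timestamp = timestamp;
      this.action = action;
    }

    @Override
    public String toString() {
      return timestamp + "." + action;
    }
  }

  /** Returns the latest instant whose action is allowed; the timeline is assumed to be in ascending order. */
  static Optional<TimelineInstant> lastInstant(List<TimelineInstant> timeline, Set<String> allowedActions) {
    TimelineInstant last = null;
    for (TimelineInstant instant : timeline) {
      if (allowedActions.contains(instant.action)) {
        last = instant;
      }
    }
    return Optional.ofNullable(last);
  }

  public static void main(String[] args) {
    List<TimelineInstant> timeline = List.of(
        new TimelineInstant("20211014000000", "commit"),          // regular ingestion commit
        new TimelineInstant("20211014000500", "replacecommit"));  // e.g. clustering

    // With replace commits filtered out, the clustering instant is skipped.
    System.out.println(lastInstant(timeline, Set.of("commit", "deltacommit")));
    // prints Optional[20211014000000.commit]

    // With the filter removed, the clustering instant is the last instant.
    System.out.println(lastInstant(timeline, Set.of("commit", "deltacommit", "replacecommit")));
    // prints Optional[20211014000500.replacecommit]
  }
}
```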
   
   ## Verify this pull request
   
   This pull request is already covered by existing tests.
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3203: [HUDI-2086] Refactor hive mor_incremental_view

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#issuecomment-872092745


   
   ## CI report:
   
   * 0fa6297ce58eb877fd5c4eba59fef20ad9335d26 UNKNOWN
   * c4af04dab3dab31ef05ba6007000738a2dfb81ce Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2649)
 
   
   




[GitHub] [hudi] nsivabalan commented on pull request #3776: [HUDI-2543]: Added guides section

2021-10-14 Thread GitBox


nsivabalan commented on pull request #3776:
URL: https://github.com/apache/hudi/pull/3776#issuecomment-943312764


   One nit: when I mouse over "Guides", I see Tuning as the first entry and Troubleshooting as the second, while the order in the left pane is different. Can we fix that?
   https://user-images.githubusercontent.com/513218/137317986-5061c313-8b5a-47b4-837b-16d7b6f45956.png

   






[jira] [Closed] (HUDI-2484) Hive sync not working in HMS mode with DeltaStreamer

2021-10-14 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit closed HUDI-2484.
-
Resolution: Fixed

> Hive sync not working in HMS mode with DeltaStreamer
> 
>
> Key: HUDI-2484
> URL: https://issues.apache.org/jira/browse/HUDI-2484
> Project: Apache Hudi
>  Issue Type: Bug
>Reporter: Sagar Sumit
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: pull-request-available, sev:critical, user-support-issues
> Fix For: 0.10.0
>
>
> Set Hive sync mode to HMS and disable JDBC mode:
> ```
> --hoodie-conf hoodie.datasource.hive_sync.mode=hms
> --hoodie-conf hoodie.datasource.hive_sync.use_jdbc=false
> ```
> It throws the following exception:
> ```
> Caused by: java.lang.NoClassDefFoundError: org/apache/calcite/rel/type/RelDataTypeSystem
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzerFactory.get(SemanticAnalyzerFactory.java:318)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:484)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1317)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1457)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
>   at org.apache.hudi.hive.ddl.HiveQueryDDLExecutor.updateHiveSQLs(HiveQueryDDLExecutor.java:94)
>   at org.apache.hudi.hive.ddl.HiveQueryDDLExecutor.runSQL(HiveQueryDDLExecutor.java:85)
>   at org.apache.hudi.hive.ddl.QueryBasedDDLExecutor.createTable(QueryBasedDDLExecutor.java:82)
>   at org.apache.hudi.hive.HoodieHiveClient.createTable(HoodieHiveClient.java:191)
>   at org.apache.hudi.hive.HiveSyncTool.syncSchema(HiveSyncTool.java:237)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:182)
>   at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:131)
>   at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:117)
>   at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncHive(DeltaSync.java:625)
> ```
> The same configuration works with the Spark datasource.





[jira] [Resolved] (HUDI-2525) Test prometheus metrics with hudi (both spark ds and deltastreamer)

2021-10-14 Thread Sagar Sumit (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sagar Sumit resolved HUDI-2525.
---
Resolution: Fixed

> Test prometheus metrics with hudi (both spark ds and deltastreamer)
> ---
>
> Key: HUDI-2525
> URL: https://issues.apache.org/jira/browse/HUDI-2525
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: sivabalan narayanan
>Assignee: Sagar Sumit
>Priority: Major
>  Labels: sev:critical, user-support-issues
> Fix For: 0.10.0
>
>
> Test prometheus metrics with hudi (both spark ds and deltastreamer)
>  
> exception w/ deltastreamer
> {code:java}
> Exception in thread "main" java.lang.NoSuchMethodError: 'void io.prometheus.client.dropwizard.DropwizardExports.(org.apache.hudi.com.codahale.metrics.MetricRegistry)'
>   at org.apache.hudi.metrics.prometheus.PrometheusReporter.(PrometheusReporter.java:49)
>   at org.apache.hudi.metrics.MetricsReporterFactory.createReporter(MetricsReporterFactory.java:75)
>   at org.apache.hudi.metrics.Metrics.(Metrics.java:50)
>   at org.apache.hudi.metrics.Metrics.init(Metrics.java:96)
>   at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamerMetrics.(HoodieDeltaStreamerMetrics.java:44)
>   at org.apache.hudi.utilities.deltastreamer.DeltaSync.(DeltaSync.java:224)
>   at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.(HoodieDeltaStreamer.java:606)
>   at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.(HoodieDeltaStreamer.java:143)
>   at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.(HoodieDeltaStreamer.java:107)
>   at org.apache.hudi.integ.testsuite.HoodieDeltaStreamerWrapper.(HoodieDeltaStreamerWrapper.java:39)
>   at org.apache.hudi.integ.testsuite.HoodieTestSuiteWriter.(HoodieTestSuiteWriter.java:88)
>   at org.apache.hudi.integ.testsuite.dag.WriterContext.initContext(WriterContext.java:70)
>   at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.runTestSuite(HoodieTestSuiteJob.java:188)
>   at org.apache.hudi.integ.testsuite.HoodieTestSuiteJob.main(HoodieTestSuiteJob.java:170)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
>   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>   at java.base/java.lang.reflect.Method.invoke(Unknown Source)
>   at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>   at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
>   at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
>   at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
>   at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
>   at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
>   at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
>   at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> 21/10/01 17:06:01 INFO ShutdownHookManager: Shutdown hook called
> {code}
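One plausible reading of this trace (an assumption, not confirmed in the issue) is a shading mismatch: the caller was built against a `DropwizardExports` constructor taking the relocated `org.apache.hudi.com.codahale.metrics.MetricRegistry`, while the `DropwizardExports` loaded at runtime expects a different parameter type. A minimal reflection-based check, using a hypothetical `CtorCheck` helper, can confirm which single-argument constructors the runtime class actually exposes:

```java
import java.lang.reflect.Constructor;

// Hypothetical diagnostic helper: does the class loaded at runtime expose a
// public constructor with exactly one parameter of the named type?
final class CtorCheck {
  private CtorCheck() {}

  static boolean hasSingleArgConstructor(String className, String paramClassName) {
    try {
      Class<?> target = Class.forName(className);
      for (Constructor<?> c : target.getConstructors()) {
        Class<?>[] params = c.getParameterTypes();
        // Compare by fully qualified name, so a relocated class and its
        // unrelocated original count as different types.
        if (params.length == 1 && params[0].getName().equals(paramClassName)) {
          return true;
        }
      }
      return false;
    } catch (ClassNotFoundException | LinkageError e) {
      // The class, or one of its constructor parameter types, cannot be loaded.
      return false;
    }
  }
}
```

Run it on the same classpath as the failing job, for example `CtorCheck.hasSingleArgConstructor("io.prometheus.client.dropwizard.DropwizardExports", "org.apache.hudi.com.codahale.metrics.MetricRegistry")`; a `false` result would be consistent with the mismatch described above.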





[GitHub] [hudi] hudi-bot edited a comment on pull request #3203: [HUDI-2086] Refactor hive mor_incremental_view

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#issuecomment-872092745


   
   ## CI report:
   
   * 0fa6297ce58eb877fd5c4eba59fef20ad9335d26 UNKNOWN
   * 1a80559bd98829552acffdaf20d3ea0384d1d936 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2645)
 
   * c4af04dab3dab31ef05ba6007000738a2dfb81ce Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2649)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [hudi] hudi-bot edited a comment on pull request #3203: [HUDI-2086] Refactor hive mor_incremental_view

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#issuecomment-872092745


   
   ## CI report:
   
   * 0fa6297ce58eb877fd5c4eba59fef20ad9335d26 UNKNOWN
   * 1a80559bd98829552acffdaf20d3ea0384d1d936 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2645)
 
   * c4af04dab3dab31ef05ba6007000738a2dfb81ce UNKNOWN
   
   






[jira] [Resolved] (HUDI-2556) Tweak some default config options for flink

2021-10-14 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen resolved HUDI-2556.
--
Resolution: Fixed

Fixed via master branch: 2c370cbae084a41162fedbcc0b1e66558629dcbe

> Tweak some default config options for flink
> ---
>
> Key: HUDI-2556
> URL: https://issues.apache.org/jira/browse/HUDI-2556
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[jira] [Updated] (HUDI-2556) Tweak some default config options for flink

2021-10-14 Thread Danny Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Danny Chen updated HUDI-2556:
-
Component/s: Flink Integration

> Tweak some default config options for flink
> ---
>
> Key: HUDI-2556
> URL: https://issues.apache.org/jira/browse/HUDI-2556
> Project: Apache Hudi
>  Issue Type: Task
>  Components: Flink Integration
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[hudi] branch master updated (f897e6d -> 2c370cb)

2021-10-14 Thread danny0405
This is an automated email from the ASF dual-hosted git repository.

danny0405 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/hudi.git.


from f897e6d  [HUDI-2551] Support DefaultHoodieRecordPayload for flink (#3792)
 add 2c370cb  [HUDI-2556] Tweak some default config options for flink (#3800)

No new revisions were added by this update.

Summary of changes:
 .../apache/hudi/configuration/FlinkOptions.java| 26 --
 .../org/apache/hudi/sink/StreamWriteFunction.java  |  4 ++--
 .../hudi/sink/StreamWriteOperatorCoordinator.java  | 10 +++--
 .../apache/hudi/sink/utils/PayloadCreation.java|  2 +-
 .../apache/hudi/streamer/FlinkStreamerConfig.java  |  8 +++
 .../org/apache/hudi/table/HoodieTableFactory.java  | 19 +++-
 .../java/org/apache/hudi/util/StreamerUtil.java| 12 +-
 .../org/apache/hudi/sink/TestWriteCopyOnWrite.java |  4 ++--
 .../apache/hudi/table/HoodieDataSourceITCase.java  |  4 ++--
 .../apache/hudi/table/TestHoodieTableFactory.java  | 15 +
 10 files changed, 57 insertions(+), 47 deletions(-)


[GitHub] [hudi] danny0405 merged pull request #3800: [HUDI-2556] Tweak some default config options for flink

2021-10-14 Thread GitBox


danny0405 merged pull request #3800:
URL: https://github.com/apache/hudi/pull/3800


   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3800: [HUDI-2556] Tweak some default config options for flink

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3800:
URL: https://github.com/apache/hudi/pull/3800#issuecomment-943179326


   
   ## CI report:
   
   * b676a7d441b059d4c22918e700a84b8fe51e240b UNKNOWN
   * 56a96d75ecea14a1f0367ccb339a45f4c8813dfa Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2648)
 
   
   






[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #3203: [HUDI-2086] Refactor hive mor_incremental_view

2021-10-14 Thread GitBox


xiarixiaoyao commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r728890684



##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##
@@ -161,6 +162,51 @@
 return rtSplits.toArray(new InputSplit[0]);
   }
 
+  // pick all incremental files and add them to rtSplits then filter out those files.
+  private static Map> filterOutIncrementalSplits(
+  List fileSplitList,
+  List rtSplits,
+  final Option finalHoodieVirtualKeyInfo) {
+return fileSplitList.stream().filter(s -> {
+  // deal with incremental query.
+  try {
+if (s instanceof BaseFileWithLogsSplit) {
+  BaseFileWithLogsSplit bs = (BaseFileWithLogsSplit)s;
+  if (bs.getBelongToIncrementalSplit()) {
+rtSplits.add(new HoodieRealtimeFileSplit(bs, bs.getBasePath(), bs.getDeltaLogPaths(), bs.getMaxCommitTime(), finalHoodieVirtualKeyInfo));
+  }
+} else if (s instanceof RealtimeBootstrapBaseFileSplit) {
+  rtSplits.add(s);
+}
+  } catch (IOException e) {
+throw new HoodieIOException("Error creating hoodie real time split ", e);
+  }
+  // filter the snapshot split.
+  if (s instanceof RealtimeBootstrapBaseFileSplit) {
+return false;
+  } else if ((s instanceof BaseFileWithLogsSplit) && ((BaseFileWithLogsSplit) s).getBelongToIncrementalSplit()) {

Review comment:
   I just want to separate the logic of the incremental query from that of the snapshot query. OK, I will change it, thanks.








[GitHub] [hudi] hudi-bot edited a comment on pull request #3771: [HUDI-2402] Add Kerberos configuration options to Hive Sync

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3771:
URL: https://github.com/apache/hudi/pull/3771#issuecomment-939200284


   
   ## CI report:
   
   * 9e64e88d819b6b6bf5ccc5811ea5f4714138fc9e UNKNOWN
   * c2bc8115f70b89dfc31f27645f98cfbff8d79c0f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2646)
 
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3800: [HUDI-2556] Tweak some default config options for flink

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3800:
URL: https://github.com/apache/hudi/pull/3800#issuecomment-943179326


   
   ## CI report:
   
   * 75ee4b1f600b8231384fb65986e974c1edf26590 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2647)
 
   * b676a7d441b059d4c22918e700a84b8fe51e240b UNKNOWN
   * 56a96d75ecea14a1f0367ccb339a45f4c8813dfa Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2648)
 
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3203: [HUDI-2086] Refactor hive mor_incremental_view

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#issuecomment-872092745


   
   ## CI report:
   
   * 0fa6297ce58eb877fd5c4eba59fef20ad9335d26 UNKNOWN
   * 1a80559bd98829552acffdaf20d3ea0384d1d936 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2645)
 
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3799: [HUDI-2491] hoodie.datasource.hive_sync.mode=hms mode is supported in…

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3799:
URL: https://github.com/apache/hudi/pull/3799#issuecomment-943176004


   
   ## CI report:
   
   * f44907a941b5b61e642abb5783f70fe8830fe6a6 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2644)
 
   
   






[GitHub] [hudi] gaoshihang opened a new issue #3801: [SUPPORT]Does Flink-cdc support Schema-Evolution?

2021-10-14 Thread GitBox


gaoshihang opened a new issue #3801:
URL: https://github.com/apache/hudi/issues/3801


   @danny0405 
   Hi, I have a question. I use Kafka (Debezium) as the source table and Hudi as the target table to build a CDC application.
   How can I do schema evolution with flink-cdc?
   






[GitHub] [hudi] danny0405 commented on pull request #3798: 3797 java.lang.NoSuchMethodError: io.javalin.core.CachedRequestWrappe…

2021-10-14 Thread GitBox


danny0405 commented on pull request #3798:
URL: https://github.com/apache/hudi/pull/3798#issuecomment-943210670


   Please change the title and commit message to "[HUDI-2557] Shade javax.servlet for flink bundle jar".






[jira] [Created] (HUDI-2557) Shade javax.servlet for flink bundle jar

2021-10-14 Thread Danny Chen (Jira)
Danny Chen created HUDI-2557:


 Summary: Shade javax.servlet for flink bundle jar
 Key: HUDI-2557
 URL: https://issues.apache.org/jira/browse/HUDI-2557
 Project: Apache Hudi
  Issue Type: Test
  Components: Flink Integration
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.10.0








[GitHub] [hudi] danny0405 commented on a change in pull request #3203: [HUDI-2086] Refactor hive mor_incremental_view

2021-10-14 Thread GitBox


danny0405 commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r728830538



##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##
@@ -161,6 +162,51 @@
 return rtSplits.toArray(new InputSplit[0]);
   }
 
+  // pick all incremental files and add them to rtSplits then filter out those files.
+  private static Map> filterOutIncrementalSplits(
+  List fileSplitList,
+  List rtSplits,
+  final Option finalHoodieVirtualKeyInfo) {
+return fileSplitList.stream().filter(s -> {
+  // deal with incremental query.
+  try {
+if (s instanceof BaseFileWithLogsSplit) {
+  BaseFileWithLogsSplit bs = (BaseFileWithLogsSplit)s;
+  if (bs.getBelongToIncrementalSplit()) {
+rtSplits.add(new HoodieRealtimeFileSplit(bs, bs.getBasePath(), bs.getDeltaLogPaths(), bs.getMaxCommitTime(), finalHoodieVirtualKeyInfo));
+  }
+} else if (s instanceof RealtimeBootstrapBaseFileSplit) {
+  rtSplits.add(s);
+}
+  } catch (IOException e) {
+throw new HoodieIOException("Error creating hoodie real time split ", e);
+  }
+  // filter the snapshot split.
+  if (s instanceof RealtimeBootstrapBaseFileSplit) {
+return false;
+  } else if ((s instanceof BaseFileWithLogsSplit) && ((BaseFileWithLogsSplit) s).getBelongToIncrementalSplit()) {

Review comment:
   Why not just return early? I have pasted the code. And why do we need to handle the incremental query first; can we handle it together with the snapshot query?
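To illustrate the "return early" suggestion, here is a minimal sketch with hypothetical, simplified split types (`Split`, `FileWithLogsSplit`, `BootstrapBaseSplit` and `PlainSplit` are stand-ins, not the Hudi classes in the diff). A helper returns as soon as it has handled a split, and a plain loop collects the remaining snapshot splits, which keeps the side effect of adding to `rtSplits` out of a stream `filter` lambda:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the split types in the diff.
interface Split {}

final class FileWithLogsSplit implements Split {
  private final boolean incremental;
  FileWithLogsSplit(boolean incremental) { this.incremental = incremental; }
  boolean getBelongToIncrementalSplit() { return incremental; }
}

final class BootstrapBaseSplit implements Split {}

final class PlainSplit implements Split {}

final class SplitRouting {
  private SplitRouting() {}

  // Returns the splits not handled here (the snapshot splits); handled ones go to rtSplits.
  static List<Split> routeSplits(List<Split> input, List<Split> rtSplits) {
    List<Split> remaining = new ArrayList<>();
    for (Split s : input) {
      if (!handled(s, rtSplits)) {
        remaining.add(s);
      }
    }
    return remaining;
  }

  // Early returns: each branch decides and exits immediately.
  private static boolean handled(Split s, List<Split> rtSplits) {
    if (s instanceof BootstrapBaseSplit) {
      rtSplits.add(s);
      return true;
    }
    if (s instanceof FileWithLogsSplit && ((FileWithLogsSplit) s).getBelongToIncrementalSplit()) {
      rtSplits.add(s); // the code under review wraps this in a HoodieRealtimeFileSplit
      return true;
    }
    return false;
  }
}
```

The Java Stream API documentation discourages side effects in behavioral parameters such as `filter` predicates, which is one reason an explicit loop may read more clearly here; whether incremental and snapshot splits should instead be handled in a single pass is the open question in this review.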








[GitHub] [hudi] hudi-bot edited a comment on pull request #3800: [HUDI-2556] Tweak some default config options for flink

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3800:
URL: https://github.com/apache/hudi/pull/3800#issuecomment-943179326


   
   ## CI report:
   
   * 75ee4b1f600b8231384fb65986e974c1edf26590 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2647)
 
   * b676a7d441b059d4c22918e700a84b8fe51e240b UNKNOWN
   * 56a96d75ecea14a1f0367ccb339a45f4c8813dfa UNKNOWN
   
   






[GitHub] [hudi] danny0405 commented on issue #3796: [SUPPORT] Flink write to hudi,after running for a period of time,throw a NoClassDefFoundError

2021-10-14 Thread GitBox


danny0405 commented on issue #3796:
URL: https://github.com/apache/hudi/issues/3796#issuecomment-943201590


   Yes, the bundle jar does not package the hadoop jar.
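Since the bundle does not package Hadoop, one quick check is whether Hadoop classes are visible at runtime at all. The `ClasspathProbe` helper below is hypothetical, and `org.apache.hadoop.conf.Configuration` is only a representative Hadoop class to probe for:

```java
// Hypothetical helper: reports whether a class can be loaded from the current classpath.
final class ClasspathProbe {
  private ClasspathProbe() {}

  static boolean isLoadable(String className) {
    try {
      // initialize = false: only check presence, do not run static initializers.
      Class.forName(className, false, ClasspathProbe.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }
}
```

In a Flink job, `ClasspathProbe.isLoadable("org.apache.hadoop.conf.Configuration")` returning `false` would suggest that Hadoop has to be supplied by the cluster, for example through `HADOOP_CLASSPATH` as the Flink documentation describes.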






[GitHub] [hudi] hudi-bot edited a comment on pull request #3800: [HUDI-2556] Tweak some default config options for flink

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3800:
URL: https://github.com/apache/hudi/pull/3800#issuecomment-943179326


   
   ## CI report:
   
   * 75ee4b1f600b8231384fb65986e974c1edf26590 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2647)
 
   * b676a7d441b059d4c22918e700a84b8fe51e240b UNKNOWN
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3800: [HUDI-2556] Tweak some default config options for flink

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3800:
URL: https://github.com/apache/hudi/pull/3800#issuecomment-943179326


   
   ## CI report:
   
   * 75ee4b1f600b8231384fb65986e974c1edf26590 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2647)
 
   * b676a7d441b059d4c22918e700a84b8fe51e240b UNKNOWN
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3771: [HUDI-2402] Add Kerberos configuration options to Hive Sync

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3771:
URL: https://github.com/apache/hudi/pull/3771#issuecomment-939200284


   
   ## CI report:
   
   * b4808aaf973608255c97e1eb1f46ff04d9bb4bee Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2556)
 
   * 9e64e88d819b6b6bf5ccc5811ea5f4714138fc9e UNKNOWN
   * c2bc8115f70b89dfc31f27645f98cfbff8d79c0f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2646)
 
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3800: [HUDI-2556] Tweak some default config options for flink

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3800:
URL: https://github.com/apache/hudi/pull/3800#issuecomment-943179326


   
   ## CI report:
   
   * 75ee4b1f600b8231384fb65986e974c1edf26590 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2647)
 
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3771: [HUDI-2402] Add Kerberos configuration options to Hive Sync

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3771:
URL: https://github.com/apache/hudi/pull/3771#issuecomment-939200284


   
   ## CI report:
   
   * b4808aaf973608255c97e1eb1f46ff04d9bb4bee Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2556)
 
   * 9e64e88d819b6b6bf5ccc5811ea5f4714138fc9e UNKNOWN
   * c2bc8115f70b89dfc31f27645f98cfbff8d79c0f UNKNOWN
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3203: [HUDI-2086] Refactor hive mor_incremental_view

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#issuecomment-872092745


   
   ## CI report:
   
   * 0fa6297ce58eb877fd5c4eba59fef20ad9335d26 UNKNOWN
   * 6d67b68e19a43f8668e5773d27ca9c33a8de0a37 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2642)
 
   * 1a80559bd98829552acffdaf20d3ea0384d1d936 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2645)
 
   
   






[GitHub] [hudi] test-wangxiaoyu commented on a change in pull request #3771: [HUDI-2402] Add Kerberos configuration options to Hive Sync

2021-10-14 Thread GitBox


test-wangxiaoyu commented on a change in pull request #3771:
URL: https://github.com/apache/hudi/pull/3771#discussion_r728802039



##
File path: 
hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/DataSourceUtils.java
##
@@ -307,6 +307,10 @@ public static HiveSyncConfig buildHiveSyncConfig(TypedProperties props, String b
 
DataSourceWriteOptions.HIVE_SKIP_RO_SUFFIX_FOR_READ_OPTIMIZED_TABLE().defaultValue()));
 hiveSyncConfig.supportTimestamp = 
Boolean.valueOf(props.getString(DataSourceWriteOptions.HIVE_SUPPORT_TIMESTAMP_TYPE().key(),
 DataSourceWriteOptions.HIVE_SUPPORT_TIMESTAMP_TYPE().defaultValue()));
+hiveSyncConfig.useKerberos =
+Boolean.valueOf(props.getString(DataSourceWriteOptions.HIVE_SYNC_USE_KERBEROS().key(),DataSourceWriteOptions.HIVE_SYNC_USE_KERBEROS().defaultValue()));

Review comment:
   I added the testHiveSyncOfKerberosEnvironment method to TestHiveSyncTool to test this.
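The new line reads the Kerberos flag as `Boolean.valueOf(props.getString(key, default))`. A minimal sketch of the same pattern, with `java.util.Properties` standing in for Hudi's `TypedProperties` (an assumption) and a hypothetical `BooleanFlag` helper, shows a behavior worth covering in such a test: the default applies only when the key is absent, and any present value other than `true` (case-insensitive) parses as `false`.

```java
import java.util.Properties;

// Hypothetical helper mirroring Boolean.valueOf(props.getString(key, default)),
// with java.util.Properties as a stand-in for Hudi's TypedProperties.
final class BooleanFlag {
  private BooleanFlag() {}

  static boolean read(Properties props, String key, boolean defaultValue) {
    // Default is used only when the key is absent; a present value is parsed
    // strictly: "true" in any case is true, everything else is false.
    return Boolean.parseBoolean(props.getProperty(key, String.valueOf(defaultValue)));
  }
}
```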








[GitHub] [hudi] hudi-bot commented on pull request #3800: [HUDI-2556] Tweak some default config options for flink

2021-10-14 Thread GitBox


hudi-bot commented on pull request #3800:
URL: https://github.com/apache/hudi/pull/3800#issuecomment-943179326


   
   ## CI report:
   
   * 75ee4b1f600b8231384fb65986e974c1edf26590 UNKNOWN
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3799: [HUDI-2491] hoodie.datasource.hive_sync.mode=hms mode is supported in…

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3799:
URL: https://github.com/apache/hudi/pull/3799#issuecomment-943176004


   
   ## CI report:
   
   * f44907a941b5b61e642abb5783f70fe8830fe6a6 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2644)
 
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3771: [HUDI-2402] Add Kerberos configuration options to Hive Sync

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3771:
URL: https://github.com/apache/hudi/pull/3771#issuecomment-939200284


   
   ## CI report:
   
   * b4808aaf973608255c97e1eb1f46ff04d9bb4bee Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2556)
 
   * 9e64e88d819b6b6bf5ccc5811ea5f4714138fc9e UNKNOWN
   
   






[GitHub] [hudi] test-wangxiaoyu commented on a change in pull request #3771: [HUDI-2402] Add Kerberos configuration options to Hive Sync

2021-10-14 Thread GitBox


test-wangxiaoyu commented on a change in pull request #3771:
URL: https://github.com/apache/hudi/pull/3771#discussion_r728800351



##
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##
@@ -77,6 +77,10 @@ public HiveSyncTool(HiveSyncConfig cfg, HiveConf configuration, FileSystem fs) {
     super(configuration.getAllProperties(), fs);
 
     try {
+      if (cfg.useKerberos) {
+        configuration.set("hive.metastore.sasl.enabled", "true");
+        configuration.set("hive.metastore.kerberos.principal", cfg.kerberosPrincipal);

Review comment:
   Thanks for the review.
   I set it to its initial value.








[GitHub] [hudi] hudi-bot edited a comment on pull request #3203: [HUDI-2086] Refactor hive mor_incremental_view

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#issuecomment-872092745


   
   ## CI report:
   
   * 0fa6297ce58eb877fd5c4eba59fef20ad9335d26 UNKNOWN
   * 6d67b68e19a43f8668e5773d27ca9c33a8de0a37 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2642)
 
   * 1a80559bd98829552acffdaf20d3ea0384d1d936 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #3799: [HUDI-2491] hoodie.datasource.hive_sync.mode=hms mode is supported in…

2021-10-14 Thread GitBox


hudi-bot commented on pull request #3799:
URL: https://github.com/apache/hudi/pull/3799#issuecomment-943176004


   
   ## CI report:
   
   * f44907a941b5b61e642abb5783f70fe8830fe6a6 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3798: 3797 java.lang.NoSuchMethodError: io.javalin.core.CachedRequestWrappe…

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3798:
URL: https://github.com/apache/hudi/pull/3798#issuecomment-943124632


   
   ## CI report:
   
   * e8b0555ed45956734721eda7529f04a3e739a0d2 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2643)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[jira] [Updated] (HUDI-2556) Tweak some default config options for flink

2021-10-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-2556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-2556:
-
Labels: pull-request-available  (was: )

> Tweak some default config options for flink
> ---
>
> Key: HUDI-2556
> URL: https://issues.apache.org/jira/browse/HUDI-2556
> Project: Apache Hudi
>  Issue Type: Task
>Reporter: Danny Chen
>Assignee: Danny Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.10.0
>
>






[GitHub] [hudi] danny0405 opened a new pull request #3800: [HUDI-2556] Tweak some default config options for flink

2021-10-14 Thread GitBox


danny0405 opened a new pull request #3800:
URL: https://github.com/apache/hudi/pull/3800


   * Rename `write.insert.drop.duplicates` to `write.precombine` and set it to true for COW tables
   * Set the default of `index.global.enabled` to true
   * Set the default of `compaction.target_io` to 500GB
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] xiarixiaoyao commented on pull request #3203: [HUDI-2086] Refactor hive mor_incremental_view

2021-10-14 Thread GitBox


xiarixiaoyao commented on pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#issuecomment-943163714


   @danny0405 @nsivabalan Thank you very much for your patience in reviewing this code. I have rebased the code and addressed all comments.






[GitHub] [hudi] parisni commented on issue #2498: [SUPPORT] Hudi MERGE_ON_READ load to dataframe fails for the versions [0.6.0],[0.7.0] and runs for [0.5.3]

2021-10-14 Thread GitBox


parisni commented on issue #2498:
URL: https://github.com/apache/hudi/issues/2498#issuecomment-943156528


   @nsivabalan I should also mention that this is OSS Spark 2.4.4 with the metastore overridden to use AWS Glue, connecting Spark to Glue via https://github.com/awslabs/aws-glue-libs
   This might be related.






[GitHub] [hudi] fuyun2024 commented on pull request #3799: [HUDI-2491] hoodie.datasource.hive_sync.mode=hms mode is supported in…

2021-10-14 Thread GitBox


fuyun2024 commented on pull request #3799:
URL: https://github.com/apache/hudi/pull/3799#issuecomment-943152937


   @codepe Are you free to take a look at this for me? This is my new commit.






[GitHub] [hudi] t822876884 commented on issue #3796: [SUPPORT] Flink write to hudi,after running for a period of time,throw a NoClassDefFoundError

2021-10-14 Thread GitBox


t822876884 commented on issue #3796:
URL: https://github.com/apache/hudi/issues/3796#issuecomment-943149014


   
![image](https://user-images.githubusercontent.com/17778114/137283507-2c6d7894-589f-4b79-8157-ddd0397a573c.png)
   The jar hudi-flink-bundle_2.12-0.9.0.jar from mvnrepository.com does not contain any mapreduce classes.






[GitHub] [hudi] fuyun2024 opened a new pull request #3799: [HUDI-2491] hoodie.datasource.hive_sync.mode=hms mode is supported in…

2021-10-14 Thread GitBox


fuyun2024 opened a new pull request #3799:
URL: https://github.com/apache/hudi/pull/3799


   … spark writer option
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test 
coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please 
describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
 - *Added integration tests for end-to-end.*
 - *Added HoodieClientWriteTest to verify the change.*
 - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
- [ ] Has a corresponding JIRA in PR title & commit

- [ ] Commit message is descriptive of the change

- [ ] CI is green
   
- [ ] Necessary doc changes done or have another open PR
  
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA.
   






[GitHub] [hudi] codope commented on a change in pull request #3771: [HUDI-2402] Add Kerberos configuration options to Hive Sync

2021-10-14 Thread GitBox


codope commented on a change in pull request #3771:
URL: https://github.com/apache/hudi/pull/3771#discussion_r728752802



##
File path: hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HiveSyncTool.java
##
@@ -77,6 +77,10 @@ public HiveSyncTool(HiveSyncConfig cfg, HiveConf configuration, FileSystem fs) {
     super(configuration.getAllProperties(), fs);
 
     try {
+      if (cfg.useKerberos) {
+        configuration.set("hive.metastore.sasl.enabled", "true");
+        configuration.set("hive.metastore.kerberos.principal", cfg.kerberosPrincipal);

Review comment:
   Let's validate that `cfg.kerberosPrincipal` is not null in this case.
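A minimal sketch of the null check suggested above, assuming a plain `Map<String, String>` stands in for `HiveConf` so the example runs without Hive on the classpath. The class and method names (`KerberosGuardSketch`, `applyKerberos`) are hypothetical; only the two `hive.metastore.*` keys come from the diff.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for the Kerberos branch of HiveSyncTool's constructor.
 * A plain Map replaces HiveConf (an assumption made so the sketch is self-contained).
 */
public class KerberosGuardSketch {

    static void applyKerberos(boolean useKerberos, String kerberosPrincipal, Map<String, String> conf) {
        if (!useKerberos) {
            return; // nothing to configure when Kerberos is disabled
        }
        // Fail fast with a clear message instead of passing a null principal into the configuration.
        if (kerberosPrincipal == null || kerberosPrincipal.trim().isEmpty()) {
            throw new IllegalArgumentException(
                "A Kerberos principal must be set when Kerberos is enabled for Hive sync");
        }
        conf.put("hive.metastore.sasl.enabled", "true");
        conf.put("hive.metastore.kerberos.principal", kerberosPrincipal);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        applyKerberos(true, "hive/_HOST@EXAMPLE.COM", conf);
        System.out.println(conf);
    }
}
```

Validating before touching the configuration keeps the failure close to the misconfiguration, rather than surfacing later as an opaque metastore connection error.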

##
File path: hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/DataSourceUtils.java
##
@@ -307,6 +307,10 @@ public static HiveSyncConfig buildHiveSyncConfig(TypedProperties props, String b
         DataSourceWriteOptions.HIVE_SKIP_RO_SUFFIX_FOR_READ_OPTIMIZED_TABLE().defaultValue()));
     hiveSyncConfig.supportTimestamp = Boolean.valueOf(props.getString(DataSourceWriteOptions.HIVE_SUPPORT_TIMESTAMP_TYPE().key(),
         DataSourceWriteOptions.HIVE_SUPPORT_TIMESTAMP_TYPE().defaultValue()));
+    hiveSyncConfig.useKerberos =
+        Boolean.valueOf(props.getString(DataSourceWriteOptions.HIVE_SYNC_USE_KERBEROS().key(),DataSourceWriteOptions.HIVE_SYNC_USE_KERBEROS().defaultValue()));
+    hiveSyncConfig.kerberosPrincipal =
+        props.getString(DataSourceWriteOptions.HIVE_SYNC_KERBEROS_PRINCIPAL().key(), DataSourceWriteOptions.HIVE_SYNC_KERBEROS_PRINCIPAL().defaultValue());

Review comment:
   `HIVE_SYNC_KERBEROS_PRINCIPAL` has a null default value, which might throw a HoodieException. Maybe we can set EMPTY_STRING as the default and validate in HiveSyncTool that this config is not null or empty when `HIVE_SYNC_USE_KERBEROS` is true.
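The suggestion above (empty-string default, then validate) could look roughly like this. It is a sketch only: `java.util.Properties` stands in for Hudi's `TypedProperties`, and the option keys are passed in as parameters because the actual key strings are not shown in this thread. The class and method names are hypothetical.

```java
import java.util.Properties;

/**
 * Hypothetical sketch: read the Kerberos Hive-sync options with non-null defaults,
 * so a missing principal becomes "" rather than null and can be validated later.
 */
public class KerberosOptionsSketch {

    static boolean readUseKerberos(Properties props, String key) {
        // Absent option means Kerberos is disabled.
        return Boolean.parseBoolean(props.getProperty(key, "false"));
    }

    static String readPrincipal(Properties props, String key) {
        // Empty-string default instead of null, as proposed in the review.
        return props.getProperty(key, "");
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("example.use_kerberos", "true"); // placeholder key, not Hudi's
        boolean useKerberos = readUseKerberos(props, "example.use_kerberos");
        String principal = readPrincipal(props, "example.kerberos_principal");
        // With a non-null default, validation reduces to a single emptiness check.
        if (useKerberos && principal.isEmpty()) {
            System.out.println("Kerberos is enabled but no principal is configured");
        }
    }
}
```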

##
File path: hudi-spark-datasource/hudi-spark-common/src/main/java/org/apache/hudi/DataSourceUtils.java
##
@@ -307,6 +307,10 @@ public static HiveSyncConfig buildHiveSyncConfig(TypedProperties props, String b
         DataSourceWriteOptions.HIVE_SKIP_RO_SUFFIX_FOR_READ_OPTIMIZED_TABLE().defaultValue()));
     hiveSyncConfig.supportTimestamp = Boolean.valueOf(props.getString(DataSourceWriteOptions.HIVE_SUPPORT_TIMESTAMP_TYPE().key(),
         DataSourceWriteOptions.HIVE_SUPPORT_TIMESTAMP_TYPE().defaultValue()));
+    hiveSyncConfig.useKerberos =
+        Boolean.valueOf(props.getString(DataSourceWriteOptions.HIVE_SYNC_USE_KERBEROS().key(),DataSourceWriteOptions.HIVE_SYNC_USE_KERBEROS().defaultValue()));

Review comment:
   Can we add a unit test in TestDataSourceUtils or TestHiveSyncTool?








[GitHub] [hudi] t822876884 commented on issue #3796: [SUPPORT] Flink write to hudi,after running for a period of time,throw a NoClassDefFoundError

2021-10-14 Thread GitBox


t822876884 commented on issue #3796:
URL: https://github.com/apache/hudi/issues/3796#issuecomment-943128499


   > Seems you do not set up the `HADOOP_CLASSPATH` correctly. How do you submit your job?
   
   I submit the job from a machine with the command "/home/flink/flink-1.12.2/bin/flink run -c com.xxx.streaming.bdg.exec.YarnDataExecutorSQL -m yarn-cluster -d -yjm 2048 -ytm 5120 -p 4 -ys 3 -ynm com.xxx.streaming.YarnDataExecutorSQL k-bdg-stream.jar"
   
   Looking at the source code, the error is thrown when merging files.
   






[GitHub] [hudi] fuyun2024 closed pull request #3722: HUDI-2491 hoodie.datasource.hive_sync.mode=hms mode is supported in s…

2021-10-14 Thread GitBox


fuyun2024 closed pull request #3722:
URL: https://github.com/apache/hudi/pull/3722


   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3798: 3797 java.lang.NoSuchMethodError: io.javalin.core.CachedRequestWrappe…

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3798:
URL: https://github.com/apache/hudi/pull/3798#issuecomment-943124632


   
   ## CI report:
   
   * e8b0555ed45956734721eda7529f04a3e739a0d2 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2643)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3203: [HUDI-2086] Refactor hive mor_incremental_view

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#issuecomment-872092745


   
   ## CI report:
   
   * 0fa6297ce58eb877fd5c4eba59fef20ad9335d26 UNKNOWN
   * b5649459e5e8a5ebf5a140418c16951294a29689 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2593)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2604)
 
   * 6d67b68e19a43f8668e5773d27ca9c33a8de0a37 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2642)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] hudi-bot commented on pull request #3798: 3797 java.lang.NoSuchMethodError: io.javalin.core.CachedRequestWrappe…

2021-10-14 Thread GitBox


hudi-bot commented on pull request #3798:
URL: https://github.com/apache/hudi/pull/3798#issuecomment-943124632


   
   ## CI report:
   
   * e8b0555ed45956734721eda7529f04a3e739a0d2 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] t822876884 removed a comment on issue #3796: [SUPPORT] Flink write to hudi,after running for a period of time,throw a NoClassDefFoundError

2021-10-14 Thread GitBox


t822876884 removed a comment on issue #3796:
URL: https://github.com/apache/hudi/issues/3796#issuecomment-943123939


   I submit the job from a machine with the command "/home/flink/flink-1.12.2/bin/flink run -c com.xxx.streaming.bdg.exec.YarnDataExecutorSQL -m yarn-cluster -d -yjm 2048 -ytm 5120 -p 4 -ys 3 -ynm YarnDataExecutorSQL k-bdg-stream.jar"






[GitHub] [hudi] t822876884 commented on issue #3796: [SUPPORT] Flink write to hudi,after running for a period of time,throw a NoClassDefFoundError

2021-10-14 Thread GitBox


t822876884 commented on issue #3796:
URL: https://github.com/apache/hudi/issues/3796#issuecomment-943123939


   I submit the job from a machine with the command "/home/flink/flink-1.12.2/bin/flink run -c com.xxx.streaming.bdg.exec.YarnDataExecutorSQL -m yarn-cluster -d -yjm 2048 -ytm 5120 -p 4 -ys 3 -ynm YarnDataExecutorSQL k-bdg-stream.jar"






[GitHub] [hudi] hudi-bot edited a comment on pull request #3614: [HUDI-2370] Supports data encryption

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-914114290


   
   ## CI report:
   
   * 8328d5b6a6859e6b9b0b043f703ec84ed2607151 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2608)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2641)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #3203: [HUDI-2086] Refactor hive mor_incremental_view

2021-10-14 Thread GitBox


xiarixiaoyao commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r728743555



##
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
##
@@ -119,6 +307,11 @@ void addProjectionToJobConf(final RealtimeSplit realtimeSplit, final JobConf job
     addProjectionToJobConf(realtimeSplit, jobConf);
     LOG.info("Creating record reader with readCols :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR)
         + ", Ids :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR));
+
+    // for log only split, we no need parquet reader, set it to empty
+    if (FSUtils.isLogFile(realtimeSplit.getPath())) {

Review comment:
   agree

##
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##
@@ -69,17 +70,18 @@
   private static final Logger LOG = LogManager.getLogger(HoodieRealtimeInputFormatUtils.class);
 
   public static InputSplit[] getRealtimeSplits(Configuration conf, Stream<FileSplit> fileSplits) throws IOException {
-    Map<Path, List<FileSplit>> partitionsToParquetSplits =
-        fileSplits.collect(Collectors.groupingBy(split -> split.getPath().getParent()));
+    // for all unique split parents, obtain all delta files based on delta commit timeline,
+    // grouped on file id
+    List<InputSplit> rtSplits = new ArrayList<>();
+    List<FileSplit> candidateFileSplits = fileSplits.collect(Collectors.toList());
+    Map<Path, List<FileSplit>> partitionsToParquetSplits = candidateFileSplits

Review comment:
   ok
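A rough illustration of the grouping in this hunk, using `java.nio.file.Path` as a stand-in for Hadoop's `Path` and plain paths instead of `FileSplit`s (both are assumptions; the class name is hypothetical). It also shows one plausible reason the new code collects the stream into a list first: a Java `Stream` can be consumed only once, so materializing it lets the same splits be reused after grouping.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GroupSplitsByPartitionSketch {

    /** Groups file paths by their parent directory (the partition path). Assumes every path has a parent. */
    static Map<Path, List<Path>> groupByParent(List<Path> files) {
        return files.stream().collect(Collectors.groupingBy(Path::getParent));
    }

    public static void main(String[] args) {
        Stream<Path> incoming = Stream.of(
            Paths.get("/tbl/2021/10/14/a.parquet"),
            Paths.get("/tbl/2021/10/14/b.parquet"),
            Paths.get("/tbl/2021/10/15/c.parquet"));
        // Materialize once; the list can then be iterated as often as needed.
        List<Path> candidates = incoming.collect(Collectors.toList());
        Map<Path, List<Path>> byPartition = groupByParent(candidates);
        System.out.println(byPartition.size() + " partitions, " + candidates.size() + " files");
    }
}
```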








[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #3203: [HUDI-2086] Refactor hive mor_incremental_view

2021-10-14 Thread GitBox


xiarixiaoyao commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r728743157



##
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieEmptyRecordReader.java
##
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.hadoop.realtime;
+
+import org.apache.hadoop.io.ArrayWritable;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.Writable;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.RecordReader;
+
+import java.io.IOException;
+
+/**
+ * dummy record for log only realtime split.
+ */

Review comment:
   agree
   








[GitHub] [hudi] hudi-bot edited a comment on pull request #3762: [HUDI-1294] Adding inline read and seek based read(batch get) for hfile log blocks in metadata table

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3762:
URL: https://github.com/apache/hudi/pull/3762#issuecomment-938271221


   
   ## CI report:
   
   * 5fb7a2afa196fd75ada005d26a0fb9fce5472545 UNKNOWN
   * ce6740e42070d594075d47b0efdb3f598fd32e3f Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2640)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] xiarixiaoyao commented on a change in pull request #3203: [HUDI-2086] Refactor hive mor_incremental_view

2021-10-14 Thread GitBox


xiarixiaoyao commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r728741908



##
File path: hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
##
@@ -66,6 +90,170 @@
     return HoodieRealtimeInputFormatUtils.getRealtimeSplits(job, fileSplits);
   }
 
+  /**
+   * Keep the logical of mor_incr_view as same as spark datasource.
+   * Step1: Get list of commits to be fetched based on start commit and max commits(for snapshot max commits is -1).

Review comment:
   agree








[jira] [Created] (HUDI-2556) Tweak some default config options for flink

2021-10-14 Thread Danny Chen (Jira)
Danny Chen created HUDI-2556:


 Summary: Tweak some default config options for flink
 Key: HUDI-2556
 URL: https://issues.apache.org/jira/browse/HUDI-2556
 Project: Apache Hudi
  Issue Type: Task
Reporter: Danny Chen
Assignee: Danny Chen
 Fix For: 0.10.0








[jira] [Commented] (HUDI-2478) Handle failure mid-way during init buckets

2021-10-14 Thread Prashant Wason (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17428670#comment-17428670
 ] 

Prashant Wason commented on HUDI-2478:
--

Since buckets are initialized only once, the simple fix may be to always delete any existing buckets when initializing.

Checks are needed to ensure that we have the exact number of buckets and that the user did not change the count later (as this is not supported). The number of buckets for file-listing is 1, but for the record index it will be configurable.
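A minimal sketch of the approach described in this comment, assuming buckets are plain files named `bucket-NNNN` in a single directory. The naming, the class, and the method names are hypothetical, not the metadata table's actual layout. Re-running initialization deletes whatever a crashed attempt left behind, creates exactly the requested number of buckets, and then checks the count; the same check could be run on later opens to detect a changed bucket count.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: re-runnable bucket initialization with a count check. */
public class BucketInitSketch {

    /** Lists existing bucket files (assumed to be named "bucket-*"). */
    static List<Path> listBuckets(Path dir) {
        List<Path> buckets = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "bucket-*")) {
            for (Path p : stream) {
                buckets.add(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buckets;
    }

    /** Fails if the number of buckets on disk differs from the expected count. */
    static void verifyBucketCount(Path dir, int expected) {
        int found = listBuckets(dir).size();
        if (found != expected) {
            throw new IllegalStateException("Expected " + expected + " buckets but found " + found);
        }
    }

    /** Deletes leftovers from any earlier attempt, then creates exactly bucketCount buckets. */
    static void initBuckets(Path dir, int bucketCount) {
        try {
            for (Path stale : listBuckets(dir)) {
                Files.delete(stale);
            }
            for (int i = 0; i < bucketCount; i++) {
                Files.createFile(dir.resolve(String.format("bucket-%04d", i)));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        verifyBucketCount(dir, bucketCount);
    }

    static Path newTempDir() {
        try {
            return Files.createTempDirectory("buckets");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dir = newTempDir();
        initBuckets(dir, 4);
        // Running init again (e.g. after a crash) yields the same layout instead of failing.
        initBuckets(dir, 4);
        System.out.println(listBuckets(dir).size() + " buckets in " + dir);
    }
}
```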

> Handle failure mid-way during init buckets
> --
>
> Key: HUDI-2478
> URL: https://issues.apache.org/jira/browse/HUDI-2478
> Project: Apache Hudi
>  Issue Type: Sub-task
>Reporter: sivabalan narayanan
>Assignee: sivabalan narayanan
>Priority: Major
> Fix For: 0.10.0
>
>
> If the process crashes mid-way while instantiating buckets, retrying should work seamlessly.





[GitHub] [hudi] yiduwangkai opened a new pull request #3798: 3797 java.lang.NoSuchMethodError: io.javalin.core.CachedRequestWrappe…

2021-10-14 Thread GitBox


yiduwangkai opened a new pull request #3798:
URL: https://github.com/apache/hudi/pull/3798


   …r.getContentLengthLong()
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before 
opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *Solve the javax.servlet JAR dependency conflict.*
   
   ## Brief change log
   
   *(for example:)*
 - *Modify hudi-flink-bundle pom.xml*
   
   ## Verify this pull request
 - hudi-flink-bundle pom.xml
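`getContentLengthLong()` was added to `javax.servlet.ServletRequest` in Servlet 3.1, so a `NoSuchMethodError` on it likely means an older servlet API JAR wins on the classpath. A small diagnostic sketch (the class name `ServletApiProbe` is hypothetical) that reports which JAR supplied a class and whether that class exposes a given method; it would need to run with the same classpath as the failing job to be meaningful.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

/** Hypothetical classpath probe for diagnosing servlet API version conflicts. */
public class ServletApiProbe {

    /** True if the class can be loaded and exposes a public method with this name (any signature). */
    static boolean hasPublicMethod(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Location (usually a JAR URL) the class was loaded from, or a marker string. */
    static String locationOf(String className) {
        try {
            Class<?> cls = Class.forName(className);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // JDK bootstrap classes typically have no code source.
            if (src == null || src.getLocation() == null) {
                return "<bootstrap or unknown>";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "<not on classpath>";
        }
    }

    public static void main(String[] args) {
        String api = "javax.servlet.ServletRequest";
        System.out.println(api + " loaded from: " + locationOf(api));
        System.out.println("has getContentLengthLong (Servlet 3.1+): "
            + hasPublicMethod(api, "getContentLengthLong"));
    }
}
```

If the reported location is an unexpected JAR, that JAR is the one to exclude or shade in the bundle's pom.xml.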
   
   






[GitHub] [hudi] yiduwangkai opened a new issue #3797: [SUPPORT]

2021-10-14 Thread GitBox


yiduwangkai opened a new issue #3797:
URL: https://github.com/apache/hudi/issues/3797


   * Hudi version : 0.10
   * Hive version : 2.3.8
   * Hadoop version : 2.10.1
   * Flink version : 1.13.1
   I use Flink to fetch data from MySQL and sink it into Hudi, and I get the following errors:
   `http-request
   java.lang.NoSuchMethodError: io.javalin.core.CachedRequestWrapper.getContentLengthLong()J
at io.javalin.core.CachedRequestWrapper.<init>(CachedRequestWrapper.kt:22) ~[hudi-flink-bundle_2.11-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at io.javalin.core.JavalinServlet.service(JavalinServlet.kt:34) ~[hudi-flink-bundle_2.11-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at io.javalin.core.util.JettyServerUtil$initialize$httpHandler$1.doHandle(JettyServerUtil.kt:72) [hudi-flink-bundle_2.11-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.hudi.org.apache.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203) [hudi-flink-bundle_2.11-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.hudi.org.apache.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480) [hudi-flink-bundle_2.11-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.hudi.org.apache.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1668) [hudi-flink-bundle_2.11-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.hudi.org.apache.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201) [hudi-flink-bundle_2.11-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.hudi.org.apache.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247) [hudi-flink-bundle_2.11-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.hudi.org.apache.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144) [hudi-flink-bundle_2.11-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.hudi.org.apache.jetty.server.handler.HandlerList.handle(HandlerList.java:61) [hudi-flink-bundle_2.11-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.hudi.org.apache.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:174) [hudi-flink-bundle_2.11-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.hudi.org.apache.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132) [hudi-flink-bundle_2.11-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.hudi.org.apache.jetty.server.Server.handle(Server.java:502) [hudi-flink-bundle_2.11-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.hudi.org.apache.jetty.server.HttpChannel.handle(HttpChannel.java:370) [hudi-flink-bundle_2.11-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.hudi.org.apache.jetty.server.HttpConnection.onFillable(HttpConnection.java:267) [hudi-flink-bundle_2.11-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at org.apache.hudi.org.apache.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305) [hudi-flink-bundle_2.11-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at 
org.apache.hudi.org.apache.jetty.io.FillInterest.fillable(FillInterest.java:103)
 [hudi-flink-bundle_2.11-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at 
org.apache.hudi.org.apache.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
 [hudi-flink-bundle_2.11-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at 
org.apache.hudi.org.apache.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
 [hudi-flink-bundle_2.11-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at 
org.apache.hudi.org.apache.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
 [hudi-flink-bundle_2.11-0.10.0-SNAPSHOT.jar:0.10.0-SNAPSHOT]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_291]
   `
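   `getContentLengthLong()` was added to `javax.servlet.ServletRequest` in Servlet API 3.1, so a `NoSuchMethodError` on it commonly means an older Servlet API JAR is loaded ahead of the one Javalin expects. A minimal diagnostic sketch, assuming you can run a small class on the same classpath as the Flink job; the class name `WhichJar` and its helpers are hypothetical, and only standard JDK reflection is used:

   ```java
   import java.security.CodeSource;

   public class WhichJar {

       // Returns the location a class was loaded from, without initializing the class.
       static String locate(String className) {
           try {
               Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
               CodeSource src = c.getProtectionDomain().getCodeSource();
               // Classes from the JDK bootstrap loader have no CodeSource.
               return src == null ? "<bootstrap or unknown>" : src.getLocation().toString();
           } catch (ClassNotFoundException | LinkageError e) {
               return "<not found>";
           }
       }

       // True if the version of the class that actually gets loaded exposes the public method.
       static boolean hasMethod(String className, String methodName, Class<?>... paramTypes) {
           try {
               Class.forName(className, false, WhichJar.class.getClassLoader())
                       .getMethod(methodName, paramTypes);
               return true;
           } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
               return false;
           }
       }

       public static void main(String[] args) {
           String cls = "javax.servlet.ServletRequest";
           System.out.println(cls + " loaded from: " + locate(cls));
           // Servlet API 3.1+ is required for getContentLengthLong().
           System.out.println("has getContentLengthLong(): " + hasMethod(cls, "getContentLengthLong"));
       }
   }
   ```

   If the printed location is not the JAR you expect, that JAR is the one to exclude or relocate.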
   






[GitHub] [hudi] danny0405 commented on issue #3796: [SUPPORT] Flink write to hudi,after running for a period of time,throw a NoClassDefFoundError

2021-10-14 Thread GitBox


danny0405 commented on issue #3796:
URL: https://github.com/apache/hudi/issues/3796#issuecomment-943098268


   It seems you have not set up `HADOOP_CLASSPATH` correctly. How do you submit your job?
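   Flink's documentation recommends exporting the Hadoop classpath before submitting a job, e.g. ``export HADOOP_CLASSPATH=`hadoop classpath` ``. A minimal sketch for checking what the submitting process actually sees, assuming it is run from the same shell or environment as the Flink client; the class name `HadoopClasspathCheck` is hypothetical:

   ```java
   import java.io.File;
   import java.util.Arrays;
   import java.util.Collections;
   import java.util.List;
   import java.util.regex.Pattern;
   import java.util.stream.Collectors;

   public class HadoopClasspathCheck {

       // Splits a classpath-style string on the platform separator (':' on Unix), dropping blanks.
       static List<String> entries(String classpath) {
           if (classpath == null || classpath.trim().isEmpty()) {
               return Collections.emptyList();
           }
           return Arrays.stream(classpath.split(Pattern.quote(File.pathSeparator)))
                   .map(String::trim)
                   .filter(s -> !s.isEmpty())
                   .collect(Collectors.toList());
       }

       public static void main(String[] args) {
           List<String> cp = entries(System.getenv("HADOOP_CLASSPATH"));
           if (cp.isEmpty()) {
               System.out.println("HADOOP_CLASSPATH is not set; Hadoop classes may be missing at runtime.");
           } else {
               System.out.println("HADOOP_CLASSPATH has " + cp.size() + " entries:");
               cp.forEach(e -> System.out.println("  " + e));
           }
       }
   }
   ```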






[GitHub] [hudi] danny0405 commented on a change in pull request #3203: [HUDI-2086] Refactor hive mor_incremental_view

2021-10-14 Thread GitBox


danny0405 commented on a change in pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#discussion_r728695624



##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##
@@ -69,17 +70,18 @@
   private static final Logger LOG = LogManager.getLogger(HoodieRealtimeInputFormatUtils.class);
 
   public static InputSplit[] getRealtimeSplits(Configuration conf, Stream<FileSplit> fileSplits) throws IOException {
-Map<Path, List<FileSplit>> partitionsToParquetSplits =
-fileSplits.collect(Collectors.groupingBy(split -> split.getPath().getParent()));
+// for all unique split parents, obtain all delta files based on delta commit timeline,
+// grouped on file id
+List<InputSplit> rtSplits = new ArrayList<>();
+List<FileSplit> candidateFileSplits = fileSplits.collect(Collectors.toList());
+Map<Path, List<FileSplit>> partitionsToParquetSplits = candidateFileSplits

Review comment:
   `candidateFileSplits` => `fileSplitList`

##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieParquetRealtimeInputFormat.java
##
@@ -119,6 +307,11 @@ void addProjectionToJobConf(final RealtimeSplit realtimeSplit, final JobConf job
 addProjectionToJobConf(realtimeSplit, jobConf);
 LOG.info("Creating record reader with readCols :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR)
 + ", Ids :" + jobConf.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR));
+
+// for log only split, we no need parquet reader, set it to empty
+if (FSUtils.isLogFile(realtimeSplit.getPath())) {

Review comment:
   `we no need parquet reader, set it to empty` => `set the parquet reader 
as empty`

##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/realtime/HoodieEmptyRecordReader.java
##
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.hadoop.realtime;
+
+import org.apache.hadoop.io.ArrayWritable;
+import org.apache.hadoop.io.NullWritable;
+import org.apache.hadoop.io.Writable;
+import org.apache.hadoop.mapred.JobConf;
+import org.apache.hadoop.mapred.RecordReader;
+
+import java.io.IOException;
+
+/**
+ * dummy record for log only realtime split.
+ */

Review comment:
   `dummy record` => `Dummy record reader`
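   For a log-only split there is no base file to scan, so the reader only needs to report "no records". A minimal sketch of that contract, assuming a local stand-in interface that mirrors the method set of `org.apache.hadoop.mapred.RecordReader` so the example compiles without Hadoop; `SimpleRecordReader` and `EmptyRecordReader` are hypothetical names, not the PR's `HoodieEmptyRecordReader`:

   ```java
   import java.io.IOException;

   public class EmptyReaderSketch {

       // Hypothetical stand-in mirroring the methods of org.apache.hadoop.mapred.RecordReader.
       interface SimpleRecordReader<K, V> {
           boolean next(K key, V value) throws IOException;
           K createKey();
           V createValue();
           long getPos() throws IOException;
           void close() throws IOException;
           float getProgress() throws IOException;
       }

       // A reader that never yields a record.
       static final class EmptyRecordReader<K, V> implements SimpleRecordReader<K, V> {
           private final K key;
           private final V value;

           EmptyRecordReader(K key, V value) {
               this.key = key;
               this.value = value;
           }

           @Override public boolean next(K k, V v) { return false; } // no records, ever
           @Override public K createKey() { return key; }
           @Override public V createValue() { return value; }
           @Override public long getPos() { return 0L; }
           @Override public void close() { } // nothing to release
           @Override public float getProgress() { return 1.0f; } // nothing left to read
       }
   }
   ```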

##
File path: 
hudi-hadoop-mr/src/main/java/org/apache/hudi/hadoop/utils/HoodieRealtimeInputFormatUtils.java
##
@@ -99,7 +101,32 @@
 }
   }
   Option<HoodieVirtualKeyInfo> finalHoodieVirtualKeyInfo = hoodieVirtualKeyInfo;
-  partitionsToParquetSplits.keySet().forEach(partitionPath -> {
+  // deal with incremental query
+  candidateFileSplits.stream().forEach(s -> {
+try {

Review comment:
   Something like this:
   ```java
   private static Map<Path, List<FileSplit>> filterOutIncrementalSplits(
       List<FileSplit> fileSplitList,
       List<InputSplit> rtSplits,
       final Option<HoodieVirtualKeyInfo> finalHoodieVirtualKeyInfo) {
     return fileSplitList.stream().filter(s -> {
       // deal with incremental query.
       try {
         if (s instanceof BaseFileWithLogsSplit) {
           BaseFileWithLogsSplit bs = (BaseFileWithLogsSplit) s;
           if (bs.getBelongToIncrementalSplit()) {
             rtSplits.add(new HoodieRealtimeFileSplit(bs, bs.getBasePath(),
                 bs.getDeltaLogPaths(), bs.getMaxCommitTime(), finalHoodieVirtualKeyInfo));
             return false;
           }
         } else if (s instanceof RealtimeBootstrapBaseFileSplit) {
           rtSplits.add(s);
           return false;
         }
       } catch (IOException e) {
         throw new HoodieIOException("Error creating hoodie real time split ", e);
       }
       return true;
     }).collect(Collectors.groupingBy(split -> split.getPath().getParent()));
   }
   ```
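   The suggested helper appends to `rtSplits` from inside a `filter` predicate, and the `java.util.stream` package documentation discourages side effects in behavioral parameters like that. A side-effect-free variant of the same routing, sketched with a hypothetical `Split` stand-in (a parent directory plus an `incremental` flag) instead of Hadoop's `FileSplit`, partitions first and groups second:

   ```java
   import java.util.List;
   import java.util.Map;
   import java.util.stream.Collectors;

   public class SplitRouting {

       // Hypothetical stand-in for a file split: its parent directory and whether it
       // belongs to an incremental query.
       static final class Split {
           final String parent;
           final boolean incremental;

           Split(String parent, boolean incremental) {
               this.parent = parent;
               this.incremental = incremental;
           }
       }

       // Partitions the splits once, appends the incremental ones to `passthrough`,
       // and groups the remaining ones by parent directory.
       static Map<String, List<Split>> groupNonIncremental(List<Split> splits, List<Split> passthrough) {
           // partitioningBy always yields both the `true` and the `false` key.
           Map<Boolean, List<Split>> byIncremental = splits.stream()
                   .collect(Collectors.partitioningBy(s -> s.incremental));
           passthrough.addAll(byIncremental.get(true));
           return byIncremental.get(false).stream()
                   .collect(Collectors.groupingBy(s -> s.parent));
       }
   }
   ```

   The predicate stays pure, and the only mutation of the output list happens once, outside the stream pipeline.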


[GitHub] [hudi] hudi-bot edited a comment on pull request #3614: [HUDI-2370] Supports data encryption

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-914114290


   
   ## CI report:
   
   * 8328d5b6a6859e6b9b0b043f703ec84ed2607151 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2608)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2641)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run travis` re-run the last Travis build
- `@hudi-bot run azure` re-run the last Azure build
   






[GitHub] [hudi] manojpec removed a comment on pull request #3794: [HUDI-2553] Metadata table compaction trigger max delta commits default config (re-enable)

2021-10-14 Thread GitBox


manojpec removed a comment on pull request #3794:
URL: https://github.com/apache/hudi/pull/3794#issuecomment-943082717


   @hudi-bot azure run






[GitHub] [hudi] liujinhui1994 removed a comment on pull request #3614: [HUDI-2370] Supports data encryption

2021-10-14 Thread GitBox


liujinhui1994 removed a comment on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-941975593


   @hudi-bot run azure






[GitHub] [hudi] liujinhui1994 commented on pull request #3614: [HUDI-2370] Supports data encryption

2021-10-14 Thread GitBox


liujinhui1994 commented on pull request #3614:
URL: https://github.com/apache/hudi/pull/3614#issuecomment-943089066


   @hudi-bot run azure






[GitHub] [hudi] hudi-bot edited a comment on pull request #3203: [HUDI-2086] Refactor hive mor_incremental_view

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3203:
URL: https://github.com/apache/hudi/pull/3203#issuecomment-872092745


   
   ## CI report:
   
   * 0fa6297ce58eb877fd5c4eba59fef20ad9335d26 UNKNOWN
   * b5649459e5e8a5ebf5a140418c16951294a29689 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2593)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2604)
 
   * 6d67b68e19a43f8668e5773d27ca9c33a8de0a37 UNKNOWN
   
   
   






[GitHub] [hudi] manojpec commented on pull request #3794: [HUDI-2553] Metadata table compaction trigger max delta commits default config (re-enable)

2021-10-14 Thread GitBox


manojpec commented on pull request #3794:
URL: https://github.com/apache/hudi/pull/3794#issuecomment-943082717


   @hudi-bot azure run






[GitHub] [hudi] hudi-bot edited a comment on pull request #3762: [HUDI-1294] Adding inline read and seek based read(batch get) for hfile log blocks in metadata table

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3762:
URL: https://github.com/apache/hudi/pull/3762#issuecomment-938271221


   
   ## CI report:
   
   * 5fb7a2afa196fd75ada005d26a0fb9fce5472545 UNKNOWN
   * f27df7aa4e3438bfe003fc054823a88538144c61 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2607)
 
   * ce6740e42070d594075d47b0efdb3f598fd32e3f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2640)
 
   
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3762: [HUDI-1294] Adding inline read and seek based read(batch get) for hfile log blocks in metadata table

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3762:
URL: https://github.com/apache/hudi/pull/3762#issuecomment-938271221


   
   ## CI report:
   
   * 5fb7a2afa196fd75ada005d26a0fb9fce5472545 UNKNOWN
   * f27df7aa4e3438bfe003fc054823a88538144c61 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2607)
 
   * ce6740e42070d594075d47b0efdb3f598fd32e3f UNKNOWN
   
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3519: [DO NOT MERGE] 0.9.0 release patch for flink

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3519:
URL: https://github.com/apache/hudi/pull/3519#issuecomment-903204631


   
   ## CI report:
   
   * 1d3142cd55878ba81a358bf0b4d194779585bada Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2638)
 
   
   
   






[GitHub] [hudi] nsivabalan commented on pull request #3762: [HUDI-1294] Adding inline read and seek based read(batch get) for hfile log blocks in metadata table

2021-10-14 Thread GitBox


nsivabalan commented on pull request #3762:
URL: https://github.com/apache/hudi/pull/3762#issuecomment-943066396


   @hudi-bot azure run






[GitHub] [hudi] nsivabalan commented on pull request #3762: [HUDI-1294] Adding inline read and seek based read(batch get) for hfile log blocks in metadata table

2021-10-14 Thread GitBox


nsivabalan commented on pull request #3762:
URL: https://github.com/apache/hudi/pull/3762#issuecomment-943061856


   @hudi-bot azure run






[GitHub] [hudi] nsivabalan commented on pull request #3794: [HUDI-2553] Metadata table compaction trigger max delta commits default config (re-enable)

2021-10-14 Thread GitBox


nsivabalan commented on pull request #3794:
URL: https://github.com/apache/hudi/pull/3794#issuecomment-943060753


   @hudi-bot azure run






[GitHub] [hudi] t822876884 opened a new issue #3796: [SUPPORT] Flink write to hudi,after running for a period of time,throw a NoClassDefFoundError

2021-10-14 Thread GitBox


t822876884 opened a new issue #3796:
URL: https://github.com/apache/hudi/issues/3796


   hudi 0.9.0
   flink 1.12.2
   
   ```java
   public static void main(String[] args) {
       // ENV
       StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
       env.setStateBackend(new FsStateBackend(YARN_CKP_PATH));
       env.enableCheckpointing(6);
       env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
       env.setParallelism(1);

       EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner()
               .inStreamingMode().build();
       StreamTableEnvironment tableEnvironment = StreamTableEnvironment.create(env, settings);

       FlinkKafkaConsumer<String> consumer =
               new FlinkKafkaConsumer<>(KAFKA_TOPIC, new SimpleStringSchema(), kafkaProperties());
       consumer.setStartFromTimestamp(163379520L);

       // SOURCE
       DataStreamSource<String> yarnDS = env
               .addSource(consumer)
               .setParallelism(8);

       DataStream<YarnDataEntity> dataDs = yarnDS.filter(new FilterFunction<String>() {
                   @Override
                   public boolean filter(String value) throws Exception {
                       String type = JSONObject.parseObject(value).getString("type");
                       if (("yarn").equals(type)) {
                           return true;
                       }
                       return false;
                   }
               }).setParallelism(4)
               .map(new MapFunction<String, YarnDataEntity>() {
                   @Override
                   public YarnDataEntity map(String value) throws Exception {
                       String data = JSONObject.parseObject(value).getString("data");
                       YarnDataEntity yarnDataEntities = JSONObject.parseObject(data, YarnDataEntity.class);
                       yarnDataEntities.setDt(DateUtil.convertTimeByLong(yarnDataEntities.getStartedTime()));
                       return yarnDataEntities;
                   }
               }).setParallelism(8);

       Table dataDsYarn = tableEnvironment.fromDataStream(dataDs);

       // Table result = tableEnvironment.sqlQuery("SELECT * FROM " + dataDsYarn);
       // tableEnvironment.toAppendStream(result, YarnDataEntity.class).print();

       tableEnvironment.executeSql("CREATE TABLE big_data_analyse_yarn(" +
               " allocatedMB INT," +
               " allocatedVCores INT," +
               " amContainerLogs VARCHAR(200)," +
               " amHostHttpAddress VARCHAR(200)," +
               " amNodeLabelExpression VARCHAR(200)," +
               " amRPCAddress VARCHAR(20)," +
               " appNodeLabelExpression VARCHAR(200)," +
               " applicationTags VARCHAR(200)," +
               " applicationType VARCHAR(20)," +
               " clusterId BIGINT," +
               " clusterUsagePercentage FLOAT," +
               " diagnostics VARCHAR(200)," +
               " dt VARCHAR(20)," +
               " elapsedTime BIGINT, " +
               " finalStatus VARCHAR(200)," +
               " finishedTime BIGINT," +
               " id VARCHAR(200)," +
               " logAggregationStatus VARCHAR(200)," +
               " memorySeconds BIGINT, " +
               " name VARCHAR(200)," +
               " numAMContainerPreempted INT, " +
               " numNonAMContainerPreempted INT, " +
               " preemptedResourceMB int," +
               " preemptedResourceVCores BIGINT, " +
               " priority VARCHAR(200)," +
               " progress FLOAT, " +
               " queue VARCHAR(200)," +
               " queueUsagePercentage FLOAT, " +
               " runningContainers INT, " +
               " startedTime BIGINT," +
               " `state` VARCHAR(200)," +
               " trackingUI VARCHAR(200)," +
               " trackingUrl VARCHAR(200)," +
               " unmanagedApplication boolean," +
               " `user` VARCHAR(20)," +
               " vcoreSeconds BIGINT" +
               ")" +
               " PARTITIONED BY (dt)" +
               " WITH (" +
               "  'connector' = 'hudi'," +
               "  'path' = '" + YARN_DATA_PATH + "'," +
               "  'write.tasks' = '8'," +
               "  'read.streaming.enabled' = 'true', " +
               "  'table.type' = 'MERGE_ON_READ', " +
               "  'read.streaming.check-interval' = '30'," +
               "  'write.precombine.field' = 'dt'," +
               "  'hoodie.datasource.write.operation' = 'insert'," +
               "  'hoodie.datasource.write.recordkey.field' = 'id' " +
               " )");

       tableEnvironment.executeSql("insert into big_data_analyse_yarn select * from " + dataDsYarn);
   }
   ```
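   Because this DDL is assembled by hand-concatenating string fragments, spacing and quoting mistakes are easy to introduce. A small helper can render the table options map into a `WITH` clause instead; `HudiDdl.withClause` is a hypothetical name, not part of Flink or Hudi, and the option keys in `main` are taken from the table definition above:

   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;
   import java.util.stream.Collectors;

   public class HudiDdl {

       // Renders options as a Flink SQL WITH clause, one 'key' = 'value' pair per line.
       // The leading space lets it be appended directly after ")" or "PARTITIONED BY (...)".
       // Single quotes inside values are doubled, as SQL string literals require.
       static String withClause(Map<String, String> options) {
           return options.entrySet().stream()
                   .map(e -> "  '" + e.getKey() + "' = '" + e.getValue().replace("'", "''") + "'")
                   .collect(Collectors.joining(",\n", " WITH (\n", "\n)"));
       }

       public static void main(String[] args) {
           Map<String, String> opts = new LinkedHashMap<>(); // keeps insertion order
           opts.put("connector", "hudi");
           opts.put("table.type", "MERGE_ON_READ");
           opts.put("write.tasks", "8");
           System.out.println("CREATE TABLE t (id VARCHAR(200), dt VARCHAR(20)) PARTITIONED BY (dt)"
                   + withClause(opts));
       }
   }
   ```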
   
   

[GitHub] [hudi] hudi-bot edited a comment on pull request #3519: [DO NOT MERGE] 0.9.0 release patch for flink

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3519:
URL: https://github.com/apache/hudi/pull/3519#issuecomment-903204631


   
   ## CI report:
   
   * fd423c27cc15e112b99d8102ab7f5cb9a5d623c5 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2637)
 
   * 1d3142cd55878ba81a358bf0b4d194779585bada Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2638)
 
   
   
   






[GitHub] [hudi] hudi-bot edited a comment on pull request #3519: [DO NOT MERGE] 0.9.0 release patch for flink

2021-10-14 Thread GitBox


hudi-bot edited a comment on pull request #3519:
URL: https://github.com/apache/hudi/pull/3519#issuecomment-903204631


   
   ## CI report:
   
   * fd423c27cc15e112b99d8102ab7f5cb9a5d623c5 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=2637)
 
   * 1d3142cd55878ba81a358bf0b4d194779585bada UNKNOWN
   
   
   

