[GitHub] [hudi] hudi-bot commented on pull request #6732: [HUDI-4148] Add client for hudi table service manager

2023-01-07 Thread GitBox


hudi-bot commented on PR #6732:
URL: https://github.com/apache/hudi/pull/6732#issuecomment-1374727149

   
   ## CI report:
   
   * af728e3eeb1fd694eba037bc9a48869831ddb053 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14172)
 Azure: 
[CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14171)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #6732: [HUDI-4148] Add client for hudi table service manager

2023-01-07 Thread GitBox


hudi-bot commented on PR #6732:
URL: https://github.com/apache/hudi/pull/6732#issuecomment-1374725718

   
   ## CI report:
   
   * af728e3eeb1fd694eba037bc9a48869831ddb053 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14171)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14172)
 
   *  Unknown: [CANCELED](TBD) 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6732: [HUDI-4148] Add client for hudi table service manager

2023-01-07 Thread GitBox


hudi-bot commented on PR #6732:
URL: https://github.com/apache/hudi/pull/6732#issuecomment-1374724389

   
   ## CI report:
   
   * af728e3eeb1fd694eba037bc9a48869831ddb053 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7621: [HUDI-5512] fix spark call procedure run_bootstrap missing conf and c…

2023-01-07 Thread GitBox


hudi-bot commented on PR #7621:
URL: https://github.com/apache/hudi/pull/7621#issuecomment-1374723047

   
   ## CI report:
   
   * 2837b8dc79a5e968f9a15e3f79547dcc7f4b142f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14165)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14170)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7608: [HUDI-5503]Optimize flink table factory option check

2023-01-07 Thread GitBox


hudi-bot commented on PR #7608:
URL: https://github.com/apache/hudi/pull/7608#issuecomment-1374723003

   
   ## CI report:
   
   * 158f8a9c55aecdfe8465e092651edbbd24f911f4 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14169)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] XuQianJin-Stars commented on pull request #7621: [HUDI-5512] fix spark call procedure run_bootstrap missing conf and c…

2023-01-07 Thread GitBox


XuQianJin-Stars commented on PR #7621:
URL: https://github.com/apache/hudi/pull/7621#issuecomment-1374715339

   @hudi-bot run azure





[GitHub] [hudi] XuQianJin-Stars commented on pull request #7621: [HUDI-5512] fix spark call procedure run_bootstrap missing conf and c…

2023-01-07 Thread GitBox


XuQianJin-Stars commented on PR #7621:
URL: https://github.com/apache/hudi/pull/7621#issuecomment-1374714684

   @hudi-bot run azure





[GitHub] [hudi] SteNicholas commented on a diff in pull request #7620: [HUDI-5511] Do not clean the CkpMetadata dir when restart the job

2023-01-07 Thread GitBox


SteNicholas commented on code in PR #7620:
URL: https://github.com/apache/hudi/pull/7620#discussion_r1064075908


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/meta/CkpMetadata.java:
##
@@ -92,13 +92,14 @@ public void close() {
   // -
 
   /**
-   * Initialize the message bus, would clean all the messages
+   * Initialize the message bus, would keep all the messages.
*
* This expects to be called by the driver.
*/
   public void bootstrap() throws IOException {
-fs.delete(path, true);
-fs.mkdirs(path);
+if (!fs.exists(path)) {

Review Comment:
   If a checkpoint succeeds and the job then crashes, and the JM restarts on 
another machine instance, the ckp metadata isn't kept. This change only 
covers the scenario where the JM restarts on the same machine. WDYT?
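
To make the behaviour change in the hunk above concrete, here is a minimal sketch of the two bootstrap strategies. It is not Hudi's code: it uses `java.nio.file` on a local temp directory instead of the Hadoop `FileSystem`, and the class name `BootstrapSketch`, the directory name `ckp_meta`, and the message file name are all hypothetical. It does not model a JM moving to another machine; whether the directory survives that depends on where `path` is stored.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical stand-in for the two bootstrap strategies in the diff. */
final class BootstrapSketch {

  /** Old behaviour: delete the directory recursively, then recreate it empty. */
  static void bootstrapClean(Path dir) throws IOException {
    if (Files.exists(dir)) {
      List<Path> paths;
      try (Stream<Path> walk = Files.walk(dir)) {
        // Reverse order lists children before their parent directory.
        paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
      }
      for (Path p : paths) {
        Files.delete(p);
      }
    }
    Files.createDirectories(dir);
  }

  /** New behaviour: create the directory only when it is missing. */
  static void bootstrapKeep(Path dir) throws IOException {
    if (!Files.exists(dir)) {
      Files.createDirectories(dir);
    }
  }

  /** Writes one message file, bootstraps, and counts the files that remain. */
  static long survivors(boolean keep) {
    try {
      Path dir = Files.createTempDirectory("ckp-sketch").resolve("ckp_meta");
      Files.createDirectories(dir);
      Files.createFile(dir.resolve("20230107000000.COMPLETED")); // made-up name
      if (keep) {
        bootstrapKeep(dir);
      } else {
        bootstrapClean(dir);
      }
      try (Stream<Path> files = Files.list(dir)) {
        return files.count();
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("clean bootstrap keeps " + survivors(false) + " file(s)");
    System.out.println("keep bootstrap keeps  " + survivors(true) + " file(s)");
  }
}
```

With the clean strategy no message file survives a restart; with the keep strategy the existing file is still there.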






[GitHub] [hudi] hudi-bot commented on pull request #7609: [HUDI-5504]Fix concurrency conflict when asyncCompaction is enabled

2023-01-07 Thread GitBox


hudi-bot commented on PR #7609:
URL: https://github.com/apache/hudi/pull/7609#issuecomment-1374708940

   
   ## CI report:
   
   * 94a8e3bb534c386cc55c3150120c8e56b7596f29 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14158)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14163)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14168)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] davidshtian commented on issue #7591: [SUPPORT] Kinesis Data Analytics Flink1.13 to HUDI

2023-01-07 Thread GitBox


davidshtian commented on issue #7591:
URL: https://github.com/apache/hudi/issues/7591#issuecomment-1374699427

   > @davidshtian
   
   @soumilshah1995 Have you tried the 1.13.2 version of the package 
_flink-s3-fs-hadoop-1.13.2.jar_? KDA [supports Apache Flink version 
1.13.2](https://docs.aws.amazon.com/kinesisanalytics/latest/java/doc-history.html). 
Thanks~





[GitHub] [hudi] hudi-bot commented on pull request #7608: [HUDI-5503]Optimize flink table factory option check

2023-01-07 Thread GitBox


hudi-bot commented on PR #7608:
URL: https://github.com/apache/hudi/pull/7608#issuecomment-1374699255

   
   ## CI report:
   
   * f7391999a7868e7c97797823cab078a3e42f0bca Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14134)
 
   * 158f8a9c55aecdfe8465e092651edbbd24f911f4 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14169)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hbgstc123 commented on a diff in pull request #7608: [HUDI-5503]Optimize flink table factory option check

2023-01-07 Thread GitBox


hbgstc123 commented on code in PR #7608:
URL: https://github.com/apache/hudi/pull/7608#discussion_r1064076705


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java:
##
@@ -88,11 +95,44 @@ public DynamicTableSink createDynamicTableSink(Context 
context) {
 
checkArgument(!StringUtils.isNullOrEmpty(conf.getString(FlinkOptions.PATH)),
 "Option [path] should not be empty.");
 ResolvedSchema schema = context.getCatalogTable().getResolvedSchema();
+mergeTableConfig(conf, schema);
 sanityCheck(conf, schema);
 setupConfOptions(conf, context.getObjectIdentifier(), 
context.getCatalogTable(), schema);
 return new HoodieTableSink(conf, schema);
   }
 
+  /**
+   * fallback pk and pre-combine to table config if not provided
+   */
+  private void mergeTableConfig(Configuration conf, ResolvedSchema schema) {
+String basePath = conf.getOptional(FlinkOptions.PATH).orElseThrow(() ->
+new ValidationException("Option [path] should not be empty."));
+Path metaPath = new CachingPath(basePath, METAFOLDER_NAME);
+FileSystem fileSystem = FSUtils.getFs(metaPath, 
HadoopConfigurations.getHadoopConf(conf));
+HoodieTableConfig tableConfig;
+try {
+  tableConfig = new HoodieTableConfig(fileSystem, metaPath.toString(), 
null, null);
+} catch (HoodieIOException e) {
+  LOG.info("Fail to get table config.", e);
+  return;
+}
+
+Map propsMap = tableConfig.propsMap();
+List writeColumnNames = schema.getColumnNames();
+
+if (!conf.contains(FlinkOptions.RECORD_KEY_FIELD) && 
!schema.getPrimaryKey().isPresent()
+&& propsMap.containsKey(HoodieTableConfig.RECORDKEY_FIELDS.key())
+&& writeColumnNames.contains(propsMap.get(HoodieTableConfig.RECORDKEY_FIELDS.key()))) {
+  conf.set(FlinkOptions.RECORD_KEY_FIELD, 
propsMap.get(HoodieTableConfig.RECORDKEY_FIELDS.key()));

Review Comment:
   right, will fix this. thanks






[GitHub] [hudi] hudi-bot commented on pull request #7608: [HUDI-5503]Optimize flink table factory option check

2023-01-07 Thread GitBox


hudi-bot commented on PR #7608:
URL: https://github.com/apache/hudi/pull/7608#issuecomment-1374698509

   
   ## CI report:
   
   * f7391999a7868e7c97797823cab078a3e42f0bca Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14134)
 
   * 158f8a9c55aecdfe8465e092651edbbd24f911f4 UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hbgstc123 commented on a diff in pull request #7608: [HUDI-5503]Optimize flink table factory option check

2023-01-07 Thread GitBox


hbgstc123 commented on code in PR #7608:
URL: https://github.com/apache/hudi/pull/7608#discussion_r1064076453


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/table/HoodieTableFactory.java:
##
@@ -69,7 +77,6 @@ public class HoodieTableFactory implements 
DynamicTableSourceFactory, DynamicTab
   public DynamicTableSource createDynamicTableSource(Context context) {
 Configuration conf = 
FlinkOptions.fromMap(context.getCatalogTable().getOptions());
 ResolvedSchema schema = context.getCatalogTable().getResolvedSchema();
-sanityCheck(conf, schema);
 setupConfOptions(conf, context.getObjectIdentifier(), 
context.getCatalogTable(), schema);

Review Comment:
   Oh, I missed that the pk field is used to emit delete data.
   I added a sanity check for stream reads of MOR tables.






[GitHub] [hudi] hbgstc123 commented on a diff in pull request #7608: [HUDI-5503]Optimize flink table factory option check

2023-01-07 Thread GitBox


hbgstc123 commented on code in PR #7608:
URL: https://github.com/apache/hudi/pull/7608#discussion_r1064076345


##
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/bulk/RowDataKeyGen.java:
##
@@ -134,7 +155,9 @@ public HoodieKey getHoodieKey(RowData rowData) {
   }
 
   public String getRecordKey(RowData rowData) {
-if (this.simpleRecordKey) {
+if (!hasRecordKey) {
+  return DEFAULT_RECORD_KEY;
+} else if (this.simpleRecordKey) {

Review Comment:
   I'm not sure whether removing the pk field would cause an error somewhere. 
Writing an identical value should use very little storage in a columnar file 
format like Parquet, whereas UUIDs would use much more space because they are 
unique and so cannot compress well. I also don't know where we could use the 
UUID, so I think storing an identical value for the pk is better.
   
   I changed the default key value to RowDataKeyGen.EMPTY_RECORDKEY_PLACEHOLDER 
since an empty row key will report an error.
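
The storage argument above can be checked with a rough sketch. It assumes DEFLATE (`java.util.zip.Deflater`) as a stand-in for a columnar format's encoding and compression; Parquet itself uses dictionary and run-length encoding plus a codec, so the absolute numbers will differ. The class name and the `"__empty__"` placeholder string are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;
import java.util.zip.Deflater;

/** Compares how a constant key column and a UUID key column compress. */
final class KeyCompressionSketch {

  /** Returns the DEFLATE-compressed size, in bytes, of the given text. */
  static int compressedSize(String text) {
    Deflater deflater = new Deflater();
    deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
    deflater.finish();
    byte[] buffer = new byte[8192];
    int total = 0;
    while (!deflater.finished()) {
      total += deflater.deflate(buffer); // only the byte count matters here
    }
    deflater.end();
    return total;
  }

  /** Builds a newline-separated "column" of n record keys. */
  static String column(int n, boolean constant) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < n; i++) {
      // "__empty__" is an illustrative placeholder value, not taken from the PR.
      sb.append(constant ? "__empty__" : UUID.randomUUID().toString()).append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    int rows = 10_000;
    System.out.println("constant key: " + compressedSize(column(rows, true)) + " bytes");
    System.out.println("uuid key:     " + compressedSize(column(rows, false)) + " bytes");
  }
}
```

The constant column shrinks to a tiny fraction of its raw size, while the UUID column stays large because random hex digits carry little redundancy.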








[GitHub] [hudi] hudi-bot commented on pull request #7609: [HUDI-5504]Fix concurrency conflict when asyncCompaction is enabled

2023-01-07 Thread GitBox


hudi-bot commented on PR #7609:
URL: https://github.com/apache/hudi/pull/7609#issuecomment-1374685640

   
   ## CI report:
   
   * 94a8e3bb534c386cc55c3150120c8e56b7596f29 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14158)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14163)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14168)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6361: [HUDI-4690][HUDI-4503] Cleaning up Hudi custom Spark `Rule`s

2023-01-07 Thread GitBox


hudi-bot commented on PR #6361:
URL: https://github.com/apache/hudi/pull/6361#issuecomment-1374684926

   
   ## CI report:
   
   * a3f8cab6db30b8186e19c3c3ac1c85c0fe3fa63f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14167)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] ThinkerLei commented on pull request #7609: [HUDI-5504]Fix concurrency conflict when asyncCompaction is enabled

2023-01-07 Thread GitBox


ThinkerLei commented on PR #7609:
URL: https://github.com/apache/hudi/pull/7609#issuecomment-1374684521

   @hudi-bot run azure
   





[GitHub] [hudi] hudi-bot commented on pull request #7609: [HUDI-5504]Fix concurrency conflict when asyncCompaction is enabled

2023-01-07 Thread GitBox


hudi-bot commented on PR #7609:
URL: https://github.com/apache/hudi/pull/7609#issuecomment-1374684204

   
   ## CI report:
   
   * 94a8e3bb534c386cc55c3150120c8e56b7596f29 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14158)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14163)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] KnightChess commented on a diff in pull request #7607: [HUDI-5499] Fixing Spark SQL configs not being properly propagated for CTAS and other commands

2023-01-07 Thread GitBox


KnightChess commented on code in PR #7607:
URL: https://github.com/apache/hudi/pull/7607#discussion_r1064068273


##
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##
@@ -81,10 +80,8 @@ trait ProvidesHoodieConfig extends Logging {
 HoodieSyncConfig.META_SYNC_PARTITION_FIELDS.key -> 
tableConfig.getPartitionFieldProp,
 HoodieSyncConfig.META_SYNC_PARTITION_EXTRACTOR_CLASS.key -> 
hiveSyncConfig.getStringOrDefault(HoodieSyncConfig.META_SYNC_PARTITION_EXTRACTOR_CLASS),
 HiveSyncConfigHolder.HIVE_SUPPORT_TIMESTAMP_TYPE.key -> 
hiveSyncConfig.getBoolean(HiveSyncConfigHolder.HIVE_SUPPORT_TIMESTAMP_TYPE).toString,
-HoodieWriteConfig.UPSERT_PARALLELISM_VALUE.key -> 
hoodieProps.getString(HoodieWriteConfig.UPSERT_PARALLELISM_VALUE.key, "200"),

Review Comment:
   > Does this mean that the upsert parallelism cannot be tuned anymore from 
the SQL statement? Generally, are the key-value pairs in `Map.apply()` just 
overrides?
   
   The `combineOptions` method adds it from SQLConf. The property priority 
logic also differs from the old one: `Map.apply()` has the highest priority.
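
As a rough illustration of the priority rule described above, the sketch below merges option layers so that later layers win on key clashes. It is written in Java rather than Scala, and it is not the actual `combineOptions` implementation: the class name, the two-layer order, and the option values are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical layered-options merge: the last layer has the highest priority. */
final class OptionPrioritySketch {

  /** Merges layers from lowest to highest priority; later values overwrite earlier ones. */
  @SafeVarargs
  static Map<String, String> combine(Map<String, String>... layersLowToHigh) {
    Map<String, String> merged = new LinkedHashMap<>();
    for (Map<String, String> layer : layersLowToHigh) {
      merged.putAll(layer);
    }
    return merged;
  }

  public static void main(String[] args) {
    // Illustrative key and values; neither is taken from the PR.
    String key = "hoodie.upsert.shuffle.parallelism";
    Map<String, String> fromSqlConf = Collections.singletonMap(key, "400");
    Map<String, String> explicitOverrides = Collections.singletonMap(key, "200");
    System.out.println(combine(fromSqlConf, explicitOverrides));
  }
}
```

Under this model, a key that is no longer hard-coded in the highest-priority layer falls through to whatever a lower layer, such as the SQL session configuration, provides.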
   
   






[GitHub] [hudi] ThinkerLei commented on pull request #7609: [HUDI-5504]Fix concurrency conflict when asyncCompaction is enabled

2023-01-07 Thread GitBox


ThinkerLei commented on PR #7609:
URL: https://github.com/apache/hudi/pull/7609#issuecomment-1374672841

   The test failure has nothing to do with this PR. @hudi-bot run azure re-run the 
last Azure build





[GitHub] [hudi] hudi-bot commented on pull request #6361: [HUDI-4690][HUDI-4503] Cleaning up Hudi custom Spark `Rule`s

2023-01-07 Thread GitBox


hudi-bot commented on PR #6361:
URL: https://github.com/apache/hudi/pull/6361#issuecomment-1374649595

   
   ## CI report:
   
   * edfcc047ac71663a47813ac4187a523cbd0e5c9e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14162)
 
   * a3f8cab6db30b8186e19c3c3ac1c85c0fe3fa63f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14167)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #6361: [HUDI-4690][HUDI-4503] Cleaning up Hudi custom Spark `Rule`s

2023-01-07 Thread GitBox


hudi-bot commented on PR #6361:
URL: https://github.com/apache/hudi/pull/6361#issuecomment-1374648144

   
   ## CI report:
   
   * edfcc047ac71663a47813ac4187a523cbd0e5c9e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14162)
 
   * a3f8cab6db30b8186e19c3c3ac1c85c0fe3fa63f UNKNOWN
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7573: [HUDI-5484] Avoid using `GenericRecord` in `HoodieColumnStatMetadata`

2023-01-07 Thread GitBox


hudi-bot commented on PR #7573:
URL: https://github.com/apache/hudi/pull/7573#issuecomment-1374619773

   
   ## CI report:
   
   * c59596637cd44124388717082704db7e7bb8bdaf Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14164)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14166)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7573: [HUDI-5484] Avoid using `GenericRecord` in `HoodieColumnStatMetadata`

2023-01-07 Thread GitBox


hudi-bot commented on PR #7573:
URL: https://github.com/apache/hudi/pull/7573#issuecomment-1374569864

   
   ## CI report:
   
   * c59596637cd44124388717082704db7e7bb8bdaf Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14164)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14166)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] soumilshah1995 commented on issue #7591: [SUPPORT] Kinesis Data Analytics Flink1.13 to HUDI

2023-01-07 Thread GitBox


soumilshah1995 commented on issue #7591:
URL: https://github.com/apache/hudi/issues/7591#issuecomment-1374565548

   @davidshtian





[GitHub] [hudi] cxzl25 commented on pull request #7573: [HUDI-5484] Avoid using `GenericRecord` in `HoodieColumnStatMetadata`

2023-01-07 Thread GitBox


cxzl25 commented on PR #7573:
URL: https://github.com/apache/hudi/pull/7573#issuecomment-1374546366

   @hudi-bot run azure





[GitHub] [hudi] xushiyan commented on issue #7487: [SUPPORT] S3 Buckets reached quota limit when reading from hudi tables

2023-01-07 Thread GitBox


xushiyan commented on issue #7487:
URL: https://github.com/apache/hudi/issues/7487#issuecomment-1374545829

   Is this still happening? Please share more info, such as what the job is doing 
when this occurs: is it reading or writing? The logs would tell. It's likely due 
to a lot of small files. Have you run clustering for this table? What do the 
writer configs look like?





[GitHub] [hudi] xushiyan commented on issue #7533: [SUPPORT] Recreate deleted metadata table

2023-01-07 Thread GitBox


xushiyan commented on issue #7533:
URL: https://github.com/apache/hudi/issues/7533#issuecomment-1374545025

   @szingerpeter @yihua what is the latest state of this issue?





[GitHub] [hudi] xushiyan commented on issue #7494: FileNotFoundException while writing dataframe to local file system

2023-01-07 Thread GitBox


xushiyan commented on issue #7494:
URL: https://github.com/apache/hudi/issues/7494#issuecomment-1374544413

   > java.io.FileNotFoundException: File file:/tmp/hudi_trips_cow_4 does not 
exist
   
   The file path scheme is likely not working. Please refer to @jonvex's complete 
example above. Closing this since a working example has been provided.





[GitHub] [hudi] xushiyan closed issue #7494: FileNotFoundException while writing dataframe to local file system

2023-01-07 Thread GitBox


xushiyan closed issue #7494: FileNotFoundException while writing dataframe to 
local file system
URL: https://github.com/apache/hudi/issues/7494





[GitHub] [hudi] xushiyan closed issue #7507: [SUPPORT] how to use flink offline with occ

2023-01-07 Thread GitBox


xushiyan closed issue #7507: [SUPPORT] how to use flink offline with occ
URL: https://github.com/apache/hudi/issues/7507





[GitHub] [hudi] xushiyan commented on issue #7507: [SUPPORT] how to use flink offline with occ

2023-01-07 Thread GitBox


xushiyan commented on issue #7507:
URL: https://github.com/apache/hudi/issues/7507#issuecomment-1374543732

   Closing this as a suggestion was provided.





[GitHub] [hudi] xushiyan closed issue #7530: Hudi Log files are increasing in our application day by day

2023-01-07 Thread GitBox


xushiyan closed issue #7530: Hudi Log files are increasing in our application 
day by day 
URL: https://github.com/apache/hudi/issues/7530





[GitHub] [hudi] xushiyan commented on issue #7530: Hudi Log files are increasing in our application day by day

2023-01-07 Thread GitBox


xushiyan commented on issue #7530:
URL: https://github.com/apache/hudi/issues/7530#issuecomment-1374542671

   Is this the same issue as https://github.com/apache/hudi/issues/7600? Let's
   consolidate the discussion in one place. I'm moving the discussion there,
   will link this issue from it, and will close this one.





[GitHub] [hudi] hudi-bot commented on pull request #7621: [HUDI-5512] fix spark call procedure run_bootstrap missing conf and c…

2023-01-07 Thread GitBox


hudi-bot commented on PR #7621:
URL: https://github.com/apache/hudi/pull/7621#issuecomment-1374542301

   
   ## CI report:
   
   * 2837b8dc79a5e968f9a15e3f79547dcc7f4b142f Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14165)
 
   
   
   Bot commands
 @hudi-bot supports the following commands:
   
- `@hudi-bot run azure` re-run the last Azure build
   





[GitHub] [hudi] hudi-bot commented on pull request #7573: [HUDI-5484] Avoid using `GenericRecord` in `HoodieColumnStatMetadata`

2023-01-07 Thread GitBox


hudi-bot commented on PR #7573:
URL: https://github.com/apache/hudi/pull/7573#issuecomment-1374542270

   
   ## CI report:
   
   * c59596637cd44124388717082704db7e7bb8bdaf Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14164)
 
   
   



[GitHub] [hudi] xushiyan commented on issue #7531: [SUPPORT] table comments not fully supported

2023-01-07 Thread GitBox


xushiyan commented on issue #7531:
URL: https://github.com/apache/hudi/issues/7531#issuecomment-1374538276

   @jonvex can you look into this, please? It looks like some config fixes
   should resolve it.





[jira] [Created] (HUDI-5513) Improve documentation for spark-sql write configs

2023-01-07 Thread Jonathan Vexler (Jira)
Jonathan Vexler created HUDI-5513:
-

 Summary: Improve documentation for spark-sql write configs
 Key: HUDI-5513
 URL: https://issues.apache.org/jira/browse/HUDI-5513
 Project: Apache Hudi
  Issue Type: Improvement
  Components: configs, spark-sql
Reporter: Jonathan Vexler


Add documentation for how to set write configs in spark-sql, especially in the 
situation when working with multiple tables.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [hudi] xushiyan commented on issue #7539: [SUPPORT]IllegalStateException: Trying to access closed classloader

2023-01-07 Thread GitBox


xushiyan commented on issue #7539:
URL: https://github.com/apache/hudi/issues/7539#issuecomment-1374525865

   @hbgstc123 does this happen every few hours, or has it only happened once so
   far? Can you try upgrading to 0.12.2 and see how it goes?





[GitHub] [hudi] jomach commented on issue #7565: [SUPPORT] Memory Exception when building BuildProfile

2023-01-07 Thread GitBox


jomach commented on issue #7565:
URL: https://github.com/apache/hudi/issues/7565#issuecomment-1374525652

   The executors are being killed due to out-of-memory (OOM) errors.





[jira] [Updated] (HUDI-5485) Improve performance of savepoint with MDT

2023-01-07 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-5485:
-
Component/s: metadata

> Improve performance of savepoint with MDT
> -
>
> Key: HUDI-5485
> URL: https://issues.apache.org/jira/browse/HUDI-5485
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: metadata
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Critical
> Fix For: 0.13.0
>
>
> [https://github.com/apache/hudi/issues/7541]
> When metadata table is enabled, the savepoint operation is slow for a large 
> number of partitions (e.g., 75k).  The root cause is that for each partition, 
> the metadata table is scanned, which is unnecessary.





[jira] [Updated] (HUDI-5485) Improve performance of savepoint with MDT

2023-01-07 Thread Raymond Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raymond Xu updated HUDI-5485:
-
Priority: Blocker  (was: Critical)

> Improve performance of savepoint with MDT
> -
>
> Key: HUDI-5485
> URL: https://issues.apache.org/jira/browse/HUDI-5485
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: metadata
>Reporter: Ethan Guo
>Assignee: Ethan Guo
>Priority: Blocker
> Fix For: 0.13.0
>
>
> [https://github.com/apache/hudi/issues/7541]
> When metadata table is enabled, the savepoint operation is slow for a large 
> number of partitions (e.g., 75k).  The root cause is that for each partition, 
> the metadata table is scanned, which is unnecessary.





[GitHub] [hudi] xushiyan commented on issue #7557: [SUPPORT]: org.apache.hudi.exception.HoodieException: Could not sync using the meta sync class org.apache.hudi.hive.HiveSyncTool

2023-01-07 Thread GitBox


xushiyan commented on issue #7557:
URL: https://github.com/apache/hudi/issues/7557#issuecomment-1374523969

   >  *Hive version : 1.2.1000
   
   Hive 1.x is not supported. Please try upgrading to Hive 2.x or 3.x. Also, if
   you're on Hudi 0.11.0, please consider upgrading to a later release such as
   0.11.1 or 0.12.2.





[GitHub] [hudi] hudi-bot commented on pull request #7609: [HUDI-5504]Fix concurrency conflict when asyncCompaction is enabled

2023-01-07 Thread GitBox


hudi-bot commented on PR #7609:
URL: https://github.com/apache/hudi/pull/7609#issuecomment-1374523316

   
   ## CI report:
   
   * 94a8e3bb534c386cc55c3150120c8e56b7596f29 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14158)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14163)
 
   
   



[GitHub] [hudi] xushiyan commented on issue #7565: [SUPPORT] Memory Exception when building BuildProfile

2023-01-07 Thread GitBox


xushiyan commented on issue #7565:
URL: https://github.com/apache/hudi/issues/7565#issuecomment-1374522126

   ```java
   inputRecords
       .mapToPair(record -> Pair.of(
           new Tuple2<>(record.getPartitionPath(), Option.ofNullable(record.getCurrentLocation())),
           record))
       .countByKey();
   ```
   
   You should refer to
   `org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor#buildProfile`,
   which is used by Spark.
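
To make the quoted `countByKey` call concrete, here is a minimal sketch in plain Java, without Spark or Hudi on the classpath. It counts records per (partition path, optional current location) key, which is what the quoted snippet computes over the input records. `WorkloadCountSketch`, `Rec`, and `keyOf` are illustrative stand-ins, and `java.util.Optional` replaces Hudi's `Option`; none of this is the actual `buildProfile` implementation.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

final class WorkloadCountSketch {

    /** Stand-in for a record: only the two fields that form the grouping key. */
    static final class Rec {
        final String partitionPath;
        final Optional<String> currentLocation; // empty when the record has no known location yet

        Rec(String partitionPath, String currentLocationOrNull) {
            this.partitionPath = partitionPath;
            this.currentLocation = Optional.ofNullable(currentLocationOrNull);
        }
    }

    /** Builds the (partition path, optional location) key for one record. */
    static Map.Entry<String, Optional<String>> keyOf(Rec r) {
        return new SimpleImmutableEntry<>(r.partitionPath, r.currentLocation);
    }

    /** In-memory analogue of countByKey: number of records per key. */
    static Map<Map.Entry<String, Optional<String>>, Long> countByPartitionAndLocation(List<Rec> records) {
        return records.stream()
                .collect(Collectors.groupingBy(WorkloadCountSketch::keyOf, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Rec> records = Arrays.asList(
                new Rec("2023/01/07", null),
                new Rec("2023/01/07", null),
                new Rec("2023/01/07", "file-1"));
        System.out.println(countByPartitionAndLocation(records));
    }
}
```

In Spark, `countByKey` returns the resulting map to the driver, so the number of distinct keys is one of the things worth checking when memory is tight.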
   
   I think this is more of a Spark job tuning issue, where parallelism and
   executor memory should be tuned.
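
As a rough sketch of the knobs involved: `spark.executor.memory` and `spark.executor.memoryOverhead` are Spark settings, and `hoodie.upsert.shuffle.parallelism` is a Hudi write option. The values below are placeholders rather than recommendations, and `TuningKnobsSketch` is a hypothetical helper that only renders the Spark settings as `spark-submit` arguments.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TuningKnobsSketch {

    /** Spark-level settings; the values are placeholders, not recommendations. */
    static Map<String, String> sparkConf() {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.executor.memory", "8g");
        conf.put("spark.executor.memoryOverhead", "2g");
        return conf;
    }

    /** Hudi write option for upsert shuffle parallelism; placeholder value. */
    static Map<String, String> hudiWriteOptions() {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("hoodie.upsert.shuffle.parallelism", "400");
        return opts;
    }

    /** Renders Spark settings as spark-submit arguments of the form: --conf key=value. */
    static List<String> toSubmitArgs(Map<String, String> conf) {
        List<String> args = new ArrayList<>();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            args.add("--conf");
            args.add(e.getKey() + "=" + e.getValue());
        }
        return args;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", toSubmitArgs(sparkConf())));
        System.out.println(hudiWriteOptions());
    }
}
```

The Hudi option is kept in a separate map because it is normally passed as a write option on the data source rather than as a Spark setting.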
   
   > Reason: Remote RPC client disassociated. Likely due to containers 
exceeding thresholds, or network issues. Check driver logs for WARN messages.
   
   Any further info on this?





[GitHub] [hudi] hudi-bot commented on pull request #7609: [HUDI-5504]Fix concurrency conflict when asyncCompaction is enabled

2023-01-07 Thread GitBox


hudi-bot commented on PR #7609:
URL: https://github.com/apache/hudi/pull/7609#issuecomment-1374521685

   
   ## CI report:
   
   * 94a8e3bb534c386cc55c3150120c8e56b7596f29 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14158)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14163)
 
   
   



[GitHub] [hudi] xushiyan closed issue #7570: [SUPPORT]Sync hive lost some partitions when submit multiple commits at the same time

2023-01-07 Thread GitBox


xushiyan closed issue #7570: [SUPPORT]Sync hive lost some partitions when 
submit multiple commits at the same time 
URL: https://github.com/apache/hudi/issues/7570





[GitHub] [hudi] xushiyan commented on issue #7589: Keep only clustered file(all) after cleaning

2023-01-07 Thread GitBox


xushiyan commented on issue #7589:
URL: https://github.com/apache/hudi/issues/7589#issuecomment-1374517461

   @maheshguptags what you need is savepointing. See
   https://hudi.apache.org/docs/disaster_recovery
   For each clustering (replace commit), you just need to trigger a savepoint.
   The cleaner then won't delete the savepointed commit and its files, so they
   are retained until you delete the savepoint.
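
The retention behavior described above can be pictured with a toy model. This is not Hudi's cleaner, which works on file versions and configurable policies; `SavepointRetentionSketch` and `cleanable` are hypothetical names, and the model only assumes "keep the newest N commits, and never clean a savepointed one".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;

final class SavepointRetentionSketch {

    /**
     * Toy cleaner: given commit instants ordered oldest to newest, returns the ones
     * eligible for cleaning when the newest {@code retain} commits are kept and
     * savepointed commits are always skipped.
     */
    static List<String> cleanable(List<String> commits, int retain, Set<String> savepointed) {
        List<String> result = new ArrayList<>();
        int cutoff = Math.max(0, commits.size() - retain);
        for (String instant : commits.subList(0, cutoff)) {
            if (!savepointed.contains(instant)) {
                result.add(instant);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> commits = Arrays.asList("001", "002", "003", "004");
        // "002" stands for a clustering (replace commit) that was savepointed.
        System.out.println(cleanable(commits, 2, Collections.singleton("002")));
    }
}
```

In this model only `001` is cleanable: `002` is older than the retention window but stays because of its savepoint, and it would become cleanable again once the savepoint is removed from the set.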





[GitHub] [hudi] xushiyan closed issue #7596: [SUPPORT] java.lang.NoSuchMethodException: org.apache.hudi.utilities.sources.AvroKafkaSource when running HoodieDeltaStreamer

2023-01-07 Thread GitBox


xushiyan closed issue #7596: [SUPPORT] java.lang.NoSuchMethodException: 
org.apache.hudi.utilities.sources.AvroKafkaSource when running 
HoodieDeltaStreamer
URL: https://github.com/apache/hudi/issues/7596





[GitHub] [hudi] ThinkerLei commented on pull request #7609: [HUDI-5504]Fix concurrency conflict when asyncCompaction is enabled

2023-01-07 Thread GitBox


ThinkerLei commented on PR #7609:
URL: https://github.com/apache/hudi/pull/7609#issuecomment-1374513867

   @hudi-bot run azure re-run the last Azure build





[GitHub] [hudi] xushiyan commented on issue #7600: Hoodie clean is not deleting old files for MOR table

2023-01-07 Thread GitBox


xushiyan commented on issue #7600:
URL: https://github.com/apache/hudi/issues/7600#issuecomment-1374513596

   @SabyasachiDasTR have you observed any errors or warnings in the logs? It's
   likely that something is blocking the clean or failing it. Can you search the
   logs for any statements about "clean"? It looks like cleaning just stopped at
   some point.
   
   Yes, you can use the CLI to trigger a clean manually. It won't impact the
   data. If you want to be cautious, you can try it out against a clone of the
   table first. If something is failing the clean, the result will be the same
   though, so we still need to check the logs.





[GitHub] [hudi] xushiyan commented on issue #7602: [SUPPORT] When does the Spark engine's bulk insert mode support bucket index

2023-01-07 Thread GitBox


xushiyan commented on issue #7602:
URL: https://github.com/apache/hudi/issues/7602#issuecomment-1374503920

   @minihippo can you please advise? It would be a very useful improvement.





[GitHub] [hudi] xushiyan commented on issue #7617: [SUPPORT] Hudi "write" command doesn't fail when on incompatible partition type, but "read" command fails.

2023-01-07 Thread GitBox


xushiyan commented on issue #7617:
URL: https://github.com/apache/hudi/issues/7617#issuecomment-1374501572

   @jonvex can you help verify this with 0.12.2 and the master version, please?
   Just to confirm whether the behavior has been fixed.





[GitHub] [hudi] hudi-bot commented on pull request #7621: [HUDI-5512] fix spark call procedure run_bootstrap missing conf and c…

2023-01-07 Thread GitBox


hudi-bot commented on PR #7621:
URL: https://github.com/apache/hudi/pull/7621#issuecomment-1374499632

   
   ## CI report:
   
   * 2837b8dc79a5e968f9a15e3f79547dcc7f4b142f Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14165)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6732: [HUDI-4148] Add client for hudi table service manager

2023-01-07 Thread GitBox


hudi-bot commented on PR #6732:
URL: https://github.com/apache/hudi/pull/6732#issuecomment-1374499289

   
   ## CI report:
   
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7621: [HUDI-5512] fix spark call procedure run_bootstrap missing conf and c…

2023-01-07 Thread GitBox


hudi-bot commented on PR #7621:
URL: https://github.com/apache/hudi/pull/7621#issuecomment-1374498072

   
   ## CI report:
   
   * 2837b8dc79a5e968f9a15e3f79547dcc7f4b142f UNKNOWN
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7573: [HUDI-5484] Avoid using `GenericRecord` in `HoodieColumnStatMetadata`

2023-01-07 Thread GitBox


hudi-bot commented on PR #7573:
URL: https://github.com/apache/hudi/pull/7573#issuecomment-1374498015

   
   ## CI report:
   
   * 1ac267ba9af690ecd47f74f60c34851387aee9eb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14080)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14083)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14089)
 
   * c59596637cd44124388717082704db7e7bb8bdaf Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14164)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6732: [HUDI-4148] Add client for hudi table service manager

2023-01-07 Thread GitBox


hudi-bot commented on PR #6732:
URL: https://github.com/apache/hudi/pull/6732#issuecomment-1374497670

   
   ## CI report:
   
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7573: [HUDI-5484] Avoid using `GenericRecord` in `HoodieColumnStatMetadata`

2023-01-07 Thread GitBox


hudi-bot commented on PR #7573:
URL: https://github.com/apache/hudi/pull/7573#issuecomment-1374496229

   
   ## CI report:
   
   * 1ac267ba9af690ecd47f74f60c34851387aee9eb Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14080)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14083)
 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14089)
 
   * c59596637cd44124388717082704db7e7bb8bdaf UNKNOWN
   
   



[GitHub] [hudi] yuzhaojing closed pull request #6732: [HUDI-4148] Add client for hudi table service manager

2023-01-07 Thread GitBox


yuzhaojing closed pull request #6732: [HUDI-4148] Add client for hudi table 
service manager
URL: https://github.com/apache/hudi/pull/6732





[GitHub] [hudi] yuzhaojing opened a new pull request, #6732: [HUDI-4148] Add client for hudi table service manager

2023-01-07 Thread GitBox


yuzhaojing opened a new pull request, #6732:
URL: https://github.com/apache/hudi/pull/6732

   ### Change Logs
   
   Refactor the table service part of BaseHoodieWriteClient and wrap it into
   BaseHoodieTableServiceClient.
   
   _About the Public API for the table service part of BaseHoodieWriteClient._
   
   Add BaseTableServiceClient.
   
   ### Impact
   
   Affects core writer paths
   
   ### Risk level
   
   Medium
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Updated] (HUDI-5512) spark call procedure run_bootstrap missing params cause job fail

2023-01-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HUDI-5512:
-
Labels: pull-request-available  (was: )

> spark call procedure run_bootstrap missing params cause job fail
> 
>
> Key: HUDI-5512
> URL: https://issues.apache.org/jira/browse/HUDI-5512
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: spark-sql
>Reporter: KnightChess
>Assignee: KnightChess
>Priority: Major
>  Labels: pull-request-available
>
> # The Spark SQL call procedure run_bootstrap loses many configs when saving to 
> `hoodie.properties`
>  # Some configs sometimes do not take effect, e.g. key_gen_class





[GitHub] [hudi] KnightChess opened a new pull request, #7621: [HUDI-5512] fix spark call procedure run_bootstrap missing conf and c…

2023-01-07 Thread GitBox


KnightChess opened a new pull request, #7621:
URL: https://github.com/apache/hudi/pull/7621

   ### Change Logs
   
   - Following the bootstrap table initialization code in `BootstrapExecutor`
     and `HoodieSparkSqlWriter`, add the missing configs to hoodie.properties
   - Fix `key_generator_class` not taking effect
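
As a rough illustration of what "persisting a config to hoodie.properties" means, the sketch below round-trips a small set of table properties through the Java properties format and checks that the key generator class survives. It is not the code of this PR: `TablePropertiesSketch` is a hypothetical helper, and the key names `hoodie.table.name` and `hoodie.table.keygenerator.class` are assumed from Hudi's table config as commonly documented.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Properties;

final class TablePropertiesSketch {

    /** Serializes properties to the text form used by .properties files. */
    static String write(Properties props) {
        StringWriter out = new StringWriter();
        try {
            props.store(out, "illustrative table properties");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    /** Parses properties back from their text form. */
    static Properties read(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("hoodie.table.name", "example_table"); // placeholder table name
        props.setProperty("hoodie.table.keygenerator.class",
                "org.apache.hudi.keygen.SimpleKeyGenerator");
        Properties reloaded = read(write(props));
        System.out.println(reloaded.getProperty("hoodie.table.keygenerator.class"));
    }
}
```

A check of this shape, run against the properties actually written by a bootstrap, is one way to confirm that a user-supplied key generator was persisted rather than silently dropped.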
   
   ### Impact
   
   None; the new config contains all of the old config
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   





[jira] [Created] (HUDI-5512) spark call procedure run_bootstrap missing params cause job fail

2023-01-07 Thread KnightChess (Jira)
KnightChess created HUDI-5512:
-

 Summary: spark call procedure run_bootstrap missing params cause 
job fail
 Key: HUDI-5512
 URL: https://issues.apache.org/jira/browse/HUDI-5512
 Project: Apache Hudi
  Issue Type: Bug
  Components: spark-sql
Reporter: KnightChess
Assignee: KnightChess


# The Spark SQL call procedure run_bootstrap loses many configs when saving to 
`hoodie.properties`
 # Some configs sometimes do not take effect, e.g. key_gen_class





[GitHub] [hudi] yuzhaojing commented on pull request #6732: [HUDI-4148] Add client for hudi table service manager

2023-01-07 Thread GitBox


yuzhaojing commented on PR #6732:
URL: https://github.com/apache/hudi/pull/6732#issuecomment-1374481061

   @hudi-bot run azure





[GitHub] [hudi] hudi-bot commented on pull request #7609: [HUDI-5504]Fix concurrency conflict when asyncCompaction is enabled

2023-01-07 Thread GitBox


hudi-bot commented on PR #7609:
URL: https://github.com/apache/hudi/pull/7609#issuecomment-1374470670

   
   ## CI report:
   
   * 94a8e3bb534c386cc55c3150120c8e56b7596f29 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14158)
 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14163)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7620: [HUDI-5511] Do not clean the CkpMetadata dir when restart the job

2023-01-07 Thread GitBox


hudi-bot commented on PR #7620:
URL: https://github.com/apache/hudi/pull/7620#issuecomment-1374429123

   
   ## CI report:
   
   * cd670233392323f8602950a5d2595661b668f3e9 Azure: 
[SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14161)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #6361: [HUDI-4690][HUDI-4503] Cleaning up Hudi custom Spark `Rule`s

2023-01-07 Thread GitBox


hudi-bot commented on PR #6361:
URL: https://github.com/apache/hudi/pull/6361#issuecomment-1374428866

   
   ## CI report:
   
   * edfcc047ac71663a47813ac4187a523cbd0e5c9e Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14162)
 
   
   



[GitHub] [hudi] hudi-bot commented on pull request #7609: [HUDI-5504]Fix concurrency conflict when asyncCompaction is enabled

2023-01-07 Thread GitBox


hudi-bot commented on PR #7609:
URL: https://github.com/apache/hudi/pull/7609#issuecomment-1374415222

   
   ## CI report:
   
   * 94a8e3bb534c386cc55c3150120c8e56b7596f29 Azure: 
[FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14158)
 Azure: 
[PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14163)
 
   
   