[jira] [Created] (HUDI-6451) Randomly obtain a path in HoodieMemoryConfig#getDefaultSpillableMapBasePath

2023-06-27 Thread Shilun Fan (Jira)
Shilun Fan created HUDI-6451:


 Summary: Randomly obtain a path in 
HoodieMemoryConfig#getDefaultSpillableMapBasePath
 Key: HUDI-6451
 URL: https://issues.apache.org/jira/browse/HUDI-6451
 Project: Apache Hudi
  Issue Type: Improvement
  Components: cli
Reporter: Shilun Fan


The HoodieMemoryConfig#getDefaultSpillableMapBasePath method retrieves a path 
from YARN's LOCAL_DIRS environment variable. However, YARN typically 
concatenates LOCAL_DIRS in a fixed order, so DefaultSpillableMapBasePath always 
resolves to the same value. To spread the disk load, we should select a path 
from the array at random.
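
A minimal sketch of the idea, not the actual Hudi change: it assumes LOCAL_DIRS is a comma-separated list of directories and picks one entry at random. The class name SpillablePathPicker, the method pickRandomDir, and the "/tmp/" fallback are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

final class SpillablePathPicker {
  private SpillablePathPicker() {}

  /**
   * Picks one directory at random from a comma-separated list.
   * Returns the fallback when the list is null or has no non-blank entries.
   */
  static String pickRandomDir(String localDirs, String fallback) {
    if (localDirs == null) {
      return fallback;
    }
    // Assumption: entries are separated by commas; blank entries are ignored.
    String[] dirs = Arrays.stream(localDirs.split(","))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .toArray(String[]::new);
    if (dirs.length == 0) {
      return fallback;
    }
    // Uniform random index so repeated calls spread across all disks.
    return dirs[ThreadLocalRandom.current().nextInt(dirs.length)];
  }

  public static void main(String[] args) {
    // "/tmp/" is a hypothetical fallback for this sketch.
    System.out.println(pickRandomDir(System.getenv("LOCAL_DIRS"), "/tmp/"));
  }
}
```

With a fixed first-entry choice every task on a node would spill to the same disk; a uniform random pick is the simplest way to avoid that, although it does not account for how full or busy each disk is.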



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HUDI-6451) Randomly obtain a path in HoodieMemoryConfig#getDefaultSpillableMapBasePath

2023-06-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HUDI-6451:
-
Status: In Progress  (was: Open)

> Randomly obtain a path in HoodieMemoryConfig#getDefaultSpillableMapBasePath
> ---
>
> Key: HUDI-6451
> URL: https://issues.apache.org/jira/browse/HUDI-6451
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cli
>Reporter: Shilun Fan
>Priority: Major
>
> The HoodieMemoryConfig#getDefaultSpillableMapBasePath method retrieves a path 
> from YARN's LOCAL_DIRS environment variable. However, YARN typically 
> concatenates LOCAL_DIRS in a fixed order, so DefaultSpillableMapBasePath 
> always resolves to the same value. To spread the disk load, we should select 
> a path from the array at random.





[jira] [Assigned] (HUDI-6451) Randomly obtain a path in HoodieMemoryConfig#getDefaultSpillableMapBasePath

2023-06-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan reassigned HUDI-6451:


Assignee: Shilun Fan

> Randomly obtain a path in HoodieMemoryConfig#getDefaultSpillableMapBasePath
> ---
>
> Key: HUDI-6451
> URL: https://issues.apache.org/jira/browse/HUDI-6451
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cli
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>
> The HoodieMemoryConfig#getDefaultSpillableMapBasePath method retrieves a path 
> from YARN's LOCAL_DIRS environment variable. However, YARN typically 
> concatenates LOCAL_DIRS in a fixed order, so DefaultSpillableMapBasePath 
> always resolves to the same value. To spread the disk load, we should select 
> a path from the array at random.





[jira] [Updated] (HUDI-6086) Improve HiveSchemaUtil#generateCreateDDL With StringBuilder.

2023-04-26 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HUDI-6086:
-
Summary: Improve HiveSchemaUtil#generateCreateDDL With StringBuilder.  
(was: Improve HiveSchemaUtil#generateCreateDDL With ST)

> Improve HiveSchemaUtil#generateCreateDDL With StringBuilder.
> 
>
> Key: HUDI-6086
> URL: https://issues.apache.org/jira/browse/HUDI-6086
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: hive
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
>
> HiveSchemaUtil#generateCreateDDL relies on many append calls, which makes 
> the code very difficult to read. In cases like this, we would usually use 
> ANTLR's ST (StringTemplate) to generate the SQL. This Jira will use ST to 
> improve this part of the code.





[jira] [Created] (HUDI-6086) Improve HiveSchemaUtil#generateCreateDDL With ST

2023-04-17 Thread Shilun Fan (Jira)
Shilun Fan created HUDI-6086:


 Summary: Improve HiveSchemaUtil#generateCreateDDL With ST
 Key: HUDI-6086
 URL: https://issues.apache.org/jira/browse/HUDI-6086
 Project: Apache Hudi
  Issue Type: Improvement
  Components: hive
Reporter: Shilun Fan
Assignee: Shilun Fan


HiveSchemaUtil#generateCreateDDL relies on many append calls, which makes the 
code very difficult to read. In cases like this, we would usually use ANTLR's 
ST (StringTemplate) to generate the SQL. This Jira will use ST to improve this 
part of the code.
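
For illustration only, here is a sketch of assembling a CREATE TABLE statement with one StringJoiner for the column list and one StringBuilder chain for the statement, the approach named in the later summary of this issue. The class CreateDdlSketch, its parameters, and the exact DDL shape are assumptions and are not taken from HiveSchemaUtil.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class CreateDdlSketch {
  private CreateDdlSketch() {}

  /** Builds a simplified CREATE TABLE statement; columns keep map iteration order. */
  static String createTableDdl(String db, String table,
                               Map<String, String> columns, String location) {
    // StringJoiner handles the separators and the surrounding parentheses.
    StringJoiner cols = new StringJoiner(", ", "(", ")");
    columns.forEach((name, type) -> cols.add("`" + name + "` " + type));

    // One fluent chain keeps the statement readable from left to right.
    return new StringBuilder()
        .append("CREATE EXTERNAL TABLE IF NOT EXISTS ")
        .append('`').append(db).append("`.`").append(table).append('`')
        .append(' ').append(cols)
        .append(" LOCATION '").append(location).append('\'')
        .toString();
  }

  public static void main(String[] args) {
    Map<String, String> columns = new LinkedHashMap<>();
    columns.put("id", "bigint");
    columns.put("name", "string");
    System.out.println(createTableDdl("db", "t", columns, "/p"));
  }
}
```

A template library such as ST separates the SQL text from the Java code entirely; the builder chain above is the lighter option when no extra dependency is wanted.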





[jira] [Updated] (HUDI-6086) Improve HiveSchemaUtil#generateCreateDDL With ST

2023-04-17 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HUDI-6086:
-
Status: In Progress  (was: Open)

> Improve HiveSchemaUtil#generateCreateDDL With ST
> 
>
> Key: HUDI-6086
> URL: https://issues.apache.org/jira/browse/HUDI-6086
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: hive
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>
> HiveSchemaUtil#generateCreateDDL relies on many append calls, which makes 
> the code very difficult to read. In cases like this, we would usually use 
> ANTLR's ST (StringTemplate) to generate the SQL. This Jira will use ST to 
> improve this part of the code.





[jira] [Updated] (HUDI-6079) Improve the code of HMSDDLExecutor, HiveQueryDDLExecutor

2023-04-14 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HUDI-6079:
-
Status: In Progress  (was: Open)

> Improve the code of HMSDDLExecutor, HiveQueryDDLExecutor
> 
>
> Key: HUDI-6079
> URL: https://issues.apache.org/jira/browse/HUDI-6079
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: hive
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>
> 1. Modify the log format
> 2. Remove redundant code
> 3. Increase code readability





[jira] [Assigned] (HUDI-6079) Improve the code of HMSDDLExecutor, HiveQueryDDLExecutor

2023-04-14 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan reassigned HUDI-6079:


Assignee: Shilun Fan

> Improve the code of HMSDDLExecutor, HiveQueryDDLExecutor
> 
>
> Key: HUDI-6079
> URL: https://issues.apache.org/jira/browse/HUDI-6079
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: hive
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>
> 1. Modify the log format
> 2. Remove redundant code
> 3. Increase code readability





[jira] [Created] (HUDI-6079) Improve the code of HMSDDLExecutor, HiveQueryDDLExecutor

2023-04-14 Thread Shilun Fan (Jira)
Shilun Fan created HUDI-6079:


 Summary: Improve the code of HMSDDLExecutor, HiveQueryDDLExecutor
 Key: HUDI-6079
 URL: https://issues.apache.org/jira/browse/HUDI-6079
 Project: Apache Hudi
  Issue Type: Improvement
  Components: hive
Reporter: Shilun Fan


1. Modify the log format
2. Remove redundant code
3. Increase code readability





[jira] [Created] (HUDI-6064) Improve JDBCExecutor#getTableSchema Use ColName

2023-04-11 Thread Shilun Fan (Jira)
Shilun Fan created HUDI-6064:


 Summary: Improve JDBCExecutor#getTableSchema Use ColName
 Key: HUDI-6064
 URL: https://issues.apache.org/jira/browse/HUDI-6064
 Project: Apache Hudi
  Issue Type: Improvement
  Components: hive
Reporter: Shilun Fan
Assignee: Shilun Fan


JDBCExecutor#getTableSchema uses column indexes (ColIndex), which makes the 
code hard to read; use column names (ColName) instead.





[jira] [Updated] (HUDI-6064) Improve JDBCExecutor#getTableSchema Use ColName

2023-04-11 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HUDI-6064:
-
Status: In Progress  (was: Open)

> Improve JDBCExecutor#getTableSchema Use ColName
> ---
>
> Key: HUDI-6064
> URL: https://issues.apache.org/jira/browse/HUDI-6064
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: hive
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>
> JDBCExecutor#getTableSchema uses column indexes (ColIndex), which makes the 
> code hard to read; use column names (ColName) instead.





[jira] [Assigned] (HUDI-6063) Modify logging errors In JDBCExecutor

2023-04-11 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan reassigned HUDI-6063:


Assignee: Shilun Fan

> Modify logging errors In JDBCExecutor
> -
>
> Key: HUDI-6063
> URL: https://issues.apache.org/jira/browse/HUDI-6063
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hive
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>
> There is a logging error in JDBCExecutor: while dropping partitions, the log 
> message says that partitions are being added.





[jira] [Updated] (HUDI-6063) Modify logging errors In JDBCExecutor

2023-04-11 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-6063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HUDI-6063:
-
Status: In Progress  (was: Open)

> Modify logging errors In JDBCExecutor
> -
>
> Key: HUDI-6063
> URL: https://issues.apache.org/jira/browse/HUDI-6063
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: hive
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>
> There is a logging error in JDBCExecutor: while dropping partitions, the log 
> message says that partitions are being added.





[jira] [Created] (HUDI-6063) Modify logging errors In JDBCExecutor

2023-04-11 Thread Shilun Fan (Jira)
Shilun Fan created HUDI-6063:


 Summary: Modify logging errors In JDBCExecutor
 Key: HUDI-6063
 URL: https://issues.apache.org/jira/browse/HUDI-6063
 Project: Apache Hudi
  Issue Type: Bug
  Components: hive
Reporter: Shilun Fan


There is a logging error in JDBCExecutor: while dropping partitions, the log 
message says that partitions are being added.





[jira] [Created] (HUDI-5958) Improve ResolvedSchema Instead of TableSchema

2023-03-19 Thread Shilun Fan (Jira)
Shilun Fan created HUDI-5958:


 Summary: Improve ResolvedSchema Instead of TableSchema
 Key: HUDI-5958
 URL: https://issues.apache.org/jira/browse/HUDI-5958
 Project: Apache Hudi
  Issue Type: Improvement
  Components: flink
Reporter: Shilun Fan


When reading the code, I found that the flink-example project still uses 
TableSchema. TableSchema has been deprecated; we can use ResolvedSchema 
instead.





[jira] [Assigned] (HUDI-5958) Improve ResolvedSchema Instead of TableSchema

2023-03-19 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan reassigned HUDI-5958:


Assignee: Shilun Fan

> Improve ResolvedSchema Instead of TableSchema
> -
>
> Key: HUDI-5958
> URL: https://issues.apache.org/jira/browse/HUDI-5958
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>
> When reading the code, I found that the flink-example project still uses 
> TableSchema. TableSchema has been deprecated; we can use ResolvedSchema 
> instead.





[jira] [Updated] (HUDI-5958) Improve ResolvedSchema Instead of TableSchema

2023-03-19 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HUDI-5958:
-
Status: In Progress  (was: Open)

> Improve ResolvedSchema Instead of TableSchema
> -
>
> Key: HUDI-5958
> URL: https://issues.apache.org/jira/browse/HUDI-5958
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: flink
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>
> When reading the code, I found that the flink-example project still uses 
> TableSchema. TableSchema has been deprecated; we can use ResolvedSchema 
> instead.





[jira] [Created] (HUDI-5876) Remove usage of deprecated TableConfig.

2023-03-04 Thread Shilun Fan (Jira)
Shilun Fan created HUDI-5876:


 Summary: Remove usage of deprecated TableConfig.
 Key: HUDI-5876
 URL: https://issues.apache.org/jira/browse/HUDI-5876
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Shilun Fan
Assignee: Shilun Fan


This is a small change. I found that SortOperatorGen initializes TableConfig 
using a deprecated constructor; use the recommended method instead.

In TableConfig:

/** Please use {@link TableConfig#getDefault()} instead. */
@Deprecated
public TableConfig() {}





[jira] [Updated] (HUDI-5869) Fix Some Typos in Hudi-Common

2023-03-01 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HUDI-5869:
-
Status: In Progress  (was: Open)

> Fix Some Typos in Hudi-Common
> -
>
> Key: HUDI-5869
> URL: https://issues.apache.org/jira/browse/HUDI-5869
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>
> When reading the code, I found some typos and fixed them.





[jira] [Created] (HUDI-5869) Fix Some Typos in Hudi-Common

2023-03-01 Thread Shilun Fan (Jira)
Shilun Fan created HUDI-5869:


 Summary: Fix Some Typos in Hudi-Common
 Key: HUDI-5869
 URL: https://issues.apache.org/jira/browse/HUDI-5869
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Shilun Fan
Assignee: Shilun Fan


When reading the code, I found some typos and fixed them.





[jira] [Assigned] (HUDI-5389) Remove Hudi Cli Duplicates Code

2023-02-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan reassigned HUDI-5389:


Assignee: Shilun Fan

> Remove Hudi Cli Duplicates Code
> ---
>
> Key: HUDI-5389
> URL: https://issues.apache.org/jira/browse/HUDI-5389
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
>
> While reading the code, I found some duplicate code that I think can be 
> removed directly.
> ||cli||hudi-spark||
> |org.apache.hudi.cli.DedupeSparkJob|org.apache.spark.sql.hudi.DedupeSparkJob|
> |org.apache.hudi.cli.DeDupeType|org.apache.spark.sql.hudi.DeDupeType|
> |org.apache.hudi.cli.SparkHelpers|org.apache.spark.sql.hudi.SparkHelpers|
> The code on the left side of the table can be directly replaced by the code 
> on the right side of the table, because their contents are exactly the same.





[jira] [Assigned] (HUDI-5398) Fix Typo in hudi-integ-test#README.md

2023-02-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan reassigned HUDI-5398:


Assignee: Shilun Fan

> Fix Typo in hudi-integ-test#README.md
> -
>
> Key: HUDI-5398
> URL: https://issues.apache.org/jira/browse/HUDI-5398
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> While reading the README.md of hudi-integ-test, I found some typos; this 
> issue fixes them.





[jira] [Assigned] (HUDI-5283) Replace deprecated method Schema.parse with Schema.Parser

2023-02-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan reassigned HUDI-5283:


Assignee: Shilun Fan

> Replace deprecated method Schema.parse with Schema.Parser
> -
>
> Key: HUDI-5283
> URL: https://issues.apache.org/jira/browse/HUDI-5283
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cli
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> When reading the code, I found that 
> HoodieBootstrapSchemaProvider#getBootstrapSchema uses the deprecated method 
> Schema.parse, which can be replaced by new Schema.Parser().parse().
> I also searched at the module level and found that this is the only place 
> that uses a deprecated method.





[jira] [Assigned] (HUDI-5035) Remove deprecated API usage in SparkPreCommitValidator#validate

2023-02-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan reassigned HUDI-5035:


Assignee: Shilun Fan

> Remove deprecated API usage in SparkPreCommitValidator#validate
> ---
>
> Key: HUDI-5035
> URL: https://issues.apache.org/jira/browse/HUDI-5035
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cli
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
> Attachments: image-2022-10-15-07-23-43-689.png
>
>
> I found that the code uses a deprecated API; modify it to use the 
> recommended API.
>  
> !image-2022-10-15-07-23-43-689.png!





[jira] [Assigned] (HUDI-5124) Fix HoodieInternalRowFileWriter#canWrite error return tag

2023-02-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan reassigned HUDI-5124:


Assignee: Shilun Fan

> Fix HoodieInternalRowFileWriter#canWrite error return tag
> -
>
> Key: HUDI-5124
> URL: https://issues.apache.org/jira/browse/HUDI-5124
> Project: Apache Hudi
>  Issue Type: Bug
>  Components: cli
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.2
>
>






[jira] [Assigned] (HUDI-5154) Improve hudi-spark-client Lambada writing

2023-02-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan reassigned HUDI-5154:


Assignee: Shilun Fan

> Improve hudi-spark-client Lambada writing
> -
>
> Key: HUDI-5154
> URL: https://issues.apache.org/jira/browse/HUDI-5154
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cli
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> When reading the code, I found that the lambda expressions in the 
> hudi-spark-client module can be written more concisely to make the code 
> cleaner.
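
As a generic illustration of the kind of cleanup described, and not code from hudi-spark-client, the sketch below shows the same stream pipeline written with block-bodied, explicitly typed lambdas and then with a method reference and an expression lambda. The class LambdaCleanup and both method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class LambdaCleanup {
  private LambdaCleanup() {}

  /** Verbose style: explicit parameter types and block bodies with return. */
  static List<String> verbose(List<String> values) {
    return values.stream()
        .map((String v) -> {
          return v.trim();
        })
        .filter((String v) -> {
          return !v.isEmpty();
        })
        .collect(Collectors.toList());
  }

  /** Concise style: a method reference and an expression lambda, same behavior. */
  static List<String> concise(List<String> values) {
    return values.stream()
        .map(String::trim)
        .filter(v -> !v.isEmpty())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> input = Arrays.asList(" a ", "", "b");
    System.out.println(verbose(input).equals(concise(input)));
  }
}
```

Both methods return the same list; only the amount of ceremony around each lambda differs.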





[jira] [Assigned] (HUDI-5072) Extract transform duplicate code

2023-02-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan reassigned HUDI-5072:


Assignee: Shilun Fan

> Extract transform duplicate code
> 
>
> Key: HUDI-5072
> URL: https://issues.apache.org/jira/browse/HUDI-5072
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cli
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> When reading the code, I found that the transform methods of 
> MultipleSparkJobExecutionStrategy and SingleSparkJobExecutionStrategy have 
> redundant code. I think we can extract them to make the code cleaner.





[jira] [Assigned] (HUDI-5027) Replace hardcoded hbase config keys with HbaseConstants

2023-02-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan reassigned HUDI-5027:


Assignee: Shilun Fan

> Replace hardcoded hbase config keys with HbaseConstants 
> 
>
> Key: HUDI-5027
> URL: https://issues.apache.org/jira/browse/HUDI-5027
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: code-quality
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.12.2
>
>
> When I read the code, I found that SparkHoodieHBaseIndex uses many hardcoded 
> config keys; it would be better to replace them with HBase's constants.





[jira] [Assigned] (HUDI-4997) use jackson-v2 replace jackson-v1 import

2023-02-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-4997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan reassigned HUDI-4997:


Assignee: Shilun Fan

> use jackson-v2 replace jackson-v1 import
> 
>
> Key: HUDI-4997
> URL: https://issues.apache.org/jira/browse/HUDI-4997
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cli
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.12.2
>
>
> HoodieWriteCommitCallbackUtil uses ObjectMapper but imports it from 
> jackson-v1. Because jackson-v1 has security risks, replace the import with 
> jackson-v2.





[jira] [Assigned] (HUDI-5002) Remove deprecated API usage in SparkHoodieHBaseIndex#generateStatement

2023-02-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan reassigned HUDI-5002:


Assignee: Shilun Fan

> Remove deprecated API usage in SparkHoodieHBaseIndex#generateStatement 
> ---
>
> Key: HUDI-5002
> URL: https://issues.apache.org/jira/browse/HUDI-5002
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: index
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.12.2
>
> Attachments: image-2022-10-10-21-31-59-535.png
>
>
> When I read the code, I found that SparkHoodieHBaseIndex#generateStatement 
> uses HBase's deprecated method (setMaxVersions); I replaced it with the new 
> method.
>  
> {code:java}
> private Get generateStatement(String key) throws IOException {
>   // setMaxVersions is the deprecated call this issue refers to.
>   return new Get(Bytes.toBytes(getHBaseKey(key)))
>       .setMaxVersions(1)
>       .addColumn(SYSTEM_COLUMN_FAMILY, COMMIT_TS_COLUMN)
>       .addColumn(SYSTEM_COLUMN_FAMILY, FILE_NAME_COLUMN)
>       .addColumn(SYSTEM_COLUMN_FAMILY, PARTITION_PATH_COLUMN);
> }
> {code}





[jira] [Resolved] (HUDI-5033) Fix Broken Link In MultipleSparkJobExecutionStrategy

2023-02-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan resolved HUDI-5033.
--

> Fix Broken Link In MultipleSparkJobExecutionStrategy
> 
>
> Key: HUDI-5033
> URL: https://issues.apache.org/jira/browse/HUDI-5033
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cli
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-10-15-07-09-08-084.png
>
>
> When I read the code, I found a link that does not resolve to the code. I 
> will fix it. I have inspected the entire module (hudi-spark-client), and 
> this is the only such problem.
> !image-2022-10-15-07-09-08-084.png!





[jira] [Assigned] (HUDI-5033) Fix Broken Link In MultipleSparkJobExecutionStrategy

2023-02-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan reassigned HUDI-5033:


Assignee: Shilun Fan

> Fix Broken Link In MultipleSparkJobExecutionStrategy
> 
>
> Key: HUDI-5033
> URL: https://issues.apache.org/jira/browse/HUDI-5033
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cli
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-10-15-07-09-08-084.png
>
>
> When I read the code, I found a link that does not resolve to the code. I 
> will fix it. I have inspected the entire module (hudi-spark-client), and 
> this is the only such problem.
> !image-2022-10-15-07-09-08-084.png!





[jira] [Assigned] (HUDI-5664) Improve SqlQueryPreCommitValidator#queries Parallelism

2023-02-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan reassigned HUDI-5664:


Assignee: Shilun Fan

> Improve SqlQueryPreCommitValidator#queries Parallelism
> --
>
> Key: HUDI-5664
> URL: https://issues.apache.org/jira/browse/HUDI-5664
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cli
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.13.1, 0.14.0
>
>
> I found that SqlQueryPreCommitValidator#validateRecordsBeforeAndAfter has a 
> TODO:
> // TODO run this in a thread pool to improve parallelism
> I think we can improve it using List's parallelStream.





[jira] [Updated] (HUDI-5845) Remove usage of deprecated getTableAvroSchemaWithoutMetadataFields

2023-02-23 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HUDI-5845:
-
Status: In Progress  (was: Open)

> Remove usage of deprecated getTableAvroSchemaWithoutMetadataFields
> --
>
> Key: HUDI-5845
> URL: https://issues.apache.org/jira/browse/HUDI-5845
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Shilun Fan
>Priority: Major
>
> Remove usage of deprecated getTableAvroSchemaWithoutMetadataFields





[jira] [Created] (HUDI-5845) Remove usage of deprecated getTableAvroSchemaWithoutMetadataFields

2023-02-23 Thread Shilun Fan (Jira)
Shilun Fan created HUDI-5845:


 Summary: Remove usage of deprecated 
getTableAvroSchemaWithoutMetadataFields
 Key: HUDI-5845
 URL: https://issues.apache.org/jira/browse/HUDI-5845
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Shilun Fan


Remove usage of deprecated getTableAvroSchemaWithoutMetadataFields





[jira] [Created] (HUDI-5664) Improve SqlQueryPreCommitValidator#queries Parallelism

2023-01-31 Thread Shilun Fan (Jira)
Shilun Fan created HUDI-5664:


 Summary: Improve SqlQueryPreCommitValidator#queries Parallelism
 Key: HUDI-5664
 URL: https://issues.apache.org/jira/browse/HUDI-5664
 Project: Apache Hudi
  Issue Type: Improvement
  Components: cli
Reporter: Shilun Fan


I found that SqlQueryPreCommitValidator#validateRecordsBeforeAndAfter has a 
TODO:

// TODO run this in a thread pool to improve parallelism

I think we can improve it using List's parallelStream.
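
A rough sketch of that suggestion, under the assumption that the queries are independent of each other: each non-blank query is handed to a validator through List#parallelStream. The class ParallelQueries, the method validateAll, and the Consumer-based validator are hypothetical and do not mirror the actual validator code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

final class ParallelQueries {
  private ParallelQueries() {}

  /**
   * Applies the validator to every non-blank query in parallel.
   * The validator must be thread-safe; execution order is not guaranteed.
   */
  static void validateAll(List<String> queries, Consumer<String> validator) {
    queries.parallelStream()
        .map(String::trim)
        .filter(q -> !q.isEmpty())
        .forEach(validator);
  }

  public static void main(String[] args) {
    List<String> queries = Arrays.asList("select 1", " ", "select 2");
    validateAll(queries, q -> System.out.println("validating: " + q));
  }
}
```

parallelStream runs on the JVM's common fork-join pool, so this avoids managing a dedicated thread pool, at the cost of sharing that pool with any other parallel streams in the same process. An exception thrown by the validator propagates out of forEach to the caller.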





[jira] [Commented] (HUDI-5398) Fix Typo in hudi-integ-test#README.md

2022-12-15 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/HUDI-5398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17648306#comment-17648306
 ] 

Shilun Fan commented on HUDI-5398:
--

Could someone grant me contributor permission so that I can assign the Jira 
to myself? Thank you very much!

> Fix Typo in hudi-integ-test#README.md
> -
>
> Key: HUDI-5398
> URL: https://issues.apache.org/jira/browse/HUDI-5398
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Shilun Fan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.13.0
>
>
> While reading the README.md of hudi-integ-test, I found some typos; this 
> issue fixes them.





[jira] [Updated] (HUDI-5398) Fix Typo in hudi-integ-test#README.md

2022-12-15 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HUDI-5398:
-
Status: In Progress  (was: Open)

> Fix Typo in hudi-integ-test#README.md
> -
>
> Key: HUDI-5398
> URL: https://issues.apache.org/jira/browse/HUDI-5398
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Shilun Fan
>Priority: Minor
> Fix For: 0.13.0
>
>
> While reading the README.md of hudi-integ-test, I found some typos; this 
> issue fixes them.





[jira] [Created] (HUDI-5398) Fix Typo in hudi-integ-test#README.md

2022-12-15 Thread Shilun Fan (Jira)
Shilun Fan created HUDI-5398:


 Summary: Fix Typo in hudi-integ-test#README.md
 Key: HUDI-5398
 URL: https://issues.apache.org/jira/browse/HUDI-5398
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Shilun Fan
 Fix For: 0.13.0


While reading the README.md of hudi-integ-test, I found some typos; this 
issue fixes them.





[jira] [Created] (HUDI-5389) Remove Hudi Cli Duplicates Code

2022-12-14 Thread Shilun Fan (Jira)
Shilun Fan created HUDI-5389:


 Summary: Remove Hudi Cli Duplicates Code
 Key: HUDI-5389
 URL: https://issues.apache.org/jira/browse/HUDI-5389
 Project: Apache Hudi
  Issue Type: Improvement
Reporter: Shilun Fan


While reading the code, I found some duplicate code that I think can be 
removed directly.
||cli||hudi-spark||
|org.apache.hudi.cli.DedupeSparkJob|org.apache.spark.sql.hudi.DedupeSparkJob|
|org.apache.hudi.cli.DeDupeType|org.apache.spark.sql.hudi.DeDupeType|
|org.apache.hudi.cli.SparkHelpers|org.apache.spark.sql.hudi.SparkHelpers|

The code on the left side of the table can be directly replaced by the code on 
the right side of the table, because their contents are exactly the same.





[jira] [Updated] (HUDI-5389) Remove Hudi Cli Duplicates Code

2022-12-14 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HUDI-5389:
-
Status: In Progress  (was: Open)

> Remove Hudi Cli Duplicates Code
> ---
>
> Key: HUDI-5389
> URL: https://issues.apache.org/jira/browse/HUDI-5389
> Project: Apache Hudi
>  Issue Type: Improvement
>Reporter: Shilun Fan
>Priority: Major
>
> While reading the code, I found some duplicate code that I think can be 
> removed directly.
> ||cli||hudi-spark||
> |org.apache.hudi.cli.DedupeSparkJob|org.apache.spark.sql.hudi.DedupeSparkJob|
> |org.apache.hudi.cli.DeDupeType|org.apache.spark.sql.hudi.DeDupeType|
> |org.apache.hudi.cli.SparkHelpers|org.apache.spark.sql.hudi.SparkHelpers|
> The code on the left side of the table can be directly replaced by the code 
> on the right side of the table, because their contents are exactly the same.





[jira] [Created] (HUDI-5283) Replace deprecated method Schema.parse With Schema.Parser

2022-11-27 Thread Shilun Fan (Jira)
Shilun Fan created HUDI-5283:


 Summary: Replace deprecated method Schema.parse With Schema.Parser
 Key: HUDI-5283
 URL: https://issues.apache.org/jira/browse/HUDI-5283
 Project: Apache Hudi
  Issue Type: Improvement
  Components: cli
Reporter: Shilun Fan


When reading the code, I found that 
HoodieBootstrapSchemaProvider#getBootstrapSchema uses the deprecated method 
Schema.parse, which can be replaced by new Schema.Parser().parse().
I also searched at the module level and found that this is the only place 
that uses a deprecated method.





[jira] [Updated] (HUDI-5283) Replace deprecated method Schema.parse With Schema.Parser

2022-11-27 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HUDI-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HUDI-5283:
-
Status: In Progress  (was: Open)

> Replace deprecated method Schema.parse With Schema.Parser
> -
>
> Key: HUDI-5283
> URL: https://issues.apache.org/jira/browse/HUDI-5283
> Project: Apache Hudi
>  Issue Type: Improvement
>  Components: cli
>Reporter: Shilun Fan
>Priority: Major
>
> When reading the code, I found that 
> HoodieBootstrapSchemaProvider#getBootstrapSchema uses the deprecated method 
> Schema.parse, which can be replaced by new Schema.Parser().parse().
> I also searched at the module level and found that this is the only place 
> that uses a deprecated method.


