GitHub user xushiyan edited a comment on the discussion: Utilities using jcommander for boolean argument can be misleading
Yes @vinothchandar, we're only concerned about options that default to `true`. In addition to @wombatu-kun's list:
```
HoodieClusteringJob.java
public Boolean skipClean = true;
HoodieCompactionAdminTool.java
public boolean printOutput = true;
HoodieCompactor.java
public Boolean skipClean = true;
HoodieDropPartitionsTool.java
public boolean hiveUseJdbc = true;
HoodieMetadataTableValidator.java
public boolean validateRecordIndexCount = true;
public boolean validateRecordIndexContent = true;
```
There are also a few sync tool options that default to `true`:
```
--use-file-listing-from-metadata
--sync-incremental
--use-jdbc
--auto-create-database
--spark-datasource
```
## Approach 1: keep behaviors compatible
Keep everything as is, and tweak the option parsing to also accept `--a-boolean-flag true|false` as an additional format.
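A minimal sketch of the parsing tweak in Approach 1, written in plain Java and independent of JCommander: a bare flag still parses as `true`, and an immediately following `true` or `false` token is consumed as an explicit value. The class `BooleanFlagParser`, its `parse` method, and the `--skip-clean` spelling in the demo are hypothetical, chosen only for illustration; treating a bare flag as `true` is likewise an assumption about the existing behavior to preserve.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of Approach 1: a boolean option accepts both the bare
 * form ("--flag") and an explicit value ("--flag true" / "--flag false").
 */
public final class BooleanFlagParser {

  private BooleanFlagParser() {}

  /**
   * Returns the value of {@code name} in {@code args}, or {@code defaultValue}
   * if the option is absent. A bare flag means true (assumed legacy behavior);
   * if the next token is "true" or "false" (any case), it is used as the value.
   */
  public static boolean parse(String[] args, String name, boolean defaultValue) {
    List<String> tokens = Arrays.asList(args);
    int i = tokens.indexOf(name);
    if (i < 0) {
      return defaultValue; // option not given: keep the default
    }
    if (i + 1 < tokens.size()) {
      String next = tokens.get(i + 1);
      if (next.equalsIgnoreCase("true") || next.equalsIgnoreCase("false")) {
        return Boolean.parseBoolean(next); // explicit value form
      }
    }
    return true; // bare flag: backward-compatible form
  }

  public static void main(String[] args) {
    // Hypothetical option name, used only to demonstrate turning off a default-true flag.
    String[] cli = {"--skip-clean", "false", "--other"};
    System.out.println(parse(cli, "--skip-clean", true)); // prints "false"
  }
}
```

One consequence of this lenient form: a positional argument that happens to be the literal `true` or `false` right after a flag would be taken as the flag's value, so the tools would need to avoid such positionals.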
## Approach 2: breaking changes
This approach renames or drops applicable options, or changes their defaults, reducing the number of options.
For
```
HoodieClusteringJob.java
public Boolean skipClean = true;
HoodieCompactor.java
public Boolean skipClean = true;
```
Rename `skipClean` to `doClean` and set its default to `false`.
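A small sketch of why the rename helps, assuming presence-only flag semantics (flag present means `true`, absent means the default): with a default of `false`, both values are reachable from the command line, and the old default maps over by negation, so not cleaning stays the default behavior. `CleanOption`, its methods, and the `--do-clean` spelling are hypothetical; the proposal only names the `doClean` field.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the skipClean -> doClean rename. With a default of
 * false, a presence-only flag is unambiguous: absent means false, present
 * means true.
 */
public final class CleanOption {

  // Illustrative CLI spelling; not an existing Hudi option.
  public static final String DO_CLEAN = "--do-clean";

  /** New semantics: clean only when the flag is explicitly passed. */
  public static boolean doClean(String[] args) {
    return Arrays.asList(args).contains(DO_CLEAN);
  }

  /** The old default skipClean = true maps to the new default doClean = false. */
  public static boolean doCleanFromLegacy(boolean skipClean) {
    return !skipClean;
  }

  public static void main(String[] args) {
    System.out.println(doClean(new String[] {}));             // prints "false"
    System.out.println(doClean(new String[] {"--do-clean"})); // prints "true"
    System.out.println(doCleanFromLegacy(true));              // prints "false"
  }
}
```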
For
```
HoodieCompactionAdminTool.java
public boolean printOutput = true;
```
Drop the option and always print output.
For
```
HoodieDropPartitionsTool.java
public boolean hiveUseJdbc = true;
--use-jdbc (sync tool)
```
Drop the option and use HMS sync mode as the default.
For
```
HoodieMetadataTableValidator.java
public boolean validateRecordIndexCount = true;
public boolean validateRecordIndexContent = true;
```
Remove all the `validateXXX`-style options and add a new option `--validation-list X,Y,Z` that takes a comma-delimited list configuring what to validate.
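A sketch of how a value like `--validation-list X,Y,Z` might be parsed into a set of checks. The `ValidationList` class and the `Validation` enum constants are hypothetical, derived here from the two `validateRecordIndex*` fields above; a real list would cover the other `validateXXX` options as well.

```java
import java.util.EnumSet;
import java.util.Locale;

/**
 * Hypothetical sketch of "--validation-list X,Y,Z": one comma-delimited option
 * replaces the individual validateXXX booleans.
 */
public final class ValidationList {

  /** Illustrative subset; names derived from the existing validateXXX fields. */
  public enum Validation { RECORD_INDEX_COUNT, RECORD_INDEX_CONTENT }

  /**
   * Parses e.g. "record_index_count, RECORD_INDEX_CONTENT" into an EnumSet.
   * Entries are trimmed and upper-cased; blank entries are skipped; unknown
   * names throw IllegalArgumentException (from Enum.valueOf).
   */
  public static EnumSet<Validation> parse(String value) {
    EnumSet<Validation> result = EnumSet.noneOf(Validation.class);
    if (value == null || value.trim().isEmpty()) {
      return result; // nothing requested
    }
    for (String part : value.split(",")) {
      String name = part.trim();
      if (!name.isEmpty()) {
        result.add(Validation.valueOf(name.toUpperCase(Locale.ROOT)));
      }
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(parse("record_index_count,record_index_content"));
    // prints "[RECORD_INDEX_COUNT, RECORD_INDEX_CONTENT]"
  }
}
```

The proposal does not say what should happen when the option is omitted; this sketch returns an empty set, and defaulting to all checks would be the other obvious choice.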
For
```
--auto-create-database
--use-file-listing-from-metadata
--sync-incremental
--spark-datasource
```
Drop these and always perform the following behaviors:
- The sync tool always tries to create the database, per the database name config, if it does not exist.
- If metadata is not available, file listing automatically falls back to file system listing.
- Add an option `--full-sync` to allow full sync, defaulting to `false`.
- Always sync to HMS as a Spark datasource table; add an option `--sync-as-hive-table`, defaulting to `false`.
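A sketch of the proposed sync-tool defaults: the two new opt-in flags default to `false`, so incremental sync and Spark datasource table sync remain the default, and file listing falls back from the metadata-based listing to file system listing. `SyncDefaults`, its fields, and the supplier-based `listFiles` helper are hypothetical stand-ins, not Hudi's sync tool API; only the two flag spellings come from the proposal.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of the proposed sync-tool defaults: two opt-in flags
 * default to false, and file listing falls back from metadata to fs listing.
 */
public final class SyncDefaults {

  // Proposed opt-in flags; false keeps the "always" behaviors as the default.
  public boolean fullSync = false;        // --full-sync
  public boolean syncAsHiveTable = false; // --sync-as-hive-table

  /** Incremental sync unless --full-sync is passed. */
  public boolean isIncremental() {
    return !fullSync;
  }

  /** Spark datasource table unless --sync-as-hive-table is passed. */
  public boolean syncAsSparkDatasourceTable() {
    return !syncAsHiveTable;
  }

  /**
   * Uses the metadata-based listing when it yields a result; otherwise falls
   * back to file system listing. Both suppliers are hypothetical stand-ins.
   */
  public static List<String> listFiles(Supplier<Optional<List<String>>> metadataListing,
                                       Supplier<List<String>> fsListing) {
    return metadataListing.get().orElseGet(fsListing);
  }

  public static void main(String[] args) {
    SyncDefaults opts = new SyncDefaults();
    System.out.println(opts.isIncremental()); // prints "true"
    // Metadata unavailable (empty Optional), so the fs listing is used.
    List<String> files = listFiles(Optional::empty, () -> Arrays.asList("f1.parquet"));
    System.out.println(files); // prints "[f1.parquet]"
  }
}
```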
All of these are breaking changes and should be flagged in the major release that includes them.
If Approach 1 is doable, we can go for it regardless. Of the changes listed in Approach 2, we can select the ones that make the most sense and improve usability, then implement them and flag them in the release. Please share your thoughts.
GitHub link:
https://github.com/apache/hudi/discussions/13845#discussioncomment-15154434