GitHub user xushiyan edited a comment on the discussion: Utilities using 
jcommander for boolean argument can be misleading

Yes @vinothchandar, we are only concerned about options that default to `true`. 
In addition to @wombatu-kun's list:

```
HoodieClusteringJob.java
    public Boolean skipClean = true;
HoodieCompactionAdminTool.java
    public boolean printOutput = true;
HoodieCompactor.java
    public Boolean skipClean = true;
HoodieDropPartitionsTool.java
    public boolean hiveUseJdbc = true;
HoodieMetadataTableValidator.java
    public boolean validateRecordIndexCount = true;
    public boolean validateRecordIndexContent = true;
```

there are also a few sync-tool options that default to `true`:

```
--use-file-listing-from-metadata
--sync-incremental
--use-jdbc
--auto-create-database
--spark-datasource
```

## Approach 1: keep behaviors compatible

Keep every option and default as is, and tweak the option parsing so that it 
also accepts `--a-boolean-flag true|false` as an additional format.
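
One way this could work is a small pre-processing step that runs before the arguments reach the parser. The sketch below is only an illustration under assumptions: `BooleanFlagNormalizer` and its `normalize` method are hypothetical names, not existing Hudi or JCommander code. It rewrites a bare `--flag` to `--flag true` and passes an explicit `--flag true|false` through, so a parser that expects exactly one value per boolean option (for example, JCommander parameters declared with `arity = 1`, if we went that way) would accept both formats.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of Hudi or JCommander.
final class BooleanFlagNormalizer {

  private BooleanFlagNormalizer() {
  }

  /**
   * Returns a copy of args in which every known boolean flag is followed by
   * an explicit value: a bare "--flag" becomes "--flag true", while
   * "--flag true" and "--flag false" are kept (value lower-cased).
   */
  static String[] normalize(String[] args, Set<String> booleanFlags) {
    List<String> out = new ArrayList<>();
    for (int i = 0; i < args.length; i++) {
      String arg = args[i];
      out.add(arg);
      if (!booleanFlags.contains(arg)) {
        continue; // not a boolean flag, copy through unchanged
      }
      boolean hasValue = i + 1 < args.length && isBooleanLiteral(args[i + 1]);
      if (hasValue) {
        // Consume the explicit value so it is not treated as a separate argument.
        out.add(args[++i].toLowerCase(Locale.ROOT));
      } else {
        // Bare flag: keep today's meaning, i.e. the flag is switched on.
        out.add("true");
      }
    }
    return out.toArray(new String[0]);
  }

  private static boolean isBooleanLiteral(String s) {
    return "true".equalsIgnoreCase(s) || "false".equalsIgnoreCase(s);
  }
}
```

With this, `--use-jdbc false --base-path /tmp/t` is passed through unchanged, and a bare `--use-jdbc` becomes `--use-jdbc true`. One caveat of this sketch: a positional argument that is literally `true` or `false` right after a bare boolean flag would be consumed as that flag's value.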

## Approach 2: breaking changes

This approach renames, drops, or changes the default of the applicable options, 
which also reduces the number of options.

For 
```
HoodieClusteringJob.java
    public Boolean skipClean = true;
HoodieCompactor.java
    public Boolean skipClean = true;
```
Rename `skipClean` to `doClean` and set the default to `false`.

For

```
HoodieCompactionAdminTool.java
    public boolean printOutput = true;
```
Drop the option and always print output.

For

```
HoodieDropPartitionsTool.java
    public boolean hiveUseJdbc = true;

--use-jdbc (sync tool)
```

Drop the option and use HMS sync mode as the default.


For
```
HoodieMetadataTableValidator.java
    public boolean validateRecordIndexCount = true;
    public boolean validateRecordIndexContent = true;
```

Remove all the `validateXXX` options and add a new option, `--validation-list 
X,Y,Z`, that takes a comma-delimited list of what to validate.
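
To make the idea concrete, here is a minimal sketch of how such a list might be parsed. Everything in it is an assumption for illustration: the `ValidationListParser` class, the `Validation` enum, and the value names (derived from the two `validateRecordIndex*` fields above) are hypothetical, and the actual set of validations would need to be agreed on.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical parser for a --validation-list value; not existing Hudi code.
final class ValidationListParser {

  // Hypothetical validation names, modeled on the current validateXXX fields.
  enum Validation {
    RECORD_INDEX_COUNT,
    RECORD_INDEX_CONTENT
  }

  private ValidationListParser() {
  }

  /**
   * Parses a comma-delimited list such as "record-index-count,record-index-content".
   * Tokens are trimmed, case-insensitive, and may use '-' or '_' as separator.
   * Unknown names fail fast with IllegalArgumentException (from Enum.valueOf).
   */
  static Set<Validation> parse(String csv) {
    EnumSet<Validation> result = EnumSet.noneOf(Validation.class);
    if (csv == null || csv.trim().isEmpty()) {
      return result; // nothing requested
    }
    for (String token : csv.split(",")) {
      String name = token.trim().toUpperCase(Locale.ROOT).replace('-', '_');
      if (name.isEmpty()) {
        continue; // tolerate stray commas
      }
      result.add(Validation.valueOf(name));
    }
    return result;
  }
}
```

The validator could then check `parse(value).contains(Validation.RECORD_INDEX_COUNT)` instead of reading one boolean field per validation. What an empty or missing list should mean (validate nothing or validate everything) is a design choice this sketch leaves open.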


For 

```
--auto-create-database
--use-file-listing-from-metadata
--sync-incremental
--spark-datasource
```

Drop these and always perform the corresponding behaviors:
- The sync tool always tries to create the database named in the database name 
config if it does not exist.
- If metadata is not available, file listing automatically falls back to 
file-system listing.
- Add a `--full-sync` option, defaulting to `false`, to allow a full sync.
- Always sync to HMS as a Spark datasource table; add a `--sync-as-hive-table` 
option, defaulting to `false`.


All of these are breaking changes and should be flagged in the major release 
that includes them.

If approach 1 is doable, we can go for it in any case. From the changes listed 
under approach 2, we can select the ones that make more sense and improve 
usability, then implement them and flag them in the release. Please share your 
thoughts.



GitHub link: 
https://github.com/apache/hudi/discussions/13845#discussioncomment-15154434
