GitHub user xushiyan edited a comment on the discussion: Utilities using jcommander for boolean argument can be misleading

Yes @vinothchandar, we are only concerned about options that default to `true`. In addition to @wombatu-kun's list:

```
HoodieClusteringJob.java
    public Boolean skipClean = true;
HoodieCompactionAdminTool.java
    public boolean printOutput = true;
HoodieCompactor.java
    public Boolean skipClean = true;
HoodieDropPartitionsTool.java
    public boolean hiveUseJdbc = true;
HoodieMetadataTableValidator.java
    public boolean validateRecordIndexCount = true;
    public boolean validateRecordIndexContent = true;
```

There are also a few sync-tool-related options that default to `true`:

```
--use-file-listing-from-metadata
--sync-incremental
--use-jdbc
--auto-create-database
--spark-datasource
```

## Approach 1: keep behaviors compatible

Keep everything as is, but tweak the option parsing to also accept `--a-boolean-flag true|false` as an additional format.
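A minimal sketch of the Approach 1 parsing rule, written in plain Java rather than through JCommander. The class, method, and flag names (such as `--skip-clean`) are hypothetical and are not existing Hudi or JCommander APIs. A known boolean flag consumes the next token only when it is literally `true` or `false`; otherwise the bare flag keeps its current meaning of `true`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a tolerant boolean-flag parser accepting both "--flag"
// (bare, meaning true) and "--flag true|false" (explicit value).
// This is NOT JCommander's API; it only illustrates the rule proposed in Approach 1.
final class TolerantBooleanFlags {

  /** Starts from the defaults; tokens that are not known boolean flags are appended to leftovers. */
  static Map<String, Boolean> parse(String[] args, Map<String, Boolean> defaults, List<String> leftovers) {
    Map<String, Boolean> values = new TreeMap<>(defaults);
    for (int i = 0; i < args.length; i++) {
      String token = args[i];
      if (!values.containsKey(token)) {
        leftovers.add(token); // not a known boolean flag
        continue;
      }
      // Peek at the next token: consume it only if it is an explicit true/false literal.
      if (i + 1 < args.length && isBooleanLiteral(args[i + 1])) {
        values.put(token, Boolean.parseBoolean(args[i + 1]));
        i++;
      } else {
        values.put(token, true); // bare flag keeps today's meaning
      }
    }
    return values;
  }

  private static boolean isBooleanLiteral(String s) {
    String lower = s.toLowerCase(Locale.ROOT);
    return lower.equals("true") || lower.equals("false");
  }

  public static void main(String[] args) {
    Map<String, Boolean> defaults = new TreeMap<>();
    defaults.put("--skip-clean", true);   // hypothetical flag names
    defaults.put("--print-output", true);
    List<String> rest = new ArrayList<>();
    Map<String, Boolean> parsed = parse(
        new String[] {"--skip-clean", "false", "--print-output", "--base-path", "/tmp/t"}, defaults, rest);
    System.out.println(parsed + " leftovers=" + rest);
  }
}
```

The trade-off of this tolerant form is that a positional argument that happens to be the literal `true` or `false` right after a bare flag would be consumed as the flag's value.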

## Approach 2: breaking changes

This approach renames, drops, or changes the defaults of the applicable options, reducing the number of options.

For 
```
HoodieClusteringJob.java
    public Boolean skipClean = true;
HoodieCompactor.java
    public Boolean skipClean = true;
```
rename `skipClean` to `doClean` and set its default to `false`, which keeps the current default behavior (no clean).
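A small sketch of why the rename keeps behavior stable, assuming a hypothetical `--do-clean` flag name (not an existing Hudi CLI option): the old default `skipClean = true` and the new default `doClean = false` both mean "do not clean", and with a `false` default a bare flag can only ever mean "turn this on".

```java
// Hypothetical sketch; "--do-clean" is an assumed flag name for illustration only.
final class CleanOptionSketch {

  /** Old semantics: cleaning runs only when skipClean is false; its default is true. */
  static boolean runsCleanOld(boolean skipClean) {
    return !skipClean;
  }

  /** New semantics: doClean defaults to false; a bare "--do-clean" flag turns cleaning on. */
  static boolean parseDoClean(String[] args) {
    boolean doClean = false; // same default behavior as skipClean = true (no clean)
    for (String arg : args) {
      if (arg.equals("--do-clean")) {
        doClean = true;
      }
    }
    return doClean;
  }

  public static void main(String[] args) {
    // Defaults agree: neither the old nor the new default runs a clean.
    System.out.println(runsCleanOld(true) == parseDoClean(new String[0]));
    // The bare flag unambiguously enables cleaning.
    System.out.println(parseDoClean(new String[] {"--do-clean"}));
  }
}
```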

For

```
HoodieCompactionAdminTool.java
    public boolean printOutput = true;
```
Drop the option and always print output.

For

```
HoodieDropPartitionsTool.java
    public boolean hiveUseJdbc = true;

--use-jdbc (sync tool)
```

drop the option and use HMS sync mode as the default.


For
```
HoodieMetadataTableValidator.java
    public boolean validateRecordIndexCount = true;
    public boolean validateRecordIndexContent = true;
```

remove all the `validateXXX` options and add a new option, `--validation-list X,Y,Z`, that takes a comma-delimited list to configure what to validate.
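A sketch of parsing the proposed `--validation-list` value into an `EnumSet`. The enum constants are illustrative names derived from the two fields above, not an existing Hudi type, and the class and method names are hypothetical.

```java
import java.util.EnumSet;
import java.util.Locale;

// Hypothetical sketch: a single "--validation-list X,Y,Z" value replacing the individual
// validateXXX booleans. The enum below is illustrative, not an existing Hudi enum.
final class ValidationListSketch {

  enum Validation {
    RECORD_INDEX_COUNT,
    RECORD_INDEX_CONTENT
    // the other validateXXX checks would be listed here
  }

  /** Parses a comma-delimited, case-insensitive list such as "record_index_count,RECORD_INDEX_CONTENT". */
  static EnumSet<Validation> parse(String csv) {
    EnumSet<Validation> selected = EnumSet.noneOf(Validation.class);
    if (csv == null || csv.trim().isEmpty()) {
      return selected; // nothing requested
    }
    for (String raw : csv.split(",")) {
      String name = raw.trim();
      if (name.isEmpty()) {
        continue; // tolerate "a,,b" and stray commas
      }
      // Enum.valueOf throws IllegalArgumentException for unknown names, surfacing typos early.
      selected.add(Validation.valueOf(name.toUpperCase(Locale.ROOT)));
    }
    return selected;
  }

  public static void main(String[] args) {
    System.out.println(parse("record_index_count,RECORD_INDEX_CONTENT"));
    System.out.println(parse("record_index_count"));
  }
}
```

Whether an empty list should mean "validate nothing" or "validate everything" is not settled by the proposal; this sketch treats it as nothing.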


For 

```
--auto-create-database
--use-file-listing-from-metadata
--sync-incremental
--spark-datasource
```

drop these options and always perform the corresponding behaviors:
- The sync tool always tries to create the database, if it does not exist, using the database name config.
- File listing uses the metadata table; if metadata is not available, it automatically falls back to file system listing.
- Always sync incrementally; add an option `--full-sync` (default `false`) to allow a full sync.
- Always sync to HMS as a Spark datasource table; add an option `--sync-as-hive-table` (default `false`).
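A sketch of how the proposed sync defaults could resolve, reading the `--full-sync` bullet as "sync incrementally unless `--full-sync` is given" (an interpretation). Only `--full-sync` and `--sync-as-hive-table` come from the proposal; the class, enums, and methods are hypothetical, and database creation is not modeled.

```java
// Hypothetical sketch of the proposed sync defaults. Only the flag names
// --full-sync and --sync-as-hive-table come from the proposal; everything
// else is illustrative and not Hudi's sync-tool API.
final class SyncDefaultsSketch {

  enum ListingSource { METADATA_TABLE, FILE_SYSTEM }
  enum SyncScope { INCREMENTAL, FULL }
  enum TableFormat { SPARK_DATASOURCE, HIVE }

  // Both remaining booleans default to false, so a bare flag only ever turns something on.
  boolean fullSync = false;        // --full-sync
  boolean syncAsHiveTable = false; // --sync-as-hive-table

  static SyncDefaultsSketch fromArgs(String[] args) {
    SyncDefaultsSketch opts = new SyncDefaultsSketch();
    for (String arg : args) {
      if (arg.equals("--full-sync")) {
        opts.fullSync = true;
      } else if (arg.equals("--sync-as-hive-table")) {
        opts.syncAsHiveTable = true;
      }
    }
    return opts;
  }

  /** Uses metadata-table listing when available, otherwise falls back to file system listing. */
  static ListingSource listingSource(boolean metadataTableAvailable) {
    return metadataTableAvailable ? ListingSource.METADATA_TABLE : ListingSource.FILE_SYSTEM;
  }

  SyncScope scope() {
    return fullSync ? SyncScope.FULL : SyncScope.INCREMENTAL;
  }

  TableFormat tableFormat() {
    return syncAsHiveTable ? TableFormat.HIVE : TableFormat.SPARK_DATASOURCE;
  }

  public static void main(String[] args) {
    SyncDefaultsSketch defaults = fromArgs(new String[0]);
    System.out.println(defaults.scope() + " " + defaults.tableFormat() + " " + listingSource(false));
  }
}
```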


All of these are breaking changes and should be flagged in the major release that includes them. Please share your thoughts.



GitHub link: 
https://github.com/apache/hudi/discussions/13845#discussioncomment-15154434
