Junegunn Choi created HBASE-29365:
-------------------------------------
Summary: Improve option parser of PerformanceEvaluation
Key: HBASE-29365
URL: https://issues.apache.org/jira/browse/HBASE-29365
Project: HBase
Issue Type: Bug
Components: PE
Reporter: Junegunn Choi
h2. Summary
{{PerformanceEvaluation.parseOpts}} spans over 300 lines long with significant
code duplication, and has several usability issues. It needs to be refactored
and improved.
*
[https://github.com/apache/hbase/blob/787f5459883211b6017cee6a9bcb109f3cb66ddb/hbase-diagnostics/src/main/java/org/apache/hadoop/hbase/PerformanceEvaluation.java#L2845-L3158]
h2. Problems
h3. Code duplication
The same code pattern is repeated for each option.
{code:java}
final String rows = "--rows=";
if (cmd.startsWith(rows)) {
opts.perClientRunRows = Long.parseLong(cmd.substring(rows.length()));
continue;
}
{code}
* Define the option prefix as a constant
* Check if the argument starts with the prefix
* Use {{substring}} to extract the value
* Move on to the next argument
We can easily refactor this pattern and reduce the code duplication.
h3. Usability issues
Malformed arguments are not handled gracefully, often resulting in confusing
error messages. Here are some examples.
h4. Confusing error messages
The {{--rows}} option expects a value, but if omitted, it prints a
{{ClassNotFoundException}} exception, followed by an error message that doesn't
clearly indicate the problem.
{noformat}
$ bin/hbase pe --rows
INFO [main {}] hbase.PerformanceEvaluation: No class found for command: --rows
java.lang.ClassNotFoundException: --rows
at
jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641)
~[?:?]
at
jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
~[?:?]
at java.lang.ClassLoader.loadClass(ClassLoader.java:525) ~[?:?]
at
org.apache.hadoop.hbase.PerformanceEvaluation.isCustomTestClass(PerformanceEvaluation.java:3258)
~[hbase-diagnostics-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
at
org.apache.hadoop.hbase.PerformanceEvaluation.isCommandClass(PerformanceEvaluation.java:3251)
~[hbase-diagnostics-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
at
org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:3139)
~[hbase-diagnostics-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
at
org.apache.hadoop.hbase.PerformanceEvaluation.run(PerformanceEvaluation.java:3217)
~[hbase-diagnostics-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
~[hadoop-common-3.4.1.jar:?]
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97)
~[hadoop-common-3.4.1.jar:?]
at
org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:3273)
~[hbase-diagnostics-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
ERROR: Unrecognized option/command: --rows
Usage: hbase pe <OPTIONS> [-D<property=value>]* <command|class> <nclients>
...
{noformat}
The same holds true for other options that expect a value.
—
The error message for invalid values can also be improved.
{noformat}
$ bin/hbase pe --rows= --table= --compress=
java.lang.NumberFormatException: For input string: ""
at
java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
at java.base/java.lang.Long.parseLong(Long.java:721)
at java.base/java.lang.Long.parseLong(Long.java:836)
at
org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:2864)
at
org.apache.hadoop.hbase.PerformanceEvaluation.run(PerformanceEvaluation.java:3217)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97)
at
org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:3273)
{noformat}
It doesn't show which option caused the error, so you would have to guess it or
follow the stack trace to find out.
h4. Inconsistent handling of boolean options
PE supports many boolean options, but their handling is inconsistent. Some
options require {{--flag=true/false}} syntax, while others just accept
{{{}--flag{}}}.
* In this case, this fails because {{autoFlush}} requires a value, while
{{valueRandom}} doesn't.
{code:java}
$ bin/hbase pe --valueRandom --autoFlush sequentialWrite 1
...
ERROR: Unrecognized option/command: --autoFlush
{code}
* And in this case, {{valueRandom}} is set to true, because {{=false}} after
{{--valueRandom}} is simply ignored.
{code:java}
$ bin/hbase pe --valueRandom=false --autoFlush=true sequentialWrite 1
SequentialWriteTest test run options={ ... ,"autoFlush":true, ...
"valueRandom":true, ... }
{code}
* Another problem is that they silently ignore illegal values and just treat
them as {{{}false{}}}. This can lead to confusion, and we should be more strict
about it.
{code:java}
$ bin/hbase pe --autoFlush=yes sequentialWrite 1
SequentialWriteTest test run options={ ..., "autoFlush":false, ... }
{code}
h2. Suggestion
Refactor the code, make the behavior more consistent, and improve the error
messages.
h2. Result
* Shaved off over 180 lines of code
* Fixed hidden bugs
* Better error messages
h3. Examples
{code:java}
$ bin/hbase pe --rows
java.lang.IllegalArgumentException: Invalid option: --rows
at
org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:2943)
at
org.apache.hadoop.hbase.PerformanceEvaluation.run(PerformanceEvaluation.java:3033)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97)
at
org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:3089)
Caused by: java.lang.IllegalArgumentException: --rows requires a value
at
org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:2937)
... 4 more
{code}
{code:java}
$ bin/hbase pe --rows= --table= --compress=
java.lang.IllegalArgumentException: Invalid option: --rows=
at
org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:2943)
at
org.apache.hadoop.hbase.PerformanceEvaluation.run(PerformanceEvaluation.java:3033)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97)
at
org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:3089)
Caused by: java.lang.NumberFormatException: For input string: ""
at
java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
at java.base/java.lang.Long.parseLong(Long.java:721)
at java.base/java.lang.Long.parseLong(Long.java:836)
at
org.apache.hadoop.hbase.PerformanceEvaluation.lambda$parseOpts$14(PerformanceEvaluation.java:2866)
at
org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:2939)
... 4 more
{code}
{code:java}
$ bin/hbase pe sequentialWrite 1
SequentialWriteTest test run options={ ..., "autoFlush":false, ...
"valueRandom":false, ... }
{code}
{code:java}
$ bin/hbase pe --autoFlush --valueRandom sequentialWrite 1
SequentialWriteTest test run options={ ..., "autoFlush":true, ...
"valueRandom":true, ...}
...
{code}
{code:java}
$ bin/hbase pe --autoFlush=true --valueRandom=false sequentialWrite 1
SequentialWriteTest test run options={ ..., "autoFlush":true, ...
"valueRandom":false, ... }
...
{code}
{code:java}
$ bin/hbase pe --autoFlush=yes sequentialWrite 1
java.lang.IllegalArgumentException: Invalid option: --autoFlush=yes
at
org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:2943)
at
org.apache.hadoop.hbase.PerformanceEvaluation.run(PerformanceEvaluation.java:3033)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97)
at
org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:3089)
Caused by: java.lang.IllegalArgumentException: Invalid boolean value: yes
at
org.apache.hadoop.hbase.PerformanceEvaluation.parseBoolean(PerformanceEvaluation.java:2973)
at
org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:2929)
... 4 more
{code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)