[
https://issues.apache.org/jira/browse/HBASE-29013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Duo Zhang resolved HBASE-29013.
-------------------------------
Fix Version/s: 2.7.0, 3.0.0-beta-2, 2.5.11, 2.6.2
Hadoop Flags: Reviewed
Resolution: Fixed
Pushed to all active branches.
Thanks [~junegunn] for contributing!
> Make PerformanceEvaluation support larger data sets
> ---------------------------------------------------
>
> Key: HBASE-29013
> URL: https://issues.apache.org/jira/browse/HBASE-29013
> Project: HBase
> Issue Type: Improvement
> Components: PE
> Reporter: Junegunn Choi
> Assignee: Junegunn Choi
> Priority: Minor
> Labels: pull-request-available
> Fix For: 2.7.0, 3.0.0-beta-2, 2.5.11, 2.6.2
>
>
> The use of 4-byte integers in PerformanceEvaluation can be limiting when you
> want to test with larger data sets. For example, to generate 10TB of data
> with the default value size of 1KB, you would need 10G rows.
> {code:java}
> bin/hbase pe --nomapred --presplit=21 --compress=LZ4 --rows=10737418240 randomWrite 1
> {code}
> But you can't do it because {{--rows}} expects a number that fits in a
> 4-byte signed integer (at most 2147483647).
> {noformat}
> java.lang.NumberFormatException: For input string: "10737418240"
> {noformat}
> We can instead increase the value size and decrease the number of rows to
> circumvent the limitation, but I don't see a good reason to have the
> limitation in the first place.
> And even if we use a smaller value for {{--rows}}, we can accidentally
> cause integer overflow as we increase the number of clients.
> {code:java}
> bin/hbase pe --nomapred --compress=LZ4 --rows=1073741824 randomWrite 20
> {code}
> {noformat}
> 2024-12-03T12:21:10,333 INFO [main {}] hbase.PerformanceEvaluation: Created 20 connections for 20 threads
> 2024-12-03T12:21:10,337 INFO [TestClient-5 {}] hbase.PerformanceEvaluation: Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at offset 1073741824 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO [TestClient-1 {}] hbase.PerformanceEvaluation: Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at offset 1073741824 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO [TestClient-3 {}] hbase.PerformanceEvaluation: Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at offset -1073741824 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO [TestClient-4 {}] hbase.PerformanceEvaluation: Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at offset 0 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO [TestClient-7 {}] hbase.PerformanceEvaluation: Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at offset -1073741824 for 1073741824 rows
> 2024-12-03T12:21:10,337 INFO [TestClient-8 {}] hbase.PerformanceEvaluation: Start class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest at offset 0 for 1073741824 rows
> ...
> 2024-12-03T12:21:10,338 INFO [TestClient-17 {}] hbase.PerformanceEvaluation: Sampling 1 every 0 out of 1073741824 total rows.
> 2024-12-03T12:21:10,338 INFO [TestClient-16 {}] hbase.PerformanceEvaluation: Sampling 1 every 0 out of 1073741824 total rows.
> 2024-12-03T12:21:10,338 INFO [TestClient-6 {}] hbase.PerformanceEvaluation: Sampling 1 every 0 out of 1073741824 total rows.
> 2024-12-03T12:21:10,338 INFO [TestClient-4 {}] hbase.PerformanceEvaluation: Sampling 1 every 0 out of 1073741824 total rows.
> ...
> java.io.IOException: java.lang.ArithmeticException: / by zero
> at org.apache.hadoop.hbase.PerformanceEvaluation.doLocalClients(PerformanceEvaluation.java:540)
> at org.apache.hadoop.hbase.PerformanceEvaluation.runTest(PerformanceEvaluation.java:2674)
> at org.apache.hadoop.hbase.PerformanceEvaluation.run(PerformanceEvaluation.java:3216)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97)
> at org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:3250)
> {noformat}
> So I think it's best that we just use 8-byte long integers throughout the
> code.
>
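
The two failure modes in the description above can be reproduced in isolation.
The following is a minimal Java sketch, not code taken from
PerformanceEvaluation. The class and method names are hypothetical, and it
assumes that each client's start offset is the client index multiplied by the
per-client row count, an assumption that happens to match the offsets in the
quoted log.

```java
public class RowCountOverflowSketch {

    // Hypothetical helper: start offset computed with 4-byte arithmetic.
    // The multiplication silently wraps around when it exceeds Integer.MAX_VALUE.
    static int intOffset(int clientIndex, int perClientRows) {
        return clientIndex * perClientRows;
    }

    // Same computation with 8-byte arithmetic; the int operand is widened to long.
    static long longOffset(int clientIndex, long perClientRows) {
        return clientIndex * perClientRows;
    }

    // Returns true if the string parses as a 4-byte signed integer.
    static boolean fitsInInt(String value) {
        try {
            Integer.parseInt(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 10TB / 1KB = 10 * 2^30 rows, which is larger than Integer.MAX_VALUE.
        String rows = "10737418240";
        System.out.println(fitsInInt(rows));       // false
        System.out.println(Long.parseLong(rows));  // 10737418240

        int perClientRows = 1073741824; // 2^30, as in the second command
        System.out.println(intOffset(3, perClientRows));   // -1073741824
        System.out.println(intOffset(4, perClientRows));   // 0
        System.out.println(intOffset(5, perClientRows));   // 1073741824
        System.out.println(longOffset(3, perClientRows));  // 3221225472

        // With 20 clients the int product also wraps to 0, one plausible way
        // a zero can end up as a divisor (not confirmed by the log itself).
        System.out.println(intOffset(20, perClientRows));  // 0
    }
}
```

Under that assumption, client indexes 3 and 7 both wrap to -1073741824 and
indexes 4 and 8 both wrap to 0, matching the log, while the long variant keeps
every offset distinct and non-negative.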