[ 
https://issues.apache.org/jira/browse/HADOOP-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HADOOP-2488:
--------------------------

    Description: 
Opening a mapfile on hdfs and then doing 100k random reads inside the open file 
takes 3 times longer to complete in TRUNK than same test run on 0.15.x.

Random read performance is important to hbase.

Serial reads and writes perform about the same in TRUNK and 0.15.x.

Below are 3 runs done against 0.15 of the mapfile test that can be found in 
hbase followed by 2 runs done against TRUNK:

0.15 branch
{code}
[EMAIL PROTECTED] hbase]$ for i in 1 2 3; do ./bin/hbase 
org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1; done
07/12/24 18:34:50 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
07/12/24 18:34:50 INFO hbase.PerformanceEvaluation: Writing 100000 rows to 
hdfs://XXXX:9123/user/?/performanceevaluation.mapfile
07/12/24 18:35:13 INFO hbase.PerformanceEvaluation: Writing 100000 records took 
23644ms (Note: generation of keys and values is done inline and has been seen 
to consume significant time: e.g. ~30% of cpu time
07/12/24 18:35:13 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
....
07/12/24 18:37:23 INFO hbase.PerformanceEvaluation: Reading 100000 random 
records took 129879ms (Note: generation of random key is done in line and takes 
a significant amount of cpu time: e.g 10-15%
07/12/24 18:37:23 INFO hbase.PerformanceEvaluation: Reading 100000 rows 
sequentially
07/12/24 18:37:25 INFO hbase.PerformanceEvaluation: Reading 100000 records 
serially took 1832ms
07/12/24 18:37:26 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
07/12/24 18:37:26 INFO hbase.PerformanceEvaluation: Writing 100000 rows to 
hdfs://XXXXX:9123/user/?/performanceevaluation.mapfile
07/12/24 18:37:50 INFO hbase.PerformanceEvaluation: Writing 100000 records took 
24188ms (Note: generation of keys and values is done inline and has been seen 
to consume significant time: e.g. ~30% of cpu time
07/12/24 18:37:50 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
...
07/12/24 18:39:58 INFO hbase.PerformanceEvaluation: Reading 100000 random 
records took 127879ms (Note: generation of random key is done in line and takes 
a significant amount of cpu time: e.g 10-15%
07/12/24 18:39:58 INFO hbase.PerformanceEvaluation: Reading 100000 rows 
sequentially
07/12/24 18:40:00 INFO hbase.PerformanceEvaluation: Reading 100000 records 
serially took 1787ms
07/12/24 18:40:01 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
07/12/24 18:40:01 INFO hbase.PerformanceEvaluation: Writing 100000 rows to 
hdfs://XXXX:9123/user/?/performanceevaluation.mapfile
07/12/24 18:40:24 INFO hbase.PerformanceEvaluation: Writing 100000 records took 
23832ms (Note: generation of keys and values is done inline and has been seen 
to consume significant time: e.g. ~30% of cpu time
07/12/24 18:40:24 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
..
07/12/24 18:42:31 INFO hbase.PerformanceEvaluation: Reading 100000 random 
records took 126954ms (Note: generation of random key is done in line and takes 
a significant amount of cpu time: e.g 10-15%
07/12/24 18:42:31 INFO hbase.PerformanceEvaluation: Reading 100000 rows 
sequentially
07/12/24 18:42:33 INFO hbase.PerformanceEvaluation: Reading 100000 records 
serially took 1766ms
07/12/24 17:24:25 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
07/12/24 17:24:25 INFO hbase.PerformanceEvaluation: Writing 100000 rows to 
hdfs://XXXXX:9123/user/?/performanceevaluation.mapfile
07/12/24 17:24:49 INFO hbase.PerformanceEvaluation: Writing 100000 records took 
24181ms (Note: generation of keys and values is done inline and has been seen 
to consume significant time: e.g. ~30% of cpu time
07/12/24 17:24:49 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
...
07/12/24 17:26:59 INFO hbase.PerformanceEvaluation: Reading 100000 random 
records took 129564ms (Note: generation of random key is done in line and takes 
a significant amount of cpu time: e.g 10-15%
07/12/24 17:26:59 INFO hbase.PerformanceEvaluation: Reading 100000 rows 
sequentially
07/12/24 17:27:00 INFO hbase.PerformanceEvaluation: Reading 100000 records 
serially took 1793ms 
{code}

TRUNK
{code}
[EMAIL PROTECTED] hbase]$ ./bin/hbase 
org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1
07/12/24 18:02:26 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
07/12/24 18:02:26 INFO hbase.PerformanceEvaluation: Writing 100000 rows to 
hdfs://XXXX3/user/?/performanceevaluation.mapfile
07/12/24 18:02:51 INFO hbase.PerformanceEvaluation: Writing 100000 records took 
25500ms (Note: generation of keys and values is done inline and has been seen 
to consume significant time: e.g. ~30% of cpu time
07/12/24 18:02:51 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
....
07/12/24 18:11:11 INFO hbase.PerformanceEvaluation: Reading 100000 random 
records took 500306ms (Note: generation of random key is done in line and takes 
a significant amount of cpu time: e.g 10-15%
07/12/24 18:11:11 INFO hbase.PerformanceEvaluation: Reading 100000 rows 
sequentially
07/12/24 18:11:13 INFO hbase.PerformanceEvaluation: Reading 100000 records 
serially took 1940ms

[EMAIL PROTECTED] hbase]$ ./bin/hbase 
org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1
07/12/24 18:12:46 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
07/12/24 18:12:46 INFO hbase.PerformanceEvaluation: Writing 100000 rows to 
hdfs://XXXX/user/?/performanceevaluation.mapfile
07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Writing 100000 records took 
25593ms (Note: generation of keys and values is done inline and has been seen 
to consume significant time: e.g. ~30% of cpu time
07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
...
07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 random 
records took 543992ms (Note: generation of random key is done in line and takes 
a significant amount of cpu time: e.g 10-15%
07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 rows 
sequentially
07/12/24 18:22:18 INFO hbase.PerformanceEvaluation: Reading 100000 records 
serially took 1786ms
{code}

Above was done on a small hdfs cluster of 4 machines with each a dfs node.


  was:
Opening a mapfile on hdfs and then doing 100k random reads inside the open file 
takes 3 times longer to complete in TRUNK than same test run on 0.15.x.

Random read performance is important to hbase.

Serial reads and writes perform about the same in TRUNK and 0.15.x.

Below are 3 runs done against 0.15 of the mapfile test that can be found in 
hbase followed by 2 runs done against TRUNK:

0.15 branch
{code}
[EMAIL PROTECTED] hbase]$ for i in 1 2 3; do ./bin/hbase 
org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1; done
07/12/24 18:34:50 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
07/12/24 18:34:50 INFO hbase.PerformanceEvaluation: Writing 100000 rows to 
hdfs://XXXX:9123/user/?/performanceevaluation.mapfile
07/12/24 18:35:13 INFO hbase.PerformanceEvaluation: Writing 100000 records took 
23644ms (Note: generation of keys and values is done inline and has been seen 
to consume significant time: e.g. ~30% of cpu time
07/12/24 18:35:13 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
....
07/12/24 18:37:23 INFO hbase.PerformanceEvaluation: Reading 100000 random 
records took 129879ms (Note: generation of random key is done in line and takes 
a significant amount of cpu time: e.g 10-15%
07/12/24 18:37:23 INFO hbase.PerformanceEvaluation: Reading 100000 rows 
sequentially
07/12/24 18:37:25 INFO hbase.PerformanceEvaluation: Reading 100000 records 
serially took 1832ms
07/12/24 18:37:26 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
07/12/24 18:37:26 INFO hbase.PerformanceEvaluation: Writing 100000 rows to 
hdfs://XXXXX:9123/user/?/performanceevaluation.mapfile
07/12/24 18:37:50 INFO hbase.PerformanceEvaluation: Writing 100000 records took 
24188ms (Note: generation of keys and values is done inline and has been seen 
to consume significant time: e.g. ~30% of cpu time
07/12/24 18:37:50 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
...
07/12/24 18:39:58 INFO hbase.PerformanceEvaluation: Reading 100000 random 
records took 127879ms (Note: generation of random key is done in line and takes 
a significant amount of cpu time: e.g 10-15%
07/12/24 18:39:58 INFO hbase.PerformanceEvaluation: Reading 100000 rows 
sequentially
07/12/24 18:40:00 INFO hbase.PerformanceEvaluation: Reading 100000 records 
serially took 1787ms
07/12/24 18:40:01 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
07/12/24 18:40:01 INFO hbase.PerformanceEvaluation: Writing 100000 rows to 
hdfs://XXXX:9123/user/?/performanceevaluation.mapfile
07/12/24 18:40:24 INFO hbase.PerformanceEvaluation: Writing 100000 records took 
23832ms (Note: generation of keys and values is done inline and has been seen 
to consume significant time: e.g. ~30% of cpu time
07/12/24 18:40:24 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
..
07/12/24 18:42:31 INFO hbase.PerformanceEvaluation: Reading 100000 random 
records took 126954ms (Note: generation of random key is done in line and takes 
a significant amount of cpu time: e.g 10-15%
07/12/24 18:42:31 INFO hbase.PerformanceEvaluation: Reading 100000 rows 
sequentially
07/12/24 18:42:33 INFO hbase.PerformanceEvaluation: Reading 100000 records 
serially took 1766ms
07/12/24 17:24:25 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
07/12/24 17:24:25 INFO hbase.PerformanceEvaluation: Writing 100000 rows to 
hdfs://XXXXX:9123/user/?/performanceevaluation.mapfile
07/12/24 17:24:49 INFO hbase.PerformanceEvaluation: Writing 100000 records took 
24181ms (Note: generation of keys and values is done inline and has been seen 
to consume significant time: e.g. ~30% of cpu time
07/12/24 17:24:49 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
...
07/12/24 17:26:59 INFO hbase.PerformanceEvaluation: Reading 100000 random 
records took 129564ms (Note: generation of random key is done in line and takes 
a significant amount of cpu time: e.g 10-15%
07/12/24 17:26:59 INFO hbase.PerformanceEvaluation: Reading 100000 rows 
sequentially
07/12/24 17:27:00 INFO hbase.PerformanceEvaluation: Reading 100000 records 
serially took 1793ms 
{code}

TRUNK
{code]
[EMAIL PROTECTED] hbase]$ ./bin/hbase 
org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1
07/12/24 18:02:26 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
07/12/24 18:02:26 INFO hbase.PerformanceEvaluation: Writing 100000 rows to 
hdfs://XXXX3/user/?/performanceevaluation.mapfile
07/12/24 18:02:51 INFO hbase.PerformanceEvaluation: Writing 100000 records took 
25500ms (Note: generation of keys and values is done inline and has been seen 
to consume significant time: e.g. ~30% of cpu time
07/12/24 18:02:51 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
....
07/12/24 18:11:11 INFO hbase.PerformanceEvaluation: Reading 100000 random 
records took 500306ms (Note: generation of random key is done in line and takes 
a significant amount of cpu time: e.g 10-15%
07/12/24 18:11:11 INFO hbase.PerformanceEvaluation: Reading 100000 rows 
sequentially
07/12/24 18:11:13 INFO hbase.PerformanceEvaluation: Reading 100000 records 
serially took 1940ms

[EMAIL PROTECTED] hbase]$ ./bin/hbase 
org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1
07/12/24 18:12:46 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
07/12/24 18:12:46 INFO hbase.PerformanceEvaluation: Writing 100000 rows to 
hdfs://XXXX/user/?/performanceevaluation.mapfile
07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Writing 100000 records took 
25593ms (Note: generation of keys and values is done inline and has been seen 
to consume significant time: e.g. ~30% of cpu time
07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
...
07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 random 
records took 543992ms (Note: generation of random key is done in line and takes 
a significant amount of cpu time: e.g 10-15%
07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 rows 
sequentially
07/12/24 18:22:18 INFO hbase.PerformanceEvaluation: Reading 100000 records 
serially took 1786ms
{code}

Above was done on a small hdfs cluster of 4 machines with each a dfs node.



> Random reads in mapfile are 3times slower in TRUNK than they are in 0.15.x
> --------------------------------------------------------------------------
>
>                 Key: HADOOP-2488
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2488
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: stack
>
> Opening a mapfile on hdfs and then doing 100k random reads inside the open 
> file takes 3 times longer to complete in TRUNK than same test run on 0.15.x.
> Random read performance is important to hbase.
> Serial reads and writes perform about the same in TRUNK and 0.15.x.
> Below are 3 runs done against 0.15 of the mapfile test that can be found in 
> hbase followed by 2 runs done against TRUNK:
> 0.15 branch
> {code}
> [EMAIL PROTECTED] hbase]$ for i in 1 2 3; do ./bin/hbase 
> org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1; done
> 07/12/24 18:34:50 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 07/12/24 18:34:50 INFO hbase.PerformanceEvaluation: Writing 100000 rows to 
> hdfs://XXXX:9123/user/?/performanceevaluation.mapfile
> 07/12/24 18:35:13 INFO hbase.PerformanceEvaluation: Writing 100000 records 
> took 23644ms (Note: generation of keys and values is done inline and has been 
> seen to consume significant time: e.g. ~30% of cpu time
> 07/12/24 18:35:13 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
> ....
> 07/12/24 18:37:23 INFO hbase.PerformanceEvaluation: Reading 100000 random 
> records took 129879ms (Note: generation of random key is done in line and 
> takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 18:37:23 INFO hbase.PerformanceEvaluation: Reading 100000 rows 
> sequentially
> 07/12/24 18:37:25 INFO hbase.PerformanceEvaluation: Reading 100000 records 
> serially took 1832ms
> 07/12/24 18:37:26 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 07/12/24 18:37:26 INFO hbase.PerformanceEvaluation: Writing 100000 rows to 
> hdfs://XXXXX:9123/user/?/performanceevaluation.mapfile
> 07/12/24 18:37:50 INFO hbase.PerformanceEvaluation: Writing 100000 records 
> took 24188ms (Note: generation of keys and values is done inline and has been 
> seen to consume significant time: e.g. ~30% of cpu time
> 07/12/24 18:37:50 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
> ...
> 07/12/24 18:39:58 INFO hbase.PerformanceEvaluation: Reading 100000 random 
> records took 127879ms (Note: generation of random key is done in line and 
> takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 18:39:58 INFO hbase.PerformanceEvaluation: Reading 100000 rows 
> sequentially
> 07/12/24 18:40:00 INFO hbase.PerformanceEvaluation: Reading 100000 records 
> serially took 1787ms
> 07/12/24 18:40:01 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 07/12/24 18:40:01 INFO hbase.PerformanceEvaluation: Writing 100000 rows to 
> hdfs://XXXX:9123/user/?/performanceevaluation.mapfile
> 07/12/24 18:40:24 INFO hbase.PerformanceEvaluation: Writing 100000 records 
> took 23832ms (Note: generation of keys and values is done inline and has been 
> seen to consume significant time: e.g. ~30% of cpu time
> 07/12/24 18:40:24 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
> ..
> 07/12/24 18:42:31 INFO hbase.PerformanceEvaluation: Reading 100000 random 
> records took 126954ms (Note: generation of random key is done in line and 
> takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 18:42:31 INFO hbase.PerformanceEvaluation: Reading 100000 rows 
> sequentially
> 07/12/24 18:42:33 INFO hbase.PerformanceEvaluation: Reading 100000 records 
> serially took 1766ms
> 07/12/24 17:24:25 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 07/12/24 17:24:25 INFO hbase.PerformanceEvaluation: Writing 100000 rows to 
> hdfs://XXXXX:9123/user/?/performanceevaluation.mapfile
> 07/12/24 17:24:49 INFO hbase.PerformanceEvaluation: Writing 100000 records 
> took 24181ms (Note: generation of keys and values is done inline and has been 
> seen to consume significant time: e.g. ~30% of cpu time
> 07/12/24 17:24:49 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
> ...
> 07/12/24 17:26:59 INFO hbase.PerformanceEvaluation: Reading 100000 random 
> records took 129564ms (Note: generation of random key is done in line and 
> takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 17:26:59 INFO hbase.PerformanceEvaluation: Reading 100000 rows 
> sequentially
> 07/12/24 17:27:00 INFO hbase.PerformanceEvaluation: Reading 100000 records 
> serially took 1793ms 
> {code}
> TRUNK
> {code}
> [EMAIL PROTECTED] hbase]$ ./bin/hbase 
> org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1
> 07/12/24 18:02:26 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 07/12/24 18:02:26 INFO hbase.PerformanceEvaluation: Writing 100000 rows to 
> hdfs://XXXX3/user/?/performanceevaluation.mapfile
> 07/12/24 18:02:51 INFO hbase.PerformanceEvaluation: Writing 100000 records 
> took 25500ms (Note: generation of keys and values is done inline and has been 
> seen to consume significant time: e.g. ~30% of cpu time
> 07/12/24 18:02:51 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
> ....
> 07/12/24 18:11:11 INFO hbase.PerformanceEvaluation: Reading 100000 random 
> records took 500306ms (Note: generation of random key is done in line and 
> takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 18:11:11 INFO hbase.PerformanceEvaluation: Reading 100000 rows 
> sequentially
> 07/12/24 18:11:13 INFO hbase.PerformanceEvaluation: Reading 100000 records 
> serially took 1940ms
> [EMAIL PROTECTED] hbase]$ ./bin/hbase 
> org.apache.hadoop.hbase.PerformanceEvaluation mapfile 1
> 07/12/24 18:12:46 WARN util.NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
> 07/12/24 18:12:46 INFO hbase.PerformanceEvaluation: Writing 100000 rows to 
> hdfs://XXXX/user/?/performanceevaluation.mapfile
> 07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Writing 100000 records 
> took 25593ms (Note: generation of keys and values is done inline and has been 
> seen to consume significant time: e.g. ~30% of cpu time
> 07/12/24 18:13:12 INFO hbase.PerformanceEvaluation: Reading 100000 random rows
> ...
> 07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 random 
> records took 543992ms (Note: generation of random key is done in line and 
> takes a significant amount of cpu time: e.g 10-15%
> 07/12/24 18:22:16 INFO hbase.PerformanceEvaluation: Reading 100000 rows 
> sequentially
> 07/12/24 18:22:18 INFO hbase.PerformanceEvaluation: Reading 100000 records 
> serially took 1786ms
> {code}
> Above was done on a small hdfs cluster of 4 machines with each a dfs node.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to