other bottlenecks?

Chris Seline Tue, 13 Oct 2009 09:06:28 -0700

I am using the 0.3 Cloudera scripts to start a Hadoop cluster on EC2 of11 c1.xlarge instances (1 master, 10 slaves), that is the biggestinstance available with 20 compute units and 4x 400gb disks.

I wrote some scripts to test many (100's) of configurations running aparticular Hive query to try to make it as fast as possible, but nomatter what I don't seem to be able to get above roughly 45% cpuutilization on the slaves, and not more than about 1.5% wait state. Ihave also measured network traffic and there don't seem to bebottlenecks there at all.

Here are some typical CPU utilization lines from top on a slave whenrunning a query:Cpu(s): 33.9%us, 7.4%sy, 0.0%ni, 56.8%id, 0.6%wa, 0.0%hi, 0.5%si,0.7%stCpu(s): 33.6%us, 5.9%sy, 0.0%ni, 58.7%id, 0.9%wa, 0.0%hi, 0.4%si,0.5%stCpu(s): 33.9%us, 7.2%sy, 0.0%ni, 56.8%id, 0.5%wa, 0.0%hi, 0.6%si,1.0%stCpu(s): 38.6%us, 8.7%sy, 0.0%ni, 50.8%id, 0.5%wa, 0.0%hi, 0.7%si,0.7%stCpu(s): 36.8%us, 7.4%sy, 0.0%ni, 53.6%id, 0.4%wa, 0.0%hi, 0.5%si,1.3%st

It seems like if tuned properly, I should be able to max out my cpu (ormy disk) and get roughly twice the performance I am seeing now. None ofthe parameters I am tuning seem to be able to achieve this. Adjustingmapred.map.tasks and mapred.reduce.tasks does help somewhat, and settingthe io.file.buffer.size to 4096 does better than the default, but therest of the values I am testing seem to have little positive effect.


These are the parameters I am testing, and the values tried:

io.sort.factor=2,3,4,5,10,15,20,25,30,50,100
mapred.job.shuffle.merge.percent=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.97,0.98,0.99
io.bytes.per.checksum=256,512,1024,2048,4192
mapred.output.compress=true,false
hive.exec.compress.intermediate=true,false
hive.map.aggr.hash.min.reduction=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.97,0.98,0.99
mapred.map.tasks=1,2,3,4,5,6,8,10,12,15,20,25,30,40,50,60,75,100,150,200
mapred.child.java.opts=-Xmx400m,-Xmx500m,-Xmx600m,-Xmx700m,-Xmx800m,-Xmx900m,-Xmx1000m,-Xmx1200m,-Xmx1400m,-Xmx1600m,-Xmx2000m
mapred.reduce.tasks=5,10,15,20,25,30,35,40,50,60,70,80,100,125,150,200
mapred.merge.recordsBeforeProgress=5000,10000,20000,30000
mapred.job.shuffle.input.buffer.percent=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.99
io.sort.spill.percent=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.99
mapred.job.tracker.handler.count=3,4,5,7,10,15,25
hive.merge.size.per.task=64000000,128000000,168000000,256000000,300000000,400000000
hive.optimize.ppd=true,false
hive.merge.mapredfiles=false,true
io.sort.record.percent=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.97,0.98,0.99
hive.map.aggr.hash.percentmemory=0.10,0.20,0.30,0.40,0.50,0.60,0.70,0.80,0.90,0.93,0.95,0.97,0.98,0.99
mapred.tasktracker.reduce.tasks.maximum=1,2,3,4,5,6,8,10,12,15,20,30
mapred.reduce.parallel.copies=1,2,4,6,8,10,13,16,20,25,30,50
io.seqfile.lazydecompress=true,false
io.sort.mb=20,50,75,100,150,200,250,350,500
mapred.compress.map.output=true,false
io.file.buffer.size=1024,2048,4096,8192,16384,32768,65536,131072,262144
hive.exec.reducers.bytes.per.reducer=1000000000
dfs.datanode.handler.count=1,2,3,4,5,6,8,10,15
mapred.tasktracker.map.tasks.maximum=5,8,12,20

Anyone have any thoughts for other parameters I might try? Am I goingabout this the wrong way? Am I missing some other bottleneck?


thanks

Chris Seline

Optimization of cpu and i/o usage / other bottlenecks?

Reply via email to