> mapred.reduce.tasks 1

You've only got one reduce task, as Jason correctly surmised. Try setting it using
-D mapred.reduce.tasks=2 when you run your job, or by calling JobConf#setNumReduceTasks() Tom On Fri, May 8, 2009 at 7:46 AM, Foss User <foss...@gmail.com> wrote: > On Thu, May 7, 2009 at 9:45 PM, jason hadoop <jason.had...@gmail.com> wrote: >> If you have it available still, via the job tracker web interface, attach >> the per job xml configuration > > Job Configuration: JobId - job_200905071619_0003 > > name value > fs.s3n.impl org.apache.hadoop.fs.s3native.NativeS3FileSystem > mapred.task.cache.levels 2 > hadoop.tmp.dir /tmp/hadoop-${user.name} > hadoop.native.lib true > map.sort.class org.apache.hadoop.util.QuickSort > dfs.namenode.decommission.nodes.per.interval 5 > ipc.client.idlethreshold 4000 > mapred.system.dir ${hadoop.tmp.dir}/mapred/system > mapred.job.tracker.persist.jobstatus.hours 0 > dfs.namenode.logging.level info > dfs.datanode.address 0.0.0.0:50010 > io.skip.checksum.errors false > fs.default.name hdfs://192.168.1.130:9000 > mapred.child.tmp ./tmp > dfs.safemode.threshold.pct 0.999f > mapred.skip.reduce.max.skip.groups 0 > dfs.namenode.handler.count 10 > dfs.blockreport.initialDelay 0 > mapred.jobtracker.instrumentation > org.apache.hadoop.mapred.JobTrackerMetricsInst > mapred.tasktracker.dns.nameserver default > io.sort.factor 10 > mapred.task.timeout 600000 > mapred.max.tracker.failures 4 > hadoop.rpc.socket.factory.class.default > org.apache.hadoop.net.StandardSocketFactory > fs.hdfs.impl org.apache.hadoop.hdfs.DistributedFileSystem > mapred.queue.default.acl-administer-jobs * > mapred.queue.default.acl-submit-job * > mapred.output.key.class org.apache.hadoop.io.Text > mapred.skip.map.auto.incr.proc.count true > dfs.safemode.extension 30000 > tasktracker.http.threads 40 > mapred.job.shuffle.merge.percent 0.66 > fs.ftp.impl org.apache.hadoop.fs.ftp.FTPFileSystem > user.name fossist > mapred.output.compress false > io.bytes.per.checksum 512 > topology.node.switch.mapping.impl > org.apache.hadoop.net.ScriptBasedMapping > 
mapred.reduce.max.attempts 4 > fs.ramfs.impl org.apache.hadoop.fs.InMemoryFileSystem > mapred.skip.map.max.skip.records 0 > dfs.name.edits.dir ${dfs.name.dir} > hadoop.job.ugi fossist,fossist,dialout,cdrom,floppy,audio,video,plugdev > mapred.job.tracker.persist.jobstatus.dir /jobtracker/jobsInfo > mapred.jar > /tmp/hadoop-hadoop/mapred/local/jobTracker/job_200905071619_0003.jar > fs.s3.buffer.dir ${hadoop.tmp.dir}/s3 > dfs.block.size 67108864 > job.end.retry.attempts 0 > fs.file.impl org.apache.hadoop.fs.LocalFileSystem > mapred.local.dir.minspacestart 0 > mapred.output.compression.type RECORD > dfs.datanode.ipc.address 0.0.0.0:50020 > dfs.permissions true > topology.script.number.args 100 > mapred.task.profile.maps 0-2 > dfs.datanode.https.address 0.0.0.0:50475 > mapred.userlog.retain.hours 24 > dfs.secondary.http.address 0.0.0.0:50090 > dfs.replication.max 512 > mapred.job.tracker.persist.jobstatus.active false > local.cache.size 10737418240 > mapred.min.split.size 0 > mapred.map.tasks 3 > mapred.child.java.opts -Xmx200m > mapred.output.value.class org.apache.hadoop.io.IntWritable > mapred.job.queue.name default > dfs.https.address 0.0.0.0:50470 > dfs.balance.bandwidthPerSec 1048576 > ipc.server.listen.queue.size 128 > group.name fossist > job.end.retry.interval 30000 > mapred.inmem.merge.threshold 1000 > mapred.skip.attempts.to.start.skipping 2 > fs.checkpoint.dir ${hadoop.tmp.dir}/dfs/namesecondary > mapred.reduce.tasks 1 > mapred.merge.recordsBeforeProgress 10000 > mapred.userlog.limit.kb 0 > webinterface.private.actions false > dfs.max.objects 0 > io.sort.spill.percent 0.80 > mapred.job.shuffle.input.buffer.percent 0.70 > mapred.job.split.file > hdfs://192.168.1.130:9000/tmp/hadoop-hadoop/mapred/system/job_200905071619_0003/job.split > mapred.job.name wordcount > mapred.map.tasks.speculative.execution true > dfs.datanode.dns.nameserver default > dfs.blockreport.intervalMsec 3600000 > mapred.map.max.attempts 4 > mapred.job.tracker.handler.count 10 > 
dfs.client.block.write.retries 3 > mapred.input.format.class org.apache.hadoop.mapred.TextInputFormat > mapred.tasktracker.expiry.interval 600000 > mapred.jobtracker.maxtasks.per.job -1 > mapred.jobtracker.job.history.block.size 3145728 > keep.failed.task.files false > mapred.output.format.class org.apache.hadoop.mapred.TextOutputFormat > https.keystore.info.rsrc sslinfo.xml > ipc.client.tcpnodelay false > mapred.task.profile.reduces 0-2 > mapred.output.compression.codec org.apache.hadoop.io.compress.DefaultCodec > io.map.index.skip 0 > mapred.working.dir hdfs://192.168.1.130:9000/user/fossist > ipc.server.tcpnodelay false > mapred.reducer.class in.fossist.examples.Reduce > dfs.default.chunk.view.size 32768 > hadoop.logfile.size 10000000 > mapred.reduce.tasks.speculative.execution true > dfs.datanode.du.reserved 0 > fs.checkpoint.period 3600 > mapred.combiner.class in.fossist.examples.Combine > mapred.job.reuse.jvm.num.tasks 1 > dfs.web.ugi webuser,webgroup > mapred.jobtracker.completeuserjobs.maximum 100 > dfs.df.interval 60000 > dfs.data.dir ${hadoop.tmp.dir}/dfs/data > fs.s3.maxRetries 4 > dfs.datanode.dns.interface default > mapred.local.dir ${hadoop.tmp.dir}/mapred/local > fs.hftp.impl org.apache.hadoop.hdfs.HftpFileSystem > dfs.permissions.supergroup supergroup > mapred.mapper.class in.fossist.examples.Map > fs.trash.interval 0 > fs.s3.sleepTimeSeconds 10 > dfs.replication.min 1 > mapred.submit.replication 10 > fs.har.impl org.apache.hadoop.fs.HarFileSystem > mapred.map.output.compression.codec > org.apache.hadoop.io.compress.DefaultCodec > mapred.tasktracker.dns.interface default > dfs.namenode.decommission.interval 30 > dfs.http.address 0.0.0.0:50070 > mapred.job.tracker 192.168.1.135:9001 > dfs.heartbeat.interval 3 > io.seqfile.sorter.recordlimit 1000000 > dfs.name.dir ${hadoop.tmp.dir}/dfs/name > mapred.line.input.format.linespermap 1 > mapred.jobtracker.taskScheduler org.apache.hadoop.mapred.JobQueueTaskScheduler > mapred.tasktracker.instrumentation > 
org.apache.hadoop.mapred.TaskTrackerMetricsInst > dfs.datanode.http.address 0.0.0.0:50075 > mapred.tasktracker.procfsbasedprocesstree.sleeptime-before-sigkill 5000 > mapred.local.dir.minspacekill 0 > dfs.replication.interval 3 > io.sort.record.percent 0.05 > fs.kfs.impl org.apache.hadoop.fs.kfs.KosmosFileSystem > mapred.temp.dir ${hadoop.tmp.dir}/mapred/temp > mapred.tasktracker.reduce.tasks.maximum 2 > dfs.replication 2 > fs.checkpoint.edits.dir ${fs.checkpoint.dir} > mapred.job.reduce.input.buffer.percent 0.0 > mapred.tasktracker.indexcache.mb 10 > hadoop.logfile.count 10 > mapred.skip.reduce.auto.incr.proc.count true > io.seqfile.compress.blocksize 1000000 > fs.s3.block.size 67108864 > mapred.tasktracker.taskmemorymanager.monitoring-interval 5000 > mapred.acls.enabled false > mapred.queue.names default > dfs.access.time.precision 3600000 > fs.hsftp.impl org.apache.hadoop.hdfs.HsftpFileSystem > mapred.task.tracker.http.address 0.0.0.0:50060 > mapred.reduce.parallel.copies 5 > io.seqfile.lazydecompress true > mapred.output.dir hdfs://192.168.1.130:9000/fossist/output5 > ipc.client.connection.maxidletime 10000 > io.sort.mb 100 > mapred.compress.map.output false > mapred.task.tracker.report.address 127.0.0.1:0 > ipc.client.kill.max 10 > ipc.client.connect.max.retries 10 > fs.s3.impl org.apache.hadoop.fs.s3.S3FileSystem > mapred.input.dir hdfs://192.168.1.130:9000/fossist/input > mapred.job.tracker.http.address 0.0.0.0:50030 > io.file.buffer.size 4096 > mapred.jobtracker.restart.recover false > io.serializations org.apache.hadoop.io.serializer.WritableSerialization > dfs.datanode.handler.count 3 > mapred.reduce.copy.backoff 300 > mapred.task.profile false > dfs.replication.considerLoad true > jobclient.output.filter FAILED > mapred.tasktracker.map.tasks.maximum 2 > io.compression.codecs > org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec > fs.checkpoint.size 67108864 >