Sorry, I hit tab and it sent my unfinished post. See the following mail for answers to the other questions.
I forget the exception's details; it was thrown in the terminal. The default io.sort.mb is 100 and I set it to 500 to speed up the reducer, so I set mapred.child.java.opts to 1g. The datanode/regionserver has 16GB of memory, but free memory for map-reduce is about 5GB, so I can't add more mappers.

On Tue, Jul 22, 2014 at 1:37 PM, Stack <st...@duboce.net> wrote:
> On Mon, Jul 21, 2014 at 10:32 PM, Li Li <fancye...@gmail.com> wrote:
>
>> 1. yes, I have 20 concurrent running mappers.
>> 2. I can't add more mappers because I set io.sort.mb to 500mb, and if I
>> set 8 mappers it hits an oov exception and load average is high
>
> What is OOV?
>
> Do you have to have a reducer?
>
> Load average is high? How high?
>
>> 3. the fast mapper only takes 1 minute. Following are the statistics:
>
> So each region is only taking 1 minute to scan? 1.4G scanned?
>
> Can you add other counters to your MR job so we can get more of an idea
> of what is going on in it?
>
> Please answer my other questions.
>
> Thanks,
> St.Ack
>
>> HBase Counters
>>   REMOTE_RPC_CALLS             0
>>   RPC_CALLS                    523
>>   RPC_RETRIES                  0
>>   NOT_SERVING_REGION_EXCEPTION 0
>>   NUM_SCANNER_RESTARTS         0
>>   MILLIS_BETWEEN_NEXTS         62,415
>>   BYTES_IN_RESULTS             1,380,694,667
>>   BYTES_IN_REMOTE_RESULTS      0
>>   REGIONS_SCANNED              1
>>   REMOTE_RPC_RETRIES           0
>>
>> FileSystemCounters
>>   FILE_BYTES_READ              120,508,552
>>   HDFS_BYTES_READ              176
>>   FILE_BYTES_WRITTEN           241,000,600
>>
>> File Input Format Counters
>>   Bytes Read                   0
>>
>> Map-Reduce Framework
>>   Map output materialized bytes      120,448,992
>>   Combine output records             0
>>   Map input records                  5,208,607
>>   Physical memory (bytes) snapshot   965,730,304
>>   Spilled Records                    10,417,214
>>   Map output bytes                   282,122,973
>>   CPU time spent (ms)                82,610
>>   Total committed heap usage (bytes) 1,061,158,912
>>   Virtual memory (bytes) snapshot    1,681,047,552
>>   Combine input records              0
>>   Map output records                 5,208,607
>>   SPLIT_RAW_BYTES                    176
>>
>> On Tue, Jul 22, 2014 at 12:11 PM, Stack <st...@duboce.net> wrote:
>> > How many regions now?
>> >
>> > You still have 20 concurrent mappers running? Are your machines loaded
>> > w/ 4 map tasks on each? Can you up the number of concurrent mappers?
>> > Can you get an idea of your scan rates? Are all map tasks scanning at
>> > the same rate? Does one task lag the others? Do you emit stats on each
>> > map task, such as rows processed? Can you figure out your bottleneck?
>> > Are you seeking disk all the time? Anything else running while this
>> > big scan is going on? How big are your cells? Do you have one or more
>> > column families? How many columns?
>> >
>> > For average region size, do a du on the hdfs region directories, then
>> > sum and divide by the region count.
>> >
>> > St.Ack
>> >
>> > On Mon, Jul 21, 2014 at 7:30 PM, Li Li <fancye...@gmail.com> wrote:
>> >
>> >> anyone could help? now I have about 1.1 billion nodes and it takes
>> >> 2 hours to finish a map reduce job.
>> >>
>> >> ---------- Forwarded message ----------
>> >> From: Li Li <fancye...@gmail.com>
>> >> Date: Thu, Jun 26, 2014 at 3:34 PM
>> >> Subject: how to do parallel scanning in map reduce using hbase as input?
>> >> To: u...@hbase.apache.org
>> >>
>> >> my table has about 700 million rows and about 80 regions. Each task
>> >> tracker is configured with 4 mappers and 4 reducers at the same time.
>> >> The hadoop/hbase cluster has 5 nodes, so at any moment it has 20
>> >> mappers running. It takes more than an hour to finish the mapper
>> >> stage, and the hbase cluster's load is very low, about 2,000 requests
>> >> per second. I think one mapper per region is too few. How can I run
>> >> more than one mapper per region so that it takes full advantage of
>> >> the computing resources?
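For reference, the two settings discussed in the thread would live in mapred-site.xml roughly like this (a sketch using the values quoted above; the exact -Xmx is a tuning choice, but it must comfortably contain the io.sort.mb buffer plus the task's working memory):

```xml
<property>
  <name>io.sort.mb</name>
  <value>500</value> <!-- map-side sort buffer, allocated inside the child JVM heap -->
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1g</value> <!-- child JVM heap; must exceed io.sort.mb plus working memory -->
</property>
```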
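A back-of-envelope check of the memory budget described in the thread can make the "can't add more mappers" constraint concrete. This is a minimal sketch, assuming each map task gets a 1 GB child JVM (per mapred.child.java.opts=-Xmx1g, which must hold the 500 MB io.sort.mb buffer) and each node has roughly 5 GB free for MapReduce; the class and method names are illustrative, not from any Hadoop API:

```java
// Hypothetical helper: does N concurrent map-task JVMs' worth of heap
// fit in a node's free memory? All sizes are in megabytes.
public class MapperMemoryBudget {

    // True if mappersPerNode child JVMs of heapMb each fit within freeMb.
    static boolean fits(int mappersPerNode, long heapMb, long freeMb) {
        return (long) mappersPerNode * heapMb <= freeMb;
    }

    public static void main(String[] args) {
        long heapMb = 1024;      // -Xmx1g per child JVM (holds the 500 MB sort buffer)
        long freeMb = 5 * 1024;  // ~5 GB free per datanode/regionserver

        // 4 mappers/node: 4 GB <= 5 GB, fits.
        System.out.println("4 mappers fit: " + fits(4, heapMb, freeMb));
        // 8 mappers/node: 8 GB > 5 GB, which is consistent with the
        // out-of-memory failures reported when raising the mapper count.
        System.out.println("8 mappers fit: " + fits(8, heapMb, freeMb));
    }
}
```

This also shows the trade-off in the thread: raising io.sort.mb forces a bigger child heap, which in turn caps how many mappers fit per node.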
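On the final question (more than one mapper per region): since TableInputFormat produces one split per region, one approach is to subclass TableInputFormatBase and override getSplits() so that each region's row-key range is cut into two or more sub-ranges, each becoming its own map task. The HBase plumbing is omitted here; the self-contained sketch below shows only the key-range math such an override would need, assuming fixed-width row keys compared as unsigned bytes (midpoint computed is exact arithmetic, not sampled from the data):

```java
import java.util.Arrays;

// Sketch of splitting a region's [startKey, endKey) row-key range in half,
// the way a custom TableInputFormatBase.getSplits() override might before
// emitting two TableSplits per region instead of one.
public class RangeSplitter {

    // Midpoint of two row keys, treating each as an unsigned big-endian
    // integer. Shorter keys are zero-padded on the right (fixed-width
    // assumption; real HBase keys are compared lexicographically).
    static byte[] midpoint(byte[] start, byte[] end) {
        int len = Math.max(start.length, end.length);
        byte[] a = Arrays.copyOf(start, len);
        byte[] b = Arrays.copyOf(end, len);

        // Add the two keys byte-by-byte, low byte to high; sum[0] holds overflow.
        int[] sum = new int[len + 1];
        int carry = 0;
        for (int i = len - 1; i >= 0; i--) {
            int s = (a[i] & 0xFF) + (b[i] & 0xFF) + carry;
            sum[i + 1] = s & 0xFF;
            carry = s >> 8;
        }
        sum[0] = carry;

        // Divide the sum by two: shift right across bytes, high to low.
        byte[] mid = new byte[len];
        int rem = sum[0];
        for (int i = 1; i <= len; i++) {
            int cur = (rem << 8) | sum[i];
            mid[i - 1] = (byte) (cur >> 1);
            rem = cur & 1;
        }
        return mid;
    }

    public static void main(String[] args) {
        byte[] start = {0x00, 0x00};
        byte[] end = {(byte) 0xFF, (byte) 0xFF};
        byte[] mid = midpoint(start, end);
        // Region [start, end) becomes two splits: [start, mid) and [mid, end),
        // so two mappers can scan the region concurrently.
        System.out.printf("split at 0x%02X%02X%n", mid[0] & 0xFF, mid[1] & 0xFF);
    }
}
```

With ~80 regions and only 20 mapper slots, this only helps once slots outnumber regions, so the memory ceiling above still has to be solved first.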