Fwd: HDFS: file is not distributed after upload

2014-02-07 Thread Alexander Frolov
Hi, folks!

I've deployed hadoop (0.20.203.0rc1) on an 8-node cluster. After uploading a
file onto HDFS, I get the file on only one of the nodes instead of having it
uniformly distributed across all nodes. What could the issue be?

$HADOOP_HOME/bin/hadoop dfs -copyFromLocal ../data/rmat-20.0
/user/frolo/input/rmat-20.0

$HADOOP_HOME/bin/hadoop dfs -stat %b %o %r %n /user/frolo/input/rmat-*
1220222968 67108864 1 rmat-20.0

$HADOOP_HOME/bin/hadoop dfsadmin -report
Configured Capacity: 2536563998720 (2.31 TB)
Present Capacity: 1642543419392 (1.49 TB)
DFS Remaining: 1641312030720 (1.49 TB)
DFS Used: 1231388672 (1.15 GB)
DFS Used%: 0.07%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-
Datanodes available: 8 (8 total, 0 dead)

Name: 10.10.1.15:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 131536928768 (122.5 GB)
DFS Remaining: 185533546496(172.79 GB)
DFS Used%: 0%
DFS Remaining%: 58.51%
Last contact: Fri Feb 07 12:10:27 MSK 2014


Name: 10.10.1.13:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 131533377536 (122.5 GB)
DFS Remaining: 185537097728(172.79 GB)
DFS Used%: 0%
DFS Remaining%: 58.52%
Last contact: Fri Feb 07 12:10:27 MSK 2014


Name: 10.10.1.17:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 120023924736 (111.78 GB)
DFS Remaining: 197046550528(183.51 GB)
DFS Used%: 0%
DFS Remaining%: 62.15%
Last contact: Fri Feb 07 12:10:27 MSK 2014


Name: 10.10.1.18:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 78510628864 (73.12 GB)
DFS Remaining: 238559846400(222.18 GB)
DFS Used%: 0%
DFS Remaining%: 75.24%
Last contact: Fri Feb 07 12:10:24 MSK 2014


Name: 10.10.1.14:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 131537530880 (122.5 GB)
DFS Remaining: 185532944384(172.79 GB)
DFS Used%: 0%
DFS Remaining%: 58.51%
Last contact: Fri Feb 07 12:10:27 MSK 2014


Name: 10.10.1.11:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 1231216640 (1.15 GB)
Non DFS Used: 84698116096 (78.88 GB)
DFS Remaining: 231141167104(215.27 GB)
DFS Used%: 0.39%
DFS Remaining%: 72.9%
Last contact: Fri Feb 07 12:10:24 MSK 2014


Name: 10.10.1.16:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 131537494016 (122.5 GB)
DFS Remaining: 185532981248(172.79 GB)
DFS Used%: 0%
DFS Remaining%: 58.51%
Last contact: Fri Feb 07 12:10:27 MSK 2014


Name: 10.10.1.12:50010
Decommission Status : Normal
Configured Capacity: 317070499840 (295.29 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 84642578432 (78.83 GB)
DFS Remaining: 232427896832(216.47 GB)
DFS Used%: 0%
DFS Remaining%: 73.3%
Last contact: Fri Feb 07 12:10:27 MSK 2014


Best,
  Alex


Re: HDFS: file is not distributed after upload

2014-02-07 Thread Harsh J
Hi,

0.20.203.0rc1 is a very old version at this point. Why not use a
more current version if you're deploying a new cluster?

On to your issue: your configuration XML files (core-site.xml,
hdfs-site.xml or mapred-site.xml) most likely have a dfs.replication
value set to 1, causing only that many replicas to be written out by
default.
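
As a rough sketch (assuming the paths from Alex's message, a stock conf/
directory under $HADOOP_HOME, and that a replication factor of 3 is wanted),
the setting and the already-uploaded file could be checked and fixed like this:

# dfs.replication sets the default number of copies; 1 keeps a single replica.
grep -A1 dfs.replication $HADOOP_HOME/conf/hdfs-site.xml

# Raise replication for the file that is already in HDFS (-w waits for it to finish).
$HADOOP_HOME/bin/hadoop dfs -setrep -w 3 /user/frolo/input/rmat-20.0

# The third field of -stat output is the replication factor; it should now read 3.
$HADOOP_HOME/bin/hadoop dfs -stat "%b %o %r %n" /user/frolo/input/rmat-20.0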


-- 
Harsh J


Re: HDFS: file is not distributed after upload

2014-02-07 Thread Selçuk Şenkul
Hi Alex,

You should run the copyFromLocal command from the namenode, or from any
machine that is not a datanode, to get the file distributed.
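
(The reason, as I understand it, is that HDFS places the first replica on the
writer's local datanode, so with replication 1 the entire file stays on the
node that ran the upload. A sketch of how to confirm where the blocks landed,
using the stock fsck tool and the path from the thread:)

# Lists each block of the file and the datanode(s) holding its replicas.
$HADOOP_HOME/bin/hadoop fsck /user/frolo/input/rmat-20.0 -files -blocks -locations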


Can we avoid restarting of AM when it fails?

2014-02-07 Thread Krishna Kishore Bonagiri
Hi,

   I have some failure test cases where my Application Master is
supposed to fail. But when it fails, it is started again with appID_02.
Is there a way for me to avoid the second instance of the Application
Master getting started? Is it restarted automatically by the RM after the
first one fails?

Thanks,
Kishore


Re: java.lang.OutOfMemoryError: Java heap space

2014-02-07 Thread praveenesh kumar
Thanks, Park, for sharing the above configs.

But I am wondering whether the above config changes would make any huge
difference in my case.
As per my logs, I am very worried about this line -

 INFO org.apache.hadoop.mapred.MapTask: Record too large for in-memory
buffer: 644245358 bytes

If I am understanding it properly, one of my records is too large to fit
into the memory buffer, which is what is causing the issue.
In that case none of the above changes would make a huge impact; please
correct me if I am taking it totally wrong.

 - Adding the hadoop user group here as well, hoping for some valuable
inputs on the above question.


Since I am doing a join on a grouped bag, do you think that might be the case?

But if that is the issue, then, as far as I understand, bags in Pig are
spillable, so it shouldn't have caused this problem.

I can't get rid of the group by; grouping first should ideally improve
my join. But if this is the root cause, and if I am understanding it
correctly, do you think I should get rid of the group-by?

My question in that case would be: what would happen if I do the group
by later, after the join? It would result in a much bigger bag (because it
would have more records after the join).

Am I thinking here correctly?

Regards

Prav



On Fri, Feb 7, 2014 at 3:11 AM, Cheolsoo Park piaozhe...@gmail.com wrote:

 Looks like you're running out of space in MapOutputBuffer. Two suggestions-

 1)
 You said that io.sort.mb is already set to 768 MB, but did you try to lower
 io.sort.spill.percent in order to spill earlier and more often?

 Page 12-

 http://www.slideshare.net/Hadoop_Summit/optimizing-mapreduce-job-performance

 2)
 Can't you increase the parallelism of mappers so that each mapper has to
 handle a smaller size of data? Pig determines the number of mappers by
 total input size / pig.maxCombinedSplitSize (128MB by default). So you can
 try to lower pig.maxCombinedSplitSize.

 But I admit Pig internal data types are not memory-efficient, and that is
 an optimization opportunity. Contribute!
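
  (As a rough sketch of how those two knobs can be passed to a Pig run from the
  command line; the property names come from the suggestion above, but the exact
  way a given Pig version forwards them into the job conf is an assumption to
  verify, and the values are illustrative, not tuned:)

  # Lower the combined-split ceiling so more mappers are created, and make the
  # map output buffer spill earlier. myscript.pig is a placeholder for the actual script.
  pig -Dpig.maxCombinedSplitSize=67108864 -Dio.sort.spill.percent=0.50 myscript.pig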



 On Thu, Feb 6, 2014 at 2:54 PM, praveenesh kumar praveen...@gmail.com
 wrote:

   It's a normal join. I can't use replicated join, as the data is very large.
 
  Regards
  Prav
 
 
  On Thu, Feb 6, 2014 at 7:52 PM, abhishek abhishek.dod...@gmail.com
  wrote:
 
   Hi Praveenesh,
  
    Did you use replicated join in your pig script, or is it a regular join?
  
   Regards
   Abhishek
  
   Sent from my iPhone
  
 On Feb 6, 2014, at 11:25 AM, praveenesh kumar praveen...@gmail.com wrote:

 Hi all,

 I am running a Pig Script which is running fine for small data. But when I
 scale the data, I am getting the following error at my map stage.
 Please refer to the map logs as below.

 My Pig script is doing a group by first, followed by a join on the grouped
 data.

 Any clues to understand where I should look at or how shall I deal with
 this situation. I don't want to just go by just increasing the heap space.
 My map jvm heap space is already 3 GB with io.sort.mb = 768 MB.

 2014-02-06 19:15:12,243 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
 2014-02-06 19:15:15,025 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
 2014-02-06 19:15:15,123 INFO org.apache.hadoop.mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@2bd9e282
 2014-02-06 19:15:15,546 INFO org.apache.hadoop.mapred.MapTask: io.sort.mb = 768
 2014-02-06 19:15:19,846 INFO org.apache.hadoop.mapred.MapTask: data buffer = 612032832/644245088
 2014-02-06 19:15:19,846 INFO org.apache.hadoop.mapred.MapTask: record buffer = 9563013/10066330
 2014-02-06 19:15:20,037 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
 2014-02-06 19:15:21,083 INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader: Created input record counter: Input records from _1_tmp1327641329
 2014-02-06 19:15:52,894 INFO org.apache.hadoop.mapred.MapTask: Spilling map output: buffer full= true
 2014-02-06 19:15:52,895 INFO org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 611949600; bufvoid = 644245088
 2014-02-06 19:15:52,895 INFO org.apache.hadoop.mapred.MapTask: kvstart = 0; kvend = 576; length = 10066330
 2014-02-06 19:16:06,182 INFO org.apache.hadoop.mapred.MapTask: Finished spill 0
 2014-02-06 19:16:16,169 INFO org.apache.pig.impl.util.SpillableMemoryManager: first memory handler call - Collection threshold init = 328728576(321024K) used = 1175055104(1147514K) committed = 1770848256(1729344K) max = 2097152000(2048000K)
 2014-02-06 19:16:20,446 INFO org.apache.pig.impl.util.SpillableMemoryManager: Spilled an estimate of 308540402 bytes from 1 objects. init = 328728576(321024K) used = 1175055104(1147514K) committed 

meaning or usage of reserved containers in YARN Capacity scheduler

2014-02-07 Thread ricky l
Hi all,

I have a question about reserved containers in the YARN capacity
scheduler. After reading the source code and the related documentation, it is
still not very clear to me. What is the purpose or practical usage of a
reserved container? Thanks.


Re: java.lang.OutOfMemoryError: Java heap space

2014-02-07 Thread praveenesh kumar
Hi Park,

Your explanation makes perfect sense in my case. Thanks for explaining what
is happening behind the scenes. I am wondering whether you used plain Java
compression/decompression, or whether there is a UDF already available to do
this, or some kind of property that we need to enable to tell Pig to
compress bags before spilling.

Regards
Prav


Re: java.lang.OutOfMemoryError: Java heap space

2014-02-07 Thread Cheolsoo Park
Hi Prav,

You're thinking correctly, and it's true that Pig bags are spillable.

However, spilling is no magic, meaning you can still run into OOM with huge
bags like you have here. Pig runs Spillable Memory Manager (SMM) in a
separate thread. When spilling is triggered, SMM locks bags that it's
trying to spill to disk. After the spilling is finished, GC frees up
memory. The problem is that it's possible that more bags are loaded into
memory while the spilling is in progress. Now JVM triggers GC, but GC
cannot free up memory because SMM is locking the bags, resulting in OOM
error. This happens quite often.

Sounds like you do group-by to reduce the number of rows before join and
don't immediately run any aggregation function on the grouped bags. If
that's the case, can you compress those bags? For example, you could add a
foreach after group-by and run a UDF that compresses a bag and returns it
as bytearray. From there, you're moving around small blobs rather than big
bags. Of course, you will need to decompress them when you restore data out
of those bags at some point. This trick saved me several times in the past
particularly when I dealt with bags of large chararrays.

Just a thought. Hope this is helpful.

Thanks,
Cheolsoo


Re: Problems building hadoop 2.2.0 from source

2014-02-07 Thread Christopher Thomas
Thanks, I built 2.3 yesterday (checked out from the link suggested in an
earlier post of this thread) without problems, apart from the VM running out
of memory, which was fixed with

export MAVEN_OPTS=-Xmx2048m

At least, I got a message saying the build was successful.

Thanks for your help.
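
For anyone hitting the same thing, the full sequence was roughly the following
(a sketch: the dist/tar flags come from BUILDING.txt and may need adjusting,
e.g. adding -Pnative only if the native toolchain is installed):

export MAVEN_OPTS="-Xmx2048m"
# Build the distribution tarball, skipping tests.
mvn clean package -Pdist -DskipTests -Dtar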




On 8 February 2014 10:53, Ted Yu yuzhih...@gmail.com wrote:

 In the output for a passing test, I saw:

 2014-02-06 16:48:49,722 ERROR [Thread[Thread-71,5,main]]
 delegation.AbstractDelegationTokenSecretManager
 (AbstractDelegationTokenSecretManager.java:run(557)) -
 InterruptedExcpetion recieved for ExpiredTokenRemover thread
 java.lang.InterruptedException: sleep interrupted

 Meaning the above was not critical.

 branch-2.3 is receiving attention now.
 Discovering test failures there would be more helpful.

 Cheers


 On Thu, Feb 6, 2014 at 9:25 PM, Christopher Thomas 
 christophermauricetho...@gmail.com wrote:

 I guess the ERROR lines in

 hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-
 jobclient/target/surefire-reports/org.apache.hadoop.mapreduce.v2.
 TestMRJobsWithHistoryService-output.txt

 led me to believe that the problem was with TestMRJobsWithHistoryService.
 If that's not the case then what do these messages indicate?

 As I say I am a complete novice and the learning curve is very steep.


 On 7 February 2014 14:47, Ted Yu yuzhih...@gmail.com wrote:

 I checked out source code from
 http://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.3 and
 it builds.

 From TestMRJobsWithHistoryService.txt, the test passed.
  What led to this test being singled out among the 454 tests?

 Thanks
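
  (One way to pick out, from the whole source tree, the surefire reports that
  actually recorded failures or errors; a sketch assuming the standard summary
  lines in those .txt files:)

  find . -path '*surefire-reports*' -name '*.txt' \
    | xargs grep -lE 'Failures: [1-9]|Errors: [1-9]'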


 On Thu, Feb 6, 2014 at 7:26 PM, Christopher Thomas 
 christophermauricetho...@gmail.com wrote:

 Yes well I tried 2.3, but I have found a number of problems building
 it. I had to resort to manually applying patches that I found in the bug
 tracking lists, which did not seem to have made it into all branches. So
 for the moment I am sticking with 2.2.0 which is advertised as being
 stable.

 I apologise for the confusion.

 Here is the contents
 of 
 ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/surefire-reports/org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService.txt,
 though perhaps not that illuminating:



 ---
 Test set: org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService

 ---
 Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 52.669
 sec




 On 7 February 2014 14:12, Ted Yu yuzhih...@gmail.com wrote:

 The output was
 from 
 hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/surefire-reports/org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService-output.txt

  Can you show us the contents
 of 
 ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/surefire-reports/org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService.txt
 ?

 BTW hadoop 2.3 release candidate is coming up. You may consider paying
 more attention to hadoop 2.3

 Cheers


 On Thu, Feb 6, 2014 at 5:33 PM, Christopher Thomas 
 christophermauricetho...@gmail.com wrote:

 I included the last part of

 hadoop-mapreduce-project/hadoop-mapreduce-client/
 hadoop-mapreduce-client-jobclient/target/surefire-
 reports/org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService.
 txt

 in the second half of my initial posting, calling it the output
 from TestMRJobsWithHistoryService. Sloppy terminology I know, sorry
 if I wasn't very clear.

 Regards

 Chris


 On 7 February 2014 11:53, Ted Yu yuzhih...@gmail.com wrote:

  There isn't a System.exit call in TestMRJobsWithHistoryService.java

  What did
  hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/surefire-reports/org.apache.hadoop.mapreduce.v2.TestMRJobsWithHistoryService.txt
  say?

 Cheers


 On Thu, Feb 6, 2014 at 4:41 PM, Christopher Thomas 
 christophermauricetho...@gmail.com wrote:

 Hi,

 I am a complete beginner to Hadoop, trying to build 2.2.0 from
 source on a Macbook Pro running OS X Mavericks.

  I am following the 'instructions' at
  http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html
  such as they are.

 I get the following test failure:

 Forking command line: /bin/sh -c cd
 /Users/hadoop/hadoop-2.2.0-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient
 
 /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java
 -Xmx1024m -XX:+HeapDumpOnOutOfMemoryError -jar
 /Users/hadoop/hadoop-2.2.0-src/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-jobclient/target/surefire/surefirebooter1837947962445626736.jar