Re: OOM Error Map output copy.
All,

Thanks for your inputs. Fortunately, our requirements for this job changed, allowing me to use a combiner that ended up reducing the data being pumped to the reducers, and the problem went away.

@Arun -- You are probably right that the number of reducers is rather small, given the amount of data being reduced. We are currently using the Fair Scheduler and it fits our other use cases very well. Newer versions of the Fair Scheduler appear to support limits on the reducer pools, so that solution has to wait until we can make a switch.

~ Niranjan.

On Dec 9, 2011, at 8:51 PM, Chandraprakash Bhagtani wrote:

> Hi Niranjan,
>
> Your issue looks similar to https://issues.apache.org/jira/browse/MAPREDUCE-1182. Which hadoop version are you using?
>
> [...]
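Since the fix that worked here was a combiner, a minimal illustration of why it helps: a combiner pre-aggregates map output on the map side, so far fewer records cross the network to the reducers. The sketch below simulates that effect with a plain HashMap; it has no Hadoop dependency, and the class and method names are illustrative, not taken from the original job.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CombinerEffect {
    // Simulate map-side combining: collapse repeated (key, 1) records
    // into one (key, partialSum) record per distinct key.
    static Map<String, Integer> combine(List<String> mapOutputKeys) {
        Map<String, Integer> combined = new HashMap<>();
        for (String key : mapOutputKeys) {
            combined.merge(key, 1, Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        // One mapper emitting 9 records over 3 distinct keys.
        List<String> mapOutput = List.of("a", "b", "a", "c", "a", "b", "a", "b", "c");
        Map<String, Integer> combined = combine(mapOutput);

        // Without a combiner, 9 records are shuffled; with one, only 3.
        System.out.println("records shuffled without combiner: " + mapOutput.size());
        System.out.println("records shuffled with combiner:    " + combined.size());
        System.out.println("combined counts: " + combined);
    }
}
```

The same principle is why the reducers stopped running out of memory: the per-reducer shuffle volume shrank before it ever left the map hosts.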
RE: OOM Error Map output copy.
Can you try increasing the max heap memory and see whether you still face the problem?

Devaraj K

-----Original Message-----
From: Niranjan Balasubramanian [mailto:niran...@cs.washington.edu]
Sent: Thursday, December 08, 2011 11:09 PM
To: common-user@hadoop.apache.org
Subject: Re: OOM Error Map output copy.

I am using version 0.20.203.

Thanks
~ Niranjan.

[...]
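Before raising the max heap, it is worth verifying what the child JVMs are actually getting. A small sketch that pulls the -Xmx value out of the opts string reported in this thread; the parsing helper is mine for illustration, not a Hadoop API:

```java
public class ChildHeap {
    // Parse an -Xmx value like "1536M" or "2g" out of a JVM opts string.
    // Returns the limit in bytes, or -1 if no -Xmx flag is present.
    static long maxHeapBytes(String javaOpts) {
        for (String opt : javaOpts.trim().split("\\s+")) {
            if (opt.startsWith("-Xmx")) {
                String v = opt.substring(4).toLowerCase();
                long mult = 1;
                char unit = v.charAt(v.length() - 1);
                if (unit == 'k') mult = 1L << 10;
                else if (unit == 'm') mult = 1L << 20;
                else if (unit == 'g') mult = 1L << 30;
                if (mult > 1) v = v.substring(0, v.length() - 1);
                return Long.parseLong(v) * mult;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The child opts string from the job configuration in this thread.
        String opts = "-Xms512M -Xmx1536M -XX:+UseSerialGC";
        long bytes = maxHeapBytes(opts);
        System.out.println("child max heap: " + (bytes >> 20) + " MB"); // 1536 MB
    }
}
```

Devaraj's later question about job.xml overrides is the same concern: the value in the submitted config may not be the value the task JVM launches with.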
Re: OOM Error Map output copy.
Moving to mapreduce-user@, bcc common-user@. Please use project-specific lists.

Niranjan,

If you average 0.5G of output per map, that's 5000 maps * 0.5G = 2.5TB over 12 reduces, i.e. over 200G per reduce - compressed! If you have 4:1 compression, you are doing nearly a terabyte per reducer... which is way too high!

I'd recommend you bump up to somewhere around 1000 reduces to get to 2.5G (compressed) per reducer for your job. If your compression ratio is 2:1, try 500 reduces, and so on.

If you are worried about other users, use the CapacityScheduler: submit your job to a queue with a small capacity and max-capacity to restrict your job to 10 or 20 concurrent reduces at any given point.

Arun

On Dec 7, 2011, at 10:51 AM, Niranjan Balasubramanian wrote:

> [...]
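Arun's sizing arithmetic can be written out directly. The sketch below reproduces it: total shuffled bytes from the map side, per-reducer load at a given reducer count, and the reducer count needed to hit a target per-reducer size. The numbers are from the thread; the 4:1 compression ratio is his hypothetical.

```java
public class ReducerSizing {
    // Compressed GB landing on each reducer for a given reducer count.
    static double gbPerReducer(int maps, double avgMapOutputGB, int reducers) {
        return maps * avgMapOutputGB / reducers;
    }

    // Reducer count needed to bring per-reducer input down to targetGB.
    static int reducersForTarget(int maps, double avgMapOutputGB, double targetGB) {
        return (int) Math.ceil(maps * avgMapOutputGB / targetGB);
    }

    public static void main(String[] args) {
        int maps = 5000;
        double avgOutGB = 0.5; // compressed, midpoint of the 0.3-0.8 GB reported

        // 5000 * 0.5 GB = 2500 GB over 12 reducers => ~208 GB compressed each.
        System.out.printf("per reducer @12 reducers: %.0f GB (compressed)%n",
                gbPerReducer(maps, avgOutGB, 12));

        // At a hypothetical 4:1 compression ratio, ~833 GB uncompressed each.
        System.out.printf("uncompressed @4:1 ratio : %.0f GB%n",
                4 * gbPerReducer(maps, avgOutGB, 12));

        // Reducers needed for ~2.5 GB compressed per reducer, as Arun suggests.
        System.out.println("reducers for 2.5 GB each: "
                + reducersForTarget(maps, avgOutGB, 2.5));
    }
}
```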
Re: OOM Error Map output copy.
Arun,

I faced the same issue, and increasing the # of reducers fixed the problem. I was initially under the impression the MR framework spills to disk if the data is too large to keep in memory; however, on extraordinarily large reduce inputs this was not the case, and the job failed trying to allocate the in-memory buffer:

    private MapOutput shuffleInMemory(MapOutputLocation mapOutputLoc,
                                      URLConnection connection,
                                      InputStream input,
                                      int mapOutputLength,
                                      int compressedLength)
        throws IOException, InterruptedException {
      // Reserve ram for the map-output
      ...
      // Copy map-output into an in-memory buffer
      byte[] shuffleData = new byte[mapOutputLength];

-Prashant Kommireddi

On Fri, Dec 9, 2011 at 10:29 AM, Arun C Murthy a...@hortonworks.com wrote:

> [...]
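A rough sketch of the memory arithmetic behind that allocation. In the 0.20-era shuffle, the in-memory pool is approximately maxHeap * mapred.job.shuffle.input.buffer.percent, and a single map output was only shuffled in memory if it was smaller than roughly 25% of that pool. Both the 0.25 fraction and the helper methods below are assumptions drawn from reading that era's ReduceTask code, not an official API; this is a back-of-the-envelope sketch, not Hadoop's actual implementation.

```java
public class ShuffleMemory {
    // Assumed 0.20-era cap: fraction of the pool one map output may occupy.
    static final double SINGLE_SEGMENT_FRACTION = 0.25;

    // Approximate size of the in-memory shuffle pool, in bytes.
    static long shufflePoolBytes(long maxHeapBytes, double inputBufferPercent) {
        return (long) (maxHeapBytes * inputBufferPercent);
    }

    // Would a map output of this size be shuffled in memory (vs. to disk)?
    static boolean fitsInMemory(long mapOutputBytes, long poolBytes) {
        return mapOutputBytes < (long) (poolBytes * SINGLE_SEGMENT_FRACTION);
    }

    public static void main(String[] args) {
        long heap = 1536L << 20;                  // -Xmx1536M from the thread
        long pool = shufflePoolBytes(heap, 0.70); // shuffle.input.buffer.percent
        System.out.println("shuffle pool: " + (pool >> 20) + " MB");

        // Map outputs in the thread ran 0.3-0.8 GB compressed; the byte[] is
        // sized to the (larger) uncompressed length, so even one output can
        // blow past the pool on a 1.5 GB heap.
        long output = 800L << 20;
        System.out.println("800 MB output fits in memory? "
                + fitsInMemory(output, pool));
    }
}
```

With these numbers the pool is only about 1 GB, which is why MAPREDUCE-1182 (mentioned elsewhere in this thread) matters: when the in-memory/on-disk decision goes wrong, a single oversized `new byte[mapOutputLength]` is enough to take down the reducer.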
Re: OOM Error Map output copy.
Hi Niranjan,

Your issue looks similar to https://issues.apache.org/jira/browse/MAPREDUCE-1182. Which hadoop version are you using?

On Sat, Dec 10, 2011 at 12:17 AM, Prashant Kommireddi prash1...@gmail.com wrote:

> [...]

--
Thanks & Regards,
Chandra Prakash Bhagtani,
Nokia India Pvt. Ltd.
RE: OOM Error Map output copy.
Hi Niranjan,

Everything looks OK as per the info you have given. Can you check in the job.xml whether these child opts are actually reflected, or whether anything else is overwriting this config:

3. mapred.child.java.opts -- -Xms512M -Xmx1536M -XX:+UseSerialGC

Also, can you tell me which version of hadoop you are using?

Devaraj K

-----Original Message-----
From: Niranjan Balasubramanian [mailto:niran...@cs.washington.edu]
Sent: Thursday, December 08, 2011 12:21 AM
To: common-user@hadoop.apache.org
Subject: OOM Error Map output copy.

[...]
Re: OOM Error Map output copy.
Devaraj,

These are indeed the actual settings I copied over from the job.xml.

~ Niranjan.

On Dec 8, 2011, at 12:10 AM, Devaraj K wrote:

> Hi Niranjan,
>
> Everything looks OK as per the info you have given. Can you check in the job.xml whether these child opts are actually reflected, or whether anything else is overwriting this config:
>
> 3. mapred.child.java.opts -- -Xms512M -Xmx1536M -XX:+UseSerialGC
>
> [...]
Re: OOM Error Map output copy.
I am using version 0.20.203.

Thanks
~ Niranjan.

On Dec 8, 2011, at 9:26 AM, Niranjan Balasubramanian wrote:

> Devaraj,
>
> These are indeed the actual settings I copied over from the job.xml.
>
> ~ Niranjan.
>
> [...]
OOM Error Map output copy.
All,

I am encountering the following out-of-memory error during the reduce phase of a large job:

Map output copy failure: java.lang.OutOfMemoryError: Java heap space
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1669)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1529)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1378)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1310)

I tried increasing the memory available using mapred.child.java.opts, but that only helps a little; the reduce task eventually fails again. Here are some relevant job configuration details:

1. The input to the mappers is about 2.5 TB (LZO compressed). The mappers filter out a small percentage of the input (less than 1%).
2. I am currently using 12 reducers, and I can't increase this count by much, to ensure availability of reduce slots for other users.
3. mapred.child.java.opts -- -Xms512M -Xmx1536M -XX:+UseSerialGC
4. mapred.job.shuffle.input.buffer.percent -- 0.70
5. mapred.job.shuffle.merge.percent -- 0.66
6. mapred.inmem.merge.threshold -- 1000
7. I have nearly 5000 mappers, which are supposed to produce LZO-compressed outputs. The logs seem to indicate that the map outputs range between 0.3 GB and 0.8 GB.

Does anything here seem amiss? I'd appreciate any input on what settings to try. I can try lower values for the input buffer percent and the merge percent. Given that the job runs for about 7-8 hours before crashing, I would like to make some informed choices if possible.

Thanks.
~ Niranjan.