Re: chaining (the output of) jobs/ reducers
Thanks Bryan. This is great stuff!

On Thu, Sep 12, 2013 at 8:49 PM, Bryan Beaudreault bbeaudrea...@hubspot.com wrote: Hey Adrian, To clarify, the replication happens on *write*. So as you write output from the reducer of Job A, you are writing into hdfs. Part of that write path is replicating the data to 2 additional hosts in the cluster (local + 2; this is configured by the dfs.replication configuration value). So by the time Job B starts, hadoop has 3 options for where each mapper can run and be data-local. Hadoop will do all the work to try to make everything as local as possible. You'll be able to see from the counters on the job how successful hadoop was at placing your mappers. See the counters Data-local map tasks and Rack-local map tasks. Rack-local being those where hadoop was not able to place the mapper on the same host as the data, but was at least able to keep it within the same rack. All of this is dependent on a proper topology configuration, both in your NameNode and JobTracker.

On Thu, Sep 12, 2013 at 3:02 PM, Adrian CAPDEFIER chivas314...@gmail.com wrote: Thanks Bryan. Yes, I am using hadoop + hdfs. If I understand your point, hadoop tries to start the mapping processes on nodes where the data is local and, if that's not possible, then it is hdfs that replicates the data to the mapper nodes? I expected to have to set this up in the code and I completely ignored HDFS; I guess it's a case of not seeing the forest for the trees!

On Thu, Sep 12, 2013 at 6:38 PM, Bryan Beaudreault bbeaudrea...@hubspot.com wrote: It really comes down to the following: In Job A set mapred.output.dir to some directory X. In Job B set mapred.input.dir to the same directory X. For Job A, do context.write() as normal, and each reducer will create an output file in mapred.output.dir. Then in Job B each of those will correspond to a mapper. Of course you need to make sure your input and output formats, as well as input and output keys/values, match up between the two jobs as well. If you are using HDFS, which it seems you are, the directories specified can be HDFS directories. In that case, with a replication factor of 3, each of these output files will exist on 3 nodes. Hadoop and HDFS will do the work to ensure that the mappers in the second job do as good a job as possible of being data- or rack-local.

On Thu, Sep 12, 2013 at 12:35 PM, Adrian CAPDEFIER chivas314...@gmail.com wrote: Thank you, Chris. I will look at Cascading and Pig, but for starters I'd prefer to keep everything, if possible, as close to the hadoop libraries. I am sure I am overlooking something basic, as repartitioning is a fairly common operation in MPP environments.

On Thu, Sep 12, 2013 at 2:39 PM, Chris Curtin curtin.ch...@gmail.com wrote: If you want to stay in Java look at Cascading. Pig is also helpful. I think there are others (Spring integration maybe?) but I'm not familiar enough with them to make a recommendation. Note that with Cascading and Pig you don't write 'map reduce'; you write logic and they map it to the various mapper/reducer steps automatically. Hope this helps, Chris

On Thu, Sep 12, 2013 at 9:36 AM, Adrian CAPDEFIER chivas314...@gmail.com wrote: Howdy, My application requires 2 distinct processing steps (reducers) to be performed on the input data. The first operation changes the key values, and records that had different keys in step 1 can end up having the same key in step 2. The heavy lifting of the operation is in step 1; step 2 only combines records whose keys were changed.
In short, the overview is: Sequential file -> Step 1 -> Step 2 -> Output. To implement this in hadoop, it seems that I need to create a separate job for each step. Now, I assumed there would be some sort of job management under hadoop to link Job 1 and Job 2, but the only thing I could find was related to job scheduling and nothing on how to synchronize the input/output of the linked jobs. The only crude solution that I can think of is to use a temporary file under HDFS, but even so I'm not sure if this will work. The overview of the process would be: Sequential Input (lines) -> Job A [Mapper (key1, value1) -> ChainReducer (key2, value2)] -> Temporary file -> Job B [Mapper (key2, value2) -> Reducer (key2, value3)] -> Output. Is there a better way to pass the output from Job A as input to Job B (e.g. using network streams or some built-in java classes that don't do disk I/O)? The temporary file solution will work in a single-node configuration, but I'm not sure about an MPP config. Let's say Job A runs on nodes 0 and 1 and Job B runs on nodes 2 and 3, or both jobs run on all 4 nodes - will HDFS be able to redistribute the records between nodes automagically, or does this need to be coded somehow?
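[Editor's sketch] A minimal illustration of the approach Bryan describes, written against the new org.apache.hadoop.mapreduce API (the Job constructor differs slightly between Hadoop 1.x and 2.x). The paths and the Step1Mapper/Step1Reducer/Step2Mapper/Step2Reducer classes are placeholders, not part of the thread; key/value types and formats must of course match between the two jobs:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainedJobsDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path input = new Path("/user/adrian/input");        // hypothetical paths
    Path intermediate = new Path("/user/adrian/step1"); // Job A output = Job B input
    Path output = new Path("/user/adrian/output");

    Job jobA = new Job(conf, "step 1");
    jobA.setJarByClass(ChainedJobsDriver.class);
    jobA.setMapperClass(Step1Mapper.class);        // placeholder mapper/reducer classes
    jobA.setReducerClass(Step1Reducer.class);
    jobA.setOutputKeyClass(Text.class);            // adjust to your real key/value types
    jobA.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(jobA, input);
    FileOutputFormat.setOutputPath(jobA, intermediate);
    if (!jobA.waitForCompletion(true)) {
      System.exit(1);                              // do not start Job B if Job A failed
    }

    Job jobB = new Job(conf, "step 2");
    jobB.setJarByClass(ChainedJobsDriver.class);
    jobB.setMapperClass(Step2Mapper.class);
    jobB.setReducerClass(Step2Reducer.class);
    jobB.setOutputKeyClass(Text.class);
    jobB.setOutputValueClass(Text.class);
    // each part-r-* file written by Job A's reducers becomes mapper input here
    FileInputFormat.addInputPath(jobB, intermediate);
    FileOutputFormat.setOutputPath(jobB, output);
    System.exit(jobB.waitForCompletion(true) ? 0 : 1);
  }
}

Once each job finishes, the "Data-local map tasks" and "Rack-local map tasks" counters Bryan mentions are visible on the job's page in the JobTracker web UI.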
Re: chaining (the output of) jobs/ reducers
I've just seen your email, Vinod. This is the behaviour that I'd expect, and it is similar to other data integration tools; I will keep an eye out for it as a long-term option.

On Fri, Sep 13, 2013 at 5:26 AM, Vinod Kumar Vavilapalli vino...@apache.org wrote: Other than the short-term solutions that others have proposed, Apache Tez solves this exact problem. It can do M-M-R-R-R chains, multi-way mappers and reducers, and your own custom processors - all without persisting the intermediate outputs to HDFS. It works on top of YARN, though the first release of Tez is yet to happen. You can learn more about it here: http://tez.incubator.apache.org/ HTH, +Vinod

On Sep 12, 2013, at 6:36 AM, Adrian CAPDEFIER wrote: Howdy, My application requires 2 distinct processing steps (reducers) to be performed on the input data. The first operation changes the key values, and records that had different keys in step 1 can end up having the same key in step 2. The heavy lifting of the operation is in step 1; step 2 only combines records whose keys were changed. In short, the overview is: Sequential file -> Step 1 -> Step 2 -> Output. To implement this in hadoop, it seems that I need to create a separate job for each step. Now, I assumed there would be some sort of job management under hadoop to link Job 1 and Job 2, but the only thing I could find was related to job scheduling and nothing on how to synchronize the input/output of the linked jobs. The only crude solution that I can think of is to use a temporary file under HDFS, but even so I'm not sure if this will work. The overview of the process would be: Sequential Input (lines) -> Job A [Mapper (key1, value1) -> ChainReducer (key2, value2)] -> Temporary file -> Job B [Mapper (key2, value2) -> Reducer (key2, value3)] -> Output. Is there a better way to pass the output from Job A as input to Job B (e.g. using network streams or some built-in java classes that don't do disk I/O)? The temporary file solution will work in a single-node configuration, but I'm not sure about an MPP config. Let's say Job A runs on nodes 0 and 1 and Job B runs on nodes 2 and 3, or both jobs run on all 4 nodes - will HDFS be able to redistribute the records between nodes automagically, or does this need to be coded somehow?
chaining (the output of) jobs/ reducers
Howdy, My application requires 2 distinct processing steps (reducers) to be performed on the input data. The first operation changes the key values, and records that had different keys in step 1 can end up having the same key in step 2. The heavy lifting of the operation is in step 1; step 2 only combines records whose keys were changed. In short, the overview is: Sequential file -> Step 1 -> Step 2 -> Output. To implement this in hadoop, it seems that I need to create a separate job for each step. Now, I assumed there would be some sort of job management under hadoop to link Job 1 and Job 2, but the only thing I could find was related to job scheduling and nothing on how to synchronize the input/output of the linked jobs. The only crude solution that I can think of is to use a temporary file under HDFS, but even so I'm not sure if this will work. The overview of the process would be: Sequential Input (lines) -> Job A [Mapper (key1, value1) -> ChainReducer (key2, value2)] -> Temporary file -> Job B [Mapper (key2, value2) -> Reducer (key2, value3)] -> Output. Is there a better way to pass the output from Job A as input to Job B (e.g. using network streams or some built-in java classes that don't do disk I/O)? The temporary file solution will work in a single-node configuration, but I'm not sure about an MPP config. Let's say Job A runs on nodes 0 and 1 and Job B runs on nodes 2 and 3, or both jobs run on all 4 nodes - will HDFS be able to redistribute the records between nodes automagically, or does this need to be coded somehow?
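[Editor's sketch] On the "job management under hadoop to link Job 1 and 2" point: besides simply calling waitForCompletion() on Job A before submitting Job B, Hadoop ships a small dependency-driven runner (org.apache.hadoop.mapreduce.lib.jobcontrol in recent releases; older 1.x releases have an equivalent under org.apache.hadoop.mapred.jobcontrol). A rough sketch, assuming jobA and jobB are already configured and jobB reads the directory jobA writes:

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

public class TwoStepRunner {
  // jobA and jobB are assumed to be fully configured Job instances,
  // with jobB's input path pointing at jobA's output path.
  public static void runChained(Job jobA, Job jobB) throws Exception {
    ControlledJob stepA = new ControlledJob(jobA, null);   // no dependencies
    ControlledJob stepB = new ControlledJob(jobB, null);
    stepB.addDependingJob(stepA);                          // step B waits for step A to succeed

    JobControl control = new JobControl("step1-then-step2");
    control.addJob(stepA);
    control.addJob(stepB);

    Thread runner = new Thread(control);                   // JobControl implements Runnable
    runner.start();
    while (!control.allFinished()) {
      Thread.sleep(1000);                                  // poll until both jobs are done
    }
    control.stop();
  }
}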
Re: chaining (the output of) jobs/ reducers
Thank you, Chris. I will look at Cascading and Pig, but for starters I'd prefer to keep everything, if possible, as close to the hadoop libraries. I am sure I am overlooking something basic, as repartitioning is a fairly common operation in MPP environments.

On Thu, Sep 12, 2013 at 2:39 PM, Chris Curtin curtin.ch...@gmail.com wrote: If you want to stay in Java look at Cascading. Pig is also helpful. I think there are others (Spring integration maybe?) but I'm not familiar enough with them to make a recommendation. Note that with Cascading and Pig you don't write 'map reduce'; you write logic and they map it to the various mapper/reducer steps automatically. Hope this helps, Chris

On Thu, Sep 12, 2013 at 9:36 AM, Adrian CAPDEFIER chivas314...@gmail.com wrote: Howdy, My application requires 2 distinct processing steps (reducers) to be performed on the input data. The first operation changes the key values, and records that had different keys in step 1 can end up having the same key in step 2. The heavy lifting of the operation is in step 1; step 2 only combines records whose keys were changed. In short, the overview is: Sequential file -> Step 1 -> Step 2 -> Output. To implement this in hadoop, it seems that I need to create a separate job for each step. Now, I assumed there would be some sort of job management under hadoop to link Job 1 and Job 2, but the only thing I could find was related to job scheduling and nothing on how to synchronize the input/output of the linked jobs. The only crude solution that I can think of is to use a temporary file under HDFS, but even so I'm not sure if this will work. The overview of the process would be: Sequential Input (lines) -> Job A [Mapper (key1, value1) -> ChainReducer (key2, value2)] -> Temporary file -> Job B [Mapper (key2, value2) -> Reducer (key2, value3)] -> Output. Is there a better way to pass the output from Job A as input to Job B (e.g. using network streams or some built-in java classes that don't do disk I/O)? The temporary file solution will work in a single-node configuration, but I'm not sure about an MPP config. Let's say Job A runs on nodes 0 and 1 and Job B runs on nodes 2 and 3, or both jobs run on all 4 nodes - will HDFS be able to redistribute the records between nodes automagically, or does this need to be coded somehow?
Re: chaining (the output of) jobs/ reducers
Thanks Bryan. Yes, I am using hadoop + hdfs. If I understand your point, hadoop tries to start the mapping processes on nodes where the data is local and, if that's not possible, then it is hdfs that replicates the data to the mapper nodes? I expected to have to set this up in the code and I completely ignored HDFS; I guess it's a case of not seeing the forest for the trees!

On Thu, Sep 12, 2013 at 6:38 PM, Bryan Beaudreault bbeaudrea...@hubspot.com wrote: It really comes down to the following: In Job A set mapred.output.dir to some directory X. In Job B set mapred.input.dir to the same directory X. For Job A, do context.write() as normal, and each reducer will create an output file in mapred.output.dir. Then in Job B each of those will correspond to a mapper. Of course you need to make sure your input and output formats, as well as input and output keys/values, match up between the two jobs as well. If you are using HDFS, which it seems you are, the directories specified can be HDFS directories. In that case, with a replication factor of 3, each of these output files will exist on 3 nodes. Hadoop and HDFS will do the work to ensure that the mappers in the second job do as good a job as possible of being data- or rack-local.

On Thu, Sep 12, 2013 at 12:35 PM, Adrian CAPDEFIER chivas314...@gmail.com wrote: Thank you, Chris. I will look at Cascading and Pig, but for starters I'd prefer to keep everything, if possible, as close to the hadoop libraries. I am sure I am overlooking something basic, as repartitioning is a fairly common operation in MPP environments.

On Thu, Sep 12, 2013 at 2:39 PM, Chris Curtin curtin.ch...@gmail.com wrote: If you want to stay in Java look at Cascading. Pig is also helpful. I think there are others (Spring integration maybe?) but I'm not familiar enough with them to make a recommendation. Note that with Cascading and Pig you don't write 'map reduce'; you write logic and they map it to the various mapper/reducer steps automatically. Hope this helps, Chris

On Thu, Sep 12, 2013 at 9:36 AM, Adrian CAPDEFIER chivas314...@gmail.com wrote: Howdy, My application requires 2 distinct processing steps (reducers) to be performed on the input data. The first operation changes the key values, and records that had different keys in step 1 can end up having the same key in step 2. The heavy lifting of the operation is in step 1; step 2 only combines records whose keys were changed. In short, the overview is: Sequential file -> Step 1 -> Step 2 -> Output. To implement this in hadoop, it seems that I need to create a separate job for each step. Now, I assumed there would be some sort of job management under hadoop to link Job 1 and Job 2, but the only thing I could find was related to job scheduling and nothing on how to synchronize the input/output of the linked jobs. The only crude solution that I can think of is to use a temporary file under HDFS, but even so I'm not sure if this will work. The overview of the process would be: Sequential Input (lines) -> Job A [Mapper (key1, value1) -> ChainReducer (key2, value2)] -> Temporary file -> Job B [Mapper (key2, value2) -> Reducer (key2, value3)] -> Output. Is there a better way to pass the output from Job A as input to Job B (e.g. using network streams or some built-in java classes that don't do disk I/O)? The temporary file solution will work in a single-node configuration, but I'm not sure about an MPP config. Let's say Job A runs on nodes 0 and 1 and Job B runs on nodes 2 and 3, or both jobs run on all 4 nodes - will HDFS be able to redistribute the records between nodes automagically, or does this need to be coded somehow?
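[Editor's sketch] The mapred.output.dir and mapred.input.dir properties Bryan names belong to the old org.apache.hadoop.mapred API, where they are normally set through the helper classes rather than by hand. A rough equivalent of the chaining above with that API; the paths and the driver class are placeholders, and mapper/reducer setup is elided:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class OldApiChainDriver {
  public static void main(String[] args) throws Exception {
    JobConf jobA = new JobConf(OldApiChainDriver.class);
    jobA.setJobName("step 1");
    // ... set mapper, reducer and key/value classes for step 1 here ...
    FileInputFormat.setInputPaths(jobA, new Path(args[0]));
    FileOutputFormat.setOutputPath(jobA, new Path("/tmp/step1")); // i.e. mapred.output.dir
    JobClient.runJob(jobA);                                       // blocks until Job A completes

    JobConf jobB = new JobConf(OldApiChainDriver.class);
    jobB.setJobName("step 2");
    // ... set mapper, reducer and key/value classes for step 2 here ...
    FileInputFormat.setInputPaths(jobB, new Path("/tmp/step1"));  // i.e. mapred.input.dir
    FileOutputFormat.setOutputPath(jobB, new Path("/tmp/step2"));
    JobClient.runJob(jobB);
  }
}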
Re: Job config before read fields
Hi Shahab, Sorry about the late reply; a personal matter came up and it took most of my time. Thank you for your replies. The solution I chose was to temporarily transfer the metadata along with the data and then restore it on the reduce nodes. This works from a functional perspective as long as there are no performance requirements, and it will have to do for now. The permanent solution will likely involve tweaking hadoop, but that is a different kettle of fish.

On Sun, Sep 1, 2013 at 12:48 AM, Shahab Yunus shahab.yu...@gmail.com wrote: Personally, I don't know a way to access job configuration parameters in custom implementations of Writables (at least not an elegant and appropriate one; of course hacks of various kinds can be done.) Maybe experts can chime in? One idea that I thought about was to use MapWritable (if you have not explored it already.) You can encode the 'custom metadata' for your 'data' as one-byte symbols and move your data in the M/R flow as a map. Then, while deserializing, you will have the type (or your 'custom metadata') in the key part of the map and the value would be your actual data. This aligns with the efficient approach that is used natively in Hadoop for Strings/Text, i.e. compact metadata (though I agree that you are not taking advantage of the other aspect, the non-dependence between metadata and the data it defines.) Take a look at this: Page 96 of the Definitive Guide: http://books.google.com/books?id=Nff49D7vnJcC&pg=PA96&lpg=PA96&dq=mapwritable+in+hadoop&source=bl&ots=IiixYu7vXu&sig=4V6H7cY-MrNT7Rzs3WlODsDOoP4&hl=en&sa=X&ei=aX4iUp2YGoaosASs_YCACQ&sqi=2&ved=0CFUQ6AEwBA#v=onepage&q=mapwritable%20in%20hadoop&f=false and then this: http://www.chrisstucchio.com/blog/2011/mapwritable_sometimes_a_performance_hog.html and add your own custom types here (note that you are restricted by the size of a byte): http://hadoop.sourcearchive.com/documentation/0.20.2plus-pdfsg1-1/AbstractMapWritable_8java-source.html Regards, Shahab

On Sat, Aug 31, 2013 at 5:38 AM, Adrian CAPDEFIER chivas314...@gmail.com wrote: Thank you for your help Shahab. I guess I wasn't being too clear. My logic is that I use a custom type as the key and, in order to deserialize it on the compute nodes, I need an extra piece of information (also a custom type). To use an analogy, a Text is serialized by writing the length of the string as a number and then the bytes that compose the actual string. When it is deserialized, the number informs the reader when to stop reading the string. This number varies from string to string and it is compact, so it makes sense to serialize it with the string. My use case is similar to this. I have a complex type (let's call this data), and in order to deserialize it, I need another complex type (let's call this second type metadata). The metadata is not closely tied to the data (i.e. if the data value changes, the metadata does not) and the metadata size is quite large. I ruled out a couple of options, but please let me know if you think I did so for the wrong reasons:
1. I could serialize each data value with its own metadata value, but since the data value count is in the tens of millions and the distinct metadata value count can be up to one hundred, it would waste resources in the system.
2. I could serialize the metadata and then the data as a collection property of the metadata. This would be an elegant solution code-wise, but then all the data would have to be read and kept in memory as a massive object before any reduce operations can happen.
I wasn't able to find any info on this online, so this is just a guess from peeking at the hadoop code. My solution was to serialize the data with a hash of the metadata and to separately serialize the metadata and its hash in the job configuration (as key/value pairs). For this to work, I would need to be able to deserialize the metadata on the reduce node before the data is deserialized in the readFields() method. I think that for that to happen I need to hook into the code somewhere where a context or job configuration is used (before readFields()), but I'm stumped as to where that is. Cheers, Adi

On Sat, Aug 31, 2013 at 3:42 AM, Shahab Yunus shahab.yu...@gmail.com wrote: What I meant was that you might have to split or redesign your logic or your usecase (which we don't know about). Regards, Shahab

On Fri, Aug 30, 2013 at 10:31 PM, Adrian CAPDEFIER chivas314...@gmail.com wrote: But how would the comparator have access to the job config?

On Sat, Aug 31, 2013 at 2:38 AM, Shahab Yunus shahab.yu...@gmail.com wrote: I think you have to override/extend the Comparator to achieve that, something like what is done in Secondary Sort? Regards, Shahab

On Fri, Aug 30, 2013 at 9:01 PM, Adrian CAPDEFIER chivas314...@gmail.com wrote: Howdy, I apologise for the lack of code in this message, but the code is fairly convoluted and it would obscure my problem
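[Editor's sketch] A rough illustration of the workaround described above; every name here is made up for illustration. Each distinct metadata blob is parked in the Configuration under a key derived from its hash, each record only carries the hash, and the key type implements Configurable so it can see the Configuration where Hadoop instantiates it through ReflectionUtils. Whether a Configuration is actually available everywhere readFields() gets called (in particular inside the map-side sort shown in the stack trace below) depends on the Hadoop version, which is essentially the problem discussed in this thread. Base64 comes from Apache commons-codec, which is bundled with Hadoop:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.commons.codec.binary.Base64;
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.WritableComparable;

public class MetaTaggedKey implements WritableComparable<MetaTaggedKey>, Configurable {
  private Configuration conf;
  private int metadataHash;                 // identifies which metadata blob describes the payload
  private byte[] payload = new byte[0];

  // Called during job setup or in the mapper: park the serialized metadata in the config.
  public static void registerMetadata(Configuration conf, int hash, byte[] serializedMetadata) {
    conf.set("x.metadata." + hash, Base64.encodeBase64String(serializedMetadata));
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeInt(metadataHash);             // only the hash travels with each record
    out.writeInt(payload.length);
    out.write(payload);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    metadataHash = in.readInt();
    int len = in.readInt();
    payload = new byte[len];
    in.readFully(payload);
    if (conf != null) {                     // conf may be null in some code paths (version-dependent)
      byte[] metadata = Base64.decodeBase64(conf.get("x.metadata." + metadataHash));
      // ... use the metadata to interpret 'payload' ...
    }
  }

  @Override
  public int compareTo(MetaTaggedKey other) {
    // placeholder ordering; a real key would compare the decoded values
    return metadataHash < other.metadataHash ? -1 : (metadataHash == other.metadataHash ? 0 : 1);
  }

  @Override public void setConf(Configuration conf) { this.conf = conf; }
  @Override public Configuration getConf() { return conf; }
}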
Re: Job config before read fields
Thank you for your help Shahab. I guess I wasn't being too clear. My logic is that I use a custom type as the key and, in order to deserialize it on the compute nodes, I need an extra piece of information (also a custom type). To use an analogy, a Text is serialized by writing the length of the string as a number and then the bytes that compose the actual string. When it is deserialized, the number informs the reader when to stop reading the string. This number varies from string to string and it is compact, so it makes sense to serialize it with the string. My use case is similar to this. I have a complex type (let's call this data), and in order to deserialize it, I need another complex type (let's call this second type metadata). The metadata is not closely tied to the data (i.e. if the data value changes, the metadata does not) and the metadata size is quite large. I ruled out a couple of options, but please let me know if you think I did so for the wrong reasons:
1. I could serialize each data value with its own metadata value, but since the data value count is in the tens of millions and the distinct metadata value count can be up to one hundred, it would waste resources in the system.
2. I could serialize the metadata and then the data as a collection property of the metadata. This would be an elegant solution code-wise, but then all the data would have to be read and kept in memory as a massive object before any reduce operations can happen.
I wasn't able to find any info on this online, so this is just a guess from peeking at the hadoop code. My solution was to serialize the data with a hash of the metadata and to separately serialize the metadata and its hash in the job configuration (as key/value pairs). For this to work, I would need to be able to deserialize the metadata on the reduce node before the data is deserialized in the readFields() method. I think that for that to happen I need to hook into the code somewhere where a context or job configuration is used (before readFields()), but I'm stumped as to where that is. Cheers, Adi

On Sat, Aug 31, 2013 at 3:42 AM, Shahab Yunus shahab.yu...@gmail.com wrote: What I meant was that you might have to split or redesign your logic or your usecase (which we don't know about). Regards, Shahab

On Fri, Aug 30, 2013 at 10:31 PM, Adrian CAPDEFIER chivas314...@gmail.com wrote: But how would the comparator have access to the job config?

On Sat, Aug 31, 2013 at 2:38 AM, Shahab Yunus shahab.yu...@gmail.com wrote: I think you have to override/extend the Comparator to achieve that, something like what is done in Secondary Sort? Regards, Shahab

On Fri, Aug 30, 2013 at 9:01 PM, Adrian CAPDEFIER chivas314...@gmail.com wrote: Howdy, I apologise for the lack of code in this message, but the code is fairly convoluted and it would obscure my problem. That being said, I can put together some sample code if really needed. I am trying to pass some metadata between the map reduce steps. This metadata is read and generated in the map step and stored in the job config. It also needs to be recreated on the reduce node before the key/value fields can be read in the readFields function. I had assumed that I would be able to override the Reducer.setup() function and that would be it, but apparently the readFields function is called before the Reducer.setup() function. My question is: what is the best place on the reduce node where I can access the job configuration/context before the readFields function is called?
This is the stack trace:
at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:70)
at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1399)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1298)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Job config before read fields
Howdy, I apologise for the lack of code in this message, but the code is fairly convoluted and it would obscure my problem. That being said, I can put together some sample code if really needed. I am trying to pass some metadata between the map reduce steps. This metadata is read and generated in the map step and stored in the job config. It also needs to be recreated on the reduce node before the key/value fields can be read in the readFields function. I had assumed that I would be able to override the Reducer.setup() function and that would be it, but apparently the readFields function is called before the Reducer.setup() function. My question is: what is the best place on the reduce node where I can access the job configuration/context before the readFields function is called?
This is the stack trace:
at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:70)
at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1399)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1298)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Re: Job config before read fields
But how would the comparator have access to the job config?

On Sat, Aug 31, 2013 at 2:38 AM, Shahab Yunus shahab.yu...@gmail.com wrote: I think you have to override/extend the Comparator to achieve that, something like what is done in Secondary Sort? Regards, Shahab

On Fri, Aug 30, 2013 at 9:01 PM, Adrian CAPDEFIER chivas314...@gmail.com wrote: Howdy, I apologise for the lack of code in this message, but the code is fairly convoluted and it would obscure my problem. That being said, I can put together some sample code if really needed. I am trying to pass some metadata between the map reduce steps. This metadata is read and generated in the map step and stored in the job config. It also needs to be recreated on the reduce node before the key/value fields can be read in the readFields function. I had assumed that I would be able to override the Reducer.setup() function and that would be it, but apparently the readFields function is called before the Reducer.setup() function. My question is: what is the best place on the reduce node where I can access the job configuration/context before the readFields function is called?
This is the stack trace:
at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:103)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:70)
at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1399)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1298)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
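[Editor's sketch] One possible answer to this last question, under the assumption that the comparator is registered with job.setSortComparatorClass(): Hadoop creates the configured comparator through ReflectionUtils.newInstance(clazz, conf), and ReflectionUtils calls setConf() on anything that implements Configurable, so the comparator can pick the metadata up from the job configuration before the first compare() call. The class names are hypothetical; MetaTaggedKey refers to the hypothetical key type sketched earlier in this digest (any WritableComparable key would do), and how early the configuration is available in the map-side sort path varies by Hadoop version:

import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.WritableComparator;

public class MetaAwareComparator extends WritableComparator implements Configurable {
  private Configuration conf;

  public MetaAwareComparator() {
    super(MetaTaggedKey.class, true);   // true = create key instances so compare() can deserialize them
  }

  @Override
  public void setConf(Configuration conf) {
    this.conf = conf;
    // e.g. rebuild the metadata stored under "x.metadata.*" here, before any compare() call
  }

  @Override
  public Configuration getConf() {
    return conf;
  }
}

It would be wired into the job with job.setSortComparatorClass(MetaAwareComparator.class) in the driver.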