Re: Are mapper classes re-instantiated for each record?
The setup() method is called once before the first map() call in a map task, and the cleanup() method is called once after the last map() call.

On Tue, May 6, 2014 at 1:17 PM, Raj K Singh rajkrrsi...@gmail.com wrote:

Point 2 is right. The framework first calls setup(), followed by map() for each key/value pair in the InputSplit. Finally, cleanup() is called once, irrespective of the number of records in the input split.

Raj K Singh
http://in.linkedin.com/in/rajkrrsingh
http://www.rajkrrsingh.blogspot.com
Mobile Tel: +91 (0)9899821370

On Tue, May 6, 2014 at 11:21 AM, Sergey Murylev sergeymury...@gmail.com wrote:

Hi Jeremy,

According to the official documentation (http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/Mapper.html), setup and cleanup are each called once per InputSplit, so your variant 2 is the correct one. Keep in mind, though, that a job's input may be divided into multiple InputSplits, and each split is processed by its own map task. In your case, if you have 5 files with 1 record each, setup/cleanup can be called 5 times, once per file. But if your records are all in a single file, I think setup/cleanup should be called once, provided the file is small enough to form a single split.

--
Thanks,
Sergey

On 06/05/14 02:49, jeremy p wrote:

Let's say I have a TaskTracker that receives 5 records to process for a single job. When the TaskTracker processes the first record, it will instantiate my Mapper class and execute my setup() function. It will then run the map() method on that record.

My question is this: what happens when the map() method has finished processing the first record? I'm guessing it will do one of two things:

1) My cleanup() function will execute. After the cleanup() method has finished, this instance of the Mapper object will be destroyed. When it is time to process the next record, a new Mapper object will be instantiated. Then my setup() method will execute, the map() method will execute, the cleanup() method will execute, and then the Mapper instance will be destroyed. This process will repeat itself until all 5 records have been processed. In other words, my setup() and cleanup() methods will have been executed 5 times each.

or

2) When the map() method has finished processing my first record, the Mapper instance will NOT be destroyed. It will be reused for all 5 records. When the map() method has finished processing the last record, my cleanup() method will execute. In other words, my setup() and cleanup() methods will only execute 1 time each.

Thanks for the help!

--
Thanks & Regards
Unmesha Sreeveni U.B
Hadoop, Bigdata Developer
Center for Cyber Security | Amrita Vishwa Vidyapeetham
http://www.unmeshasreeveni.blogspot.in/
Re: Are mapper classes re-instantiated for each record?
Thank you! This has helped me immensely.
Re: Are mapper classes re-instantiated for each record?
Point 2 is right. The framework first calls setup(), followed by map() for each key/value pair in the InputSplit. Finally, cleanup() is called once, irrespective of the number of records in the input split.

Raj K Singh
http://in.linkedin.com/in/rajkrrsingh
http://www.rajkrrsingh.blogspot.com
Mobile Tel: +91 (0)9899821370
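The call order described in this reply (setup() once, map() once per record, cleanup() once) can be sketched without Hadoop on the classpath. The class below is a hypothetical stand-in, not the real org.apache.hadoop.mapreduce.Mapper: SketchMapper, runTask, and the record names r1..r5 are made up for illustration. Its run() method only mirrors the general shape of Mapper.run(), which calls setup, loops over the records of one split, and then calls cleanup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class MapperLifecycleDemo {

    /** Hypothetical stand-in for a Hadoop Mapper; NOT the real class. */
    static class SketchMapper {
        // Records every lifecycle call in the order it happens.
        final List<String> calls = new ArrayList<>();

        void setup() { calls.add("setup"); }

        void map(String record) { calls.add("map(" + record + ")"); }

        void cleanup() { calls.add("cleanup"); }

        /** Assumed shape of the run loop: setup, map per record, cleanup. */
        void run(Iterator<String> records) {
            setup();
            try {
                while (records.hasNext()) {
                    map(records.next());
                }
            } finally {
                cleanup();
            }
        }
    }

    /** Simulates one map task over one split and returns the call order. */
    static List<String> runTask(List<String> split) {
        SketchMapper mapper = new SketchMapper(); // a single instance for the whole split
        mapper.run(split.iterator());
        return mapper.calls;
    }

    public static void main(String[] args) {
        List<String> calls = runTask(Arrays.asList("r1", "r2", "r3", "r4", "r5"));
        System.out.println(calls);
    }
}
```

For a split of five records, the recorded list has seven entries: one setup, five map calls on the same instance, and one cleanup. That corresponds to option 2 in the question.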
Are mapper classes re-instantiated for each record?
Let's say I have a TaskTracker that receives 5 records to process for a single job. When the TaskTracker processes the first record, it will instantiate my Mapper class and execute my setup() function. It will then run the map() method on that record.

My question is this: what happens when the map() method has finished processing the first record? I'm guessing it will do one of two things:

1) My cleanup() function will execute. After the cleanup() method has finished, this instance of the Mapper object will be destroyed. When it is time to process the next record, a new Mapper object will be instantiated. Then my setup() method will execute, the map() method will execute, the cleanup() method will execute, and then the Mapper instance will be destroyed. This process will repeat itself until all 5 records have been processed. In other words, my setup() and cleanup() methods will have been executed 5 times each.

or

2) When the map() method has finished processing my first record, the Mapper instance will NOT be destroyed. It will be reused for all 5 records. When the map() method has finished processing the last record, my cleanup() method will execute. In other words, my setup() and cleanup() methods will only execute 1 time each.

Thanks for the help!
Re: Are mapper classes re-instantiated for each record?
Hi Jeremy,

According to the official documentation (http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/Mapper.html), setup and cleanup are each called once per InputSplit, so your variant 2 is the correct one. Keep in mind, though, that a job's input may be divided into multiple InputSplits, and each split is processed by its own map task. In your case, if you have 5 files with 1 record each, setup/cleanup can be called 5 times, once per file. But if your records are all in a single file, I think setup/cleanup should be called once, provided the file is small enough to form a single split.

--
Thanks,
Sergey
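The difference between "5 files with 1 record each" and "5 records in a single file" can be illustrated with a small simulation. This is only a sketch under stated assumptions: it does not use Hadoop, it assumes each file yields exactly one split (no split-combining input format, and files small enough not to be divided), and the names CountingMapper, Counts, and runJob are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class SplitLifecycleDemo {

    /** Tallies lifecycle calls across all simulated map tasks of one job. */
    static class Counts {
        int instances, setupCalls, mapCalls, cleanupCalls;
    }

    /** Hypothetical stand-in for a Mapper; NOT the real Hadoop class. */
    static class CountingMapper {
        private final Counts counts;

        CountingMapper(Counts counts) {
            this.counts = counts;
            counts.instances++;
        }

        void setup() { counts.setupCalls++; }

        void map(String record) { counts.mapCalls++; }

        void cleanup() { counts.cleanupCalls++; }
    }

    /** Assumption: one map task, with a fresh mapper instance, per split. */
    static Counts runJob(List<List<String>> splits) {
        Counts counts = new Counts();
        for (List<String> split : splits) {
            CountingMapper mapper = new CountingMapper(counts);
            mapper.setup();
            try {
                for (String record : split) {
                    mapper.map(record);
                }
            } finally {
                mapper.cleanup();
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Five files with one record each: five splits under the assumption above.
        Counts fiveFiles = runJob(Arrays.asList(
                Arrays.asList("r1"), Arrays.asList("r2"), Arrays.asList("r3"),
                Arrays.asList("r4"), Arrays.asList("r5")));
        // One file holding all five records: a single split.
        Counts oneFile = runJob(Arrays.asList(
                Arrays.asList("r1", "r2", "r3", "r4", "r5")));

        System.out.println("5 files: setup=" + fiveFiles.setupCalls
                + " map=" + fiveFiles.mapCalls + " cleanup=" + fiveFiles.cleanupCalls);
        System.out.println("1 file:  setup=" + oneFile.setupCalls
                + " map=" + oneFile.mapCalls + " cleanup=" + oneFile.cleanupCalls);
    }
}
```

In both cases map() runs five times in total. What changes is the number of setup()/cleanup() pairs: five in the first case and one in the second. Any expensive per-task initialization placed in setup() is therefore paid once per split, not once per record.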