Re: Are mapper classes re-instantiated for each record?

2014-05-16 Thread unmesha sreeveni
The setup() method is called once, before any of the map() calls in a task,
and the cleanup() method is called once, after all of the map() calls.


-- 
*Thanks & Regards*


*Unmesha Sreeveni U.B*
*Hadoop, Bigdata Developer*
*Center for Cyber Security | Amrita Vishwa Vidyapeetham*
http://www.unmeshasreeveni.blogspot.in/


Re: Are mapper classes re-instantiated for each record?

2014-05-07 Thread jeremy p
Thank you!  This has helped me immensely.


Re: Are mapper classes re-instantiated for each record?

2014-05-06 Thread Raj K Singh
Point 2 is right. The framework first calls setup(), followed by map() for
each key/value pair in the InputSplit. Finally, cleanup() is called,
irrespective of the number of records in the input split.
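
The call order described above can be sketched in plain Java. This is only an
illustration, not Hadoop code: SketchMapper and MapperLifecycleDemo are
hypothetical stand-ins for org.apache.hadoop.mapreduce.Mapper and the task
runner, and a List<String> stands in for the InputSplit. The run() method is
assumed to follow the same shape as the documented Mapper.run(): setup once,
map once per record, cleanup once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for org.apache.hadoop.mapreduce.Mapper (NOT the real class).
abstract class SketchMapper {
    protected void setup(List<String> calls) { calls.add("setup"); }
    protected abstract void map(String record, List<String> calls);
    protected void cleanup(List<String> calls) { calls.add("cleanup"); }

    // Assumed shape of the lifecycle: setup once, map per record, cleanup once.
    final void run(List<String> split, List<String> calls) {
        setup(calls);
        try {
            for (String record : split) {
                map(record, calls);
            }
        } finally {
            cleanup(calls);
        }
    }
}

class MapperLifecycleDemo {
    // Runs one simulated map task over the records and returns the call log.
    static List<String> runTask(List<String> records) {
        List<String> calls = new ArrayList<>();
        SketchMapper mapper = new SketchMapper() {
            @Override
            protected void map(String record, List<String> log) {
                log.add("map(" + record + ")");
            }
        };
        mapper.run(records, calls); // the same instance handles every record
        return calls;
    }

    public static void main(String[] args) {
        // Five records in one split: setup and cleanup each appear once.
        System.out.println(runTask(Arrays.asList("r1", "r2", "r3", "r4", "r5")));
    }
}
```

With five records in a single split, the log holds one "setup" entry, five
"map(...)" entries and one "cleanup" entry, which matches option 2 in the
question.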


Raj K Singh
http://in.linkedin.com/in/rajkrrsingh
http://www.rajkrrsingh.blogspot.com
Mobile  Tel: +91 (0)9899821370


Are mapper classes re-instantiated for each record?

2014-05-05 Thread jeremy p
Let's say I have a TaskTracker that receives 5 records to process for a
single job.  When the TaskTracker processes the first record, it will
instantiate my Mapper class and execute my setup() function.  It will then
run the map() method on that record.  My question is this: what happens
when the map() method has finished processing the first record?  I'm
guessing it will do one of two things:

1) My cleanup() function will execute.  After the cleanup() method has
finished, this instance of the Mapper object will be destroyed.  When it is
time to process the next record, a new Mapper object will be instantiated.
Then my setup() method will execute, the map() method will execute, the
cleanup() method will execute, and then the Mapper instance will be
destroyed.  When it is time to process the next record, a new Mapper object
will be instantiated.  This process will repeat itself until all 5 records
have been processed.  In other words, my setup() and cleanup() methods will
have been executed 5 times each.

or

2) When the map() method has finished processing my first record, the
Mapper instance will NOT be destroyed.  It will be reused for all 5
records.  When the map() method has finished processing the last record, my
cleanup() method will execute.  In other words, my setup() and cleanup()
methods will only execute 1 time each.

Thanks for the help!


Re: Are mapper classes re-instantiated for each record?

2014-05-05 Thread Sergey Murylev
Hi Jeremy,

According to the official documentation
http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/Mapper.html
setup and cleanup are each called once per InputSplit, so your variant 2 is
the correct one. Note, though, that the same Mapper class can be run over
multiple InputSplits, one map task per split. In your case, if you have 5
files with 1 record each, setup/cleanup can be called 5 times. But if your
records are in a single file, I think setup/cleanup should be called once.
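
To make the two cases concrete, here is a rough simulation in plain Java. It
is a sketch under assumptions, not Hadoop code: SplitTaskDemo and
CountingMapper are hypothetical names, each inner list stands in for one
InputSplit, and each small file is assumed to become its own split (which an
input format that combines files would change).

```java
import java.util.Arrays;
import java.util.List;

class SplitTaskDemo {
    // Hypothetical mapper that only counts its lifecycle calls (not a Hadoop class).
    static final class CountingMapper {
        int setupCalls, mapCalls, cleanupCalls;
        void setup() { setupCalls++; }
        void map(String record) { mapCalls++; }
        void cleanup() { cleanupCalls++; }
    }

    // Simulates a job: one task, with a fresh mapper instance, per split.
    // Returns {setup calls, map calls, cleanup calls} summed over all tasks.
    static int[] runJob(List<List<String>> splits) {
        int[] totals = new int[3];
        for (List<String> split : splits) {
            CountingMapper mapper = new CountingMapper(); // new instance per task
            mapper.setup();
            for (String record : split) {
                mapper.map(record);
            }
            mapper.cleanup();
            totals[0] += mapper.setupCalls;
            totals[1] += mapper.mapCalls;
            totals[2] += mapper.cleanupCalls;
        }
        return totals;
    }

    public static void main(String[] args) {
        // Five files with one record each: five splits, five tasks.
        List<List<String>> fiveFiles = Arrays.asList(
                Arrays.asList("a"), Arrays.asList("b"), Arrays.asList("c"),
                Arrays.asList("d"), Arrays.asList("e"));
        // One file with five records: one split, one task.
        List<List<String>> oneFile = Arrays.asList(
                Arrays.asList("a", "b", "c", "d", "e"));

        System.out.println(Arrays.toString(runJob(fiveFiles))); // [5, 5, 5]
        System.out.println(Arrays.toString(runJob(oneFile)));   // [1, 5, 1]
    }
}
```

In both cases map() runs five times in total; only the number of setup/cleanup
pairs changes with the number of splits.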

--
Thanks,
Sergey



