Re: What happens when I do not output anything from my mapper

2012-06-05 Thread murat migdisoglu
Hi Devaraj ,
Indeed, the hadoop fs -ls output in my previous email came from a job using
SequenceFileOutputFormat, which writes its header (including the class
signatures) into every file. Hence the 87 bytes. Hadoop was creating empty
files (in fact, files containing only that header) before I started to use
LazyOutputFormat.
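
To put rough numbers on a header-only file, here is a small sketch that adds up the fields of an uncompressed SequenceFile header. It assumes format version 6 with no metadata entries and class names short enough for a one-byte length prefix; the class and method names are made up for illustration. The thread does not say which key and value classes the job was configured with, so the sketch uses Text for both purely as an example and is not meant to reproduce the 87 bytes exactly.

```java
import java.nio.charset.StandardCharsets;

/** Back-of-the-envelope size of an uncompressed SequenceFile header (version 6 assumed). */
class SequenceFileHeaderSize {

    /** Bytes used by a length-prefixed UTF-8 string whose length fits in a one-byte prefix. */
    static int stringSize(String s) {
        int len = s.getBytes(StandardCharsets.UTF_8).length;
        if (len > 127) {
            throw new IllegalArgumentException("sketch only handles names up to 127 bytes");
        }
        return 1 + len;
    }

    /** Sums the header fields for the given key and value class names. */
    static int headerSize(String keyClass, String valueClass) {
        int magicAndVersion = 4;   // 'S', 'E', 'Q' plus one version byte
        int compressionFlags = 2;  // two booleans: compressed, block-compressed
        int metadataCount = 4;     // int holding the number of metadata entries (0 assumed)
        int syncMarker = 16;       // sync marker
        return magicAndVersion + stringSize(keyClass) + stringSize(valueClass)
                + compressionFlags + metadataCount + syncMarker;
    }

    public static void main(String[] args) {
        String text = "org.apache.hadoop.io.Text";
        System.out.println(headerSize(text, text)); // prints 78
    }
}
```

Longer class names push the total up, which is consistent with a header-only file landing somewhere under a hundred bytes.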

Regards
Murat


On Tue, Jun 5, 2012 at 7:22 AM, Devaraj k devara...@huawei.com wrote:

 The output files should be 0 KB in size if you use
 FileOutputFormat/TextOutputFormat.

 I think your output format's writer is writing some metadata to those
 files. Can you check what data is present in those files?

 Which output format are you using?

 Thanks
 Devaraj

 
 From: murat migdisoglu [murat.migdiso...@gmail.com]
 Sent: Monday, June 04, 2012 6:18 PM
 To: common-user@hadoop.apache.org
 Subject: Re: What happens when I do not output anything from my mapper

 Hi,
 Thanks for your answer. After reading your emails, I decided to empty my
 mapper method completely to see if I could disable the mapper's output
 altogether, but that did not work.
 So, here is my mapper method:

 @Override
 public void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns,
         Context context)
         throws IOException, InterruptedException
 {

 }

 When I execute hadoop fs -ls, I still see many small output files, as
 follows:

 -rw-r--r--   3 mmigdiso supergroup 87 2012-06-04 12:44 /user/mmigdiso/output/part-m-00034
 -rw-r--r--   3 mmigdiso supergroup 87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00037
 -rw-r--r--   3 mmigdiso supergroup 87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00039
 -rw-r--r--   3 mmigdiso supergroup 87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00040
 -rw-r--r--   3 mmigdiso supergroup 87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00042

 Do you know if I have to put something special into the context to specify
 empty output?

 Regards
 Murat



 On Mon, Jun 4, 2012 at 2:38 PM, Devaraj k devara...@huawei.com wrote:

  Hi Murat,
 
  As Praveenesh explained, you can control the map outputs as you want.
 
  The map() function is called once for each input record, i.e. map() is
  invoked multiple times with different inputs within the same mapper. You
  can add logging to the map function to check what is happening in it.
 
 
  Thanks
  Devaraj
 
  
  From: praveenesh kumar [praveen...@gmail.com]
  Sent: Monday, June 04, 2012 5:57 PM
  To: common-user@hadoop.apache.org
  Subject: Re: What happens when I do not output anything from my mapper
 
  You can control your map outputs based on any condition you want. I have
  done that and it worked for me.
  The problem may be in your code.
  Can you please share your map code, or cross-check whether your conditions
  are correct?
 
  Regards,
  Praveenesh
 
  On Mon, Jun 4, 2012 at 5:52 PM, murat migdisoglu
  murat.migdiso...@gmail.com wrote:

   Hi,
   I have a small application where only a mapper class is defined (no
   reducer, no combiner).
   Within the mapper class, I have an if condition that decides whether
   to write something to the context or not.
   If the condition is not met, I want the mapper to produce no output
   in HDFS.
   But apparently this does not work as I expected. Once I run my job, there
   is one file per mapper in HDFS, 87 bytes in size.

   The if block that I'm using in the map method is as follows:
   if (ip == null || ip.equals(cip)) {
       Text value = new Text(mwrapper.toJson());
       word.set(ip);
       context.write(word, value);
   } else {
       log.info("ip not match [" + ip + "]");
   }
   }
   }//end of mapper method

   How can I manage that? Does a mapper always need to have an output?
  
   --
   Find a job you enjoy, and you'll never work a day in your life.
   Confucius
  
 







-- 
Find a job you enjoy, and you'll never work a day in your life.
Confucius


Re: What happens when I do not output anything from my mapper

2012-06-04 Thread praveenesh kumar
You can control your map outputs based on any condition you want. I have
done that and it worked for me.
The problem may be in your code.
Can you please share your map code, or cross-check whether your conditions
are correct?

Regards,
Praveenesh




RE: What happens when I do not output anything from my mapper

2012-06-04 Thread Devaraj k
Hi Murat,

As Praveenesh explained, you can control the map outputs as you want. 

The map() function is called once for each input record, i.e. map() is
invoked multiple times with different inputs within the same mapper. You
can add logging to the map function to check what is happening in it.
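
As a rough illustration of that point, the sketch below is a toy model of a map-only task written in plain Java, not Hadoop code. The class name, the IP values, and the simplified match condition are assumptions made for the example. It shows map() being invoked once per record while only matching records are written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of a map-only task: map() runs once per record and may emit nothing. */
class ConditionalMapSketch {
    private final String wantedIp;
    private final List<String> output = new ArrayList<>();
    private int mapCalls = 0;

    ConditionalMapSketch(String wantedIp) {
        this.wantedIp = wantedIp;
    }

    /** Writes the record only when it matches; otherwise nothing is emitted. */
    void map(String ip) {
        mapCalls++;
        if (ip != null && ip.equals(wantedIp)) {
            output.add(ip);
        }
    }

    /** Stand-in for the framework loop that feeds every input record to map(). */
    void run(List<String> records) {
        for (String record : records) {
            map(record);
        }
    }

    int mapCalls() {
        return mapCalls;
    }

    List<String> output() {
        return output;
    }

    public static void main(String[] args) {
        ConditionalMapSketch task = new ConditionalMapSketch("10.0.0.1");
        task.run(Arrays.asList("10.0.0.2", "10.0.0.1", "10.0.0.3"));
        System.out.println(task.mapCalls() + " map() calls, "
                + task.output().size() + " record(s) written");
    }
}
```

Skipping the write for a record is enough to keep that record out of the output. Whether an output file is created at all is a separate matter that depends on the output format.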
   

Thanks
Devaraj





Re: What happens when I do not output anything from my mapper

2012-06-04 Thread murat migdisoglu
Hi,
Thanks for your answer. After reading your emails, I decided to empty my
mapper method completely to see if I could disable the mapper's output
altogether, but that did not work.
So, here is my mapper method:

@Override
public void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns,
        Context context)
        throws IOException, InterruptedException
{

}

When I execute hadoop fs -ls, I still see many small output files, as
follows:

-rw-r--r--   3 mmigdiso supergroup 87 2012-06-04 12:44 /user/mmigdiso/output/part-m-00034
-rw-r--r--   3 mmigdiso supergroup 87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00037
-rw-r--r--   3 mmigdiso supergroup 87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00039
-rw-r--r--   3 mmigdiso supergroup 87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00040
-rw-r--r--   3 mmigdiso supergroup 87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00042

Do you know if I have to put something special into the context to specify
empty output?

Regards
Murat





Re: What happens when I do not output anything from my mapper - Solution

2012-06-04 Thread murat migdisoglu
OK,
For anyone who faces the same problem, here is how I solved it.
First of all, there is a Hadoop JIRA issue for this:
https://issues.apache.org/jira/browse/HADOOP-4927

and
http://hadoop.apache.org/mapreduce/docs/current/mapred_tutorial.html#Lazy+Output+Creation
explains how to solve it.

So Hadoop does indeed create empty part-00x files irrespective of what you do
in the mapper class.

So you have to call the following static method of LazyOutputFormat:
LazyOutputFormat.setOutputFormatClass(job, SequenceFileOutputFormat.class);
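
The idea behind lazy output creation can be sketched in plain Java: the file, and any header that goes with it, is created on the first write rather than when the writer is constructed. This is a conceptual model only, not Hadoop's implementation; the class name, the three-byte header, and the demo helper are all invented for the example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Conceptual sketch: defer creating the output file until the first record is written. */
class LazyFileWriter implements AutoCloseable {
    private final Path path;
    private final byte[] header;
    private OutputStream out; // stays null until the first record arrives

    LazyFileWriter(Path path, byte[] header) {
        this.path = path;
        this.header = header;
    }

    void write(byte[] record) throws IOException {
        if (out == null) {
            // The file is created here, not in the constructor.
            out = Files.newOutputStream(path);
            out.write(header);
        }
        out.write(record);
    }

    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close();
        }
    }

    /** Returns the resulting file size, or -1 if no file was created. */
    static long demo(boolean writeRecord) {
        try {
            Path file = Files.createTempDirectory("lazy-sketch").resolve("part-m-00000");
            try (LazyFileWriter writer = new LazyFileWriter(file, new byte[] {'S', 'E', 'Q'})) {
                if (writeRecord) {
                    writer.write(new byte[] {1, 2});
                }
            }
            return Files.exists(file) ? Files.size(file) : -1L;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("no records:  " + demo(false)); // -1, no file created
        System.out.println("one record:  " + demo(true));  // 5, header plus record
    }
}
```

An eager writer that opened the file and wrote the header in its constructor would leave a header-only file behind for every task that writes nothing, which matches the small part-m files seen in this thread.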

Be aware that, in my experience, this method should be called after you set
the output format class:
job.setOutputFormatClass(SequenceFileOutputFormat.class);
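
A likely explanation for the ordering, assuming the lazy setter works by registering LazyOutputFormat as the job's output format and storing the real format under a separate setting: whichever call runs last decides the job's output format. The toy model below uses a plain map in place of the job configuration; the keys and method names are hypothetical and exist only in this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of why call order matters; keys and names are invented for this sketch. */
class OutputFormatOrderSketch {
    static final String FORMAT_KEY = "sketch.output.format";        // hypothetical key
    static final String WRAPPED_KEY = "sketch.output.format.lazy";  // hypothetical key

    /** Models job.setOutputFormatClass(...): overwrites the job's output format. */
    static void setOutputFormat(Map<String, String> conf, String format) {
        conf.put(FORMAT_KEY, format);
    }

    /** Models the lazy setter: installs the lazy wrapper and remembers the real format. */
    static void setLazyOutputFormat(Map<String, String> conf, String realFormat) {
        conf.put(FORMAT_KEY, "LazyOutputFormat");
        conf.put(WRAPPED_KEY, realFormat);
    }

    /** Returns the format the job would end up using for the given call order. */
    static String effectiveFormat(boolean lazyCallLast) {
        Map<String, String> conf = new HashMap<>();
        if (lazyCallLast) {
            setOutputFormat(conf, "SequenceFileOutputFormat");
            setLazyOutputFormat(conf, "SequenceFileOutputFormat");
        } else {
            setLazyOutputFormat(conf, "SequenceFileOutputFormat");
            setOutputFormat(conf, "SequenceFileOutputFormat");
        }
        return conf.get(FORMAT_KEY);
    }

    public static void main(String[] args) {
        System.out.println("lazy call last:  " + effectiveFormat(true));
        System.out.println("lazy call first: " + effectiveFormat(false));
    }
}
```

Under that assumption, calling the plain setter afterwards would silently replace the lazy wrapper, and the plain call may not be needed at all once the lazy setter is used.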




RE: What happens when I do not output anything from my mapper

2012-06-04 Thread Devaraj k
The output files should be 0 KB in size if you use FileOutputFormat/TextOutputFormat.

I think your output format's writer is writing some metadata to those files. Can
you check what data is present in those files?

Which output format are you using?

Thanks
Devaraj

