Re: What happens when I do not output anything from my mapper
Hi Devaraj,

Indeed, the -ls output in my previous email came from files written by SequenceFileOutputFormat, which contain the class signatures. Hence each file was 87 bytes. Before I started to use LazyOutputFormat, Hadoop was creating empty files (in fact, files containing only the signature).

Regards,
Murat

--
Find a job you enjoy, and you'll never work a day in your life. Confucius

On Tue, Jun 5, 2012 at 7:22 AM, Devaraj k devara...@huawei.com wrote:

The output files should be 0 KB in size if you use FileOutputFormat/TextOutputFormat. I think your output format writer is writing some metadata to those files. Can you check what data is present in those files? Can you tell me which output format you are using?

Thanks,
Devaraj
Re: What happens when I do not output anything from my mapper
You can control your map outputs based on any condition you want. I have done that, and it worked for me. It could be a problem in your code that keeps it from working for you. Can you please share your map code, or cross-check whether your conditions are correct?

Regards,
Praveenesh

On Mon, Jun 4, 2012 at 5:52 PM, murat migdisoglu murat.migdiso...@gmail.com wrote:

Hi,

I have a small application where only a mapper class is defined (no reducer, no combiner). Within the mapper class, I have an if condition that decides whether I put something in the context or not. If my condition is not met, I want the mapper to give no output to HDFS. But apparently, this does not work as I expected. Once I run my job, there is one file per mapper in HDFS, each 87 kb in size.

The if block that I'm using in the map method is as follows:

    if (ip == null || ip.equals(cip)) {
        Text value = new Text(mwrapper.toJson());
        word.set(ip);
        context.write(word, value);
    } else {
        log.info("ip not match [" + ip + "]");
    }
    }
    } // end of mapper method

How can I manage that? Does a mapper always need to have an output?
RE: What happens when I do not output anything from my mapper
Hi Murat,

As Praveenesh explained, you can control the map outputs as you want. The map() function is called once for each input, i.e. map() is invoked multiple times with different inputs within the same mapper. You can add logging to the map function to check what is happening in it.

Thanks,
Devaraj

From: praveenesh kumar [praveen...@gmail.com]
Sent: Monday, June 04, 2012 5:57 PM
To: common-user@hadoop.apache.org
Subject: Re: What happens when I do not output anything from my mapper

You can control your map outputs based on any condition you want. I have done that, and it worked for me. It could be a problem in your code that keeps it from working for you. Can you please share your map code, or cross-check whether your conditions are correct?

Regards,
Praveenesh
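The point about map() being invoked once per input can be sketched in plain Java, without any Hadoop dependency. This is only an illustration: FilteringTask, its fields, and the run() loop are hypothetical stand-ins for a map task, and the list `emitted` stands in for context.write(). The condition is the one from the original question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a map task; this is not a Hadoop class.
class FilteringTask {
    private final String cip;                        // the IP to match (name taken from the thread)
    final List<String> emitted = new ArrayList<>();  // stands in for context.write(...)
    int mapCalls = 0;                                // counts how often map() was invoked

    FilteringTask(String cip) {
        this.cip = cip;
    }

    // Called once per input record; emits only when the condition holds.
    void map(String ip, String payload) {
        mapCalls++;
        if (ip == null || ip.equals(cip)) {
            emitted.add(ip + "\t" + payload);
        }
        // Otherwise nothing is emitted for this record.
    }

    // Mirrors the shape of the framework loop: one map() call per record.
    void run(List<String[]> records) {
        for (String[] record : records) {
            map(record[0], record[1]);
        }
    }
}
```

With three records of which two satisfy the condition, map() runs three times and two records are emitted. Skipping context.write() for a record is therefore enough to suppress that record; it does not, by itself, decide whether an output file is created.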
Re: What happens when I do not output anything from my mapper
Hi,

Thanks for your answer. After reading your emails, I decided to empty my mapper method completely to see whether I could disable the output of the mapper class altogether, but it did not work. Here is my mapper method:

    @Override
    public void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns, Context context)
            throws IOException, InterruptedException {
    }

When I execute hadoop fs -ls, I still see many small output files, as follows:

    -rw-r--r-- 3 mmigdiso supergroup 87 2012-06-04 12:44 /user/mmigdiso/output/part-m-00034
    -rw-r--r-- 3 mmigdiso supergroup 87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00037
    -rw-r--r-- 3 mmigdiso supergroup 87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00039
    -rw-r--r-- 3 mmigdiso supergroup 87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00040
    -rw-r--r-- 3 mmigdiso supergroup 87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00042

Do you know whether I have to put something special into the context to specify empty output?

Regards,
Murat

On Mon, Jun 4, 2012 at 2:38 PM, Devaraj k devara...@huawei.com wrote:

Hi Murat,

As Praveenesh explained, you can control the map outputs as you want. The map() function is called once for each input, i.e. map() is invoked multiple times with different inputs within the same mapper. You can add logging to the map function to check what is happening in it.

Thanks,
Devaraj
Re: What happens when I do not output anything from my mapper - Solution
OK, for anyone who faces the same problem, here is how I solved it.

First of all, there is a Hadoop issue for this: https://issues.apache.org/jira/browse/HADOOP-4927, and http://hadoop.apache.org/mapreduce/docs/current/mapred_tutorial.html#Lazy+Output+Creation explains how to solve it.

So Hadoop does indeed create empty part-00x files irrespective of what you do in the mapper class. To avoid that, you have to call the following static method of LazyOutputFormat:

    LazyOutputFormat.setOutputFormatClass(job, SequenceFileOutputFormat.class);

Be aware that, in my experience, this method should be called after you set the output format class:

    job.setOutputFormatClass(SequenceFileOutputFormat.class);
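The idea behind lazy output creation can be sketched in plain Java. This is a minimal illustration under stated assumptions, not Hadoop's implementation: LazyFileWriter and LazyOutputDemo are hypothetical names, and the writer simply defers creating its file until the first record is written, so a task that writes nothing leaves no part file behind.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of lazy output creation; not Hadoop's implementation.
class LazyFileWriter implements AutoCloseable {
    private final Path path;
    private BufferedWriter out; // stays null until the first record arrives

    LazyFileWriter(Path path) {
        this.path = path;
    }

    void write(String record) {
        try {
            if (out == null) {
                // The file is created here, on the first write, not in the constructor.
                out = Files.newBufferedWriter(path, StandardCharsets.UTF_8);
            }
            out.write(record);
            out.newLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public void close() {
        try {
            if (out != null) {
                out.close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class LazyOutputDemo {
    // Runs one simulated task and reports whether its part file exists afterwards.
    static boolean partFileExists(boolean writeRecord) {
        try {
            Path dir = Files.createTempDirectory("lazy-output");
            Path part = dir.resolve("part-m-00000");
            try (LazyFileWriter writer = new LazyFileWriter(part)) {
                if (writeRecord) {
                    writer.write("key\tvalue");
                }
            }
            return Files.exists(part);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling LazyOutputDemo.partFileExists(false) simulates a mapper that emits nothing, and no file is left behind. As for the ordering advice above: as far as I know, LazyOutputFormat.setOutputFormatClass registers LazyOutputFormat itself as the job's output format and records the wrapped class in the job configuration, so a later job.setOutputFormatClass(...) call would replace the lazy wrapper again.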
RE: What happens when I do not output anything from my mapper
The output files should be 0 KB in size if you use FileOutputFormat/TextOutputFormat. I think your output format writer is writing some metadata to those files. Can you check what data is present in those files? Can you tell me which output format you are using?

Thanks,
Devaraj

From: murat migdisoglu [murat.migdiso...@gmail.com]
Sent: Monday, June 04, 2012 6:18 PM
To: common-user@hadoop.apache.org
Subject: Re: What happens when I do not output anything from my mapper

Hi,

Thanks for your answer. After reading your emails, I decided to empty my mapper method completely to see whether I could disable the output of the mapper class altogether, but it did not work. Here is my mapper method:

    @Override
    public void map(ByteBuffer key, SortedMap<ByteBuffer, IColumn> columns, Context context)
            throws IOException, InterruptedException {
    }

When I execute hadoop fs -ls, I still see many small output files, as follows:

    -rw-r--r-- 3 mmigdiso supergroup 87 2012-06-04 12:44 /user/mmigdiso/output/part-m-00034
    -rw-r--r-- 3 mmigdiso supergroup 87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00037
    -rw-r--r-- 3 mmigdiso supergroup 87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00039
    -rw-r--r-- 3 mmigdiso supergroup 87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00040
    -rw-r--r-- 3 mmigdiso supergroup 87 2012-06-04 12:45 /user/mmigdiso/output/part-m-00042

Do you know whether I have to put something special into the context to specify empty output?

Regards,
Murat
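Why a record-free output file can still have a non-zero size can be shown with a small plain-Java sketch. It assumes a hypothetical format that writes a fixed header as soon as the file is opened; the HEADER bytes below are made up for illustration, and a real SequenceFile header has a different, richer layout.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class EmptyOutputSizes {
    // Hypothetical header, for illustration only.
    static final byte[] HEADER =
            "HDR:KeyClass:ValueClass".getBytes(StandardCharsets.US_ASCII);

    // Opens and closes an output file without writing any record,
    // optionally writing the header first, and returns the file size in bytes.
    static long sizeWithNoRecords(boolean writesHeader) {
        try {
            Path file = Files.createTempFile("part-m-", ".out");
            try (OutputStream out = Files.newOutputStream(file)) {
                if (writesHeader) {
                    out.write(HEADER);
                }
            }
            long size = Files.size(file);
            Files.delete(file);
            return size;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A writer that emits no header leaves a 0-byte file, while the header-writing variant leaves a file exactly as long as its header, which matches the pattern of identical small sizes in the listing above.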