Re: OutOfMemoryError of PIG job (UDF loads big file)
Yeah! Wait till you stumble across the need to adjust shuffle/reduce buffers, reuse JVMs, sort factor, copier threads... :-)

On 2/23/10 8:06 AM, jiang licht licht_ji...@yahoo.com wrote: Thanks Jeff. I also just found this one and solved my problem. BTW, so many settings to play with :) Michael

--- On Mon, 2/22/10, Jeff Zhang zjf...@gmail.com wrote:
From: Jeff Zhang zjf...@gmail.com
Subject: Re: OutOfMemoryError of PIG job (UDF loads big file)
To: common-user@hadoop.apache.org
Date: Monday, February 22, 2010, 8:13 PM

Hi Jiang, you should set the property mapred.child.java.opts in mapred-site.xml to increase the memory, as follows:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>

and then restart your hadoop cluster.

On Tue, Feb 23, 2010 at 9:43 AM, jiang licht licht_ji...@yahoo.com wrote: I am running a hadoop job written in Pig. It fails with an out-of-memory error because a UDF consumes a lot of memory: it loads a big file. What are the settings to avoid the following OutOfMemoryError? I guess simply giving Pig a big heap (java -XmxBIGmemory org.apache.pig.Main ...) won't work.

Error message ---
java.lang.OutOfMemoryError: Java heap space
  at java.util.regex.Pattern.compile(Pattern.java:1451)
  at java.util.regex.Pattern.<init>(Pattern.java:1133)
  at java.util.regex.Pattern.compile(Pattern.java:823)
  at java.lang.String.split(String.java:2293)
  at java.lang.String.split(String.java:2335)
  at UDF.load(Unknown Source)
  at UDF.load(Unknown Source)
  at UDF.exec(Unknown Source)
  at UDF.exec(Unknown Source)
  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:201)
  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:287)
  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:278)
  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
  at org.apache.hadoop.mapred.Child.main(Child.java:155)

Thanks! Michael

-- Best Regards, Jeff Zhang
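A per-job alternative to editing mapred-site.xml and restarting the cluster: as far as I know, Pig copies client-side Java system properties into the job configuration it submits (an assumption worth verifying on your Pig version), so something like the following should raise the child heap for a single script:

# Sketch: pass the child JVM heap setting as a -D system property
# when launching Pig, instead of changing cluster-wide config.
java -Dmapred.child.java.opts=-Xmx1024m \
     -cp pig.jar org.apache.pig.Main myscript.pig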
Re: why not zookeeper for the namenode
"From what I read, I thought that bookkeeper would be the ideal enhancement for the namenode, to make it distributed and therefore finally highly available."

Being distributed doesn't imply high availability. Availability is about minimizing downtime. For example, a primary that can fail over to a secondary (and back) may be more available than a distributed system that needs to be restarted when its software or dependencies are upgraded. A distributed system that can only handle x ops/second may be less available than a non-distributed system that can handle 2x ops/second. A large distributed system may be less available than n smaller systems, depending on consistency requirements. Implementation and management complexity often result in more downtime. Etc. Decreasing the time it takes to restart and upgrade an HDFS cluster would significantly improve its availability for many users (there are jiras for these).

"Why hasn't zookeeper (or bookkeeper) been chosen?"

Persisting NN metadata over a set of servers is only part of the problem. You might be interested in checking out HDFS-976. Thanks, Eli
Re: why not zookeeper for the namenode
E. Sammer wrote: On 2/22/10 12:53 PM, zlatin.balev...@barclayscapital.com wrote: "My 2 cents: If the NN stores all state behind a javax.Cache façade, it will be possible to use all kinds of products (commercial, open source, facades to ZK) for redundancy, load balancing, etc."

This would be pretty interesting from a deployment / scale point of view, in that many JCache providers support flushing to disk and the concept of distribution and near / far data. Now, all of that said, this also removes some certainty from the name node contract. If JCache were used, we:
- couldn't promise data would be in memory (both a plus and a minus).
- couldn't promise data is on the same machine.
- couldn't make guarantees about consistency (flushes, PITR features, etc.).
It may be too general an abstraction layer for this type of application. It's a good avenue to explore; just playing devil's advocate. Regards.

I know nothing about HA, though I work with people who do. They normally start their conversations with "Steve, you idiot, you don't understand..." Because HA is all or nothing: either you are HA or you aren't. It's also somewhere you need to reach for the mathematics to prove it works in theory, then test it in the field in interesting situations. Even then, they have horror stories.

We all know that NNs have limits, but most of those limits are known and documented:
- doesn't like full disks
- a corrupted editlog doesn't replay
- if the secondary NN isn't live, the NN will stay up (it's been discussed having it somehow react if there isn't any secondary around and you say you require one)
There's also an open issue, possibly related to UTF-8 translation, that is confusing restarts, under discussion in -hdfs right now.

What is best is this: everyone has the same code base, one that is tested at scale. If you switch to multiple back ends, then nobody other than the people who have the same back end as you will be able to replicate the problem. You lose the "Yahoo! and Facebook run on petabyte-sized clusters, so my 40 TB is noise" argument, replacing it with "Yahoo! and Facebook run on one cache back end; I use something else and am on my own when it fails." I don't want to go there.
Re: why not zookeeper for the namenode
Eli Collins wrote: "From what I read, I thought that bookkeeper would be the ideal enhancement for the namenode, to make it distributed and therefore finally highly available." Being distributed doesn't imply high availability. Availability is about minimizing downtime. For example, a primary that can fail over to a secondary (and back) may be more available than a distributed system that needs to be restarted when its software or dependencies are upgraded. A distributed system that can only handle x ops/second may be less available than a non-distributed system that can handle 2x ops/second. A large distributed system may be less available than n smaller systems, depending on consistency requirements. Implementation and management complexity often result in more downtime. Etc. Decreasing the time it takes to restart and upgrade an HDFS cluster would significantly improve its availability for many users (there are jiras for these).

There's another availability: engineering availability. What we have today is nice in that the HDFS engineering skills are distributed among a number of companies, the source is there for anyone else to learn, and rebuilding is fairly straightforward. Don't dismiss that as unimportant. Engineering availability means that if you discover a new problem, you have the ability to patch your copy of the code and keep going, while filing a bug report for others to deal with. That significantly reduces your downtime on a software problem, compared to filing a bug report and hoping that a future release will have addressed it. -steve
How are intermediate key/value pairs materialized between map and reduce?
Hi there, can anybody help me out on a (most likely) simple question? I am wondering how intermediate key/value pairs are materialized. I have a job where the map phase produces 600,000 records and map output bytes is ~300GB. What I thought (up to now) is that these 600,000 records, i.e., 300GB, are materialized locally by the mappers, and that later on reducers pull these records (based on the key). What I see (and cannot explain) is that the FILE_BYTES_WRITTEN counter is as high as ~900GB. So - where does the factor of 3 between Map output bytes and FILE_BYTES_WRITTEN come from? I thought about the replication factor of 3 in the file system - but that should apply to HDFS only?! Thanks - tim
MultipleOutputs and new 20.1 API
Does the code for MultipleOutputs work (as described in the Javadocs) with the new 0.20.1 API? This class expects a JobConf for most of its methods, and JobConf is deprecated in 0.20.1. If not, are there any plans to update MultipleOutputs to work with the new API?
Re: Job Tracker questions
I have to read counters, and I am using an InetSocketAddress and connecting to the JobTracker JSP. With its help I am able to find the counters when running in pseudo-distributed mode, but when I try to run this on a cluster I am not able to get any counters. Are there any parameters that I need to change in the configuration to read the counter information? Thanks in advance

On Tue, Feb 9, 2010 at 8:49 PM, Jeff Zhang zjf...@gmail.com wrote: JobClient also uses a proxy of JobTracker.

On Mon, Feb 8, 2010 at 11:19 PM, Mark N nipen.m...@gmail.com wrote: Did you check the JobClient source code?

On Thu, Feb 4, 2010 at 5:21 PM, Jeff Zhang zjf...@gmail.com wrote: I looked at the source code; it seems the job tracker web UI also uses the proxy of JobTracker to get the counter information, rather than the XML file.

On Thu, Feb 4, 2010 at 7:29 PM, Mark N nipen.m...@gmail.com wrote: Yes, we can create a webservice in Java which would be called by .NET to display these counters. But since the Java code to read these counters needs to use hadoop APIs (JobClient), I am not sure we can create a webservice to read the counters. The question is: how does the default hadoop task tracker display counter information in its JSP pages? Does it read from the XML files? thanks,

On Thu, Feb 4, 2010 at 5:08 PM, Jeff Zhang zjf...@gmail.com wrote: I think you can create a web service using Java, and then in .NET use the web service to display the result.

On Thu, Feb 4, 2010 at 7:21 PM, Jeff Zhang zjf...@gmail.com wrote: Do you mean you want to connect to the JobTracker using .NET? If so, I'm afraid I have no idea how to do this. The RPC of hadoop is language-dependent.

On Thu, Feb 4, 2010 at 7:18 PM, Mark N nipen.m...@gmail.com wrote: Could you please elaborate on this (a hint to get started, as I am very new to hadoop)? So far I could successfully read all the default and custom counters. Currently we have a .NET client. Thanks in advance.

On Thu, Feb 4, 2010 at 4:53 PM, Jeff Zhang zjf...@gmail.com wrote: Well, you can create a proxy of JobTracker on the client side, and then you can use the API of JobTracker to get the information about jobs. The proxy takes responsibility for communication with the master node. Reading the source code of JobClient can help you.

On Thu, Feb 4, 2010 at 6:59 PM, Mark N nipen.m...@gmail.com wrote: Yes, currently I am using JobClient to read these counters. But we are not able to use webservices, because the jar which is used to read the counters from a running hadoop job is itself a Hadoop program. If we could have a pure Java API which runs without the hadoop command, then we could return the counter variable to the webservice and show it in the UI. Any help or technique to show these counters in the UI would be appreciated (not necessarily using a web service). I am using webservices because I have a .NET VB client. thanks

On Wed, Feb 3, 2010 at 8:33 PM, Jeff Zhang zjf...@gmail.com wrote: I think you can use JobClient to get the counters in your web service. If you look at the shell script bin/hadoop, you will find that this shell actually uses JobClient to get the counters.

On Wed, Feb 3, 2010 at 4:34 AM, Mark N nipen.m...@gmail.com wrote: We have a hadoop job running and have used custom counters to track a few counters (like the number of successfully processed documents matching certain conditions). Since we need to get these counters even while the Hadoop job is running, we wrote another Java program to read them. The counter reader program will do the following: 1) List all the running jobs.
2) Get the running job using the job name. 3) Get all the counters for individual running jobs. 4) Set these counters in variables. We could successfully read these counters, but since we need to show them in a custom UI, how can we show these counters? We looked into various options to read these counters and show them in the UI, as follows: 1. Dump these counters to
Re: How are intermediate key/value pairs materialized between map and reduce?
Hi Tim, the intermediate data is materialized to the local file system. Before it is available for reducers, mappers will sort it. If the buffer (io.sort.mb) is too small for the intermediate data, multi-phase sorting happens, which means you read and write the same bytes more than once. Besides, are you looking at the statistics per mapper through the job tracker, or just at the information output when a job finishes? If you look at the information given out at the end of the job, note that this is an overall statistic which includes sorting at the reduce side. It also includes the amount of data written to HDFS (I am not 100% sure). And FILE_BYTES_WRITTEN has nothing to do with the replication factor. I think if you change the factor to 6, FILE_BYTES_WRITTEN is still the same. -Gang

----- Original Message -----
From: Tim Kiefer tim-kie...@gmx.de
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Sent: Tuesday, 2010/2/23, 6:44:28 AM
Subject: How are intermediate key/value pairs materialized between map and reduce?

Hi there, can anybody help me out on a (most likely) simple question? I am wondering how intermediate key/value pairs are materialized. I have a job where the map phase produces 600,000 records and map output bytes is ~300GB. What I thought (up to now) is that these 600,000 records, i.e., 300GB, are materialized locally by the mappers, and that later on reducers pull these records (based on the key). What I see (and cannot explain) is that the FILE_BYTES_WRITTEN counter is as high as ~900GB. So - where does the factor of 3 between Map output bytes and FILE_BYTES_WRITTEN come from? I thought about the replication factor of 3 in the file system - but that should apply to HDFS only?! Thanks - tim
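For reference, the two knobs Gang mentions live in mapred-site.xml (or can be set per job) - a sketch with illustrative values, not tuned recommendations:

<!-- In-memory buffer each map task sorts into; if map output exceeds
     it, multiple spill files are written and merged later (extra
     reads and writes of the same bytes on local disk). -->
<property>
  <name>io.sort.mb</name>
  <value>200</value>
</property>
<!-- Number of streams merged at once; a larger factor means fewer
     merge passes over the spilled data. -->
<property>
  <name>io.sort.factor</name>
  <value>20</value>
</property>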
Re: How are intermediate key/value pairs materialized between map and reduce?
Hi Gang, thanks for your reply. To clarify: I look at the statistics through the job tracker. In the web interface for my job I have columns for map, reduce and total. What I was referring to is map - i.e., I see FILE_BYTES_WRITTEN = 3 * Map Output Bytes in the map column. About the replication factor: I would expect the exact same thing - changing it to 6 has no influence on FILE_BYTES_WRITTEN. About the sorting: I have io.sort.mb = 100 and io.sort.factor = 10. Furthermore, I have 40 mappers and map output data is ~300GB. I can't see how that ends up in a factor of 3? - tim

Am 23.02.2010 14:39, schrieb Gang Luo: Hi Tim, the intermediate data is materialized to the local file system. Before it is available for reducers, mappers will sort it. If the buffer (io.sort.mb) is too small for the intermediate data, multi-phase sorting happens, which means you read and write the same bytes more than once. Besides, are you looking at the statistics per mapper through the job tracker, or just at the information output when a job finishes? If you look at the information given out at the end of the job, note that this is an overall statistic which includes sorting at the reduce side. It also includes the amount of data written to HDFS (I am not 100% sure). And FILE_BYTES_WRITTEN has nothing to do with the replication factor. I think if you change the factor to 6, FILE_BYTES_WRITTEN is still the same. -Gang

----- Original Message -----
From: Tim Kiefer tim-kie...@gmx.de
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Sent: Tuesday, 2010/2/23, 6:44:28 AM
Subject: How are intermediate key/value pairs materialized between map and reduce?

Hi there, can anybody help me out on a (most likely) simple question? I am wondering how intermediate key/value pairs are materialized. I have a job where the map phase produces 600,000 records and map output bytes is ~300GB. What I thought (up to now) is that these 600,000 records, i.e., 300GB, are materialized locally by the mappers, and that later on reducers pull these records (based on the key). What I see (and cannot explain) is that the FILE_BYTES_WRITTEN counter is as high as ~900GB. So - where does the factor of 3 between Map output bytes and FILE_BYTES_WRITTEN come from? I thought about the replication factor of 3 in the file system - but that should apply to HDFS only?! Thanks - tim
Re: Wrong FS
Thanks Marc and Bill. I solved this Wrong FS problem by editing /etc/hosts as Marc said. Now the cluster is working ok : ]

master01:
127.0.0.1 localhost.localdomain localhost
10.0.0.101 master01
10.0.0.102 master02
10.0.0.200 slave00
10.0.0.201 slave01

master02:
127.0.0.1 localhost.localdomain localhost
10.0.0.101 master01
10.0.0.102 master02
10.0.0.200 slave00
10.0.0.201 slave01

slave00:
127.0.0.1 slave00 localhost.localdomain localhost
10.0.0.101 master01
10.0.0.102 master02
10.0.0.201 slave01

slave01:
127.0.0.1 slave01 localhost.localdomain localhost
10.0.0.101 master01
10.0.0.102 master02
10.0.0.200 slave00

Edson Ramiro

On 22 February 2010 17:56, Marc Farnum Rendino mvg...@gmail.com wrote: Perhaps an /etc/hosts file is sufficient. However, FWIW, I didn't get it working till I moved to using all the real FQDNs. - Marc
Re: Native libraries on OS X
Allen, you mean compiling the native library? I did "ant compile-native". My code to read/write the files was built in Eclipse, but the same problem occurs with TestSequenceFile in hadoop-0.21-test.jar. What would you recommend regarding how to build, or how to tweak or regenerate the configure scripts? Thanks, Derek

From: Allen Wittenauer awittena...@linkedin.com
To: common-user@hadoop.apache.org
Date: Mon, 22 Feb 2010 15:00:04 -0800
Subject: Re: Native libraries on OS X

Did you compile this outside of ant? If so, you probably got bit by the fact that the configure scripts assume that the bitness level is shared via an environment variable. [I really need to finish cleaning up my configure script patches. :( ]
Re: How are intermediate key/value pairs materialized between map and reduce?
Hi Tim, I'm guessing a lot of these writes are happening on the reduce side. On the JT web interface, there are three columns: map, reduce, overall. Is the 900GB figure from the overall column? The value in the map column will probably be closer to what you were expecting. There are writes on the reduce side too, during the shuffle and multi-pass merge. Ed

2010/2/23 Tim Kiefer tim-kie...@gmx.de: Hi Gang, thanks for your reply. To clarify: I look at the statistics through the job tracker. In the web interface for my job I have columns for map, reduce and total. What I was referring to is map - i.e., I see FILE_BYTES_WRITTEN = 3 * Map Output Bytes in the map column. About the replication factor: I would expect the exact same thing - changing it to 6 has no influence on FILE_BYTES_WRITTEN. About the sorting: I have io.sort.mb = 100 and io.sort.factor = 10. Furthermore, I have 40 mappers and map output data is ~300GB. I can't see how that ends up in a factor of 3? - tim

Am 23.02.2010 14:39, schrieb Gang Luo: Hi Tim, the intermediate data is materialized to the local file system. Before it is available for reducers, mappers will sort it. If the buffer (io.sort.mb) is too small for the intermediate data, multi-phase sorting happens, which means you read and write the same bytes more than once. Besides, are you looking at the statistics per mapper through the job tracker, or just at the information output when a job finishes? If you look at the information given out at the end of the job, note that this is an overall statistic which includes sorting at the reduce side. It also includes the amount of data written to HDFS (I am not 100% sure). And FILE_BYTES_WRITTEN has nothing to do with the replication factor. I think if you change the factor to 6, FILE_BYTES_WRITTEN is still the same. -Gang

----- Original Message -----
From: Tim Kiefer tim-kie...@gmx.de
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Sent: Tuesday, 2010/2/23, 6:44:28 AM
Subject: How are intermediate key/value pairs materialized between map and reduce?

Hi there, can anybody help me out on a (most likely) simple question? I am wondering how intermediate key/value pairs are materialized. I have a job where the map phase produces 600,000 records and map output bytes is ~300GB. What I thought (up to now) is that these 600,000 records, i.e., 300GB, are materialized locally by the mappers, and that later on reducers pull these records (based on the key). What I see (and cannot explain) is that the FILE_BYTES_WRITTEN counter is as high as ~900GB. So - where does the factor of 3 between Map output bytes and FILE_BYTES_WRITTEN come from? I thought about the replication factor of 3 in the file system - but that should apply to HDFS only?! Thanks - tim
Re: How are intermediate key/value pairs materialized between map and reduce?
No... 900GB is in the map column. Reduce adds another ~70GB of FILE_BYTES_WRITTEN, and the total column consequently shows ~970GB.

Am 23.02.2010 16:11, schrieb Ed Mazur: Hi Tim, I'm guessing a lot of these writes are happening on the reduce side. On the JT web interface, there are three columns: map, reduce, overall. Is the 900GB figure from the overall column? The value in the map column will probably be closer to what you were expecting. There are writes on the reduce side too, during the shuffle and multi-pass merge. Ed

2010/2/23 Tim Kiefer tim-kie...@gmx.de: Hi Gang, thanks for your reply. To clarify: I look at the statistics through the job tracker. In the web interface for my job I have columns for map, reduce and total. What I was referring to is map - i.e., I see FILE_BYTES_WRITTEN = 3 * Map Output Bytes in the map column. About the replication factor: I would expect the exact same thing - changing it to 6 has no influence on FILE_BYTES_WRITTEN. About the sorting: I have io.sort.mb = 100 and io.sort.factor = 10. Furthermore, I have 40 mappers and map output data is ~300GB. I can't see how that ends up in a factor of 3? - tim

Am 23.02.2010 14:39, schrieb Gang Luo: Hi Tim, the intermediate data is materialized to the local file system. Before it is available for reducers, mappers will sort it. If the buffer (io.sort.mb) is too small for the intermediate data, multi-phase sorting happens, which means you read and write the same bytes more than once. Besides, are you looking at the statistics per mapper through the job tracker, or just at the information output when a job finishes? If you look at the information given out at the end of the job, note that this is an overall statistic which includes sorting at the reduce side. It also includes the amount of data written to HDFS (I am not 100% sure). And FILE_BYTES_WRITTEN has nothing to do with the replication factor. I think if you change the factor to 6, FILE_BYTES_WRITTEN is still the same. -Gang

----- Original Message -----
From: Tim Kiefer tim-kie...@gmx.de
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Sent: Tuesday, 2010/2/23, 6:44:28 AM
Subject: How are intermediate key/value pairs materialized between map and reduce?

Hi there, can anybody help me out on a (most likely) simple question? I am wondering how intermediate key/value pairs are materialized. I have a job where the map phase produces 600,000 records and map output bytes is ~300GB. What I thought (up to now) is that these 600,000 records, i.e., 300GB, are materialized locally by the mappers, and that later on reducers pull these records (based on the key). What I see (and cannot explain) is that the FILE_BYTES_WRITTEN counter is as high as ~900GB. So - where does the factor of 3 between Map output bytes and FILE_BYTES_WRITTEN come from? I thought about the replication factor of 3 in the file system - but that should apply to HDFS only?! Thanks - tim
Re: New or old Map/Reduce API ?
Hello Kai, to answer your questions:
- Most of the missing stuff from the new API is convenience classes -- InputFormats, OutputFormats, etc. One very handy class that is missing from the new API is MultipleOutputs, which allows you to write multiple files in a single pass.
- You cannot mix classes from the old API and the new API in the same MapReduce job.
- However, you can run different jobs that use either the old or new API in the same cluster, and their data inputs and outputs should be compatible.
- Figuring out how to do things right in the new versus the old API is really a matter of having a template MapReduce job (e.g., WordCount) for each API. The surrounding classes are different, but the way you write your map and reduce functions is largely the same.
- Porting old code to the new API is not hard (especially with an IDE), but it can get tedious. As an exercise, you could try porting WordCount from the old API to the new one (see the sketch after this thread).
- It will probably take more than a year to fully remove the old APIs. Today, the old API is marked as 'deprecated', which is really a misnomer since the new API doesn't provide a full alternative to the old.
My recommendation would be to learn and use both. Personally, I find the new API to be cleaner and more 'Java-like'. For someone just picking this stuff up, the old API might be easier as it correlates exactly to the examples in print. Regards, - Patrick

On Tue, Feb 23, 2010 at 10:00 AM, Kai Londenberg londenb...@nurago.com wrote: Hi... I'm currently trying to get information for a decision on which version of the Map/Reduce API to use for our Map/Reduce jobs. We have existing (production) code that uses the old API and some new code (non-production) that uses the new API, and we have some developers who will definitely not have much time to dig into Hadoop sources to figure out how to do things right in the new API (instead of being able to look it up in a book), so the state of documentation does matter. So far I got the following information:
- The old API is deprecated and will be removed, but that will probably take at least a year
- The new API does not provide really new functionality
- There's library and contrib code that has not been ported to the new API yet
- Most of the existing thorough documentation (like "Hadoop: The Definitive Guide") covers the old API
- Porting to the new API will probably become easier in future versions of Hadoop, when more of the lib code and docs have been ported
So, what are your experiences with the new vs. old API? Would you recommend switching to the new API right now, or waiting for a later release? Is it problematic to have applications using the old and new API side by side? How hard is it currently to port old code to the new API? If these questions have been covered by some other thread already, please point me to it. I could not find much of a discussion browsing the mailing list archives, though. Thanks in advance for any advice you can give, Kai Londenberg

Software Developer, nurago GmbH applied research technologies, Kurt-Schumacher-Str. 24, 30159 Hannover
Tel. +49 511 213 866 . 0 Fax +49 511 213 866 . 22
londenb...@nurago.com . www.nurago.com
Geschäftsführer: Thomas Knauer Amtsgericht Hannover: HRB 201817 UID (Vat)-No: DE 2540 787 09
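To illustrate Patrick's WordCount suggestion, here is roughly what the map side looks like against the new org.apache.hadoop.mapreduce classes (a sketch for 0.20.x; only the surrounding classes change, the token loop is the same as in the old-API version):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// New-API mapper: extends the Mapper base class instead of
// implementing the old org.apache.hadoop.mapred.Mapper interface,
// and writes through a Context instead of an OutputCollector.
public class WordCountMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, ONE); // replaces output.collect(word, ONE)
    }
  }
}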
Custom InputFormat: LineRecordReader.LineReader reads 0 bytes
Hi All! I am implementing a custom InputFormat. Its custom RecordReader uses LineRecordReader.LineReader inside. In some cases its read() method returns 0, i.e., reads 0 bytes. This also happens in a unit test where it reads from a regular file on a UNIX filesystem. What does it mean, and how should I handle it / avoid it? After examining the sources I can see that the zero count of bytes read comes from the value returned by FSDataInputStream.read(). In which situations can this happen? -1 would mean end of file; 0 can mean that no data was available (but end of file still not reached) - I cannot understand how this can happen in the case of a local file. Please give me advice :) Regards, Alexey Tigarev ti...@nlp.od.ua Jabber: ti...@jabber.od.ua Skype: t__gra
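For what it's worth, with InputStream-style read() contracts, 0 generally just means "no bytes transferred on this call"; only -1 signals end of stream. So a defensive caller loops until it makes progress or hits -1 - a minimal sketch against plain java.io, not LineRecordReader internals:

import java.io.IOException;
import java.io.InputStream;

public class ReadUtil {
  // Read up to len bytes, treating a 0-byte read as "try again" and
  // only -1 as end of stream. Returns the number of bytes actually
  // read, or -1 if the stream ended before any byte was read.
  public static int readFully(InputStream in, byte[] buf, int off, int len)
      throws IOException {
    int total = 0;
    while (total < len) {
      int n = in.read(buf, off + total, len - total);
      if (n < 0) {
        return total == 0 ? -1 : total;
      }
      total += n; // n may legally be 0; just loop again
    }
    return total;
  }
}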
Re: java.io.IOException: Spill failed when using w/ GzipCodec for Map output
Thanks, Amogh. Good to know :) Michael

--- On Tue, 2/23/10, Amogh Vasekar am...@yahoo-inc.com wrote:
From: Amogh Vasekar am...@yahoo-inc.com
Subject: Re: java.io.IOException: Spill failed when using w/ GzipCodec for Map output
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Date: Tuesday, February 23, 2010, 1:45 AM

Hi, Certainly, this might not be the cause of the issue. But the Hadoop native library is supported only on *nix platforms. "Unfortunately it is known not to work on Cygwin and Mac OS X and has mainly been used on the GNU/Linux platform." http://hadoop.apache.org/common/docs/current/native_libraries.html#Supported+Platforms The mapper log would throw more light on this. Amogh

On 2/23/10 11:41 AM, jiang licht licht_ji...@yahoo.com wrote: Thanks Amogh. The platform on which I got this error is Mac OS X and hadoop 0.20.1. All native libraries are installed except lzo (which would report that the codec was not found). But I didn't see this error when I ran the same thing without compression specified; in addition, I also ran something with the same compression setting on Fedora 8 and 0.19.1 without any problem. So I think it might depend on some other settings (w.r.t. what spill is about). Thanks, Michael

--- On Mon, 2/22/10, Amogh Vasekar am...@yahoo-inc.com wrote:
From: Amogh Vasekar am...@yahoo-inc.com
Subject: Re: java.io.IOException: Spill failed when using w/ GzipCodec for Map output
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Date: Monday, February 22, 2010, 11:27 PM

Hi, Can you please let us know what platform you are running on your hadoop machines? For gzip and lzo to work, you need the supported hadoop native libraries (I remember reading about this somewhere in the hadoop wiki :) ) Amogh

On 2/23/10 8:16 AM, jiang licht licht_ji...@yahoo.com wrote: I have a pig script. If I don't set any codec for Map output for the hadoop cluster, there is no problem. Now I made the following compression settings, and the job failed with the error message shown below. I guess there are some other settings that should be correctly set together with using compression. I'm using 0.20.1. Any thoughts? Thanks for your help!
mapred-site.xml:

<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>

error message of failed map task ---

java.io.IOException: Spill failed
  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:822)
  at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:251)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
  at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.NullPointerException
  at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:102)
  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1198)
  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:648)
  at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1135)

Thanks, Michael
Re: OutOfMemoryError of PIG job (UDF loads big file)
Hmm, I'm wondering if there are some case studies regarding how people handle memory-related issues, posted somewhere as good references? Thanks, Michael

--- On Tue, 2/23/10, Ankur C. Goel gan...@yahoo-inc.com wrote:
From: Ankur C. Goel gan...@yahoo-inc.com
Subject: Re: OutOfMemoryError of PIG job (UDF loads big file)
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Date: Tuesday, February 23, 2010, 2:11 AM

Yeah! Wait till you stumble across the need to adjust shuffle/reduce buffers, reuse JVMs, sort factor, copier threads... :-)

On 2/23/10 8:06 AM, jiang licht licht_ji...@yahoo.com wrote: Thanks Jeff. I also just found this one and solved my problem. BTW, so many settings to play with :) Michael

--- On Mon, 2/22/10, Jeff Zhang zjf...@gmail.com wrote:
From: Jeff Zhang zjf...@gmail.com
Subject: Re: OutOfMemoryError of PIG job (UDF loads big file)
To: common-user@hadoop.apache.org
Date: Monday, February 22, 2010, 8:13 PM

Hi Jiang, you should set the property mapred.child.java.opts in mapred-site.xml to increase the memory, as follows:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>

and then restart your hadoop cluster.

On Tue, Feb 23, 2010 at 9:43 AM, jiang licht licht_ji...@yahoo.com wrote: I am running a hadoop job written in Pig. It fails with an out-of-memory error because a UDF consumes a lot of memory: it loads a big file. What are the settings to avoid the following OutOfMemoryError? I guess simply giving Pig a big heap (java -XmxBIGmemory org.apache.pig.Main ...) won't work.

Error message ---
java.lang.OutOfMemoryError: Java heap space
  at java.util.regex.Pattern.compile(Pattern.java:1451)
  at java.util.regex.Pattern.<init>(Pattern.java:1133)
  at java.util.regex.Pattern.compile(Pattern.java:823)
  at java.lang.String.split(String.java:2293)
  at java.lang.String.split(String.java:2335)
  at UDF.load(Unknown Source)
  at UDF.load(Unknown Source)
  at UDF.exec(Unknown Source)
  at UDF.exec(Unknown Source)
  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:201)
  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:287)
  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:278)
  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
  at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
  at org.apache.hadoop.mapred.Child.main(Child.java:155)

Thanks! Michael

-- Best Regards, Jeff Zhang
JobConf.setJobEndNotificationURI
Hi, I am looking for the counterpart to JobConf.setJobEndNotificationURI() in org.apache.hadoop.mapreduce. Please advise. Thanks
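I don't believe org.apache.hadoop.mapreduce exposes a direct setter, but setJobEndNotificationURI() is, to my knowledge, just a wrapper around the mapred.job.end.notification.url property, so setting that on the job's Configuration should be equivalent - a sketch (property name assumed unchanged in your version; verify against your JobConf source):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class NotificationExample {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "my-job");
    // Equivalent of JobConf.setJobEndNotificationURI(); the framework
    // expands $jobId and $jobStatus when the job completes.
    job.getConfiguration().set("mapred.job.end.notification.url",
        "http://example.com/notify?jobid=$jobId&status=$jobStatus");
    // ... set mapper/reducer/paths, then job.waitForCompletion(true)
  }
}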
Re: Native libraries on OS X
Hmm. Weird then. Maybe try doing the configure straight up then, setting JVM_DATA_MODEL=64 manually. The base problem you are having is that -m64 isn't getting set to build a 64-bit library. On 2/23/10 7:06 AM, Derek Brown de...@media6degrees.com wrote: Allen, You mean compiling the native library? I did ant compile-native. My code to read/write the files was built in Eclipse, but the same problem occurs with testsequencefile in hadoop-0.21-test.jar. What would you recommend regarding how to build, or tweak or regenerate the configure scripts? Thanks, Derek From: Allen Wittenauer awittena...@linkedin.com To: common-user@hadoop.apache.org Date: Mon, 22 Feb 2010 15:00:04 -0800 Subject: Re: Native libraries on OS X Did you compile this outside of ant? If so, you probably got bit by the fact that the configure scripts assume that the bitness level is shared via an environment variable. [I really need to finish cleaning up my configure script patches. :( ]
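If it helps, the manual route would look something like this from the native source directory (a sketch; exact paths and which variables configure actually consults vary by tree, so treat these as assumptions):

# Build the native library by hand with 64-bit flags, bypassing
# ant's environment plumbing (sketch; adjust paths for your tree).
cd src/native
export JVM_DATA_MODEL=64
export CFLAGS=-m64
export LDFLAGS=-m64
./configure
make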
Re: On CDH2, (Cloudera EC2) No valid local directories in property: mapred.local.dir
Hi Saptarshi, can you please ssh into the JobTracker node and check that this directory is mounted, writable by the hadoop user, and not full? -Todd

On Fri, Feb 19, 2010 at 2:13 PM, Saptarshi Guha saptarshi.g...@gmail.com wrote: Hello, not sure if I should post this here or on Cloudera's message board, but here goes. When I run EC2 using the latest CDH2 and Hadoop 0.20 (by setting the env variables per hadoop-ec2) and launch a job with "hadoop jar ...", I get the following error:

10/02/19 17:04:55 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
org.apache.hadoop.ipc.RemoteException: java.io.IOException: No valid local directories in property: mapred.local.dir
  at org.apache.hadoop.conf.Configuration.getLocalPath(Configuration.java:975)
  at org.apache.hadoop.mapred.JobConf.getLocalPath(JobConf.java:279)
  at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:256)
  at org.apache.hadoop.mapred.JobInProgress.<init>(JobInProgress.java:240)
  at org.apache.hadoop.mapred.JobTracker.submitJob(JobTracker.java:3026)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
  at java.lang.reflect.Method.invoke(Method.java:597)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966)
  at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:960)
  at org.apache.hadoop.ipc.Client.call(Client.java:740)
  at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
  at org.apache.hadoop.mapred.$Proxy0.submitJob(Unknown Source)
  at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:841)
  at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
  at org.godhuli.f.RHMR.submitAndMonitorJob(RHMR.java:195)

but the value of mapred.local.dir is /mnt/hadoop/mapred/local. Any ideas?
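Concretely, a quick check along these lines on the JobTracker node (a sketch; the path comes from the error above, and the hadoop user name is an assumption):

# Does the directory exist and who owns it?
ls -ld /mnt/hadoop/mapred/local
# Is the volume mounted and does it have free space?
df -h /mnt/hadoop
# Can the hadoop user actually write there?
sudo -u hadoop touch /mnt/hadoop/mapred/local/.write-test && \
  sudo -u hadoop rm /mnt/hadoop/mapred/local/.write-test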
Is it possible to run multiple mapreduce jobs from within the same application
In other words: I have a situation where I want to feed the output from the first iteration of my mapreduce job to a second iteration and so on. I have a for loop in my main method to set up the job parameters and to run it through all iterations, but on about the third run the Hadoop processes lose their association with the 'jps' command and then weird things start happening. I remember reading somewhere about chaining - is that what is needed? I'm not sure what causes jps to not report the hadoop processes even though they are still active, as can be seen with the ps command. Thanks. (This is on version 0.20.1)
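Chaining jobs in a driver loop is a normal pattern: each iteration writes to a fresh output directory, which becomes the next iteration's input. A minimal sketch with hypothetical paths and the mapper/reducer left as placeholders (the jps symptom sounds like a separate environment issue rather than something the loop itself causes):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IterativeDriver {
  public static void main(String[] args) throws Exception {
    String input = "data/iter0";          // hypothetical seed input
    for (int i = 1; i <= 3; i++) {
      String output = "data/iter" + i;    // each pass gets a fresh dir
      Job job = new Job(new Configuration(), "iteration-" + i);
      job.setJarByClass(IterativeDriver.class);
      // job.setMapperClass(...); job.setReducerClass(...);
      FileInputFormat.addInputPath(job, new Path(input));
      FileOutputFormat.setOutputPath(job, new Path(output));
      if (!job.waitForCompletion(true)) {
        System.exit(1);                   // stop the chain on failure
      }
      input = output;                     // feed output into next pass
    }
  }
}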
CDH2 or Apache Hadoop
Just wanted to get the group's general feelings on what the preferred distro is and why - obviously assuming one didn't have a service agreement with Cloudera. Ananth T Sarathy
applying MAPREDUCE-318 to 0.20.1
Is there a patch that I can apply to my 0.20.1 cluster that fixes the issue reported by Scott Carey: "Comment out or delete the line: break; //we have a map from this host in ReduceOutput.java in ReduceCopier.fetchOutputs() - line 1804 in 0.19.2-dev, 1911 on 0.20.1-dev and line 1954 currently on trunk." Thanks
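As far as I know there is no standalone 0.20.1 patch file for exactly this; the change itself is tiny, though. A paraphrased sketch of the edit (identifier names from memory and not exact - find the quoted break in your source tree):

// Inside ReduceCopier.fetchOutputs(), in the loop that walks the
// map-output locations for a host (sketch; names are approximate):
for (MapOutputLocation loc : hostLocations) {
  scheduledCopies.add(loc);  // schedule the fetch
  // break;                  // "we have a map from this host"
  // ^ commenting this out lets all outputs on the host be scheduled
  //   in one pass, per Scott Carey's suggestion in MAPREDUCE-318.
}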
Re: Many child processes dont exit
Thank you Jason, your reply is helpful.

2010/2/23 Jason Venner jason.had...@gmail.com: Someone is using a thread pool whose threads are not daemon threads, and that is not shut down before the main method returns. The non-daemon threads prevent the JVM from exiting. We had this problem for a while and modified Child.main to exit, rather than trying to work out and fix the third-party library that ran the thread pool. This technique does of course prevent JVM reuse.

On Sat, Feb 20, 2010 at 6:49 AM, Ted Yu yuzhih...@gmail.com wrote: Do you have System.exit() as the last line in your main()? Job job = createSubmittableJob(conf, otherArgs); System.exit(job.waitForCompletion(true) ? 0 : 1);

On Sat, Feb 20, 2010 at 12:32 AM, Zheng Lv lvzheng19800...@gmail.com wrote: Hello Ted, yes. Every hour a job is created and started, and when it finishes, the child process remains. The logs look normal; do you know what can lead to this? Thank you. LvZheng

2010/2/20 Ted Yu yuzhih...@gmail.com: Did the number of child processes increase over time?

On Friday, February 19, 2010, Zheng Lv lvzheng19800...@gmail.com wrote: Hello Edson, thank you for your reply. I don't want to kill them; I want to know why these child processes don't exit, and how to make them exit successfully when they finish. Any ideas? Thank you. LvZheng

2010/2/18 Edson Ramiro erlfi...@gmail.com: Do you want to kill them? If yes, you can use ./bin/slaves.sh pkill java, but it will kill the datanode and tasktracker processes on all slaves and you'll need to start these processes again. Edson Ramiro

On 14 February 2010 22:09, Zheng Lv lvzheng19800...@gmail.com wrote: any idea?

2010/2/11 Zheng Lv lvzheng19800...@gmail.com: Hello Everyone, we often find many child processes on datanodes which have already finished for a long time.
And following is the jstack log:

Full thread dump Java HotSpot(TM) 64-Bit Server VM (14.3-b01 mixed mode):

"DestroyJavaVM" prio=10 tid=0x2aaac8019800 nid=0x2422 waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

"NioProcessor-31" prio=10 tid=0x439fa000 nid=0x2826 runnable [0x4100a000]
   java.lang.Thread.State: RUNNABLE
  at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
  at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:215)
  at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
  at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
  - locked 0x2aaab9b5f6f8 (a sun.nio.ch.Util$1)
  - locked 0x2aaab9b5f710 (a java.util.Collections$UnmodifiableSet)
  - locked 0x2aaab9b5f680 (a sun.nio.ch.EPollSelectorImpl)
  at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:80)
  at org.apache.mina.transport.socket.nio.NioProcessor.select(NioProcessor.java:65)
  at org.apache.mina.common.AbstractPollingIoProcessor$Worker.run(AbstractPollingIoProcessor.java:672)
  at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:51)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:619)

"pool-15-thread-1" prio=10 tid=0x2aaac802d000 nid=0x2825 waiting on condition [0x41604000]
   java.lang.Thread.State: WAITING (parking)
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for 0x2aaab9b61620 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
  at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
  at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Threa

-- Pro Hadoop, a book to guide you from beginner to hadoop mastery, http://www.amazon.com/dp/1430219424?tag=jewlerymall www.prohadoopbook.com a community for Hadoop Professionals
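The pool-15-thread-1 entry above is the signature of the problem Jason describes: a non-daemon worker parked waiting for tasks after main() has returned. If you control the code that creates the pool, one fix is to build it from a daemon ThreadFactory so an idle pool cannot pin the JVM - a minimal sketch:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public class DaemonPool {
  // A ThreadFactory that marks every worker as a daemon thread, so an
  // idle pool cannot keep the child JVM alive after main() returns.
  static final ThreadFactory DAEMON_FACTORY = new ThreadFactory() {
    public Thread newThread(Runnable r) {
      Thread t = new Thread(r);
      t.setDaemon(true);
      return t;
    }
  };

  public static ExecutorService newDaemonPool(int threads) {
    return Executors.newFixedThreadPool(threads, DAEMON_FACTORY);
  }
}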
Re: Job Tracker questions
This is the sample code to get the counters for one specified job (I have tested it on my cluster). What you need to change is the jobtracker address and the jobID. Remember to put this class in the package org.apache.hadoop.mapred, because JobSubmissionProtocol is not public.

package org.apache.hadoop.mapred;

import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.Iterator;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;
import org.apache.hadoop.mapred.Counters.Counter;
import org.apache.hadoop.mapred.Counters.Group;

public class MyServer {
  public static void main(String[] args) throws IOException {
    InetSocketAddress address = new InetSocketAddress("sha-cs-04", 9001);
    JobSubmissionProtocol jobClient = (JobSubmissionProtocol) RPC.getProxy(
        JobSubmissionProtocol.class, JobSubmissionProtocol.versionID,
        address, new Configuration());
    JobID jobID = new JobID("201002111947", 460);
    Counters counters = jobClient.getJobCounters(jobID);
    Iterator<Group> iter = counters.iterator();
    while (iter.hasNext()) {
      Group group = iter.next();
      System.out.println(group.getDisplayName());
      Iterator<Counter> cIter = group.iterator();
      while (cIter.hasNext()) {
        Counter counter = cIter.next();
        System.out.println("\t" + counter.getName() + ":" + counter.getValue());
      }
    }
  }
}

On Tue, Feb 23, 2010 at 9:38 PM, Mark N nipen.m...@gmail.com wrote: I have to read counters, and I am using an InetSocketAddress and connecting to the JobTracker JSP. With its help I am able to find the counters when running in pseudo-distributed mode, but when I try to run this on a cluster I am not able to get any counters. Are there any parameters that I need to change in the configuration to read the counter information? Thanks in advance

On Tue, Feb 9, 2010 at 8:49 PM, Jeff Zhang zjf...@gmail.com wrote: JobClient also uses a proxy of JobTracker.

On Mon, Feb 8, 2010 at 11:19 PM, Mark N nipen.m...@gmail.com wrote: Did you check the JobClient source code?

On Thu, Feb 4, 2010 at 5:21 PM, Jeff Zhang zjf...@gmail.com wrote: I looked at the source code; it seems the job tracker web UI also uses the proxy of JobTracker to get the counter information, rather than the XML file.

On Thu, Feb 4, 2010 at 7:29 PM, Mark N nipen.m...@gmail.com wrote: Yes, we can create a webservice in Java which would be called by .NET to display these counters. But since the Java code to read these counters needs to use hadoop APIs (JobClient), I am not sure we can create a webservice to read the counters. The question is: how does the default hadoop task tracker display counter information in its JSP pages? Does it read from the XML files? thanks,

On Thu, Feb 4, 2010 at 5:08 PM, Jeff Zhang zjf...@gmail.com wrote: I think you can create a web service using Java, and then in .NET use the web service to display the result.

On Thu, Feb 4, 2010 at 7:21 PM, Jeff Zhang zjf...@gmail.com wrote: Do you mean you want to connect to the JobTracker using .NET? If so, I'm afraid I have no idea how to do this. The RPC of hadoop is language-dependent.

On Thu, Feb 4, 2010 at 7:18 PM, Mark N nipen.m...@gmail.com wrote: Could you please elaborate on this (a hint to get started, as I am very new to hadoop)? So far I could successfully read all the default and custom counters. Currently we have a .NET client. Thanks in advance.

On Thu, Feb 4, 2010 at 4:53 PM, Jeff Zhang zjf...@gmail.com wrote: Well, you can create a proxy of JobTracker on the client side, and then you can use the API of JobTracker to get the information about jobs. The proxy takes responsibility for communication with the master node.
Reading the source code of JobClient can help you. On Thu, Feb 4, 2010 at 6:59 PM, Mark N nipen.m...@gmail.com wrote: Yes, currently I am using JobClient to read these counters. But we are not able to use webservices, because the jar which is used to read the counters from a running hadoop job is itself a Hadoop program. If we could have a pure Java API which runs without the hadoop command, then we could return the counter variable to the webservice and show it in the UI. Any help or technique to show these counters in the UI would be appreciated (not necessarily using a web service). I am using webservices because I am
Re: MultipleOutputs and new 20.1 API
MultipleOutputs has been ported to the new API through http://issues.apache.org/jira/browse/MAPREDUCE-370. This change was done in branch 0.21. Thanks, Amareshwari

On 2/23/10 5:42 PM, Chris White chriswhite...@googlemail.com wrote: Does the code for MultipleOutputs work (as described in the Javadocs) with the new 0.20.1 API? This class expects a JobConf for most of its methods, and JobConf is deprecated in 0.20.1. If not, are there any plans to update MultipleOutputs to work with the new API?
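For reference, usage of the ported class looks roughly like this (a sketch against the 0.21 org.apache.hadoop.mapreduce.lib.output.MultipleOutputs; details may differ slightly in the released branch):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class MultiOutReducer
    extends Reducer<Text, LongWritable, Text, LongWritable> {

  private MultipleOutputs<Text, LongWritable> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<Text, LongWritable>(context);
  }

  @Override
  protected void reduce(Text key, Iterable<LongWritable> values,
      Context context) throws IOException, InterruptedException {
    long sum = 0;
    for (LongWritable v : values) sum += v.get();
    // write to the named output instead of the default one
    mos.write("summary", key, new LongWritable(sum));
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    mos.close(); // flush the side outputs
  }
}

// Driver side (sketch): register the named output on the Job:
// MultipleOutputs.addNamedOutput(job, "summary",
//     TextOutputFormat.class, Text.class, LongWritable.class);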
Re: How are intermediate key/value pairs materialized between map and reduce?
Hi, can you let us know the values for:
Map input records
Map spilled records
Map output bytes
Is there any side-effect file written? Thanks, Amogh

On 2/23/10 8:57 PM, Tim Kiefer tim-kie...@gmx.de wrote: No... 900GB is in the map column. Reduce adds another ~70GB of FILE_BYTES_WRITTEN, and the total column consequently shows ~970GB.

Am 23.02.2010 16:11, schrieb Ed Mazur: Hi Tim, I'm guessing a lot of these writes are happening on the reduce side. On the JT web interface, there are three columns: map, reduce, overall. Is the 900GB figure from the overall column? The value in the map column will probably be closer to what you were expecting. There are writes on the reduce side too, during the shuffle and multi-pass merge. Ed

2010/2/23 Tim Kiefer tim-kie...@gmx.de: Hi Gang, thanks for your reply. To clarify: I look at the statistics through the job tracker. In the web interface for my job I have columns for map, reduce and total. What I was referring to is map - i.e., I see FILE_BYTES_WRITTEN = 3 * Map Output Bytes in the map column. About the replication factor: I would expect the exact same thing - changing it to 6 has no influence on FILE_BYTES_WRITTEN. About the sorting: I have io.sort.mb = 100 and io.sort.factor = 10. Furthermore, I have 40 mappers and map output data is ~300GB. I can't see how that ends up in a factor of 3? - tim

Am 23.02.2010 14:39, schrieb Gang Luo: Hi Tim, the intermediate data is materialized to the local file system. Before it is available for reducers, mappers will sort it. If the buffer (io.sort.mb) is too small for the intermediate data, multi-phase sorting happens, which means you read and write the same bytes more than once. Besides, are you looking at the statistics per mapper through the job tracker, or just at the information output when a job finishes? If you look at the information given out at the end of the job, note that this is an overall statistic which includes sorting at the reduce side. It also includes the amount of data written to HDFS (I am not 100% sure). And FILE_BYTES_WRITTEN has nothing to do with the replication factor. I think if you change the factor to 6, FILE_BYTES_WRITTEN is still the same. -Gang

Hi there, can anybody help me out on a (most likely) simple question? I am wondering how intermediate key/value pairs are materialized. I have a job where the map phase produces 600,000 records and map output bytes is ~300GB. What I thought (up to now) is that these 600,000 records, i.e., 300GB, are materialized locally by the mappers, and that later on reducers pull these records (based on the key). What I see (and cannot explain) is that the FILE_BYTES_WRITTEN counter is as high as ~900GB. So - where does the factor of 3 between Map output bytes and FILE_BYTES_WRITTEN come from? I thought about the replication factor of 3 in the file system - but that should apply to HDFS only?! Thanks - tim
Re: CDH2 or Apache Hadoop
I've had a great experience with CDH2 on various platforms (Ubuntu, OpenSolaris). Worked as advertised. My 2 cents. On Tue, Feb 23, 2010 at 3:13 PM, Ananth Sarathy ananth.t.sara...@gmail.comwrote: Just wanted to get the groups general feelings on what the preferred distro is and why? Obviously assuming one didn't have a service agreement with cloudera. Ananth T Sarathy -- Dali Kilani === Twitter : @dadicool Phone : (650) 492-5921 (Google Voice) E-Fax : (775) 552-2982
Re: CDH2 or Apache Hadoop
I have been very grateful for the Cloudera Hadoop distributions. They are well-documented and current, and the Cloudera folks are quick to answer questions, both in the issue-tracking system for their customers and on the public mailing lists. Zak

On Tue, Feb 23, 2010 at 11:59 PM, Eric Sammer e...@lifeless.net wrote: On 2/23/10 6:13 PM, Ananth Sarathy wrote: "Just wanted to get the group's general feelings on what the preferred distro is and why? Obviously assuming one didn't have a service agreement with cloudera."

Ananth: CDH pros:
- Nice init scripts.
- Integration with the alternatives management system.
- Nice back-ported patches.
- Yum / Apt repository integration / easy updates.
- Frequent bug fixes released.
CDH cons:
- Filing bugs with ASF is a bit weird because you may have slightly different code due to patches. This can be mitigated by filing bugs with Cloudera and letting them go upstream.
I'm not entirely neutral, but CDH has little in the way of cons to me. Hope that helps. -- Eric Sammer e...@lifeless.net http://esammer.blogspot.com
Re: How are intermediate key/value pairs materialized between map and reduce?
Sure, I see:
Map input records: 10,000
Map output records: 600,000
Map output bytes: 307,216,800,000 (each record is about 500KB - that fits the application and is to be expected)
Map spilled records: 1,802,965 (ahhh... now that you ask for it - here there is also a factor of 3 between output and spilled records)

So - the question now is: why are three times as many records spilled as are actually produced by the mappers? In my map function, I do not perform any additional file writing besides the context.write() for the intermediate records. Thanks, Tim

Am 24.02.2010 05:28, schrieb Amogh Vasekar: Hi, can you let us know the values for: Map input records, Map spilled records, Map output bytes. Is there any side-effect file written? Thanks, Amogh

On 2/23/10 8:57 PM, Tim Kiefer tim-kie...@gmx.de wrote: No... 900GB is in the map column. Reduce adds another ~70GB of FILE_BYTES_WRITTEN, and the total column consequently shows ~970GB.

Am 23.02.2010 16:11, schrieb Ed Mazur: Hi Tim, I'm guessing a lot of these writes are happening on the reduce side. On the JT web interface, there are three columns: map, reduce, overall. Is the 900GB figure from the overall column? The value in the map column will probably be closer to what you were expecting. There are writes on the reduce side too, during the shuffle and multi-pass merge. Ed

2010/2/23 Tim Kiefer tim-kie...@gmx.de: Hi Gang, thanks for your reply. To clarify: I look at the statistics through the job tracker. In the web interface for my job I have columns for map, reduce and total. What I was referring to is map - i.e., I see FILE_BYTES_WRITTEN = 3 * Map Output Bytes in the map column. About the replication factor: I would expect the exact same thing - changing it to 6 has no influence on FILE_BYTES_WRITTEN. About the sorting: I have io.sort.mb = 100 and io.sort.factor = 10. Furthermore, I have 40 mappers and map output data is ~300GB. I can't see how that ends up in a factor of 3? - tim

Am 23.02.2010 14:39, schrieb Gang Luo: Hi Tim, the intermediate data is materialized to the local file system. Before it is available for reducers, mappers will sort it. If the buffer (io.sort.mb) is too small for the intermediate data, multi-phase sorting happens, which means you read and write the same bytes more than once. Besides, are you looking at the statistics per mapper through the job tracker, or just at the information output when a job finishes? If you look at the information given out at the end of the job, note that this is an overall statistic which includes sorting at the reduce side. It also includes the amount of data written to HDFS (I am not 100% sure). And FILE_BYTES_WRITTEN has nothing to do with the replication factor. I think if you change the factor to 6, FILE_BYTES_WRITTEN is still the same. -Gang

Hi there, can anybody help me out on a (most likely) simple question? I am wondering how intermediate key/value pairs are materialized. I have a job where the map phase produces 600,000 records and map output bytes is ~300GB. What I thought (up to now) is that these 600,000 records, i.e., 300GB, are materialized locally by the mappers, and that later on reducers pull these records (based on the key). What I see (and cannot explain) is that the FILE_BYTES_WRITTEN counter is as high as ~900GB. So - where does the factor of 3 between Map output bytes and FILE_BYTES_WRITTEN come from? I thought about the replication factor of 3 in the file system - but that should apply to HDFS only?! Thanks - tim
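A back-of-the-envelope reading of those numbers, offered as an assumption rather than a confirmed diagnosis: 300 GB over 40 map tasks is about 7.5 GB of map output per task. With io.sort.mb = 100, each task would produce roughly 7,500 MB / 100 MB ≈ 75 sorted spill files, and with io.sort.factor = 10 those cannot be merged in a single pass, so an intermediate merge round is needed. Each record would then be written about three times: once at spill time, once in the intermediate merge, and once into the final map output file. That would line up with both FILE_BYTES_WRITTEN ≈ 3 * Map output bytes and Map spilled records ≈ 3 * Map output records (1,802,965 ≈ 3 * 600,000).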