Re: rack awareness unexpected behaviour
And that's the rub. Rack awareness is an artificial construct. If you want to fix it and match the real world, you need to balance the racks physically. Otherwise you need to rewrite load balancing to take into consideration the number and power of the nodes in the rack. The short answer: it's easier to fudge the values in the script.

Sent from a remote device. Please excuse any typos...
Mike Segel

On Oct 3, 2013, at 8:58 AM, Marc Sturlese marc.sturl...@gmail.com wrote:

Doing that will balance the block writing, but I think you then lose the concept of physical rack awareness. Let's say you have 2 physical racks, one with 2 servers and one with 4. If you artificially tell Hadoop that one rack has 3 servers and the other 3, you are losing the concept of rack awareness. You're not guaranteeing that each physical rack contains at least a replica of each block. So if you have 2 racks with different numbers of servers, it's not possible to do proper rack awareness without filling the disks of the rack with fewer servers first. Am I right or am I missing something?

--
View this message in context: http://lucene.472066.n3.nabble.com/rack-awareness-unexpected-behaviour-tp4086029p4093337.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
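[Editor's note] The "script" being fudged here is the rack-topology script that Hadoop invokes via the topology.script.file.name property: it receives one or more hostnames/IPs as arguments and must print a rack path for each. A minimal sketch of the "fudging" idea, with invented hostnames and virtual rack names, could look like this:

```python
#!/usr/bin/env python
# Hypothetical rack-topology script. Hadoop passes host/IP arguments
# and expects one rack path per argument on stdout. The mapping below
# is invented for illustration: it spreads 6 nodes across two *virtual*
# racks to balance writes, regardless of physical placement.
import sys

RACK_MAP = {
    "node1": "/virtual-rack1",
    "node2": "/virtual-rack1",
    "node3": "/virtual-rack1",
    "node4": "/virtual-rack2",
    "node5": "/virtual-rack2",
    "node6": "/virtual-rack2",
}

def resolve(host):
    # Hosts not in the map fall back to the default rack.
    return RACK_MAP.get(host, "/default-rack")

if __name__ == "__main__":
    print(" ".join(resolve(h) for h in sys.argv[1:]))
```

As the thread notes, this balances writes but sacrifices the guarantee that each physical rack holds a replica.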
Re: rack awareness unexpected behaviour
Rack awareness is an artificial concept, meaning you can define where a node is regardless of its real position in the rack. Going from memory, and it's probably been changed in later versions of the code... Isn't the replication: copy on node 1, copy on same rack, third copy on different rack? Or has this been improved upon?

Sent from a remote device. Please excuse any typos...
Mike Segel

On Aug 22, 2013, at 5:14 AM, Harsh J ha...@cloudera.com wrote:

I'm not aware of a bug in 0.20.2 that would not honor the rack awareness, but have you done the two checks below as well?

1. Ensuring the JT has the same rack awareness scripts and configuration so it can use them for scheduling, and,
2. Checking if the map and reduce tasks are being evenly spread across both racks.

On Thu, Aug 22, 2013 at 2:50 PM, Marc Sturlese marc.sturl...@gmail.com wrote:

I'm on cdh3u4 (0.20.2), gonna try to read a bit on this bug.

--
View this message in context: http://lucene.472066.n3.nabble.com/rack-awareness-unexpected-behaviour-tp4086029p4086049.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.

--
Harsh J
Re: DN cannot talk to NN using Kerberos on secured hdfs
That should be a bug. All host names should be case insensitive.

Sent from a remote device. Please excuse any typos...
Mike Segel

On Sep 12, 2012, at 12:25 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote:

This is because Java only supports AES 128 by default. To support AES 256, you will need to install the unlimited-JCE policy jar from http://www.oracle.com/technetwork/java/javase/downloads/index.html

Also, there is another case of Kerberos having issues with hostnames with some/all letters in caps. If that is the case, you should try tweaking your host-names to all lower-case.

Thanks,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On Sep 12, 2012, at 9:47 AM, Shumin Wu wrote:

Hi,

I am setting up a secured HDFS using Kerberos. I got the NN and 2NN working just fine. However, the DN cannot talk to the NN and throws the following exception. I disabled AES256 in the keytab, which in theory should fall back to AES128, or whatever encryption type is at the top of the list, but it still complains about the same thing. Any help, suggestion, or comment is highly appreciated.

Apache Hadoop version: 2.0.0

Security configuration snippet of DN:

...
<property>
  <name>dfs.datanode.data.dir.perm</name>
  <value>700</value>
</property>
<property>
  <name>dfs.datanode.address</name>
  <value>0.0.0.0:1004</value>
</property>
<property>
  <name>dfs.datanode.http.address</name>
  <value>0.0.0.0:1006</value>
</property>
<property>
  <name>dfs.datanode.keytab.file</name>
  <value>/etc/hadoop/conf/hdfs.keytab</value>
</property>
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>hdfs/_HOST@REALM</value>
</property>
...

Exceptions in log:

javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: Failure unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled)]
    at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:159)
    at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1199)
    at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1393)
    at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:710)
    at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:509)
    at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:484)
Caused by: GSSException: Failure unspecified at GSS-API level (Mechanism level: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled)
    at sun.security.jgss.krb5.Krb5Context.acceptSecContext(Krb5Context.java:741)
    at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:323)
    at sun.security.jgss.GSSContextImpl.acceptSecContext(GSSContextImpl.java:267)
    at com.sun.security.sasl.gsskerb.GssKrb5Server.evaluateResponse(GssKrb5Server.java:137)
    ... 5 more
Caused by: KrbException: Encryption type AES256 CTS mode with HMAC SHA1-96 is not supported/enabled

Thanks,
Shumin Wu
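[Editor's note] Besides installing the unlimited-JCE policy files, the other common fix is to keep AES256 out of the picture entirely by restricting the enctypes Kerberos negotiates. A hedged krb5.conf sketch (whether this is acceptable depends on your KDC policy, and the keytabs must also be regenerated without aes256 keys):

```
[libdefaults]
    default_tkt_enctypes = aes128-cts-hmac-sha1-96 arcfour-hmac
    default_tgs_enctypes = aes128-cts-hmac-sha1-96 arcfour-hmac
    permitted_enctypes   = aes128-cts-hmac-sha1-96 arcfour-hmac
```

With only JRE-default-supported enctypes permitted on both sides, the GssKrb5Server handshake above no longer selects AES256.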
Re: Hive error when loading csv data.
Yup. I just didn't add the quotes.

Sent from a remote device. Please excuse any typos...
Mike Segel

On Jun 26, 2012, at 4:30 PM, Sandeep Reddy P sandeepreddy.3...@gmail.com wrote:

Thanks for the reply. I didn't get that, Michael. My f2 should be abc,def

On Tue, Jun 26, 2012 at 4:00 PM, Michael Segel michael_se...@hotmail.com wrote:

Alternatively you could write a simple script to convert the csv to a pipe delimited file so that "abc,def" will be abc,def.

On Jun 26, 2012, at 2:51 PM, Harsh J wrote:

Hive's delimited-fields-format record reader does not handle quoted text that carries the same delimiter within it. Excel supports such records, so it reads the file fine. You will need to create your table with a custom InputFormat class that can handle this (try using OpenCSV readers, they support this), instead of relying on Hive to do it for you. If you're successful in your approach, please also consider contributing something back to Hive/Pig to help others.

On Wed, Jun 27, 2012 at 12:37 AM, Sandeep Reddy P sandeepreddy.3...@gmail.com wrote:

Hi all,

I have a csv file with 46 columns but I'm getting an error when I do some analysis on that data. For simplification I have taken 3 columns and now my csv is like:

c,zxy,xyz
d,"abc,def",abcd

I have created a table for this data using:

hive> create table test3 (f1 string, f2 string, f3 string) row format delimited fields terminated by ',';
OK
Time taken: 0.143 seconds
hive> load data local inpath '/home/training/a.csv' into table test3;
Copying data from file:/home/training/a.csv
Copying file: file:/home/training/a.csv
Loading data to table default.test3
OK
Time taken: 0.276 seconds
hive> select * from test3;
OK
c    zxy    xyz
d    "abc    def"
Time taken: 0.156 seconds

When I do select f2 from test3; my results are:

OK
zxy
"abc

but this should be abc,def. When I open the same csv file with Microsoft Excel I get abc,def. How should I solve this error?

--
Thanks,
sandeep

--
Harsh J

--
Thanks,
sandeep
Re: Hive error when loading csv data.
What I am suggesting is to write a simple script, maybe using Python, where you replace the commas that are used as the field delimiter.

Sent from a remote device. Please excuse any typos...
Mike Segel

On Jun 26, 2012, at 8:58 PM, Sandeep Reddy P sandeepreddy.3...@gmail.com wrote:

If I do that my data will be d|abc|def|abcd; my problem is not solved.

On Tue, Jun 26, 2012 at 6:48 PM, Michel Segel michael_se...@hotmail.com wrote:

Yup. I just didn't add the quotes.

Sent from a remote device. Please excuse any typos...
Mike Segel

On Jun 26, 2012, at 4:30 PM, Sandeep Reddy P sandeepreddy.3...@gmail.com wrote:

Thanks for the reply. I didn't get that, Michael. My f2 should be abc,def

On Tue, Jun 26, 2012 at 4:00 PM, Michael Segel michael_se...@hotmail.com wrote:

Alternatively you could write a simple script to convert the csv to a pipe delimited file so that "abc,def" will be abc,def.

On Jun 26, 2012, at 2:51 PM, Harsh J wrote:

Hive's delimited-fields-format record reader does not handle quoted text that carries the same delimiter within it. Excel supports such records, so it reads the file fine. You will need to create your table with a custom InputFormat class that can handle this (try using OpenCSV readers, they support this), instead of relying on Hive to do it for you. If you're successful in your approach, please also consider contributing something back to Hive/Pig to help others.

On Wed, Jun 27, 2012 at 12:37 AM, Sandeep Reddy P sandeepreddy.3...@gmail.com wrote:

Hi all,

I have a csv file with 46 columns but I'm getting an error when I do some analysis on that data. For simplification I have taken 3 columns and now my csv is like:

c,zxy,xyz
d,"abc,def",abcd

I have created a table for this data using:

hive> create table test3 (f1 string, f2 string, f3 string) row format delimited fields terminated by ',';
OK
Time taken: 0.143 seconds
hive> load data local inpath '/home/training/a.csv' into table test3;
Copying data from file:/home/training/a.csv
Copying file: file:/home/training/a.csv
Loading data to table default.test3
OK
Time taken: 0.276 seconds
hive> select * from test3;
OK
c    zxy    xyz
d    "abc    def"
Time taken: 0.156 seconds

When I do select f2 from test3; my results are:

OK
zxy
"abc

but this should be abc,def. When I open the same csv file with Microsoft Excel I get abc,def. How should I solve this error?

--
Thanks,
sandeep

--
Harsh J

--
Thanks,
sandeep

--
Thanks,
sandeep
Re: Hive error when loading csv data.
Sorry, I was saying that you can write a Python script that replaces the delimiter with a | and ignores the commas within quotes.

Sent from a remote device. Please excuse any typos...
Mike Segel

On Jun 26, 2012, at 8:58 PM, Sandeep Reddy P sandeepreddy.3...@gmail.com wrote:

If I do that my data will be d|abc|def|abcd; my problem is not solved.

On Tue, Jun 26, 2012 at 6:48 PM, Michel Segel michael_se...@hotmail.com wrote:

Yup. I just didn't add the quotes.

Sent from a remote device. Please excuse any typos...
Mike Segel

On Jun 26, 2012, at 4:30 PM, Sandeep Reddy P sandeepreddy.3...@gmail.com wrote:

Thanks for the reply. I didn't get that, Michael. My f2 should be abc,def

On Tue, Jun 26, 2012 at 4:00 PM, Michael Segel michael_se...@hotmail.com wrote:

Alternatively you could write a simple script to convert the csv to a pipe delimited file so that "abc,def" will be abc,def.

On Jun 26, 2012, at 2:51 PM, Harsh J wrote:

Hive's delimited-fields-format record reader does not handle quoted text that carries the same delimiter within it. Excel supports such records, so it reads the file fine. You will need to create your table with a custom InputFormat class that can handle this (try using OpenCSV readers, they support this), instead of relying on Hive to do it for you. If you're successful in your approach, please also consider contributing something back to Hive/Pig to help others.

On Wed, Jun 27, 2012 at 12:37 AM, Sandeep Reddy P sandeepreddy.3...@gmail.com wrote:

Hi all,

I have a csv file with 46 columns but I'm getting an error when I do some analysis on that data. For simplification I have taken 3 columns and now my csv is like:

c,zxy,xyz
d,"abc,def",abcd

I have created a table for this data using:

hive> create table test3 (f1 string, f2 string, f3 string) row format delimited fields terminated by ',';
OK
Time taken: 0.143 seconds
hive> load data local inpath '/home/training/a.csv' into table test3;
Copying data from file:/home/training/a.csv
Copying file: file:/home/training/a.csv
Loading data to table default.test3
OK
Time taken: 0.276 seconds
hive> select * from test3;
OK
c    zxy    xyz
d    "abc    def"
Time taken: 0.156 seconds

When I do select f2 from test3; my results are:

OK
zxy
"abc

but this should be abc,def. When I open the same csv file with Microsoft Excel I get abc,def. How should I solve this error?

--
Thanks,
sandeep

--
Harsh J

--
Thanks,
sandeep

--
Thanks,
sandeep
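[Editor's note] The script Mike describes could be a few lines of Python: the standard csv module already honors quoted fields, so a comma inside "abc,def" survives the re-delimiting. A minimal sketch (the sample row assumes the quotes that the plain-text archive stripped):

```python
import csv
import io

def csv_to_pipe(infile, outfile):
    # csv.reader honors quoted fields, so "abc,def" stays one field;
    # csv.writer re-emits each row with | as the delimiter.
    reader = csv.reader(infile)
    writer = csv.writer(outfile, delimiter="|", lineterminator="\n")
    for row in reader:
        writer.writerow(row)

# The row from the thread, quotes restored:
src = io.StringIO('d,"abc,def",abcd\n')
dst = io.StringIO()
csv_to_pipe(src, dst)
print(dst.getvalue())  # d|abc,def|abcd
```

With files, the same function works on open handles; the resulting pipe-delimited file loads cleanly into a Hive table declared with fields terminated by '|'.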
Re: How to mapreduce in the scenario
Hive? Sure. Assuming you mean that the id is a FK common amongst the tables...

Sent from a remote device. Please excuse any typos...
Mike Segel

On May 29, 2012, at 5:29 AM, liuzhg liu...@cernet.com wrote:

Hi,

I wonder if Hadoop can solve the following question effectively:

input files: a.txt, b.txt
result: c.txt

a.txt:
id1,name1,age1,...
id2,name2,age2,...
id3,name3,age3,...
id4,name4,age4,...

b.txt:
id1,address1,...
id2,address2,...
id3,address3,...

c.txt:
id1,name1,age1,address1,...
id2,name2,age2,address2,...

I know that it can be done well by a database. But I want to handle it with Hadoop if possible. Can Hadoop meet the requirement? Any suggestion can help me. Thank you very much!

Best Regards,
Gump
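[Editor's note] This is the classic reduce-side join on the id key. A single-process Python sketch of the join logic (in actual MapReduce, the map phase would tag each record with its source file and the reduce would combine the records that share an id):

```python
def join_by_id(a_lines, b_lines):
    # Index b.txt by id, then append its fields to matching a.txt rows.
    # Ids present only in a.txt are dropped, like an inner join.
    b_index = {}
    for line in b_lines:
        fields = line.strip().split(",")
        b_index[fields[0]] = fields[1:]
    out = []
    for line in a_lines:
        fields = line.strip().split(",")
        if fields[0] in b_index:
            out.append(",".join(fields + b_index[fields[0]]))
    return out

a = ["id1,name1,age1", "id2,name2,age2", "id4,name4,age4"]
b = ["id1,address1", "id2,address2"]
print(join_by_id(a, b))
# ['id1,name1,age1,address1', 'id2,name2,age2,address2']
```

As Mike notes, a Hive join (or Pig COGROUP) expresses the same thing declaratively: load a.txt and b.txt as two tables and join on the id column.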
Re: How to Integrate LDAP in Hadoop ?
Which release? Version? I believe there are variables in the *-site.xml that allow LDAP integration...

Sent from a remote device. Please excuse any typos...
Mike Segel

On May 26, 2012, at 7:40 AM, samir das mohapatra samir.help...@gmail.com wrote:

Hi All,

Did any one work on Hadoop with LDAP integration? Please help me with the same.

Thanks
samir
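[Editor's note] For releases that ship org.apache.hadoop.security.LdapGroupsMapping (the Hadoop 2.x line), the core-site.xml knobs Mike alludes to look roughly like the sketch below. The LDAP server hostname and DNs are invented placeholders; check the properties against your exact version:

```
<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.LdapGroupsMapping</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.url</name>
  <value>ldap://ldap.example.com:389</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.bind.user</name>
  <value>cn=hadoop,ou=services,dc=example,dc=com</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.base</name>
  <value>dc=example,dc=com</value>
</property>
```

This maps users to groups via LDAP for authorization; authentication itself is a separate concern (typically Kerberos, with the KDC optionally backed by the same directory).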
Re: Reduce Hangs at 66%
Well... Lots of information, but still missing some of the basics... Which release and version? What are your ulimits set to? How much free disk space do you have? What are you attempting to do? Stuff like that.

Sent from a remote device. Please excuse any typos...
Mike Segel

On May 2, 2012, at 4:49 PM, Keith Thompson kthom...@binghamton.edu wrote:

I am running a task which gets to 66% of the Reduce step and then hangs indefinitely. Here is the log file (I apologize if I am putting too much here but I am not exactly sure what is relevant):

2012-05-02 16:42:52,975 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201202240659_6433_r_00_0' to tip task_201202240659_6433_r_00, for tracker 'tracker_analytix7:localhost.localdomain/127.0.0.1:56515'
2012-05-02 16:42:53,584 INFO org.apache.hadoop.mapred.JobInProgress: Task 'attempt_201202240659_6433_m_01_0' has completed task_201202240659_6433_m_01 successfully.
2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201202240659_6432_r_00_0: Task attempt_201202240659_6432_r_00_0 failed to report status for 1800 seconds. Killing!
2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201202240659_6432_r_00_0'
2012-05-02 17:00:47,546 INFO org.apache.hadoop.mapred.JobTracker: Adding task (TASK_CLEANUP) 'attempt_201202240659_6432_r_00_0' to tip task_201202240659_6432_r_00, for tracker 'tracker_analytix4:localhost.localdomain/127.0.0.1:44204'
2012-05-02 17:00:48,763 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201202240659_6432_r_00_0'
2012-05-02 17:00:48,957 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201202240659_6432_r_00_1' to tip task_201202240659_6432_r_00, for tracker 'tracker_analytix5:localhost.localdomain/127.0.0.1:59117'
2012-05-02 17:00:56,559 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201202240659_6432_r_00_1: java.io.IOException: The temporary job-output directory hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary doesn't exist!
    at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
    at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:240)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:438)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
2012-05-02 17:00:59,903 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201202240659_6432_r_00_1'
2012-05-02 17:00:59,906 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201202240659_6432_r_00_2' to tip task_201202240659_6432_r_00, for tracker 'tracker_analytix3:localhost.localdomain/127.0.0.1:39980'
2012-05-02 17:01:07,200 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201202240659_6432_r_00_2: java.io.IOException: The temporary job-output directory hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary doesn't exist!
    at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
    at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:240)
    at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:438)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
2012-05-02 17:01:10,239 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201202240659_6432_r_00_2'
2012-05-02 17:01:10,283 INFO org.apache.hadoop.mapred.JobTracker: Adding task (REDUCE) 'attempt_201202240659_6432_r_00_3' to tip task_201202240659_6432_r_00, for tracker 'tracker_analytix2:localhost.localdomain/127.0.0.1:33297'
2012-05-02 17:01:18,188 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201202240659_6432_r_00_3: java.io.IOException: The temporary job-output directory hdfs://analytix1:9000/thompson/outputDensity/density1/_temporary doesn't exist!
    at
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Well, you've kind of painted yourself into a corner... Not sure why you didn't get a response from the Cloudera lists, but it's a generic question... 8 out of 10 TB: are you talking effective storage or actual disks? And please tell me you've already ordered more hardware... right? And please tell me this isn't your production cluster... (Strong hint to Strata and Cloudera... You really want to accept my upcoming proposal talk... ;-)

Sent from a remote device. Please excuse any typos...
Mike Segel

On May 3, 2012, at 5:25 AM, Austin Chungath austi...@gmail.com wrote:

Yes, this was first posted on the Cloudera mailing list. There were no responses. But this is not related to Cloudera as such; cdh3 uses apache hadoop 0.20 as the base. My data is in apache hadoop 0.20.205. There is an upgrade namenode option when we are migrating to a higher version, say from 0.20 to 0.20.205, but here I am downgrading from 0.20.205 to 0.20 (cdh3). Is this possible?

On Thu, May 3, 2012 at 3:25 PM, Prashant Kommireddi prash1...@gmail.com wrote:

Seems like a matter of upgrade. I am not a Cloudera user so would not know much, but you might find some help moving this to the Cloudera mailing list.

On Thu, May 3, 2012 at 2:51 AM, Austin Chungath austi...@gmail.com wrote:

There is only one cluster. I am not copying between clusters. Say I have a cluster running apache 0.20.205 with 10 TB storage capacity that has about 8 TB of data. Now how can I migrate the same cluster to use cdh3 and use that same 8 TB of data? I can't copy 8 TB of data using distcp because I have only 2 TB of free space.

On Thu, May 3, 2012 at 3:12 PM, Nitin Pawar nitinpawar...@gmail.com wrote:

You can actually look at distcp (http://hadoop.apache.org/common/docs/r0.20.0/distcp.html), but this means that you have two different sets of clusters available to do the migration.

On Thu, May 3, 2012 at 12:51 PM, Austin Chungath austi...@gmail.com wrote:

Thanks for the suggestions. My concern is that I can't actually copyToLocal from the dfs because the data is huge. Say if my hadoop was 0.20 and I am upgrading to 0.20.205, I can do a namenode upgrade; I don't have to copy data out of dfs. But here I have Apache hadoop 0.20.205 and I want to use CDH3, which is based on 0.20. Now it is actually a downgrade, as 0.20.205's namenode info has to be used by 0.20's namenode. Any idea how I can achieve what I am trying to do? Thanks.

On Thu, May 3, 2012 at 12:23 PM, Nitin Pawar nitinpawar...@gmail.com wrote:

I can think of the following options:
1) Write simple get and put code which gets the data from DFS and loads it into dfs.
2) See if distcp between both versions is compatible.
3) This is what I had done (and my data was hardly a few hundred GB): did a dfs -copyToLocal and then in the new grid did a copyFromLocal.

On Thu, May 3, 2012 at 11:41 AM, Austin Chungath austi...@gmail.com wrote:

Hi,

I am migrating from Apache hadoop 0.20.205 to CDH3u3. I don't want to lose the data that is in the HDFS of Apache hadoop 0.20.205. How do I migrate to CDH3u3 but keep the data that I have on 0.20.205? What is the best practice/technique to do this?

Thanks,
Regards,
Austin

--
Nitin Pawar

--
Nitin Pawar
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Ok... When you get your new hardware... Set up one server as your new NN, JT, SN. Set up the others as DNs. (Cloudera CDH3u3.) On your existing cluster... remove your old log files, temp files on HDFS, anything you don't need. This should give you some more space. Start copying some of the directories/files to the new cluster. As you gain space, decommission a node, rebalance, add the node to the new cluster... It's a slow process.

Should I remind you to make sure you up your bandwidth setting, and to clean up the hdfs directories when you repurpose the nodes? Does this make sense?

Sent from a remote device. Please excuse any typos...
Mike Segel

On May 3, 2012, at 5:46 AM, Austin Chungath austi...@gmail.com wrote:

Yeah I know :-) and this is not a production cluster ;-) and yes there is more hardware coming :-)

On Thu, May 3, 2012 at 4:10 PM, Michel Segel michael_se...@hotmail.com wrote:

Well, you've kind of painted yourself into a corner... Not sure why you didn't get a response from the Cloudera lists, but it's a generic question... 8 out of 10 TB: are you talking effective storage or actual disks? And please tell me you've already ordered more hardware... right? And please tell me this isn't your production cluster... (Strong hint to Strata and Cloudera... You really want to accept my upcoming proposal talk... ;-)

Sent from a remote device. Please excuse any typos...
Mike Segel

On May 3, 2012, at 5:25 AM, Austin Chungath austi...@gmail.com wrote:

Yes, this was first posted on the Cloudera mailing list. There were no responses. But this is not related to Cloudera as such; cdh3 uses apache hadoop 0.20 as the base. My data is in apache hadoop 0.20.205. There is an upgrade namenode option when we are migrating to a higher version, say from 0.20 to 0.20.205, but here I am downgrading from 0.20.205 to 0.20 (cdh3). Is this possible?

On Thu, May 3, 2012 at 3:25 PM, Prashant Kommireddi prash1...@gmail.com wrote:

Seems like a matter of upgrade. I am not a Cloudera user so would not know much, but you might find some help moving this to the Cloudera mailing list.

On Thu, May 3, 2012 at 2:51 AM, Austin Chungath austi...@gmail.com wrote:

There is only one cluster. I am not copying between clusters. Say I have a cluster running apache 0.20.205 with 10 TB storage capacity that has about 8 TB of data. Now how can I migrate the same cluster to use cdh3 and use that same 8 TB of data? I can't copy 8 TB of data using distcp because I have only 2 TB of free space.

On Thu, May 3, 2012 at 3:12 PM, Nitin Pawar nitinpawar...@gmail.com wrote:

You can actually look at distcp (http://hadoop.apache.org/common/docs/r0.20.0/distcp.html), but this means that you have two different sets of clusters available to do the migration.

On Thu, May 3, 2012 at 12:51 PM, Austin Chungath austi...@gmail.com wrote:

Thanks for the suggestions. My concern is that I can't actually copyToLocal from the dfs because the data is huge. Say if my hadoop was 0.20 and I am upgrading to 0.20.205, I can do a namenode upgrade; I don't have to copy data out of dfs. But here I have Apache hadoop 0.20.205 and I want to use CDH3, which is based on 0.20. Now it is actually a downgrade, as 0.20.205's namenode info has to be used by 0.20's namenode. Any idea how I can achieve what I am trying to do? Thanks.

On Thu, May 3, 2012 at 12:23 PM, Nitin Pawar nitinpawar...@gmail.com wrote:

I can think of the following options:
1) Write simple get and put code which gets the data from DFS and loads it into dfs.
2) See if distcp between both versions is compatible.
3) This is what I had done (and my data was hardly a few hundred GB): did a dfs -copyToLocal and then in the new grid did a copyFromLocal.

On Thu, May 3, 2012 at 11:41 AM, Austin Chungath austi...@gmail.com wrote:

Hi,

I am migrating from Apache hadoop 0.20.205 to CDH3u3. I don't want to lose the data that is in the HDFS of Apache hadoop 0.20.205. How do I migrate to CDH3u3 but keep the data that I have on 0.20.205? What is the best practice/technique to do this?

Thanks,
Regards,
Austin

--
Nitin Pawar

--
Nitin Pawar
Re: Best practice to migrate HDFS from 0.20.205 to CDH3u3
Ok... So riddle me this... I currently have a replication factor of 3. I reset it to two. What do you have to do to get the replication factor of 3 down to 2? Do I just try to rebalance the nodes? The point is that you are looking at a very small cluster. You may want to start the be cluster with a replication factor of 2 and then when the data is moved over, increase it to a factor of 3. Or maybe not. I do a distcp to. Copy the data and after each distcp, I do an fsck for a sanity check and then remove the files I copied. As I gain more room, I can then slowly drop nodes, do an fsck, rebalance and then repeat. Even though this us a dev cluster, the OP wants to retain the data. There are other options depending on the amount and size of new hardware. I mean make one machine a RAID 5 machine, copy data to it clearing off the cluster. If 8TB was the amount of disk used, that would be 2. TB used. Let's say 3TB. Going raid 5, how much disk is that? So you could fit it on one machine, depending on hardware, or maybe 2 machines... Now you can rebuild initial cluster and then move data back. Then rebuild those machines. Lots of options... ;-) Sent from a remote device. Please excuse any typos... Mike Segel On May 3, 2012, at 11:26 AM, Suresh Srinivas sur...@hortonworks.com wrote: This probably is a more relevant question in CDH mailing lists. That said, what Edward is suggesting seems reasonable. Reduce replication factor, decommission some of the nodes and create a new cluster with those nodes and do distcp. Could you share with us the reasons you want to migrate from Apache 205? Regards, Suresh On Thu, May 3, 2012 at 8:25 AM, Edward Capriolo edlinuxg...@gmail.comwrote: Honestly that is a hassle, going from 205 to cdh3u3 is probably more or a cross-grade then an upgrade or downgrade. I would just stick it out. But yes like Michael said two clusters on the same gear and distcp. 
If you are using RF=3 you could also lower your replication to rf=2 'hadoop dfs -setrepl 2' to clear headroom as you are moving stuff. On Thu, May 3, 2012 at 7:25 AM, Michel Segel michael_se...@hotmail.com wrote: Ok... When you get your new hardware... Set up one server as your new NN, JT, SN. Set up the others as a DN. (Cloudera CDH3u3) On your existing cluster... Remove your old log files, temp files on HDFS anything you don't need. This should give you some more space. Start copying some of the directories/files to the new cluster. As you gain space, decommission a node, rebalance, add node to new cluster... It's a slow process. Should I remind you to make sure you up you bandwidth setting, and to clean up the hdfs directories when you repurpose the nodes? Does this make sense? Sent from a remote device. Please excuse any typos... Mike Segel On May 3, 2012, at 5:46 AM, Austin Chungath austi...@gmail.com wrote: Yeah I know :-) and this is not a production cluster ;-) and yes there is more hardware coming :-) On Thu, May 3, 2012 at 4:10 PM, Michel Segel michael_se...@hotmail.com wrote: Well, you've kind of painted yourself in to a corner... Not sure why you didn't get a response from the Cloudera lists, but it's a generic question... 8 out of 10 TB. Are you talking effective storage or actual disks? And please tell me you've already ordered more hardware.. Right? And please tell me this isn't your production cluster... (Strong hint to Strata and Cloudea... You really want to accept my upcoming proposal talk... ;-) Sent from a remote device. Please excuse any typos... Mike Segel On May 3, 2012, at 5:25 AM, Austin Chungath austi...@gmail.com wrote: Yes. This was first posted on the cloudera mailing list. There were no responses. But this is not related to cloudera as such. cdh3 is based on apache hadoop 0.20 as the base. 
My data is in apache hadoop 0.20.205 There is an upgrade namenode option when we are migrating to a higher version say from 0.20 to 0.20.205 but here I am downgrading from 0.20.205 to 0.20 (cdh3) Is this possible? On Thu, May 3, 2012 at 3:25 PM, Prashant Kommireddi prash1...@gmail.com wrote: Seems like a matter of upgrade. I am not a Cloudera user so would not know much, but you might find some help moving this to Cloudera mailing list. On Thu, May 3, 2012 at 2:51 AM, Austin Chungath austi...@gmail.com wrote: There is only one cluster. I am not copying between clusters. Say I have a cluster running apache 0.20.205 with 10 TB storage capacity and has about 8 TB of data. Now how can I migrate the same cluster to use cdh3 and use that same 8 TB of data. I can't copy 8 TB of data using distcp because I have only 2 TB of free space On Thu, May 3, 2012 at 3:12 PM, Nitin Pawar nitinpawar...@gmail.com wrote: you can actually look at the distcp http://hadoop.apache.org/common/docs/r0.20.0/distcp.html but this means that you have two different
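For what it's worth, the moving parts discussed in this thread (freeing headroom by lowering replication, copying between clusters whose RPC versions don't match, verifying, rebalancing) map onto a handful of shell commands. A hedged sketch; host names and paths are made up:

```shell
# Free headroom on the old cluster: drop replication to 2 (the command is -setrep)
hadoop dfs -setrep -R 2 /user/data

# Copy across versions: hftp is the read-only HTTP filesystem, so the RPC
# mismatch between 0.20.205 and CDH3 doesn't matter. Run this on the
# destination (CDH3) cluster, which only reads from the old NN's web port.
hadoop distcp hftp://old-namenode:50070/user/data hdfs://new-namenode:8020/user/data

# Sanity-check what landed, then rebalance after decommissioning a node
hadoop fsck /user/data
hadoop balancer -threshold 5
```

Running distcp on the destination side is the usual pattern for cross-version copies, since hftp cannot be written to.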
Re: Accessing global Counters
Actually it's easier to use dynamic counters... Sent from a remote device. Please excuse any typos... Mike Segel On Apr 20, 2012, at 11:36 AM, Amith D K amit...@huawei.com wrote: Yes, you can use a user-defined counter as Jagat suggested. The counter can be an enum as Jagat described, or any string, which are called dynamic counters. It is easier to use an enum counter than dynamic counters; finally it depends on your use case :) Amith From: Jagat [jagatsi...@gmail.com] Sent: Saturday, April 21, 2012 12:25 AM To: common-user@hadoop.apache.org Subject: Re: Accessing global Counters Hi You can create your own counters like enum CountFruits { Apple, Mango, Banana } And in your mapper class, when you hit the condition to increment, you can use the Reporter incrCounter method to do the same. http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Reporter.html#incrCounter(java.lang.Enum,%20long) e.g. // I saw an Apple; increment it by one reporter.incrCounter(CountFruits.Apple, 1); Now you can access them using job.getCounters http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Job.html#getCounters() Hope this helps Regards, Jagat Singh On Fri, Apr 20, 2012 at 9:43 PM, Gayatri Rao rgayat...@gmail.com wrote: Hi All, Is there a way for me to set global counters in the Mapper and access them from the reducer? Could you suggest how I can achieve this? Thanks Gayatri
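The "dynamic counters" Mike mentions are just counters keyed by arbitrary strings rather than a predeclared enum; in a real mapper you would call reporter.incrCounter("Fruits", name, 1) (old API) or context.getCounter("Fruits", name).increment(1) (new API). A plain-Java sketch of the idea, with the group/name strings purely illustrative:

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration of the "dynamic counter" idea: counters keyed by
// arbitrary strings, no enum declared up front. In a mapper the equivalent
// calls are reporter.incrCounter(group, name, 1) (old API) or
// context.getCounter(group, name).increment(1) (new API).
public class DynamicCounterDemo {
    private final Map<String, Long> counters = new HashMap<String, Long>();

    public void incrCounter(String group, String name, long amount) {
        String key = group + ":" + name;
        Long old = counters.get(key);
        counters.put(key, (old == null ? 0L : old) + amount);
    }

    public long getCounter(String group, String name) {
        Long v = counters.get(group + ":" + name);
        return v == null ? 0L : v;
    }

    public static void main(String[] args) {
        DynamicCounterDemo c = new DynamicCounterDemo();
        for (String fruit : new String[] {"Apple", "Mango", "Apple"}) {
            c.incrCounter("Fruits", fruit, 1); // no enum needed for new names
        }
        System.out.println(c.getCounter("Fruits", "Apple")); // prints 2
    }
}
```

The trade-off Amith describes is visible here: the enum gives you compile-time safety, while the string form lets you mint counter names at runtime (e.g. one per input category), at the risk of creating an unbounded number of counters.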
Re: How to rebuild NameNode from DataNode.
Switch to MapR M5? :-) Just kidding. Simple way of solving this pre-CDH4... NFS mount a directory from your SN and add it to your list of checkpoint directories. You may lose some data, but you should be able to rebuild. There is more to this, but that's the basic idea of how to get a copy of your metadata. Sent from a remote device. Please excuse any typos... Mike Segel On Apr 18, 2012, at 11:48 PM, Saburo Fujioka fuji...@do-it.co.jp wrote: Hello, I am currently planning countermeasures for operational trouble in a system. If the NameNode is lost, I want to rebuild it from the remaining DataNodes, but I have not found a way to do this. As a check, I confirmed that the namespaceID in dfs/name/current/VERSION is consistent with that of the DataNodes, but the fsimage is not rebuilt and 'hadoop dfs -ls' did not display anything. The risk of losing the NameNode is low because it is protected by Corosync + Pacemaker + DRBD. But since this is a rare case, it is necessary to clarify the means to reconstruct the NameNode from the DataNodes. Do you know how? I am using hadoop 1.0.1. Thank you very much,
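Mike's NFS suggestion is purely a configuration change; a hedged hdfs-site.xml sketch with hypothetical mount points (property names are from the 0.20/1.x line, where a comma-separated list means the metadata is written to every listed directory):

```xml
<!-- On the NameNode: write the image/edits to local disk AND an NFS mount,
     so a copy survives the loss of the NN box. Paths are illustrative. -->
<property>
  <name>dfs.name.dir</name>
  <value>/data/dfs/name,/mnt/nfs-from-snn/dfs/name</value>
</property>

<!-- On the SecondaryNameNode: an extra checkpoint directory, e.g. an NFS
     mount shared with another machine, as Mike describes. -->
<property>
  <name>fs.checkpoint.dir</name>
  <value>/data/dfs/namesecondary,/mnt/nfs/dfs/namesecondary</value>
</property>
```

With this in place, a lost NameNode can be restarted on another box pointed at the surviving copy (possibly minus the edits since the last checkpoint, which is the "you may lose some data" caveat above). Note that nothing rebuilds the fsimage from DataNode blocks alone; the blocks carry no filename metadata.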
Re: Setting a timeout for one Map() input processing
Multiple threads within the mapper, where you have the main thread starting a timeout thread and a process thread. Take the result of the thread that finishes first, ignoring the other and killing it, all within the Mapper.map() method? Sure, it seems possible. (Your output from the timeout thread is a NOOP...) Sent from a remote device. Please excuse any typos... Mike Segel On Apr 18, 2012, at 7:39 AM, Ondřej Klimpera klimp...@fit.cvut.cz wrote: Hello, I'd like to ask you if there is a possibility of setting a timeout for processing one line of text input in the mapper function. The idea is that if processing of one line takes too long, Hadoop will cut this process off and continue processing the next input line. Thank you for your answer. Ondrej Klimpera
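The two-thread pattern Mike sketches can be expressed with an ExecutorService and a bounded Future.get. A minimal sketch of the per-record part only; in a real mapper you would also call context.progress() so the task itself is not killed, and a worker stuck in non-interruptible code would still occupy the thread after cancel:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Bound the time spent on a single map() input: run the per-record work in a
// worker thread, wait a limited time, and treat a timeout as a NOOP (the
// record is skipped), which is exactly the behavior Ondrej asked for.
public class BoundedRecordProcessor {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    public String process(final String line, long timeoutMs) {
        Future<String> f = pool.submit(new Callable<String>() {
            public String call() throws Exception {
                return expensiveWork(line);
            }
        });
        try {
            return f.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            f.cancel(true);   // interrupt the worker; its result is a NOOP
            return null;      // caller emits nothing for this record
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Placeholder for the real per-line computation (hypothetical).
    static String expensiveWork(String line) throws InterruptedException {
        if (line.startsWith("slow")) Thread.sleep(5000); // simulate a stuck record
        return line.toUpperCase();
    }

    public void shutdown() { pool.shutdownNow(); }
}
```

A null return would be the "ignore this line" signal back in Mapper.map().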
Re: Control Resources / Cores assigned for a job
Fair scheduler? Sent from a remote device. Please excuse any typos... Mike Segel On Mar 16, 2012, at 5:54 PM, Deepak Nettem deepaknet...@gmail.com wrote: Hi, I want to be able to control the number of nodes assigned to a MR job on the cluster. For example, I want the job to not execute more than 10 Mappers at a time, irrespective of whether there are more nodes available or not. I don't wish to control the number of mappers that are created by the job. The number of mappers is tied to the problem size / input data and on my input splits. I want it to remain that way. Is this possible? If so, what's the best way to do this? Deepak
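Michael's one-word answer expands to a scheduler configuration: later Fair Scheduler releases let you cap concurrent tasks per pool in the allocations file, which caps running mappers without touching the split count. A hedged sketch; the pool name is illustrative, and whether your Hadoop version's fair scheduler supports maxMaps/maxReduces needs checking:

```xml
<?xml version="1.0"?>
<!-- fair-scheduler allocation file: cap the "capped" pool at 10 concurrent
     map tasks regardless of free slots elsewhere in the cluster. Jobs join
     the pool via the pool-name property (mapred.fairscheduler.pool in the
     0.20/1.x line). -->
<allocations>
  <pool name="capped">
    <maxMaps>10</maxMaps>
    <maxReduces>4</maxReduces>
  </pool>
</allocations>
```

This keeps the number of mappers created tied to the input splits, as Deepak wants; only how many run at once is throttled.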
Re: Should splittable Gzip be a core hadoop feature?
I do agree that a GitHub project is the way to go, unless you could convince Cloudera, Hortonworks or MapR to pick it up and support it. They have enough committers. Is this potentially worthwhile? Maybe; it depends on how the cluster is integrated in to the overall environment. Companies that have standardized on using gzip would find it useful. Sent from a remote device. Please excuse any typos... Mike Segel On Feb 29, 2012, at 3:17 PM, Niels Basjes ni...@basjes.nl wrote: Hi, On Wed, Feb 29, 2012 at 19:13, Robert Evans ev...@yahoo-inc.com wrote: What I really want to know is how well does this new CompressionCodec perform in comparison to the regular gzip codec in various different conditions, and what type of impact does it have on network traffic and datanode load. My gut feeling is that the speedup is going to be relatively small except when there is a lot of computation happening in the mapper. I agree, I made the same assessment. In the javadoc I wrote under When is this useful? *Assume you have a heavy map phase for which the input is a 1GiB Apache httpd logfile. Now assume this map takes 60 minutes of CPU time to run.* and the added load and network traffic outweighs the speedup in most cases, No, the trick to solve that one is to upload the gzipped files with an HDFS blocksize equal to (or 1 byte larger than) the filesize. This setting will help in speeding up gzipped input files in any situation (no more network overhead). From there, the HDFS replication factor of the file dictates the optimal number of splits for this codec. but like all performance on a complex system, gut feelings are almost worthless and hard numbers are what is needed to make a judgment call. Yes. Niels, I assume you have tested this on your cluster(s). Can you share with us some of the numbers? No, I haven't tested it beyond a multi-core system.
The simple reason for that is that when this was under review last summer, the whole Yarn thing happened and I was unable to run it at all for a long time. I only got it running again last December when the restructuring of the source tree was mostly done. At this moment I'm building an experimentation setup at work that can be used for various things. Given the current state of Hadoop 2.0, I think it's time to produce some actual results. -- Best regards / Met vriendelijke groeten, Niels Basjes
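Niels' upload trick (HDFS block size equal to, or one byte larger than, the gzipped file size) can be applied per file at upload time. A hedged sketch; the property name is from the 0.20-era, `stat -c%s` is GNU/Linux, and old HDFS required the block size to be a multiple of the checksum chunk (512 bytes), hence the rounding:

```shell
# Upload a gzip file with a block size >= the file size, so it occupies a
# single HDFS block and every split of it is node-local.
SIZE=$(stat -c%s access.log.gz)
BLOCK=$(( ( (SIZE / 512) + 1 ) * 512 ))   # round up to a multiple of 512
hadoop fs -D dfs.block.size=$BLOCK -put access.log.gz /logs/
```

With the whole file on one block, each of the codec's splits reads from a node that holds a full replica, which is the "no more network overhead" claim above.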
Re: Should splittable Gzip be a core hadoop feature?
Let's play devil's advocate for a second? Why? Snappy exists. The only advantage is that you don't have to convert from gzip to snappy and can process gzip files natively. Next question is how large are the gzip files in the first place? I don't disagree, I just want to have a solid argument in favor of it... Sent from a remote device. Please excuse any typos... Mike Segel On Feb 28, 2012, at 9:50 AM, Niels Basjes ni...@basjes.nl wrote: Hi, Some time ago I had an idea and implemented it. Normally you can only run a single gzipped input file through a single mapper and thus only on a single CPU core. What I created makes it possible to process a Gzipped file in such a way that it can run on several mappers in parallel. I've put the javadoc I created on my homepage so you can read more about the details. http://howto.basjes.nl/hadoop/javadoc-for-skipseeksplittablegzipcodec Now the question that was raised by one of the people reviewing this code was: Should this implementation be part of the core Hadoop feature set? The main reason that was given is that this needs a bit more understanding on what is happening and as such cannot be enabled by default. I would like to hear from the Hadoop Core/Map reduce users what you think. Should this be - a part of the default Hadoop feature set so that anyone can simply enable it by setting the right configuration? - a separate library? - a nice idea I had fun building but that no one needs? - ... ? -- Best regards / Met vriendelijke groeten, Niels Basjes
Re: working with SAS
Both responses assume replacing SAS with a Hadoop cluster. I would agree that going to EC2 might make sense in terms of a PoC before investing in a physical cluster, but we need to know more about the underlying problem. First, can the problem be broken down in to something that can be accomplished in parallel sub-tasks? Second... How much data? It could be a good use case for Whirr... Sent from a remote device. Please excuse any typos... Mike Segel On Feb 6, 2012, at 2:32 AM, Prashant Sharma prashan...@imaginea.com wrote: + you will not necessarily need vertical systems for speeding things up (totally depends on your query). Give a thought to commodity hardware (much cheaper): with hadoop being suited for it, *I hope* your infrastructure can be cheaper in terms of price to performance ratio. Having said that, I do not mean you have to throw away your existing infrastructure, because it is ideal for certain requirements. Your solution can be like writing a mapreduce job which does what the query is supposed to do and running it on a cluster of what size? Depends! (How fast do you want things done? And scale.) In case your query is ad hoc and has to be run frequently, you might wanna consider HBase and Hive as solutions, with a lot of expensive vertical nodes ;). BTW, is your query iterative? A little more detail on your type of query can attract guys with more wisdom to help. HTH On Mon, Feb 6, 2012 at 1:46 PM, alo alt wget.n...@googlemail.com wrote: Hi, hadoop runs on a linux box (mostly) and can run in a standalone installation for testing only. If you decide to use hadoop with hive or hbase you have to face a lot more tasks: - installation (whirr and Amazon EC2 as example) - write your own mapreduce job or use hive / hbase - setup sqoop with the teradata driver You can easily set up parts 1 and 2 with Amazon's EC2; I think you can also book Windows Server there. For a single query that is the best option, I think, before you install a hadoop cluster.
best, Alex -- Alexander Lorenz http://mapredit.blogspot.com On Feb 6, 2012, at 8:11 AM, Ali Jooan Rizvi wrote: Hi, I would like to know if hadoop will be of help to me. Let me explain my scenario: I have a Windows-based single machine server with 16 cores and 48 GB of physical memory. In addition, I have 120 GB of virtual memory. I am running a query with statistical calculations on large data of over 1 billion rows, on SAS. In this case, SAS is acting like a database on which both source and target tables are residing. For storage, I can keep the source and target data on Teradata as well, but the query containing a patent can only be run on the SAS interface. The problem is that SAS is taking many days (25 days) to run it (a single query with a statistical function), and not all cores were used all the time; rather, merely 5% CPU was utilized on average. However, memory utilization was high, very high, and that's why large virtual memory was used. Can I have a hadoop interface in place to do it all, so that I may end up running the query in less time, say 1 or 2 days? Anything squeezing my run time will be very helpful. Thanks Ali Jooan Rizvi
Re: Too many open files Error
Sorry going from memory... As user Hadoop or mapred or hdfs what do you see when you do a ulimit -a? That should give you the number of open files allowed by a single user... Sent from a remote device. Please excuse any typos... Mike Segel On Jan 26, 2012, at 5:13 AM, Mark question markq2...@gmail.com wrote: Hi guys, I get this error from a job trying to process 3Million records. java.io.IOException: Bad connect ack with firstBadLink 192.168.1.20:50010 at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2903) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102) at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288) When I checked the logfile of the datanode-20, I see : 2012-01-26 03:00:11,827 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration( 192.168.1.20:50010, storageID=DS-97608578-192.168.1.20-50010-1327575205369, infoPort=50075, ipcPort=50020):DataXceiver java.io.IOException: Connection reset by peer at sun.nio.ch.FileDispatcher.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21) at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202) at sun.nio.ch.IOUtil.read(IOUtil.java:175) at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243) at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55) at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155) at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128) at java.io.BufferedInputStream.read1(BufferedInputStream.java:256) at java.io.BufferedInputStream.read(BufferedInputStream.java:317) at java.io.DataInputStream.read(DataInputStream.java:132) at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:262) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:309) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:373) at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:525) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:357) at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:103) at java.lang.Thread.run(Thread.java:662) Which is because I'm running 10 maps per taskTracker on a 20 node cluster, each map opens about 300 files so that should give 6000 opened files at the same time ... why is this a problem? the maximum # of files per process on one machine is: cat /proc/sys/fs/file-max --- 2403545 Any suggestions? Thanks, Mark
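Mike's `ulimit -a` pointer is the usual culprit here: /proc/sys/fs/file-max is the kernel-wide ceiling (2,403,545 above), but each process is bound by the much smaller per-user `nofile` limit, commonly 1024, and it is the datanode/tasktracker processes that hit it. A sketch of checking and raising it; the user name and values are illustrative:

```shell
# Per-process open-file limit for the current user; Hadoop datanodes and
# tasktrackers routinely need far more than the common default of 1024.
ulimit -n

# The full picture, as suggested above (run as the hdfs/mapred user):
ulimit -a

# Typical fix (distro-dependent, values illustrative): raise nofile in
# /etc/security/limits.conf, then restart the daemons:
#   hdfs   soft  nofile  32768
#   hdfs   hard  nofile  32768
```

Note the arithmetic in the question is per cluster, but the limit is per process per node: 10 maps x ~300 files is ~3000 descriptors on each node, plus the DataNode's own sockets and block files, which easily exceeds a 1024 default.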
Re: Problems with timeout when a Hadoop job generates a large number of key-value pairs
Steve, If you want me to debug your code, I'll be glad to set up a billable contract... ;-) What I am willing to do is help you debug your code... Did you time how long it takes in the Mapper.map() method? The reason I ask this is to first confirm that you are failing within a map() method. It could be that you're just not updating your status... You said that you are writing many output records for a single input. So let's take a look at your code. Are all writes of the same length? Meaning that in each iteration of Mapper.map() you will always write K number of rows? If so, ask yourself why some iterations are taking longer and longer. Note: I'm assuming that the time for each iteration is taking longer than the previous... Or am I missing something? -Mike Sent from a remote device. Please excuse any typos... Mike Segel On Jan 20, 2012, at 11:16 AM, Steve Lewis lordjoe2...@gmail.com wrote: We have been having problems with mappers timing out after 600 sec when the mapper writes many more, say thousands of, records for every input record - even when the code in the mapper is small and fast. I have no idea what could cause the system to be so slow and am reluctant to raise the 600 sec limit without understanding why there should be a timeout when all MY code is very fast. I am enclosing a small sample which illustrates the problem. It will generate a 4GB text file on hdfs if the input file does not exist or is not at least that size, and this will take some time (hours in my configuration) - then the code is essentially wordcount, but instead of finding and emitting words the mapper emits all substrings of the input data - this generates much larger output data and a much larger number of output records than wordcount generates. Still, the amount of data emitted is no larger than other data sets I know Hadoop can handle.
All mappers on my 8 node cluster eventually timeout after 600 sec - even though I see nothing in the code which is even a little slow, and I suspect that any slow behavior is in the called Hadoop code. This is similar to a problem we have in bioinformatics where a colleague saw timeouts on his 50 node cluster. I would appreciate any help from the group. Note - if you have a text file of at least 4 GB the program will take that as an input without trying to create its own file.

import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.*;
import java.io.*;
import java.util.*;

/**
 * org.systemsbiology.hadoop.SubstringGenerator
 *
 * This illustrates an issue we are having where a mapper generating a much larger volume of
 * data and number of records times out, even though the code is small, simple and fast.
 *
 * NOTE!!! As written, the program will generate a 4GB file in hdfs with good input data -
 * this is done only if the file does not exist, but may take several hours. It will only be
 * done once. After that the failure is fairly fast.
 *
 * What this will do is count unique substrings of lines of length
 * between MIN_SUBSTRING_LENGTH and MAX_SUBSTRING_LENGTH by generating all
 * substrings and then using the word count algorithm.
 * What is interesting is that the number and volume of writes in the
 * map phase is MUCH larger than the number of reads and the volume of read data.
 *
 * The example is artificial but similar to some real bioinformatics problems -
 * for example, finding all substrings in a genome can be important for the design of
 * microarrays.
 *
 * While the real problem is more complex - the problem is that
 * when the input file is large enough the mappers time out, failing to report after
 * 600 sec.
 * There is nothing slow in any of the application code and nothing I
 */
public class SubstringCount implements Tool {

    public static final long ONE_MEG = 1024 * 1024;
    public static final long ONE_GIG = 1024 * ONE_MEG;
    public static final int LINE_LENGTH = 100;
    public static final Random RND = new Random();

    // NOTE - edit this line to be a sensible location in the current file system
    public static final String INPUT_FILE_PATH = "BigInputLines.txt";
    // NOTE - edit this line to be a sensible location in the current file system
    public static final String OUTPUT_FILE_PATH = "output";
    // NOTE - edit this line to be the input file size - 4 * ONE_GIG should be large but not a problem
    public static final long DESIRED_LENGTH = 4 * ONE_GIG;
    // NOTE - limits on substring length
    public static final int MINIMUM_LENGTH = 5;
    public static final int MAXIMUM_LENGTH = 32;

    /**
     * create an input file to read
     * @param fs !null file system
Re: I am trying to run a large job and it is consistently failing with timeout - nothing happens for 600 sec
Timeout errors don't usually occur outside of the Mapper.map() 'phase'. When we've seen this error it has had to do with M/R going against HBase. Since the OP sees the error when he does a bulk 'write', but it stops when he reduces the number of writes... That kind of suggests where the problem occurs... Unless of course I missed something... Sent from a remote device. Please excuse any typos... Mike Segel On Jan 18, 2012, at 9:28 PM, Raj Vishwanthan rajv...@yahoo.com wrote: You can try the following: - make it into a map-only job (for debug purposes) - start your shuffle phase after all the maps are complete (there is a parameter for this) - characterize your disks for performance Raj Sent from Samsung Mobile Steve Lewis lordjoe2...@gmail.com wrote: In my hands the problem occurs in all map jobs - an associate with a different cluster (mine has 8 nodes, his 40) reports 80% of map tasks fail with a few succeeding - I suspect some kind of an I/O wait but fail to see how it gets to 600 sec On Wed, Jan 18, 2012 at 4:50 PM, Raj V rajv...@yahoo.com wrote: Steve, Does the timeout happen for all the map jobs? Are you using some kind of shared storage for map outputs? Any problems with the physical disks? If the shuffle phase has started, could the disks be I/O waiting between the read and write? Raj From: Steve Lewis lordjoe2...@gmail.com To: common-user@hadoop.apache.org Sent: Wednesday, January 18, 2012 4:21 PM Subject: Re: I am trying to run a large job and it is consistently failing with timeout - nothing happens for 600 sec 1) I do a lot of progress reporting 2) Why would the job succeed when the only change in the code is if(NumberWrites++ % 100 == 0) context.write(key,value); Comment out the test, allowing full writes, and the job fails. Since every write is a report, I assume that something in the write code or other hadoop code for dealing with output is failing.
I do increment a counter for every write, or in the case of the above code, potential write. What I am seeing is that wherever the timeout occurs, it is not in a place where I am capable of inserting more reporting. On Wed, Jan 18, 2012 at 4:01 PM, Leonardo Urbina lurb...@mit.edu wrote: Perhaps you are not reporting progress throughout your task. If you happen to run a large enough job you hit the default timeout mapred.task.timeout (that defaults to 10 min). Perhaps you should consider reporting progress in your mapper/reducer by calling progress() on the Reporter object. Check tip 7 of this link: http://www.cloudera.com/blog/2009/05/10-mapreduce-tips/ Hope that helps, -Leo Sent from my phone On Jan 18, 2012, at 6:46 PM, Steve Lewis lordjoe2...@gmail.com wrote: I KNOW it is a task timeout - what I do NOT know is WHY merely cutting the number of writes causes it to go away. It seems to imply that some context.write operation, or something downstream from that, is taking a huge amount of time, and that is all hadoop internal code - not mine. So my question is why should increasing the number and volume of writes cause a task to time out? On Wed, Jan 18, 2012 at 2:33 PM, Tom Melendez t...@supertom.com wrote: Sounds like mapred.task.timeout? The default is 10 minutes. http://hadoop.apache.org/common/docs/current/mapred-default.html Thanks, Tom On Wed, Jan 18, 2012 at 2:05 PM, Steve Lewis lordjoe2...@gmail.com wrote: The map tasks fail, timing out after 600 sec. I am processing one 9 GB file with 16,000,000 records. Each record (think of it as a line) generates hundreds of key value pairs. The job is unusual in that the output of the mapper in terms of records or bytes is orders of magnitude larger than the input. I have no idea what is slowing down the job except that the problem is in the writes. If I change the job to merely bypass a fraction of the context.write statements, the job succeeds.
This is one map task that failed and one that succeeded - I cannot understand how a write can take so long or what else the mapper might be doing.

JOB FAILED WITH TIMEOUT
Parser: TotalProteins 90,103; NumberFragments 10,933,089
FileSystemCounters: HDFS_BYTES_READ 67,245,605; FILE_BYTES_WRITTEN 444,054,807
Map-Reduce Framework: Combine output records 10,033,499; Map input records 90,103; Spilled Records 10,032,836; Map output bytes 3,520,182,794; Combine input records 10,844,881; Map output records 10,933,089

Same code but fewer writes - JOB SUCCEEDED
Parser: TotalProteins 90,103; NumberFragments 206,658,758
FileSystemCounters: FILE_BYTES_READ 111,578,253; HDFS_BYTES_READ 67,245,607; FILE_BYTES_WRITTEN 220,169,922
Map-Reduce Framework: Combine output records 4,046,128; Map input records 90,103; Spilled Records 4,046,128; Map output bytes 662,354,413; Combine input records 4,098,609; Map output records 2,066,588

Any bright ideas? -- Steven M. Lewis PhD 4221 105th Ave NE Kirkland, WA
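The progress-reporting fix Leonardo suggests can be sketched as a pure-Java pattern. In a real mapper the callback would be context.progress() (new API) or reporter.progress() (old API), which resets the mapred.task.timeout clock without emitting anything; the Progressable interface here is a hypothetical stand-in for Hadoop's:

```java
// Sketch: in a long emit loop, ping a progress callback every N writes so
// the framework's mapred.task.timeout (default 600s) never fires even when
// downstream spill/combine work makes individual writes slow.
public class ProgressLoopDemo {
    interface Progressable { void progress(); }

    // Returns how many times progress was reported.
    public static int emitAll(int records, int every, Progressable reporter) {
        int pings = 0;
        for (int i = 0; i < records; i++) {
            // ... context.write(key, value) would go here ...
            if (i % every == 0) {
                reporter.progress(); // context.progress() in a real mapper
                pings++;
            }
        }
        return pings;
    }

    public static void main(String[] args) {
        int pings = emitAll(10000, 1000, new Progressable() {
            public void progress() { /* no-op for the demo */ }
        });
        System.out.println(pings); // 10: reported at i = 0, 1000, ..., 9000
    }
}
```

Counter increments also reset the timeout clock, which is why the counter-per-write detail in the thread matters; the catch is that time spent inside the framework's own spill and sort, between two map() calls, is not covered by anything the user code can ping.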
Re: Dynamically adding nodes in Hadoop
Actually I would recommend avoiding /etc/hosts and using DNS if this is going to be a production grade cluster... Sent from a remote device. Please excuse any typos... Mike Segel On Dec 17, 2011, at 5:40 AM, alo alt wget.n...@googlemail.com wrote: Hi, in the slaves file too. /etc/hosts is also recommended, to avoid DNS issues. After adding it to slaves, the new node has to be started and should quickly appear in the web UI. If you don't need the nodes all the time you can set up an exclude and refresh your cluster (http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F) - Alex On Sat, Dec 17, 2011 at 12:06 PM, madhu phatak phatak@gmail.com wrote: Hi, I am trying to add nodes dynamically to a running hadoop cluster. I started the tasktracker and datanode on the node. It works fine. But when some nodes try to fetch values (for the reduce phase) it fails with an unknown host exception. When I add a node to a running cluster, do I have to add its hostname to the /etc/hosts file of all nodes (slaves + master)? Or is there some other way? -- Join me at http://hadoopworkshop.eventbrite.com/ -- Alexander Lorenz http://mapredit.blogspot.com Think of the environment: please don't print this email unless you really need to.
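Madhu's add-a-node steps plus Alex's exclude-file pointer, in command form. A hedged sketch; the host name is illustrative, the scripts are the 0.20/CDH3-era ones, and the refreshNodes step only matters if you run with dfs.hosts include/exclude files:

```shell
# On the new node: start the daemons
hadoop-daemon.sh start datanode
hadoop-daemon.sh start tasktracker

# On the master: list the node in conf/slaves so the start/stop scripts
# know about it, and make sure its hostname resolves on every node
# (DNS preferred over per-node /etc/hosts edits in production)
echo "new-node.example.com" >> conf/slaves

# If you maintain dfs.hosts / dfs.hosts.exclude files, re-read them:
hadoop dfsadmin -refreshNodes
```

The unknown-host exception during the reduce fetch is consistent with the other nodes being unable to resolve the new node's hostname, which is exactly what cluster-wide DNS (or, failing that, synchronized /etc/hosts files) fixes.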
Re: More cores Vs More Nodes ?
Brad, you said 64 core allocations. So how many cores are lost due to the overhead of virtualization? Isn't it 1 core per VM? So you end up losing 8 cores when you create 8 VMs... Right? Sent from a remote device. Please excuse any typos... Mike Segel On Dec 13, 2011, at 7:15 PM, Brad Sarsfield b...@bing.com wrote: Hi Prashant, In each case I had a single tasktracker per node. I oversubscribed the total tasks per tasktracker/node by 1.5 x # of cores. So for the 64 core allocation comparison: In A: 8 cores; each machine had a single tasktracker with 8 map / 4 reduce slots, for 12 task slots total per machine x 8 machines (including head node). In B: 2 cores; each machine had a single tasktracker with 2 map / 1 reduce slots, for 3 slots total per machine x 29 machines (including the head node, which was running 8 cores). The experiment was done in a cloud-hosted environment running a set of VMs. ~Brad -Original Message- From: Prashant Kommireddi [mailto:prash1...@gmail.com] Sent: Tuesday, December 13, 2011 9:46 AM To: common-user@hadoop.apache.org Subject: Re: More cores Vs More Nodes ? Hi Brad, how many tasktrackers did you have on each node in both cases? Thanks, Prashant Sent from my iPhone On Dec 13, 2011, at 9:42 AM, Brad Sarsfield b...@bing.com wrote: Praveenesh, Your question is not naïve; in fact, optimal hardware design can ultimately be a very difficult question to answer. If you made me pick one without much information I'd go for more machines. But... It all depends; and there is no right answer :) More machines +May run your workload faster +Will give you a higher degree of reliability / protection from node / hardware / hard drive failure +More aggregate IO capabilities -Capex / opex may be higher than allocating more cores More cores +May run your workload faster +More cores may allow for more tasks to run on the same machine +More cores/tasks may reduce network contention, increasing task-to-task data flow performance.
Notice May run your workload faster is in both, as it can be very workload dependent. My experience: I did a recent experiment and found that given the same number of cores (64) with the exact same network / machine configuration: A: I had 8 machines with 8 cores B: I had 28 machines with 2 cores (and a 1x8 core head node) B was able to outperform A by 2x using teragen and terasort. These machines were running in a virtualized environment, where some of the IO capabilities behind the scenes were being regulated to 400Mbps per node when running in the 2 core configuration vs 1Gbps on the 8 core. So I would expect the non-throttled scenario to work even better. ~Brad -Original Message- From: praveenesh kumar [mailto:praveen...@gmail.com] Sent: Monday, December 12, 2011 8:51 PM To: common-user@hadoop.apache.org Subject: More cores Vs More Nodes ? Hey Guys, So I have a very naive question in my mind regarding Hadoop cluster nodes: more cores or more nodes? Shall I spend money on going from 2-core to 4-core machines, or spend money on buying more nodes with fewer cores, e.g. 2 machines of 2 cores each? Thanks, Praveenesh
Re: Routing and region deletes
Per Steffensen, I would urge you to step away from the keyboard and rethink your design. It sounds like you want to replicate a date-partition model similar to what you would do if you were attempting this with a relational database. HBase is not a relational database, and you have a different way of doing things. You could put the date/time stamp in the key such that your data is sorted by date. However, this would cause hot spots. Think about how you access the data. It sounds like you access the more recent data more frequently than historical data. This is a bad idea in HBase. (Note: it may still make sense to do this... You have to think more about the data and consider alternatives.) I personally would hash the key for even distribution, again depending on the data access pattern. (Hashed keys mean you can't do range queries, but again, it depends on what you are doing...) You also have to think about how you purge the data. You don't just drop a region. Doing a full table scan once a month to delete may not be a bad thing. Again, it depends on what you are doing... Just my opinion. Others will have their own... Now I'm stepping away from the keyboard to get my morning coffee... :-) Sent from a remote device. Please excuse any typos... Mike Segel On Dec 8, 2011, at 7:13 AM, Per Steffensen st...@designware.dk wrote: Hi The system we are going to work on will receive 50mio+ new datarecords every day. We need to keep a history of 2 years of data (that's 35+ billion datarecords in the storage all in all), and that basically means that we also need to delete 50mio+ datarecords every day, or e.g. 1.5 billion every month. We plan to store the datarecords in HBase. Is it somehow possible to tell HBase to put (route) all datarecords belonging to a specific date or month to a designated set of regions (and route nothing else there), so that deleting all data belonging to that day/month is basically deleting those regions entirely?
And is explicit deletion of entire regions possible at all? The reason I want to do this is that I expect it to be much faster than doing explicit deletion record by record of 50mio+ records every day. Regards, Per Steffensen
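Hashing the key, as Michael suggests, can be as simple as prefixing a short "salt" derived from the natural key, so writes spread across regions instead of hot-spotting on the current date. A minimal plain-Java sketch; the key layout and salt width are assumptions for illustration, not an HBase API:

```java
import java.security.MessageDigest;

// Salted row-key sketch: prefix each key with a few hex characters of a hash
// of the natural key. Writes distribute evenly across regions; range scans
// over the natural key are lost, but point gets still work because the
// prefix is recomputable from the key itself.
public class SaltedKey {
    public static String rowKey(String naturalKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] d = md5.digest(naturalKey.getBytes("UTF-8"));
            // First two hash bytes as a 4-hex-char prefix (65536 buckets).
            String salt = String.format("%02x%02x", d[0] & 0xff, d[1] & 0xff);
            return salt + "-" + naturalKey;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Same key always maps to the same prefix, so reads can find it.
        System.out.println(rowKey("2011-12-08|record-000001"));
    }
}
```

The purge trade-off in the thread follows directly: with salted keys, the records for one day are scattered across all regions, so deletion has to be a (possibly monthly) scan-and-delete job rather than dropping a region.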
Re: Issue with DistributedCache
Silly question... Why do you need to use the distributed cache for the word count program? What are you trying to accomplish? I've only had to play with it for one project, where we had to push out a bunch of C++ code to the nodes as part of a job... Sent from a remote device. Please excuse any typos... Mike Segel On Nov 24, 2011, at 7:05 AM, Denis Kreis de.kr...@gmail.com wrote: Hi Bejoy 1. Old API: The Map and Reduce classes are the same as in the example; the main method is as follows:

public static void main(String[] args) throws IOException, InterruptedException {
    UserGroupInformation ugi = UserGroupInformation.createProxyUser("<remote user name>", UserGroupInformation.getLoginUser());
    ugi.doAs(new PrivilegedExceptionAction<Void>() {
        public Void run() throws Exception {
            JobConf conf = new JobConf(WordCount.class);
            conf.setJobName("wordcount");
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);
            conf.setMapperClass(Map.class);
            conf.setCombinerClass(Reduce.class);
            conf.setReducerClass(Reduce.class);
            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputFormat(TextOutputFormat.class);
            FileInputFormat.setInputPaths(conf, new Path("<path to input dir>"));
            FileOutputFormat.setOutputPath(conf, new Path("<path to output dir>"));
            conf.set("mapred.job.tracker", "<ip>:8021");
            FileSystem fs = FileSystem.get(new URI("hdfs://<ip>:8020"), new Configuration());
            fs.mkdirs(new Path("<remote path>"));
            fs.copyFromLocalFile(new Path("<local path>/test.jar"), new Path("<remote path>"));
Re: MapReduce Examples
I think that there are... Since you work in databases, have you looked at HBase? The reason the word count example is nice is that anyone can get a bunch of word / text documents freely from the web. Having completed the word count, pi approximation, historical temp calculation... Do you have access to actual data? I mean if you were Sears, Kroger, Home Depot, IKEA... Large retailers with lots of PoS data, you could try to find out what the big holiday sellers are... Or I guess if you can get your hands on Twitter data... (I don't know... I don't tweet) you can start to think of writing your own examples... Actually that would probably be a better way to learn it... Oh and one last idea... There's a book written by some guys from the University of Maryland... Maybe another good source of ideas... HTH -Mike Sent from a remote device. Please excuse any typos... Mike Segel On Nov 24, 2011, at 6:02 AM, Sloot, Hans-Peter hans-peter.sl...@atos.net wrote: Hi, Are there some more sophisticated examples of MapReduce than word counting available? I work for a database department and just the word count is not very impressive. Regards Hans-Peter
Re: Issue with DistributedCache
Denis... Sorry, you lost me. Just to make sure we're using the same terminology... The cluster is comprised of two types of nodes... The data nodes, which run DN, TT, and if you have HBase, RS. Then there are control nodes which run your NN, SN, JT, and if you run HBase, HM and ZKs... Outside of the cluster we have machines set up with Hadoop installed but not running any of the processes. They are where our users launch their jobs. We call them edge nodes. (It's not a good idea to let users directly onto the actual cluster.) Ok, having said all of that... You launch your job from the edge nodes... Your data sits in HDFS so you don't need distributed cache at all. Does that make sense? Your job will run on the local machine, connect to the JT and then run. We set up the edge nodes so that all of the jars and config files are already set up for the users and we can better control access... Sent from a remote device. Please excuse any typos... Mike Segel On Nov 24, 2011, at 7:22 AM, Denis Kreis de.kr...@gmail.com wrote: Without using the distributed cache I'm getting the same error. It's because I start the job from a remote client / programmatically 2011/11/24 Michel Segel michael_se...@hotmail.com: Silly question... Why do you need to use the distributed cache for the word count program? What are you trying to accomplish? I've only had to play with it for one project where we had to push out a bunch of C++ code to the nodes as part of a job... Sent from a remote device. Please excuse any typos... Mike Segel On Nov 24, 2011, at 7:05 AM, Denis Kreis de.kr...@gmail.com wrote: Hi Bejoy 1.
Old API: The Map and Reduce classes are the same as in the example, the main method is as follows:

public static void main(String[] args) throws IOException, InterruptedException {
    UserGroupInformation ugi = UserGroupInformation.createProxyUser("remote user name", UserGroupInformation.getLoginUser());
    ugi.doAs(new PrivilegedExceptionAction<Void>() {
        public Void run() throws Exception {
            JobConf conf = new JobConf(WordCount.class);
            conf.setJobName("wordcount");
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);
            conf.setMapperClass(Map.class);
            conf.setCombinerClass(Reduce.class);
            conf.setReducerClass(Reduce.class);
            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputFormat(TextOutputFormat.class);
            FileInputFormat.setInputPaths(conf, new Path("path to input dir"));
            FileOutputFormat.setOutputPath(conf, new Path("path to output dir"));
            conf.set("mapred.job.tracker", "ip:8021");
            FileSystem fs = FileSystem.get(new URI("hdfs://ip:8020"), new Configuration());
            fs.mkdirs(new Path("remote path"));
            fs.copyFromLocalFile(new Path("local path/test.jar"), new Path("remote path"));
Re: Matrix multiplication in Hadoop
You really don't need to wait... If you're going to go down this path you can use a JNI wrapper for the C/C++ code for the GPU... You can do that now... If you want to go beyond 1D you can do it but you have to get a bit creative... but it's doable... Sent from a remote device. Please excuse any typos... Mike Segel On Nov 19, 2011, at 10:53 AM, Edward Capriolo edlinuxg...@gmail.com wrote: Sounds like a job for next gen map reduce native libraries and GPUs. A modern day Dr. Frankenstein for sure. On Saturday, November 19, 2011, Tim Broberg tim.brob...@exar.com wrote: Perhaps this is a good candidate for a native library, then? From: Mike Davis [xmikeda...@gmail.com] Sent: Friday, November 18, 2011 7:39 PM To: common-user@hadoop.apache.org Subject: Re: Matrix multiplication in Hadoop On Friday, November 18, 2011, Mike Spreitzer mspre...@us.ibm.com wrote: Why is matrix multiplication ill-suited for Hadoop? IMHO, a huge issue here is the JVM's inability to fully support CPU vendor specific SIMD instructions and, by extension, optimized BLAS routines. Running a large MM task using Intel's MKL rather than relying on generic compiler optimization is orders of magnitude faster on a single multicore processor. I see almost no way that Hadoop could win such a CPU intensive task against an MPI cluster with even a tenth of the nodes running with a decently tuned BLAS library. Racing even against a single CPU might be difficult, given the I/O overhead. Still, it's a reasonably common problem and we shouldn't murder the good in favor of the best. I'm certain a MM/LinAlg Hadoop library with even mediocre performance, w.r.t. C, would get used. -- Mike Davis
Re: Matrix multiplication in Hadoop
Is Hadoop the best tool for doing large matrix math? Sure you can do it, but aren't there better tools for these types of problems? Sent from a remote device. Please excuse any typos... Mike Segel On Nov 18, 2011, at 10:59 AM, Mike Spreitzer mspre...@us.ibm.com wrote: Who is doing multiplication of large dense matrices using Hadoop? What is a good way to do that computation using Hadoop? Thanks, Mike
Re: Data locality for a custom input format
There's this article on InfoQ that deals with this issue... ;-) http://www.infoq.com/articles/HadoopInputFormat Sent from a remote device. Please excuse any typos... Mike Segel On Nov 12, 2011, at 7:51 AM, Harsh J ha...@cloudera.com wrote: Tharindu, InputSplit#getLocations() i.e., http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapreduce/InputSplit.html#getLocations() is used to decide locality of a task. You need your custom InputFormat to prepare the right array of these objects. The # of objects == # of map tasks, and the locations array gets used by the scheduler for local assignment. For a FileSplit preparation, this is as easy as passing in the block locations obtained from the NameNode. For other types of splits, you need to fill them in yourself. On Sat, Nov 12, 2011 at 7:12 PM, Tharindu Mathew mcclou...@gmail.com wrote: Hi hadoop devs, I'm implementing a custom input format and want to understand how to make use of data locality. AFAIU, only file input format makes use of data locality, since the job tracker picks data locality based on the block location defined in the file input split. So, the job tracker code is partly responsible for this. So providing data locality for a custom input format would mean either extending file input format or modifying job tracker code (if that even makes sense). Is my understanding correct? -- Regards, Tharindu blog: http://mackiemathew.com/ -- Harsh J
Re: Map-Reduce in memory
Hi, First, you have 8 physical cores. Hyper threading makes the machine think that it has 16. The trouble is that you really don't have 16 cores, so you need to be a little more conservative. You don't mention HBase, so I'm going to assume that you don't have it installed. So in terms of tasks, allocate a core each to DN and TT, leaving 6 cores or 12 hyper threaded cores. This leaves a little headroom for the other Linux processes... Now you can split the number of remaining cores however you want. You can even overlap a bit since you are not going to be running all of your reducers at the same time. So let's say 10 mappers and 4 reducers to start. Since you have all that memory, you can bump up your DN and TT allocations. With respect to your tuning... You need to change them one at a time... Sent from a remote device. Please excuse any typos... Mike Segel On Nov 4, 2011, at 1:46 AM, N.N. Gesli nnge...@gmail.com wrote: Thank you very much for your replies. Michel, disk is 3TB (6x550GB, 50 GB from each disk is reserved for local, basically for mapred.local.dir). You are right on the CPU; it is 8 core but shows as 16. Does that mean it can handle 16 JVMs at a time? CPU is a little overloaded, but that is not a huge problem at this point. I made io.sort.factor 200 and io.sort.mb 2000. Still got the same error/timeout. I played with all related conf settings one by one. At last, changing mapred.job.shuffle.merge.percent from 1.0 back to 0.66 solved the problem. However, the job is still taking a long time. There are 84 reducers, but only one of them takes a very long time. I attached the log file of that reduce task. The majority of the data gets spilled to disk. Even if I set mapred.child.java.opts to 6144, the reduce task log shows ShuffleRamManager: MemoryLimit=1503238528, MaxSingleShuffleLimit=375809632 as if memory is 2GB (70% of 2GB=1503238528b).
In the same log file later there is also this line: INFO ExecReducer: maximum memory = 6414139392 I am not using memory monitoring. Tasktrackers have this line in the log: TaskTracker's totalMemoryAllottedForTasks is -1. TaskMemoryManager is disabled. Why is ShuffleRamManager finding that number as if the max memory is 2GB? Why am I still getting that much spill even with these aggressive memory settings? Why is only one reducer taking that long? What else can I change to make this job process in memory and finish faster? Thank you. -N.N.Gesli On Fri, Oct 28, 2011 at 2:14 AM, Michel Segel michael_se...@hotmail.com wrote: Uhm... He has plenty of memory... Depending on what sort of m/r tasks... He could push it. Didn't say how much disk... I wouldn't start that high... Try 10 mappers and 2 reducers. Granted it is a bit asymmetric and you can bump up the reducers... Watch your jobs in Ganglia and see what is happening... Harsh, assuming he is using Intel, each core is hyper threaded so the box sees this as 2x CPUs. 8 cores looks like 16. Sent from a remote device. Please excuse any typos... Mike Segel On Oct 28, 2011, at 3:08 AM, Harsh J ha...@cloudera.com wrote: Hey N.N. Gesli, (Inline) On Fri, Oct 28, 2011 at 12:38 PM, N.N. Gesli nnge...@gmail.com wrote: Hello, We have a 12-node Hadoop cluster that is running Hadoop 0.20.2-cdh3u0. Each node has 8 cores and 144GB RAM (don't ask). So, I want to take advantage of this huge RAM and run the map-reduce jobs mostly in memory with no spill, if possible. We use Hive for most of the processes. I have set: mapred.tasktracker.map.tasks.maximum = 16 mapred.tasktracker.reduce.tasks.maximum = 8 This is *crazy* for an 8 core machine. Try to keep M+R slots well below 8 instead - You're probably CPU-thrashed in this setup once a large number of tasks get booted. mapred.child.java.opts = 6144 You can also raise io.sort.mb to 2000, and tweak io.sort.factor.
The child opts raise to ~6 GB looks a bit unnecessary, since most of your tasks work on a record basis and would not care much about total RAM. Perhaps use all that RAM for a service like HBase which can leverage caching nicely! One of my Hive queries is producing 6-stage map-reduce jobs. On the third stage, when it queries a 200GB table, the last 14 reducers hang. I changed mapred.task.timeout to 0 to see if they really hang. It has been 5 hours, so something is terribly wrong in my setup. Parts of the log are below. It is probably just your slot settings. You may be massively over-subscribing your CPU resources with 16 map task slots + 8 reduce task slots. At worst case, it would mean 24 total JVMs competing over 8 available physical processors. Doesn't make sense to me at least - Make it more like 7 M / 2 R or so :) My questions: * What should be my configurations to make reducers run in memory? * Why does it keep waiting for map outputs? It has to fetch map outputs to get some data to start
Re: Map-Reduce in memory
Uhm... He has plenty of memory... Depending on what sort of m/r tasks... He could push it. Didn't say how much disk... I wouldn't start that high... Try 10 mappers and 2 reducers. Granted it is a bit asymmetric and you can bump up the reducers... Watch your jobs in Ganglia and see what is happening... Harsh, assuming he is using Intel, each core is hyper threaded so the box sees this as 2x CPUs. 8 cores looks like 16. Sent from a remote device. Please excuse any typos... Mike Segel On Oct 28, 2011, at 3:08 AM, Harsh J ha...@cloudera.com wrote: Hey N.N. Gesli, (Inline) On Fri, Oct 28, 2011 at 12:38 PM, N.N. Gesli nnge...@gmail.com wrote: Hello, We have a 12-node Hadoop cluster that is running Hadoop 0.20.2-cdh3u0. Each node has 8 cores and 144GB RAM (don't ask). So, I want to take advantage of this huge RAM and run the map-reduce jobs mostly in memory with no spill, if possible. We use Hive for most of the processes. I have set: mapred.tasktracker.map.tasks.maximum = 16 mapred.tasktracker.reduce.tasks.maximum = 8 This is *crazy* for an 8 core machine. Try to keep M+R slots well below 8 instead - You're probably CPU-thrashed in this setup once a large number of tasks get booted. mapred.child.java.opts = 6144 You can also raise io.sort.mb to 2000, and tweak io.sort.factor. The child opts raise to ~6 GB looks a bit unnecessary, since most of your tasks work on a record basis and would not care much about total RAM. Perhaps use all that RAM for a service like HBase which can leverage caching nicely! One of my Hive queries is producing 6-stage map-reduce jobs. On the third stage, when it queries a 200GB table, the last 14 reducers hang. I changed mapred.task.timeout to 0 to see if they really hang. It has been 5 hours, so something is terribly wrong in my setup. Parts of the log are below. It is probably just your slot settings. You may be massively over-subscribing your CPU resources with 16 map task slots + 8 reduce task slots.
At worst case, it would mean 24 total JVMs competing over 8 available physical processors. Doesn't make sense to me at least - Make it more like 7 M / 2 R or so :) My questions: * What should be my configurations to make reducers run in memory? * Why does it keep waiting for map outputs? It has to fetch map outputs to get some data to start with. And it pulls the map outputs a few at a time - to not overload the network during the shuffle phases of several reducers across the cluster. * What does 'dup hosts' mean? Duplicate hosts. Hosts it already knows about and has already scheduled fetch work upon. snip -- Harsh J
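Harsh's slot advice above maps to two TaskTracker properties in mapred-site.xml. A sketch for the 8-core nodes discussed here, using the 7 map / 2 reduce figures from the post (example values, not a universal recommendation):

```xml
<!-- mapred-site.xml: keep total task slots near the physical core count -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>7</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
</property>
```

Slot changes take effect after restarting the TaskTrackers.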
Re: Job Submission schedule, one file at a time ?
Not sure what you are attempting to do... If you submit the directory name... You get a single m/r job to process all. (But it doesn't sound like that is what you want...) You could use Oozie, or just a simple shell script that will walk down a list of files in the directory and then launch a Hadoop task... Or did you want something else? Sent from a remote device. Please excuse any typos... Mike Segel On Oct 25, 2011, at 3:53 PM, Daniel Yehdego dtyehd...@miners.utep.edu wrote: Hi, I have a folder with 50 different files and I want to submit a Hadoop MapReduce job using each file as an input. My Map/Reduce programs basically do the same job for each of my files, but I want to schedule and submit a job one file at a time. It's like submitting a job with one file input, waiting until the job completes, and submitting the second job (second file) right after. I want to have 50 different MapReduce outputs for the 50 input files. Looking forward to your inputs. Thanks. Regards,
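The "simple shell script" option can be sketched like this (my sketch; the jar name, class name, and paths are placeholders). `hadoop jar` blocks until the job finishes, which gives exactly the submit-wait-submit behavior asked for:

```shell
# Sketch: submit one MapReduce job per input file, sequentially.
# "myjob.jar" / "MyJob" and the directories are placeholder names.
submit_per_file() {
    in_dir=$1; out_base=$2; hadoop_cmd=${3:-hadoop}   # 3rd arg lets you dry-run with "echo"
    for f in "$in_dir"/*; do
        [ -e "$f" ] || continue
        name=$(basename "$f")
        # blocks until this job completes, then the loop moves to the next file
        "$hadoop_cmd" jar myjob.jar MyJob "$f" "$out_base/$name" || return 1
    done
}

# e.g.: submit_per_file /user/daniel/input /user/daniel/output
```

Each file gets its own output directory under `$out_base`, so you end up with 50 separate outputs.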
Re: Question on mainframe files
You may not want to do this... Does the data contain any packed or zoned decimals? If so, the dd conversion will corrupt your data. Sent from a remote device. Please excuse any typos... Mike Segel On Oct 15, 2011, at 3:51 AM, SRINIVAS SURASANI vas...@gmail.com wrote: Hi, I'm downloading mainframe files using FTP in binary mode onto the local file system. These files are now seen as EBCDIC. The information about these files is: (a) fixed in length (each field in a record has a fixed length); (b) each record is of some KB (this KB is fixed for each record). Now here I'm able to convert these EBCDIC files to ASCII files within the unix file system using the following command: dd if=INPUTFILENAME of=OUTPUTFILENAME conv=ascii,unblock cbs=150 (150 being the record size). So, here I want this conversion to be done in Hadoop to leverage the use of parallel processing. I was wondering is there any record reader available for this kind of files, and also how to convert Packed Decimal COMP(3) fields to ASCII. Any suggestions on how this can be done. Srinivas
Re: Question on mainframe files
Ok... Your best bet is to bite the bullet and forego using dd. What you end up doing otherwise is staging the files twice... raw EBCDIC, converted ASCII, then move to HDFS... You would be better off passing in the data definition of the record and converting the files on the fly... Pretty straightforward... Not sure why you would want to do this in a m/r job unless you're going to do more processing than a simple conversion... Sent from a remote device. Please excuse any typos... Mike Segel On Oct 15, 2011, at 10:30 AM, SRINIVAS SURASANI vas...@gmail.com wrote: Yes, we have files which are only EBCDIC and files containing EBCDIC + packed decimals. As a first step I started working on the EBCDIC-only files. The dd command is working fine, but the intention is to do this conversion within HDFS to leverage parallel processing. On Sat, Oct 15, 2011 at 10:58 AM, Michel Segel michael_se...@hotmail.com wrote: You may not want to do this... Does the data contain any packed or zoned decimals? If so, the dd conversion will corrupt your data. Sent from a remote device. Please excuse any typos... Mike Segel On Oct 15, 2011, at 3:51 AM, SRINIVAS SURASANI vas...@gmail.com wrote: Hi, I'm downloading mainframe files using FTP in binary mode onto the local file system. These files are now seen as EBCDIC. The information about these files is: (a) fixed in length (each field in a record has a fixed length); (b) each record is of some KB (this KB is fixed for each record). Now here I'm able to convert these EBCDIC files to ASCII files within the unix file system using the following command: dd if=INPUTFILENAME of=OUTPUTFILENAME conv=ascii,unblock cbs=150 (150 being the record size). So, here I want this conversion to be done in Hadoop to leverage the use of parallel processing. I was wondering is there any record reader available for this kind of files, and also how to convert Packed Decimal COMP(3) fields to ASCII. Any suggestions on how this can be done. Srinivas
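A sketch of the convert-on-the-fly idea in plain Java (my illustration): the JDK ships an IBM EBCDIC charset, Cp1047, so a mapper or record reader could decode each fixed-length character record without dd. As Mike warns, packed decimal (COMP-3) fields are binary and must not go through this character decode.

```java
import java.nio.charset.Charset;

// Sketch: decode fixed-length EBCDIC character records in Java.
// Cp1047 is the JDK's IBM EBCDIC (Latin-1) charset.
public class EbcdicToAscii {
    static final Charset EBCDIC = Charset.forName("Cp1047");

    // Decode one character record. COMP-3 (packed decimal) fields are
    // binary, not character data, and must bypass this conversion.
    static String convert(byte[] record) {
        return new String(record, EBCDIC);
    }

    public static void main(String[] args) {
        byte[] rec = {(byte) 0xC1, (byte) 0xC2, (byte) 0xC3}; // "ABC" in EBCDIC
        System.out.println(convert(rec)); // prints ABC
    }
}
```

In a MapReduce job this decode would sit inside a record reader that slices the input into the fixed 150-byte records, which is what makes the conversion parallelizable.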
Re: Hadoop cluster optimization
Avi, First, why a 32-bit OS? You have a 64-bit processor with 4 hyper-threaded cores, which looks like 8 CPUs. With only 1.7 GB you're going to be limited on the number of slots you can configure. I'd say run Ganglia, but that would take resources away from you. It sounds like the default parameters are a pretty good fit. Sent from a remote device. Please excuse any typos... Mike Segel On Aug 21, 2011, at 6:57 AM, Avi Vaknin avivakni...@gmail.com wrote: Hi all! How are you? My name is Avi and I have been fascinated by Apache Hadoop for the last few months. I have spent the last two weeks trying to optimize my configuration files and environment. I have been going through many of Hadoop's configuration properties and it seems that none of them makes a big difference (±3 minutes of total job run time). In Hadoop terms my cluster is considered extremely small (260 GB of text files, while every job goes through only ~8 GB). I have one server acting as NameNode and JobTracker, and another 5 servers acting as DataNodes and TaskTrackers. Right now Hadoop's configurations are set to default, besides the DFS block size which is set to 256 MB, since every file on my cluster takes 155-250 MB. All of the above servers are exactly the same, with the following hardware and software: 1.7 GB memory, 1 Intel(R) Xeon(R) CPU E5507 @ 2.27GHz, Ubuntu Server 10.10, 32-bit platform, Cloudera CDH3 manual Hadoop installation (for the ones who are familiar with Amazon Web Services, I am talking about Small EC2 instances/servers). Total job run time is ~15 minutes (~50 files/blocks/map tasks of up to 250 MB and 10 reduce tasks). Based on the above information, can anyone recommend a best practice configuration? Do you think that when dealing with such a small cluster, and when processing such a small amount of data, it is even possible to optimize jobs so they would run much faster?
By the way, it seems like none of the nodes is having hardware performance issues (CPU/memory) while running the job. That's true unless I have a bottleneck somewhere else (it seems like network bandwidth is not the issue). That issue is a little confusing because the NameNode process and the JobTracker process should allocate 1GB of memory each, which means that my hardware starting point is insufficient, and in that case why am I not seeing full memory utilization using the 'top' command on the NameNode/JobTracker server? How would you recommend measuring/monitoring different Hadoop properties to find out where the bottleneck is? Thanks for your help!! Avi
Re: Namenode Scalability
So many questions, why stop there? First question... What would cause the name node to have a GC issue? Second question... You're streaming 1PB a day. Is this a single stream of data? Are you writing this to one file before processing, or are you processing the data directly on the ingestion stream? Are you also filtering the data so that you are not saving all of it? This sounds more like a homework assignment than a real-world problem. I guess people don't race cars against trains or have two trains traveling in different directions anymore... :-) Sent from a remote device. Please excuse any typos... Mike Segel On Aug 10, 2011, at 12:07 PM, jagaran das jagaran_...@yahoo.co.in wrote: To be precise, the projected data is around 1 PB. But the publishing rate is also around 1 Gbps. Please suggest. From: jagaran das jagaran_...@yahoo.co.in To: common-user@hadoop.apache.org common-user@hadoop.apache.org Sent: Wednesday, 10 August 2011 12:58 AM Subject: Namenode Scalability In my current project we are planning to stream data to the Namenode (20-node cluster). Data volume would be around 1 PB per day. But there are applications which can publish data at 1 Gbps. A few queries: 1. Can a single Namenode handle such high speed writes? Or does it become unresponsive when a GC cycle kicks in? 2. Can we have multiple federated Namenodes sharing the same slaves, so that we can distribute the writes accordingly? 3. Can multiple region servers of HBase help us? Please suggest how we can design the streaming part to handle such scale of data. Regards, Jagaran Das
Re: Sanity check re: value of 10GbE NICs for Hadoop?
I'm not sure which point you are trying to make. To answer your question... With respect to price, 10GbE is cost effective. You have to consider that 1GbE is not only your port speed; there is also the speed of the uplink or trunk. So if you continue to build out, you run into bandwidth issues between racks. So you end up doing 1GbE ports and then higher speed, by either port bonding or bigger bandwidth for uplinks only. These switches are more expensive than simple 1GbE switches, but less than full 10GbE. Depending on vendor, number of ports, and discount, you can get the switch for approx $10,000 and up. Think $550 to $600 a port for 10GbE. With Sandy Bridge, you will start to see 10GbE on the motherboards. If you're following the discussion on the performance gains, you'll start to see the network being the bottleneck. If you are planning to build a new cluster... You should plan on 10GbE. Sent from a remote device. Please excuse any typos... Mike Segel On Jun 29, 2011, at 1:07 AM, Bharath Mundlapudi bharathw...@yahoo.com wrote: One could argue that it's too early for a 10Gb NIC in a Hadoop cluster. Certainly having extra bandwidth is good, but at what price? Please note that all the points you mentioned can work with 1Gb NICs today. Unless you can back it with price/performance data. Typically, map output is compressed. If the system is hitting peak network utilization, one can select high compression rate algorithms at the cost of CPU. Most of these machines come with dual NIC cards, so one could do link bonding to push more bits. One area that may see a good benefit from 10Gb NICs is high density systems - 24 cores and 3x12TB disks. This is the trend now and will continue. These systems can saturate 1Gb NICs. -Bharath From: Saqib Jang -- Margalla Communications saq...@margallacomm.com To: common-user@hadoop.apache.org Sent: Tuesday, June 28, 2011 10:16 AM Subject: Sanity check re: value of 10GbE NICs for Hadoop?
Folks, I've been digging into the potential benefits of using 10 Gigabit Ethernet (10GbE) NIC server connections for Hadoop and wanted to run what I've come up with through initial research by the list for 'sanity check' feedback. I'd very much appreciate your input on the importance (or lack of it) of the following potential benefits of 10GbE server connectivity, as well as other thoughts regarding 10GbE and Hadoop (my interest is specifically in the value of 10GbE server connections and 10GbE switching infrastructure, over scenarios such as bonded 1GbE server connections with 10GbE switching). 1. HDFS Data Loading. The higher throughput enabled by 10GbE server and switching infrastructure allows faster processing and distribution of data. 2. Hadoop Cluster Scalability. High performance for initial data processing and distribution directly impacts the degree of parallelism or scalability supported by the cluster. 3. HDFS Replication. Higher speed server connections allow faster file replication. 4. Map/Reduce Shuffle Phase. Improved end-to-end throughput and latency directly impact the shuffle phase of a data set reduction, especially for tasks that are at the document level (including large documents) with lots of metadata generated by those documents, as well as video analytics and images. 5. Data Reporting. 10GbE server network performance can improve data reporting performance, especially if the Hadoop cluster is running multiple data reductions. 6. Support of Cluster File Systems. With 10GbE NICs, Hadoop could be reorganized to use a cluster or network file system. This would allow Hadoop, even with its Java implementation, to have higher performance I/O and not have to be so concerned with disk drive density in the same server. 7. Others? thanks, Saqib Saqib Jang Principal/Founder Margalla Communications, Inc. 1339 Portola Road, Woodside, CA 94062 (650) 274 8745 www.margallacomm.com
Re: Using own InputSplit
You can add that sometimes the input file is too small and you don't get the desired parallelism. Sent from a remote device. Please excuse any typos... Mike Segel On May 27, 2011, at 12:25 PM, Harsh J ha...@cloudera.com wrote: Mohit, On Fri, May 27, 2011 at 10:44 PM, Mohit Anchlia mohitanch...@gmail.com wrote: Actually this link confused me http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Job+Input Clearly, logical splits based on input-size is insufficient for many applications since record boundaries must be respected. In such cases, the application should implement a RecordReader, who is responsible for respecting record-boundaries and presents a record-oriented view of the logical InputSplit to the individual task. But it looks like the application doesn't need to do that since it's done by default? Or am I misinterpreting this entirely? For any type of InputFormat Hadoop provides along with itself, it should already handle this for you (text files (say, \n-ended), Sequence Files, Avro data files). If you have a custom file format that defines its own record delimiter character(s), you would surely need to write your own InputFormat that splits properly (the wiki still helps on how to manage the reads across the first split and the subsequent ones). -- Harsh J
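The record-boundary rule Harsh describes can be sketched in plain Java (my illustration; a real InputFormat/RecordReader does this over an HDFS stream, and the delimiter and data here are made up): every split except the first skips its partial leading record, and a split reads its final record past the split end, so a record crossing a boundary is read exactly once.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of how line/record readers respect record boundaries across splits.
public class BoundaryReader {
    // Reads the records owned by the byte range [start, end) of 'data'.
    static List<String> readSplit(String data, char delim, int start, int end) {
        List<String> records = new ArrayList<>();
        int pos = start;
        // Not at file start: the previous split owns the record we landed in,
        // so skip forward past the next delimiter.
        if (start > 0) {
            while (pos < data.length() && data.charAt(pos) != delim) pos++;
            pos++;
        }
        // Read whole records; the last record may run past 'end'.
        // A record starting exactly at 'end' belongs to this split, and the
        // next split's skip logic discards it -- no duplicates, no losses.
        while (pos < data.length() && pos <= end) {
            int next = data.indexOf(delim, pos);
            if (next < 0) next = data.length();
            records.add(data.substring(pos, next));
            pos = next + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "aa;bbbb;cc;d";
        // Two "splits" covering the data; "bbbb" crosses the boundary at 6
        // but is read exactly once, by the split that contains its start.
        System.out.println(readSplit(data, ';', 0, 6));  // -> [aa, bbbb]
        System.out.println(readSplit(data, ';', 6, 12)); // -> [cc, d]
    }
}
```

This is the same idea the Hadoop wiki describes for custom delimiters: the split boundaries come from input size, and the reader repairs them into record boundaries.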
Re: Reducer granularity and starvation
Fair scheduler won't help unless you set it to allow preemptive execution, which may not be a good thing... Fair scheduler will wait until the current task completes before assigning a new task to the open slot. So if you have a long running job... You're SOL. A combiner will definitely help, but you will still have the issue of long running jobs. You could put your job in a queue that limits the number of slots... But then you will definitely increase the time to run your job. If you could suspend a task... But that's a non-trivial solution... Sent from a remote device. Please excuse any typos... Mike Segel On May 18, 2011, at 5:04 PM, W.P. McNeill bill...@gmail.com wrote: I'm using fair scheduler and JVM reuse. It is just plain a big job. I'm not using a combiner right now, but that's something to look at. What about bumping mapred.reduce.tasks up to some huge number? I think that shouldn't make a difference, but I'm hearing conflicting information on this.
Re: Suggestions for swapping issue
You have to do the math... If you have 2 GB per mapper and run 10 mappers per node... That means 20 GB of memory. Then you have the TT and DN running, which also take memory... What did you set as the number of mappers/reducers per node? What do you see in Ganglia or when you run top? Sent from a remote device. Please excuse any typos... Mike Segel On May 11, 2011, at 12:31 PM, Adi adi.pan...@gmail.com wrote: Hello Hadoop Gurus, We are running a 4-node cluster. We just upgraded the RAM to 48 GB. We have allocated around 33-34 GB per node for hadoop processes, leaving the remaining 14-15 GB of memory for the OS and as buffer. There are no other processes running on these nodes. Most of the lighter jobs run successfully, but one big job is de-stabilizing the cluster. One node starts swapping, runs out of swap space, and goes offline. We tracked the processes on that node and noticed that it ends up with more than the expected number of hadoop java processes. The other 3 nodes were running 10 or 11 processes and this node ends up with 36. After killing the job we find these processes still show up and we have to kill them manually. We have tried reducing the swappiness to 6 but saw the same results. It also looks like hadoop stays well within the memory limits allocated and still starts swapping. Some other suggestions we have seen are: 1) Increase swap size. Current size is 6 GB. The most quoted size is 'tons of swap', but not sure how much that translates to in numbers. Should it be 16 or 24 GB? 2) Increase the overcommit ratio. Not sure if this helps, as a few blog comments mentioned it didn't help. Any other hadoop or linux config suggestions are welcome. Thanks. -Adi
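Mike's "do the math" can be written out as a quick budget check (my sketch; the 2 GB-per-slot and ~1 GB-per-daemon figures are illustrative, not measured from this cluster):

```java
// Sketch: rough per-node memory budget for task slots plus the
// DataNode/TaskTracker daemons. All figures are illustrative.
public class MemoryBudget {
    static int requiredGb(int mapSlots, int reduceSlots, int heapPerTaskGb) {
        int daemonsGb = 2; // DataNode + TaskTracker at ~1 GB each (assumption)
        return (mapSlots + reduceSlots) * heapPerTaskGb + daemonsGb;
    }

    public static void main(String[] args) {
        // 10 mappers + 4 reducers at 2 GB each, plus daemons
        System.out.println(requiredGb(10, 4, 2)); // prints 30
    }
}
```

At 10 map + 4 reduce slots with 2 GB child heaps you are already near 30 GB before OS buffers, which is how a 33-34 GB allocation can tip into swap once extra task JVMs appear, as Adi observed.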
Re: Cluster hardware question
Hi, Actually if you have 2 × 4-core Xeon CPUs... You will become I/O bound with 4 drives. The rule of thumb tends to be 2 disks per core, so you would want 16 drives per node... At least in theory. 24 × 1TB drives would be interesting, but I'm not sure what sort of problems you could expect to encounter until you had to expand the cluster... Sent from a remote device. Please excuse any typos... Mike Segel On Apr 26, 2011, at 8:55 AM, Xiaobo Gu guxiaobo1...@gmail.com wrote: Hi, People say a balanced server configuration is as follows: 2 × 4-core CPUs, 24GB RAM, 4 × 1TB SATA disks. But we are used to using storage servers with 24 × 1TB SATA disks, and we are wondering whether Hadoop will be CPU bound if this kind of server is used. Does anybody have experience with hadoop running on servers with so many disks? Regards, Xiaobo G