TestDFSIO error: libhdfs.so.1 does not exist
Hi all, I am benchmarking a Hadoop cluster with the hadoop-*-test.jar TestDFSIO benchmark, but it fails with the following error: File /usr/hadoop-0.20.2/libhdfs/libhdfs.so.1 does not exist. How can I solve this problem? Thanks!
Re: Hadoop Question
Nitin, On 2011/07/28 14:51, Nitin Khandelwal wrote: How can I determine if a file is being written to (by any thread) in HDFS? That information is exposed by the NameNode HTTP servlet. You can obtain it with the fsck tool (hadoop fsck /path/to/dir -openforwrite) or you can do an HTTP GET on http://namenode:port/fsck?path=/your/path&openforwrite=1 George
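For illustration, a minimal shell invocation of the fsck approach might look like this (the path is a placeholder); files that are still open should be flagged OPENFORWRITE in the fsck output:

hadoop fsck /path/to/dir -openforwrite | grep OPENFORWRITE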
RE: next gen map reduce
Does this mean 0.22.0 has reached stable and will be released as the stable version soon? --Aaron -Original Message- From: Robert Evans [mailto:ev...@yahoo-inc.com] Sent: Thursday, July 28, 2011 6:39 AM To: common-user@hadoop.apache.org Subject: Re: next gen map reduce It has not been introduced yet, if you are referring to MRv2. It is targeted to go into the 0.23 release of Hadoop, but is currently on the MR-279 branch, which should hopefully be merged to trunk in about a week. --Bobby On 7/28/11 7:31 AM, "real great.." wrote: In which Hadoop version is next gen introduced? -- Regards, R.V.
Re: File System Counters.
Harsh, if this is the case I don't understand something. If I see FILE_BYTES_READ to be non-zero for a map, the only thing I can assume is that it came from a spill during the sort phase. I have a 10-node cluster, and I ran TeraSort with size 100,000 bytes (1000 records). My io.sort.mb is 300 and io.sort.factor is 10. My mapred.child.java.opts is set to -Xmx512m. When I run this, given that everything fits into memory, I expected that there would be no FILE_BYTES_READ on the map side and no FILE_BYTES_WRITTEN on the reduce side. But I find that my FILE_BYTES_READ on the map side is 188,604 (HDFS_BYTES_READ is 149,686) and inexplicably SPILLED_RECORDS is 1000 for both map and reduce. So my questions have become two. 1. Why is my spill count 1000, given that io.sort.factor and io.sort.mb are 10 and 300 MB and I have 512 MB for each task? 2. Where are the numbers for FILE_BYTES_READ/WRITTEN coming from? TIA Raj From: Harsh J To: common-user@hadoop.apache.org; R V Sent: Thursday, July 28, 2011 12:03 AM Subject: Re: File System Counters. Raj, There is no overlap. Data read from HDFS FileSystem instances go to HDFS_BYTES_READ, and data read from Local FileSystem instances go to FILE_BYTES_READ. These are two different FileSystems, and have no overlap at all. On Thu, Jul 28, 2011 at 5:56 AM, R V wrote: > Hello > > I don't know if the question has been answered. I am trying to understand > the overlap between FILE_BYTES_READ and HDFS_BYTES_READ. What are the various > components that provide value to this counter? For example when I see > FILE_BYTES_READ for a specific task ( Map or Reduce ) , is it purely due to > the spill during the sort phase? If an HDFS read happens on a non-local node, does > the counter increase on the node where the data block resides? What happens > when the data is local? Does the counter increase for both HDFS_BYTES_READ > and FILE_BYTES_READ? From the values I am seeing, this looks to be the case > but I am not sure. > > I am not very fluent in Java , and hence I don't fully understand the source > . :-( > > Raj -- Harsh J
Re: Exporting From Hive
This is for the CLI. Use this: set hive.cli.print.header=true; Instead of doing this at the prompt every time, you can change your hive start command to: hive -hiveconf hive.cli.print.header=true But be careful with this setting, as quite a few commands stop working with an NPE with it on. I think describe doesn't work. -Ayon See My Photos on Flickr Also check out my Blog for answers to commonly asked questions. From: "Bale, Michael" To: common-user@hadoop.apache.org Sent: Thursday, July 28, 2011 8:54 AM Subject: Exporting From Hive Hi, I was wondering if anyone could help me? Does anyone know if it is possible to include the column headers in an output from a Hive query? I've had a look through the internet but can't seem to find an answer. If not, is it possible to export the result from a describe table query? If so I could then run that at the same time and join up at a future date. Thanks for your help -- *Mike Bale* Graduate Insight Analyst *Cable and Wireless Communications* Tel: +44 (0)20 7315 4437 www.cwc.com
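As a sketch of the non-interactive form, the setting can be combined with a query in a single invocation (the table and file names are made up for illustration):

hive -hiveconf hive.cli.print.header=true -e 'SELECT * FROM my_table LIMIT 10' > results.tsv

The first line of results.tsv should then carry the column headers.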
Re: cygwin not connecting to Hadoop server
Hi A Df, see inline at :: - Original Message - From: A Df Date: Wednesday, July 27, 2011 10:31 pm Subject: Re: cygwin not connecting to Hadoop server To: "common-user@hadoop.apache.org" > See inline at **. More questions and many Thanks :D > > > > > > > >From: Uma Maheswara Rao G 72686 > >To: common-user@hadoop.apache.org; A Df > >Cc: "common-user@hadoop.apache.org" > > >Sent: Wednesday, 27 July 2011, 17:31 > >Subject: Re: cygwin not connecting to Hadoop server > > > > > >Hi A Df, > > > >Did you format the NameNode first? > > > >** I had formatted it already but then I had reinstalled Java and > upgraded the plugins in cygwin so I reformatted it again. :D yes it > worked!! I am not sure of all the steps that got it to finally work :: Great :-) > but I will have to document it to prevent this headache in the > future. Although I typed ssh localhost too, so the question is, do I > need to type ssh localhost each time I need to run hadoop?? Also, :: Actually ssh is an authentication service. To save the authentication keys, you can run the commands below, which will set up key-based authentication. No need to give the password every time. ssh-keygen -t rsa -P "" cat /root/.ssh/id_rsa.pub > /root/.ssh/authorized_keys then execute /etc/init.d/sshd restart To connect to remote machines: cat /root/.ssh/id_rsa.pub | ssh root@<remote-host> 'cat > /root/.ssh/authorized_keys' then on the remote machine execute /etc/init.d/sshd restart > since I need to work with Eclipse maybe you can have a look at my > post about the plugin because I can't get the patch to work. The > subject is "Re: Cygwin not working with Hadoop and Eclipse Plugin". > I plan to read up on how to write programs for Hadoop. I am using > the tutorial at Yahoo but if you know of any really good books about > coding with Hadoop or just about understanding Hadoop then please > let me know. Hadoop: The Definitive Guide will be a great book for understanding Hadoop. Some sample programs are also available. To check the Hadoop internals: http://www.google.co.in/url?sa=t&source=web&cd=8&ved=0CEMQFjAH&url=http%3A%2F%2Findia.paxcel.net%3A6060%2FLargeDataMatters%2Fwp-content%2Fuploads%2F2010%2F09%2FHDFS1.pdf&rct=j&q=hadoop%20internals%20%2B%20part%201&ei=CqAxTtD8C4fprQfYq6DMCw&usg=AFQjCNGYMQbAeGP0cYGl4OJHseRsfEMGvQ&cad=rja > > > >Can you check the NN logs whether the NN is started or not? > >** I checked and the previous runs had some logs missing but now > the last one has all 5 logs and I got two conf files in xml. I > also copied out the other output files which I plan to examine. > Where do I specify the output extension format that I want for my > output file? I was hoping for a txt file but it shows the output in a > file with no extension even though I can read it in Notepad++. I > also got to view the web interface at: > >NameNode - http://localhost:50070/ > >JobTracker - http://localhost:50030/ > > > >** See below for the working version, finally!! 
Thanks > > > >Williams@TWilliams-LTPC ~/hadoop-0.20.2 > >$ bin/hadoop jar hadoop-0.20.2-examples.jar grep input > >11/07/27 17:42:20 INFO mapred.FileInputFormat: Total in > > > >11/07/27 17:42:20 INFO mapred.JobClient: Running job: j > >11/07/27 17:42:21 INFO mapred.JobClient: map 0% reduce > >11/07/27 17:42:33 INFO mapred.JobClient: map 15% reduc > >11/07/27 17:42:36 INFO mapred.JobClient: map 23% reduc > >11/07/27 17:42:39 INFO mapred.JobClient: map 38% reduc > >11/07/27 17:42:42 INFO mapred.JobClient: map 38% reduc > >11/07/27 17:42:45 INFO mapred.JobClient: map 53% reduc > >11/07/27 17:42:48 INFO mapred.JobClient: map 69% reduc > >11/07/27 17:42:51 INFO mapred.JobClient: map 76% reduc > >11/07/27 17:42:54 INFO mapred.JobClient: map 92% reduc > >11/07/27 17:42:57 INFO mapred.JobClient: map 100% redu > >11/07/27 17:43:06 INFO mapred.JobClient: map 100% redu > >11/07/27 17:43:09 INFO mapred.JobClient: Job complete: > >11/07/27 17:43:09 INFO mapred.JobClient: Counters: 18 > >11/07/27 17:43:09 INFO mapred.JobClient: Job Counters > >11/07/27 17:43:09 INFO mapred.JobClient: Launched r > >11/07/27 17:43:09 INFO mapred.JobClient: Launched m > >11/07/27 17:43:09 INFO mapred.JobClient: Data-local > >11/07/27 17:43:09 INFO mapred.JobClient: FileSystemCo > >11/07/27 17:43:09 INFO mapred.JobClient: FILE_BYTES > >11/07/27 17:43:09 INFO mapred.JobClient: HDFS_BYTES > >11/07/27 17:43:09 INFO mapred.JobClient: FILE_BYTES > >11/07/27 17:43:09 INFO mapred.JobClient: HDFS_BYTES > >11/07/27 17:43:09 INFO mapred.JobClient: Map-Reduce F > >11/07/27 17:43:09 INFO mapred.JobClient: Reduce inp > >11/07/27 17:43:09 INFO mapred.JobClient: Combine ou > >11/07/27 17:43:09 INFO mapred.JobClient: Map input > >11/07/27 17:43:09 INFO mapred.JobClient: Reduce shu > >11/07/27 17:43:09 INFO mapred.JobClient: Reduce out > >11/07/27 17:43:09 INFO mapred.JobClient: Spilled Re > >11/07/27 17:43:09 INFO mapred.JobClient: M
Re: OSX starting hadoop error
FYI, I logged a bug for this: https://issues.apache.org/jira/browse/HADOOP-7489 On Jul 28, 2011, at 11:36 AM, Bryan Keller wrote: > I am also seeing this error upon startup. I am guessing you are using OS X > Lion? It started happening to me after I upgraded to 10.7. Hadoop seems to > function properly despite this error showing up, though it is annoying. > > > On Jul 27, 2011, at 12:37 PM, Ben Cuthbert wrote: > >> All >> When starting hadoop on OSX I am getting this error. Is there a fix for it? >> >> java[22373:1c03] Unable to load realm info from SCDynamicStore >
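A workaround commonly reported for HADOOP-7489 (not confirmed in this thread) is to hand the JVM empty Kerberos realm settings, e.g. in conf/hadoop-env.sh:

export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="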
Re: OSX starting hadoop error
I am also seeing this error upon startup. I am guessing you are using OS X Lion? It started happening to me after I upgraded to 10.7. Hadoop seems to function properly despite this error showing up, though it is annoying. On Jul 27, 2011, at 12:37 PM, Ben Cuthbert wrote: > All > When starting hadoop on OSX I am getting this error. Is there a fix for it? > > java[22373:1c03] Unable to load realm info from SCDynamicStore
Exporting From Hive
Hi, I was wondering if anyone could help me? Does anyone know if it is possible to include the column headers in an output from a Hive query? I've had a look through the internet but can't seem to find an answer. If not, is it possible to export the result from a describe table query? If so I could then run that at the same time and join up at a future date. Thanks for your help -- *Mike Bale* Graduate Insight Analyst *Cable and Wireless Communications* Tel: +44 (0)20 7315 4437 www.cwc.com
Unit testing strategy for map/reduce methods
I've been playing with unit testing strategies for my Hadoop work. A discussion of techniques and a link to example code here: http://cornercases.wordpress.com/2011/07/28/unit-testing-mapreduce-with-overridden-write-methods/ .
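Independent of the linked post, one hedged sketch of the general idea: a map method written against the old (org.apache.hadoop.mapred) API can be driven directly from a plain JUnit test with a fake OutputCollector. The WordCountMapper here is an invented stand-in for whatever map logic is under test:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class WordCountMapperTest {

  // A small mapper to test; stands in for your real map logic.
  static class WordCountMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final IntWritable one = new IntWritable(1);
    public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        output.collect(new Text(itr.nextToken()), one);
      }
    }
  }

  // Fake collector that simply records everything the mapper emits.
  static class CollectingCollector<K, V> implements OutputCollector<K, V> {
    final List<K> keys = new ArrayList<K>();
    final List<V> values = new ArrayList<V>();
    public void collect(K key, V value) {
      keys.add(key);
      values.add(value);
    }
  }

  @Test
  public void emitsOneCountPerToken() throws IOException {
    WordCountMapper mapper = new WordCountMapper();
    CollectingCollector<Text, IntWritable> out =
        new CollectingCollector<Text, IntWritable>();
    // Reporter is unused by this mapper, so Reporter.NULL keeps the test simple.
    mapper.map(new LongWritable(0), new Text("foo bar foo"), out, Reporter.NULL);
    assertEquals(3, out.keys.size());
    assertEquals(new Text("foo"), out.keys.get(0));
  }
}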
Re: Replication and failure
On Thu, Jul 28, 2011 at 12:17 AM, Harsh J wrote: > Mohit, > > I believe Tom's book (Hadoop: The Definitive Guide) covers this > precisely well. Perhaps others too. > > Replication is a best-effort sort of thing. If 2 nodes are all that is > available, then two replicas are written and one is left to the > replica monitor service to replicate later when possible (leading to an > underreplicated write for the moment). The scenario (with default > configs) would only fail if you have 0 DataNodes 'available' to write > to. Thanks Harsh. I think you answered my question. I thought that replication of 3 is a must. And for that you really need at least 4 nodes so that if one of the nodes dies it can still write to 3 nodes. I am assuming writes to replica nodes are always synchronous and not eventually consistent. > > Or are you asking about what happens when a DN fails during a write operation? I am assuming there will be some errors in this case. > > On Thu, Jul 28, 2011 at 5:08 AM, Mohit Anchlia wrote: >> Just trying to understand what happens if there are 3 nodes with >> replication set to 3 and one node fails. Does it fail the writes too? >> >> If there is a link that I can look at will be great. I tried searching >> but didn't see any definitive answer. >> >> Thanks, >> Mohit >> > > > > -- > Harsh J >
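For reference, a minimal hdfs-site.xml sketch of the knobs involved: dfs.replication is the target replica count, while dfs.replication.min (default 1) is how many replicas must actually be written for the write itself to succeed; anything between the two is repaired later by the replica monitor.

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.replication.min</name>
  <value>1</value>
</property>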
Re: HBase Mapreduce cannot find Map class
See http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description for some help. St.Ack

On Thu, Jul 28, 2011 at 4:04 AM, air wrote:
> -- Forwarded message --
> From: air
> Date: 2011/7/28
> Subject: HBase Mapreduce cannot find Map class
> To: CDH Users
>
> import java.io.IOException;
> import java.text.ParseException;
> import java.text.SimpleDateFormat;
> import java.util.Date;
>
> import org.apache.hadoop.conf.Configured;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.client.Put;
> import org.apache.hadoop.hbase.mapred.TableMapReduceUtil;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapred.JobClient;
> import org.apache.hadoop.mapred.JobConf;
> import org.apache.hadoop.mapred.MapReduceBase;
> import org.apache.hadoop.mapred.Mapper;
> import org.apache.hadoop.mapred.OutputCollector;
> import org.apache.hadoop.mapred.Reporter;
> import org.apache.hadoop.mapred.FileInputFormat;
> import org.apache.hadoop.mapred.lib.NullOutputFormat;
> import org.apache.hadoop.util.Tool;
> import org.apache.hadoop.util.ToolRunner;
>
> public class LoadToHBase extends Configured implements Tool {
>     public static class XMap extends MapReduceBase implements
>             Mapper<LongWritable, Text, LongWritable, Text> {
>         private JobConf conf;
>         private HTable table;
>
>         @Override
>         public void configure(JobConf conf) {
>             this.conf = conf;
>             try {
>                 this.table = new HTable(new HBaseConfiguration(conf), "observations");
>             } catch (IOException e) {
>                 throw new RuntimeException("Failed HTable construction", e);
>             }
>         }
>
>         @Override
>         public void close() throws IOException {
>             super.close();
>             table.close();
>         }
>
>         public void map(LongWritable key, Text value,
>                 OutputCollector<LongWritable, Text> output, Reporter reporter)
>                 throws IOException {
>             String[] valuelist = value.toString().split("\t");
>             SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
>             Date addtime = null; // user registration time
>             Date ds = null;
>             Long delta_days = null;
>             String uid = valuelist[0];
>             try {
>                 addtime = sdf.parse(valuelist[1]);
>             } catch (ParseException e) {
>                 e.printStackTrace();
>             }
>
>             String ds_str = conf.get("load.hbase.ds", null);
>             if (ds_str != null) {
>                 try {
>                     ds = sdf.parse(ds_str);
>                 } catch (ParseException e) {
>                     e.printStackTrace();
>                 }
>             } else {
>                 ds_str = "2011-07-28";
>             }
>
>             if (addtime != null && ds != null) {
>                 delta_days = (ds.getTime() - addtime.getTime()) / (24 * 60 * 60 * 1000);
>             }
>
>             if (delta_days != null) {
>                 byte[] rowKey = uid.getBytes();
>                 Put p = new Put(rowKey);
>                 p.add("content".getBytes(), "attr1".getBytes(), delta_days.toString().getBytes());
>                 table.put(p);
>             }
>         }
>     }
>
>     public static void main(String[] args) throws Exception {
>         int exitCode = ToolRunner.run(new HBaseConfiguration(), new LoadToHBase(), args);
>         System.exit(exitCode);
>     }
>
>     @Override
>     public int run(String[] args) throws Exception {
>         JobConf conf = new JobConf(getClass());
>         TableMapReduceUtil.addDependencyJars(conf);
>         FileInputFormat.addInputPath(conf, new Path(args[0]));
>         conf.setJobName("LoadToHBase");
>         conf.setJarByClass(getClass());
>         conf.setMapperClass(XMap.class);
>         conf.setNumReduceTasks(0);
>         conf.setOutputFormat(NullOutputFormat.class);
>         JobClient.runJob(conf);
>         return 0;
>     }
> }
>
> execute it using hbase LoadToHBase /user/hive/warehouse/datamining.db/xxx/
> and it says:
>
> ..
> 11/07/28 17:20:29 INFO mapred.JobClient: Task Id :
> attempt_201107261532_2625_m_04_1, Status : FAILED
> java.lang.RuntimeException: Error in configuring object
> at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
> at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.j
Re: Class loading problem
On Thu, 28 Jul 2011 10:05:57 -0400, "Kumar, Ranjan" wrote: > I have a class to define data I am reading from a MySQL database. > According to online tutorials I created a class called MyRecord and > extended it from Writable, DBWritable. While running it with hadoop I get a > NoSuchMethodException: dataTest$MyRecord.<init>() Hadoop needs a no-args constructor to build the object, which it then fills in by using readFields(). Many classes come with a default no-args constructor, which basically defers to the no-args constructor from Object, or another ancestor class. HOWEVER, if you defined another constructor that takes arguments, you've implicitly removed the default no-args constructor on your class. You need to define one explicitly, which Hadoop can use to build your objects. hth
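A minimal sketch of such a class (the field names are invented for illustration; note the explicit no-args constructor):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

public class MyRecord implements Writable, DBWritable {
  private long id;
  private String name;

  // Explicit no-args constructor so Hadoop can instantiate the class
  // reflectively and then populate it via readFields().
  public MyRecord() {}

  // The extra constructor that implicitly removed the default one.
  public MyRecord(long id, String name) {
    this.id = id;
    this.name = name;
  }

  public void write(DataOutput out) throws IOException {
    out.writeLong(id);
    out.writeUTF(name);
  }

  public void readFields(DataInput in) throws IOException {
    id = in.readLong();
    name = in.readUTF();
  }

  public void write(PreparedStatement stmt) throws SQLException {
    stmt.setLong(1, id);
    stmt.setString(2, name);
  }

  public void readFields(ResultSet rs) throws SQLException {
    id = rs.getLong(1);
    name = rs.getString(2);
  }
}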
Class loading problem
I have a class to define data I am reading from a MySQL database. According to online tutorials I created a class called MyRecord and extended it from Writable, DBWritable. While running it with hadoop I get a NoSuchMethodException: dataTest$MyRecord.<init>() I am using 0.21.0 Thanks for your help Ranjan
RE: Error in 9000 and 9001 port in hadoop-0.20.2
Start the namenode [set fs.default.name to hdfs://192.168.1.101:9000] and check your netstat report [netstat -nlp] to see which port and IP it is binding to. Ideally, 9000 should be bound to 192.168.1.101. If yes, configure the same IP in the slaves as well. Otherwise, we may need to revisit your configs once. To use the hostname, you should have a hostname-IP mapping in the /etc/hosts file on the master as well as the slaves. -Original Message- From: Doan Ninh [mailto:uitnetw...@gmail.com] Sent: Thursday, July 28, 2011 6:45 PM To: common-user@hadoop.apache.org Subject: Re: Error in 9000 and 9001 port in hadoop-0.20.2 I changed fs.default.name to hdfs://192.168.1.101:9000. But, the same error as before. I need help On Thu, Jul 28, 2011 at 7:45 PM, Nitin Khandelwal < nitin.khandel...@germinait.com> wrote: > Please change your fs.default.name to hdfs://192.168.1.101:9000 > Thanks, > Nitin > > On 28 July 2011 17:46, Doan Ninh wrote: > > > The first time, I used *hadoop-cluster-1* for 192.168.1.101. > > That is the hostname of the master node. > > But the same error occurs. > > How can I fix it? > > > > On Thu, Jul 28, 2011 at 7:07 PM, madhu phatak > > wrote: > > > > > I had issues using IP addresses in XML files . You can try to use host > names > > > in > > > the place of IP addresses . > > > > > > On Thu, Jul 28, 2011 at 5:22 PM, Doan Ninh > wrote: > > > > > > > Hi, > > > > > > > > I run Hadoop in 4 Ubuntu 11.04 on VirtualBox. > > > > On the master node (192.168.1.101), I configure fs.default.name = > > > hdfs:// > > > > 127.0.0.1:9000. Then I configure everything on the 3 other nodes > > > > When I start the cluster by entering "$HADOOP_HOME/bin/start-all.sh" > on > > > the > > > > master node > > > > Everything is ok, but the slaves can't connect to the master on 9000, > 9001 > > > > port. > > > > I manually telnet to 192.168.1.101 on 9000, 9001. And the result is > > > > "connection refused" > > > > Then, i'm on the master node, telnet to localhost, 127.0.0.1:9000. > The > > > > result is connected. > > > > But, on the master node, I telnet to 192.168.1.101:9000 => > Connection > > > > Refused > > > > > > > > Can somebody help me? > > > > > > > > > > > > > -- > > > Nitin Khandelwal >
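For illustration, with the hostname hadoop-cluster-1 used earlier in this thread, the two pieces would look roughly like this (the IP and values are examples only):

# /etc/hosts on the master and every slave
192.168.1.101   hadoop-cluster-1

<!-- core-site.xml on every node -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoop-cluster-1:9000</value>
</property>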
Re: Hadoop-streaming using binary executable c program
I am not completely sure what you are getting at. It looks like the output of your c program is (and this is just a guess) NOTE: \t stands for the tab character, and in streaming it is used to separate the key from the value; \n stands for the newline character and is used to separate individual records. <sequence>\t<structure>\n <sequence>\t<structure>\n <sequence>\t<structure>\n ... And you want the output to look like <sequence>\t<structure>\n You could use a reduce to do this, but the issue here is with the shuffle in between the maps and the reduces. The shuffle will group by the key to send to the reducers and then sort by the key. So in reality your map output looks something like FROM MAP 1: <sequence>\t<structure>\n <sequence>\t<structure>\n FROM MAP 2: <sequence>\t<structure>\n <sequence>\t<structure>\n FROM MAP 3: <sequence>\t<structure>\n <sequence>\t<structure>\n If you send it to a single reducer (the only way to get a single file) then the input to the reducer will be sorted alphabetically by the RNA, and the order of the input will be lost. You can work around this by giving each line a unique number that is in the order you want it to be output. But doing this would require you to write some code. I would suggest that you do it with a small shell script after all the maps have completed to splice them together. -- Bobby On 7/27/11 2:55 PM, "Daniel Yehdego" wrote: Hi Bobby, I just want to ask you if there is a way of using a reducer or something like concatenation to glue my outputs from the mapper and output them as a single file and segment of the predicted RNA 2D structure? FYI: I have used a reducer NONE before: HADOOP_HOME$ bin/hadoop jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper ./hadoopPknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -file /data/yehdego/hadoop-0.20.2/hadoopPknotsRG -input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output /user/yehdego/RF-out -reducer NONE -verbose and a sample of my output using the mapper of two different slave nodes looks like this : AUACCCGCAAAUUCACUCAAAUCUGUAAUAGGUUUGUCAUUCAAAUCUAGUGCAAAUAUUACUUUCGCCAAUUAGGUAUAAUAAUGGUAAGC and [...(((...))).]. (-13.46) GGGACAAGACUCGACAUUUGAUACACUAUUUAUCAAUGGAUGUCUUCU .(((.((......).. (-11.00) and I want to concatenate and output them as a single predicted RNA sequence structure: AUACCCGCAAAUUCACUCAAAUCUGUAAUAGGUUUGUCAUUCAAAUCUAGUGCAAAUAUUACUUUCGCCAAUUAGGUAUAAUAAUGGUAAGCGGGACAAGACUCGACAUUUGAUACACUAUUUAUCAAUGGAUGUCUUCU [...(((...))).]..(((.((......).. Regards, Daniel T. Yehdego Computational Science Program University of Texas at El Paso, UTEP dtyehd...@miners.utep.edu > From: dtyehd...@miners.utep.edu > To: common-user@hadoop.apache.org > Subject: RE: Hadoop-streaming using binary executable c program > Date: Tue, 26 Jul 2011 16:23:10 + > > > Good afternoon Bobby, > > Thanks so much, now it's working excellently. And the speed is also reasonable. > Once again thank you. > > Regards, > > Daniel T. Yehdego > Computational Science Program > University of Texas at El Paso, UTEP > dtyehd...@miners.utep.edu > > > From: ev...@yahoo-inc.com > > To: common-user@hadoop.apache.org > > Date: Mon, 25 Jul 2011 14:47:34 -0700 > > Subject: Re: Hadoop-streaming using binary executable c program > > > > This is likely to be slow and it is not ideal. The ideal would be to > > modify pknotsRG to be able to read from stdin, but that may not be possible. 
> >
> > The shell script would probably look something like the following:
> >
> > #!/bin/sh
> > rm -f temp.txt;
> > while read line
> > do
> >   echo $line >> temp.txt;
> > done
> > exec pknotsRG temp.txt;
> >
> > Place it in a file, say hadoopPknotsRG. Then you probably want to run
> >
> > chmod +x hadoopPknotsRG
> >
> > After that you want to test it with
> >
> > hadoop fs -cat /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt | head -2 | ./hadoopPknotsRG
> >
> > If that works then you can try it with Hadoop streaming
> >
> > HADOOP_HOME$ bin/hadoop jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper ./hadoopPknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -file /data/yehdego/hadoop-0.20.2/hadoopPknotsRG -input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output /user/yehdego/RF-out -reducer NONE -verbose
> >
> > --Bobby
> >
> > On 7/25/11 3:37 PM, "Daniel Yehdego" wrote:
> >
> > Good afternoon Bobby,
> >
> > Thanks, you gave me a great help in finding out what the problem was. After I put in the command line you suggested, I found out that there was a segmentation error. The binary executable program pknotsRG only reads a file with a sequence in it. This means there should be a shell script, as you have said, that will take the data coming from stdin and write it to a temporary file. Any idea on how to do this job i
Re: next gen map reduce
It has not been introduced yet, if you are referring to MRv2. It is targeted to go into the 0.23 release of Hadoop, but is currently on the MR-279 branch, which should hopefully be merged to trunk in about a week. --Bobby On 7/28/11 7:31 AM, "real great.." wrote: In which Hadoop version is next gen introduced? -- Regards, R.V.
Re: /tmp/hadoop-oracle/dfs/name is in an inconsistent state
Hi, Before starting, you need to format the namenode: ./hdfs namenode -format Then these directories will be created. The respective configuration property is 'dfs.namenode.name.dir'. Default configurations exist in hdfs-default.xml. If you want to configure your own directory path, you can add the above property in the hdfs-site.xml file. Regards, Uma Mahesh - Original Message - From: "Daniel,Wu" Date: Thursday, July 28, 2011 6:51 pm Subject: /tmp/hadoop-oracle/dfs/name is in an inconsistent state To: common-user@hadoop.apache.org > When I started hadoop, the namenode failed to start up because of > the following error. The strange thing is that it says /tmp/hadoop- > oracle/dfs/name is inconsistent, but I don't think I have > configured it as /tmp/hadoop-oracle/dfs/name. Where should I check > for the related configuration? > 2011-07-28 21:07:35,383 ERROR > org.apache.hadoop.hdfs.server.namenode.NameNode: > org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory > /tmp/hadoop-oracle/dfs/name is in an inconsistent state: storage directory > does not exist or is not accessible. > >
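The /tmp location comes from the defaults: dfs.name.dir defaults to ${hadoop.tmp.dir}/dfs/name, and hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}, hence /tmp/hadoop-oracle/dfs/name. A sketch of pointing it at a persistent directory in hdfs-site.xml (the path is an example; the property is dfs.name.dir in 0.20 releases and dfs.namenode.name.dir in later versions):

<property>
  <name>dfs.name.dir</name>
  <value>/var/lib/hadoop/dfs/name</value>
</property>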
/tmp/hadoop-oracle/dfs/name is in an inconsistent state
When I started hadoop, the namenode failed to start up because of the following error. The strange thing is that it says /tmp/hadoop-oracle/dfs/name is inconsistent, but I don't think I have configured it as /tmp/hadoop-oracle/dfs/name. Where should I check for the related configuration? 2011-07-28 21:07:35,383 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop-oracle/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
Re: Error in 9000 and 9001 port in hadoop-0.20.2
I changed fs.default.name to hdfs://192.168.1.101:9000. But, the same error as before. I need help On Thu, Jul 28, 2011 at 7:45 PM, Nitin Khandelwal < nitin.khandel...@germinait.com> wrote: > Please change your fs.default.name to hdfs://192.168.1.101:9000 > Thanks, > Nitin > > On 28 July 2011 17:46, Doan Ninh wrote: > > > The first time, I used *hadoop-cluster-1* for 192.168.1.101. > > That is the hostname of the master node. > > But the same error occurs. > > How can I fix it? > > > > On Thu, Jul 28, 2011 at 7:07 PM, madhu phatak > > wrote: > > > > > I had issues using IP addresses in XML files . You can try to use host > names > > > in > > > the place of IP addresses . > > > > > > On Thu, Jul 28, 2011 at 5:22 PM, Doan Ninh > wrote: > > > > > > > Hi, > > > > > > > > I run Hadoop in 4 Ubuntu 11.04 on VirtualBox. > > > > On the master node (192.168.1.101), I configure fs.default.name = > > > hdfs:// > > > > 127.0.0.1:9000. Then I configure everything on the 3 other nodes > > > > When I start the cluster by entering "$HADOOP_HOME/bin/start-all.sh" > on > > > the > > > > master node > > > > Everything is ok, but the slaves can't connect to the master on 9000, > 9001 > > > > port. > > > > I manually telnet to 192.168.1.101 on 9000, 9001. And the result is > > > > "connection refused" > > > > Then, i'm on the master node, telnet to localhost, 127.0.0.1:9000. > The > > > > result is connected. > > > > But, on the master node, I telnet to 192.168.1.101:9000 => > Connection > > > > Refused > > > > > > > > Can somebody help me? > > > > > > > > > > > > > -- > > > Nitin Khandelwal >
Re: next gen map reduce
It's currently still on the MR-279 branch - http://svn.apache.org/viewvc/hadoop/common/branches/MR-279/. It is planned to be merged to trunk soon. Tom On 7/28/11 7:31 AM, "real great.." wrote: > In which Hadoop version is next gen introduced?
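For anyone wanting to look at the branch locally, a checkout would be along these lines:

svn checkout http://svn.apache.org/repos/asf/hadoop/common/branches/MR-279/ MR-279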
Re: Error in 9000 and 9001 port in hadoop-0.20.2
Please change your fs.default.name to hdfs://192.168.1.101:9000 Thanks, Nitin On 28 July 2011 17:46, Doan Ninh wrote: > The first time, I used *hadoop-cluster-1* for 192.168.1.101. > That is the hostname of the master node. > But the same error occurs. > How can I fix it? > > On Thu, Jul 28, 2011 at 7:07 PM, madhu phatak > wrote: > > > I had issues using IP addresses in XML files . You can try to use host names > > in > > the place of IP addresses . > > > > On Thu, Jul 28, 2011 at 5:22 PM, Doan Ninh wrote: > > > > > Hi, > > > > > > I run Hadoop in 4 Ubuntu 11.04 on VirtualBox. > > > On the master node (192.168.1.101), I configure fs.default.name = > > hdfs:// > > > 127.0.0.1:9000. Then I configure everything on the 3 other nodes > > > When I start the cluster by entering "$HADOOP_HOME/bin/start-all.sh" on > > the > > > master node > > > Everything is ok, but the slaves can't connect to the master on 9000, > 9001 > > > port. > > > I manually telnet to 192.168.1.101 on 9000, 9001. And the result is > > > "connection refused" > > > Then, i'm on the master node, telnet to localhost, 127.0.0.1:9000. The > > > result is connected. > > > But, on the master node, I telnet to 192.168.1.101:9000 => Connection > > > Refused > > > > > > Can somebody help me? > > > > > > -- Nitin Khandelwal
next gen map reduce
In which Hadoop version is next gen introduced? -- Regards, R.V.
Re: Hadoop Question
How about having the slave write to a temp file first, then move it into the folder the master is monitoring after the file is closed? -Joey On Jul 27, 2011, at 22:51, Nitin Khandelwal wrote: > Hi All, > > How can I determine if a file is being written to (by any thread) in HDFS? I > have a continuous process on the master node, which is tracking a particular > folder in HDFS for files to process. On the slave nodes, I am creating files > in the same folder using the following code : > > At the slave node: > > import org.apache.commons.io.IOUtils; > import org.apache.hadoop.fs.FileSystem; > import java.io.OutputStream; > > OutputStream oStream = fileSystem.create(path); > IOUtils.write(<data>, oStream); > IOUtils.closeQuietly(oStream); > > > At the master node, > I am getting the earliest modified file in the folder. At times when I try > reading the file, I get nothing in the file, mostly because the slave might > still be finishing writing to the file. Is there any way to somehow tell > the master that the slave is still writing to the file and to check the > file some time later for actual content. > > Thanks, > -- > > > Nitin Khandelwal
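A hedged sketch of that idea on the writing side (the class and variable names are invented; HDFS renames are atomic at the NameNode, so the watcher never sees a half-written file):

import java.io.IOException;
import java.io.OutputStream;
import org.apache.commons.io.IOUtils;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AtomicWrite {
  // Write under a temporary name first, then rename to the final name
  // that the master is watching for.
  public static void writeThenRename(FileSystem fs, String data,
      Path tmp, Path dst) throws IOException {
    OutputStream oStream = fs.create(tmp);
    try {
      IOUtils.write(data, oStream);
    } finally {
      IOUtils.closeQuietly(oStream);
    }
    if (!fs.rename(tmp, dst)) {
      throw new IOException("rename failed: " + tmp + " -> " + dst);
    }
  }
}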
Re: Error in 9000 and 9001 port in hadoop-0.20.2
The first time, I used *hadoop-cluster-1* for 192.168.1.101. That is the hostname of the master node. But the same error occurs. How can I fix it? On Thu, Jul 28, 2011 at 7:07 PM, madhu phatak wrote: > I had issues using IP addresses in XML files . You can try to use host names > in > the place of IP addresses . > > On Thu, Jul 28, 2011 at 5:22 PM, Doan Ninh wrote: > > > Hi, > > > > I run Hadoop in 4 Ubuntu 11.04 on VirtualBox. > > On the master node (192.168.1.101), I configure fs.default.name = > hdfs:// > > 127.0.0.1:9000. Then I configure everything on the 3 other nodes > > When I start the cluster by entering "$HADOOP_HOME/bin/start-all.sh" on > the > > master node > > Everything is ok, but the slaves can't connect to the master on 9000, 9001 > > port. > > I manually telnet to 192.168.1.101 on 9000, 9001. And the result is > > "connection refused" > > Then, i'm on the master node, telnet to localhost, 127.0.0.1:9000. The > > result is connected. > > But, on the master node, I telnet to 192.168.1.101:9000 => Connection > > Refused > > > > Can somebody help me? > > >
Re: Error in 9000 and 9001 port in hadoop-0.20.2
I had issues using IP addresses in XML files. You can try to use hostnames in place of IP addresses. On Thu, Jul 28, 2011 at 5:22 PM, Doan Ninh wrote: > Hi, > > I run Hadoop in 4 Ubuntu 11.04 on VirtualBox. > On the master node (192.168.1.101), I configure fs.default.name = hdfs:// > 127.0.0.1:9000. Then I configure everything on the 3 other nodes > When I start the cluster by entering "$HADOOP_HOME/bin/start-all.sh" on the > master node > Everything is ok, but the slaves can't connect to the master on 9000, 9001 > port. > I manually telnet to 192.168.1.101 on 9000, 9001. And the result is > "connection refused" > Then, i'm on the master node, telnet to localhost, 127.0.0.1:9000. The > result is connected. > But, on the master node, I telnet to 192.168.1.101:9000 => Connection > Refused > > Can somebody help me? >
Error in 9000 and 9001 port in hadoop-0.20.2
Hi, I run Hadoop in 4 Ubuntu 11.04 VMs on VirtualBox. On the master node (192.168.1.101), I configure fs.default.name = hdfs://127.0.0.1:9000. Then I configure everything on the 3 other nodes. When I start the cluster by entering "$HADOOP_HOME/bin/start-all.sh" on the master node, everything is ok, but the slaves can't connect to the master on ports 9000 and 9001. I manually telnet to 192.168.1.101 on 9000 and 9001, and the result is "connection refused". Then, on the master node, I telnet to localhost, 127.0.0.1:9000. The result is connected. But, on the master node, when I telnet to 192.168.1.101:9000 => Connection Refused Can somebody help me?
RE: Reader/Writer problem in HDFS
No such API as far as I know. copyFromLocal is one such operation, but that may not fit your scenario I guess. --Laxman -Original Message- From: Meghana [mailto:meghana.mara...@germinait.com] Sent: Thursday, July 28, 2011 4:32 PM To: hdfs-u...@hadoop.apache.org; lakshman...@huawei.com Cc: common-user@hadoop.apache.org Subject: Re: Reader/Writer problem in HDFS Thanks Laxman! That would definitely help things. :) Is there a better FileSystem/other method call to create a file in one go (i.e. atomic I guess?), without having to call create() and then write to the stream? ..meghana On 28 July 2011 16:12, Laxman wrote: > One approach is to use some ".tmp" extension while writing. Once the write > is completed, rename back to the original file name. Also, the reader has to filter > out ".tmp" files. > > This will ensure the reader will not pick up the partial files. > > We do have a similar scenario where the above-mentioned approach resolved the issue. > > -Original Message- > From: Meghana [mailto:meghana.mara...@germinait.com] > Sent: Thursday, July 28, 2011 1:38 PM > To: common-user; hdfs-u...@hadoop.apache.org > Subject: Reader/Writer problem in HDFS > > Hi, > > We have a job where the map tasks are given the path to an output folder. > Each map task writes a single file to that folder. There is no reduce > phase. > There is another thread, which constantly looks for new files in the output > folder. If found, it persists the contents to index, and deletes the file. > > We use this code in the map task: > OutputStream oStream = null; > try { > oStream = fileSystem.create(path); > IOUtils.write("xyz", oStream); > } finally { > IOUtils.closeQuietly(oStream); > } > > The problem: Sometimes the reader thread sees & tries to read a file which > is not yet fully written to HDFS (or the checksum is not written yet, etc), > and throws an error. Is it possible to write an HDFS file in such a way > that > it won't be visible until it is fully written? > > We use Hadoop 0.20.203. > > Thanks, > > Meghana > >
Fwd: HBase Mapreduce cannot find Map class
-- Forwarded message --
From: air
Date: 2011/7/28
Subject: HBase Mapreduce cannot find Map class
To: CDH Users

import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapred.TableMapReduceUtil;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.lib.NullOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class LoadToHBase extends Configured implements Tool {
    public static class XMap extends MapReduceBase implements
            Mapper<LongWritable, Text, LongWritable, Text> {
        private JobConf conf;
        private HTable table;

        @Override
        public void configure(JobConf conf) {
            this.conf = conf;
            try {
                this.table = new HTable(new HBaseConfiguration(conf), "observations");
            } catch (IOException e) {
                throw new RuntimeException("Failed HTable construction", e);
            }
        }

        @Override
        public void close() throws IOException {
            super.close();
            table.close();
        }

        public void map(LongWritable key, Text value,
                OutputCollector<LongWritable, Text> output, Reporter reporter)
                throws IOException {
            String[] valuelist = value.toString().split("\t");
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            Date addtime = null; // user registration time
            Date ds = null;
            Long delta_days = null;
            String uid = valuelist[0];
            try {
                addtime = sdf.parse(valuelist[1]);
            } catch (ParseException e) {
                e.printStackTrace();
            }

            String ds_str = conf.get("load.hbase.ds", null);
            if (ds_str != null) {
                try {
                    ds = sdf.parse(ds_str);
                } catch (ParseException e) {
                    e.printStackTrace();
                }
            } else {
                ds_str = "2011-07-28";
            }

            if (addtime != null && ds != null) {
                delta_days = (ds.getTime() - addtime.getTime()) / (24 * 60 * 60 * 1000);
            }

            if (delta_days != null) {
                byte[] rowKey = uid.getBytes();
                Put p = new Put(rowKey);
                p.add("content".getBytes(), "attr1".getBytes(), delta_days.toString().getBytes());
                table.put(p);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new HBaseConfiguration(), new LoadToHBase(), args);
        System.exit(exitCode);
    }

    @Override
    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getClass());
        TableMapReduceUtil.addDependencyJars(conf);
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        conf.setJobName("LoadToHBase");
        conf.setJarByClass(getClass());
        conf.setMapperClass(XMap.class);
        conf.setNumReduceTasks(0);
        conf.setOutputFormat(NullOutputFormat.class);
        JobClient.runJob(conf);
        return 0;
    }
}

execute it using hbase LoadToHBase /user/hive/warehouse/datamining.db/xxx/ and it says:

..
11/07/28 17:20:29 INFO mapred.JobClient: Task Id : attempt_201107261532_2625_m_04_1, Status : FAILED
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
C
Re: Reader/Writer problem in HDFS
Thanks Laxman! That would definitely help things. :) Is there a better FileSystem/other method call to create a file in one go (i.e. atomic I guess?), without having to call create() and then write to the stream? ..meghana On 28 July 2011 16:12, Laxman wrote: > One approach is to use some ".tmp" extension while writing. Once the write > is completed, rename back to the original file name. Also, the reader has to filter > out ".tmp" files. > > This will ensure the reader will not pick up the partial files. > > We do have a similar scenario where the above-mentioned approach resolved the issue. > > -Original Message- > From: Meghana [mailto:meghana.mara...@germinait.com] > Sent: Thursday, July 28, 2011 1:38 PM > To: common-user; hdfs-u...@hadoop.apache.org > Subject: Reader/Writer problem in HDFS > > Hi, > > We have a job where the map tasks are given the path to an output folder. > Each map task writes a single file to that folder. There is no reduce > phase. > There is another thread, which constantly looks for new files in the output > folder. If found, it persists the contents to index, and deletes the file. > > We use this code in the map task: > OutputStream oStream = null; > try { > oStream = fileSystem.create(path); > IOUtils.write("xyz", oStream); > } finally { > IOUtils.closeQuietly(oStream); > } > > The problem: Sometimes the reader thread sees & tries to read a file which > is not yet fully written to HDFS (or the checksum is not written yet, etc), > and throws an error. Is it possible to write an HDFS file in such a way > that > it won't be visible until it is fully written? > > We use Hadoop 0.20.203. > > Thanks, > > Meghana > >
RE: Reader/Writer problem in HDFS
One approach is to use some ".tmp" extension while writing. Once the write is completed, rename back to the original file name. Also, the reader has to filter out ".tmp" files. This will ensure the reader will not pick up the partial files. We do have a similar scenario where the above-mentioned approach resolved the issue. -Original Message- From: Meghana [mailto:meghana.mara...@germinait.com] Sent: Thursday, July 28, 2011 1:38 PM To: common-user; hdfs-u...@hadoop.apache.org Subject: Reader/Writer problem in HDFS Hi, We have a job where the map tasks are given the path to an output folder. Each map task writes a single file to that folder. There is no reduce phase. There is another thread, which constantly looks for new files in the output folder. If found, it persists the contents to index, and deletes the file. We use this code in the map task: OutputStream oStream = null; try { oStream = fileSystem.create(path); IOUtils.write("xyz", oStream); } finally { IOUtils.closeQuietly(oStream); } The problem: Sometimes the reader thread sees & tries to read a file which is not yet fully written to HDFS (or the checksum is not written yet, etc), and throws an error. Is it possible to write an HDFS file in such a way that it won't be visible until it is fully written? We use Hadoop 0.20.203. Thanks, Meghana
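On the reading side, a sketch of skipping the in-progress files when listing the folder (the class and names are illustrative):

import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class CompletedFiles {
  // Skip files still carrying the ".tmp" suffix, i.e. still being written.
  private static final PathFilter COMPLETED = new PathFilter() {
    public boolean accept(Path p) {
      return !p.getName().endsWith(".tmp");
    }
  };

  public static FileStatus[] listCompleted(FileSystem fs, Path folder)
      throws IOException {
    return fs.listStatus(folder, COMPLETED);
  }
}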
Hadoop output contains _temporary
Hi all, In my recent work in hadoop, I find that the output dir contains both _SUCCESS and _temporary. And then the next job fails because the input path contains _temporary. How does this happen? And how can I avoid it? Thanks for your replies. liuliu --
Why hadoop 0.20.203 unit test failed
Hi all, I'm trying to compile and unit test hadoop 0.20.203, but met with almost the same problem as a previous discussion on the mailing list (http://mail-archives.apache.org/mod_mbox/hadoop-general/201105.mbox/%3CBANLkTim68H=8ngbfzmsvrqob9pmy7fv...@mail.gmail.com%3E). Even after setting umask to 022, I still have the test cases listed below failing.

Test org.apache.hadoop.mapred.TestJobTrackerRestart FAILED
Test org.apache.hadoop.mapred.TestJobTrackerRestartWithLostTracker FAILED
Test org.apache.hadoop.mapred.TestJobTrackerSafeMode FAILED
Test org.apache.hadoop.filecache.TestMRWithDistributedCache FAILED
Test org.apache.hadoop.filecache.TestTrackerDistributedCacheManager FAILED
Test org.apache.hadoop.mapred.TestMiniMRMapRedDebugScript FAILED
Test org.apache.hadoop.mapred.TestRecoveryManager FAILED
Test org.apache.hadoop.mapred.TestTaskTrackerLocalization FAILED
Test org.apache.hadoop.mapred.lib.TestCombineFileInputFormat FAILED
Test org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl FAILED
Test org.apache.hadoop.tools.rumen.TestRumenJobTraces FAILED
Test org.apache.hadoop.hdfsproxy.TestHdfsProxy FAILED

The JDK version in my testing environment is Sun JDK 1.6u19, and the ant version is 1.8.2. Does anybody know what causes these test case failures? Any comments/suggestions would be highly appreciated. -- Best Regards, Li Yu
Reader/Writer problem in HDFS
Hi, We have a job where the map tasks are given the path to an output folder. Each map task writes a single file to that folder. There is no reduce phase. There is another thread, which constantly looks for new files in the output folder. If found, it persists the contents to index, and deletes the file. We use this code in the map task:

OutputStream oStream = null;
try {
    oStream = fileSystem.create(path);
    IOUtils.write("xyz", oStream);
} finally {
    IOUtils.closeQuietly(oStream);
}

The problem: Sometimes the reader thread sees & tries to read a file which is not yet fully written to HDFS (or the checksum is not written yet, etc), and throws an error. Is it possible to write an HDFS file in such a way that it won't be visible until it is fully written? We use Hadoop 0.20.203. Thanks, Meghana
Re: Replication and failure
Mohit, I believe Tom's book (Hadoop: The Definitive Guide) covers this precisely well. Perhaps others too. Replication is a best-effort sort of thing. If 2 nodes are all that is available, then two replicas are written and one is left to the replica monitor service to replicate later when possible (leading to an underreplicated write for the moment). The scenario (with default configs) would only fail if you have 0 DataNodes 'available' to write to. Or are you asking about what happens when a DN fails during a write operation? On Thu, Jul 28, 2011 at 5:08 AM, Mohit Anchlia wrote: > Just trying to understand what happens if there are 3 nodes with > replication set to 3 and one node fails. Does it fail the writes too? > > If there is a link that I can look at will be great. I tried searching > but didn't see any definitive answer. > > Thanks, > Mohit > -- Harsh J
RE: where to find the log info
Daniel, You can find those stdout statements in the "{LOG Directory}/userlogs/{task attempt id}/stdout" file. In the same way you can find stderr statements in "{LOG Directory}/userlogs/{task attempt id}/stderr" and log statements in "{LOG Directory}/userlogs/{task attempt id}/syslog". Devaraj K -Original Message- From: Daniel,Wu [mailto:hadoop...@163.com] Sent: Thursday, July 28, 2011 11:47 AM To: common-user@hadoop.apache.org Subject: where to find the log info Hi everyone, I am new to it, and want to do some debugging/logging. I'd like to check what the value is for each mapper execution. If I add the following code in bold, where can I find the log info? If I can't do it in this way, how should I do it? public void map(Object key, Text value, Context context ) throws IOException, InterruptedException { StringTokenizer itr = new StringTokenizer(value.toString()); System.out.println(value.toString()); while (itr.hasMoreTokens()) { word.set(itr.nextToken()); context.write(word, one); } } }
Re: File System Counters.
Raj, There is no overlap. Data read from HDFS FileSystem instances go to HDFS_BYTES_READ, and data read from Local FileSystem instances go to FILE_BYTES_READ. These are two different FileSystems, and have no overlap at all. On Thu, Jul 28, 2011 at 5:56 AM, R V wrote: > Hello > > I don't know if the question has been answered. I am trying to understand > the overlap between FILE_BYTES_READ and HDFS_BYTES_READ. What are the various > components that provide value to this counter? For example when I see > FILE_BYTES_READ for a specific task ( Map or Reduce ) , is it purely due to > the spill during sort phase? If a HDFS read happens on a non local node, does > the counter increase on the node where the data block resides? What happens > when the data is local? does the counter increase for both HDFS_BYTES_READ > and FILE_BYTES_READ? From the values I am seeing, this looks to be the case > but I am not sure. > > I am not very fluent in Java , and hence I don't fully understand the source > . :-( > > Raj -- Harsh J
Re: where to find the log info
Task logs are written to the userlogs directory on the TT nodes. You can view task logs on the JobTracker/TaskTracker web UI for each task at: http://machine:50030/taskdetails.jsp?jobid=<jobid>&tipid=<tipid> All of the syslog, stdout and stderr logs are available in the links to logs off that page. 2011/7/28 Daniel,Wu : > Hi everyone, > > I am new to it, and want to do some debugging/logging. I'd like to check what the > value is for each mapper execution. If I add the following code in bold, > where can I find the log info? If I can't do it in this way, how should I do it? > > public void map(Object key, Text value, Context context > ) throws IOException, InterruptedException { > StringTokenizer itr = new StringTokenizer(value.toString()); > System.out.println(value.toString()); > while (itr.hasMoreTokens()) { > word.set(itr.nextToken()); > context.write(word, one); > } > } > } -- Harsh J