Restricting quota for users in HDFS
Hi all, We have the chown command in hadoop dfs to make a particular directory owned by a particular user. Do we have something similar to create a user with a space limit, i.e. to restrict the disk usage of a particular user? Thanks Pallavi
Re: Datanodes fail to start
If you rebuilt the hadoop, following the wikipage of HowToRelease may reduce the trouble occurred. On Sat, May 16, 2009 at 7:20 AM, Pankil Doshiforpan...@gmail.com wrote: I got the solution.. Namespace IDs where some how incompatible.So I had to clean data dir and temp dir ,format the cluster and make a fresh start Pankil On Fri, May 15, 2009 at 2:25 AM, jason hadoop jason.had...@gmail.comwrote: There should be a few more lines at the end. We only want the part from last the STARTUP_MSG to the end On one of mine a successfull start looks like this: STARTUP_MSG: Starting DataNode STARTUP_MSG: host = at/192.168.1.119 STARTUP_MSG: args = [] STARTUP_MSG: version = 0.19.1-dev STARTUP_MSG: build = -r ; compiled by 'jason' on Tue Mar 17 04:03:57 PDT 2009 / 2009-03-17 03:08:11,884 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Registered FSDatasetStatusMBean 2009-03-17 03:08:11,886 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 50010 2009-03-17 03:08:11,889 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s 2009-03-17 03:08:12,142 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4 2009-03-17 03:08:12,155 INFO org.mortbay.util.Credential: Checking Resource aliases 2009-03-17 03:08:12,518 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.webapplicationhand...@1e184cb 2009-03-17 03:08:12,578 INFO org.mortbay.util.Container: Started WebApplicationContext[/static,/static] 2009-03-17 03:08:12,721 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.webapplicationhand...@1d9e282 2009-03-17 03:08:12,722 INFO org.mortbay.util.Container: Started WebApplicationContext[/logs,/logs] 2009-03-17 03:08:12,878 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.webapplicationhand...@14a75bb 2009-03-17 03:08:12,884 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/] 2009-03-17 03:08:12,951 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50075 2009-03-17 03:08:12,951 INFO org.mortbay.util.Container: Started org.mortbay.jetty.ser...@1358f03 2009-03-17 03:08:12,957 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=DataNode, sessionId=null 2009-03-17 03:08:13,242 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=DataNode, port=50020 2009-03-17 03:08:13,264 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting 2009-03-17 03:08:13,304 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting 2009-03-17 03:08:13,343 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: starting 2009-03-17 03:08:13,343 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration = DatanodeRegistration(192.168.1.119:50010, storageID=DS-540597485-192.168.1.119-50010-1237022386925, infoPort=50075, ipcPort=50020) 2009-03-17 03:08:13,344 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: starting 2009-03-17 03:08:13,344 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: starting 2009-03-17 03:08:13,351 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration( 192.168.1.119:50010, storageID=DS-540597485-192.168.1.119-50010-1237022386925, infoPort=50075, ipcPort=50020)In DataNode.run, data = FSDataset{dirpath='/tmp/hadoop-0.19.0-jason/dfs/data/current'} 2009-03-17 03:08:13,352 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: using BLOCKREPORT_INTERVAL of 360msec Initial delay: 0msec 2009-03-17 03:08:13,391 INFO 
org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 14 blocks got processed in 27 msecs 2009-03-17 03:08:13,392 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic block scanner. On Thu, May 14, 2009 at 9:51 PM, Pankil Doshi forpan...@gmail.com wrote: This is log from datanode. 2009-05-14 00:36:14,559 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 82 blocks got processed in 12 msecs 2009-05-14 01:36:15,768 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 82 blocks got processed in 8 msecs 2009-05-14 02:36:13,975 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 82 blocks got processed in 9 msecs 2009-05-14 03:36:15,189 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 82 blocks got processed in 12 msecs 2009-05-14 04:36:13,384 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 82 blocks got processed in 9 msecs 2009-05-14 05:36:14,592 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 82 blocks got processed in 9 msecs 2009-05-14 06:36:15,806 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 82 blocks got processed in 12 msecs 2009-05-14 07:36:14,008 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 82 blocks got processed in 12 msecs 2009-05-14 08:36:15,204 INFO
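For reference, the cleanup Pankil describes usually comes down to something like the following shell session. The data-directory path below is only the era default (dfs.data.dir under /tmp); check hadoop-site.xml for the real location, and note that reformatting the namenode destroys everything stored in HDFS:

bin/stop-all.sh
# on every datanode: remove the old block storage (the path is whatever dfs.data.dir points to)
rm -rf /tmp/hadoop-${USER}/dfs/data
# on the namenode only:
bin/hadoop namenode -format
bin/start-all.sh

A less destructive alternative, if the data must be kept, is to edit the namespaceID field in the datanode's current/VERSION file so it matches the namenode's, and then restart just that datanode.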
Re: Can I share datas for several map tasks?
Thanks for your reply. Can you do me a favor and check this? I modified mapred-default.xml as follows:

<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
  <description>How many tasks to run per jvm. If set to -1, there is no limit.</description>
</property>

and executed bin/stop-all.sh; bin/start-all.sh to restart Hadoop. This is my program:

public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();
    public static int[] ToBeSharedData = new int[1024 * 1024 * 16];

    protected void setup(Context context
                         ) throws IOException, InterruptedException {
      // Init shared data
      ToBeSharedData[0] = 12345;
      System.out.println("setup shared data[0] = " + ToBeSharedData[0]);
    }

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
      System.out.println("read shared data[0] = " + ToBeSharedData[0]);
    }
  }

First, can you tell me how to make sure JVM reuse is taking effect? I didn't see anything different from before: I used the top command under Linux and saw the same number of java processes and the same memory usage. Second, can you tell me how to make ToBeSharedData be initialized only once and then be readable from the other MapTasks on the same node? Or is this not a suitable programming style for map-reduce? By the way, I'm using hadoop-0.20.0, in pseudo-distributed mode on a single node. Thanks in advance. On Tue, Jun 16, 2009 at 1:48 PM, Sharad Agarwal shara...@yahoo-inc.com wrote: snowloong wrote: Hi, I want to share some data structures among the map tasks on the same node (not through files). I mean, if one map task has already initialized some data structures (e.g. an array or a list), can other map tasks share that memory and access it directly? I don't want to reinitialize this data and I want to save some memory. Can Hadoop help me do this? You can enable jvm reuse across tasks. See mapred.job.reuse.jvm.num.tasks in mapred-default.xml for usage. Then you can cache the data in a static variable in your mapper. - Sharad
Rack Configuration::!
Hello! How do I configure the machines in different racks? I have 10 machines in all. Now I want the hierarchy as follows:

machine1 machine2 machine3 -- these are all DN and TT
machine4 machine5 -- JT1
machine7 machine8 -- JT2
machine10 -- NN and Sec. NN

As of now I have 7 machines running a Hadoop cluster, which follows the hierarchy below:

machine1 machine2 -- these are all DN and TT
machine3 machine4 machine5 -- JT
machine6 -- Sec. NN
machine7 -- NN

Also, if the machines are configured in different racks, what advantage do we have? Also, give me a few problem statements that handle big amounts of data (processing). What have the Yahoo and Amazon guys done? What kind of processing of huge data have they handled? -- Regards! Sugandha
Re: 2009 Hadoop Summit West - was wonderful
Thanks Jason and Chuck. On Tue, Jun 16, 2009 at 5:55 AM, Chuck Lam chuck@gmail.com wrote: She mentioned a number of projects. I think this one is most relevant. ASDF: Automated, Online Fingerpointing for Hadoop http://www.pdl.cmu.edu/PDL-FTP/stray/CMU-PDL-08-104_abs.html On Sun, Jun 14, 2009 at 6:38 PM, jason hadoop jason.had...@gmail.com wrote: This is the best I have at present: http://www.cs.cmu.edu/~priya/ On Sat, Jun 13, 2009 at 11:05 AM, zsongbo zson...@gmail.com wrote: Hi Jason, Could you please post more information about Priya Narasimhan's toolset for automated fault detection in hadoop clusters? Such as a URL or other pointers. Thanks. Schubert On Thu, Jun 11, 2009 at 11:26 AM, jason hadoop jason.had...@gmail.com wrote: I had a great time, schmoozing with people, and enjoyed a couple of the talks. I would love to see more from Priya Narasimhan; I hope their toolset for automated fault detection in hadoop clusters becomes generally available. Zookeeper rocks on! Hbase is starting to look really good: in 0.20 the master node as a single point of failure and configuration headache goes away and Zookeeper takes over. Owen O'Malley gave a solid presentation on the new Hadoop APIs and the reasons for the changes. It was good to hang with everyone, see you all next year! I even got to spend a little time chatting with Tom White, and got a signed copy of his book, thanks Tom! -- Pro Hadoop, a book to guide you from beginner to hadoop mastery, http://www.apress.com/book/view/9781430219422 www.prohadoopbook.com a community for Hadoop Professionals -- Pro Hadoop, a book to guide you from beginner to hadoop mastery, http://www.apress.com/book/view/9781430219422 www.prohadoopbook.com a community for Hadoop Professionals
Hadoop as Cloud Storage
Dear Hadoop Gurus, After googling I found some information on using Hadoop as cloud storage (long term). I have a problem maintaining lots of data (around 50 TB), much of it TV commercials (video files). I know the best solution for long-term file archiving is tape backup, but I am just curious: can Hadoop be used as a 'data archiving' platform? Thanks! Warm Regards, Wildan --- OpenThink Labs http://openthink-labs.tobethink.com/ Making IT, Business and Education in Harmony 087884599249 Y! : hawking_123 LinkedIn : http://www.linkedin.com/in/wildanmaulana
Re: Rack Configuration::!
On Tue, Jun 16, 2009 at 2:45 PM, Sugandha Naolekar sugandha@gmail.comwrote: Hello! How to configure the machines in different racks? Also, if the machines are configured in different racks, what advantage do we have? See this thread: http://www.nabble.com/Hadoop-topology.script.file.name-Form-td17683521.html Also, give me few problem statements which handles big amount of data(processing). What Yahoo and Amazon guys have done? What kind of huge processing of huge data they have handled? Just google map reduce applications. Read the original map-reduce paper by Googlers. -- Harish Mallipeddi http://blog.poundbang.in
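The property behind the thread linked above is topology.script.file.name: rack layout is not listed machine-by-machine in the slaves file, it is supplied through a script that Hadoop calls with one or more hostnames or IP addresses and that must print one rack path (e.g. /rack1) per argument. A sketch, where the script path is only an example:

<property>
  <name>topology.script.file.name</name>
  <value>/path/to/rack-map.sh</value>
</property>

With that in place HDFS tries to keep block replicas on more than one rack, which is the main advantage of telling Hadoop about racks: losing a whole rack (switch or power failure) no longer loses every copy of a block.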
Re: MapContext.getInputSplit() returns nothing
Why don't we convert the input split information into the same string format that is displayed in the web UI? Something like this - hdfs://nyc-qws-029/in-dir/words86ac4a.txt:0+184185. It's a simple format and we can always parse such a string in C++. Is there some reason for the current binary format? If there is a good reason for it, I am game to write such a deserialiser class. Is there some reference for this binary format that I can use to write the deserialiser? Roshan On Mon, Jun 15, 2009 at 5:40 PM, Owen O'Malley omal...@apache.org wrote: *Sigh* We need Avro for input splits. That is the expected behavior. It would be great if someone wrote a C++ FileInputSplit class that took a binary string and converted it back to a filename, offset, and length. -- Owen
Re: Can I share datas for several map tasks?
In the examples for my book is a jvm reuse with static data shared between jvm's example On Tue, Jun 16, 2009 at 1:08 AM, Hello World snowlo...@gmail.com wrote: Thanks for your reply. Can you do me a favor to make a check? I modified mapred-default.xml as follows: 540 property 541 namemapred.job.reuse.jvm.num.tasks/name 542 value-1/value 543 descriptionHow many tasks to run per jvm. If set to -1, there is 544 no limit. 545 /description 546 /property And execute bin/stop-all.sh; bin/start-all.sh to restart hadoop; This is my program: 17 public class WordCount { 18 19 public static class TokenizerMapper 20extends MapperObject, Text, Text, IntWritable{ 21 22 private final static IntWritable one = new IntWritable(1); 23 private Text word = new Text(); 24 public static int[] ToBeSharedData = new int[1024 * 1024 * 16]; 25 26 protected void setup(Context context 27 ) throws IOException, InterruptedException { 28 //Init shared data 29 ToBeSharedData[0] = 12345; 30 System.out.println(setup shared data[0] = + ToBeSharedData[0]); 31 } 32 33 public void map(Object key, Text value, Context context 34 ) throws IOException, InterruptedException { 35 StringTokenizer itr = new StringTokenizer(value.toString()); 36 while (itr.hasMoreTokens()) { 37 word.set(itr.nextToken()); 38 context.write(word, one); 39 } 40 System.out.println(read shared data[0] = + ToBeSharedData[0]); 41 } 42 } First, can you tell me how to make sure jvm reuse is taking effect, for I didn't see anything different from before. I use top command under linux and see the same number of java processes and same memory usage. Second, can you tell me how to make the ToBeSharedData be inited only once and can be read from other MapTasks on the same node? Or this is not a suitable programming style for map-reduce? By the way, I'm using hadoop-0.20.0, in pseudo-distributed mode on a single-node. thanks in advance On Tue, Jun 16, 2009 at 1:48 PM, Sharad Agarwal shara...@yahoo-inc.com wrote: snowloong wrote: Hi, I want to share some data structures for the map tasks on a same node(not through files), I mean, if one map task has already initialized some data structures (e.g. an array or a list), can other map tasks share these memorys and directly access them, for I don't want to reinitialize these datas and I want to save some memory. Can hadoop help me do this? You can enable jvm reuse across tasks. See mapred.job.reuse.jvm.num.tasks in mapred-default.xml for usage. Then you can cache the data in a static variable in your mapper. - Sharad -- Pro Hadoop, a book to guide you from beginner to hadoop mastery, http://www.apress.com/book/view/9781430219422 www.prohadoopbook.com a community for Hadoop Professionals
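The pattern Jason is referring to boils down to lazily initializing a static field, so the expensive load happens once per task JVM and every later task that reuses that JVM sees the already-built data. A minimal sketch along those lines (class and field names are made up for illustration; it assumes mapred.job.reuse.jvm.num.tasks is set to -1 for the job, e.g. in mapred-site.xml or on the JobConf, rather than by editing mapred-default.xml):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SharedDataMapper extends Mapper<Object, Text, Text, IntWritable> {

  // One copy per task JVM; tasks that reuse the JVM skip the rebuild.
  private static int[] sharedData;

  private final static IntWritable one = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void setup(Context context) throws IOException, InterruptedException {
    synchronized (SharedDataMapper.class) {
      if (sharedData == null) {
        System.out.println("building shared data (first task in this JVM)");
        sharedData = new int[1024 * 1024 * 16];
        sharedData[0] = 12345; // expensive initialization would go here
      } else {
        System.out.println("reusing shared data built by an earlier task");
      }
    }
  }

  @Override
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}

The two println lines also answer the "how do I know reuse is on" question: in the per-task stdout logs the "building" message should appear only for the first task run in each JVM, and the "reusing" message for every task after it. Note that the sharing is only between tasks of the same job that land in the same JVM on the same node; it does not span nodes.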
Re: Datanodes fail to start
I often find myself editing the src/saveVersion.sh to fake out the version numbers, when I build a hadoop jar for the first time, and have to deploy it on an an already running cluster. On Mon, Jun 15, 2009 at 11:57 PM, Ian jonhson jonhson@gmail.com wrote: If you rebuilt the hadoop, following the wikipage of HowToRelease may reduce the trouble occurred. On Sat, May 16, 2009 at 7:20 AM, Pankil Doshiforpan...@gmail.com wrote: I got the solution.. Namespace IDs where some how incompatible.So I had to clean data dir and temp dir ,format the cluster and make a fresh start Pankil On Fri, May 15, 2009 at 2:25 AM, jason hadoop jason.had...@gmail.com wrote: There should be a few more lines at the end. We only want the part from last the STARTUP_MSG to the end On one of mine a successfull start looks like this: STARTUP_MSG: Starting DataNode STARTUP_MSG: host = at/192.168.1.119 STARTUP_MSG: args = [] STARTUP_MSG: version = 0.19.1-dev STARTUP_MSG: build = -r ; compiled by 'jason' on Tue Mar 17 04:03:57 PDT 2009 / 2009-03-17 03:08:11,884 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Registered FSDatasetStatusMBean 2009-03-17 03:08:11,886 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 50010 2009-03-17 03:08:11,889 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Balancing bandwith is 1048576 bytes/s 2009-03-17 03:08:12,142 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4 2009-03-17 03:08:12,155 INFO org.mortbay.util.Credential: Checking Resource aliases 2009-03-17 03:08:12,518 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.webapplicationhand...@1e184cb 2009-03-17 03:08:12,578 INFO org.mortbay.util.Container: Started WebApplicationContext[/static,/static] 2009-03-17 03:08:12,721 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.webapplicationhand...@1d9e282 2009-03-17 03:08:12,722 INFO org.mortbay.util.Container: Started WebApplicationContext[/logs,/logs] 2009-03-17 03:08:12,878 INFO org.mortbay.util.Container: Started org.mortbay.jetty.servlet.webapplicationhand...@14a75bb 2009-03-17 03:08:12,884 INFO org.mortbay.util.Container: Started WebApplicationContext[/,/] 2009-03-17 03:08:12,951 INFO org.mortbay.http.SocketListener: Started SocketListener on 0.0.0.0:50075 2009-03-17 03:08:12,951 INFO org.mortbay.util.Container: Started org.mortbay.jetty.ser...@1358f03 2009-03-17 03:08:12,957 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=DataNode, sessionId=null 2009-03-17 03:08:13,242 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=DataNode, port=50020 2009-03-17 03:08:13,264 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting 2009-03-17 03:08:13,304 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 50020: starting 2009-03-17 03:08:13,343 INFO org.apache.hadoop.ipc.Server: IPC Server handler 0 on 50020: starting 2009-03-17 03:08:13,343 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dnRegistration = DatanodeRegistration(192.168.1.119:50010, storageID=DS-540597485-192.168.1.119-50010-1237022386925, infoPort=50075, ipcPort=50020) 2009-03-17 03:08:13,344 INFO org.apache.hadoop.ipc.Server: IPC Server handler 1 on 50020: starting 2009-03-17 03:08:13,344 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 50020: starting 2009-03-17 03:08:13,351 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration( 192.168.1.119:50010, storageID=DS-540597485-192.168.1.119-50010-1237022386925, infoPort=50075, 
ipcPort=50020)In DataNode.run, data = FSDataset{dirpath='/tmp/hadoop-0.19.0-jason/dfs/data/current'} 2009-03-17 03:08:13,352 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: using BLOCKREPORT_INTERVAL of 360msec Initial delay: 0msec 2009-03-17 03:08:13,391 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 14 blocks got processed in 27 msecs 2009-03-17 03:08:13,392 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting Periodic block scanner. On Thu, May 14, 2009 at 9:51 PM, Pankil Doshi forpan...@gmail.com wrote: This is log from datanode. 2009-05-14 00:36:14,559 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 82 blocks got processed in 12 msecs 2009-05-14 01:36:15,768 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 82 blocks got processed in 8 msecs 2009-05-14 02:36:13,975 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 82 blocks got processed in 9 msecs 2009-05-14 03:36:15,189 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 82 blocks got processed in 12 msecs 2009-05-14 04:36:13,384 INFO org.apache.hadoop.dfs.DataNode: BlockReport of 82 blocks got processed in 9
Re: Debugging Map-Reduce programs
When you are running in local mode you have 2 basic choices if you want to interact with a debugger. You can launch from within Eclipse or another IDE, or you can set up a Java debugger transport as part of the mapred.child.java.opts variable and attach to the running JVM. By far the simplest is launching via Eclipse. Your other alternative is to tell the framework to retain the job files via keep.failed.task.files (be careful here, you will fill your disk with old dead data) and debug with the IsolationRunner. Examples in my book :) On Mon, Jun 15, 2009 at 6:49 PM, bharath vissapragada bharathvissapragada1...@gmail.com wrote: I am running in local mode. Can you tell me how to set those breakpoints or how to access those files so that I can debug the program? The program is generating java.lang.NumberFormatException: For input string: But that particular string is the one which is the input to the map class, so I think that it is not reading my input correctly. But when I try to print it, it isn't printing to STDOUT. I am using the FileInputFormat class: FileInputFormat.addInputPath(conf, new Path("/home/rip/Desktop/hadoop-0.18.3/input")); FileOutputFormat.setOutputPath(conf, new Path("/home/rip/Desktop/hadoop-0.18.3/output")); input and output are folders for input and output. It is generating these warnings also: 09/06/16 12:38:32 WARN fs.FileSystem: "local" is a deprecated filesystem name. Use "file:///" instead. Thanks in advance On Tue, Jun 16, 2009 at 3:50 AM, Aaron Kimball aa...@cloudera.com wrote: On Mon, Jun 15, 2009 at 10:01 AM, bharath vissapragada bhara...@students.iiit.ac.in wrote: Hi all, When running hadoop in local mode, can we use print statements to print something to the terminal? Yes. In distributed mode, each task will write its stdout/stderr to files which you can access through the web-based interface. Also I am not sure whether the program is reading my input files; if I keep print statements it isn't displaying any. Can anyone tell me how to solve this problem? Is it generating exceptions? Are the files present? If you're running in local mode, you can use a debugger; set a breakpoint in your map() method and see if it gets there. How are you configuring the input files for your job? Thanks in advance, -- Pro Hadoop, a book to guide you from beginner to hadoop mastery, http://www.amazon.com/dp/1430219424?tag=jewlerymall www.prohadoopbook.com a community for Hadoop Professionals
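To make the mapred.child.java.opts debugger transport concrete: the usual trick (a sketch, not something from Jason's message, and the port number is arbitrary) is to add standard JDWP options to the child JVM settings and then attach a remote debugger from the IDE on that port. With suspend=y every task JVM blocks until a debugger attaches, so only do this on a small test job with a single map task:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000</value>
</property>

In local mode this property is not the one that matters, because the mapper runs inside the client JVM; there, launching the job driver from the IDE (or passing the same -agentlib:jdwp flag via HADOOP_OPTS) is the straightforward route.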
Re: MapContext.getInputSplit() returns nothing
So after squinting at this a bit I feel this is the format:

Length of string: 00 23
String: 68 64 66 73 3A 2F 2F 6E 79 63 2D 71 77 73 2D 30 32 39 2F 69 6E 2D 64 69 72 2F 77 6F 72 64 73 2E 74 78 74
Start Offset: 00 00 00 00 00 00 00 00
Size: 00 00 00 00 00 02 C4 AC

And this should be the split for file hdfs://nyc-qws-029/in-dir/words.txt from offset 0 to 181420. That said, is there some reason why this is the format? I don't want the deserialiser I write to break from one version of Hadoop to the next. Roshan On Tue, Jun 16, 2009 at 9:41 AM, Roshan James roshan.james.subscript...@gmail.com wrote: Why dont we convert input split information into the same string format that is displayed in the webUI? Something like this - hdfs://nyc-qws-029/in-dir/words86ac4a.txt:0+184185. Its a simple format and we can always parse such a string in C++. Is there some reason for the current binary format? If there is good reason for it, I am game to write such a deserialiser class. Is there some reference for this binary format that I can use to write the deserialiser? Roshan On Mon, Jun 15, 2009 at 5:40 PM, Owen O'Malley omal...@apache.org wrote: *Sigh* We need Avro for input splits. That is the expected behavior. It would be great if someone wrote a C++ FileInputSplit class that took a binary string and converted it back to a filename, offset, and length. -- Owen
Re: MapContext.getInputSplit() returns nothing
Sorry, I forget how much isn't clear to people who are just starting. FileInputFormat creates FileSplits. The serialization is very stable and can't be changed without breaking things. The reason that pipes can't stringify it is that the string form of input splits is ambiguous (and since it is user code, we really can't make assumptions about it). The format of FileSplit is:

16 bit filename byte length
filename in bytes
64 bit offset
64 bit length

Technically the filename uses a funky UTF-8 encoding, but in practice as long as the filename has ASCII characters they are ASCII. Look at org.apache.hadoop.io.UTF8.writeString for the precise definition. -- Owen
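For anyone wanting to check a parser against Owen's description before porting it to C++, here is a rough Java sketch of reading that layout (it assumes the raw bytes really are a serialized FileSplit, and it cheats by treating the filename as plain UTF-8, which is fine for ASCII paths):

import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class FileSplitBytesSketch {
  /** Decodes: 16-bit name length, name bytes, 64-bit offset, 64-bit length. */
  public static String decode(byte[] raw) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(raw));
    int nameLen = in.readUnsignedShort();     // 16-bit filename byte length
    byte[] nameBytes = new byte[nameLen];
    in.readFully(nameBytes);                  // filename in bytes
    long offset = in.readLong();              // 64-bit start offset
    long length = in.readLong();              // 64-bit split length
    return new String(nameBytes, "UTF-8") + ":" + offset + "+" + length;
  }
}

Run against the hex dump earlier in the thread, this should print hdfs://nyc-qws-029/in-dir/words.txt:0+181420, which matches Roshan's reading of the bytes.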
Re: Datanodes fail to start
On Tue, Jun 16, 2009 at 9:55 PM, jason hadoop jason.had...@gmail.com wrote: I often find myself editing the src/saveVersion.sh to fake out the version numbers, when I build a hadoop jar for the first time and have to deploy it on an already running cluster. That is not a good solution.
Re: Hadoop as Cloud Storage
Hey Wildan, HDFS is successfully storing well over 50TBs on a single cluster. It's meant to store data that will be analyzed in a MR job, but it can be used for archival storage. You'd probably consider deploying nodes with lots of disk space vs. lots of RAM and processor power. You'll want to do a cost analysis to determine if tape or HDFS is cheaper. That said, you should know a few things about HDFS: - Its read path is optimized for high throughput, and doesn't care as much about latency (read: it's got high latency relative to other file systems) - It's not meant for small files, so ideally your video files will be at least ~100MB each - It requires that the machines that makeup your cluster be running whenever you want to access or store data. (Note that HDFS survives if a small percentage of your nodes go down; it's built with fault tolerance in mind) I hope this clears things up. Let me know if you have any other questions. Alex On Tue, Jun 16, 2009 at 2:44 AM, W wilda...@gmail.com wrote: Dear Hadoop Guru's, After googling and find some information on using hadoop as cloud storage (long term). I have a problem to maintain lots of data (around 50 TB) much of them are TV Commercial (video files). I know, the best solution for long term file archiving is using tape backup, but i just curious, is hadoop can be used as 'data archiving' platform ? Thanks! Warm Regards, Wildan --- OpenThink Labs http://openthink-labs.tobethink.com/ Making IT, Business and Education in Harmony 087884599249 Y! : hawking_123 Linkedln : http://www.linkedin.com/in/wildanmaulana
Nor "OOM Java Heap Space" neither "GC OverHead Limit Exceeded"
Hi All, I am running my mapred program in local mode by setting mapred.job.tracker to local so that I can debug my code. The mapred program is a direct porting of my original sequential code. There is no reduce phase. Basically, I have just put my program in the map class. My program takes around 1-2 min. in instantiating the data objects which are present in the constructor of the Map class (it loads some data model files, therefore it takes some time). After the instantiation part in the constructor of the Map class, the map function is supposed to process the input split. The problem is that the data objects do not get instantiated completely and in between (while it is still in the constructor) the program stops, giving the exceptions pasted at the bottom. The program runs fine without mapreduce and does not require more than 2GB memory, but in mapreduce, even after doing export HADOOP_HEAPSIZE=2500 (I am working on machines with 16GB RAM), the program fails. I have also set HADOOP_OPTS="-server -XX:-UseGCOverheadLimit" as sometimes I was getting GC Overhead Limit Exceeded exceptions also. Somebody, please help me with this problem: I have been trying to debug it for the last 3 days, but unsuccessfully. Thanks! java.lang.OutOfMemoryError: Java heap space at sun.misc.FloatingDecimal.toJavaFormatString(FloatingDecimal.java:889) at java.lang.Double.toString(Double.java:179) at java.text.DigitList.set(DigitList.java:272) at java.text.DecimalFormat.format(DecimalFormat.java:584) at java.text.DecimalFormat.format(DecimalFormat.java:507) at java.text.NumberFormat.format(NumberFormat.java:269) at org.apache.hadoop.util.StringUtils.formatPercent(StringUtils.java:110) at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1147) at LbjTagger.NerTagger.main(NerTagger.java:109) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:165) at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68) 09/06/16 12:34:41 WARN mapred.LocalJobRunner: job_local_0001 java.lang.RuntimeException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:81) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:79) ... 
5 more Caused by: java.lang.ThreadDeath at java.lang.Thread.stop(Thread.java:715) at org.apache.hadoop.mapred.LocalJobRunner.killJob(LocalJobRunner.java:310) at org.apache.hadoop.mapred.JobClient$NetworkedJob.killJob(JobClient.java:315) at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1224) at LbjTagger.NerTagger.main(NerTagger.java:109) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:165) at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68) -- View this message in context: http://www.nabble.com/Nor-%22OOM-Java-Heap-Space%22-neither-%22GC-OverHead-Limit-Exeeceded%22-tp24059508p24059508.html Sent from the Hadoop core-user mailing list archive at Nabble.com.
Re: Restricting quota for users in HDFS
On 6/15/09 11:16 PM, Palleti, Pallavi pallavi.pall...@corp.aol.com wrote: We have the chown command in hadoop dfs to make a particular directory owned by a particular user. Do we have something similar to create a user with a space limit, i.e. to restrict the disk usage of a particular user? Quotas are implemented on a per-directory basis, not per-user. There is no support for "this user can have X space, regardless of where he/she writes", only "this directory has a limit of X space, regardless of who writes there."
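For completeness, the per-directory quotas mentioned above are managed with dfsadmin from the command line. A sketch (the directory name is only an example, and space quotas only exist from roughly 0.19 onwards):

hadoop dfsadmin -setQuota 10000 /user/pallavi                 # max number of files and directories
hadoop dfsadmin -setSpaceQuota 1099511627776 /user/pallavi    # ~1 TB of raw space, counting replication
hadoop fs -count -q /user/pallavi                             # show the current quotas and usage
hadoop dfsadmin -clrSpaceQuota /user/pallavi                  # remove the space quota again

So the closest thing to a per-user limit is giving each user a home directory they own and putting the quota on that directory.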
Announcing CloudBase-1.3.1 release
Hi, We have released the 1.3.1 version of CloudBase on SourceForge: https://sourceforge.net/projects/cloudbase CloudBase is a data warehouse system for terabyte/petabyte-scale analytics. It is built on top of the Map-Reduce architecture. It allows you to query flat log files using ANSI SQL. Please give it a try and send us your feedback. Thanks, Yanbo Release notes - New Features: * CREATE CSV tables - One can create tables on top of data in CSV (Comma Separated Values) format and query them using SQL. The current implementation doesn't accept CSV records which span multiple lines. Data may not be processed correctly if a field contains embedded line breaks. Please visit http://cloudbase.sourceforge.net/index.html#userDoc for the detailed specification of the CSV format. Bug fixes: * Aggregate function 'AVG' returns the same value as the 'SUM' function * If a query has multiple aliases, only the last alias works
Re: Can I share datas for several map tasks?
I can't get your book, so can you give me a few more words to describe the solution? very appreciate. -snowloong On Tue, Jun 16, 2009 at 9:51 PM, jason hadoop jason.had...@gmail.comwrote: In the examples for my book is a jvm reuse with static data shared between jvm's example On Tue, Jun 16, 2009 at 1:08 AM, Hello World snowlo...@gmail.com wrote: Thanks for your reply. Can you do me a favor to make a check? I modified mapred-default.xml as follows: 540 property 541 namemapred.job.reuse.jvm.num.tasks/name 542 value-1/value 543 descriptionHow many tasks to run per jvm. If set to -1, there is 544 no limit. 545 /description 546 /property And execute bin/stop-all.sh; bin/start-all.sh to restart hadoop; This is my program: 17 public class WordCount { 18 19 public static class TokenizerMapper 20extends MapperObject, Text, Text, IntWritable{ 21 22 private final static IntWritable one = new IntWritable(1); 23 private Text word = new Text(); 24 public static int[] ToBeSharedData = new int[1024 * 1024 * 16]; 25 26 protected void setup(Context context 27 ) throws IOException, InterruptedException { 28 //Init shared data 29 ToBeSharedData[0] = 12345; 30 System.out.println(setup shared data[0] = + ToBeSharedData[0]); 31 } 32 33 public void map(Object key, Text value, Context context 34 ) throws IOException, InterruptedException { 35 StringTokenizer itr = new StringTokenizer(value.toString()); 36 while (itr.hasMoreTokens()) { 37 word.set(itr.nextToken()); 38 context.write(word, one); 39 } 40 System.out.println(read shared data[0] = + ToBeSharedData[0]); 41 } 42 } First, can you tell me how to make sure jvm reuse is taking effect, for I didn't see anything different from before. I use top command under linux and see the same number of java processes and same memory usage. Second, can you tell me how to make the ToBeSharedData be inited only once and can be read from other MapTasks on the same node? Or this is not a suitable programming style for map-reduce? By the way, I'm using hadoop-0.20.0, in pseudo-distributed mode on a single-node. thanks in advance On Tue, Jun 16, 2009 at 1:48 PM, Sharad Agarwal shara...@yahoo-inc.com wrote: snowloong wrote: Hi, I want to share some data structures for the map tasks on a same node(not through files), I mean, if one map task has already initialized some data structures (e.g. an array or a list), can other map tasks share these memorys and directly access them, for I don't want to reinitialize these datas and I want to save some memory. Can hadoop help me do this? You can enable jvm reuse across tasks. See mapred.job.reuse.jvm.num.tasks in mapred-default.xml for usage. Then you can cache the data in a static variable in your mapper. - Sharad -- Pro Hadoop, a book to guide you from beginner to hadoop mastery, http://www.apress.com/book/view/9781430219422 www.prohadoopbook.com a community for Hadoop Professionals
Re: Nor "OOM Java Heap Space" neither "GC OverHead Limit Exceeded"
Is it possible that your map class is an inner class and not static? On Tue, Jun 16, 2009 at 10:51 AM, akhil1988 akhilan...@gmail.com wrote: Hi All, I am running my mapred program in local mode by setting mapred.jobtracker.local to local mode so that I can debug my code. The mapred program is a direct porting of my original sequential code. There is no reduce phase. Basically, I have just put my program in the map class. My program takes around 1-2 min. in instantiating the data objects which are present in the constructor of Map class(it loads some data model files, therefore it takes some time). After the instantiation part in the constrcutor of Map class the map function is supposed to process the input split. The problem is that the data objects do not get instantiated completely and in between(whlie it is still in constructor) the program stops giving the exceptions pasted at bottom. The program runs fine without mapreduce and does not require more than 2GB memory, but in mapreduce even after doing export HADOOP_HEAPSIZE=2500(I am working on machines with 16GB RAM), the program fails. I have also set HADOOP_OPTS=-server -XX:-UseGCOverheadLimit as sometimes I was getting GC Overhead Limit Exceeded exceptions also. Somebody, please help me with this problem: I have trying to debug it for the last 3 days, but unsuccessful. Thanks! java.lang.OutOfMemoryError: Java heap space at sun.misc.FloatingDecimal.toJavaFormatString(FloatingDecimal.java:889) at java.lang.Double.toString(Double.java:179) at java.text.DigitList.set(DigitList.java:272) at java.text.DecimalFormat.format(DecimalFormat.java:584) at java.text.DecimalFormat.format(DecimalFormat.java:507) at java.text.NumberFormat.format(NumberFormat.java:269) at org.apache.hadoop.util.StringUtils.formatPercent(StringUtils.java:110) at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1147) at LbjTagger.NerTagger.main(NerTagger.java:109) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:165) at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68) 09/06/16 12:34:41 WARN mapred.LocalJobRunner: job_local_0001 java.lang.RuntimeException: java.lang.reflect.InvocationTargetException at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:81) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:79) ... 
5 more Caused by: java.lang.ThreadDeath at java.lang.Thread.stop(Thread.java:715) at org.apache.hadoop.mapred.LocalJobRunner.killJob(LocalJobRunner.java:310) at org.apache.hadoop.mapred.JobClient$NetworkedJob.killJob(JobClient.java:315) at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1224) at LbjTagger.NerTagger.main(NerTagger.java:109) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:165) at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68) -- View this message in context:
Re: Nor "OOM Java Heap Space" neither "GC OverHead Limit Exceeded"
One more thing, finally it terminates there (after some time) by giving the final exception: java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217) at LbjTagger.NerTagger.main(NerTagger.java:109) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:165) at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68) akhil1988 wrote: Thank you Jason for your reply. My Map class is an inner class and it is a static class. Here is the structure of my code.

public class NerTagger {

  public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
    private Text word = new Text();
    private static NETaggerLevel1 tagger1 = new NETaggerLevel1();
    private static NETaggerLevel2 tagger2 = new NETaggerLevel2();

    Map() {
      System.out.println("HI2\n");
      Parameters.readConfigAndLoadExternalData("Config/allLayer1.config");
      System.out.println("HI3\n");
      Parameters.forceNewSentenceOnLineBreaks = Boolean.parseBoolean("true");
      System.out.println("loading the tagger");
      tagger1 = (NETaggerLevel1) Classifier.binaryRead(Parameters.pathToModelFile + ".level1");
      System.out.println("HI5\n");
      tagger2 = (NETaggerLevel2) Classifier.binaryRead(Parameters.pathToModelFile + ".level2");
      System.out.println("Done- loading the tagger");
    }

    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter) throws IOException {
      String inputline = value.toString();
      /* Processing of the input pair is done here */
    }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(NerTagger.class);
    conf.setJobName("NerTagger");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    conf.setMapperClass(Map.class);
    conf.setNumReduceTasks(0);
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    conf.set("mapred.job.tracker", "local");
    conf.set("fs.default.name", "file:///");
    DistributedCache.addCacheFile(new URI("/home/akhil1988/Ner/OriginalNer/Data/"), conf);
    DistributedCache.addCacheFile(new URI("/home/akhil1988/Ner/OriginalNer/Config/"), conf);
    DistributedCache.createSymlink(conf);
    conf.set("mapred.child.java.opts", "-Xmx4096m");
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    System.out.println("HI1\n");
    JobClient.runJob(conf);
  }

Jason, when the program executes, HI1 and HI2 are printed but it never reaches HI3. In the statement Parameters.readConfigAndLoadExternalData("Config/allLayer1.config"); it is able to access the Config/allLayer1.config file (while executing this statement it prints some messages about which data it is loading, etc.), but it gets stuck there (while loading some classifier) and never reaches HI3. This program runs fine when executed normally (without mapreduce). Thanks, Akhil jason hadoop wrote: Is it possible that your map class is an inner class and not static? On Tue, Jun 16, 2009 at 10:51 AM, akhil1988 akhilan...@gmail.com wrote: Hi All, I am running my mapred program in local mode by setting mapred.job.tracker to local so that I can debug my code. The mapred program is a direct porting of my original sequential code. There is no reduce phase. Basically, I have just put my program in the map class. My program takes around 1-2 min. in instantiating the data objects which are present in the constructor of the Map class (it loads some data model files, therefore it takes some time). After the instantiation part in the constructor of the Map class, the map function is supposed to process the input split. The problem is that the data objects do not get instantiated
Re: Can I share datas for several map tasks?
Thank you, Jason. I found the example. So, is there a way to share the same JVM between different jobs? From: jason hadoop jason.had...@gmail.com To: core-user@hadoop.apache.org Sent: Tuesday, June 16, 2009 7:22:16 PM Subject: Re: Can I share datas for several map tasks? in the example code, download bundle, in the package com.apress.hadoopbook.examples.advancedtechniques, is the class JVMReuseAndStaticInitializers.java which demonstrates sharing data between instances using jvm reuse. I built this to prove to myself that it was possible. It never got an actual write up in the book itself. On Tue, Jun 16, 2009 at 6:55 PM, Hello World snowlo...@gmail.com wrote: I can't get your book, so can you give me a few more words to describe the solution? very appreciate. -snowloong On Tue, Jun 16, 2009 at 9:51 PM, jason hadoop jason.had...@gmail.com wrote: In the examples for my book is a jvm reuse with static data shared between jvm's example On Tue, Jun 16, 2009 at 1:08 AM, Hello World snowlo...@gmail.com wrote: Thanks for your reply. Can you do me a favor to make a check? I modified mapred-default.xml as follows: 540 property 541 namemapred.job.reuse.jvm.num.tasks/name 542 value-1/value 543 descriptionHow many tasks to run per jvm. If set to -1, there is 544 no limit. 545 /description 546 /property And execute bin/stop-all.sh; bin/start-all.sh to restart hadoop; This is my program: 17 public class WordCount { 18 19 public static class TokenizerMapper 20 extends MapperObject, Text, Text, IntWritable{ 21 22 private final static IntWritable one = new IntWritable(1); 23 private Text word = new Text(); 24 public static int[] ToBeSharedData = new int[1024 * 1024 * 16]; 25 26 protected void setup(Context context 27 ) throws IOException, InterruptedException { 28 //Init shared data 29 ToBeSharedData[0] = 12345; 30 System.out.println(setup shared data[0] = + ToBeSharedData[0]); 31 } 32 33 public void map(Object key, Text value, Context context 34 ) throws IOException, InterruptedException { 35 StringTokenizer itr = new StringTokenizer(value.toString()); 36 while (itr.hasMoreTokens()) { 37 word.set(itr.nextToken()); 38 context.write(word, one); 39 } 40 System.out.println(read shared data[0] = + ToBeSharedData[0]); 41 } 42 } First, can you tell me how to make sure jvm reuse is taking effect, for I didn't see anything different from before. I use top command under linux and see the same number of java processes and same memory usage. Second, can you tell me how to make the ToBeSharedData be inited only once and can be read from other MapTasks on the same node? Or this is not a suitable programming style for map-reduce? By the way, I'm using hadoop-0.20.0, in pseudo-distributed mode on a single-node. thanks in advance On Tue, Jun 16, 2009 at 1:48 PM, Sharad Agarwal shara...@yahoo-inc.com wrote: snowloong wrote: Hi, I want to share some data structures for the map tasks on a same node(not through files), I mean, if one map task has already initialized some data structures (e.g. an array or a list), can other map tasks share these memorys and directly access them, for I don't want to reinitialize these datas and I want to save some memory. Can hadoop help me do this? You can enable jvm reuse across tasks. See mapred.job.reuse.jvm.num.tasks in mapred-default.xml for usage. Then you can cache the data in a static variable in your mapper. 
- Sharad -- Pro Hadoop, a book to guide you from beginner to hadoop mastery, http://www.apress.com/book/view/9781430219422 www.prohadoopbook.com a community for Hadoop Professionals -- Pro Hadoop, a book to guide you from beginner to hadoop mastery, http://www.amazon.com/dp/1430219424?tag=jewlerymall www.prohadoopbook.com a community for Hadoop Professionals
Re: Nor "OOM Java Heap Space" neither "GC OverHead Limit Exceeded"
Something is happening inside of your (Parameters. readConfigAndLoadExternalData(Config/allLayer1.config);) code, and the framework is killing the job for not heartbeating for 600 seconds On Tue, Jun 16, 2009 at 8:32 PM, akhil1988 akhilan...@gmail.com wrote: One more thing, finally it terminates there (after some time) by giving the final Exception: java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1217) at LbjTagger.NerTagger.main(NerTagger.java:109) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:165) at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68) akhil1988 wrote: Thank you Jason for your reply. My Map class is an inner class and it is a static class. Here is the structure of my code. public class NerTagger { public static class Map extends MapReduceBase implements MapperLongWritable, Text, Text, Text{ private Text word = new Text(); private static NETaggerLevel1 tagger1 = new NETaggerLevel1(); private static NETaggerLevel2 tagger2 = new NETaggerLevel2(); Map(){ System.out.println(HI2\n); Parameters.readConfigAndLoadExternalData(Config/allLayer1.config); System.out.println(HI3\n); Parameters.forceNewSentenceOnLineBreaks=Boolean.parseBoolean(true); System.out.println(loading the tagger); tagger1=(NETaggerLevel1)Classifier.binaryRead(Parameters.pathToModelFile+.level1); System.out.println(HI5\n); tagger2=(NETaggerLevel2)Classifier.binaryRead(Parameters.pathToModelFile+.level2); System.out.println(Done- loading the tagger); } public void map(LongWritable key, Text value, OutputCollectorText, Text output, Reporter reporter ) throws IOException { String inputline = value.toString(); /* Processing of the input pair is done here */ } public static void main(String [] args) throws Exception { JobConf conf = new JobConf(NerTagger.class); conf.setJobName(NerTagger); conf.setOutputKeyClass(Text.class); conf.setOutputValueClass(IntWritable.class); conf.setMapperClass(Map.class); conf.setNumReduceTasks(0); conf.setInputFormat(TextInputFormat.class); conf.setOutputFormat(TextOutputFormat.class); conf.set(mapred.job.tracker, local); conf.set(fs.default.name, file:///); DistributedCache.addCacheFile(new URI(/home/akhil1988/Ner/OriginalNer/Data/), conf); DistributedCache.addCacheFile(new URI(/home/akhil1988/Ner/OriginalNer/Config/), conf); DistributedCache.createSymlink(conf); conf.set(mapred.child.java.opts,-Xmx4096m); FileInputFormat.setInputPaths(conf, new Path(args[0])); FileOutputFormat.setOutputPath(conf, new Path(args[1])); System.out.println(HI1\n); JobClient.runJob(conf); } Jason, when the program executes HI1 and HI2 are printed but it does not reaches HI3. In the statement Parameters.readConfigAndLoadExternalData(Config/allLayer1.config); it is able to access Config/allLayer1.config file (as while executing this statement, it prints some messages like which data it is loading, etc.) but it gets stuck there(while loading some classifier) and never reaches HI3. This program runs fine when executed normally(without mapreduce). 
Thanks, Akhil jason hadoop wrote: Is it possible that your map class is an inner class and not static? On Tue, Jun 16, 2009 at 10:51 AM, akhil1988 akhilan...@gmail.com wrote: Hi All, I am running my mapred program in local mode by setting mapred.jobtracker.local to local mode so that I can debug my code. The mapred program is a direct porting of my original sequential code. There is no reduce phase. Basically, I have just put my program in the map class. My program takes around 1-2 min. in instantiating the data objects which are present in the constructor of Map class(it loads some
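If the slow classifier loading itself is legitimate and just needs more time, one knob worth knowing about (a hedged suggestion, not something stated in the thread) is the task timeout that produces exactly this kill-after-600-seconds behaviour. It can be raised, or disabled, from the driver before runJob:

// Value is in milliseconds; 0 disables the timeout entirely.
conf.set("mapred.task.timeout", "1800000");   // allow 30 minutes without progress

The cleaner long-term fix is to report progress (or emit counter/status updates) from the task while the model files load, so the framework knows it is still alive.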
Re: org.apache.hadoop.ipc.client : trying connect to server failed
Hi, I faced the same problem. Try deleting the hadoop pids from the logs directory. That worked for me. Thanks, Richa On Mon, Jun 15, 2009 at 10:28 PM, ashish pareek pareek...@gmail.com wrote: HI, I am trying to set up a hadoop cluster on a 3GB machine using hadoop 0.18.3, and I have followed the procedure given on the Apache Hadoop site for a hadoop cluster. In conf/slaves I have added two datanodes, i.e. the namenode virtual machine itself and the other virtual machine (datanode), and have set up passwordless ssh between both virtual machines. But now the problem is when I run the command: bin/hadoop start-all.sh It starts only one datanode, on the same namenode virtual machine, but it doesn't start the datanode on the other machine. In logs/hadoop-datanode.log I get the message INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop1/192.168.1.28:9000. Already tried 1 time(s). 2009-05-09 18:35:14,266 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop1/192.168.1.28:9000. Already tried 2 time(s). 2009-05-09 18:35:14,266 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop1/192.168.1.28:9000. Already tried 3 time(s). . . . I have tried formatting and starting the cluster again, but I still get the same error. So can anyone help in solving this problem? :) Thanks Regards Ashish Pareek -- Richa Khandelwal University of California, Santa Cruz CA
[ANN] HBase 0.20.0-alpha available for download
An alpha version of HBase 0.20.0 is available for download at: http://people.apache.org/~stack/hbase-0.20.0-alpha/ We are making this release available to preview what is coming in HBase 0.20.0. In short, 0.20.0 is about performance and high-availability. Also, a new, richer API has been added and the old deprecated. Here is a list of almost 300 issues addressed so far in 0.20.0: http://tinyurl.com/ntvheo This alpha release contains known bugs. See http://tinyurl.com/kvfsft for the current list. In particular, this alpha release is without a migration script to bring your 0.19.x era data forward to work on hbase 0.20.0. A working, well-tested migration script will be in place before we cut the first HBase 0.20.0 release candidate some time in the next week or so. After download, please take the time to review the 0.20.0 'Getting Started' also available here: http://people.apache.org/~stack/hbase-0.20.0-alpha/docs/api/overview-summary.html#overview_description. HBase 0.20.0 has new dependencies, in particular it now depends on ZooKeeper. With ZooKeeper in the mix a few core HBase configurations have been removed and replaced with ZooKeeper configurations instead. Also of note, HBase 0.20.0 will include Stargate, an improved REST connector for HBase. The old, bundled REST connector will be deprecated. Stargate is implemented using the Jersey framework. It includes protobuf encoding support, has caching proxy awareness, supports batching for scanners and updates, and in general has the goal of enabling Web scale storage systems (a la S3) backed by HBase. Currently its only available up on github, http://github.com/macdiesel/stargate/tree/master. It will be added to a new contrib directory before we cut a release candidate. Please let us know if you have difficulty with the install, if you find the documentation missing or, if you trip over bugs hbasing. Yours, The HBasistas
Re: [ANN] HBase 0.20.0-alpha available for download
Oh sweet. This will be a most excellent party. On Tue, Jun 16, 2009 at 10:23 PM, stackst...@duboce.net wrote: An alpha version of HBase 0.20.0 is available for download at: http://people.apache.org/~stack/hbase-0.20.0-alpha/ We are making this release available to preview what is coming in HBase 0.20.0. In short, 0.20.0 is about performance and high-availability. Also, a new, richer API has been added and the old deprecated. Here is a list of almost 300 issues addressed so far in 0.20.0: http://tinyurl.com/ntvheo This alpha release contains known bugs. See http://tinyurl.com/kvfsft for the current list. In particular, this alpha release is without a migration script to bring your 0.19.x era data forward to work on hbase 0.20.0. A working, well-tested migration script will be in place before we cut the first HBase 0.20.0 release candidate some time in the next week or so. After download, please take the time to review the 0.20.0 'Getting Started' also available here: http://people.apache.org/~stack/hbase-0.20.0-alpha/docs/api/overview-summary.html#overview_description. HBase 0.20.0 has new dependencies, in particular it now depends on ZooKeeper. With ZooKeeper in the mix a few core HBase configurations have been removed and replaced with ZooKeeper configurations instead. Also of note, HBase 0.20.0 will include Stargate, an improved REST connector for HBase. The old, bundled REST connector will be deprecated. Stargate is implemented using the Jersey framework. It includes protobuf encoding support, has caching proxy awareness, supports batching for scanners and updates, and in general has the goal of enabling Web scale storage systems (a la S3) backed by HBase. Currently its only available up on github, http://github.com/macdiesel/stargate/tree/master. It will be added to a new contrib directory before we cut a release candidate. Please let us know if you have difficulty with the install, if you find the documentation missing or, if you trip over bugs hbasing. Yours, The HBasistas
Problem in viewing WEB UI
Hi, When I run the command bin/hadoop dfsadmin -report it shows that 2 datanodes are alive, but when I try http://hadoopmaster:50070/ it does not open the http://hadoopmaster:50070/dfshealth.jsp page and throws an HTTP 404 error. So why is it happening like this? Regards, Ashish Pareek On Wed, Jun 17, 2009 at 10:06 AM, Sugandha Naolekar sugandha@gmail.com wrote: Well, you just have to specify the address in the URL address bar as: http://hadoopmaster:50070 You'll be able to see the web UI! On Tue, Jun 16, 2009 at 7:17 PM, ashish pareek pareek...@gmail.com wrote: HI Sugandha, Hmmm, your suggestion helped and now I am able to run two datanodes, one on the same machine as the name node and the other on a different machine. Thanks a lot :) But the problem is that now I am not able to see the web UI, for both the datanodes as well as the name node. Do I have to consider some more things in the site.xml? If so please help... Thanking you again, regards, Ashish Pareek. On Tue, Jun 16, 2009 at 3:10 PM, Sugandha Naolekar sugandha@gmail.com wrote: hi! First of all, get your concepts of hadoop clear. You can refer to the following site: http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster) I have a small doubt: whether in the master.xml and slave.xml we can have the same port numbers for both of them, like for the slave:

<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoopslave:9000</value>
</property>

and for the master:

<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoopmaster:9000</value>
</property>

Well, any two daemons or services can run on the same port as long as they are not run on the same machine. If you wish to run DN and NN on the same machine, their port numbers have to be different. On Tue, Jun 16, 2009 at 2:55 PM, ashish pareek pareek...@gmail.com wrote: HI sugandha, and one more thing, can we have in the slave:

<property>
  <name>dfs.datanode.address</name>
  <value>hadoopmaster:9000</value>
  <value>hadoopslave:9001</value>
</property>

Also, fs.default.name is the tag which specifies the default filesystem, and generally it is run on the namenode. So its value has to be a namenode's address only and not the slave's. Else, if you have the complete procedure for installing and running Hadoop on a cluster, can you please send it to me? I need to set up hadoop within two days and show it to my guide. Currently I am doing my masters. Thanks for spending your time. Try the above, and this should work! regards, Ashish Pareek On Tue, Jun 16, 2009 at 2:33 PM, Sugandha Naolekar sugandha@gmail.com wrote: The following changes are to be done:

Under the master folder:
- put the slave's address as well under the values of the tag (dfs.datanode.address)
- You want to make the namenode a datanode as well. As per your config file, you have specified hadoopmaster in your slaves file. If you don't want that, remove it from the slaves file.

Under the slave folder:
- put only the slave's (the m/c where you intend to run your datanode) address under the datanode.address tag. Else it should go as such:

<property>
  <name>dfs.datanode.address</name>
  <value>hadoopmaster:9000</value>
  <value>hadoopslave:9001</value>
</property>

Also, your port numbers should be different.
the daemons NN,DN,JT,TT should run independently on different ports. On Tue, Jun 16, 2009 at 2:05 PM, Sugandha Naolekar sugandha@gmail.com wrote: -- Forwarded message -- From: ashish pareek pareek...@gmail.com Date: Tue, Jun 16, 2009 at 2:00 PM Subject: Re: org.apache.hadoop.ipc.client : trying connect to server failed To: Sugandha Naolekar sugandha@gmail.com On Tue, Jun 16, 2009 at 1:58 PM, ashish pareek pareek...@gmail.comwrote: HI , I am sending .tar.gz format containing both master and datanode config files ... Regards, Ashish Pareek On Tue, Jun 16, 2009 at 1:47 PM, Sugandha Naolekar sugandha@gmail.com wrote: can u pls send me a zip or a tar file? I don't have windows systems but, linux On Tue, Jun 16, 2009 at 1:19 PM, ashish pareek pareek...@gmail.com wrote: HI Sungandha , Thanks for your reply I am sending you master and slave configuration files if you can go through it and tell me where I am going wrong it would be helpful. Hope to get a reply soon ... Thanks again! Regards, Ashish Pareek On Tue, Jun 16,