Re: Save configuration data in job configuration file.
This does not save it in the XML file. I think this just keeps the variable in memory. On 19 January 2013 18:48, Arun C Murthy a...@hortonworks.com wrote: jobConf.set(String, String)? -- Best regards,
Re: Save configuration data in job configuration file.
The MR framework saves it into the job.xml before it sends it for execution. If you're asking about a way to save the config object into the XML file, use http://hadoop.apache.org/docs/current/api/org/apache/hadoop/conf/Configuration.html#writeXml(java.io.Writer) or similar APIs. On Sun, Jan 20, 2013 at 4:41 PM, Pedro Sá da Costa psdc1...@gmail.com wrote: This does not save it in the XML file. I think this just keeps the variable in memory. On 19 January 2013 18:48, Arun C Murthy a...@hortonworks.com wrote: jobConf.set(String, String)? -- Best regards, -- Harsh J
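To illustrate the layout involved: job.xml uses Hadoop's standard configuration XML format, and writeXml serializes the in-memory key/value pairs into it. A self-contained sketch of that output format in plain Java (no Hadoop dependency; the property name below is hypothetical, standing in for whatever was set via jobConf.set):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConfXmlSketch {
    // Renders key/value pairs in the <configuration><property>... layout
    // that job.xml uses (an illustration, not the Hadoop implementation).
    static String toConfXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append("  <property>\n")
              .append("    <name>").append(e.getKey()).append("</name>\n")
              .append("    <value>").append(e.getValue()).append("</value>\n")
              .append("  </property>\n");
        }
        return sb.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        // hypothetical key, as if set earlier with jobConf.set("my.custom.key", "my-value")
        props.put("my.custom.key", "my-value");
        System.out.print(toConfXml(props));
    }
}
```

With the real API you would instead call conf.writeXml(writer) on the Configuration object itself.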
Re: Prolonged safemode
On Sun, Jan 20, 2013 at 7:50 AM, Mohammad Tariq donta...@gmail.com wrote: Hey Jean, Feels good to hear that ;) I don't have to feel like a solitary yonker anymore. Since I am working on a single node, the problem becomes more severe. I don't have any other node where MR files could get replicated. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Sun, Jan 20, 2013 at 5:08 AM, Jean-Marc Spaggiari jean-m...@spaggiari.org wrote: Hi Tariq, I often have to force HDFS to go out of safe mode manually when I restart my cluster (or after a power outage). I never thought about reporting that ;) I'm using hadoop-1.0.3. I think it was because of the MR files still not being replicated on enough nodes. But not 100% sure. JM 2013/1/19, Mohammad Tariq donta...@gmail.com: Hello list, I have a pseudo-distributed setup on my laptop. Everything was working fine until now. But lately HDFS has started taking a lot of time to leave safemode. In fact, I have to do it manually most of the time as the TT and HBase daemons get disturbed because of this. I am using hadoop-1.0.4. Is it a problem with this version? I have never faced any such issue with older versions. Or is something going wrong on my side? Thank you so much for your precious time. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com
Re: Prolonged safemode
Check the integrity of the file system, and check the replication factor, in case the default was left at 3 or so by mistake. If you have HBase configured, run hbck to check that everything is fine with the cluster. ∞ Shashwat Shriparv
Re: Prolonged safemode
If your DN is starting too slowly, then you should investigate why. In any case, Apache Bigtop's (http://bigtop.apache.org) pseudo-distributed configs provide good values for 1-node setups. In your case, you seem to be missing dfs.safemode.min.datanodes set to 1, and dfs.safemode.extension set to 0. -- Harsh J
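The two properties mentioned go into hdfs-site.xml; a minimal sketch for a single-node setup, using the values suggested above:

```xml
<!-- hdfs-site.xml: safemode tuning for a pseudo-distributed (1-node) setup -->
<property>
  <name>dfs.safemode.min.datanodes</name>
  <value>1</value> <!-- leave safemode once the lone DN has reported in -->
</property>
<property>
  <name>dfs.safemode.extension</name>
  <value>0</value> <!-- no extra wait once the block threshold is reached -->
</property>
```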
Re: Prolonged safemode
Hi Tariq, When you start your namenode, is it able to come out of safemode automatically? If not, then there are under-replicated or corrupted blocks which the namenode is trying to fetch. Try to remove the corrupted blocks. Regards, Varun Kumar.P
Re: Prolonged safemode
Hello Varun, Thank you so much for your reply. In most of the cases, it is not. But apart from that, everything seems to be fine. I am not getting any notification about under-replicated or corrupted blocks. I will do a recheck though. Thank you. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Sun, Jan 20, 2013 at 5:43 PM, Mohammad Tariq donta...@gmail.com wrote: Thank you so much for the valuable reply, Harsh. I'll look into it. One quick question: why is it happening with 1.0.4? Is there any compulsion to set the two props you have specified above? Earlier versions were doing absolutely fine without these props. I am sorry to be a pest with questions, but I am kinda curious about this. Thank you so much. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com
Re: On a lighter note
Oh yeah Alex. Thank God that we have a German expert as well ;) Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Sun, Jan 20, 2013 at 1:28 PM, Alexander Alten-Lorenz wget.n...@gmail.com wrote: Actually Der Untergang ;) Alexander Alten-Lorenz http://mapredit.blogspot.com Twitter: @mapredit German Hadoop LinkedIn Group: http://goo.gl/N8pCF On Jan 18, 2013, at 23:18, Ted Dunning tdunn...@maprtech.com wrote: Well, I think the actual name was Untergang. Same meaning. Sent from my iPhone On Jan 17, 2013, at 8:09 PM, Mohammad Tariq donta...@gmail.com wrote: You are right Michael, as always :) Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Fri, Jan 18, 2013 at 6:33 AM, Michael Segel michael_se...@hotmail.com wrote: I'm thinking 'Downfall'. But I could be wrong. On Jan 17, 2013, at 6:56 PM, Yongzhi Wang wang.yongzhi2...@gmail.com wrote: Who can tell me what is the name of the original film? Thanks! Yongzhi On Thu, Jan 17, 2013 at 3:05 PM, Mohammad Tariq donta...@gmail.com wrote: I am sure you will suffer from severe stomach ache after watching this :) http://www.youtube.com/watch?v=hEqQMLSXQlY Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com
Re: Prolonged safemode
I am not aware of a direct regression in DN startup slowdown or block report slowdown; it's hard to tell what exactly the regression is without more notes or logs on the behavior. -- Harsh J
Fair Scheduler of Hadoop
Hi guys, I have a quick question regarding the fair scheduler of Hadoop. I am reading this article: http://blog.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/. My question is about the following statement: "There is currently no support for preemption of long tasks, but this is being added in HADOOP-4665 https://issues.apache.org/jira/browse/HADOOP-4665, which will allow you to set how long each pool will wait before preempting other jobs’ tasks to reach its guaranteed capacity." My questions are: 1. What does "preemption of long tasks" mean? Kill long-running tasks, pause long-running tasks to give resources to other tasks, or something else? 2. I am also confused about "set how long each pool will wait before preempting other jobs’ tasks to reach its guaranteed capacity": what does "reach its guaranteed capacity" mean? I think when using the fair scheduler, each pool has predefined resource allocation settings (and the settings guarantee each pool has resources as configured); is that true? In what situations will a pool not have its guaranteed (or configured) capacity? regards, Lin
Using JCUDA with MapReduce
Hi all, I was wondering if anyone here has tried using the GPU of a Hadoop node to enhance MapReduce processing? I read about it, but it always comes down to heavy computations such as matrix multiplications and Monte Carlo algorithms. Did anyone try it with MapReduce jobs that analyze logs, or any other text mining examples? Is there a trade-off here (I guess there is) between data size/complexity and the computation required? Thanks, Amit.
Fwd: new join algorithm using mapreduce
-- Forwarded message -- From: Vikas Jadhav vikascjadha...@gmail.com Date: Sat, Jan 19, 2013 at 10:58 PM Subject: new join algorithm using mapreduce To: user@hadoop.apache.org I am writing a new join algorithm using Hadoop and want to do a multi-way join in a single MapReduce job: map -- processes all datasets; reduce -- join of all datasets + aggregate operation. A second MapReduce job will collect the results from the multiple reducer files of the first job. I am pretty clear about the map phase but don't have an idea how to process all the datasets in a single reduce. So how should I proceed, and do I need to modify the MapReduce code? Thanks -- * * * Thanx and Regards* * Vikas Jadhav*
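One common approach to this (a sketch of the standard reduce-side join idea, not the poster's actual code) is to have the mappers tag each record with its source dataset, so that each reduce() call sees all tagged records for one key and can combine them. The per-key logic can be illustrated in plain Java, outside the MapReduce API (the "A:"/"B:" tag format here is made up for illustration):

```java
import java.util.*;

public class MultiWayJoinSketch {
    // Simulates what a single reduce() call would see for one join key:
    // values tagged with their source dataset, e.g. "A:rowFromA", "B:rowFromB".
    // Groups values by dataset tag, then emits the cross product across
    // datasets, which is the multi-way join output for this key.
    static List<String> joinForKey(String key, List<String> taggedValues) {
        Map<String, List<String>> byDataset = new TreeMap<>();
        for (String v : taggedValues) {
            int sep = v.indexOf(':');
            byDataset.computeIfAbsent(v.substring(0, sep), t -> new ArrayList<>())
                     .add(v.substring(sep + 1));
        }
        // Cross product across datasets = the joined rows for this key.
        List<String> joined = new ArrayList<>(List.of(key));
        for (List<String> rows : byDataset.values()) {
            List<String> next = new ArrayList<>();
            for (String prefix : joined)
                for (String row : rows) next.add(prefix + "," + row);
            joined = next;
        }
        return joined;
    }

    public static void main(String[] args) {
        // Two rows from dataset A and one from dataset B, same join key.
        System.out.println(joinForKey("k1",
                List.of("A:a1", "A:a2", "B:b1"))); // prints [k1,a1,b1, k1,a2,b1]
    }
}
```

In an actual job, the tagging would be done in map() output values (or via a composite key with secondary sort), so no framework modification is needed; the stock MapReduce API supports this pattern.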
Re: Fair Scheduler of Hadoop
Lin, The article you are reading is old. The fair scheduler does have preemption. Tasks get killed and rerun later, potentially on a different node. You can set a minimum / guaranteed capacity. The sum of those across pools would typically equal the total capacity of your cluster, or less. Then you can configure each pool to go beyond that capacity. That would happen if the cluster is temporarily not used to its full capacity. Then, when the demand for capacity increases and jobs are queued in other pools that are not running at their minimum guaranteed capacity, some long-running tasks from jobs in the pool that is using more than its minimum capacity get killed (to be run again later). Does that make sense? Cheers, Joep Sent from my iPhone
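To make the pool settings concrete, here is a sketch of a fair scheduler allocation file for Hadoop 1.x (pool names and numbers are made up for illustration). minMaps/minReduces express the per-pool minimum guaranteed capacity described above, and minSharePreemptionTimeout is how long (in seconds) a pool waits below its minimum share before preempting tasks from pools running over their share:

```xml
<?xml version="1.0"?>
<allocations>
  <pool name="production">
    <minMaps>20</minMaps>
    <minReduces>10</minReduces>
    <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
  </pool>
  <pool name="research">
    <minMaps>5</minMaps>
    <minReduces>5</minReduces>
  </pool>
</allocations>
```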
Re: How to copy log files from remote windows machine to Hadoop cluster
Hi Mirko, Thanks for your reply. It works for me as well. Now I was able to mount the folder on the master node and configured Flume such that it can either poll for logs in real time or even do periodic retrieval. Thanks, Mahesh Balija. Calsof Labs. On Thu, Jan 17, 2013 at 5:01 PM, Mirko Kämpf mirko.kae...@gmail.com wrote: One approach I used in my lab was the data-gateway, which is a small Linux box which just mounts Windows shares, and a single Flume node on the gateway corresponds to the HDFS cluster. With tail or periodic log rotation you have control over all logfiles, depending on your use case. Either grab all incoming data and buffer it in Flume, or just move all new data during the night to the cluster. The gateway also contains sqoop and an HDFS client if needed. Mirko 2013/1/17 Mahesh Balija balijamahesh@gmail.com That link talks about just installing Flume on a Windows machine (it does NOT even have configs to push logs to the Hadoop cluster), but what if I have to collect logs from various clients? Then I will end up installing it on all clients. I have installed Flume successfully on Linux, but I have to configure it in such a way that it gathers the log files from the remote Windows box. Harsh, can you throw some light on this? On Thu, Jan 17, 2013 at 4:21 PM, Mohammad Tariq donta...@gmail.com wrote: Yes. It is possible. I haven't tried the windows+flume+hadoop combo personally, but it should work. You may find this link http://mapredit.blogspot.in/2012/07/run-flume-13x-on-windows.html useful. Alex has explained beautifully how to run Flume on a Windows box. If I get time I'll try to simulate your use case and let you know. BTW, could you please share with us whatever you have tried? Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Thu, Jan 17, 2013 at 4:09 PM, Mahesh Balija balijamahesh@gmail.com wrote: I have studied Flume but I didn't find anything useful in my case.
My requirement is: there is a directory on a Windows machine in which files will be generated and kept updated with new logs. I want to have a tail kind of mechanism (using an exec source) through which I can push the latest updates into the cluster. Or I have to simply push once a day to the cluster using the spooling directory mechanism. Can somebody advise whether this is possible using Flume, and if so, the configurations needed for this, specific to a remote Windows machine? On Thu, Jan 17, 2013 at 3:48 PM, Mirko Kämpf mirko.kae...@gmail.com wrote: Give Flume (http://flume.apache.org/) a chance to collect your data. Mirko 2013/1/17 sirenfei siren...@gmail.com ftp auto upload? 2013/1/17 Mahesh Balija balijamahesh@gmail.com: the Hadoop cluster (HDFS) either in synchronous or asynchronous
Re: Time taken for launching Application Master
Check your node manager logs to understand the bottleneck first. When we had a similar issue on a recent version of Hadoop, which includes the fix for MAPREDUCE-4068, we rearranged our job jar file to reduce the time spent on 'expanding' the job jar file by the node manager(s). -Rahul On Sun, Jan 20, 2013 at 10:34 PM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am seeing that from the time the ApplicationMaster is submitted by my client to the ASM part of the RM, it is taking around 7 seconds for the AM to get started. Is there a way to reduce that time, I mean to speed it up? Thanks, Kishore
Can't browse the filesystem By Internet Explorer
Hi, I have installed a cluster with hadoop-2.0.0-alpha, 4 PCs in total: 1 namenode, 3 datanodes. I opened the http://master:50070/dfshealth.jsp page in Chrome from a remote PC and it was all right. However, when I clicked "Browse the filesystem", Chrome redirected to http://slave16:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=/&nnaddr=master2:9000 (slave16: 172.16.0.***, a LAN IP address). Is there any configuration in hdfs-site.xml that I need to set to deal with the problem, or how? Any answers would be appreciated, thanks! Kira.wang
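One common cause (an assumption, since the full setup isn't shown): the "Browse the filesystem" link redirects the browser to a datanode by hostname, so the remote PC must be able to resolve names like slave16 itself. If DNS doesn't cover the cluster hostnames, a typical workaround is adding entries to the client machine's hosts file; a sketch, with placeholder IPs (substitute the datanodes' actual LAN addresses):

```
# hosts file on the remote client PC
# (Linux: /etc/hosts, Windows: C:\Windows\System32\drivers\etc\hosts)
172.16.0.16  slave16    # placeholder IP; use the real datanode address
172.16.0.2   master2    # placeholder IP; use the real namenode address
```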