Re: Save configuration data in job configuration file.

2013-01-20 Thread Pedro Sá da Costa
This does not save it in the XML file. I think this just keeps the
variable in memory.

On 19 January 2013 18:48, Arun C Murthy a...@hortonworks.com wrote:
 jobConf.set(String, String)?




-- 
Best regards,


Re: Save configuration data in job configuration file.

2013-01-20 Thread Harsh J
The MR framework saves it into the job.xml before it sends it for execution.

If you're asking about a way to save the config object into the XML file,
use
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/conf/Configuration.html#writeXml(java.io.Writer)
or similar APIs.
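
For illustration, a minimal sketch combining both suggestions (the output
path my-job-conf.xml is just an example):

    import java.io.FileWriter;
    import java.io.Writer;
    import org.apache.hadoop.conf.Configuration;

    public class SaveConfExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // set() only updates the in-memory object, as Pedro observed
        conf.set("my.custom.key", "my-value");

        // writeXml() persists the current state of the object as XML
        Writer out = new FileWriter("my-job-conf.xml");
        try {
          conf.writeXml(out);
        } finally {
          out.close();
        }
      }
    }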


On Sun, Jan 20, 2013 at 4:41 PM, Pedro Sá da Costa psdc1...@gmail.com wrote:

 This does not save it in the XML file. I think this just keeps the
 variable in memory.

 On 19 January 2013 18:48, Arun C Murthy a...@hortonworks.com wrote:
  jobConf.set(String, String)?




 --
 Best regards,




-- 
Harsh J


Re: Prolonged safemode

2013-01-20 Thread xin jiang
On Sun, Jan 20, 2013 at 7:50 AM, Mohammad Tariq donta...@gmail.com wrote:

 Hey Jean,

 Feels good to hear that ;) I don't have to feel
 like a solitary yonker anymore.

 Since I am working on a single node, the problem
 becomes more severe. I don't have any other node
 where the MR files could get replicated.

 Warm Regards,
 Tariq
 https://mtariq.jux.com/
 cloudfront.blogspot.com


 On Sun, Jan 20, 2013 at 5:08 AM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:

 Hi Tariq,

 I often have to force HDFS out of safe mode manually when I
 restart my cluster (or after a power outage). I never thought about
 reporting that ;)

 I'm using hadoop-1.0.3. I think it was because the MR files were still
 not replicated on enough nodes, but I'm not 100% sure.

 JM

 2013/1/19, Mohammad Tariq donta...@gmail.com:
  Hello list,
 
     I have a pseudo-distributed setup on my laptop. Everything was
  working fine until now. But lately HDFS has started taking a lot of
  time to leave safemode. In fact, I have to do it manually most of the
  time, as the TT and HBase daemons get disturbed because of this.
 
  I am using hadoop-1.0.4. Is it a problem with this version? I have never
  faced any such issue with older versions. Or is something going wrong on
  my side?
 
  Thank you so much for your precious time.
 
  Warm Regards,
  Tariq
  https://mtariq.jux.com/
  cloudfront.blogspot.com
 





Re: Prolonged safemode

2013-01-20 Thread shashwat shriparv
Check the integrity of the file system, and check the replication factor
in case the default was left at 3 or so by mistake. If you have HBase
configured, run hbck to check whether everything is fine with the cluster.
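
For reference, a sketch of those checks as shell commands (Hadoop 1.x and
HBase syntax of that era):

    # check HDFS integrity and per-file replication
    hadoop fsck / -files -blocks

    # check HBase consistency, if HBase is installed
    hbase hbck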



∞
Shashwat Shriparv



On Sun, Jan 20, 2013 at 3:09 PM, xin jiang jiangxin1...@gmail.com wrote:



 On Sun, Jan 20, 2013 at 7:50 AM, Mohammad Tariq donta...@gmail.com wrote:

 Hey Jean,

 Feels good to hear that ;) I don't have to feel
 like a solitary yonker anymore.

 Since I am working on a single node, the problem
 becomes more severe. I don't have any other node
 where the MR files could get replicated.

 Warm Regards,
 Tariq
  https://mtariq.jux.com/
 cloudfront.blogspot.com


 On Sun, Jan 20, 2013 at 5:08 AM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:

 Hi Tariq,

 I often have to force HDFS out of safe mode manually when I
 restart my cluster (or after a power outage). I never thought about
 reporting that ;)

 I'm using hadoop-1.0.3. I think it was because the MR files were still
 not replicated on enough nodes, but I'm not 100% sure.

 JM

 2013/1/19, Mohammad Tariq donta...@gmail.com:
  Hello list,
 
     I have a pseudo-distributed setup on my laptop. Everything was
  working fine until now. But lately HDFS has started taking a lot of
  time to leave safemode. In fact, I have to do it manually most of the
  time, as the TT and HBase daemons get disturbed because of this.
 
  I am using hadoop-1.0.4. Is it a problem with this version? I have never
  faced any such issue with older versions. Or is something going wrong on
  my side?
 
  Thank you so much for your precious time.
 
  Warm Regards,
  Tariq
  https://mtariq.jux.com/
  cloudfront.blogspot.com
 






Re: Prolonged safemode

2013-01-20 Thread Harsh J
If your DN is starting too slowly, then you should investigate why.

In any case, Apache Bigtop's (http://bigtop.apache.org) pseudo-distributed
configs provide good values for 1-node setups. In your case, you seem to be
missing dfs.safemode.min.datanodes set to 1, and dfs.safemode.extension set
to 0.
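
For illustration, a sketch of those two properties in hdfs-site.xml, plus
the command for leaving safemode by hand (Hadoop 1.x syntax):

    <!-- hdfs-site.xml: single-node values suggested above -->
    <property>
      <name>dfs.safemode.min.datanodes</name>
      <value>1</value>
    </property>
    <property>
      <name>dfs.safemode.extension</name>
      <value>0</value>
    </property>

    # force the NameNode out of safemode manually
    hadoop dfsadmin -safemode leave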


On Sun, Jan 20, 2013 at 4:05 AM, Mohammad Tariq donta...@gmail.com wrote:

 Hello list,

    I have a pseudo-distributed setup on my laptop. Everything was
 working fine until now. But lately HDFS has started taking a lot of time
 to leave safemode. In fact, I have to do it manually most of the time, as
 the TT and HBase daemons get disturbed because of this.

 I am using hadoop-1.0.4. Is it a problem with this version? I have never
 faced any such issue with older versions. Or is something going wrong on
 my side?

 Thank you so much for your precious time.

 Warm Regards,
 Tariq
 https://mtariq.jux.com/
 cloudfront.blogspot.com




-- 
Harsh J


Re: Prolonged safemode

2013-01-20 Thread varun kumar
Hi Tariq,

When you start your namenode, is it able to come out of safemode
automatically?

If not, then there are under-replicated or corrupted blocks that the
namenode is trying to account for.

Try to remove the corrupted blocks.
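
As a sketch, fsck can locate the corrupt blocks and, destructively, clean
them up (-delete removes the affected files, so use it with care):

    # identify files with corrupt or missing blocks
    hadoop fsck / -files -blocks

    # delete the files that have corrupt blocks (irreversible)
    hadoop fsck / -delete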

Regards,
Varun Kumar.P

On Sun, Jan 20, 2013 at 4:05 AM, Mohammad Tariq donta...@gmail.com wrote:

 Hello list,

    I have a pseudo-distributed setup on my laptop. Everything was
 working fine until now. But lately HDFS has started taking a lot of time
 to leave safemode. In fact, I have to do it manually most of the time, as
 the TT and HBase daemons get disturbed because of this.

 I am using hadoop-1.0.4. Is it a problem with this version? I have never
 faced any such issue with older versions. Or is something going wrong on
 my side?

 Thank you so much for your precious time.

 Warm Regards,
 Tariq
 https://mtariq.jux.com/
 cloudfront.blogspot.com




-- 
Regards,
Varun Kumar.P


Re: Prolonged safemode

2013-01-20 Thread Mohammad Tariq
Hello Varun,

   Thank you so much for your reply. In most of the
cases, it is not. But apart from that, everything seems
to be fine. I am not getting any notifications about
under-replicated or corrupted blocks. I will do
a recheck though.

Thank you.

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Sun, Jan 20, 2013 at 5:43 PM, Mohammad Tariq donta...@gmail.com wrote:

 Thank you so much for the valuable reply, Harsh. I'll
 look into it. One quick question: why is it happening
 with 1.0.4? Is there any compulsion to set these two
 props you have specified above? Earlier versions were
 doing absolutely fine without them.

 I am sorry to be a pest with questions, but I am kinda
 curious about this. Thank you so much.

 Warm Regards,
 Tariq
 https://mtariq.jux.com/
 cloudfront.blogspot.com


 On Sun, Jan 20, 2013 at 4:29 PM, Harsh J ha...@cloudera.com wrote:

 If your DN is starting too slowly, then you should investigate why.

 In any case, Apache Bigtop's (http://bigtop.apache.org)
 pseudo-distributed configs provide good values for 1-node setups. In your
 case, you seem to be missing dfs.safemode.min.datanodes set to 1,
 and dfs.safemode.extension set to 0.


 On Sun, Jan 20, 2013 at 4:05 AM, Mohammad Tariq donta...@gmail.com wrote:

 Hello list,

    I have a pseudo-distributed setup on my laptop. Everything was
 working fine until now. But lately HDFS has started taking a lot of time
 to leave safemode. In fact, I have to do it manually most of the time, as
 the TT and HBase daemons get disturbed because of this.

 I am using hadoop-1.0.4. Is it a problem with this version? I have never
 faced any such issue with older versions. Or is something going wrong on
 my side?

 Thank you so much for your precious time.

 Warm Regards,
 Tariq
 https://mtariq.jux.com/
 cloudfront.blogspot.com




 --
 Harsh J





Re: On a lighter note

2013-01-20 Thread Mohammad Tariq
Oh yeah Alex. Thank God that we have a German
expert as well ;)

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Sun, Jan 20, 2013 at 1:28 PM, Alexander Alten-Lorenz wget.n...@gmail.com
 wrote:

 Actually Der Untergang ;)

 Alexander Alten-Lorenz
 http://mapredit.blogspot.com
 Twitter: @mapredit
 German Hadoop LinkedIn Group: http://goo.gl/N8pCF

 On Jan 18, 2013, at 23:18, Ted Dunning tdunn...@maprtech.com wrote:

 Well, I think the actual name was Untergang.  Same meaning.

 Sent from my iPhone

 On Jan 17, 2013, at 8:09 PM, Mohammad Tariq donta...@gmail.com wrote:

 You are right Michael, as always :)

 Warm Regards,
 Tariq
 https://mtariq.jux.com/
 cloudfront.blogspot.com


 On Fri, Jan 18, 2013 at 6:33 AM, Michael Segel
 michael_se...@hotmail.com wrote:

 I'm thinking 'Downfall'

 But I could be wrong.

 On Jan 17, 2013, at 6:56 PM, Yongzhi Wang wang.yongzhi2...@gmail.com
 wrote:

 Who can tell me the name of the original film? Thanks!

 Yongzhi


 On Thu, Jan 17, 2013 at 3:05 PM, Mohammad Tariq donta...@gmail.com wrote:

 I am sure you will suffer from a severe stomach ache after watching this :)
 http://www.youtube.com/watch?v=hEqQMLSXQlY

 Warm Regards,
 Tariq
 https://mtariq.jux.com/
 cloudfront.blogspot.com







Re: Prolonged safemode

2013-01-20 Thread Harsh J
I am not aware of a direct regression causing DN startup or block report
slowdown; it's hard to tell what exactly the regression is without more
notes or logs on the behavior.


On Sun, Jan 20, 2013 at 5:43 PM, Mohammad Tariq donta...@gmail.com wrote:

 Thank you so much for the valuable reply, Harsh. I'll
 look into it. One quick question: why is it happening
 with 1.0.4? Is there any compulsion to set these two
 props you have specified above? Earlier versions were
 doing absolutely fine without them.

 I am sorry to be a pest with questions, but I am kinda
 curious about this. Thank you so much.

 Warm Regards,
 Tariq
 https://mtariq.jux.com/
 cloudfront.blogspot.com


 On Sun, Jan 20, 2013 at 4:29 PM, Harsh J ha...@cloudera.com wrote:

 If your DN is starting too slowly, then you should investigate why.

 In any case, Apache Bigtop's (http://bigtop.apache.org)
 pseudo-distributed configs provide good values for 1-node setups. In your
 case, you seem to be missing dfs.safemode.min.datanodes set to 1,
 and dfs.safemode.extension set to 0.


 On Sun, Jan 20, 2013 at 4:05 AM, Mohammad Tariq donta...@gmail.com wrote:

 Hello list,

    I have a pseudo-distributed setup on my laptop. Everything was
 working fine until now. But lately HDFS has started taking a lot of time
 to leave safemode. In fact, I have to do it manually most of the time, as
 the TT and HBase daemons get disturbed because of this.

 I am using hadoop-1.0.4. Is it a problem with this version? I have never
 faced any such issue with older versions. Or is something going wrong on
 my side?

 Thank you so much for your precious time.

 Warm Regards,
 Tariq
 https://mtariq.jux.com/
 cloudfront.blogspot.com




 --
 Harsh J





-- 
Harsh J


Fair Scheduler of Hadoop

2013-01-20 Thread Lin Ma
Hi guys,

I have a quick question regarding the fair scheduler of Hadoop. I am reading
this article:
http://blog.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/, and my
question is about the following statement: "There is currently no support
for preemption of long tasks, but this is being added in HADOOP-4665
(https://issues.apache.org/jira/browse/HADOOP-4665), which will allow you to
set how long each pool will wait before preempting other jobs' tasks to
reach its guaranteed capacity."

My questions are,

1. What does "preemption of long tasks" mean? Killing long-running tasks,
pausing long-running tasks to give resources to other tasks, or something
else?
2. I am also confused about "set how long each pool will wait before
preempting other jobs' tasks to reach its guaranteed capacity." What does
"reach its guaranteed capacity" mean? I think that when using the fair
scheduler, each pool has predefined resource allocation settings (and the
settings guarantee each pool the resources as configured); is that true?
In what situations would a pool not have its guaranteed (or configured)
capacity?

regards,
Lin


Using JCUDA with MapReduce

2013-01-20 Thread Amit Sela
Hi all,

I was wondering if anyone here has tried using the GPU of a Hadoop node to
enhance MapReduce processing?

I have read about it, but it always comes down to heavy computations such
as matrix multiplications and Monte Carlo algorithms.

Did anyone try it with MapReduce jobs that analyze logs or other
text-mining examples?

Is there a trade-off here (I guess there is) between data size/complexity
and the computation required?

Thanks,

Amit.


Fwd: new join algorithm using mapreduce

2013-01-20 Thread Vikas Jadhav
-- Forwarded message --
From: Vikas Jadhav vikascjadha...@gmail.com
Date: Sat, Jan 19, 2013 at 10:58 PM
Subject: new join algorithm using mapreduce
To: user@hadoop.apache.org


I am writing a new join algorithm using Hadoop
and want to do a multi-way join in a single MapReduce job:


map -- processes all datasets
reduce -- joins all datasets
+ aggregate operation

A second MapReduce job

will collect the results from the multiple reducer files of the first job.

I am pretty clear about the map phase but don't have an idea how to
process all the datasets in a single reduce.

So how should I proceed, and
do I need to modify the MapReduce code?
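
Not the poster's implementation, but a common reduce-side pattern for this:
the mapper tags every record with its source dataset, and the reducer
buckets the values by tag before joining. A minimal sketch (the field
layout, the file names a.txt/b.txt, and the comma delimiter are made-up
assumptions):

    import java.io.IOException;
    import java.util.*;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    // Mapper: emit (joinKey, tag + "\t" + record) for every record; the
    // tag (here the input file name) lets the reducer tell datasets apart.
    class TaggedJoinMapper extends Mapper<LongWritable, Text, Text, Text> {
      @Override
      protected void map(LongWritable key, Text value, Context ctx)
          throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        String tag = ((FileSplit) ctx.getInputSplit()).getPath().getName();
        // assumption: field 0 is the join key in every dataset
        ctx.write(new Text(fields[0]), new Text(tag + "\t" + value));
      }
    }

    // Reducer: all records sharing a join key arrive together; bucket them
    // by dataset tag, then emit the cross-product as the joined output.
    class TaggedJoinReducer extends Reducer<Text, Text, Text, Text> {
      @Override
      protected void reduce(Text key, Iterable<Text> values, Context ctx)
          throws IOException, InterruptedException {
        Map<String, List<String>> buckets = new HashMap<String, List<String>>();
        for (Text v : values) {
          String[] parts = v.toString().split("\t", 2);
          if (!buckets.containsKey(parts[0]))
            buckets.put(parts[0], new ArrayList<String>());
          buckets.get(parts[0]).add(parts[1]);
        }
        // two-way inner join shown; loop over more buckets for an N-way
        // join, and apply the aggregate operation here as needed
        List<String> a = buckets.get("a.txt"), b = buckets.get("b.txt");
        if (a == null || b == null) return; // key missing from one dataset
        for (String ra : a)
          for (String rb : b)
            ctx.write(key, new Text(ra + "|" + rb));
      }
    }

No modification of the MapReduce framework code is needed for this
pattern; it is plain user code.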


Thanks



-- 
Thanks and Regards,
Vikas Jadhav



-- 
Thanks and Regards,
Vikas Jadhav


Re: Fair Scheduler of Hadoop

2013-01-20 Thread Joep Rottinghuis
Lin,

The article you are reading is old.
The fair scheduler does have preemption now.
Tasks get killed and rerun later, potentially on a different node.

You can set a minimum / guaranteed capacity. The sum of those across pools
would typically equal the total capacity of your cluster, or less.
Then you can configure each pool to go beyond that capacity. That happens
when the cluster is temporarily not used to its full capacity.
Then, when the demand for capacity increases and jobs are queued in other
pools that are not running at their minimum guaranteed capacity, some
long-running tasks from jobs in the pool that is using more than its
minimum capacity get killed (to be run again later).

Does that make sense?
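
For illustration, a minimal sketch of an MR1 fair-scheduler allocations
file expressing such a setup (the pool name and numbers are made up):

    <?xml version="1.0"?>
    <allocations>
      <pool name="analytics">
        <!-- guaranteed minimum capacity for this pool -->
        <minMaps>10</minMaps>
        <minReduces>5</minReduces>
        <!-- seconds below the minimum before preempting other pools -->
        <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
      </pool>
    </allocations>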

Cheers,

Joep

Sent from my iPhone

On Jan 20, 2013, at 6:25 AM, Lin Ma lin...@gmail.com wrote:

 Hi guys,
 
 I have a quick question regarding the fair scheduler of Hadoop. I am
 reading this article:
 http://blog.cloudera.com/blog/2008/11/job-scheduling-in-hadoop/, and my
 question is about the following statement: "There is currently no support
 for preemption of long tasks, but this is being added in HADOOP-4665,
 which will allow you to set how long each pool will wait before preempting
 other jobs' tasks to reach its guaranteed capacity."
 
 My questions are,
 
 1. What does "preemption of long tasks" mean? Killing long-running tasks,
 pausing long-running tasks to give resources to other tasks, or something
 else?
 2. I am also confused about "set how long each pool will wait before
 preempting other jobs' tasks to reach its guaranteed capacity." What does
 "reach its guaranteed capacity" mean? I think that when using the fair
 scheduler, each pool has predefined resource allocation settings (and the
 settings guarantee each pool the resources as configured); is that true?
 In what situations would a pool not have its guaranteed (or configured)
 capacity?
 
 regards,
 Lin


Re: How to copy log files from remote windows machine to Hadoop cluster

2013-01-20 Thread Mahesh Balija
Hi Mirko,

   Thanks for your reply. It works for me as well.
   Now I am able to mount the folder on the master node, and I have
configured Flume so that it can either poll for logs in real time or
retrieve them periodically.

Thanks,
Mahesh Balija.
Calsof Labs.
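
For reference, a minimal Flume NG agent sketch matching that setup (the
agent name a1, the mount point /mnt/winlogs, and the HDFS URI are made-up
assumptions):

    # flume.conf: tail a log file on a mounted Windows share into HDFS
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1

    # exec source tailing the mounted log file
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /mnt/winlogs/app.log
    a1.sources.r1.channels = c1

    a1.channels.c1.type = memory

    # HDFS sink writing the events into the cluster
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = hdfs://master:9000/logs/windows
    a1.sinks.k1.channel = c1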

On Thu, Jan 17, 2013 at 5:01 PM, Mirko Kämpf mirko.kae...@gmail.com wrote:

 One approach I used in my lab was a data gateway:
 a small Linux box that just mounts the Windows shares,
 with a single Flume node on the gateway corresponding to the
 HDFS cluster. With tail or periodic log rotation you have control
 over all logfiles, depending on your use case. Either grab all
 incoming data and buffer it in Flume, or just move all new data
 to the cluster during the night. The gateway also contains Sqoop
 and an HDFS client if needed.

 Mirko




 2013/1/17 Mahesh Balija balijamahesh@gmail.com

 That link talks about just installing Flume on a Windows machine (it does
 NOT even have configs to push logs to the Hadoop cluster), but what if I
 have to collect logs from various clients? Then I will end up installing
 it on all clients.

 I have installed Flume successfully on Linux, but I have to configure it
 in such a way that it gathers the log files from the remote Windows box.

 Harsh, can you throw some light on this?


 On Thu, Jan 17, 2013 at 4:21 PM, Mohammad Tariq donta...@gmail.com wrote:

 Yes, it is possible. I haven't tried the windows+flume+hadoop combo
 personally, but it should work. You may find this link useful:
 http://mapredit.blogspot.in/2012/07/run-flume-13x-on-windows.html
 Alex has explained beautifully how to run Flume on a Windows box. If I
 get time I'll try to simulate your use case and let you know.

 BTW, could you please share with us whatever you have tried?

 Warm Regards,
 Tariq
 https://mtariq.jux.com/
 cloudfront.blogspot.com


 On Thu, Jan 17, 2013 at 4:09 PM, Mahesh Balija 
 balijamahesh@gmail.com wrote:

 I have studied Flume but I didn't find anything useful for my case.
 My requirement is that there is a directory on a Windows machine in which
 files are generated and kept updated with new logs. I want to have a
 tail-like mechanism (using an exec source) through which I can push the
 latest updates into the cluster, or I can simply push once a day to the
 cluster using the spooling directory mechanism.

 Can somebody advise whether this is possible using Flume, and if so, the
 configuration needed, specific to a remote Windows machine?

 On Thu, Jan 17, 2013 at 3:48 PM, Mirko Kämpf mirko.kae...@gmail.com wrote:

 Give Flume (http://flume.apache.org/) a chance to collect your data.

 Mirko



 2013/1/17 sirenfei siren...@gmail.com

 ftp auto upload?


 2013/1/17 Mahesh Balija balijamahesh@gmail.com:
  the Hadoop cluster (HDFS) either in synchronous or asynchronous









Re: Time taken for launching Application Master

2013-01-20 Thread Rahul Jain
Check your node manager logs to understand the bottleneck first. When we
had a similar issue on a recent version of Hadoop (one that includes the
fix for MAPREDUCE-4068), we rearranged our job jar file to reduce the time
the node manager(s) spent on 'expanding' the job jar file.

-Rahul

On Sun, Jan 20, 2013 at 10:34 PM, Krishna Kishore Bonagiri 
write2kish...@gmail.com wrote:

 Hi,
    I am seeing that from the time the ApplicationMaster is submitted by my
 client to the ASM part of the RM, it takes around 7 seconds for the AM to
 get started. Is there a way to reduce that time, I mean to speed it up?

 Thanks,
 Kishore



Can't browse the filesystem By Internet Explorer

2013-01-20 Thread kira.wang
Hi,

I have installed a cluster with hadoop-2.0.0-alpha: 4 PCs in total, 1
NameNode and 3 DataNodes.
I opened the http://master:50070/dfshealth.jsp page in Chrome from a remote
PC, and it was all right.
However, when I clicked "Browse the filesystem", Chrome redirected to
http://slave16:50075/browseDirectory.jsp?namenodeInfoPort=50070&dir=/&nnaddr=master2:9000
(slave16 is 172.16.0.***, a LAN IP address).
Is there any configuration in hdfs-site.xml that I need to set to deal
with the problem, or how?

Any answers would be appreciated, thanks!

Kira.wang