Re: What happens when .....?

2014-08-28 Thread Stanley Shi
Normally an MR job is used for batch processing, so I don't think this is a good use case for MR. Since you need to run the program periodically, you cannot submit a single MapReduce job for this. One possible way is to create a cron job that scans the folder size and submits an MR job if necessary;
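A minimal sketch of the cron-driven check suggested above, in Python. It assumes the directory size is read from `hdfs dfs -du -s <dir>`, with the first field of the output taken as the size in bytes. The threshold value and the job command passed to `check_and_submit` are hypothetical placeholders, not anything specified in the thread.

```python
import subprocess

THRESHOLD_BYTES = 10 * 1024**3  # hypothetical threshold: 10 GiB


def parse_du_summary(output):
    """Return the size in bytes from `hdfs dfs -du -s` output.

    Assumption: the first whitespace-separated field is the byte count.
    """
    return int(output.split()[0])


def should_submit(size_bytes, threshold=THRESHOLD_BYTES):
    """Decide whether the directory has grown past the threshold."""
    return size_bytes > threshold


def check_and_submit(input_dir, job_cmd):
    """Meant to be invoked by cron: measure input_dir and run job_cmd if needed.

    job_cmd is the full command line (a list of strings) that submits the job;
    its contents are up to the caller.
    """
    out = subprocess.run(
        ["hdfs", "dfs", "-du", "-s", input_dir],
        check=True, capture_output=True, text=True,
    ).stdout
    if should_submit(parse_du_summary(out)):
        subprocess.run(job_cmd, check=True)
        return True
    return False
```

Keeping the size check separate from the submission step makes the decision logic easy to test without a cluster; cron only needs to call `check_and_submit` at whatever interval suits the ingest rate.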

Re: What happens when .....?

2014-08-28 Thread Eric Payne
From: Stanley Shi s...@pivotal.io To: user@hadoop.apache.org user@hadoop.apache.org Sent: Thursday, August 28, 2014 1:15 AM Subject: Re: What happens when .....? Normally an MR job is used for batch processing, so I don't think this is a good use case for MR. Since you need to run

Re: What happens when .....?

2014-08-28 Thread Mahesh Khandewal
:* Thursday, August 28, 2014 1:15 AM *Subject:* Re: What happens when .....? Normally an MR job is used for batch processing, so I don't think this is a good use case for MR. Since you need to run the program periodically, you cannot submit a single MapReduce job for this. One possible way

What happens when .....?

2014-08-27 Thread Kandoi, Nikhil
Hi All, I have a system where files arrive in HDFS at regular intervals, and I perform an operation every time the directory size goes above a particular point. My question is: when I submit a MapReduce job, would it only work on the files present at that point? Regards, Nikhil Kandoi

Re: What happens when you have fewer input files than mapper slots?

2013-03-22 Thread jeremy p
Apologies -- I don't understand this advice: "If the evenness is the goal you can also write your own input format that returns empty locations for each split and read the small files in map task directly." How would manually reading the files into the map task help me? Hadoop would still spawn

Re: What happens when you have fewer input files than mapper slots?

2013-03-22 Thread jeremy p
Is there a way to force an even spread of data? On Fri, Mar 22, 2013 at 2:14 PM, jeremy p athomewithagroove...@gmail.com wrote: Apologies -- I don't understand this advice: If the evenness is the goal you can also write your own input format that returns empty locations for each split and read

Re: What happens when you have fewer input files than mapper slots?

2013-03-21 Thread Luke Lu
Short version: let's say you have 20 nodes, and each node has 10 mapper slots. You start a job with 20 very small input files. How is the work distributed to the cluster? Will it be even, with each node spawning one mapper task? Is there any way of predicting or controlling how the work

Re: What happens when you have fewer input files than mapper slots?

2013-03-19 Thread Rahul Jain
Which version of Hadoop are you using: MRv1 or MRv2 (YARN)? For MRv2 (YARN), you can pretty much achieve this using yarn.nodemanager.resource.memory-mb (system-wide setting) and mapreduce.map.memory.mb (job-level setting), e.g. if yarn.nodemanager.resource.memory-mb=100 and
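The arithmetic behind this memory-based approach can be sketched as follows, in Python. This is a simplification under stated assumptions: it ignores the scheduler's rounding of requests to its allocation increments, and it ignores memory taken by the ApplicationMaster or reduce containers. The function name and the numbers in the usage note are illustrative only and are not taken from the thread.

```python
def max_concurrent_maps(node_memory_mb, map_memory_mb):
    """Rough upper bound on map containers that fit on one NodeManager.

    node_memory_mb: value of yarn.nodemanager.resource.memory-mb
    map_memory_mb:  value of mapreduce.map.memory.mb
    """
    if node_memory_mb <= 0 or map_memory_mb <= 0:
        raise ValueError("memory settings must be positive")
    # Each map container reserves map_memory_mb; only whole containers fit.
    return node_memory_mb // map_memory_mb
```

With illustrative values, `max_concurrent_maps(8192, 1024)` gives 8 map containers per node, while `max_concurrent_maps(8192, 8192)` gives 1, which is the "one mapper per node" effect the thread is after.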

Re: What happens when you have fewer input files than mapper slots?

2013-03-19 Thread jeremy p
The job we need to run executes some third-party code that utilizes multiple cores. The only way the job will get done in a timely fashion is if we give it all the cores available on the machine. This is not a task that can be split up. Yes, I know, it's not ideal, but this is the situation I

Re: What happens when you have fewer input files than mapper slots?

2013-03-19 Thread Harsh J
You can leverage YARN's CPU Core scheduling feature for this purpose. It was added to the 2.0.3 release via https://issues.apache.org/jira/browse/YARN-2 and seems to fit your need exactly. However, looking at that patch, it seems like param-config support for MR apps wasn't added by this so it may

Re: What happens when you have fewer input files than mapper slots?

2013-03-19 Thread Harsh J
Correction to my previous post: I completely missed https://issues.apache.org/jira/browse/MAPREDUCE-4520 which covers the MR config ends already in 2.0.3. My bad :) On Wed, Mar 20, 2013 at 5:34 AM, Harsh J ha...@cloudera.com wrote: You can leverage YARN's CPU Core scheduling feature for this

What happens when a block is being invalidated/deleted on the DataNode when it is being read?

2013-01-31 Thread Xiao Yu
Hi all, I am wondering if this situation could happen: a block is being invalidated/deleted on a DataNode (by the NameNode, for example, to delete an over-replicated block) while it is also being read concurrently (by some clients). If this could happen, how does HDFS handle it? Thank you

what happens when a datanode rejoins?

2012-09-11 Thread Mehul Choube
Hi, What happens when an existing (not new) datanode rejoins a cluster in the following scenarios: 1. Some of the blocks it was managing are deleted/modified? 2. The size of the blocks is now modified, say from 64MB to 128MB? 3. What if the block replication factor was one

Re: what happens when a datanode rejoins?

2012-09-11 Thread George Datskos
Hi Mehul Some of the blocks it was managing are deleted/modified? The namenode will asynchronously replicate the blocks to other datanodes in order to maintain the replication factor after a datanode has not been in contact for 10 minutes. The size of the blocks are now modified say
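The "10 minutes" figure above can be worked out from the heartbeat settings. The sketch below, in Python, assumes the NameNode marks a datanode dead after twice the heartbeat recheck interval plus ten heartbeat intervals, and assumes the stock defaults of 300000 ms (recheck) and 3 s (heartbeat); a cluster with tuned settings will differ. The function and parameter names are illustrative.

```python
def dead_node_timeout_ms(recheck_interval_ms=300_000, heartbeat_interval_s=3):
    """Approximate time after the last heartbeat before a datanode is
    considered dead, in milliseconds.

    Assumed formula: 2 * recheck interval + 10 * heartbeat interval.
    """
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s
```

With the assumed defaults this comes to 630000 ms, i.e. 10 minutes 30 seconds, which matches the roughly 10-minute delay mentioned in the reply.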

Re: what happens when a datanode rejoins?

2012-09-11 Thread George Datskos
Mehul, Let me make an addition. Some of the blocks it was managing are deleted/modified? Blocks that are deleted in the interim will be deleted on the rejoining node as well, after it rejoins. Regarding the modified, I'd advise against modifying blocks after they have been fully written.

Re: what happens when a datanode rejoins?

2012-09-11 Thread Harsh J
George has answered most of these. I'll just add on: On Tue, Sep 11, 2012 at 12:44 PM, Mehul Choube mehul_cho...@symantec.com wrote: 1. Some of the blocks it was managing are deleted/modified? A DN runs a block report upon start, and sends the list of blocks to the NN. NN validates them

what happens when a datanode rejoins?

2012-09-11 Thread mehul choube
Hi, What happens when an existing (not new) datanode rejoins a cluster in the following scenarios: a) Some of the blocks it was managing are deleted/modified? b) The size of the blocks is now modified, say from 64MB to 128MB? c) What if the block replication factor was one (yea not in most

RE: what happens when a datanode rejoins?

2012-09-11 Thread Mehul Choube
The namenode will asynchronously replicate the blocks to other datanodes in order to maintain the replication factor after a datanode has not been in contact for 10 minutes. What happens when the datanode rejoins after the namenode has already re-replicated the blocks it was managing

Re: what happens when a datanode rejoins?

2012-09-11 Thread Harsh J
Hi, Inline. On Tue, Sep 11, 2012 at 2:36 PM, Mehul Choube mehul_cho...@symantec.com wrote: The namenode will asynchronously replicate the blocks to other datanodes in order to maintain the replication factor after a datanode has not been in contact for 10 minutes. What happens when

RE: what happens when a datanode rejoins?

2012-09-11 Thread Mehul Choube
DataNode rejoins take care of only NameNode. Sorry didn't get this From: Narasingu Ramesh [mailto:ramesh.narasi...@gmail.com] Sent: Tuesday, September 11, 2012 2:38 PM To: user@hadoop.apache.org Subject: Re: what happens when a datanode rejoins? Hi Mehul, DataNode rejoins take care

Re: What happens when I do not output anything from my mapper

2012-06-05 Thread murat migdisoglu
you tell me which output format are you using? Thanks Devaraj From: murat migdisoglu [murat.migdiso...@gmail.com] Sent: Monday, June 04, 2012 6:18 PM To: common-user@hadoop.apache.org Subject: Re: What happens when I do not output anything from my

Re: What happens when I do not output anything from my mapper

2012-06-04 Thread praveenesh kumar
You can control your map outputs based on any condition you want. I have done that, and it worked for me. It could be a problem in your code that it's not working for you. Can you please share your map code or cross-check whether your conditions are correct? Regards, Praveenesh On Mon, Jun 4, 2012 at

RE: What happens when I do not output anything from my mapper

2012-06-04 Thread Devaraj k
in it. Thanks Devaraj From: praveenesh kumar [praveen...@gmail.com] Sent: Monday, June 04, 2012 5:57 PM To: common-user@hadoop.apache.org Subject: Re: What happens when I do not output anything from my mapper You can control your map outputs based on any condition

Re: What happens when I do not output anything from my mapper

2012-06-04 Thread murat migdisoglu
From: praveenesh kumar [praveen...@gmail.com] Sent: Monday, June 04, 2012 5:57 PM To: common-user@hadoop.apache.org Subject: Re: What happens when I do not output anything from my mapper You can control your map outputs based on any condition you want. I have done that - it worked for me

Re: What happens when I do not output anything from my mapper - Solution

2012-06-04 Thread murat migdisoglu
, 2012 5:57 PM To: common-user@hadoop.apache.org Subject: Re: What happens when I do not output anything from my mapper You can control your map outputs based on any condition you want. I have done that - it worked for me. It could be your code problem that its not working for you. Can you

RE: What happens when I do not output anything from my mapper

2012-06-04 Thread Devaraj k
From: murat migdisoglu [murat.migdiso...@gmail.com] Sent: Monday, June 04, 2012 6:18 PM To: common-user@hadoop.apache.org Subject: Re: What happens when I do not output anything from my mapper Hi, Thanks for your answer. After I've read your emails, I decided to clear

Re: What happens when you do a ctrl-c on a big dfs -rmr

2009-03-11 Thread lohit
time. You do not need to reformat HDFS. Lohit - Original Message From: bzheng bing.zh...@gmail.com To: core-user@hadoop.apache.org Sent: Wednesday, March 11, 2009 7:48:41 PM Subject: What happens when you do a ctrl-c on a big dfs -rmr I did a ctrl-c immediately after issuing a hadoop dfs

RE: What happens when a server loses all its state?

2008-12-17 Thread Benjamin Reed
From: thomas.john...@sun.com [thomas.john...@sun.com] Sent: Tuesday, December 16, 2008 4:02 PM To: zookeeper-user@hadoop.apache.org Subject: Re: What happens when a server loses all its state? Mahadev Konar wrote: Hi Thomas, More generally, is it a safe assumption

RE: What happens when a server loses all its state?

2008-12-17 Thread Krishna Sankar (ksankar)
recovery state. Cheers k/ |-Original Message- |From: Benjamin Reed [mailto:br...@yahoo-inc.com] |Sent: Wednesday, December 17, 2008 11:48 AM |To: zookeeper-user@hadoop.apache.org |Subject: RE: What happens when a server loses all its state? | |Thomas, | |in the scenario you give you have

Re: What happens when a server loses all its state?

2008-12-16 Thread Thomas Vinod Johnson
Sorry, I should have been a little more explicit. At this point, the situation I'm considering is this: out of 3 servers, 1 server 'A' forgets its persistent state (due to a bad disk, say) and it restarts. My guess from what I could understand/reason about the internals was that the server 'A'

HDFS: What happens when a harddrive fails

2008-03-26 Thread Cagdas Gerede
I was wondering: 1) What happens if a data node is alive but its hard drive fails? Does it throw an exception and die? 2) If it continues to run and continues to do block reporting, is there a console showing datanodes with healthy hard drives and unhealthy hard drives? I know the web server of the

Re: HDFS: What happens when a harddrive fails

2008-03-26 Thread Ted Dunning
It depends on the failure. For some failure modes, the disk just becomes very slow. On 3/26/08 4:39 PM, Cagdas Gerede [EMAIL PROTECTED] wrote: I was wondering: 1) What happens if a data node is alive but its hard drive fails? Does it throw an exception and die? 2) If it continues to run