Normally an MR job is used for batch processing, so I don't think this is a
good use case for MR.
Since you need to run the program periodically, you cannot submit a single
mapreduce job for this.
A possible way is to create a cron job that scans the folder size and submits
an MR job if necessary.
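A minimal sketch of that cron-driven approach. The directory path, threshold, and jar/class names are all illustrative assumptions, not anything from the thread; the only real commands used are `hdfs dfs -du -s` and `hadoop jar`.

```python
#!/usr/bin/env python
# Sketch of a cron job: check an HDFS directory's size and submit an
# MR job only when it crosses a threshold. Paths and names are made up.
import shutil
import subprocess

def should_submit(size_bytes, threshold_bytes):
    """Decide whether the input directory is big enough to warrant a job."""
    return size_bytes > threshold_bytes

def hdfs_dir_size(path):
    """Total size of an HDFS directory in bytes, via `hdfs dfs -du -s`."""
    out = subprocess.check_output(["hdfs", "dfs", "-du", "-s", path])
    return int(out.split()[0])

def main():
    input_dir = "/data/incoming"          # assumed input location
    threshold = 10 * 1024 ** 3            # 10 GB, illustrative
    if should_submit(hdfs_dir_size(input_dir), threshold):
        subprocess.check_call(
            ["hadoop", "jar", "my-job.jar", "com.example.MyJob",
             input_dir, "/data/output"])

# Guard so the script is a no-op on hosts without the HDFS client.
if __name__ == "__main__" and shutil.which("hdfs"):
    main()
```

Schedule it with an ordinary crontab entry (e.g. every 10 minutes) and the decision logic stays entirely outside MapReduce, which matches the advice above.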
From: Stanley Shi s...@pivotal.io
To: user@hadoop.apache.org user@hadoop.apache.org
Sent: Thursday, August 28, 2014 1:15 AM
Subject: Re: What happens when .?
Hi All,
I have a system where files arrive in HDFS at regular intervals, and I
perform an operation every time the directory size goes above a particular point.
My question is: when I submit a map reduce job, would it only work on the
files present at that point?
Regards,
Nikhil Kandoi
Apologies -- I don't understand this advice: "If the evenness is the goal
you can also write your own input format that returns empty locations for
each split and read the small files in map task directly." How would
manually reading the files into the map task help me? Hadoop would still
spawn
Is there a way to force an even spread of data?
On Fri, Mar 22, 2013 at 2:14 PM, jeremy p athomewithagroove...@gmail.com wrote:
Short version : let's say you have 20 nodes, and each node has 10 mapper
slots. You start a job with 20 very small input files. How is the work
distributed to the cluster? Will it be even, with each node spawning one
mapper task? Is there any way of predicting or controlling how the work
will be distributed?
Which version of Hadoop are you using? MRv1 or MRv2 (YARN)?
For MRv2 (yarn): you can pretty much achieve this using:
yarn.nodemanager.resource.memory-mb (system wide setting)
and
mapreduce.map.memory.mb (job level setting)
e.g. if yarn.nodemanager.resource.memory-mb=100
and mapreduce.map.memory.mb=100, then at most one map task will run on a
node at a time.
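Spelled out as config, the idea is that the per-node cap divided by the per-map ask determines how many map tasks YARN will place on a node at once. A sketch with illustrative values (an 8 GB node and 8 GB maps give one mapper per node):

```xml
<!-- yarn-site.xml (system wide setting); value is illustrative -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
</property>

<!-- mapred-site.xml or per-job conf: ask for the whole node per map task -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>8192</value>
</property>
```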
The job we need to run executes some third-party code that utilizes
multiple cores. The only way the job will get done in a timely fashion is
if we give it all the cores available on the machine. This is not a task
that can be split up.
Yes, I know, it's not ideal, but this is the situation I'm in.
You can leverage YARN's CPU Core scheduling feature for this purpose.
It was added to the 2.0.3 release via
https://issues.apache.org/jira/browse/YARN-2 and seems to fit your
need exactly. However, looking at that patch, it seems like
param-config support for MR apps wasn't added by this, so it may
not be directly usable from MR jobs yet.
Correction to my previous post: I completely missed
https://issues.apache.org/jira/browse/MAPREDUCE-4520 which covers the
MR config end already in 2.0.3. My bad :)
On Wed, Mar 20, 2013 at 5:34 AM, Harsh J ha...@cloudera.com wrote:
You can leverage YARN's CPU Core scheduling feature for this
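For what it's worth, the knobs this feature exposes in later 2.x releases look roughly like the following (values illustrative, assuming a 16-core node; note the scheduler must also be configured to account for CPU for the vcore request to actually be enforced):

```xml
<!-- yarn-site.xml: how many virtual cores a NodeManager offers -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>16</value>
</property>

<!-- per-job conf: ask for all of a node's cores for each map task -->
<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>16</value>
</property>
```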
Hi, all
I am wondering if this situation could happen: a block is being
invalidated/deleted on a DataNode (by the NameNode, for example, to delete an
over-replicated block) while it is also concurrently being read (by some
clients). If this could happen, how does hdfs handle this issue?
Thank you
Hi,
What happens when an existing (not new) datanode rejoins a cluster for the
following scenarios:
1. Some of the blocks it was managing are deleted/modified?
2. The block size is now modified, say from 64MB to 128MB?
3. What if the block replication factor was one
Hi Mehul
Some of the blocks it was managing are deleted/modified?
The namenode will asynchronously replicate the blocks to other datanodes
in order to maintain the replication factor after a datanode has not
been in contact for 10 minutes.
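The 10-minute figure is the NameNode's default dead-node timeout. If I recall the stock defaults correctly (property names as in Hadoop 2.x), it is derived from two settings:

```xml
<!-- hdfs-site.xml defaults behind the ~10 minute figure -->
<property>
  <name>dfs.heartbeat.interval</name>
  <value>3</value> <!-- seconds between DataNode heartbeats -->
</property>
<property>
  <name>dfs.namenode.heartbeat.recheck-interval</name>
  <value>300000</value> <!-- ms; how often the NN rechecks liveness -->
</property>
<!-- dead-node timeout = 2 * recheck + 10 * heartbeat
                       = 600 s + 30 s = 630 s, i.e. roughly 10 minutes -->
```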
The size of the blocks are now modified say
Mehul,
Let me make an addition.
Some of the blocks it was managing are deleted/modified?
Blocks that are deleted in the interim will be deleted on the rejoining
node as well, after it rejoins. As for modification, I'd advise
against modifying blocks after they have been fully written.
George has answered most of these. I'll just add on:
On Tue, Sep 11, 2012 at 12:44 PM, Mehul Choube
mehul_cho...@symantec.com wrote:
1. Some of the blocks it was managing are deleted/modified?
A DN runs a block report upon start, and sends the list of blocks to
the NN. NN validates them
The namenode will asynchronously replicate the blocks to other datanodes in
order to maintain the replication factor after a datanode has not been in
contact for 10 minutes.
What happens when the datanode rejoins after the namenode has already
re-replicated the blocks it was managing?
Hi,
Inline.
On Tue, Sep 11, 2012 at 2:36 PM, Mehul Choube mehul_cho...@symantec.com wrote:
DataNode rejoins take care of only NameNode.
Sorry, I didn't get this.
From: Narasingu Ramesh [mailto:ramesh.narasi...@gmail.com]
Sent: Tuesday, September 11, 2012 2:38 PM
To: user@hadoop.apache.org
Subject: Re: what happens when a datanode rejoins?
Hi Mehul,
DataNode rejoins take care
Can you tell me which output format you are using?
Thanks
Devaraj
From: murat migdisoglu [murat.migdiso...@gmail.com]
Sent: Monday, June 04, 2012 6:18 PM
To: common-user@hadoop.apache.org
Subject: Re: What happens when I do not output anything from my mapper
You can control your map outputs based on any condition you want. I have
done that - it worked for me.
It could be a problem in your code if it's not working for you.
Can you please share your map code, or cross-check whether your conditions
are correct?
Regards,
Praveenesh
On Mon, Jun 4, 2012 at
in it.
Thanks
Devaraj
From: praveenesh kumar [praveen...@gmail.com]
Sent: Monday, June 04, 2012 5:57 PM
To: common-user@hadoop.apache.org
Subject: Re: What happens when I do not output anything from my mapper
You can control your map outputs based on any condition
From: murat migdisoglu [murat.migdiso...@gmail.com]
Sent: Monday, June 04, 2012 6:18 PM
To: common-user@hadoop.apache.org
Subject: Re: What happens when I do not output anything from my mapper
Hi,
Thanks for your answer. After I've read your emails, I decided to clear
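To make the point in this thread concrete: a mapper may legitimately emit nothing for a given input record, and that record then simply contributes no intermediate data. A minimal Hadoop Streaming sketch; the tab-separated field layout and the "ERROR" condition are illustrative assumptions, not from the thread:

```python
#!/usr/bin/env python
# Hadoop Streaming mapper that emits output only when a condition holds.
# Records failing the condition are silently dropped -- the framework is
# fine with a mapper producing no output for some (or all) of its input.
import sys

def map_line(line):
    """Return a list of 'key\\tvalue' strings; an empty list means
    'emit nothing'. Assumed record layout: tab-separated, first field a
    status word, second field a key (both illustrative)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2 or fields[0] != "ERROR":
        return []                      # condition not met: emit nothing
    return ["%s\t1" % fields[1]]

if __name__ == "__main__":
    for line in sys.stdin:
        for out in map_line(line):
            print(out)
```

Run it with the streaming jar as usual (`hadoop jar hadoop-streaming*.jar -mapper mapper.py ...`); if no record matches, the job completes normally with empty output.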
time. You do not need to
reformat HDFS.
Lohit
- Original Message
From: bzheng bing.zh...@gmail.com
To: core-user@hadoop.apache.org
Sent: Wednesday, March 11, 2009 7:48:41 PM
Subject: What happens when you do a ctrl-c on a big dfs -rmr
I did a ctrl-c immediately after issuing a hadoop dfs
From: thomas.john...@sun.com [thomas.john...@sun.com]
Sent: Tuesday, December 16, 2008 4:02 PM
To: zookeeper-user@hadoop.apache.org
Subject: Re: What happens when a server loses all its state?
Mahadev Konar wrote:
Hi Thomas,
More generally, is it a safe assumption
recovery state.
Cheers
k/
|-Original Message-
|From: Benjamin Reed [mailto:br...@yahoo-inc.com]
|Sent: Wednesday, December 17, 2008 11:48 AM
|To: zookeeper-user@hadoop.apache.org
|Subject: RE: What happens when a server loses all its state?
|
|Thomas,
|
|in the scenario you give you have
Sorry, I should have been a little more explicit. At this point, the
situation I'm considering is this; out of 3 servers, 1 server 'A'
forgets its persistent state (due to a bad disk, say) and it restarts.
My guess from what I could understand/reason about the internals was
that the server 'A'
I was wondering
1) What happens if a data node is alive but its hard drive fails? Does it
throw an exception and die?
2) If it continues to run and continues to do block reporting, is there a
console showing datanodes with healthy hard drives and unhealthy hard
drives? I know the web server of the
It depends on the failure.
For some failure modes, the disk just becomes very slow.
On 3/26/08 4:39 PM, Cagdas Gerede [EMAIL PROTECTED] wrote: