Are you using FUSE for mounting HDFS ?
On Fri, Apr 19, 2013 at 4:30 PM, lijinlong wakingdrea...@163.com wrote:
I mounted HDFS to a local directory for storage, that is /mnt/hdfs. I can do
the basic file operations such as create, remove, copy etc. just using Linux
commands and the GUI. But when I tried
As this is a HBase specific question, it will be better to ask this
question on the HBase user mailing list.
Thanks
Hemanth
On Fri, Apr 19, 2013 at 10:46 PM, Adrian Acosta Mitjans
amitj...@estudiantes.uci.cu wrote:
Hello:
I'm working on a project, and I'm using HBase for storing the data,
Hi,
If your goal is to use the new API, I am able to get it to work with the
following maven configuration:
<dependency>
  <groupId>org.apache.mrunit</groupId>
  <artifactId>mrunit</artifactId>
  <version>0.9.0-incubating</version>
  <classifier>hadoop1</classifier>
</dependency>
If I
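A minimal sketch of how that dependency might be exercised against the new (org.apache.hadoop.mapreduce) API; the WordCountMapper class under test and its key/value types are hypothetical, not from this thread:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class WordCountMapperTest {
  @Test
  public void emitsOneCountPerWord() throws IOException {
    // MapDriver from org.apache.hadoop.mrunit.mapreduce targets the new API.
    MapDriver<LongWritable, Text, Text, IntWritable> driver =
        MapDriver.newMapDriver(new WordCountMapper());
    driver.withInput(new LongWritable(0), new Text("hello hello"))
          .withOutput(new Text("hello"), new IntWritable(1))
          .withOutput(new Text("hello"), new IntWritable(1))
          .runTest();
  }
}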
Sorry - no. I just wanted to know if you were using FUSE, because I knew of
no other way of mounting HDFS. Basically, I was wondering if some libraries
needed to be on the system path for the Java programs to work.
From your response looks like you aren't using FUSE. So what are you using
to mount ?
+ user@
Please do continue the conversation on the mailing list, in case others
like you can benefit from / contribute to the discussion
Thanks
Hemanth
On Sat, Apr 20, 2013 at 5:32 PM, Hemanth Yamijala yhema...@thoughtworks.com
wrote:
Hi,
My code is working with mrunit-0.9.0
2.x.x provides NN high availability.
http://hadoop.apache.org/docs/r2.0.3-alpha/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html
However, it is in alpha stage right now.
Thanks
hemanth
On Sat, Apr 20, 2013 at 5:30 PM, Ascot Moss ascot.m...@gmail.com wrote:
Hi,
I am new to
From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
Sent: Wednesday, April 17, 2013 9:11 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?
The check for cache file cleanup is controlled by the
property
Are you trying to implement something like namespace federation, that's a
part of Hadoop 2.0 -
http://hadoop.apache.org/docs/r2.0.3-alpha/hadoop-project-dist/hadoop-hdfs/Federation.html
On Thu, Apr 18, 2013 at 10:02 PM, Lixiang Ao aolixi...@gmail.com wrote:
Actually I'm trying to do something
,
Jane
From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
Sent: Tuesday, April 16, 2013 9:35 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?
You can limit the size by setting local.cache.size in the mapred-site.xml
I don't think that is possible. When we use -getmerge, the destination
filesystem happens to be a LocalFileSystem which extends from
ChecksumFileSystem. I believe that's why the CRC files are getting in.
Would it not be possible for you to ignore them, since they have a fixed
extension ?
Thanks
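If ignoring them is acceptable, here is a small sketch of filtering them out on the local side by extension (the driver class is just an illustration):

import java.io.File;
import java.io.FilenameFilter;

public class SkipCrcFiles {
  public static void main(String[] args) {
    File dir = new File(args[0]);  // local directory holding the copied output
    // ChecksumFileSystem writes a ".crc" side file per data file; skip those.
    File[] dataFiles = dir.listFiles(new FilenameFilter() {
      @Override
      public boolean accept(File d, String name) {
        return !name.endsWith(".crc");
      }
    });
    for (File f : dataFiles) {
      System.out.println(f.getName());
    }
  }
}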
From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
Sent: Thursday, April 11, 2013 9:09 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?
TableMapReduceUtil has APIs like addDependencyJars which will use
DistributedCache. I
From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
Sent: Monday, April 08, 2013 9:09 PM
To: user@hadoop.apache.org
Subject: Re: How to configure mapreduce archive size?
Hi,
This directory is used as part of the 'DistributedCache' feature
AFAIK, the cp command works fully from the DFS client. It reads bytes from
the InputStream created when the file is opened and writes the same to the
OutputStream of the file. It does not work at the level of data blocks. A
configuration io.file.buffer.size is used as the size of the buffer used
);
job.setOutputFormatClass(TableOutputFormat.class);
job.getConfiguration().set(TableOutputFormat.OUTPUT_TABLE, tableName);
job.setNumReduceTasks(0);
boolean b = job.waitForCompletion(true);
From: Hemanth Yamijala [mailto:yhema
Hi,
This directory is used as part of the 'DistributedCache' feature. (
http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#DistributedCache).
There is a configuration key local.cache.size which controls the amount
of data stored under DistributedCache. The default limit is 10GB. However,
Hi,
Not sure if I am answering your question, but this is the background. Every
MapReduce job has a partitioner associated to it. The default partitioner
is a HashPartitioner. You can as a user write your own partitioner as well
and plug it into the job. The partitioner is responsible for
only
the needed lines.
Thanks,
Alberto
On 28 March 2013 11:01, Hemanth Yamijala yhema...@thoughtworks.com
wrote:
Hi,
Not sure if I am answering your question, but this is the background.
Every
MapReduce job has a partitioner associated to it. The default
partitioner
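A minimal sketch of a custom partitioner along the lines described above (the class and its first-character routing rule are hypothetical, purely for illustration):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class FirstCharPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    if (key.getLength() == 0) {
      return 0;
    }
    // Route keys by their first character instead of the default hash.
    return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
  }
}

It would be plugged into a job with job.setPartitionerClass(FirstCharPartitioner.class).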
, Hemanth Yamijala yhema...@thoughtworks.com
wrote:
Hmm. That feels like a join. Can't you read the input file on the map
side
and output those keys along with the original map output keys.. That way
the
reducer would automatically get both together ?
On Thu, Mar 28, 2013 at 5:20 PM
=./myheapdump.hprof -XX:OnOutOfMemoryError=./dump.sh'
This should create the heap dump on hdfs at /tmp/myheapdump_knoguchi.
Koji
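For reference, a hedged sketch of the task JVM options being described here, set through the Hadoop 1.x job configuration (the -Xmx value is arbitrary, and dump.sh is assumed to be a small script shipped with the job that copies the dump to HDFS):

import org.apache.hadoop.conf.Configuration;

public class HeapDumpOpts {
  public static Configuration withHeapDumpOpts() {
    Configuration conf = new Configuration();
    // Write a heap dump into the task's working directory on OOM, then run
    // the dump.sh hook (assumed to "hadoop fs -put" the dump to HDFS).
    conf.set("mapred.child.java.opts",
        "-Xmx512m -XX:+HeapDumpOnOutOfMemoryError"
        + " -XX:HeapDumpPath=./myheapdump.hprof"
        + " -XX:OnOutOfMemoryError=./dump.sh");
    return conf;
  }
}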
On Mar 26, 2013, at 11:53 AM, Hemanth Yamijala wrote:
Hi,
I tried to use the -XX:+HeapDumpOnOutOfMemoryError. Unfortunately,
like I suspected, the dump goes
...
attempt_201302211510_81218_m_00_0: put: File myheapdump.hprof does not
exist.
attempt_201302211510_81218_m_00_0: log4j:WARN No appenders could be
found for logger (org.apache.hadoop.hdfs.DFSClient).
On Wed, Mar 27, 2013 at 2:29 PM, Hemanth Yamijala
yhema...@thoughtworks.com wrote:
Couple of things to check
.
I only have an edge node through which I can submit the jobs.
Is there any other way of getting the dump instead of physically going to
that machine and checking it out?
On Tue, Mar 26, 2013 at 10:12 AM, Hemanth Yamijala
yhema...@thoughtworks.com wrote:
Hi,
One option to find what
matching
a pattern. However, these are NOT retaining the current working directory.
Hence, there is no option to get this from a cluster AFAIK.
You are effectively left with the jmap option on pseudo distributed cluster
I think.
Thanks
Hemanth
On Tue, Mar 26, 2013 at 11:37 AM, Hemanth Yamijala
=./dump.sh'
This should create the heap dump on hdfs at /tmp/myheapdump_knoguchi.
Koji
On Mar 26, 2013, at 11:53 AM, Hemanth Yamijala wrote:
Hi,
I tried to use the -XX:+HeapDumpOnOutOfMemoryError. Unfortunately, like
I suspected, the dump goes to the current work directory of the task
The stack trace indicates the job client is trying to submit a job to the
MR cluster and it is failing. Are you certain that at the time of
submitting the job, the JobTracker is running ? (On localhost:54312) ?
Regarding using a different file system - it depends a lot on what file
system you are
Hi,
The free memory might be low, just because GC hasn't reclaimed what it can.
Can you just try reading in the data you want to read and see if that works
?
Thanks
Hemanth
On Mon, Mar 25, 2013 at 10:32 AM, nagarjuna kanamarlapudi
nagarjuna.kanamarlap...@gmail.com wrote:
io.sort.mb = 256 MB
suggestion loading a 420 MB file into memory. It threw a Java
heap space error.
I am not sure where this 1.6 GB of configured heap went to?
On Mon, Mar 25, 2013 at 12:01 PM, Hemanth Yamijala
yhema...@thoughtworks.com wrote:
Hi,
The free memory might be low, just because GC hasn't reclaimed what
in the mapper. So I am trying to read
the whole file and load it into a list in the mapper.
For each and every record I look in this file which I got from the
distributed cache.
—
Sent from iPhone
On Mon, Mar 25, 2013 at 6:39 PM, Hemanth Yamijala
yhema...@thoughtworks.com wrote:
Hmm. How are you loading
Any MapReduce task needs to communicate with the tasktracker that launched
it periodically in order to let the tasktracker know it is still alive and
active. The time for which silence is tolerated is controlled by a
configuration property mapred.task.timeout.
It looks like in your case, this has
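A hedged sketch of the two usual remedies, reporting progress from a long-running task and, if that is not enough, raising the timeout (the mapper class and the 20-minute value are only illustrative):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SlowRecordMapper
    extends Mapper<LongWritable, Text, Text, NullWritable> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // ... long-running work per record ...
    // Tell the tasktracker this task is still alive so it is not killed
    // after mapred.task.timeout (default 600000 ms) of silence.
    context.progress();
  }
}

In the driver, the tolerance itself can be raised with job.getConfiguration().setLong("mapred.task.timeout", 1200000L).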
the fix is really in 2.0.0-alpha, request
you to please clarify this for me.
Thanks,
Kishore
On Thu, Mar 21, 2013 at 9:57 AM, Hemanth Yamijala
yhema...@thoughtworks.com wrote:
There was an issue related to hung connections (HDFS-3357). But the JIRA
indicates the fix is available in Hadoop-2.0.0
There was an issue related to hung connections (HDFS-3357). But the JIRA
indicates the fix is available in Hadoop-2.0.0-alpha. Still, would be worth
checking on Sandy's suggestion
On Wed, Mar 20, 2013 at 11:09 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
Hi Kishore,
50010 is the datanode
23, 2013 at 11:54 AM, Hemanth Yamijala
yhema...@thoughtworks.com wrote:
Hi Lucas,
I tried something like this but got different results.
I wrote code that opened a file on HDFS, wrote a line and called sync.
Without closing the file, I ran a wordcount with that file as input. It did
work
Can you try this ? Pick a class like WordCount from your package and
execute this command:
javap -classpath <path to your jar> -verbose org.myorg.Wordcount | grep
version.
For e.g. here's what I get for my class:
$ javap -verbose WCMapper | grep version
minor version: 0
major version: 50
Yes. It corresponds to the JT start time.
Thanks
hemanth
On Sat, Feb 23, 2013 at 5:37 PM, Manoj Babu manoj...@gmail.com wrote:
Bharath,
I can understand that it's a timestamp.
What does the identifier mean? Does it hold the JobTracker instance
start time?
Cheers!
Manoj.
On Sat, Feb
, and reading the file using
org.apache.hadoop.fs.FSDataInputStream also works ok.
Last thing, the web interface doesn't see the contents, and the command
hadoop fs -ls says the file is empty.
What am I doing wrong?
Thanks!
Lucas
On Sat, Feb 23, 2013 at 4:37 AM, Hemanth Yamijala
yhema
Could you please clarify, are you opening the file in your mapper code and
reading from there ?
Thanks
Hemanth
On Friday, February 22, 2013, Lucas Bernardi wrote:
Hello there, I'm trying to use Hadoop MapReduce to process an open file. The
writing process writes a line to the file and syncs
which you are planning to do on your data.
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Tue, Feb 19, 2013 at 6:44 AM, Hemanth Yamijala
yhema...@thoughtworks.com wrote:
Hi,
You could consider using sqoop. http://sqoop.apache.org/ there seemed to
be a SQL
, Hemanth Yamijala
yhema...@thoughtworks.com wrote:
Sorry. I did not read the mail correctly. I think the error is in how the
jar has been created. The classes start with root as wordcount_classes,
instead of org.
Thanks
Hemanth
On Tuesday, February 19, 2013, Hemanth Yamijala wrote:
Have
Hi,
In the past, some tests have been flaky. It would be good if you can search
jira and see whether this is a known issue. Else, please file it, and if
possible, provide a patch. :)
Regarding whether this will be a reliable build, it depends a little bit on
what you are going to use it for. For
What database is this ? Was hbase mentioned ?
On Monday, February 18, 2013, Mohammad Tariq wrote:
Hello Masoud,
You can use the Bulk Load feature. You might find it more
efficient than normal client APIs or using the TableOutputFormat.
The bulk load feature uses a MapReduce job
This seems to be related to the % used capacity at a datanode. The values
are computed for all the live datanodes, and the range / central limits /
deviations are computed based on a sorted list of the values.
Thanks
hemanth
On Thu, Feb 14, 2013 at 2:42 PM, Dhanasekaran Anbalagan
Can you please include the complete stack trace and not just the root.
Also, have you set fs.default.name to an HDFS location like
hdfs://localhost:9000 ?
Thanks
Hemanth
On Wednesday, February 13, 2013, Alex Thieme wrote:
Thanks for the prompt reply and I'm sorry I forgot to include the
Hi,
Hadoop On Demand is no longer supported with recent releases of Hadoop.
There is no separate user list for HOD related questions.
Which version of Hadoop are you using right now ?
Thanks
hemanth
On Wed, Feb 6, 2013 at 8:59 PM, Mehmet Belgin
mehmet.bel...@oit.gatech.edu wrote:
Hello
Hi,
Part answer: you can get the blacklisted tasktrackers using the command
line:
mapred job -list-blacklisted-trackers.
Also, I think that a blacklisted tasktracker becomes 'unblacklisted' if it
works fine after some time. Though I am not very sure about this.
Thanks
hemanth
On Wed, Jan 30,
not to close the FS.
It will go away when the task ends anyway.
Thx
On Thu, Jan 24, 2013 at 5:26 PM, Hemanth Yamijala
yhema...@thoughtworks.com wrote:
Hi,
We are noticing a problem where we get a filesystem closed exception when
a map task is done and is finishing execution. By map task
.
On Fri, Jan 25, 2013 at 6:56 AM, Hemanth Yamijala
yhema...@thoughtworks.com wrote:
Hi,
We are noticing a problem where we get a filesystem closed exception
when a
map task is done and is finishing execution. By map task, I literally
mean
the MapTask class of the map reduce code
This may be of some use, about how maps are decided:
http://wiki.apache.org/hadoop/HowManyMapsAndReduces
Thanks
Hemanth
On Friday, January 25, 2013, jamal sasha wrote:
Hi.
A very very lame question.
Does the number of mappers depend on the number of nodes I have?
How I imagine map-reduce is
Could you post the stack trace from the job logs. Also looking at the task
tracker logs on the failed nodes may help.
Thanks
Hemanth
On Friday, January 25, 2013, Terry Healy wrote:
Running hadoop-0.20.2 on a 20 node cluster.
When running a Map/Reduce job that uses several .jars loaded into
Hi,
We are noticing a problem where we get a filesystem closed exception when a
map task is done and is finishing execution. By map task, I literally mean
the MapTask class of the map reduce code. Debugging this we found that the
mapper is getting a handle to the filesystem object and itself
On top of what Bejoy said, just wanted to add that when you submit a job to
Hadoop using the hadoop jar command, the jars which you reference in the
command on the edge/client node will be picked up by Hadoop and made
available to the cluster nodes where the mappers and reducers run.
Thanks
Hi,
Please note that you are referring to a very old version of Hadoop. The
current stable release is Hadoop 1.x. The API has changed in 1.x. Take a
look at the wordcount example here:
http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html#Example%3A+WordCount+v2.0
But, in principle your
();
}
output.collect(key, new IntWritable(sum));
}
}
On Mon, Jan 21, 2013 at 8:29 PM, Hemanth Yamijala
yhema...@thoughtworks.com wrote:
Hi,
Please note that you are referring to a very old version of Hadoop. The
current stable release is Hadoop 1.x. The API has changed
Hi,
Not sure how to do it using MRUnit, but should be possible to do this using
a mocking framework like Mockito or EasyMock. In a mapper (or reducer),
you'd use the Context classes to get the DistributedCache files. By mocking
these to return what you want, you could potentially run a true unit
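A rough sketch of that idea with Mockito, assuming a Hadoop 2.x-style mapper whose setup() loads lookup data via context.getCacheFiles(); the LookupMapper class and the fixture path are hypothetical:

import java.net.URI;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.junit.Test;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

public class LookupMapperTest {
  @Test
  @SuppressWarnings("unchecked")
  public void setupReadsLookupFromCache() throws Exception {
    // LookupMapper is a hypothetical Mapper<LongWritable, Text, Text, Text>
    // whose setup() loads a lookup table from the distributed cache.
    LookupMapper mapper = new LookupMapper();
    Mapper<LongWritable, Text, Text, Text>.Context context =
        mock(Mapper.Context.class);
    // Point the "distributed cache" at a small local fixture file.
    when(context.getCacheFiles())
        .thenReturn(new URI[] { new URI("file:///tmp/lookup-fixture.txt") });
    when(context.nextKeyValue()).thenReturn(false);  // no records in this test
    mapper.run(context);  // run() invokes setup(), which reads the fixture
    // ... assert on the mapper's loaded state here ...
  }
}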
failed when I tried to open it.
Restarting the daemons helped.
I don't think this problem will come in a normal up-and-running production
cluster.
Thanks
hemanth
On Thu, Jan 17, 2013 at 9:48 AM, Hemanth Yamijala yhema...@thoughtworks.com
wrote:
At the place where you get the error, can you
You may get more updated information from folks at Yahoo!, but here is a
mail on hadoop-general mailing list that has some statistics:
http://www.mail-archive.com/general@hadoop.apache.org/msg05592.html
Please note it is a little dated, so things should be better now :-)
Thanks
hemanth
On Tue,
in 2.x and trunk. Could you check if this
provides functionality you require - so we at least know there is new API
support in later versions ?
Thanks
Hemanth
On Mon, Jan 14, 2013 at 7:45 PM, Hemanth Yamijala yhema...@thoughtworks.com
wrote:
Hi,
No. I didn't find any reference to a working
Hi,
AFAIK, the mapred.local.dir property refers to a set of directories under
which different types of data related to mapreduce jobs are stored - for
e.g. intermediate data, localized files for a job etc. The working
directory for a mapreduce job is configured under a sub directory within
one of
Hi,
One place where I could find the capacity-scheduler.xml was from source -
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/resources.
AFAIK, the masters file is only used for starting the secondary namenode -
which has in 2.x been replaced by a
:
Thanks Hemanth
I appreciate your response
Did you find any working example of it in use? It looks to me like I’d
still be tied to the old API
Thanks
Mike
From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
Sent: 14 January 2013 05:08
To add to that, log aggregation is a feature available with Hadoop 2.0
(where mapreduce is re-written to YARN). The functionality is available via
the History Server:
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HistoryServerRest.html
Thanks
hemanth
On Sat, Jan 12, 2013
11, 2013 at 3:28 PM, Ivan Tretyakov itretya...@griddynamics.com
wrote:
Thanks for replies!
keep.failed.task.files set to false.
Config of one of the jobs attached.
On Fri, Jan 11, 2013 at 5:44 AM, Hemanth Yamijala
yhema...@thoughtworks.com wrote:
Good point. Forgot that one
Queues in the capacity scheduler are logical data structures into which
MapReduce jobs are placed to be picked up by the JobTracker / Scheduler
framework, according to some capacity constraints that can be defined for a
queue.
So, given your use case, I don't think Capacity Scheduler is going to
Hemanth
On Thu, Jan 10, 2013 at 8:18 AM, Hemanth Yamijala
yhema...@thoughtworks.com wrote:
Hi,
The directory name you have provided is
/data?/mapred/local/taskTracker/persona/jobcache/.
This directory is used by the TaskTracker (slave) daemons to localize job
files when the tasks
Is this the same as:
http://stackoverflow.com/questions/6137139/how-to-save-only-non-empty-reducers-output-in-hdfs?
i.e. LazyOutputFormat, etc. ?
On Thu, Jan 10, 2013 at 4:51 PM, Pratyush Chandra
chandra.praty...@gmail.com wrote:
Hi,
I am using s3n as file system. I do not wish to create
Good point. Forgot that one :-)
On Thu, Jan 10, 2013 at 10:53 PM, Vinod Kumar Vavilapalli
vino...@hortonworks.com wrote:
Can you check the job configuration for these ~100 jobs? Do they have
keep.failed.task.files set to true? If so, these files won't be deleted. If
it doesn't, it could
Hi,
The directory name you have provided is
/data?/mapred/local/taskTracker/persona/jobcache/.
This directory is used by the TaskTracker (slave) daemons to localize job
files when the tasks are run on the slaves.
Hence, I don't think this is related to the parameter
Hi,
I am not sure if your complaint is as much about the changing interfaces as
it is about documentation.
Please note that versions prior to 1.0 did not have stable interfaces as a
major requirement. Not by choice, but because the focus was on seemingly
more important functionality, stability,
Hi,
In Hadoop 1.0, I don't think this information is exposed. The
TaskInProgress is an internal class and hence cannot / should not be used
from client applications. The only way out seems to be to screen scrape the
information from the Jobtracker web UI.
If you can live with completed events,
From a user perspective, at a high level, the mapreduce package can be
thought of as having user facing client code that can be invoked, extended
etc as applicable from client programs.
The mapred package is to be treated as internal to the mapreduce system,
and shouldn't directly be used unless
Hi,
Are tasks being executed multiple times due to failures? Sorry, it was not
very clear from your question.
Thanks
hemanth
On Sat, Jan 5, 2013 at 7:44 PM, David Parks davidpark...@yahoo.com wrote:
Thinking here... if you submitted the task programmatically you should be
able to capture
If it is a small number, A seems the best way to me.
On Friday, December 28, 2012, Kshiva Kps wrote:
Which one is current ..
What is the preferred way to pass a small number of configuration
parameters to a mapper or reducer?
A. As key-value pairs in the jobconf object.
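A minimal sketch of option A; the property name my.threshold and its default are made up for illustration:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ThresholdMapper
    extends Mapper<LongWritable, Text, Text, NullWritable> {
  private int threshold;

  @Override
  protected void setup(Context context) {
    // Read the value the driver placed in the job configuration, e.g. with
    // job.getConfiguration().setInt("my.threshold", 42) before submission.
    threshold = context.getConfiguration().getInt("my.threshold", 10);
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Use the parameter to drive per-record behaviour.
    if (value.getLength() > threshold) {
      context.write(new Text(value), NullWritable.get());
    }
  }
}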
Hi,
Firstly, I am talking about Hadoop 1.0. Please note that in Hadoop 2.x and
trunk, the Mapreduce framework is completely revamped to Yarn (
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html)
and you may need to look at different interfaces for building your own
This is a dated blog post, so it would help if someone with current HDFS
knowledge can validate it:
http://developer.yahoo.com/blogs/hadoop/posts/2010/05/scalability_of_the_hadoop_dist/
.
There is a bit about the RAM required for the Namenode and how to compute
it:
You can look at the 'Namespace
However, in the case Oleg is talking about the attempts are:
attempt_201212051224_0021_m_00_0
attempt_201212051224_0021_m_02_0
attempt_201212051224_0021_m_03_0
These aren't multiple attempts of a single task, are they ? They are
actually different tasks. If they were multiple
David,
You are using FileNameTextInputFormat. This is not in Hadoop source, as far
as I can see. Can you please confirm where this is being used from ? It
seems like the isSplittable method of this input format may need checking.
Another thing, given you are adding the same input format for all
out what I had done.
Dave
From: Hemanth Yamijala [mailto:yhema...@thoughtworks.com]
Sent: Thursday, December 06, 2012 3:25 PM
To: user@hadoop.apache.org
Subject: Re: Map tasks processing some files multiple times
David,
You
Generally true for the framework config files, but some of the
supplementary features can be refreshed without restart. For e.g. scheduler
configuration, host files (for included / excluded nodes) ...
On Tue, Dec 4, 2012 at 5:33 AM, Cristian Cira
cmc0...@tigermail.auburn.edu wrote:
No. You will
Hi,
Little confused about where JNI comes in here (you mentioned this in your
original email). Also, where do you want to get the information for the
hadoop job ? Is it in a program that is submitting a job, or some sort of
monitoring application that is monitoring jobs submitted to a cluster by
Hi,
I've not tried this on S3. However, the directory mentioned in the
exception is based on the value of this particular configuration
key: mapreduce.jobtracker.staging.root.dir. This defaults
to ${hadoop.tmp.dir}/mapred/staging. Can you please set this to an S3
location and try ?
Thanks
, 2012 at 3:11 AM, Hemanth Yamijala
yhema...@thoughtworks.com wrote:
Hi,
I've not tried this on S3. However, the directory mentioned in the
exception is based on the value of this particular configuration
key: mapreduce.jobtracker.staging.root.dir. This defaults
to ${hadoop.tmp.dir}/mapred
Hi,
Roughly, this information will be available under the 'Hadoop map task
list' page in the Mapreduce web ui (in Hadoop-1.0, which I am assuming is
what you are using). You can reach this page by selecting the running tasks
link from the job information page. The page has a table that lists all
Hi,
Would reducing the output from the map tasks solve the problem ? i.e. are
reducers slowing down because a lot of data is being shuffled ?
If that's the case, you could see if the map output size will reduce by
using the framework's combiner or an in-mapper combining technique.
Thanks
Can certainly do that. Indeed, if you set the number of reducers to 0,
the map output will be directly written to HDFS by the framework
itself. You may also want to look at
http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Task+Side-Effect+Files
to see some things that need to be taken
assumption correct?
Thanks,
Varad
On Mon, Sep 24, 2012 at 9:48 AM, Hemanth Yamijala yhema...@gmail.com wrote:
Varad,
Looking at the code for the PiEstimator class which implements the
'pi' example, the two arguments are mandatory and are used *before*
the job is submitted for execution - i.e
Varad,
Looking at the code for the PiEstimator class which implements the
'pi' example, the two arguments are mandatory and are used *before*
the job is submitted for execution - i.e on the client side. In
particular, one of them (nSamples) is used not by the MapReduce job,
but by the client code
Hi,
Yes. By contract, all intermediate output with the same key goes to
the same reducer.
In your example, suppose of the two keys generated from the mapper,
one key goes to reducer 1 and the second goes to reducer 2, reducer 3
will not have any records to process and end without producing any
Can you please look at the jobtracker and tasktracker logs on nodes
where the task has been launched ? Also see if the job logs are
picking up anything. They'll probably give you clues on what is
happening.
Also, is HDFS ok ? i.e. are you able to read files already loaded etc.
Thanks
hemanth
On
One thing to be careful about is paths of dependent libraries or
executables like streaming binaries. In pseudo distributed mode, since all
processes are looking on the same machine, it is likely that they will find
paths that are really local to only the machine where the job is being
launched
Hi,
When do you know the keys to ignore ? You mentioned after the map stage
.. is this at the end of each map task, or at the end of all map tasks ?
Thanks
hemanth
On Fri, Sep 14, 2012 at 4:36 PM, Aseem Anand aseem.ii...@gmail.com wrote:
Hi,
Is there any way I can ignore all keys except a
Hi,
Task assignment takes data locality into account first and not block
sequence. In hadoop, tasktrackers ask the jobtracker to be assigned tasks.
When such a request comes to the jobtracker, it will try to look for an
unassigned task which needs data that is close to the tasktracker and will
Could you please review your configuration to see if you are pointing to
the right namenode address ? (This will be in core-site.xml)
Please paste it here so we can look for clues.
Thanks
hemanth
On Tue, Sep 11, 2012 at 9:25 PM, yogesh dhari yogeshdh...@live.com wrote:
Hi all,
I am running
)
But, it didn't work like that.
Why is this happening ?
Are there any documents about this ?
What part of the source code is doing that ?
Regards,
Hiroyuki
On Tue, Sep 11, 2012 at 11:27 PM, Hemanth Yamijala
yhema...@thoughtworks.com wrote:
Hi,
Task assignment takes data locality
Hi,
I am not sure if there's any way to restrict the tasks to specific
machines. However, I think there are some ways of restricting the
number of 'slots' that can be used by the job.
Also, not sure which version of Hadoop you are on. The
capacityscheduler
Hi,
You could check DistributedCache (
http://hadoop.apache.org/common/docs/stable/mapred_tutorial.html#DistributedCache).
It would allow you to distribute data to the nodes where your tasks are run.
Thanks
Hemanth
On Mon, Sep 10, 2012 at 3:27 PM, Sigurd Spieckermann
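A hedged sketch of that approach with the stable (1.x) API referenced in the link above; the HDFS path and symlink name are only examples:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapreduce.Job;

public class CacheFileDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Ship a file that is already in HDFS to every node where tasks run.
    DistributedCache.addCacheFile(new URI("/data/lookup.txt#lookup"), conf);
    Job job = new Job(conf, "cache-example");
    // ... set mapper, input and output paths, then submit ...
    // Inside a task, the local copies can be retrieved with
    // DistributedCache.getLocalCacheFiles(context.getConfiguration()).
  }
}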
Hi,
Responses inline to some points.
On Tue, Sep 11, 2012 at 7:26 AM, Elaine Gan elaine-...@gmo.jp wrote:
Hi,
I'm new to Hadoop and I've just played around with MapReduce.
I would like to check if my understanding of Hadoop is correct and I
would appreciate it if anyone could correct me if
Harsh,
Could IsolationRunner be used here? I'd put up a patch for HADOOP-8765,
after applying which IsolationRunner works for me. Maybe we could use it to
re-run the map task that's failing and debug.
Thanks
hemanth
On Thu, Sep 6, 2012 at 9:42 PM, Harsh J ha...@cloudera.com wrote:
Protobuf
Hi,
The path
/tmp/hadoop-pat/mapred/local/archive/-4686065962599733460_1587570556_150738331/snip
is a location used by the tasktracker process for the 'DistributedCache' -
a mechanism to distribute files to all tasks running in a map reduce job. (
Though I agree with others that it would probably be easier to get Hadoop
up and running on Unix based systems, couldn't help notice that this path:
\tmp \hadoop-upendyal\mapred\staging\upendyal-1075683580\.staging
seems to have a space in the first component i.e '\tmp ' and not '\tmp'. Is
that
Hi,
If you are getting the LocalFileSystem, you could try by putting
core-site.xml in a directory that's there in the classpath for the
Tomcat App (or include such a path in the classpath, if that's
possible)
Thanks
hemanth
On Mon, Sep 3, 2012 at 4:01 PM, Visioner Sadak visioner.sa...@gmail.com
Hi,
Is there a reason why Yarn's directory paths are not defaulting to be
relative to hadoop.tmp.dir.
For e.g. yarn.nodemanager.local-dirs defaults to /tmp/nm-local-dir.
Could it be ${hadoop.tmp.dir}/nm-local-dir instead ? Similarly for the
log directories, I guess...
Thanks
hemanth
Hi,
You are right that a change to mapred.tasktracker.reduce.tasks.maximum will
require a restart of the tasktrackers. AFAIK, there is no way of modifying
this property without restarting.
On a different note, could you see if the amount of intermediate data can
be reduced using a combiner, or
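For the combiner suggestion, a minimal sketch of wiring one in (IntSumReducer stands for any sum-style reducer and is hypothetical here):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class CombinerExample {
  public static Job configure() throws Exception {
    Job job = new Job(new Configuration(), "combiner-example");
    // Reusing a sum-style reducer as the combiner performs per-map partial
    // aggregation, shrinking the intermediate data that has to be shuffled.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    return job;
  }
}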