This is the way I've been doing it, but it requires a cluster restart whenever
an additional .jar is added, and it is certainly not suitable for multiple
users using the cluster independently, especially when the .jars in question
contain user code and not just standard libraries.
Thanks,
Mikhail
On 4/
Edward,
testDU() writes a 32K file to the local fs and then verifies that the value
reported by du
increases by exactly the amount written.
Although this is true for most block-oriented file systems, it might not be true
for some.
I suspect that in your case the file is written to tmpfs, which
Almost... The videos and slides are up (as of yesterday) but there appears
to be an ACL problem with the videos.
http://developer.yahoo.com/blogs/hadoop/2008/04/hadoop_summit_slides_and_video.html
Jeremy
On 4/17/08, wuqi <[EMAIL PROTECTED]> wrote:
>
> Are the videos and slides available now?
>
>
Hi, community.
On my local computer (CentOS 5), TestDU.testDU() throws an AssertionFailedError.
Where do the extra 4096 bytes (36864 - 32768) come from?
Testsuite: org.apache.hadoop.fs.TestDU
Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 5.143 sec
Testcase: testDU took 5.138 sec
FAILED
expected:<32768> but was:<36864>
Mikhail Bautin wrote:
Specifically, I just need a way to alter the child JVM's classpath via
JobConf, without having the framework copy anything in and out of HDFS,
because all my files are already accessible from all nodes. I see how to do
that by adding a couple of lines to TaskRunner's run()
Hello,
We are using Hadoop here at Stony Brook University to power the
next-generation text analytics backend for www.textmap.com. We also have an
NFS partition that is mounted on all machines of our 100-node cluster. I
found it much more convenient to store manually created files (e.g.
configur
You can try setting fs.default.name to "0.0.0.0:8020" on the NameNode
and it might then listen on all the interfaces (there should really be a
better way to specify the bind address).
But that mostly won't solve all the problems with access from a
different network. The problem is that the client uses the
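For reference, a hadoop-site.xml fragment for the setting described above might look like the following. This is only a sketch: the port 8020 is just the value mentioned in this thread, so substitute whatever your cluster actually uses.

```xml
<!-- Sketch only: ask the NameNode to bind to all interfaces.
     Port 8020 is an assumption taken from this thread. -->
<property>
  <name>fs.default.name</name>
  <value>0.0.0.0:8020</value>
</property>
```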
Hi - I have a situation that I cannot seem to find a good answer for myself. I
am using 0.16.2.
Basically, I have two jobs that I run in order from the same java driver
process.
I would like to run the first job to run on all the compute hosts in the
cluster (which is by default) and then, I
Here's what I am trying to do.
I get a set of huge text files, each containing many XML docs. Within each of
those XML docs inside one huge text file, there is some junk in the XML, so I
want to clean it up, basically.
My custom input file format doesn't split the
Having one file per key/value has several bad consequences:
A) you are essentially doing a sort, but inefficiently. It is better to let
the framework do it well, even if you really do want this.
B) having one file per key leads (often, not always) to having many files.
This means that subsequent steps a
Hi.
Thanks for all of your responses -- it has been very helpful indeed. I am
working on the filename as suggested and will ask more questions if I get stuck.
I do need one file per key/value. I don't want to sit and collect then create
the output file.
Is this something bad based on thi
Well.. Kayla specifically mentioned that he wants one file per key/value..
Kayla should clarify this..
> -Original Message-
> From: Ted Dunning [mailto:[EMAIL PROTECTED]
> Sent: Friday, April 18, 2008 11:54 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Map Intermediate key/value pa
Yes, but Kayla is likely misguided in this respect.
(my apologies for sounding doctrinaire)
On 4/18/08 11:08 AM, "Devaraj Das" <[EMAIL PROTECTED]> wrote:
> Ted, note that Kayla wants one file per output key/value.
>
>> -Original Message-
>> From: Ted Dunning [mailto:[EMAIL PROTECTED]
Ted, note that Kayla wants one file per output key/value.
> -Original Message-
> From: Ted Dunning [mailto:[EMAIL PROTECTED]
> Sent: Friday, April 18, 2008 11:20 PM
> To: core-user@hadoop.apache.org
> Subject: Re: Map Intermediate key/value pairs written to file system
>
>
> Isn't this
Isn't this just what Hadoop does when you set numReduces = 0?
On 4/18/08 10:45 AM, "Devaraj Das" <[EMAIL PROTECTED]> wrote:
> Within a task you can get the taskId (which are unique). Define "public void
> configure(JobConf job)" and in that get the taskId by doing
> job.get("mapred.task.id") ).
Within a task you can get the taskId (which is unique). Define "public void
configure(JobConf job)" and in that get the taskId by doing
job.get("mapred.task.id").
Now create filenames starting with that as the prefix and maybe a
monotonically increasing integer as the suffix (defined as a stat
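The naming scheme described above can be sketched with plain Java. The task-id string in main() is a made-up example; in a real job it would come from `job.get("mapred.task.id")` inside `configure(JobConf)`, as described in the message.

```java
// Sketch: building collision-free file names from a unique task id
// plus a monotonically increasing counter, as suggested above.
public class UniqueNames {
    private final String taskId;
    private int counter = 0; // monotonically increasing suffix

    public UniqueNames(String taskId) {
        this.taskId = taskId;
    }

    // Each call returns a name no other call (or other task) will produce.
    public String next() {
        return taskId + "_" + (counter++);
    }

    public static void main(String[] args) {
        // Hypothetical task id; real ids come from mapred.task.id.
        UniqueNames names = new UniqueNames("task_200804181120_0001_m_000003_0");
        System.out.println(names.next());
        System.out.println(names.next());
    }
}
```

Because the task id is unique per task and the counter is unique within a task, no two mappers can overwrite each other's files.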
Hi.
I don't know how to create unique individual file names for each mapper's
key/value pairs. How do you create individual files per mapper's key/value
pairs so they don't overwrite one another?
I.e., how do you create a new file each time, and use that code for all the
mappers, and not have each
When there are non-daemon threads (JMX threads being our #1 cause), the
JVM will not exit without help.
This is in TaskTracker.java,
in 0.16.0, this is line 2088, in the finally clause of Child.main
LogManager.shutdown();
System.exit( 0 ); // Force the jvm to exit even if i
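The effect described above can be reproduced with a small stand-alone sketch. The thread here is a stand-in for the kind of JMX/helper thread mentioned in the message; nothing Hadoop-specific is used.

```java
// Sketch: a non-daemon thread keeps the JVM alive after main() returns,
// which is why Child.main forces the issue with System.exit.
public class ForceExit {
    // A stand-in for a helper (e.g. JMX) thread that keeps a JVM alive.
    static Thread helperThread() {
        Thread t = new Thread(() -> {
            try { Thread.sleep(60_000); } catch (InterruptedException e) { }
        });
        // Note: NOT marked daemon, so a plain return from main() would
        // leave the JVM waiting on this thread for a full minute.
        return t;
    }

    public static void main(String[] args) {
        helperThread().start();
        System.out.println("task done, forcing exit");
        System.exit(0); // exits immediately despite the non-daemon thread
    }
}
```

Marking such threads with `setDaemon(true)` before starting them is the other way out, when you control their creation.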
On Apr 17, 2008, at 11:20 PM, Sridhar Raman wrote:
I am new to MapReduce and Hadoop, and I have managed to find my way through
with a few programs. But I still have some doubts that are constantly
clinging onto me. I am not too sure whether these are basic doubts, or just
some documentat
I am new to Hadoop. I would like to put a Red Hat Linux box and a Windows XP
box (with Cygwin and OpenSSH installed) in the same cluster. Both work in
standalone mode and they can ping each other. But ssh does not work in either
direction. I suppose working ssh is a prerequisite. Can anyone give a hand
Hi
In my setup I have a cluster in which each server has two network
interfaces: one for Hadoop network traffic (let's call it A) and one for
traffic to the rest of the network (let's call it B).
Until now I only needed to make the nodes communicate with the master
and vice-versa (throu
Thanks for the quick reply. I will look into that and post further questions.
I'm not familiar with the API, so I may have more questions.
I'm assuming I will have to take out the output.collect so that it doesn't try
to create that end file in addition to the individual files I will force/creat
Jason, didn't get that. The jvm should exit naturally even without calling
System.exit. Where exactly did you insert the System.exit? Please clarify.
Thanks!
> -Original Message-
> From: Jason Venner [mailto:[EMAIL PROTECTED]
> Sent: Friday, April 18, 2008 6:48 PM
> To: core-user@hadoop
Will your requirement be addressed if, from within the map method, you
create a sequence file using the SequenceFile.createWriter API, write a
key/value pair using the writer's append(key, value) API, and then close the
file? You can do this for every key/value pair.
Please have a look at the createWriter APIs and the
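The one-file-per-pair idea suggested above can be sketched with plain java.nio.file in place of the Hadoop SequenceFile API, just to keep the example self-contained and runnable; the structure (create, write one pair, close) is the same.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class PerKeyFiles {
    // Writes a single key/value pair to its own file under dir,
    // named after the key (assumes keys are filesystem-safe strings).
    static Path writePair(Path dir, String key, String value) throws IOException {
        Path out = dir.resolve(key + ".txt");
        Files.write(out, value.getBytes(StandardCharsets.UTF_8));
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("perkey");
        Path p = writePair(dir, "k1", "v1");
        System.out.println(Files.readAllLines(p).get(0));
    }
}
```

With the real SequenceFile API the open/append/close calls would replace `Files.write`, but the per-pair lifecycle is identical.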
We have terrible issues with threads in the JVMs holding on to resources
and causing the compute nodes to run out of memory and lock up. We in
fact patch the JobTracker to cause the mapper/reducer JVM to call
System.exit, to ensure that the resources are freed.
This is particularly a problem for map
Sounds like you also hit this problem:
https://issues.apache.org/jira/browse/HADOOP-2669
Runping
> -Original Message-
> From: Luca [mailto:[EMAIL PROTECTED]
> Sent: Friday, April 18, 2008 1:21 AM
> To: core-user@hadoop.apache.org
> Subject: Re: Lease expired on open file
>
> dhruba Bor
Hi
I have no reduces. I would like to write my map results directly to disk as
they are produced, after each map has completed. I don't want to collect them
and then write to output.
If I wanted to directly write my map output one by one (intermediate key/value
pairs) after each map completes into indiv
Chris K Wensel wrote:
you cannot have underscores in a bucket name. it freaks out java.net.URI.
freaks out DNS, too, which is why the java.net classes whine. minus
signs should work
--
Steve Loughran http://www.1060.org/blogxter/publish/5
Author: Ant in Action http
dhruba Borthakur wrote:
The DFSClient has a thread that renews leases periodically for all files
that are being written to. I suspect that this thread is not getting a
chance to run because the gunzip program is eating all the CPU. You
might want to put in a sleep() every few seconds on unz
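The suggestion above (yield the CPU periodically during decompression so housekeeping threads like the lease renewer can run) can be sketched with the stdlib GZIP classes; the chunk size and sleep interval are arbitrary illustrative values, not anything from Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ThrottledGunzip {
    // Decompresses in to out, sleeping briefly every few MB so other
    // threads (e.g. a lease-renewal thread) get a chance to run.
    static long gunzip(InputStream in, OutputStream out) throws Exception {
        byte[] buf = new byte[64 * 1024];
        long total = 0, sinceSleep = 0;
        try (GZIPInputStream gz = new GZIPInputStream(in)) {
            int n;
            while ((n = gz.read(buf)) > 0) {
                out.write(buf, 0, n);
                total += n;
                sinceSleep += n;
                if (sinceSleep >= 8 * 1024 * 1024) { // every ~8 MB (arbitrary)
                    Thread.sleep(10);
                    sinceSleep = 0;
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        // Round-trip a small payload to show the loop works.
        ByteArrayOutputStream gzBytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(gzBytes)) {
            gz.write("hello".getBytes(StandardCharsets.UTF_8));
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long n = gunzip(new ByteArrayInputStream(gzBytes.toByteArray()), out);
        System.out.println(n + " bytes: " + out.toString("UTF-8"));
    }
}
```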