Are there any plans to build redundancy/failover support for the Job
Tracker and Name Node components in Hadoop? Consider the following
scenario:
1) A data/cpu intensive job is submitted to a Hadoop cluster of 10 machines.
2) Half-way through the job execution, the Job Tracker or Name Node fails.
Hello all,
Somewhat of an off-topic question, but I know there are
Hadoop + EC2 users here. Does anyone know if there is a programmatic
API to find out how many machine hours have been used by a
Hadoop cluster (or anything) running on EC2? I know that you can log
into the EC2
at 4:59 PM, Ryan LeCompte lecom...@gmail.com wrote:
Hello all,
Somewhat of an off-topic question, but I know there are
Hadoop + EC2 users here. Does anyone know if there is a programmatic
API to find out how many machine hours have been used by a
Hadoop cluster (or anything
Even better! I'll try this out tomorrow.
Thanks,
Ryan
On Dec 9, 2008, at 10:36 PM, Aaron Kimball [EMAIL PROTECTED] wrote:
Note also that cat foo | bin/hadoop fs -put - some/hdfs/path will use
stdin.
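For example, to load a gzipped log without unpacking it first (the
file and HDFS path here are just placeholders):

    zcat access_log.gz | bin/hadoop fs -put - /user/ryan/access_log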
- Aaron
On Mon, Dec 8, 2008 at 5:56 PM, Ryan LeCompte [EMAIL PROTECTED]
wrote:
Just
Hello all,
I normally upload files into hadoop via bin/hadoop fs -put file dest.
However, is there a way to somehow stream data into Hadoop?
For example, I'd love to do something like this:
zcat xxx > HADOOP_HDFS_DESTINATION
This would save me a ton of time since I don't have to first unpack
Just what I need -- thanks!
On Mon, Dec 8, 2008 at 7:31 PM, Alex Loddengaard [EMAIL PROTECTED] wrote:
This should answer your questions:
http://wiki.apache.org/hadoop/MountableHDFS
Alex
On Mon, Dec 8, 2008 at 2:19 PM, Ryan LeCompte [EMAIL PROTECTED] wrote:
Hello all,
I normally upload
For what it's worth, I started seeing these when I upgraded to 0.19. I
was using 10 reduces, but changed it to 30 reduces for my job and now
I don't see these errors any more.
Thanks,
Ryan
On Fri, Dec 5, 2008 at 2:44 PM, Sriram Rao [EMAIL PROTECTED] wrote:
Hi,
When a task tracker kills a
I've tried running the bin/hadoop balancer command since I recently
added a new node to the Hadoop cluster. I noticed the following output
in the beginning:
08/12/03 10:26:35 INFO balancer.Balancer: Will move 10 GBbytes in this iteration
Dec 3, 2008 10:26:35 AM 0 0 KB
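For reference, the balancer takes an optional utilization threshold,
expressed as a percentage of deviation from average disk usage (10 is
the default):

    bin/hadoop balancer -threshold 10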
Hello all,
I'm using Hadoop 0.19 and just discovered that it has no problems
processing .tgz files that contain text files. I was under the
impression that it wouldn't be able to break a .tgz file up into
multiple maps, but instead just treat it as 1 map per .tgz file. Was
this a recent change or
I believe I spoke a little too soon. Looks like Hadoop supports .gz
files, not .tgz. :-)
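The reason is the suffix lookup: Hadoop picks a decompression codec by
file extension, and .tgz does not end in .gz, so no codec matches. A
quick sketch of the lookup (file names are made up):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;

    CompressionCodecFactory codecs =
        new CompressionCodecFactory(new Configuration());
    codecs.getCodec(new Path("input.gz"));   // GzipCodec: decompressed, 1 map per file
    codecs.getCodec(new Path("input.tgz"));  // null: treated as plain text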
On Mon, Dec 1, 2008 at 10:46 AM, Ryan LeCompte [EMAIL PROTECTED] wrote:
Hello all,
I'm using Hadoop 0.19 and just discovered that it has no problems
processing .tgz files that contain text files. I
I can't guarantee that a reducer (or mapper, for that matter) will be
executed exactly once unless you turn off speculative execution. But a
distinct key gets sent to a single reducer, so yes, only one reducer
will see a particular key + associated values.
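In the 0.18-era mapred API that guarantee is visible in the reduce
signature itself: each call hands you one key plus an iterator over
all of its values (a sketch; the types are just examples, and the
org.apache.hadoop.io / org.apache.hadoop.mapred imports are omitted):

    public void reduce(Text key, Iterator<LongWritable> values,
                       OutputCollector<Text, LongWritable> output,
                       Reporter reporter) throws IOException {
        long sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();  // every value for this key arrives here
        }
        output.collect(key, new LongWritable(sum));
    }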
Miles
2008/11/3 Ryan LeCompte [EMAIL
From: Ryan LeCompte [EMAIL PROTECTED]
To: core-user@hadoop.apache.org
Sent: Monday, September 22, 2008 5:18:01 PM
Subject: Re: NotYetReplicated exceptions when pushing large files into HDFS
I've noticed that although I get a few of these exceptions, the file
:08 AM, Ryan LeCompte [EMAIL PROTECTED] wrote:
Hello all,
I'd love to be able to upload very large files (e.g., 8 or
10GB) into HDFS, but it seems like my only option is to chop up the
file into smaller pieces. Otherwise, after a while I get NotYetReplicated
exceptions while the transfer
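One knob that was often suggested for this (worth a try, though I
can't promise it's the fix here) is raising the DFS client's
block-write retry count in hadoop-site.xml:

    <property>
      <name>dfs.client.block.write.retries</name>
      <!-- default is 3; more retries helps on slow or overloaded pipelines -->
      <value>10</value>
    </property>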
Hello all,
I'm setting up a small 3-node Hadoop cluster (1 node for
namenode/jobtracker and the other two for datanode/tasktracker). The
map tasks finish fine, but the reduce tasks are failing at about 30%
with an out of memory error. My guess is because the amount of data
that I'm crunching
to upgrade to a system with more memory.
-SM
On Sat, Sep 20, 2008 at 9:07 PM, Ryan LeCompte [EMAIL PROTECTED]
wrote:
Hello all,
I'm setting up a small 3-node Hadoop cluster (1 node for
namenode/jobtracker and the other two for datanode/tasktracker). The
map tasks finish fine, but the reduce tasks
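Before buying more memory, it's also worth checking the per-task child
JVM heap, which is small by default. A sketch for hadoop-site.xml (the
-Xmx200m default is from the 0.18-era configuration):

    <property>
      <name>mapred.child.java.opts</name>
      <!-- default is -Xmx200m; raise it if reduces die with OutOfMemoryError -->
      <value>-Xmx512m</value>
    </property>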
Hadoop is best suited for distributed processing of large data sets
across many machines. Most people use Hadoop to plow through large
data sets in an offline fashion. One approach is to use
Hadoop to process your data, then put it in an optimized form in HBase
(i.e., similar to
queries/updates and HBase queries/updates?
Have a nice day,
Camilo.
On Fri, Sep 12, 2008 at 1:55 PM, Ryan LeCompte [EMAIL PROTECTED] wrote:
Hadoop is best suited for distributed processing of large data sets
across many machines. Most people use Hadoop to plow through large
data sets
(SequenceFile.java:1879)
...
Any ideas?
Thanks,
Ryan
On Tue, Sep 9, 2008 at 12:36 AM, Ryan LeCompte [EMAIL PROTECTED] wrote:
Hello,
I'm attempting to use a SortedMapWritable with a LongWritable as the
key and a custom implementation of org.apache.hadoop.io.Writable as
the value. I notice
somehow doesn't share the same classpath as the
program that actually submits the job conf. Is this expected?
Thanks,
Ryan
On Tue, Sep 9, 2008 at 9:44 AM, Ryan LeCompte [EMAIL PROTECTED] wrote:
Okay, I think I'm getting closer but now I'm running into another problem.
First off, I created my own
Hello,
I'm attempting to use a SortedMapWritable with a LongWritable as the
key and a custom implementation of org.apache.hadoop.io.Writable as
the value. I notice that my program works fine when I use another
built-in Writable (e.g., Text) as the value, but fails with the
following exception when
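For comparison, the minimal shape of a custom Writable value (the
class and field names are made up); note the public no-arg
constructor, which Hadoop needs to instantiate the object reflectively
during deserialization:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.Writable;

    public class HitCount implements Writable {
        private long count;

        public HitCount() {}  // required: created reflectively on read

        public void write(DataOutput out) throws IOException {
            out.writeLong(count);
        }

        public void readFields(DataInput in) throws IOException {
            count = in.readLong();
        }
    }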
Hi Sayali,
Yes, you can submit a collection of files from HDFS as input to the
job. For an example, please take a look at WordCount in the Map/Reduce
tutorial:
http://hadoop.apache.org/core/docs/r0.18.0/mapred_tutorial.html#Example%3A+WordCount+v1.0
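A minimal sketch with the 0.18 mapred API (the job class and paths are
placeholders):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf(MyJob.class);
    FileInputFormat.addInputPath(conf, new Path("/logs/2008-09-01"));
    FileInputFormat.addInputPath(conf, new Path("/logs/2008-09-02"));
    // comma-separated lists work too:
    FileInputFormat.addInputPaths(conf, "/archive/a.log,/archive/b.log");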
Ryan
On Sat, Sep 6, 2008 at 9:03
Hello,
I have a question regarding multiple output files that get produced as
a result of using multiple reduce tasks for a job (as opposed to only
one). If I'm using a custom Writable and thus writing to a sequence
file output, am I guaranteed that all of the data for a particular key
will appear in a
This clears up my concerns. Thanks!
Ryan
On Sep 6, 2008, at 2:17 PM, Owen O'Malley [EMAIL PROTECTED] wrote:
On Sep 6, 2008, at 9:35 AM, Ryan LeCompte wrote:
I have a question regarding multiple output files that get produced
as
a result of using multiple reduce tasks for a job
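The guarantee comes from the partitioner: the default HashPartitioner
routes every record with a given key to the same reduce, and therefore
to the same part-NNNNN file. Its logic is essentially:

    int partition = (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;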
Hello,
Can a custom Writable object used as a key/value contain other
Writables, like MapWritable?
Thanks,
Ryan
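They can; a nested Writable simply serializes itself inside the outer
object's write()/readFields(). A sketch (class and field names are
made up; imports omitted):

    public class Session implements Writable {
        private MapWritable attrs = new MapWritable();  // nested Writable

        public void write(DataOutput out) throws IOException {
            attrs.write(out);        // delegate serialization
        }

        public void readFields(DataInput in) throws IOException {
            attrs.readFields(in);    // delegate deserialization
        }
    }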
Hello,
I was wondering if anyone has gotten far at all with getting Hadoop up
and running with EC2 + EBS? Any luck getting this to work in a way
that the HDFS runs on the EBS so that it isn't blown away every time
you bring up/down the EC2 Hadoop cluster? I'd like to experiment with
this next,
wrote about it and there was a link to it in another discussion you
were part of.
Hope this helps,
J-D
On Fri, Sep 5, 2008 at 7:00 PM, Ryan LeCompte [EMAIL PROTECTED]
wrote:
Hello,
I was wondering if anyone has gotten far at all with getting Hadoop
up and running with EC2 + EBS? Any
Hi Tom,
This clears up my questions.
Thanks!
Ryan
On Thu, Sep 4, 2008 at 9:21 AM, Tom White [EMAIL PROTECTED] wrote:
On Thu, Sep 4, 2008 at 1:46 PM, Ryan LeCompte [EMAIL PROTECTED] wrote:
I'm noticing that using bin/hadoop fs -put ... s3://... is uploading
multi-gigabyte files in ~64MB
Works great!
My only suggestion would be to modify the
/usr/local/hadoop-0.18.0/conf/hadoop-site.xml file to use hdfs://...
for the namenode address. Otherwise I constantly get warnings saying
that the syntax is deprecated any time I submit a job for execution or
interact with HDFS via bin/hadoop
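i.e., something like this in hadoop-site.xml (the host and port are
placeholders):

    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode.example.com:9000</value>
    </property>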
and
submitting the code (even if it's not finished, folks might like to
see it). Thanks.
Tom
Cheers
Tim
On Tue, Sep 2, 2008 at 2:22 PM, Ryan LeCompte [EMAIL PROTECTED] wrote:
Hi Tim,
Are you mostly just processing/parsing textual log files? How many
maps/reduces did you configure in your hadoop
[EMAIL PROTECTED] wrote:
Hi Ryan,
Just a heads up, if you require more than the 20 node limit, Amazon
provides a form to request a higher limit:
http://www.amazon.com/gp/html-forms-controller/ec2-request
Andrew
On Mon, Sep 1, 2008 at 10:43 PM, Ryan LeCompte [EMAIL PROTECTED] wrote:
Hello all
time finishing it (Or submit help?) - it is really very simple.
Cheers
Tim
On Tue, Sep 2, 2008 at 2:22 PM, Ryan LeCompte [EMAIL PROTECTED] wrote:
Hi Tim,
Are you mostly just processing/parsing textual log files? How many
maps/reduces did you configure in your hadoop-ec2-env.sh file? How
Actually, not if you're using s3:// as opposed to s3n:// ...
Thanks,
Ryan
On Tue, Sep 2, 2008 at 11:21 AM, James Moore [EMAIL PROTECTED] wrote:
On Mon, Sep 1, 2008 at 1:32 PM, Ryan LeCompte [EMAIL PROTECTED] wrote:
Hello,
I'm trying to upload a fairly large file (18GB or so) to my AWS S3
How can you ensure that the S3 buckets and EC2 instances belong to a
certain zone?
Ryan
On Tue, Sep 2, 2008 at 2:38 PM, Karl Anderson [EMAIL PROTECTED] wrote:
On 2-Sep-08, at 5:22 AM, Ryan LeCompte wrote:
Hi Tim,
Are you mostly just processing/parsing textual log files? How many
maps
I'd have to concatenate the files
into 1 file and somehow turn off splitting?
Ryan
On Wed, Sep 3, 2008 at 12:09 AM, Owen O'Malley [EMAIL PROTECTED] wrote:
On Sep 2, 2008, at 9:00 PM, Ryan LeCompte wrote:
Beginner's question:
If I have a cluster with a single node that has a max of 1 map/1
Hello,
I'm trying to upload a fairly large file (18GB or so) to my AWS S3
account via bin/hadoop fs -put ... s3://...
It copies for a good 15 or 20 minutes, and then eventually errors out
with a failed retry attempt (saying that it can't retry since it has
already written a certain number of
it at the jets3t web site. Make sure that
it's from the same version that your copy of Hadoop is using.
On Mon, Sep 1, 2008 at 1:32 PM, Ryan LeCompte [EMAIL PROTECTED] wrote:
Hello,
I'm trying to upload a fairly large file (18GB or so) to my AWS S3
account via bin/hadoop fs -put ... s3://...
It copies
Hello all,
I'm curious to see how many people are using EC2 to execute their
Hadoop cluster and map/reduce programs, and how many are using
home-grown datacenters. It seems like the 20-node limit with EC2 is a
bit crippling when one wants to process many gigabytes of data. Has
anyone found this
Hello all,
I'm new to Hadoop. I'm trying to write a small Hadoop map/reduce
program that, instead of reading/writing the primitive
LongWritable, IntWritable, etc. classes, uses a custom object that I
wrote that implements the Writable interface. I'm still using a
LongWritable for the keys, but