Job Tracker/Name Node redundancy

2009-01-09 Thread Ryan LeCompte
Are there any plans to build redundancy/failover support for the Job Tracker and Name Node components in Hadoop? Consider this scenario: 1) A data/CPU-intensive job is submitted to a Hadoop cluster of 10 machines. 2) Halfway through the job execution, the Job Tracker or Name Node fails.

EC2 Usage?

2008-12-18 Thread Ryan LeCompte
Hello all, Somewhat of an off-topic question, but I know there are Hadoop + EC2 users here. Does anyone know if there is a programmatic API to find out how many machine-hours have been used by a Hadoop cluster (or anything) running on EC2? I know that you can log into the EC2

Re: EC2 Usage?

2008-12-18 Thread Ryan LeCompte
at 4:59 PM, Ryan LeCompte lecom...@gmail.com wrote: Hello all, Somewhat of an off-topic question, but I know there are Hadoop + EC2 users here. Does anyone know if there is a programmatic API to find out how many machine-hours have been used by a Hadoop cluster (or anything

Re: Streaming data into Hadoop

2008-12-09 Thread Ryan LeCompte
Even better! I'll try this out tomorrow. Thanks, Ryan On Dec 9, 2008, at 10:36 PM, Aaron Kimball [EMAIL PROTECTED] wrote: Note also that cat foo | bin/hadoop fs -put - some/hdfs/path will use stdin. - Aaron On Mon, Dec 8, 2008 at 5:56 PM, Ryan LeCompte [EMAIL PROTECTED] wrote: Just
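
For reference, the same stdin trick works from Java via the FileSystem API. A minimal sketch, assuming a namenode at hdfs://namenode:9000/ and a hypothetical target path:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class PutFromStdin {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Assumed namenode address; substitute your own.
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000/"), conf);
            FSDataOutputStream out = fs.create(new Path("/data/input.txt"));
            // Copy stdin straight into HDFS; the final 'true' closes both streams.
            IOUtils.copyBytes(System.in, out, 4096, true);
        }
    }

Used the same way as the shell one-liner, e.g. zcat logs.gz | java ... PutFromStdin.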

Streaming data into Hadoop

2008-12-08 Thread Ryan LeCompte
Hello all, I normally upload files into Hadoop via bin/hadoop fs -put file dest. However, is there a way to somehow stream data into Hadoop? For example, I'd love to do something like this: zcat xxx > HADOOP_HDFS_DESTINATION. This would save me a ton of time since I don't have to first unpack

Re: Streaming data into Hadoop

2008-12-08 Thread Ryan LeCompte
Just what I need -- thanks! On Mon, Dec 8, 2008 at 7:31 PM, Alex Loddengaard [EMAIL PROTECTED] wrote: This should answer your questions: http://wiki.apache.org/hadoop/MountableHDFS Alex On Mon, Dec 8, 2008 at 2:19 PM, Ryan LeCompte [EMAIL PROTECTED] wrote: Hello all, I normally upload

Re: stack trace from hung task

2008-12-05 Thread Ryan LeCompte
For what it's worth, I started seeing these when I upgraded to 0.19. I was using 10 reduces, but changed it to 30 reduces for my job and now I don't see these errors any more. Thanks, Ryan On Fri, Dec 5, 2008 at 2:44 PM, Sriram Rao [EMAIL PROTECTED] wrote: Hi, When a task tracker kills a

Hadoop balancer

2008-12-03 Thread Ryan LeCompte
I've tried running the bin/hadoop balancer command since I recently added a new node to the Hadoop cluster. I noticed the following output in the beginning: 08/12/03 10:26:35 INFO balancer.Balancer: Will move 10 GB in this iteration Dec 3, 2008 10:26:35 AM 0 0 KB

Hadoop and .tgz files

2008-12-01 Thread Ryan LeCompte
Hello all, I'm using Hadoop 0.19 and just discovered that it has no problems processing .tgz files that contain text files. I was under the impression that it wouldn't be able to break a .tgz file up into multiple maps, but instead just treat it as 1 map per .tgz file. Was this a recent change or

Re: Hadoop and .tgz files

2008-12-01 Thread Ryan LeCompte
I believe I spoke a little too soon. Looks like Hadoop supports .gz files, not .tgz. :-) On Mon, Dec 1, 2008 at 10:46 AM, Ryan LeCompte [EMAIL PROTECTED] wrote: Hello all, I'm using Hadoop 0.19 and just discovered that it has no problems processing .tgz files that contain text files. I
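
The distinction comes down to the compression codec lookup, which matches on file extension: .gz maps to GzipCodec, while .tgz matches no registered suffix, so the raw tar stream would be read as-is. A small sketch of the lookup, with hypothetical file names:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;

    public class CodecCheck {
        public static void main(String[] args) {
            CompressionCodecFactory factory =
                new CompressionCodecFactory(new Configuration());
            // Resolves to GzipCodec for a .gz file...
            CompressionCodec gz = factory.getCodec(new Path("logs.gz"));
            // ...but to null for .tgz, since that extension is not a registered suffix.
            CompressionCodec tgz = factory.getCodec(new Path("logs.tgz"));
            System.out.println(gz + " / " + tgz);
        }
    }

Note also that gzip is not a splittable format, so each .gz input still becomes a single map.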

Re: Question regarding reduce tasks

2008-11-03 Thread Ryan LeCompte
can't guarantee that a reducer (or mapper, for that matter) will be executed exactly once unless you turn off speculative execution. But a distinct key gets sent to a single reducer, so yes, only one reducer will see a particular key + associated values Miles 2008/11/3 Ryan LeCompte [EMAIL
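
For completeness, a minimal sketch of turning speculative execution off with the old JobConf API (the job class here is hypothetical; the property names are the 0.18/0.19-era ones):

    import org.apache.hadoop.mapred.JobConf;

    public class NoSpeculation {
        public static JobConf configure() {
            JobConf conf = new JobConf(NoSpeculation.class);
            // With speculative execution off, each task runs a single attempt
            // (barring failures), so a given key's reducer executes exactly once.
            conf.setBoolean("mapred.map.tasks.speculative.execution", false);
            conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
            return conf;
        }
    }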

Re: NotYetReplicated exceptions when pushing large files into HDFS

2008-09-23 Thread Ryan LeCompte
: Ryan LeCompte [EMAIL PROTECTED] To: core-user@hadoop.apache.org Sent: Monday, September 22, 2008 5:18:01 PM Subject: Re: NotYetReplicated exceptions when pushing large files into HDFS I've noticed that although I get a few of these exceptions, the file

Re: NotYetReplicated exceptions when pushing large files into HDFS

2008-09-22 Thread Ryan LeCompte
:08 AM, Ryan LeCompte [EMAIL PROTECTED] wrote: Hello all, I'd love to be able to upload very large files (e.g., 8 or 10 GB) into HDFS, but it seems like my only option is to chop up the file into smaller pieces. Otherwise, after a while I get NotYetReplicated exceptions while the transfer

Reduce tasks running out of memory on small hadoop cluster

2008-09-20 Thread Ryan LeCompte
Hello all, I'm setting up a small 3-node Hadoop cluster (1 node for the namenode/jobtracker and the other two for datanodes/tasktrackers). The map tasks finish fine, but the reduce tasks are failing at about 30% with an out-of-memory error. My guess is that it's the amount of data that I'm crunching
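
Two knobs that commonly help in this situation, sketched with the old JobConf API (the values are illustrative, not tuned):

    import org.apache.hadoop.mapred.JobConf;

    public class ReduceMemoryTuning {
        public static JobConf configure() {
            JobConf conf = new JobConf(ReduceMemoryTuning.class);
            // More reduces means each one buffers and sorts a smaller slice of the data.
            conf.setNumReduceTasks(8);
            // Raise the heap of the per-task child JVM (the default in this era is -Xmx200m).
            conf.set("mapred.child.java.opts", "-Xmx512m");
            return conf;
        }
    }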

Re: Reduce tasks running out of memory on small hadoop cluster

2008-09-20 Thread Ryan LeCompte
to upgrade to a system with more memory. -SM On Sat, Sep 20, 2008 at 9:07 PM, Ryan LeCompte [EMAIL PROTECTED] wrote: Hello all, I'm setting up a small 3-node Hadoop cluster (1 node for the namenode/jobtracker and the other two for datanodes/tasktrackers). The map tasks finish fine, but the reduce tasks

Re: Why can't Hadoop be used for online applications ?

2008-09-12 Thread Ryan LeCompte
Hadoop is best suited for distributed processing of large data sets across many machines. Most people use Hadoop to plow through large data sets in an offline fashion. One approach is to use Hadoop to process your data, then put it in an optimized form in HBase (i.e., similar to

Re: Why can't Hadoop be used for online applications ?

2008-09-12 Thread Ryan LeCompte
queries/updates and HBase queries/updates? Have a nice day, Camilo. On Fri, Sep 12, 2008 at 1:55 PM, Ryan LeCompte [EMAIL PROTECTED] wrote: Hadoop is best suited for distributed processing of large data sets across many machines. Most people use Hadoop to plow through large data sets

Re: Issue in reduce phase with SortedMapWritable and custom Writables as values

2008-09-09 Thread Ryan LeCompte
(SequenceFile.java:1879) ... Any ideas? Thanks, Ryan On Tue, Sep 9, 2008 at 12:36 AM, Ryan LeCompte [EMAIL PROTECTED] wrote: Hello, I'm attempting to use a SortedMapWritable with a LongWritable as the key and a custom implementation of org.apache.hadoop.io.Writable as the value. I notice

Re: Issue in reduce phase with SortedMapWritable and custom Writables as values

2008-09-09 Thread Ryan LeCompte
somehow doesn't share the same classpath as the program that actually submits the job conf. Is this expected? Thanks, Ryan On Tue, Sep 9, 2008 at 9:44 AM, Ryan LeCompte [EMAIL PROTECTED] wrote: Okay, I think I'm getting closer but now I'm running into another problem. First off, I created my own
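
That is expected: map and reduce tasks run in child JVMs on the tasktrackers, which only see classes shipped in the job jar. A sketch of making sure a custom Writable travels with the job (class names here are hypothetical):

    import org.apache.hadoop.io.SortedMapWritable;
    import org.apache.hadoop.mapred.JobConf;

    public class SubmitWithCustomWritable {
        public static JobConf configure() {
            // Passing a class from your jar tells Hadoop which jar to ship to the tasks;
            // the custom Writable must live in (or be bundled into) that same jar.
            JobConf conf = new JobConf(SubmitWithCustomWritable.class);
            conf.setOutputValueClass(SortedMapWritable.class);
            return conf;
        }
    }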

Issue in reduce phase with SortedMapWritable and custom Writables as values

2008-09-08 Thread Ryan LeCompte
Hello, I'm attempting to use a SortedMapWritable with a LongWritable as the key and a custom implementation of org.apache.hadoop.io.Writable as the value. I notice that my program works fine when I use another primitive wrapper (e.g. Text) as the value, but fails with the following exception when

Re: Multiple input files

2008-09-06 Thread Ryan LeCompte
Hi Sayali, Yes, you can submit a collection of files from HDFS as input to the job. Please take a look at the WordCount example in the Map/Reduce tutorial for an example: http://hadoop.apache.org/core/docs/r0.18.0/mapred_tutorial.html#Example%3A+WordCount+v1.0 Ryan On Sat, Sep 6, 2008 at 9:03
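
A minimal sketch of wiring several inputs into one job with the old mapred API (paths are hypothetical):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class MultiInputJob {
        public static JobConf configure() {
            JobConf conf = new JobConf(MultiInputJob.class);
            // Each call appends another input path; globs work too.
            FileInputFormat.addInputPath(conf, new Path("/logs/2008-09-05"));
            FileInputFormat.addInputPath(conf, new Path("/logs/2008-09-06"));
            // Equivalent glob form: FileInputFormat.setInputPaths(conf, new Path("/logs/*"));
            return conf;
        }
    }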

Multiple output files

2008-09-06 Thread Ryan LeCompte
Hello, I have a question regarding the multiple output files that get produced as a result of using multiple reduce tasks for a job (as opposed to only one). If I'm using a custom Writable and thus writing to a SequenceFile output, am I guaranteed that all of the data for a particular key will appear in a
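
The guarantee comes from the partitioner: every record with the same key is routed to the same reduce, so all of its values land in a single part-NNNNN file. The default HashPartitioner does essentially this:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    // Essentially what the default org.apache.hadoop.mapred.lib.HashPartitioner does.
    public class HashLikePartitioner<K, V> implements Partitioner<K, V> {
        public void configure(JobConf job) {}

        public int getPartition(K key, V value, int numReduceTasks) {
            // Mask off the sign bit, then mod by the reduce count: identical keys
            // always map to the same reduce, and thus to the same output file.
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }
    }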

Re: Multiple output files

2008-09-06 Thread Ryan LeCompte
This clears up my concerns. Thanks! Ryan On Sep 6, 2008, at 2:17 PM, Owen O'Malley [EMAIL PROTECTED] wrote: On Sep 6, 2008, at 9:35 AM, Ryan LeCompte wrote: I have a question regarding multiple output files that get produced as a result of using multiple reduce tasks for a job

Custom Writables

2008-09-05 Thread Ryan LeCompte
Hello, Can a custom Writable object used as a key/value contain other Writables, like MapWritable? Thanks, Ryan
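
Yes: a Writable serializes whatever its write()/readFields() methods emit, so nesting works as long as both sides delegate to the inner Writables in the same order. A minimal sketch with hypothetical field names:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.MapWritable;
    import org.apache.hadoop.io.Writable;

    public class SessionWritable implements Writable {
        private final LongWritable timestamp = new LongWritable();
        private final MapWritable attributes = new MapWritable(); // nested Writable

        public void write(DataOutput out) throws IOException {
            timestamp.write(out);      // delegate to the nested Writables,
            attributes.write(out);     // always in the same order...
        }

        public void readFields(DataInput in) throws IOException {
            timestamp.readFields(in);  // ...and read them back in that order.
            attributes.readFields(in);
        }
    }

The implicit no-arg constructor matters too: Hadoop instantiates Writables reflectively before calling readFields().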

Hadoop + Elastic Block Stores

2008-09-05 Thread Ryan LeCompte
Hello, I was wondering if anyone has gotten far at all with getting Hadoop up and running with EC2 + EBS? Any luck getting this to work in a way that the HDFS runs on the EBS so that it isn't blown away every time you bring up/down the EC2 Hadoop cluster? I'd like to experiment with this next,
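
One way to approach this, assuming an EBS volume attached and mounted at /mnt/ebs (the paths are placeholders), is to point the HDFS storage directories at the EBS mount in hadoop-site.xml so the namespace image and blocks survive instance termination:

    <property>
      <name>dfs.name.dir</name>
      <value>/mnt/ebs/dfs/name</value>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/mnt/ebs/dfs/data</value>
    </property>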

Re: Hadoop + Elastic Block Stores

2008-09-05 Thread Ryan LeCompte
wrote about it and there was a link to it in another discussion you were part of. Hope this helps, J-D On Fri, Sep 5, 2008 at 7:00 PM, Ryan LeCompte [EMAIL PROTECTED] wrote: Hello, I was wondering if anyone has gotten far at all with getting Hadoop up and running with EC2 + EBS? Any

Re: Hadoop EC2

2008-09-04 Thread Ryan LeCompte
Hi Tom, This clears up my questions. Thanks! Ryan On Thu, Sep 4, 2008 at 9:21 AM, Tom White [EMAIL PROTECTED] wrote: On Thu, Sep 4, 2008 at 1:46 PM, Ryan LeCompte [EMAIL PROTECTED] wrote: I'm noticing that bin/hadoop fs -put ... s3://... is uploading multi-gigabyte files in ~64MB

Re: EC2 AMI for Hadoop 0.18.0

2008-09-04 Thread Ryan LeCompte
Works great! My only suggestion would be to modify the /usr/local/hadoop-0.18.0/conf/hadoop-site.xml file to use hdfs://... for the namenode address. Otherwise I constantly get warnings saying that the syntax is deprecated any time I submit a job for execution or interact with HDFS via bin/hadoop
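
For reference, the non-deprecated form is an hdfs:// URI in fs.default.name (host and port here are placeholders):

    <property>
      <name>fs.default.name</name>
      <value>hdfs://namenode-host:9000/</value>
    </property>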

Re: Hadoop EC2

2008-09-03 Thread Ryan LeCompte
and submitting the code (even if it's not finished folks might like to see it). Thanks. Tom Cheers Tim On Tue, Sep 2, 2008 at 2:22 PM, Ryan LeCompte [EMAIL PROTECTED] wrote: Hi Tim, Are you mostly just processing/parsing textual log files? How many maps/reduces did you configure in your hadoop

Re: Hadoop EC2

2008-09-02 Thread Ryan LeCompte
[EMAIL PROTECTED] wrote: Hi Ryan, Just a heads up, if you require more than the 20 node limit, Amazon provides a form to request a higher limit: http://www.amazon.com/gp/html-forms-controller/ec2-request Andrew On Mon, Sep 1, 2008 at 10:43 PM, Ryan LeCompte [EMAIL PROTECTED] wrote: Hello all

Re: Hadoop EC2

2008-09-02 Thread Ryan LeCompte
time finishing it (Or submit help?) - it is really very simple. Cheers Tim On Tue, Sep 2, 2008 at 2:22 PM, Ryan LeCompte [EMAIL PROTECTED] wrote: Hi Tim, Are you mostly just processing/parsing textual log files? How many maps/reduces did you configure in your hadoop-ec2-env.sh file? How

Re: Error while uploading large file to S3 via Hadoop 0.18

2008-09-02 Thread Ryan LeCompte
Actually not if you're using the s3:// as opposed to s3n:// ... Thanks, Ryan On Tue, Sep 2, 2008 at 11:21 AM, James Moore [EMAIL PROTECTED] wrote: On Mon, Sep 1, 2008 at 1:32 PM, Ryan LeCompte [EMAIL PROTECTED] wrote: Hello, I'm trying to upload a fairly large file (18GB or so) to my AWS S3

Re: Hadoop EC2

2008-09-02 Thread Ryan LeCompte
How can you ensure that the S3 buckets and EC2 instances belong to a certain zone? Ryan On Tue, Sep 2, 2008 at 2:38 PM, Karl Anderson [EMAIL PROTECTED] wrote: On 2-Sep-08, at 5:22 AM, Ryan LeCompte wrote: Hi Tim, Are you mostly just processing/parsing textual log files? How many maps

Re: JVM Spawning

2008-09-02 Thread Ryan LeCompte
I'd have to concatenate the files into 1 file and somehow turn off splitting? Ryan On Wed, Sep 3, 2008 at 12:09 AM, Owen O'Malley [EMAIL PROTECTED] wrote: On Sep 2, 2008, at 9:00 PM, Ryan LeCompte wrote: Beginner's question: If I have a cluster with a single node that has a max of 1 map/1
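
Turning off splitting does not actually require concatenating: a FileInputFormat subclass can refuse to split, so each input file becomes exactly one map. A sketch against the old mapred API:

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.TextInputFormat;

    // Every input file yields a single split, and therefore a single map task.
    public class NonSplittingTextInputFormat extends TextInputFormat {
        @Override
        protected boolean isSplitable(FileSystem fs, Path file) {
            return false;
        }
    }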

Error while uploading large file to S3 via Hadoop 0.18

2008-09-01 Thread Ryan LeCompte
Hello, I'm trying to upload a fairly large file (18GB or so) to my AWS S3 account via bin/hadoop fs -put ... s3://... It copies for a good 15 or 20 minutes, and then eventually errors out with a failed retry attempt (saying that it can't retry since it has already written a certain number of

Re: Error while uploading large file to S3 via Hadoop 0.18

2008-09-01 Thread Ryan LeCompte
it at the jets3t web site. Make sure that it's from the same version that your copy of Hadoop is using. On Mon, Sep 1, 2008 at 1:32 PM, Ryan LeCompte [EMAIL PROTECTED] wrote: Hello, I'm trying to upload a fairly large file (18GB or so) to my AWS S3 account via bin/hadoop fs -put ... s3://... It copies

Hadoop EC2

2008-09-01 Thread Ryan LeCompte
Hello all, I'm curious to see how many people are using EC2 to execute their Hadoop cluster and map/reduce programs, and how many are using home-grown datacenters. It seems like the 20 node limit with EC2 is a bit crippling when one wants to process many gigabytes of data. Has anyone found this

Reduce hanging with custom value objects?

2008-08-30 Thread Ryan LeCompte
Hello all, I'm new to Hadoop. I'm trying to write a small Hadoop map/reduce program that, instead of reading/writing the primitive LongWritable, IntWritable, etc. classes, uses a custom object that I wrote that implements the Writable interface. I'm still using a LongWritable for the keys, but