Hadoop with Axis

2008-07-18 Thread Kylie McCormick
Hello Again: I'm currently running Hadoop with various Client objects in the Map phase. A given Axis service provides the class of the Client to be used in this situation, which runs the call over the wire to the provided URL and translates the objects returned into Writable objects. When I use

Re: [Streaming]What is the difference between streaming options: -file and -CacheFile ?

2008-07-18 Thread Steve Gao
One more little question: why is Hadoop streaming designed this way, with 2 different options that do the same thing (i.e. control the number of reducers)? What's the point here? Thanks --- On Fri, 7/18/08, Arun C Murthy <[EMAIL PROTECTED]> wrote: From: Arun C Murthy <[EMAIL PROTECTED]> Subject: Re

Re: [Streaming]What is the difference between streaming options: -file and -CacheFile ?

2008-07-18 Thread Arun C Murthy
On Jul 18, 2008, at 4:53 PM, Steve Gao wrote: Hi All, I am using Hadoop Streaming. I am confused by streaming options: -file and -CacheFile. Seems that they mean the same thing, right? The difference is that -file will 'ship' your file (local file) to the cluster, while -cachefile

Re: Data locality with CompositeInputFormat

2008-07-18 Thread Chris Douglas
Each split should list the union of locations for all the splits in the composite split. Unfortunately, it's not weighted. -C On Jul 17, 2008, at 6:40 PM, Christian Kunz wrote: When specifying multiple input directories for the CompositeInputFormat, is there any deterministic selection wher

[Streaming]What is the difference between streaming options: -file and -CacheFile ?

2008-07-18 Thread Steve Gao
Hi All, I am using Hadoop Streaming. I am confused by the streaming options -file and -CacheFile. It seems that they mean the same thing, right? Another misleading pair of options is -NumReduceTasks and -jobconf mapred.reduce.tasks. Both are used to control (or give a hint about) the number of reducer

Re: [PIG LATIN] how to get the size of a data bag

2008-07-18 Thread Arun C Murthy
Charles, The right forum for Pig is [EMAIL PROTECTED], I'm redirecting you there... good luck! Arun On Jul 18, 2008, at 11:51 AM, charles du wrote: Hi: I just started learning Hadoop and Pig Latin. How can I get the number of elements in a data bag? For example, a data bag like the following has f

Re: can hadoop read files backwards

2008-07-18 Thread Miles Osborne
Unless you have a gigantic number of items with the same id, this is straightforward. Have a mapper emit items of the form key=id, value=type,timestamp, and your reducer will then see all values that have the same id together. It is then a simple matter to process all items with the same id.
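The approach described above can be sketched in plain Python, without Hadoop, by simulating the map, group-by-key, and reduce steps in memory. This is only an illustration under stated assumptions: the function names (map_record, reduce_id, run_job) are hypothetical, records are assumed to be whitespace-separated as "id type timestamp", and "within 5 seconds" is assumed to mean a Y event occurring 0 to 5 seconds (inclusive) after an X event with the same id.

```python
from collections import defaultdict

# Assumption: "within 5 seconds" means 0..5 seconds after an X, inclusive.
WINDOW_SECONDS = 5


def map_record(line):
    """Mapper: turn one 'id type timestamp' record into (id, (type, timestamp))."""
    record_id, record_type, timestamp = line.split()
    return record_id, (record_type, int(timestamp))


def reduce_id(values):
    """Reducer: count Y events that follow some X event by at most WINDOW_SECONDS."""
    x_times = [ts for kind, ts in values if kind == "X"]
    y_times = [ts for kind, ts in values if kind == "Y"]
    return sum(
        1
        for y in y_times
        if any(0 <= y - x <= WINDOW_SECONDS for x in x_times)
    )


def run_job(lines):
    """Simulate the shuffle: group mapper output by id, then reduce each group."""
    grouped = defaultdict(list)
    for line in lines:
        key, value = map_record(line)
        grouped[key].append(value)
    return {key: reduce_id(values) for key, values in grouped.items()}


if __name__ == "__main__":
    sample = [
        "A1 X 1215647404",
        "A2 X 1215647405",
        "A3 X 1215647406",
        "A1 Y 1215647409",
    ]
    print(run_job(sample))
```

Because every record for an id reaches the same reducer, the reducer never needs to read the input backwards; it only has to hold one id's records in memory, which is why a very large number of items with the same id is the one case that needs more care.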

Re: can hadoop read files backwards

2008-07-18 Thread Elia Mazzawi
Well, here is the problem I'm trying to solve. I have a data set that looks like this:
    ID  type  Timestamp
    A1  X     1215647404
    A2  X     1215647405
    A3  X     1215647406
    A1  Y     1215647409
I want to count how many A1 Y show up within 5 seconds of an A1 X. I was planning to have the data so

[PIG LATIN] how to get the size of a data bag

2008-07-18 Thread charles du
Hi: I just started learning Hadoop and Pig Latin. How can I get the number of elements in a data bag? For example, a data bag like the following has four elements: B = {1, 2, 3, 5}. I tried C = COUNT(B), but it did not work. Thanks. -- tp

What is the difference between streaming options: -file and -CacheFile ?

2008-07-18 Thread Steve Gao
It seems that they mean the same thing, right? Another misleading pair of options is -NumReduceTasks and -jobconf mapred.reduce.tasks. Both are used to control (or give a hint about) the number of reducers.

help request: 0.16.0 java.io.IOException: Filesystem closed & org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find task_....

2008-07-18 Thread Jason Venner
I am seeing an odd mix of errors in a job we have running on a particular cluster of machines. Has anyone seen this before, and what is actually the problem? We are running Linux (CentOS 5.1, on 8-way Xeons, with all disks under RAID 5) and GigE switches between the machines. The namenode machine d

RE: New York user group?

2008-07-18 Thread Alex Newman
I am down as well.

Timeouts when running balancer

2008-07-18 Thread David J. O'Dell
I'm trying to rebalance my cluster as I've added two more nodes. When I run balancer with the default threshold I am seeing timeouts in the logs: 2008-07-18 09:50:46,636 INFO org.apache.hadoop.dfs.Balancer: Decided to move block -8432927406854991437 with a length of 128 MB bytes from 10.11.6.234:5

Re: Reduce stalling

2008-07-18 Thread brainstorm
I'm having the same problem :-/ Maps are going fine while reduce phase stalls on 9-16%, and then resumes after a loong while (30-40 minutes). I'm using hadoop 0.16.0 (r618351) and wordcount hadoop-example... next week I'll try with a newer hadoop version (perhaps trunk) to see if I can reproduce t

using too many mappers?

2008-07-18 Thread Ashish Venugopal
Is it possible that using too many mappers causes issues in Hadoop 0.17.1? I have an input data directory with 100 files in it. I am running a job that takes these files as input. When I set "-jobconf mapred.map.tasks=200" in the job invocation, it seems like the mappers received "empty" inputs (t

Re: Is Hadoop compatible with IBM JDK 1.5 64 bit for AIX 5?

2008-07-18 Thread Colin Freas
I'm not sure if this is useful info, but I used both the Sun and the IBM JDK under Linux to run version 0.16.iForget of Hadoop, without any problems. I did some brief performance testing, didn't see any significant difference, then we switched over to the Sun JDK exclusively as per the recommendat

Hadoop 0.17.1 namenode service can't start on windows XP.

2008-07-18 Thread Amber
Hi, I followed the instructions from http://hayesdavis.net/2008/06/14/running-hadoop-on-windows/ to install Hadoop 0.17.1 on my Windows XP computer, whose computer name is AMBER, and the current user name is User. I installed Cygwin on G:\. I have verified that ssh and bin/hadoop version work fine.

RE: New York user group?

2008-07-18 Thread Leon Yu
Yes. I am interested. > Date: Fri, 18 Jul 2008 05:59:33 -0700 > From: [EMAIL PROTECTED] > Subject: New York user group? > To: core-user@hadoop.apache.org > > Please let me know if you would be interested in joining NY Hadoop user group if one existed. > > I know about 5-6 people in New York City run

Is Hadoop compatible with IBM JDK 1.5 64 bit for AIX 5?

2008-07-18 Thread Amber
The Hadoop documentation says "Sun's JDK must be used". I am posting this message to check whether there is an official statement about this.

New York user group?

2008-07-18 Thread Alex Dorman
Please let me know if you would be interested in joining a NY Hadoop user group if one existed. I know about 5-6 people in New York City running Hadoop. I am sure there are many more. Let me know. If there is some interest, I will try to put together a first meeting. thanks -Alex

No exception received by application on HDFS restart

2008-07-18 Thread Babu, Suresh
Hi, I have written a client application in Java, which writes Apache log data to an HDFS cluster. When the HDFS cluster is brought down, the client attempts to connect to the cluster. I have a few issues regarding the same. 1) When the HDFS cluster is brought down, the application does not get an