Hello Again:
I'm currently running Hadoop with various Client objects in the Map phase.
A given Axis service provides the class of the Client to be used in this
situation, which runs the call over the wire to the provided URL and
translates the objects returned into Writable objects.
When I use
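A minimal sketch of that kind of Writable wrapper (the field names are
hypothetical, since the actual Axis types aren't shown):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

// ServiceResultWritable.java: wraps one service-call result so it can
// flow through the map phase as a Hadoop value.
public class ServiceResultWritable implements Writable {
  private Text id = new Text();
  private Text payload = new Text();

  public ServiceResultWritable() {}               // required by Hadoop

  public ServiceResultWritable(String id, String payload) {
    this.id.set(id);
    this.payload.set(payload);
  }

  public void write(DataOutput out) throws IOException {
    id.write(out);
    payload.write(out);
  }

  public void readFields(DataInput in) throws IOException {
    id.readFields(in);
    payload.readFields(in);
  }
}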
One more little question: why is Hadoop Streaming designed with two
different options that do the same thing (i.e. control the number of
reducers)? What's the point here?
Thanks
--- On Fri, 7/18/08, Arun C Murthy <[EMAIL PROTECTED]> wrote:
From: Arun C Murthy <[EMAIL PROTECTED]>
Subject: Re
On Jul 18, 2008, at 4:53 PM, Steve Gao wrote:
Hi All,
I am using Hadoop Streaming. I am confused by two streaming
options: -file and -cacheFile. They seem to mean the same thing,
right?
The difference is that -file will 'ship' your file (local file) to
the cluster, while -cacheFile expects the file to already be present in
HDFS and just registers it with the DistributedCache.
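Roughly, with made-up jar and path names:

# ships ./lookup.txt from the submitting machine along with the job
hadoop jar hadoop-streaming.jar ... -file ./lookup.txt

# expects the file to already be in HDFS; symlinks it as lookup.txt
hadoop jar hadoop-streaming.jar ... \
  -cacheFile hdfs://namenode:9000/data/lookup.txt#lookup.txt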
Each split should list the union of locations for all the splits in
the composite split. Unfortunately, it's not weighted. -C
On Jul 17, 2008, at 6:40 PM, Christian Kunz wrote:
When specifying multiple input directories for the
CompositeInputFormat,
is there any deterministic selection wher
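For context, a minimal sketch of handing multiple directories to
CompositeInputFormat through the old mapred API (the paths are made up;
both inputs must be sorted and identically partitioned):

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.join.CompositeInputFormat;

public class JoinJobSetup {
  public static void main(String[] args) {
    JobConf job = new JobConf(JoinJobSetup.class);
    job.setInputFormat(CompositeInputFormat.class);
    // inner join over two directories; each composite split pairs up
    // one split from each source
    job.set("mapred.join.expr", CompositeInputFormat.compose(
        "inner", SequenceFileInputFormat.class, "/data/a", "/data/b"));
  }
}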
Hi All,
I am using Hadoop Streaming. I am confused by two streaming options: -file and
-cacheFile. They seem to mean the same thing, right?
Another pair of misleading options is -numReduceTasks and -jobconf
mapred.reduce.tasks. Both are used to control (or give a hint about) the
number of reducers.
Charles,
The right forum for Pig is [EMAIL PROTECTED], I'm
redirecting you there... good luck!
Arun
On Jul 18, 2008, at 11:51 AM, charles du wrote:
Hi:
Just started learning Hadoop and Pig Latin. How can I get the number of
elements in a data bag?
For example, a data bag like the following has four elements.
Unless you have a gigantic number of items with the same id, this is
straightforward. Have a mapper emit items of the form:
key = id, value = type,timestamp
and your reducer will then see all values that share the same id together.
It is then a simple matter to process all items with the same id.
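A rough sketch of that pattern against the 0.17-era mapred API; the class
names and the 5-second window check are my own assumptions:

// IdEventMapper.java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class IdEventMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {
  public void map(LongWritable offset, Text line,
      OutputCollector<Text, Text> out, Reporter rep) throws IOException {
    String[] f = line.toString().split("\\s+");  // e.g. "A1 X 1215647404"
    if (f.length == 3) {
      out.collect(new Text(f[0]), new Text(f[1] + "," + f[2]));
    }
  }
}

// WindowReducer.java: sees every (type,timestamp) for one id together.
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WindowReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, IntWritable> {
  public void reduce(Text id, Iterator<Text> values,
      OutputCollector<Text, IntWritable> out, Reporter rep)
      throws IOException {
    List<Long> xs = new ArrayList<Long>();
    List<Long> ys = new ArrayList<Long>();
    while (values.hasNext()) {
      String[] f = values.next().toString().split(",");
      (f[0].equals("X") ? xs : ys).add(Long.valueOf(f[1]));
    }
    int count = 0;
    for (long y : ys) {            // Y events within 5s after some X
      for (long x : xs) {
        if (y >= x && y - x <= 5) { count++; break; }
      }
    }
    out.collect(id, new IntWritable(count));
  }
}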
well here is the problem I'm trying to solve,
I have a data set that looks like this:
ID    type    Timestamp
A1    X       1215647404
A2    X       1215647405
A3    X       1215647406
A1    Y       1215647409
I want to count how many A1 Y records show up within 5 seconds of an A1 X.
I was planning to have the data so
Hi:
Just started learning Hadoop and Pig Latin. How can I get the number of
elements in a data bag?
For example, a data bag like the following has four elements.
B= {1, 2, 3, 5}
I tried C = COUNT(B), but it did not work. Thanks.
--
tp
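For what it's worth, COUNT in Pig works on bags inside a grouped relation, so
the usual pattern is something like this (file and relation names assumed):

A = LOAD 'input.txt' AS (f1);
B = GROUP A ALL;                  -- one group holding the whole bag
C = FOREACH B GENERATE COUNT(A);  -- the number of elements in the bag
DUMP C;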
I am seeing an odd mix of errors in a job we have running on a particular
cluster of machines.
Has anyone seen this before and what is actually the problem?
We are running Linux (CentOS 5.1, on 8-way Xeons, with all disks under RAID 5)
and GigE switches between the machines.
The namenode machine d
I am down as well.
I'm trying to rebalance my cluster as I've added two more nodes.
When I run balancer with the default threshold I am seeing timeouts in
the logs:
2008-07-18 09:50:46,636 INFO org.apache.hadoop.dfs.Balancer: Decided to
move block -8432927406854991437 with a length of 128 MB bytes from
10.11.6.234:5
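For reference, the balancer takes an explicit threshold (a percentage of
total capacity; the default is 10), and a larger value moves fewer blocks:

bin/hadoop balancer -threshold 20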
I'm having the same problem :-/ Maps are going fine while the reduce phase
stalls at 9-16%, and then resumes after a long while (30-40 minutes).
I'm using Hadoop 0.16.0 (r618351) and the wordcount hadoop example... next
week I'll try with a newer Hadoop version (perhaps trunk) to see if I
can reproduce t
Is it possible that using too many mappers causes issues in Hadoop 0.17.1? I
have an input data directory with 100 files in it. I am running a job that
takes these files as input. When I set "-jobconf mapred.map.tasks=200" in
the job invocation, it seems like the mappers received "empty" inputs (t
I'm not sure if this is useful info, but I used both the Sun and the IBM JDK
under Linux to run version 0.16.x (I forget which) of Hadoop, without any
problems. I did some brief performance testing, didn't see any significant
difference, then we switched over to the Sun JDK exclusively as per the recommendat
Hi, I followed the instructions from
http://hayesdavis.net/2008/06/14/running-hadoop-on-windows/ to install Hadoop
0.17.1 on my Windows XP computer, whose computer name is AMBER, and the current
user name is User. I installed Cygwin on G:\. I have verified that ssh and
bin/hadoop version work fine.
Yes. I am interested.

> Date: Fri, 18 Jul 2008 05:59:33 -0700
> From: [EMAIL PROTECTED]
> Subject: New York user group?
> To: core-user@hadoop.apache.org
>
> Please let me know if you would be interested in joining a NY Hadoop user
> group if one existed.
>
> I know about 5-6 people in New York City run
The Hadoop documentation says "Sun's JDK must be used"; this message is posted
to make sure that there is an official statement about this.
Please let me know if you would be interested in joining a NY Hadoop user
group if one existed.
I know about 5-6 people in New York City running Hadoop. I am sure there are
many more.
Let me know. If there is some interest, I will try to put together a first
meeting.
thanks
-Alex
Hi,
I have written a client application in Java, which writes Apache log
data to an HDFS cluster. When the HDFS cluster is brought down, the client
attempts to connect to the cluster. I have a few issues regarding the
same:
1) When the HDFS cluster is brought down, the application does not get
an
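A minimal sketch of the write path such a client typically uses; the namenode
URI, file path, and log line are made up, and the catch block is where a
downed cluster surfaces:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LogWriter {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://namenode:9000");  // made-up URI
    try {
      FileSystem fs = FileSystem.get(conf);
      FSDataOutputStream out = fs.create(new Path("/logs/access.log"));
      out.write("127.0.0.1 - - [18/Jul/2008] \"GET /\" 200\n".getBytes());
      out.close();
    } catch (IOException e) {
      // with the cluster down, the connect/create above fails here;
      // the client has to decide how long to keep retrying
      System.err.println("HDFS unavailable: " + e.getMessage());
    }
  }
}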