Doug Cutting wrote:
Konstantin Shvachko wrote:
IMHO we either need to correct it or remove it.
+1
Doug
I added some pages there on namenode/jobtracker, etc., linking to the
failover doc, which I didn't compare to the svn docs to see what was
correct. Perhaps the failover page could be set up
Kevin wrote:
Thank you for the suggestion. I looked at DFSClient. It appears that
the chooseDataNode method decides which data node to connect to. Currently
it chooses the first non-dead data node returned by the namenode, which
has sorted the nodes by proximity to the client. However,
chooseDataNode
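A minimal sketch of the selection rule described in the excerpt above; the names are illustrative only, not the actual DFSClient internals:

import java.util.List;
import java.util.Set;

// Illustrative only: a simplified model of "pick the first non-dead node
// from the namenode's proximity-sorted list", not the real DFSClient code.
class ReplicaChooser {
    static String chooseDataNode(List<String> proximitySorted, Set<String> deadNodes) {
        for (String node : proximitySorted) {
            if (!deadNodes.contains(node)) {
                return node;   // nearest replica not known to be dead
            }
        }
        return null;           // every replica failed; caller must refetch or abort
    }
}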
Allen Wittenauer wrote:
On 8/6/08 11:52 AM, Otis Gospodnetic [EMAIL PROTECTED] wrote:
You can put the same hadoop-site.xml on all machines. Yes, you do want a
secondary NN - a single NN is a SPOF. Browse the archives a few days back to
find an email from Paul about DRBD (disk replication) to
Thanks. After a lot of experimenting (and of course, right before you sent
this reply) I figured it out. I also had to include the path to libhdfs.so
in my ld.so.conf and update it before I was able to successfully compile
fuse_dfs. However, when I try to mount the HDFS, it fails. I have tried both
While configuring and using the Hadoop framework, it seems that a DNS server
must be used to do hostname resolution (even if I configure the IP address
but not the hostname in the config/slaves and config/masters files). Because
we don't have a local DNS server on our local Ethernet, I have to add
Hello,
can someone please explain or point me to some documentation or
papers, where I can read well-proven facts on why scaling a relational db
is so hard and scaling a document-oriented db isn't?
So perhaps if I got lots of requests to my relational db, I would
duplicate it to several
Mork0075 wrote:
Hello,
can someone please explain or point me to some documentation or
papers, where I can read well-proven facts on why scaling a relational db
is so hard and scaling a document-oriented db isn't?
http://labs.google.com/papers/bigtable.html
relational dbs are great for
Yes, I agree with you that it should be negotiated. That is, the namenode
provides an ordered list and the client can choose some based on its
own measurements. But I am afraid 0.17.1 does not provide an easy
interface for this.
-Kevin
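For what it's worth, the negotiation Kevin describes could be sketched roughly like this on the client side; none of these names correspond to a real 0.17.1 API, they are purely illustrative:

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Purely illustrative: re-rank the namenode's proximity-ordered replica
// list by the client's own latency measurements.
class ClientReplicaRanker {
    static List<String> rank(final List<String> namenodeOrder,
                             final Map<String, Long> measuredLatencyMs) {
        List<String> ranked = new ArrayList<String>(namenodeOrder);
        Collections.sort(ranked, new Comparator<String>() {
            public int compare(String a, String b) {
                long la = latencyOf(a), lb = latencyOf(b);
                return la < lb ? -1 : (la > lb ? 1 : 0);
            }
            private long latencyOf(String node) {
                Long ms = measuredLatencyMs.get(node);
                return ms == null ? Long.MAX_VALUE : ms.longValue(); // unmeasured nodes sort last
            }
        });
        return ranked;
    }
}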
On Thu, Aug 7, 2008 at 3:40 AM, Steve Loughran [EMAIL PROTECTED]
One way to get all Unix commands to work as-is is to mount HDFS as a normal
Unix filesystem with either fuse-dfs (in contrib) or hdfs-fuse (on Google
Code).
Pete
On 8/6/08 5:08 PM, Mori Bellamy [EMAIL PROTECTED] wrote:
hey all,
often I find it would be convenient for me to run conventional
I have a large Hadoop streaming job that generally works fine,
but a few (2-4) of the ~3000 maps and reduces have problems.
To make matters worse, the problems are system-dependent (we run on a
cluster with machines of slightly different OS versions).
I'd of course like to debug these problems,
Can you also post your hadoop-site.xml and hadoop-default.xml?
-k
On Thu, Aug 7, 2008 at 3:52 AM, Mr.Thien [EMAIL PROTECTED] wrote:
Hi everyone,
I am trying to use Hadoop.
I set up my computer (thientd-desktop) as the master (jobtracker and
namenode). Two other computers: trunght-desktop and
Hello John,
On Thu, Aug 7, 2008 at 6:30 PM, John Heidemann [EMAIL PROTECTED] wrote:
I have a large Hadoop streaming job that generally works fine,
but a few (2-4) of the ~3000 maps and reduces have problems.
To make matters worse, the problems are system-dependent (we run on a
cluster with
On 28-Jul-08, at 6:33 PM, charles du wrote:
Hi:
I tried to run one of my map/reduce jobs on a cluster (hadoop 0.17.0).
I used 10 reducers. 9 of them return quickly (in a few seconds), but
one has been running for several hours, and still no sign of
completion. Do you know how I can debug it
You should use the web UI -- each mapper/reducer can be inspected, and there
is no need to ssh in.
Miles
2008/8/7 Karl Anderson [EMAIL PROTECTED]
On 28-Jul-08, at 6:33 PM, charles du wrote:
Hi:
I tried to run one of my map/reduce jobs on a cluster (hadoop 0.17.0).
I used 10 reducers. 9
Hadoop ships with a few example programs. One of these is join, which
I believe demonstrates map-side joins. I'm finding its usage
instructions a little impenetrable; could anyone send me instructions
that are more like "type this, then type this, then type this"?
Thanks in advance.
Cheers,
John
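Not quite "type this, then type this", but here is a hedged sketch of the kind of job setup the join example drives, using the 0.17-era mapred API. The input paths are placeholders, and both inputs must be sorted and identically partitioned on the join key:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.join.CompositeInputFormat;

public class MapSideJoinSetup {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MapSideJoinSetup.class);
        conf.setJobName("map-side join sketch");
        // Compose an inner join over two (placeholder) sorted, identically
        // partitioned inputs; the join expression string drives everything.
        conf.set("mapred.join.expr", CompositeInputFormat.compose(
                "inner", KeyValueTextInputFormat.class,
                new Path("/data/left"), new Path("/data/right")));
        conf.setInputFormat(CompositeInputFormat.class);
        // The map function then sees each join key with a TupleWritable
        // holding the matching values from both inputs.
    }
}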
On Thu, Aug 7, 2008 at 4:25 PM, Pete Wyckoff [EMAIL PROTECTED] wrote:
Hi Sebastian,
Those 2 things are just warnings and shouldn't cause any problems. What
happens when you ls /mnt/hadoop?
[EMAIL PROTECTED] fuse-dfs]# ls /mnt/hadoop
ls: /mnt/hadoop: Transport endpoint is not connected
This just means your classpath is not set properly, so when fuse-dfs uses
libhdfs to try to connect to your server, it cannot instantiate Hadoop
objects.
I have a JIRA open to improve error messaging when this happens:
https://issues.apache.org/jira/browse/HADOOP-3918
If you use the
Hi there:
We would like to know the most likely causes of this sort of error:

Exception closing file /data1/hdfs/tmp/person_url_pipe_59984_3405334/_temporary/_task_200807311534_0055_m_22_0/part-00022
java.io.IOException: Could not get block locations. Aborting...
at
Hey guys,
I would appreciate any feedback on this
Deepika
-Original Message-
From: Deepika Khera [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 06, 2008 5:39 PM
To: core-user@hadoop.apache.org
Subject: Distributed Lucene - from hadoop contrib
Hi,
I am planning to use
Kevin wrote:
Yes, I have looked at the block files and it matches what you said. I
am just wondering if there is some property or flag that would turn
this feature on, if it exists.
No. If you required this then you'd need to pad your data, but I'm not
sure why you'd ever require it.
http://wiki.apache.org/hadoop/DistributedLucene
and hadoop.contrib.index are two different things.
For information on hadoop.contrib.index, see the README file in the package.
I believe you can find code for http://wiki.apache.org/hadoop/DistributedLucene
at http://katta.wiki.sourceforge.net/.
Hi,
I am a new Hadoop developer and am struggling to understand why I cannot pass
TupleWritable between a map and reduce function. I have modified the wordcount
example to demonstrate the issue. Also, I am using Hadoop 0.17.1.
package wordcount;

import java.io.IOException;
import java.util.*;
Sorry about the massive code chunk; I am not used to this mail client, so I
attached the file instead.
On 8/7/08 4:18 PM, Michael Andrews [EMAIL PROTECTED] wrote:
Hi,
I am a new Hadoop developer and am struggling to understand why I cannot pass
TupleWritable between a map and reduce function.
You need access to TupleWritable::setWritten(int). If you want to use
TupleWritable outside the join package, then you need to make this
(and probably related methods, like clearWritten(int)) public and
recompile.
Please file a JIRA if you think it should be more general. -C
On Aug 7,
On Thu, 07 Aug 2008 19:42:05 +0200, Leon Mergen wrote:
Hello John,
On Thu, Aug 7, 2008 at 6:30 PM, John Heidemann [EMAIL PROTECTED] wrote:
I have a large Hadoop streaming job that generally works fine,
but a few (2-4) of the ~3000 maps and reduces have problems.
To make matters worse, the
OK, thanks for the information. I guess it seems strange to want to use
TupleWritable in this way, but it just seemed like the right thing to do
based on the API docs. Is it more idiomatic to implement Writable when
processing structured data? Again, I am really new to the hadoop
hadoop 0.16.4
Why are mapred.reduce.tasks and mapred.map.tasks always showing up
as 2?
I have the same config on all nodes.
hadoop-site.xml contains the following parameters:
<property>
  <name>mapred.map.tasks</name>
  <value>67</value>
  <description>The default number of map tasks per job.
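One hedged observation, since the thread is truncated here (an assumption, not a diagnosis): the hadoop-site.xml values are only per-job defaults read from the submitting client's config, and mapred.map.tasks is merely a hint to the InputFormat. Setting the counts explicitly on the JobConf bypasses the config lookup:

import org.apache.hadoop.mapred.JobConf;

public class TaskCounts {
    public static void main(String[] args) {
        JobConf conf = new JobConf(TaskCounts.class);
        conf.setNumMapTasks(67);     // a hint only; actual count follows the input splits
        conf.setNumReduceTasks(67);  // honored as given
    }
}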
Particularly if you know which types to expect in your structured
data, rolling your own Writable is strongly preferred to
TupleWritable. The latter serializes to a comically verbose format and
should only be used when the types and nesting depth are unknown. -C
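Along the lines Chris suggests, a minimal hand-rolled Writable might look like the following; the two fields and their names are just an illustration:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Illustrative custom Writable with two known, fixed fields; compact on
// the wire compared to TupleWritable's self-describing format.
public class StringLongPair implements Writable {
    private String text;
    private long count;

    public StringLongPair() {}                 // Writable needs a no-arg constructor

    public StringLongPair(String text, long count) {
        this.text = text;
        this.count = count;
    }

    public void write(DataOutput out) throws IOException {
        out.writeUTF(text);
        out.writeLong(count);
    }

    public void readFields(DataInput in) throws IOException {
        text = in.readUTF();
        count = in.readLong();
    }

    public String getText() { return text; }
    public long getCount() { return count; }
}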
On Aug 7, 2008, at 5:45 PM,
Hello,
Can someone point out what extra tasks need to be performed
in order to set up a cluster where the nodes are spread over the Internet, in
different LANs?
Do I need to open up any datanode/namenode ports? How do I get the datanodes to
know the valid namenode IP, and not something