I got hadoop 0.22.0 running on Windows. The most useful instructions
I found were these:
http://knowlspace.wordpress.com/2011/06/21/setting-up-hadoop-on-windows/
I was able to run examples grep, pi and WordCount.
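For reference, those were invocations along these lines - the pi and WordCount
arguments are from memory, so treat them as approximate:

bin/hadoop jar hadoop-mapred-examples-0.22.0.jar pi 10 100
bin/hadoop jar hadoop-mapred-examples-0.22.0.jar wordcount input wc-out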
Saw that 1.0 just came out. Downloaded it, installed it, and tried it
out. I don't even get as far as the first example; the tasktracker won't start:
2011-12-29 14:41:27,798 ERROR org.apache.hadoop.mapred.TaskTracker:
Can not start task tracker because java.io.IOException: Failed to set
permissions of path: \tmp\hadoop-cyg_server\mapred\local\ttprivate to 0700
The above may be this bug:
https://issues.apache.org/jira/browse/HADOOP-7682
I can see that it has something to do with permissions between my user
id and the cyg_server account, which seems to be part of the standard way to get
sshd running on Windows.
The most obvious thing that's confusing is: if this bug is the cause, why did
I get as far as I did with 0.22.0?
Went back to 0.22.0 and it still runs fine.
People claim that computers are deterministic machines - which is not
true. We get them to work well enough for a great many applications -
some even life-critical. But in fact computers are complex systems
where minute problems can cascade into complete failures of the system.
Pat
On 12/24/2011 3:56 PM, patf wrote:
Thanks Prashant.
That was yesterday on Linux at work. It's Saturday; I'm at home and
trying out hadoop on Windows 7. The hadoop Windows install was
something of a pain-in-the-ass, but I've got hadoop basically working
(but not computing yet).
Neither the grep nor the pi example works. Here's the error I see in
my Windows installation (Windows 7 with a new cygwin install).
c:/PROGRA~2/Java/jdk1.6.0_25/bin/java -Xmx1000m
-Dhadoop.log.dir=D:\downloads\hadoop\logs
-Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=D:\downloads\hadoop\
-Dhadoop.id.str= -Dhadoop.root.logger=INFO,console
-Dhadoop.security.logger=INFO,console
-Djava.library.path=/cygdrive/d/downloads/hadoop/lib/native/Windows_7-x86-32
-Dhadoop.policy.file=hadoop-policy.xml
-Djava.net.preferIPv4Stack=true org.apache.hadoop.util.RunJar
hadoop-mapred-examples-0.22.0.jar grep input output dfs[a-z.]+
11/12/24 15:03:27 WARN conf.Configuration:
mapred.used.genericoptionsparser is deprecated. Instead, use
mapreduce.client.genericoptionsparser.used
11/12/24 15:03:27 WARN mapreduce.JobSubmitter: No job jar file set.
User classes may not be found. See Job or Job#setJar(String).
11/12/24 15:03:27 INFO input.FileInputFormat: Total input paths to
process : 19
11/12/24 15:03:28 INFO mapreduce.JobSubmitter: number of splits:19
11/12/24 15:03:29 INFO mapreduce.Job: Running job: job_201112241459_0003
11/12/24 15:03:30 INFO mapreduce.Job: map 0% reduce 0%
11/12/24 15:03:33 INFO mapreduce.Job: Task Id :
attempt_201112241459_0003_m_000020_0, Status : FAILED
Error initializing attempt_201112241459_0003_m_000020_0:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=cyg_server, access=EXECUTE,
inode="system":MyId:supergroup:rwx------
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:205)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:161)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:128)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4465)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkTraverse(FSNamesystem.java:4442)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:2117)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getFileInfo(NameNode.java:1022)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMeth
11/12/24 15:03:33 WARN mapreduce.Job: Error reading task
outputhttp://MyId-PC:50060/tasklog?plaintext=true&attemptid=attempt_201112241459_0003_m_000020_0&filter=stdout
user cyg_server? That's because I'm coming in under cygwin ssh.
Looks like I need to either change the permissions in the hadoop file system or
extend more privileges to cyg_server. Although there may be an
error (the WARN) before I even get to the cyg_server permissions crash?
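If I go the HDFS route, I'm guessing at something like the following - the
exact path of the mapred system dir is a guess on my part, since the error
only shows the inode name "system", so I'd locate it first - or else just set
dfs.permissions to false in hdfs-site.xml while I'm testing:

bin/hadoop fs -lsr / | grep system
bin/hadoop fs -chmod -R 755 /tmp/hadoop-MyId/mapred/system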
Pat
On 12/23/2011 5:21 PM, Prashant Kommireddi wrote:
Seems like you do not have "/user/MyId/input/conf" on HDFS.
Try this.
cd $HADOOP_HOME_DIR (this should be your hadoop root dir)
hadoop fs -put conf input/conf
And then run the MR job again.
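You can double-check that the copy landed with something like:

hadoop fs -ls input/conf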
-Prashant Kommireddi
On Fri, Dec 23, 2011 at 3:40 PM, Pat Flaherty<p...@well.com> wrote:
Hi,
Installed 0.22.0 on CentOS 5.7. I can start dfs and mapred and see their
processes.
Ran the first grep example: bin/hadoop jar hadoop-*-examples.jar grep
input output 'dfs[a-z.]+'. It seems the correct jar name is
hadoop-mapred-examples-0.22.0.jar - there are no other hadoop*examples*.jar
files in HADOOP_HOME.
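So, with the jar name spelled out, the command is:

bin/hadoop jar hadoop-mapred-examples-0.22.0.jar grep input output 'dfs[a-z.]+'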
Didn't work. Then found and tried pi (compute pi) - that works, so my
installation is, to some degree of approximation, good.
Back to grep. It fails with
java.io.FileNotFoundException: File does not exist:
/user/MyId/input/conf
Found and ran bin/hadoop fs -ls. OK, these directory names are internal to
hadoop (I assume), because Linux has no idea of /user.
And the directory is there - but the program is failing.
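For the record, the check was along these lines (MyId is my login name; the
exact paths I listed may have differed slightly):

bin/hadoop fs -ls
bin/hadoop fs -ls input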
Any suggestions on where to start, etc.?
Thanks - Pat