[ https://issues.apache.org/jira/browse/MAHOUT-1632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Palumbo updated MAHOUT-1632:
-----------------------------------
    Labels: legacy  (was: )

> Please help me im stuck on using 20 newsgroups example on Windows
> -----------------------------------------------------------------
>
>                 Key: MAHOUT-1632
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1632
>             Project: Mahout
>          Issue Type: Question
>            Reporter: Mishari SH
>              Labels: legacy
>
> Hello there, I've been using Hadoop and Mahout on Windows. I started the
> Hadoop cluster before starting Mahout so that Mahout could use it, then ran
> the 20newsgroups example, but it throws a "not a valid DFS filename"
> exception, as shown below in detail from the beginning:
>
> Microsoft Windows [Version 6.1.7601]
> Copyright (c) 2009 Microsoft Corporation. All rights reserved.
> C:\Users\Admin>cd\
> C:\>cd mahout
> C:\mahout>cd examples
> C:\mahout\examples>cd bin
> C:\mahout\examples\bin>classify-20newsgroups.sh
> Welcome to Git (version 1.9.4-preview20140815)
> Run 'git help git' to display the help index.
> Run 'git help <command>' to display help for specific commands.
> Please select a number to choose the corresponding task to run
> 1. cnaivebayes
> 2. naivebayes
> 3. sgd
> 4. clean -- cleans up the work area in /tmp/mahout-work-
> Enter your choice : 2
> ok.
> You chose 2 and we'll use naivebayes
> creating work directory at /tmp/mahout-work-
> + echo 'Preparing 20newsgroups data'
> Preparing 20newsgroups data
> + rm -rf /tmp/mahout-work-/20news-all
> + mkdir /tmp/mahout-work-/20news-all
> + cp -R /tmp/mahout-work-/20news-bydate/20news-bydate-test/alt.atheism
>     /tmp/mahout-work-/20news-bydate/20news-bydate-test/comp.graphics
>     /tmp/mahout-work-/20news-bydate/20news-bydate-test/comp.os.ms-windows.misc
>     /tmp/mahout-work-/20news-bydate/20news-bydate-test/comp.sys.ibm.pc.hardware
>     /tmp/mahout-work-/20news-bydate/20news-bydate-test/comp.sys.mac.hardware
>     /tmp/mahout-work-/20news-bydate/20news-bydate-test/comp.windows.x
>     /tmp/mahout-work-/20news-bydate/20news-bydate-test/misc.forsale
>     /tmp/mahout-work-/20news-bydate/20news-bydate-test/rec.autos
>     /tmp/mahout-work-/20news-bydate/20news-bydate-test/rec.motorcycles
>     /tmp/mahout-work-/20news-bydate/20news-bydate-test/rec.sport.baseball
>     /tmp/mahout-work-/20news-bydate/20news-bydate-test/rec.sport.hockey
>     /tmp/mahout-work-/20news-bydate/20news-bydate-test/sci.crypt
>     /tmp/mahout-work-/20news-bydate/20news-bydate-test/sci.electronics
>     /tmp/mahout-work-/20news-bydate/20news-bydate-test/sci.med
>     /tmp/mahout-work-/20news-bydate/20news-bydate-test/sci.space
>     /tmp/mahout-work-/20news-bydate/20news-bydate-test/soc.religion.christian
>     /tmp/mahout-work-/20news-bydate/20news-bydate-test/talk.politics.guns
>     /tmp/mahout-work-/20news-bydate/20news-bydate-test/talk.politics.mideast
>     /tmp/mahout-work-/20news-bydate/20news-bydate-test/talk.politics.misc
>     /tmp/mahout-work-/20news-bydate/20news-bydate-test/talk.religion.misc
>     /tmp/mahout-work-/20news-bydate/20news-bydate-train/alt.atheism
>     /tmp/mahout-work-/20news-bydate/20news-bydate-train/comp.graphics
>     /tmp/mahout-work-/20news-bydate/20news-bydate-train/comp.os.ms-windows.misc
>     /tmp/mahout-work-/20news-bydate/20news-bydate-train/comp.sys.ibm.pc.hardware
>     /tmp/mahout-work-/20news-bydate/20news-bydate-train/comp.sys.mac.hardware
>     /tmp/mahout-work-/20news-bydate/20news-bydate-train/comp.windows.x
>     /tmp/mahout-work-/20news-bydate/20news-bydate-train/misc.forsale
>     /tmp/mahout-work-/20news-bydate/20news-bydate-train/rec.autos
>     /tmp/mahout-work-/20news-bydate/20news-bydate-train/rec.motorcycles
>     /tmp/mahout-work-/20news-bydate/20news-bydate-train/rec.sport.baseball
>     /tmp/mahout-work-/20news-bydate/20news-bydate-train/rec.sport.hockey
>     /tmp/mahout-work-/20news-bydate/20news-bydate-train/sci.crypt
>     /tmp/mahout-work-/20news-bydate/20news-bydate-train/sci.electronics
>     /tmp/mahout-work-/20news-bydate/20news-bydate-train/sci.med
>     /tmp/mahout-work-/20news-bydate/20news-bydate-train/sci.space
>     /tmp/mahout-work-/20news-bydate/20news-bydate-train/soc.religion.christian
>     /tmp/mahout-work-/20news-bydate/20news-bydate-train/talk.politics.guns
>     /tmp/mahout-work-/20news-bydate/20news-bydate-train/talk.politics.mideast
>     /tmp/mahout-work-/20news-bydate/20news-bydate-train/talk.politics.misc
>     /tmp/mahout-work-/20news-bydate/20news-bydate-train/talk.religion.misc
>     /tmp/mahout-work-/20news-all
> + '[' 'C:\hadp' '!=' '' ']'
> + '[' '' == '' ']'
> + echo 'Copying 20newsgroups data to HDFS'
> Copying 20newsgroups data to HDFS
> + set +e
> + 'C:\hadp/bin/hadoop' dfs -rmr /tmp/mahout-work-/20news-all
> /c/hadp/etc/hadoop/hadoop-env.sh: line 103: /c/hadp/bin: is a directory
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> /c/hadp/etc/hadoop/hadoop-env.sh: line 103: /c/hadp/bin: is a directory
> rmr: DEPRECATED: Please use 'rm -r' instead.
> -rmr: Pathname /C:/Users/Admin/AppData/Local/Temp/mahout-work-/20news-all from hdfs://localhost:9000/C:/Users/Admin/AppData/Local/Temp/mahout-work-/20news-all is not a valid DFS filename.
> Usage: hadoop fs [generic options] -rmr
> + set -e
> + 'C:\hadp/bin/hadoop' dfs -put /tmp/mahout-work-/20news-all /tmp/mahout-work-/20news-all
> /c/hadp/etc/hadoop/hadoop-env.sh: line 103: /c/hadp/bin: is a directory
> DEPRECATED: Use of this script to execute hdfs command is deprecated.
> Instead use the hdfs command for it.
> /c/hadp/etc/hadoop/hadoop-env.sh: line 103: /c/hadp/bin: is a directory
> -put: Pathname /C:/Users/Admin/AppData/Local/Temp/mahout-work-/20news-all from hdfs://localhost:9000/C:/Users/Admin/AppData/Local/Temp/mahout-work-/20news-all is not a valid DFS filename.
> Usage: hadoop fs [generic options] -put [-f] [-p] <localsrc> ... <dst>
> + echo 'Creating sequence files from 20newsgroups data'
> Creating sequence files from 20newsgroups data
> + ./bin/mahout seqdirectory -i /tmp/mahout-work-/20news-all -o /tmp/mahout-work-/20news-seq -ow
> /c/hadp/etc/hadoop/hadoop-env.sh: line 103: /c/hadp/bin: is a directory
> Running on hadoop, using \hadp/bin/hadoop and HADOOP_CONF_DIR=
> MAHOUT-JOB: /c/mahout/examples/target/mahout-examples-0.9-job.jar
> /c/hadp/etc/hadoop/hadoop-env.sh: line 103: /c/hadp/bin: is a directory
> 14/12/09 21:48:57 INFO common.AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[C:/Users/Admin/AppData/Local/Temp/mahout-work-/20news-all], --keyPrefix=[], --method=[mapreduce], --output=[C:/Users/Admin/AppData/Local/Temp/mahout-work-/20news-seq], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
> Exception in thread "main" java.lang.IllegalArgumentException: Pathname /C:/Users/Admin/AppData/Local/Temp/mahout-work-/20news-seq from C:/Users/Admin/AppData/Local/Temp/mahout-work-/20news-seq is not a valid DFS filename.
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:187)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.access$000(DistributedFileSystem.java:101)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1068)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1064)
>     at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
>     at org.apache.mahout.common.HadoopUtil.delete(HadoopUtil.java:192)
>     at org.apache.mahout.common.HadoopUtil.delete(HadoopUtil.java:200)
>     at org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:84)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>     at org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:65)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>     at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:153)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> C:\mahout\examples\bin>
>
> Please help me; I'm new to big data tools and I need this issue resolved as soon as possible.
> Thank you.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
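The repeated "is not a valid DFS filename" failures in the log occur because Git Bash resolves `/tmp` to the local Windows temp directory, so the script hands Hadoop a path like `/C:/Users/Admin/AppData/Local/Temp/mahout-work-/20news-all`, and HDFS rejects any path component containing a colon, such as the `C:` drive letter. A minimal sketch of that validation rule (this is an illustration, not Hadoop's actual code, and the function name `is_valid_dfs_path` is hypothetical):

```shell
#!/bin/sh
# Sketch of why the drive letter breaks: HDFS path names must be
# absolute and may not contain ':', so "/C:/Users/..." is rejected
# while a plain "/tmp/..." path is accepted.
is_valid_dfs_path() {
  case "$1" in
    *:*) return 1 ;;   # any colon (e.g. a Windows drive letter) -> invalid
    /*)  return 0 ;;   # absolute, colon-free path -> accepted
    *)   return 1 ;;   # relative paths rejected in this sketch
  esac
}

is_valid_dfs_path "/tmp/mahout-work-admin/20news-all" && echo valid
is_valid_dfs_path "/C:/Users/Admin/AppData/Local/Temp/mahout-work-/20news-all" || echo invalid
```

One common way around this on Windows is to export `MAHOUT_LOCAL` with any non-empty value before running the example, which makes `bin/mahout` run against the local filesystem and skip HDFS entirely; alternatively, run the script in an environment where the work directory resolves to a genuine HDFS-style path with no drive letter.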