On Fri, Jul 8, 2011 at 11:05 AM, Jake Mannix <[email protected]> wrote:
> At the end of the exception trace, you should see the list of options which
> it will take. As I said, it's missing a "--help" option, but all of the
> mahout programs, if given an incorrect argument, will give this stack trace,
> followed by the list of arguments you *could* use.

Seems to violate the principle of least astonishment. If this is a systemic
issue with all the command-line scripts, I think we should create a JIRA
issue for it. I can work on it on the side with my GSoC project. Why does
this happen in the first place?

> In this case, they're printed below; I'll cut out the part you need:
>
> ---------------
> Usage:
>
>  [--seqFile <seqFile> --output <output> --dictionaryType <dictionaryType>
>  --dictionary <dictionary> --csv --useKey --printKey --sizeOnly]
>
> Options
>
>   --seqFile (-s) seqFile                  The Sequence File containing the
>                                           Vectors
>   --output (-o) output                    The output file. If not specified,
>                                           dumps to the console
>   --dictionaryType (-dt) dictionaryType   The dictionary file type
>                                           (text|sequencefile)
>   --dictionary (-d) dictionary            The dictionary file.
>   --csv (-c)                              Output the Vector as CSV. Otherwise
>                                           it substitutes in the terms for
>                                           vector cell entries
>   --useKey (-u)                           If the Key is a vector, then dump
>                                           that instead
>   --printKey (-p)                         Print out the key as well, delimited
>                                           by a tab (or the value if useKey is
>                                           true)
>   --sizeOnly (-sz)                        Dump only the size of the vector
>
> ----------------
>
> This means you want to do:
>
>   ./bin/mahout vectordump -s path_to_docTopics_output -o path_you_want_to_write_text_output_to
>
> and then just look in path_you_want_to_write_text_output_to, and it should
> have what you want.
>
>   -jake
>
> On Fri, Jul 8, 2011 at 6:16 AM, huaiyang gongzi <[email protected]> wrote:
>
> > Thanks, Jake.
> > But after typing "mahout vectordump --help", I got something like this:
> >
> > 11/07/08 09:14:25 ERROR vectors.VectorDumper: Exception
> > org.apache.commons.cli2.OptionException: Unexpected --help while processing Options
> >         at org.apache.commons.cli2.commandline.Parser.parse(Parser.java:99)
> >         at org.apache.mahout.utils.vectors.VectorDumper.main(VectorDumper.java:100)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> >         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> >         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> > Usage:
> >
> >  [--seqFile <seqFile> --output <output> --dictionaryType <dictionaryType>
> >  --dictionary <dictionary> --csv --useKey --printKey --sizeOnly]
> >
> > Options
> >
> >   --seqFile (-s) seqFile                  The Sequence File containing the
> >                                           Vectors
> >   --output (-o) output                    The output file. If not specified,
> >                                           dumps to the console
> >   --dictionaryType (-dt) dictionaryType   The dictionary file type
> >                                           (text|sequencefile)
> >   --dictionary (-d) dictionary            The dictionary file.
> >   --csv (-c)                              Output the Vector as CSV.
> >                                           Otherwise it substitutes in the
> >                                           terms for vector cell entries
> >   --useKey (-u)                           If the Key is a vector, then dump
> >                                           that instead
> >   --printKey (-p)                         Print out the key as well, delimited
> >                                           by a tab (or the value if useKey is
> >                                           true)
> >   --sizeOnly (-sz)                        Dump only the size of the vector
> > 11/07/08 09:14:25 INFO driver.MahoutDriver: Program took 30 ms
> >
> > On Thu, Jul 7, 2011 at 5:56 PM, Jake Mannix <[email protected]> wrote:
> >
> > > On Thu, Jul 7, 2011 at 5:53 PM, wine lover <[email protected]> wrote:
> > >
> > > > Dear All,
> > > >
> > > > After running LDA analysis, I got the docTopic file, which is a regular
> > > > sequence-file. How to transfer it into a readable format? I searched
> > > > vectordumper, or vectordump, but did not get any useful results, such as
> > > > how to use it in command-line? Thanks.
> > >
> > > So you say you "searched vectordumper/vectordump", you mean you
> > > looked through the code looking for it, or you used it and it didn't do
> > > what you wanted?
> > >
> > > If you're just not sure how to use it, try running "./bin/mahout" from your
> > > distribution directory, with no arguments, and it will print out a bunch of
> > > possible commands, one of which is vectordump. If you try to run it
> > > with no arguments, it will sadly exit silently, not telling you what the
> > > usage is (this is a bug!), but if you try to give it an illegal argument,
> > > like
> > >
> > >   ./bin/mahout vectordump --help
> > >
> > > You'll see:
> > > Usage:
> > >
> > >  [--seqFile <seqFile> --output <output> --dictionaryType <dictionaryType>
> > >  --dictionary <dictionary> --csv --useKey --printKey --sizeOnly]
> > >
> > > Options
> > >
> > >   --seqFile (-s) seqFile                  The Sequence File containing the
> > >                                           Vectors
> > >   --output (-o) output                    The output file.
> > >                                           If not specified, dumps to the
> > >                                           console
> > >   --dictionaryType (-dt) dictionaryType   The dictionary file type
> > >                                           (text|sequencefile)
> > >   --dictionary (-d) dictionary            The dictionary file.
> > >   --csv (-c)                              Output the Vector as CSV. Otherwise
> > >                                           it substitutes in the terms for
> > >                                           vector cell entries
> > >   --useKey (-u)                           If the Key is a vector, then dump
> > >                                           that instead
> > >   --printKey (-p)                         Print out the key as well, delimited
> > >                                           by a tab (or the value if useKey is
> > >                                           true)
> > >   --sizeOnly (-sz)                        Dump only the size of the vector
> > >
> > > -----
> > >
> > > If you use these instructions to point to the docTopics output location,
> > > you can have it print out the p(topic | document) for each topic/document
> > > pair in your collection.
> > >
> > >   -jake
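
As a starting point for the JIRA discussion raised at the top of this message, here is a minimal sketch of the behaviour being asked for: "--help" is accepted as a real option, running with no arguments prints the usage instead of exiting silently, and an unrecognised argument produces a one-line error plus the usage text rather than a logged stack trace. Everything in it is an assumption for illustration: the class UsageSketch, its run method, and the hand-rolled parsing are hypothetical, it uses only the JDK (not Commons CLI2), and it is not Mahout's actual VectorDumper code. Only the long option names are taken from the usage output quoted above; short forms such as -s are left out for brevity.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of friendlier argument handling; not Mahout code. */
public class UsageSketch {

    // Long option names copied from the quoted vectordump usage text.
    private static final Set<String> VALUE_OPTIONS = new HashSet<>(Arrays.asList(
            "--seqFile", "--output", "--dictionaryType", "--dictionary"));

    // Flags that take no value; "--help" is the addition proposed in this thread.
    private static final Set<String> FLAG_OPTIONS = new HashSet<>(Arrays.asList(
            "--csv", "--useKey", "--printKey", "--sizeOnly", "--help"));

    public static final String USAGE = "Usage:\n"
            + " [--seqFile <seqFile> --output <output> --dictionaryType <dictionaryType>\n"
            + " --dictionary <dictionary> --csv --useKey --printKey --sizeOnly --help]";

    /** Returns the text the tool would print for the given arguments. */
    public static String run(String[] args) {
        // No arguments or an explicit --help: show usage, no error, no stack trace.
        if (args.length == 0 || Arrays.asList(args).contains("--help")) {
            return USAGE;
        }
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (VALUE_OPTIONS.contains(arg)) {
                if (i + 1 >= args.length) {
                    return "Missing value for " + arg + "\n" + USAGE;
                }
                i++; // skip over the option's value
            } else if (!FLAG_OPTIONS.contains(arg)) {
                // One-line message followed by usage, instead of a stack trace.
                return "Unexpected " + arg + " while processing Options\n" + USAGE;
            }
        }
        return "OK";
    }

    public static void main(String[] args) {
        System.out.println(run(args));
    }
}
```

Run with "--help" or with no arguments, this prints only the usage text; run with an unknown option, it prints a single error line followed by the usage. A real fix would presumably live in whatever shared code the mahout drivers use to parse options, which this sketch does not attempt to model.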
