Re: Using the Stanford NLP with hadoop
By 'ClassName', which class are you actually referring to? The class in which the LexicalizedParser is invoked? In my code, the class that implements the parser is named 'parse', and this is the code I used:

lp = new LexicalizedParser(new ObjectInputStream(new GZIPInputStream(parse.class.getResourceAsStream("/englishPCFG.ser.gz"))));

The program runs to completion and the map-reduce job is declared successful every time, even if the code is changed to:

lp = new LexicalizedParser(new ObjectInputStream(new GZIPInputStream(parse.class.getResourceAsStream("/englishPCF_G.ser.gz"))));

This indicates that getResourceAsStream does not throw an exception even when the file is not present, I guess. Any ideas? :confused:

Kevin Peterson-3 wrote:
> On Sat, Apr 18, 2009 at 5:18 AM, hari939 wrote:
>> My project of parsing through material for a semantic search engine requires me to use the Stanford NLP parser (http://nlp.stanford.edu/software/lex-parser.shtml) on a hadoop cluster. To use the Stanford NLP parser, one must create a lexical parser object using an englishPCFG.ser.gz file as a constructor's parameter. I have tried loading the file onto the Hadoop dfs in the /user/root/ folder and have also tried packing the file along with the jar of the java program.
>
> Use getResourceAsStream to read it from the jar. Use the ObjectInputStream constructor. That is:
>
> new LexicalizedParser(new ObjectInputStream(new GZIPInputStream(ClassName.class.getResourceAsStream("/englishPCFG.ser.gz"))))
>
> I'm interested to know if you have found any other open-source parsers in Java, or at least with Java bindings.
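For what it's worth, one way to confirm that guess: getResourceAsStream returns null rather than throwing when the resource is missing, so you can fail fast by checking the stream before wrapping it. A minimal sketch below, assuming the class name 'parse' from the snippet above, the LexicalizedParser(ObjectInputStream) constructor from Kevin's reply, and a hypothetical loadParser() helper:

import java.io.InputStream;
import java.io.ObjectInputStream;
import java.util.zip.GZIPInputStream;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;

public class parse {
    static LexicalizedParser loadParser() throws Exception {
        // getResourceAsStream returns null (it does not throw) when the
        // resource is not on the classpath, so check before wrapping it;
        // otherwise the failure mode is just a NullPointerException from
        // wherever the null stream is first read.
        InputStream in = parse.class.getResourceAsStream("/englishPCFG.ser.gz");
        if (in == null) {
            throw new IllegalStateException(
                "englishPCFG.ser.gz not found on classpath - was it packed into the job jar?");
        }
        return new LexicalizedParser(new ObjectInputStream(new GZIPInputStream(in)));
    }
}

With a check like that, the misspelled "/englishPCF_G.ser.gz" variant should make the task fail loudly instead of the job completing as if nothing were wrong.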
Re: Using the Stanford NLP with hadoop
On Tue, Apr 21, 2009 at 4:58 PM, Kevin Peterson <kpeter...@biz360.com> wrote:
> I'm interested to know if you have found any other open-source parsers in Java, or at least with Java bindings.

Stanford's is one of the best, although it is slow. LingPipe (http://alias-i.com/lingpipe/) is free for non-commercial use, and they link to most of the open-source toolkits here: http://alias-i.com/lingpipe/web/competition.html

It seems like most NLP toolkits don't attempt full-sentence parsing, but instead focus on tagging, chunking, or entity recognition.

-Stuart
Re: Using the Stanford NLP with hadoop
Greetings,

There's a way to distribute files along with your MR job as part of its payload, or you could save the file in the same spot on every machine of your cluster with some rsyncing and hard-code loading it. This may be of some help:
http://hadoop.apache.org/core/docs/r0.18.2/api/org/apache/hadoop/filecache/DistributedCache.html

On Sat, Apr 18, 2009 at 5:18 AM, hari939 <hari...@gmail.com> wrote:
> My project of parsing through material for a semantic search engine requires me to use the Stanford NLP parser (http://nlp.stanford.edu/software/lex-parser.shtml) on a hadoop cluster.
>
> To use the Stanford NLP parser, one must create a lexical parser object using an englishPCFG.ser.gz file as a constructor's parameter. I have tried loading the file onto the Hadoop dfs in the /user/root/ folder and have also tried packing the file along with the jar of the java program.
>
> I am new to the hadoop platform and am not very familiar with some of its salient features. Looking forward to any form of help.
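For the DistributedCache route, a minimal sketch using the r0.18-era API from the link above; the HDFS path follows the /user/root/ location mentioned in the original message, and the ModelDistribution helper class is hypothetical:

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class ModelDistribution {
    // At job-setup time: register the model file (already on HDFS) so the
    // framework copies it to every task node before the tasks start.
    public static void addModel(JobConf conf) throws Exception {
        DistributedCache.addCacheFile(
            new URI("/user/root/englishPCFG.ser.gz"), conf);
    }

    // Inside a mapper's configure(JobConf): locate the local copy that the
    // framework staged on this node, and load the parser from that path
    // instead of from the jar.
    public static Path findLocalModel(JobConf conf) throws Exception {
        Path[] cached = DistributedCache.getLocalCacheFiles(conf);
        for (Path p : cached) {
            if (p.getName().equals("englishPCFG.ser.gz")) {
                return p;
            }
        }
        throw new IllegalStateException("model not found in DistributedCache");
    }
}

This keeps the ~100MB-class model file out of the job jar and lets the framework handle distribution, at the cost of one extra lookup in each task's setup.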