Re: Problems compiling Nutch in Eclipse
The RTF parser is not built by default because the jars it uses have some licensing issues. It is also out of sync with the current trunk, so it does not even build anymore.

This issue may help:

https://issues.apache.org/jira/browse/NUTCH-644

On Sat, Mar 21, 2009 at 03:02, Rodrigo Reyes C. wrote:
> Hi
>
> I have configured my eclipse project as stated here
>
> http://wiki.apache.org/nutch/RunNutchInEclipse0.9
>
> Still, I am getting the following errors:
>
> The return type is incompatible with Parser.getParse(Content)
> RTFParseFactory.java
> nutch/src/plugin/parse-rtf/src/java/org/apache/nutch/parse/rtf line 52
> Java Problem
> Type mismatch: cannot convert from ParseResult to Parse
> TestRTFParser.java
> nutch/src/plugin/parse-rtf/src/test/org/apache/nutch/parse/rtf line 78
> Java Problem
>
> Any ideas on what could be wrong? I already included both
> http://nutch.cvs.sourceforge.net/nutch/nutch/src/plugin/parse-mp3/lib/ and
> http://nutch.cvs.sourceforge.net/nutch/nutch/src/plugin/parse-rtf/lib/ jars.
>
> Thanks in advance
>
> --
> Rodrigo Reyes C.

--
Doğacan Güney
Re: Problems compiling Nutch in Eclipse
Ninad,

Thanks for your answer. I have to say I am eager to read all you have written in your blog about Nutch inner workings.

I've already done everything your blog post tells me to do (and a couple more things, like downloading a couple of extra jars that are not included in the SVN version). Nevertheless, I am still getting the error I wrote about.

I think I should also mention that I am not working on the 0.9 code base but on the trunk code base. Maybe that is why I am getting this error.

Rodrigo

PS: By the way, I did manage to get Nutch crawling late last night. Still, I haven't been able to compile this specific plugin (the rtf plugin).

2009/3/21 Ninad Raut
> Check out my blog:
> http://j2eewebsearch.blogspot.com/
>
> Check out the third point...
>
> Let me know if you get it all right. Your comments will be appreciated.
>
> Regards,
> Ninad
>
> On Sat, Mar 21, 2009 at 6:32 AM, Rodrigo Reyes C. wrote:
>
>> Hi
>>
>> I have configured my eclipse project as stated here
>>
>> http://wiki.apache.org/nutch/RunNutchInEclipse0.9
>>
>> Still, I am getting the following errors:
>>
>> - The return type is incompatible with Parser.getParse(Content)
>>   RTFParseFactory.java
>>   nutch/src/plugin/parse-rtf/src/java/org/apache/nutch/parse/rtf line 52
>>   Java Problem
>> - Type mismatch: cannot convert from ParseResult to Parse
>>   TestRTFParser.java
>>   nutch/src/plugin/parse-rtf/src/test/org/apache/nutch/parse/rtf line 78
>>   Java Problem
>>
>> Any ideas on what could be wrong? I already included both
>> http://nutch.cvs.sourceforge.net/nutch/nutch/src/plugin/parse-mp3/lib/ and
>> http://nutch.cvs.sourceforge.net/nutch/nutch/src/plugin/parse-rtf/lib/ jars.
>>
>> Thanks in advance
>>
>> --
>> Rodrigo Reyes C.
Re: [DISCUSS] contents of nutch release artifact
Hi,

On Sat, Mar 21, 2009 at 12:28 PM, Jukka Zitting wrote:
> To be accurate, the source release *is* the collection of bits that
> the release manager is using to produce binaries and other release
> artifacts. It's just a packaged svn export of the release tag.

Or, to express this in another way, the release manager can produce the source release simply by packaging the entire source tree he's using just before invoking any Ant targets to produce the binaries.

BR,

Jukka Zitting
Re: [DISCUSS] contents of nutch release artifact
Hi,

On Fri, Mar 20, 2009 at 1:10 PM, Andrzej Bialecki wrote:
> Yes, sorry for not being more explicit - my proposal was for 1.1, I think
> 1.0 has to go out as it is (and I'd even hesitate to create a source-only
> release now - we would have to test that it's still buildable and fully
> functional.)

To be accurate, the source release *is* the collection of bits that the release manager is using to produce binaries and other release artifacts. It's just a packaged svn export of the release tag.

If the release manager can build and test the sources, then anyone else should be able to do the same using the exact same set of bits. Verifying that is one of the key parts of the release vote.

From that perspective I'm even a bit worried about the idea of having an Ant target that exports and packages the tag, as it suggests that the release manager is not necessarily using that set of bits to build the release.

BR,

Jukka Zitting
Re: [DISCUSS] contents of nutch release artifact
Doğacan Güney wrote:
> On Thu, Mar 19, 2009 at 23:46, Sami Siren wrote:
>> Sami Siren wrote:
>>> Andrzej Bialecki wrote:
>>>> How about the following: we build just 2 packages:
>>>>
>>>> * binary: this includes only base hadoop libs in lib/ (enough to
>>>>   start a local job, no optional filesystems etc), the *.job and
>>>>   *.war files and scripts. Scripts would check for the presence of
>>>>   plugins/ dir, and offer an option to create it from *.job.
>>>>   Assumption here is that this should be enough to run full cycle in
>>>>   local mode, and that people who want to run a distributed cluster
>>>>   will first install a plain Hadoop release, and then just put the
>>>>   *.job and bin/nutch on the master.
>>>>
>>>> * source: no build artifacts, no .svn (equivalent to svn export),
>>>>   simple tgz.
>>>
>>> this sounds good to me. additionally some new documentation needs to
>>> be written too.
>>
>> I added a simple patch to NUTCH-728 to make a plain source release from
>> svn. What do people think, should we add the plain source package to
>> the next rc? I would not like to make changes to the binary package
>> now, but propose that we do those changes post 1.0.
>
> +1 for including plain source release in next rc.
>
> As for local/distributed separation, it is a good idea, but I think we
> should hold it for 1.1 (or something else) if it requires architectural
> changes (and thus needs review and testing).

Yes, sorry for not being more explicit - my proposal was for 1.1, I think 1.0 has to go out as it is (and I'd even hesitate to create a source-only release now - we would have to test that it's still buildable and fully functional.)

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
robots.txt redirect (NUTCH-124)
Hi everybody,

Can someone shed some light on NUTCH-124? RobotRulesParser.java doesn't follow redirects when requesting the robots.txt file. Doug patched this, but the patch didn't make it into the trunk. What is the desired behavior here?

For example, when requesting the following URL:

http://7is7.com/software/stateye/download/stateye097f.html

... RobotRulesParser requests the following robots.txt:

http://7is7.com/robots.txt

... however, that file doesn't exist; the request redirects to:

http://www.7is7.com/robots.txt

... and that robots.txt tells us the initial URL is disallowed. But does it really? Or is that robots.txt file only applicable to http://www.7is7.com and not to http://7is7.com?

So the question is: should we follow such redirects?

Thanks,
Mathijs
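
One possible answer to the question above, written out as a rough sketch: follow a robots.txt redirect only when the target is still /robots.txt on what is arguably the same site (same scheme and port, same host apart from a leading "www."). This is just an illustration of that policy. It is not Doug's patch and not what RobotRulesParser does; the class and method names (RobotsRedirectPolicy, normalizeHost, shouldFollow) are made up for the example, and the "www." rule is an assumption about what should count as the same site.

```java
import java.net.URI;
import java.util.Locale;

public class RobotsRedirectPolicy {

    /** Lower-cases the host and strips one leading "www." (assumed same-site rule). */
    static String normalizeHost(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        return h.startsWith("www.") ? h.substring(4) : h;
    }

    /**
     * Returns true when a redirect from one robots.txt URL to another should be
     * followed under this sketch's policy: same scheme, same port, same host
     * modulo "www.", and a target path of exactly /robots.txt.
     */
    static boolean shouldFollow(String fromUrl, String toUrl) {
        URI from = URI.create(fromUrl);
        URI to = URI.create(toUrl);
        // Relative or opaque URIs have no scheme/host; refuse to follow those.
        if (from.getScheme() == null || to.getScheme() == null
                || from.getHost() == null || to.getHost() == null) {
            return false;
        }
        boolean sameScheme = from.getScheme().equalsIgnoreCase(to.getScheme());
        boolean samePort = from.getPort() == to.getPort();
        boolean sameSite = normalizeHost(from.getHost()).equals(normalizeHost(to.getHost()));
        boolean robotsPath = "/robots.txt".equals(to.getPath());
        return sameScheme && samePort && sameSite && robotsPath;
    }

    public static void main(String[] args) {
        // The case from this mail: 7is7.com redirects to www.7is7.com.
        System.out.println(shouldFollow(
                "http://7is7.com/robots.txt", "http://www.7is7.com/robots.txt")); // true
        // A redirect to an unrelated host would not be followed.
        System.out.println(shouldFollow(
                "http://7is7.com/robots.txt", "http://example.org/robots.txt")); // false
    }
}
```

Under this policy the rules fetched from http://www.7is7.com/robots.txt would be applied to http://7is7.com as well. A stricter crawler could drop the "www." rule and treat every cross-host redirect as "no robots.txt found".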