Bartosz,

That fixed the problem. Thanks for your help and for updating the wiki.
Frank

2009/4/14 Bartosz Gadzimski <bartek...@o2.pl>:
> Hello Frank,
>
> Yes, it is a memory issue; you must increase the Java heap size.
>
> Just follow these instructions (another thing to add to the wiki ;)
>
> Eclipse -> Window -> Preferences -> Java -> Installed JREs -> edit ->
> Default VM arguments
>
> I've set mine to -Xms5m -Xmx150m because I have about 200 MB of RAM left
> after running all my apps.
>
> -Xms (minimum/initial heap size for running applications)
> -Xmx (maximum heap size)
>
> It should help.
>
> Thanks,
> Bartosz
>
> Frank McCown pisze:
>>
>> Hello Bartosz,
>>
>> I'm running the default Nutch 1.0 version on Windows XP (2 GB RAM)
>> with Eclipse 3.3.0. I followed the directions at
>>
>> http://wiki.apache.org/nutch/RunNutchInEclipse0.9
>>
>> exactly as stated. I'm able to run the default Nutch 0.9 release
>> without any problems in Eclipse. But when I run 1.0, I always get the
>> java.io.IOException as stated in my last email. I had assumed it was
>> due to the plugin issue, but maybe not. I'm just running a very small
>> crawl with two seed URLs.
>>
>> Here's what hadoop.log says:
>>
>> 2009-04-13 13:41:03,010 INFO crawl.Crawl - crawl started in: crawl
>> 2009-04-13 13:41:03,025 INFO crawl.Crawl - rootUrlDir = urls
>> 2009-04-13 13:41:03,025 INFO crawl.Crawl - threads = 10
>> 2009-04-13 13:41:03,025 INFO crawl.Crawl - depth = 3
>> 2009-04-13 13:41:03,025 INFO crawl.Crawl - topN = 5
>> 2009-04-13 13:41:03,479 INFO crawl.Injector - Injector: starting
>> 2009-04-13 13:41:03,479 INFO crawl.Injector - Injector: crawlDb: crawl/crawldb
>> 2009-04-13 13:41:03,479 INFO crawl.Injector - Injector: urlDir: urls
>> 2009-04-13 13:41:03,479 INFO crawl.Injector - Injector: Converting
>> injected urls to crawl db entries.
>> 2009-04-13 13:41:03,588 WARN mapred.JobClient - Use
>> GenericOptionsParser for parsing the arguments. Applications should
>> implement Tool for the same.
>> 2009-04-13 13:41:06,105 WARN mapred.LocalJobRunner - job_local_0001
>> java.lang.OutOfMemoryError: Java heap space
>>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:498)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:138)
>>
>> I have not tried Sanjoy's advice yet... it looks like this is a memory
>> issue.
>>
>> Any advice would be much appreciated,
>> Frank
>>
>> 2009/4/10 Bartosz Gadzimski <bartek...@o2.pl>:
>>>
>>> Hello Frank,
>>>
>>> Please look into hadoop.log; maybe there is something more there.
>>>
>>> About your error - you must give us a more specific description of your
>>> Nutch configuration.
>>>
>>> A default Nutch installation works with no problems (I've never changed
>>> the src/plugin path).
>>>
>>> Please tell us: your version of Nutch,
>>> any changes you made,
>>> and any configuration differences (other than adding your domain to
>>> crawl-urlfilter).
>>>
>>> Thanks,
>>> Bartosz
>>>
>>> Frank McCown pisze:
>>>>
>>>> Adding cygwin to my PATH solved my problem with whoami. But now I'm
>>>> getting an exception when running the crawler:
>>>>
>>>> Injector: Converting injected urls to crawl db entries.
>>>> Exception in thread "main" java.io.IOException: Job failed!
>>>>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232)
>>>>     at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
>>>>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:114)
>>>>
>>>> I know from searching the mailing list that this is normally due to a
>>>> bad plugin.folders setting in nutch-default.xml, but I used the
>>>> same value as the tutorial (./src/plugin) to no avail.
>>>>
>>>> (As an aside, it seems like Hadoop should provide a better error message
>>>> if the plugin folder doesn't exist.)
>>>>
>>>> Anyway, thanks, Bartosz, for your help.
>>>> Frank
>>>>
>>>> 2009/4/10 Bartosz Gadzimski <bartek...@o2.pl>:
>>>>>
>>>>> Hello,
>>>>>
>>>>> So now you have to install cygwin and make sure you add it to PATH;
>>>>> it's in http://wiki.apache.org/nutch/RunNutchInEclipse0.9
>>>>>
>>>>> After this you should be able to run the "bash" command from the
>>>>> command prompt (Menu Start > Run > cmd.exe).
>>>>>
>>>>> Then you're done - everything will be working.
>>>>>
>>>>> I must add it to the wiki; I forgot about the whoami problem.
>>>>>
>>>>> Take care,
>>>>> Bartosz
>>>>>
>>>>> sanjoy.gh...@thomsonreuters.com pisze:
>>>>>>
>>>>>> Thanks for the suggestion Bartosz. I downloaded whoami, and it
>>>>>> promptly crashed on "bash".
>>>>>>
>>>>>> 09/04/10 12:02:28 WARN fs.FileSystem: uri=file:///
>>>>>> javax.security.auth.login.LoginException: Login failed: Cannot run
>>>>>> program "bash": CreateProcess error=2, The system cannot find the file
>>>>>> specified
>>>>>>     at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
>>>>>>     at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
>>>>>>     at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:257)
>>>>>>     at org.apache.hadoop.security.UserGroupInformation.login(UserGroupInformation.java:67)
>>>>>>     at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1438)
>>>>>>     at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1376)
>>>>>>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:215)
>>>>>>     at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:120)
>>>>>>     at org.apache.nutch.crawl.Crawl.main(Crawl.java:84)
>>>>>>
>>>>>> Where am I going to find "bash" on Windows without running command-line
>>>>>> cygwin? Is there a way to turn off this security in Hadoop?
>>>>>> Thanks,
>>>>>> Sanjoy
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Bartosz Gadzimski [mailto:bartek...@o2.pl]
>>>>>> Sent: Friday, April 10, 2009 5:06 AM
>>>>>> To: nutch-dev@lucene.apache.org
>>>>>> Subject: Re: login failed exception
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I am not sure if it's the case, but you should try adding whoami to
>>>>>> your Windows box.
>>>>>>
>>>>>> For example, for Windows XP SP2:
>>>>>>
>>>>>> http://www.microsoft.com/downloads/details.aspx?FamilyId=49AE8576-9BB9-4126-9761-BA8011FABF38&displaylang=en
>>>>>>
>>>>>> Thanks,
>>>>>> Bartosz
>>>>>>
>>>>>> Frank McCown pisze:
>>>>>>>
>>>>>>> I've been running 0.9 in Eclipse on Windows for some time, and I was
>>>>>>> successful in running the NutchBean from version 1.0 in Eclipse, but
>>>>>>> the crawler gave me the same exception as it gave this individual.
>>>>>>> Maybe there's something else I'm overlooking, but I followed the
>>>>>>> tutorial at
>>>>>>>
>>>>>>> http://wiki.apache.org/nutch/RunNutchInEclipse0.9
>>>>>>>
>>>>>>> to a T. I'll keep working on it though.
>>>>>>>
>>>>>>> Frank
>>>>>>>
>>>>>>> 2009/4/10 Bartosz Gadzimski <bartek...@o2.pl>:
>>>>>>>>
>>>>>>>> fmccown pisze:
>>>>>>>>>
>>>>>>>>> You must run Nutch's crawler using cygwin on Windows since cygwin
>>>>>>>>> has the whoami program. If you run it from Eclipse on Windows, it
>>>>>>>>> can't use cygwin's whoami program and will fail with the exceptions
>>>>>>>>> you saw. This is an unfortunate design decision in Hadoop which
>>>>>>>>> makes anything after version 0.9 not work in Eclipse on Windows.
>>>>>>>>
>>>>>>>> It's not true, please look at
>>>>>>>> http://wiki.apache.org/nutch/RunNutchInEclipse0.9
>>>>>>>>
>>>>>>>> I am using Nutch 1.0 with Eclipse on Windows with no problems.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Bartosz
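The plugin.folders property Frank mentions lives in conf/nutch-default.xml; the usual convention is to override it in nutch-site.xml rather than edit the default file. A sketch of the override (the value shown is the tutorial's relative path; an absolute path to your checkout's src/plugin directory avoids the working-directory ambiguity of Eclipse launch configurations):

```xml
<!-- nutch-site.xml override; a relative value is resolved against the
     working directory of the Eclipse launch configuration, so an
     absolute path is the safer choice when runs fail mysteriously. -->
<property>
  <name>plugin.folders</name>
  <value>./src/plugin</value>
  <description>Directories where Nutch plugins are located.</description>
</property>
```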
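The -Xms/-Xmx advice at the top of the thread is easy to verify from inside Eclipse before re-running the crawl. A minimal sketch (the class name is illustrative, not part of Nutch): run it with the same "Default VM arguments" set for the JRE and check that the reported maximum heap matches the -Xmx value you configured.

```java
// HeapCheck.java - print the heap limits the running JVM was started with.
// Launch it from the same Eclipse JRE configuration as the crawler
// (e.g. with -Xms5m -Xmx150m) and confirm maxMemory reflects -Xmx.
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        // maxMemory() is the -Xmx ceiling; totalMemory() is what is
        // currently reserved; freeMemory() is the unused part of that.
        System.out.println("max heap (-Xmx)  ~ " + rt.maxMemory() / mb + " MB");
        System.out.println("total heap now   ~ " + rt.totalMemory() / mb + " MB");
        System.out.println("free heap now    ~ " + rt.freeMemory() / mb + " MB");
    }
}
```

If the printed maximum is far below what you set, the VM arguments were attached to a different JRE entry (or to the wrong launch configuration) and the OutOfMemoryError will recur regardless of the flags.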
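The whoami/bash failures earlier in the thread all reduce to whether Hadoop can find those programs on the PATH it inherits. A quick portable check, run from the same environment that launches Eclipse (on Windows these programs come from cygwin's bin directory):

```shell
#!/bin/sh
# Verify that the external programs Hadoop's login code shells out to
# are resolvable on PATH; if either is missing on Windows, add
# cygwin's bin directory to PATH and restart Eclipse.
for prog in bash whoami; do
  if command -v "$prog" >/dev/null 2>&1; then
    echo "$prog found at: $(command -v "$prog")"
  else
    echo "$prog NOT on PATH"
  fi
done
```

Note that Eclipse only picks up PATH changes made before it starts, so a restart is needed after editing the environment variable.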