Re: Nutch bug - assumption of HDFS in CrawlDb.java even if using other file systems like S3

Viksit Gaur Mon, 30 May 2011 23:57:13 -0700

Julien,

I couldn't find anything on JIRA reporting similar symptoms, so I'll go ahead and file a new issue.
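
For the ticket, here is a toy model of what I think the exception is describing. It is plain Java with no Hadoop dependency, and every class and method name in it is made up for illustration: `getDefault` stands in for `FileSystem.get(conf)` and `getFor` for `FileSystem.get(uri, conf)`, the two calls named in the exception message. It only sketches the scheme mismatch; it is not the CrawlDb code and not a patch.

```java
import java.net.URI;

// Toy model only (hypothetical names, no Hadoop classes): shows why a file
// system object bound to one scheme rejects a path that uses another scheme.
class FsResolutionSketch {

    /** Hypothetical stand-in for a file system bound to a single URI scheme. */
    static final class ToyFileSystem {
        final String scheme;

        ToyFileSystem(String scheme) {
            this.scheme = scheme;
        }

        /** Rejects paths from a different scheme; the toy store itself is empty. */
        boolean exists(URI path) {
            String pathScheme = path.getScheme();
            if (pathScheme != null && !pathScheme.equalsIgnoreCase(scheme)) {
                throw new IllegalArgumentException(
                    "file system for '" + scheme + "' cannot access path '" + path + "'");
            }
            return false; // nothing is ever stored in this toy model
        }
    }

    /** Analogue of FileSystem.get(conf): always the configured default. */
    static ToyFileSystem getDefault(URI defaultFs) {
        return new ToyFileSystem(defaultFs.getScheme());
    }

    /** Analogue of FileSystem.get(uri, conf): chosen from the path's own scheme. */
    static ToyFileSystem getFor(URI path, URI defaultFs) {
        String scheme = path.getScheme();
        // Scheme-less paths fall back to the default file system.
        return new ToyFileSystem(scheme != null ? scheme : defaultFs.getScheme());
    }

    public static void main(String[] args) {
        URI defaultFs = URI.create("hdfs://namenode:9000"); // hypothetical default
        URI crawlDb = URI.create("s3n://mybucketname/crawl/crawldb/current");

        // Resolving from the path itself works for any scheme.
        System.out.println(getFor(crawlDb, defaultFs).exists(crawlDb));

        // Using the default file system for an s3n path fails the scheme check.
        try {
            getDefault(defaultFs).exists(crawlDb);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

If that reading is right, the likely fix is to resolve the file system from the crawldb path itself (Hadoop's `Path.getFileSystem(conf)` does this) rather than from the configuration default, but I haven't confirmed that against the CrawlDb source yet.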


Cheers
Viksit

On Wed, May 25, 2011 at 1:07 PM, Julien Nioche
<[email protected]> wrote:
>
> Viksit,
>
> Please check whether this has already been reported on JIRA and, if not,
> open a new issue (for 2.0).
>
> Thanks
>
> Julien
>
> On 25 May 2011 19:02, Viksit Gaur <[email protected]> wrote:
>>
>> [Cross posting since this might be more relevant here.]
>>
>> --
>>
>> Hi all,
>>
>> While trying to run Nutch on Elastic MapReduce, I ran into an issue that I
>> think is the same as the following:
>>
>> https://forums.aws.amazon.com/thread.jspa?threadID=54492
>>
>> Exception in thread "main" java.lang.IllegalArgumentException: This
>> file system object (hdfs://ip-10-122-99-48.ec2.internal:9000) does not
>> support access to the request path
>> 's3n://mybucketname/crawl/crawldb/current' You possibly called
>> FileSystem.get(conf) when you should of called FileSystem.get(uri,
>> conf) to obtain a file system supporting your path.
>>        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:351)
>>        at org.apache.hadoop.hdfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:99)
>>        at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:155)
>>        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453)
>>        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:688)
>>        at org.apache.nutch.crawl.CrawlDb.createJob(CrawlDb.java:122)
>>        at org.apache.nutch.crawl.Injector.inject(Injector.java:226)
>>        at org.apache.nutch.crawl.Crawl.main(Crawl.java:119)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>
>> It appears that CrawlDb.java uses code that assumes all inputs are on
>> HDFS. Is this a known bug? If so, could someone point me to the issue
>> number and tell me whether a patch exists for it?
>>
>> If not, I'd be happy to contribute one. I'm using Nutch 1.2, which I've
>> patched for NUTCH-937 and NUTCH-993.
>>
>> Cheers
>> Viksit
>
>
>
> --
>
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
