Re: SparkContext.wholeTextFiles() java.io.FileNotFoundException: File does not exist:

2014-10-09 Thread jan.zikes
the standard EC2 installation? __ From: Sean Owen so...@cloudera.com To: jan.zi...@centrum.cz Date: 08.10.2014 18:05 Subject: Re: SparkContext.wholeTextFiles() java.io.FileNotFoundException: File does not exist: CC: user@spark.apache.org

Re: SparkContext.wholeTextFiles() java.io.FileNotFoundException: File does not exist:

2014-10-09 Thread Rahul Kumar Singh
: 08.10.2014 18:05 Subject: Re: SparkContext.wholeTextFiles() java.io.FileNotFoundException: File does not exist: CC: user@spark.apache.org Take this as a bit of a guess, since I don't use S3 much and am only a bit aware of the Hadoop+S3 integration issues. But I know that S3's lack

Re: SparkContext.wholeTextFiles() java.io.FileNotFoundException: File does not exist:

2014-10-08 Thread jan.zikes
My additional question is whether this problem could possibly be caused by my file being bigger than the RAM across the whole cluster?   __ Hi, I'm trying to use sc.wholeTextFiles() on a file that is stored on Amazon S3, and I'm getting
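Note (a general property of the API, not a conclusion from this thread): sc.wholeTextFiles() yields one (path, contents) record per file, so each individual file must fit in a single executor's memory, while sc.textFile() reads its input as line-based records in splits. A minimal Scala sketch, reusing the bucket and prefix mentioned later in the thread purely for illustration:

    // Sketch only: contrast of the two read APIs; the s3n path is illustrative.
    val perFile = sc.wholeTextFiles("s3n://wiki-dump/wikiinput") // RDD[(path, wholeFileContents)], one record per file
    val perLine = sc.textFile("s3n://wiki-dump/wikiinput")       // RDD[String], one record per line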

Re: SparkContext.wholeTextFiles() java.io.FileNotFoundException: File does not exist:

2014-10-08 Thread jan.zikes
One more update: I've realized that this problem is not only Python-related. I've also tried it in Scala, but I'm still getting the same error. My Scala code: val file = sc.wholeTextFiles("s3n://wiki-dump/wikiinput").first() __ My
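For reference, a self-contained Scala sketch of the call above. This is an assumption-laden example, not the thread's actual job: it uses the standard Hadoop s3n credential properties, and the key placeholders are hypothetical.

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf().setAppName("WholeTextFilesS3")
    val sc = new SparkContext(conf)

    // Standard Hadoop s3n credential settings; placeholder values are hypothetical.
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "<ACCESS_KEY>")
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "<SECRET_KEY>")

    // wholeTextFiles returns an RDD of (path, contents) pairs, one per file.
    val file = sc.wholeTextFiles("s3n://wiki-dump/wikiinput").first()
    println(file._1)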

Re: SparkContext.wholeTextFiles() java.io.FileNotFoundException: File does not exist:

2014-10-08 Thread Sean Owen
Take this as a bit of a guess, since I don't use S3 much and am only a bit aware of the Hadoop+S3 integration issues. But I know that S3's lack of proper directories causes a few issues when used with Hadoop, which wants to list directories. According to
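A commonly suggested workaround for the directory-listing issue, offered in the same guessing spirit and not confirmed as this thread's fix: point the path at an explicit glob so Hadoop matches the objects under the prefix instead of trying to list a directory entry that S3 does not really have.

    // Hypothetical workaround sketch: glob the objects under the prefix directly.
    val files = sc.wholeTextFiles("s3n://wiki-dump/wikiinput/*")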