Re: Question about Google Books Ngrams with pyspark (1.4.1)

2015-09-02 Thread Bertrand
Looking at another forum, I tried:

files = sc.newAPIHadoopFile("s3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram", "com.hadoop.mapreduce.LzoTextInputFormat", "org.apache.hadoop.io.LongWritable", "org.apache.hadoop.io.Text")

Traceback (most recent call last): File "", line
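For reference, a minimal sketch of that call wrapped in a helper. The class-name strings are the ones from the message; the input format comes from the hadoop-lzo package, which is assumed to be on the classpath, and `read_lzo_ngrams` is a hypothetical helper name, not part of the original code:

```python
# Sketch only: assumes a live SparkContext `sc` and the hadoop-lzo jar
# on the driver/executor classpath. The class names are plain strings
# that pyspark forwards to the JVM.
NGRAM_PATH = "s3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram"
INPUT_FORMAT = "com.hadoop.mapreduce.LzoTextInputFormat"   # from hadoop-lzo
KEY_CLASS = "org.apache.hadoop.io.LongWritable"            # byte offset of each line
VALUE_CLASS = "org.apache.hadoop.io.Text"                  # the line itself

def read_lzo_ngrams(sc):
    """Return an RDD of (offset, line) pairs from the LZO-compressed dataset."""
    return sc.newAPIHadoopFile(NGRAM_PATH, INPUT_FORMAT, KEY_CLASS, VALUE_CLASS)
```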

Re: Question about Google Books Ngrams with pyspark (1.4.1)

2015-09-01 Thread Bertrand
Thanks for your prompt reply. I will follow https://issues.apache.org/jira/browse/SPARK-2394 and will let you know if everything works. Cheers, Bertrand

Re: Question about Google Books Ngrams with pyspark (1.4.1)

2015-09-01 Thread Bertrand
Hello everybody,

I followed the steps from https://issues.apache.org/jira/browse/SPARK-2394 to read LZO-compressed files, but now I cannot even open a file with:

lines = sc.textFile("s3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram")
>>> lines.first()
Traceback (most
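Once lines can be read at all, each row of the 20090715 1-gram files is, as far as I know, a tab-separated record of ngram, year, match count, page count, and volume count; treat that layout as an assumption. A small hypothetical parser to map over the RDD:

```python
def parse_1gram(line):
    """Parse one 1-gram row.

    Assumed tab-separated layout (20090715 release):
    ngram, year, match_count, page_count, volume_count.
    """
    ngram, year, matches, pages, volumes = line.rstrip("\n").split("\t")
    return (ngram, int(year), int(matches), int(pages), int(volumes))

# Applied to the RDD from the message above, e.g.:
# parsed = lines.map(parse_1gram)
```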

Re: Question about Google Books Ngrams with pyspark (1.4.1)

2015-09-01 Thread Robineast
Do you have LZO configured? See http://stackoverflow.com/questions/14808041/how-to-have-lzo-compression-in-hadoop-mapreduce

---
Robin East
Spark GraphX in Action
Michael Malak and Robin East
Manning Publications Co.
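For context, LZO support is usually wired up by registering the hadoop-lzo codecs in the Hadoop configuration. A sketch of the relevant core-site.xml fragment, assuming hadoop-lzo and its native library are installed (values follow the hadoop-lzo project's conventions):

```xml
<!-- core-site.xml: register the LZO codecs (hadoop-lzo must be installed) -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```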

Question about Google Books Ngrams with pyspark (1.4.1)

2015-09-01 Thread Bertrand
Hello everybody,

I am trying to read the Google Books Ngrams with pyspark on Amazon EC2. I followed the steps from http://spark.apache.org/docs/latest/ec2-scripts.html and everything is working fine. I am able to read the file:

lines =