Looking at another forum,
I tried:

files = sc.newAPIHadoopFile(
    "s3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram",
    "com.hadoop.mapreduce.LzoTextInputFormat",
    "org.apache.hadoop.io.LongWritable",
    "org.apache.hadoop.io.Text")
Traceback (most recent call last):
File "", line
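(A truncated traceback like this is, in my experience, most often a ClassNotFoundException for the LZO input format, meaning the hadoop-lzo jar is not on the driver/executor classpath. One way to supply it in Spark versions of that era was the SPARK_CLASSPATH variable; the jar path below is illustrative only and depends on where hadoop-lzo is installed on your cluster:

# Illustrative path; adjust to the actual location of the hadoop-lzo jar
export SPARK_CLASSPATH=/usr/lib/hadoop/lib/hadoop-lzo.jar

This is a sketch, not the exact fix from the thread; later Spark versions would pass the jar via spark-submit's --jars option instead.)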
Thanks for your prompt reply.
I will follow https://issues.apache.org/jira/browse/SPARK-2394 and will let
you know if everything works.
Cheers,
Bertrand
Hello everybody,
I followed the steps from https://issues.apache.org/jira/browse/SPARK-2394
to read LZO-compressed files, but now I cannot even open a file with:

lines = sc.textFile("s3n://datasets.elasticmapreduce/ngrams/books/20090715/eng-us-all/1gram")
>>> lines.first()
Traceback (most
Do you have LZO configured? see
http://stackoverflow.com/questions/14808041/how-to-have-lzo-compression-in-hadoop-mapreduce
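(For reference, "having LZO configured" usually means registering the codecs from the hadoop-lzo library in core-site.xml on every node; a sketch, assuming hadoop-lzo is already installed:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>

The native LZO library and the hadoop-lzo jar must also be present on each node; the Stack Overflow link above walks through that part.)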
---
Robin East
Spark GraphX in Action Michael Malak and Robin East
Manning Publications Co.
Hello everybody,
I am trying to read the Google Books Ngrams with pyspark on Amazon EC2.
I followed the steps from:
http://spark.apache.org/docs/latest/ec2-scripts.html
and everything is working fine.
I am able to read the file:
lines =