> 1. I don't think textFile is capable of unpacking a .gz file. You need to use 
> hadoopFile or newAPIHadoop file for this.

Sorry that’s incorrect, textFile works fine on .gz files. What it can’t do is 
compute splits on gz files, so if you have a single file, you'll have a single 
partition.

Processing 30 GB of gzipped data should not take that long, at least with the 
Scala API. Python not sure, especially under 1.2.1.

Reply via email to