> 1. I don't think textFile is capable of unpacking a .gz file. You need to use hadoopFile or newAPIHadoop file for this.
Sorry, that's incorrect: textFile works fine on .gz files. What it can't do is compute splits on gzip files, so if you have a single file, you'll get a single partition, and that partition is read by a single task. Processing 30 GB of gzipped data should not take that long, at least with the Scala API. I'm not sure about Python, especially under 1.2.1.
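
To illustrate why splits can't be computed on a gzip file, here is a minimal sketch in plain Python (standard library only, no Spark involved). The sample records and the helper name `can_decode_from` are made up for illustration. It shows that a gzip stream decodes fine from the start, but not from an arbitrary offset in the middle, which is where a second split would have to begin.

```python
import gzip
import zlib

# Hypothetical sample data: 10,000 short text records, gzipped in memory.
lines = [f"record {i}\n".encode() for i in range(10_000)]
compressed = gzip.compress(b"".join(lines))

# Reading sequentially from byte 0 recovers the original data.
full = gzip.decompress(compressed)


def can_decode_from(offset):
    """Return True if the gzip data can be decoded starting at `offset`."""
    try:
        gzip.decompress(compressed[offset:])
        return True
    except (OSError, EOFError, zlib.error):
        # No gzip header at this position, and the compressed stream
        # depends on state built up from the earlier bytes.
        return False


print("decode from start: ", can_decode_from(0))
print("decode from middle:", can_decode_from(len(compressed) // 2))
```

Because a reader has to start at the beginning of the file, the whole file ends up in one partition. In Spark, a common workaround is to call repartition(n) on the RDD returned by textFile so that later stages run in parallel; the initial decompression is still done by a single task.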