Hi John,
Glad you're enjoying the Spark training at UMD.
Is the 43 GB XML data in a single file or split across multiple BZIP2 files?
Is the file in an HDFS cluster or on a single Linux machine?
If you're using BZIP2 with splittable compression (in HDFS), you'll need at
least Hadoop 1.1:
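As a rough illustration of why splittability matters for a file this size, here is a back-of-the-envelope sketch in plain Python (no Spark needed). The helper name estimated_splits and the 128 MB split size are my own assumptions for illustration, not something from your setup; your cluster's actual block size may differ. The idea is that a non-splittable compressed file is read as a single partition, while a splittable one is divided into roughly one partition per split.

```python
import math

GB = 1024 ** 3
MB = 1024 ** 2

def estimated_splits(file_size_bytes, split_size_bytes, splittable):
    """Rough count of input splits (hypothetical helper, illustration only)."""
    if not splittable:
        # A non-splittable compressed file has to be read by one task.
        return 1
    # Otherwise, roughly one split per split-size chunk of the file.
    return max(1, math.ceil(file_size_bytes / split_size_bytes))

file_size = 43 * GB        # the 43 GB XML data from the question
split_size = 128 * MB      # assumed split size; check your cluster's setting

print(estimated_splits(file_size, split_size, splittable=True))
print(estimated_splits(file_size, split_size, splittable=False))
```

So with these assumed numbers the difference is between a few hundred parallel tasks and a single task reading the whole 43 GB.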
Hi,
Can you post what the error looks like?
Sameer F.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Usage-of-spark-ec2-how-to-deploy-a-revised-version-of-spark-1-1-0-tp16943p16963.html
Sent from the Apache Spark User List mailing list archive at