Hello, I'm doing some work on an Amazon EMR cluster and am noticing some peculiar results, both when using DataFrames to retrieve and operate on data and when using Spark SQL within Zeppelin to run graphs/reports. Specifically, when I use either of these on EMR running Spark 1.5.2, the first 8 characters of each String are truncated. You can see a sample of this in the attached images.
On the left is Spark running locally on my Mac, printing results from a DataFrame built on a test set of data. On the right are the same operations on the same set of data, run on EMR.
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n26022/CYo2WDgWEAMLul_.png>

I see similar results when running Spark SQL with the %sql tag in Zeppelin for graphing.
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n26022/sql_spark_text_issue.png>

Additionally, when I convert these back to an RDD, the results are shown as expected (on Amazon EMR).
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n26022/dt_to_rdd_print.png>

I'm fairly certain this is not the intended behavior, especially since the DataFrame prints the full results on my local machine running the same version of Spark. Is there a setting somewhere that might be causing this issue with DataFrames and Spark SQL?

Thanks,
Andrew Zurn

*Specs for EMR*
Release label: emr-4.2.0
Hadoop distribution: Amazon 2.6.0
Applications: Hive 1.0.0, Pig 0.14.0, Hue 3.7.1, Spark 1.5.2, Ganglia 3.6.0, Mahout 0.11.0, Oozie-Sandbox 4.2.0, Presto-Sandbox 0.125, Zeppelin-Sandbox 0.5.5
Master: Running 1 c3.4xlarge
Core: Running 10 r3.4xlarge

*Additional Configurations*
spark.executor.cores 5
spark.dynamicAllocation.enabled true
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.executor.memory 34G

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Dataframe-Spark-SQL-Drops-First-8-Characters-of-String-on-Amazon-EMR-tp26022.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.