Dataframe, Spark SQL - Drops First 8 Characters of String on Amazon EMR

awzurn Wed, 20 Jan 2016 07:36:57 -0800

Hello,

I'm doing some work on an Amazon EMR cluster and am seeing some peculiar
results, both when using DataFrames to load and operate on data and when
using Spark SQL within Zeppelin to run graphs/reports. In particular, on
EMR running Spark 1.5.2, both of these drop the first 8 characters of a
String. You can see a sample of this in the attached images.


On the left is Spark running locally on my Mac, printing results from a
DataFrame on a test set of data. On the right are the same operations on
the same set of data, run on EMR.
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n26022/CYo2WDgWEAMLul_.png>
 

I see similar results when running Spark SQL using the %sql tag in Zeppelin
for graphing.
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n26022/sql_spark_text_issue.png>
 

Additionally, when I convert these back to an RDD, the results are shown as
expected (on Amazon EMR).
<http://apache-spark-user-list.1001560.n3.nabble.com/file/n26022/dt_to_rdd_print.png>
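
To make the symptom easier to check than by eyeballing screenshots, here is a
minimal sketch in plain Python (no Spark required). It tests whether one list
of strings is the other with a fixed-length prefix dropped. The helper name
dropped_prefix_length and the sample values are hypothetical and are not taken
from my actual data. In practice the two lists would be the values collected
from the correct output and from the affected output.

```python
def dropped_prefix_length(expected, observed):
    """Return n if every observed string equals the matching expected string
    with its first n characters removed (same n for every row), else None."""
    if not expected or len(expected) != len(observed):
        return None
    lengths = set()
    for exp, obs in zip(expected, observed):
        # The observed value must be a suffix of the expected value.
        if not exp.endswith(obs):
            return None
        lengths.add(len(exp) - len(obs))
    # Only report a length if it is identical across all rows.
    return lengths.pop() if len(lengths) == 1 else None


# Hypothetical sample values standing in for the correct output.
local_values = ["2016-01-20T07:36:57", "apache-spark-user", "dataframe-output"]
# Simulate the symptom: the first 8 characters of each value are missing.
emr_values = [v[8:] for v in local_values]

print(dropped_prefix_length(local_values, emr_values))
```

A consistent non-zero result across all rows would suggest a fixed offset
rather than data-dependent truncation.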
 

I'm fairly certain that this is not the intended behavior, especially since
the DataFrame prints the full values on my local machine, which runs the
same version of Spark.

Is there a setting somewhere for DataFrames and Spark SQL that could be
causing this issue?

Thanks,

Andrew Zurn

*Specs for EMR*
Release label: emr-4.2.0
Hadoop distribution: Amazon 2.6.0
Applications: Hive 1.0.0, Pig 0.14.0, Hue 3.7.1, Spark 1.5.2, Ganglia 3.6.0,
Mahout 0.11.0, Oozie-Sandbox 4.2.0, Presto-Sandbox 0.125, Zeppelin-Sandbox
0.5.5

Master: Running, 1 c3.4xlarge
Core: Running, 10 r3.4xlarge

*Additional Configurations*
spark.executor.cores    5
spark.dynamicAllocation.enabled true
spark.serializer        org.apache.spark.serializer.KryoSerializer
spark.executor.memory   34G

