[ https://issues.apache.org/jira/browse/SPARK-13795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15310700#comment-15310700 ]

Cheng Lian commented on SPARK-13795:
------------------------------------

[~ganeshkrishnan] From the stack trace, I suspect that some string field used 
in the filter predicate is implicitly coerced to integer type but is somehow 
handed a long value while the filter predicate is evaluated. For example, if 
{{a}} is a string field, then {{a > 1}} implicitly coerces {{a}} to int type.
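
To illustrate, here is a minimal Java sketch of that kind of coercion ({{df}} and the column {{a}} are hypothetical, not taken from your job):

{code}
// Hypothetical example: "a" is a string column compared against an integer
// literal, so the analyzer inserts an implicit cast into the predicate and the
// comparison is evaluated with an int Ordering, even if the underlying value
// turns out to be a Long at runtime.
DataFrame filtered = df.filter(df.col("a").gt(1));
filtered.explain(true);  // the analyzed plan shows the inserted cast
{code}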

Could you please provide the original query or the query plan of the DataFrame 
that is causing the problem? It would also help if you could minimize this use 
case by removing unrelated fields and/or query components. Thanks!
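
Since you mentioned below that you are on the Java API, the plan can be printed with something like the following ({{allDataJoined}} refers to the DataFrame in your report):

{code}
// Prints the parsed, analyzed, optimized, and physical plans, including any
// implicit casts the analyzer inserted into the filter predicate.
allDataJoined.explain(true);
{code}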

> ClassCast Exception while attempting to show() a DataFrame
> ----------------------------------------------------------
>
>                 Key: SPARK-13795
>                 URL: https://issues.apache.org/jira/browse/SPARK-13795
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.0
>         Environment: Linux 14.04 LTS
>            Reporter: Ganesh Krishnan
>
> The DataFrame schema (from {{allDataJoined.printSchema()}}) is as follows:
> {noformat}
>  |-- eventType: string (nullable = true)
>  |-- itemId: string (nullable = true)
>  |-- productId: string (nullable = true)
>  |-- productVersion: string (nullable = true)
>  |-- servicedBy: string (nullable = true)
>  |-- ACCOUNT_NAME: string (nullable = true)
>  |-- CONTENTGROUPID: string (nullable = true)
>  |-- PRODUCT_ID: string (nullable = true)
>  |-- PROFILE_ID: string (nullable = true)
>  |-- SALESADVISEREMAIL: string (nullable = true)
>  |-- businessName: string (nullable = true)
>  |-- contentGroupId: string (nullable = true)
>  |-- salesAdviserName: string (nullable = true)
>  |-- salesAdviserPhone: string (nullable = true)
> {noformat}
> There is NO column with any datatype other than String. There was previously 
> an inferred column of type long, but it was dropped:
>  
> {code}
> DataFrame allDataJoined = whiteEventJoinedWithReference
>         .drop(rliDataFrame.col("occurredAtDate"));
>
> allDataJoined.printSchema();  // prints the schema shown above
> allDataJoined.show();         // throws the exception below
> {code}
> Calling {{show()}} throws the following exception:
> {noformat}
> java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Integer
>       at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
>       at scala.math.Ordering$Int$.compare(Ordering.scala:256)
>       at scala.math.Ordering$class.gt(Ordering.scala:97)
>       at scala.math.Ordering$Int$.gt(Ordering.scala:256)
>       at org.apache.spark.sql.catalyst.expressions.GreaterThan.nullSafeEval(predicates.scala:457)
>       at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:383)
>       at org.apache.spark.sql.catalyst.expressions.And.eval(predicates.scala:238)
>       at org.apache.spark.sql.catalyst.expressions.InterpretedPredicate$$anonfun$create$2.apply(predicates.scala:38)
>       at org.apache.spark.sql.catalyst.expressions.InterpretedPredicate$$anonfun$create$2.apply(predicates.scala:38)
>       at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$prunePartitions$1.apply(DataSourceStrategy.scala:257)
>       at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$prunePartitions$1.apply(DataSourceStrategy.scala:257)
>       at scala.collection.TraversableLike$$anonfun$filter$1.apply(TraversableLike.scala:264)
>       at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>       at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>       at scala.collection.TraversableLike$class.filter(TraversableLike.scala:263)
>       at scala.collection.AbstractTraversable.filter(Traversable.scala:105)
>       at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.prunePartitions(DataSourceStrategy.scala:257)
>       at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:82)
>       at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>       at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>       at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>       at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
>       at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>       at org.apache.spark.sql.execution.SparkStrategies$EquiJoinSelection$.makeBroadcastHashJoin(SparkStrategies.scala:88)
>       at org.apache.spark.sql.execution.SparkStrategies$EquiJoinSelection$.apply(SparkStrategies.scala:97)
>       at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>       at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>       at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>       at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
>       at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>       at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:336)
>       at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>       at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>       at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>       at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
>       at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>       at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:349)
>       at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>       at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>       at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>       at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
>       at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:47)
>       at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:45)
>       at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:52)
>       at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:52)
>       at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2134)
>       at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1413)
>       at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1495)
>       at org.apache.spark.sql.DataFrame.showString(DataFrame.scala:171)
>       at org.apache.spark.sql.DataFrame.show(DataFrame.scala:394)
>       at org.apache.spark.sql.DataFrame.show(DataFrame.scala:355)
>       at org.apache.spark.sql.DataFrame.show(DataFrame.scala:363)
> {noformat}
> Checked, googled, and searched Stack Overflow with no results.
> Edit: I managed to narrow this bug down to the following use case:
> The raw JSON has the field dateOccuredAt, and the Parquet output it is being 
> written to is also partitioned by dateOccuredAt. The raw JSON field is 
> inferred as String, while the partition column is inferred as long, which is 
> also correct. However, while persisting we get the above error even when the 
> column dateOccuredAt is dropped from the DataFrame.
> Also, we use Java and not Scala.
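> A rough Java sketch of the scenario described above (paths, column names, and 
> {{sqlContext}} here are placeholders, not the real job):
> {code}
> // JSON side: every field, including dateOccuredAt, is inferred as string.
> DataFrame raw = sqlContext.read().json("/data/raw/events");
>
> // Parquet side: the output is partitioned by dateOccuredAt, and partition
> // discovery infers the partition values as long.
> DataFrame reference = sqlContext.read().parquet("/data/reference");
>
> // Join and drop the conflicting column; show() still fails with the
> // ClassCastException above while Spark prunes partitions.
> DataFrame allDataJoined = raw
>         .join(reference, raw.col("itemId").equalTo(reference.col("itemId")))
>         .drop(reference.col("dateOccuredAt"));
> allDataJoined.show();
> {code}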


