Daniel Barclay created SPARK-16743:
--------------------------------------

             Summary: converter and access code out of sync: createDataFrame on RDD[Option[C]] fails with MatchError
                 Key: SPARK-16743
                 URL: https://issues.apache.org/jira/browse/SPARK-16743
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.6.2, 1.6.1
            Reporter: Daniel Barclay


Calling {{SQLContext}}'s {{createDataFrame}} on an RDD of type
{{RDD\[Option\[SomeUserClass]]}} leads to an internal error.

For example, if the first field of {{SomeUserClass}} is of type {{String}},
evaluating the RDD yields a {{MatchError}} referring to an instance of
{{SomeUserClass}} in
{{org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl}}
(which should only ever be passed certain representations of strings).
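To see why a pattern-matching converter fails in exactly this way, here is a minimal stand-alone Scala sketch. {{toCatalystString}} is a hypothetical, simplified stand-in for {{StringConverter.toCatalystImpl}} (not Spark's actual code), and {{SomeUserClass}} is given hypothetical fields. The converter only accepts strings (and null); like any non-exhaustive {{match}}, it throws {{scala.MatchError}} for anything else, including a whole user record.

```scala
// Hypothetical user record type (stands in for SomeUserClass in the report).
case class SomeUserClass(name: String, age: Int)

// Hypothetical, simplified string converter (NOT Spark's actual
// StringConverter): it only handles String values and null, so any other
// input falls through the non-exhaustive match and throws scala.MatchError.
def toCatalystString(value: Any): Any = value match {
  case null      => null
  case s: String => s.getBytes("UTF-8").toVector // stand-in for an internal UTF-8 form
}

// Wraps the call so the failure can be inspected instead of propagating.
def tryConvert(value: Any): Either[String, Any] =
  try Right(toCatalystString(value))
  catch { case e: MatchError => Left(e.getMessage) }

// A real String converts; the whole record (what reaches the converter in
// the reported failure) yields a MatchError whose message names the record.
println(tryConvert("abc"))
println(tryConvert(SomeUserClass("abc", 1)))
```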

The problem seems to be in {{RDDConversions.productToRowRdd(...)}} in
{{ExistingRDD.scala}}:

It has a list of converters that reflects the list of members of 
{{SomeUserClass}} (looking past the {{Option}} part of the RDD record type 
{{Option\[SomeUserClass]}}).

However, the data-access code ({{r.productElement\(i)}}) does not seem to look
past the {{Option}} part correspondingly.  (That is, it does not also traverse
from the {{Some}} instance to the wrapped {{SomeUserClass}} instance.)

Therefore, it ends up passing the {{SomeUserClass}} instance itself to the
converter intended for the first member field of {{SomeUserClass}} (e.g., a
{{String}} converter), yielding an internal error.
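The converter/access mismatch described above can be modeled without Spark. In this sketch all names are hypothetical and the loop only mirrors the shape described in the report, not Spark's source: the converter list is built from {{SomeUserClass}}'s two fields, while the access code indexes into whatever record it receives via {{productElement(i)}}. Because {{Some}} has arity 1, index 0 is the whole {{SomeUserClass}} instance, which then reaches the converter meant for the {{String}} field.

```scala
// Hypothetical user record type (stands in for SomeUserClass in the report).
case class SomeUserClass(name: String, age: Int)

// Hypothetical per-field converters, one per member of SomeUserClass, i.e.
// built by looking *past* the Option in Option[SomeUserClass].
val nameConverter: Any => Any = { case s: String => s.toUpperCase } // field 0
val ageConverter: Any => Any  = { case i: Int => i.toLong }         // field 1
val converters: Seq[Any => Any] = Seq(nameConverter, ageConverter)

// Access code that indexes into the record it is given, analogous to
// r.productElement(i): it does NOT look past an Option wrapper.
def convertRow(r: Product): Seq[Any] =
  converters.indices.map(i => converters(i)(r.productElement(i)))

val user = SomeUserClass("ann", 30)
println(convertRow(user)) // fields line up with the converters

// Some(user) has arity 1, and element 0 is the whole SomeUserClass instance,
// which is handed to the String converter and fails with a MatchError.
val wrapped: Option[SomeUserClass] = Some(user)
println(wrapped.productArity)
println(wrapped.productElement(0) == user)
try convertRow(wrapped)
catch { case e: MatchError => println(s"MatchError: ${e.getMessage}") }
```

The failure surfaces at index 0, so the later index (which would be out of range for a {{Some}}) is never reached.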

(If {{RDD\[Option\[...]]}} doesn't make sense in the first place, it should be
rejected with a deliberate, descriptive error rather than failing through an
internal inconsistency.)
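One possible shape for such a deliberate rejection, sketched as a hypothetical pure-Scala helper rather than a patch to Spark: check the record type's {{ClassTag}} up front and refuse any {{Option}} (which also covers {{Some}} and {{None}}) with an {{IllegalArgumentException}}, before any converters are built. The name {{requireNonOptionRecord}} and the error text are assumptions for illustration only.

```scala
import scala.reflect.ClassTag

// Hypothetical user record type (stands in for SomeUserClass in the report).
case class SomeUserClass(name: String, age: Int)

// Hypothetical guard (not a Spark API): reject record types whose erased
// class is Option (which covers Some and None) with a descriptive
// IllegalArgumentException, before any converters are built.
def requireNonOptionRecord[A](implicit tag: ClassTag[A]): Unit = {
  val cls = tag.runtimeClass
  require(
    !classOf[Option[_]].isAssignableFrom(cls),
    s"record type ${cls.getName} is an Option; unwrap the values before creating a DataFrame"
  )
}

requireNonOptionRecord[SomeUserClass] // passes silently
try requireNonOptionRecord[Option[SomeUserClass]]
catch { case e: IllegalArgumentException => println(e.getMessage) }
```

A {{ClassTag}} only sees the erased class, so this check cannot tell {{Option\[SomeUserClass]}} from {{Option\[Int]}}; that is sufficient for rejecting a top-level {{Option}}. The alternative resolution the report alludes to, actually looking past the {{Option}} in the access code, would be a change inside Spark itself and is not sketched here.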



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)