Daniel Barclay created SPARK-16743:
--------------------------------------

             Summary: converter and access code out of sync: createDataFrame on RDD[Option[C]] fails with MatchError
                 Key: SPARK-16743
                 URL: https://issues.apache.org/jira/browse/SPARK-16743
             Project: Spark
          Issue Type: Bug
    Affects Versions: 1.6.2, 1.6.1
            Reporter: Daniel Barclay

Calling {{SQLContext}}'s {{createDataFrame}} on an RDD of type {{RDD\[Option\[SomeUserClass]]}} leads to an internal error.

For example, if the first field of {{SomeUserClass}} is of type {{String}}, evaluating the RDD yields a {{MatchError}} referring to an instance of {{SomeUserClass}} in {{org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl}} (which should have been passed only certain kinds of representations of strings).

The problem seems to be in {{ExistingRDD.scala}}'s {{RDDConversions.productToRowRdd(...)}}:

It has a list of converters that reflects the list of members of {{SomeUserClass}} (looking past the {{Option}} part of the RDD record type {{Option\[SomeUserClass]}}).

However, the data-access code ({{r.productElement\(i)}}) does not seem to look past the {{Option}} part correspondingly. (It does not seem to also traverse from the {{Some}} instance to the {{SomeUserClass}} instance inside it.)

Therefore, it ends up passing the instance of {{SomeUserClass}} to the converter intended for the first member field of {{SomeUserClass}} (e.g., a {{String}} converter), yielding an internal error.

(If {{RDD\[Option\[...]]}} doesn't make sense in the first place, it should be rejected with a "conscious" error rather than failing with an internal inconsistency.)
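
The suspected mismatch can be illustrated without Spark. The following is only a minimal sketch in plain Scala, not Spark's actual code: {{ConverterMismatchSketch}}, {{stringConverter}}, {{intConverter}}, {{converters}} and {{convertFirstField}} are hypothetical names, and the two fields of {{SomeUserClass}} are assumed for illustration. It relies on the fact that {{Some}} is itself a {{Product}} whose only element is the wrapped value.

```scala
// Simplified model of the suspected mismatch; NOT Spark's implementation.
// All names below are hypothetical and exist only for illustration.
case class SomeUserClass(name: String, count: Int)

object ConverterMismatchSketch {
  // Stand-ins for per-field converters: each accepts only its own type
  // and throws scala.MatchError for anything else.
  val stringConverter: Any => Any = { case s: String => s }
  val intConverter: Any => Any = { case i: Int => i }

  // Converter list built from the fields of SomeUserClass,
  // i.e. "looking past" the Option wrapper.
  val converters: Array[Any => Any] = Array(stringConverter, intConverter)

  // Access code that does NOT look past the Option wrapper:
  // it reads element 0 of whatever Product it is given.
  def convertFirstField(record: Product): Any =
    converters(0)(record.productElement(0))

  def main(args: Array[String]): Unit = {
    val record: Option[SomeUserClass] = Some(SomeUserClass("a", 1))

    // Element 0 of Some(...) is the wrapped SomeUserClass instance,
    // so the string converter receives a SomeUserClass and fails.
    try convertFirstField(record)
    catch { case e: MatchError => println(s"MatchError: ${e.getMessage}") }

    // Unwrapping first hands the converter the String field it expects.
    println(convertFirstField(record.get))
  }
}
```

In this sketch the failure goes away once the record is unwrapped before its fields are accessed. Whether Spark should unwrap the {{Option}} or reject {{RDD\[Option\[...]]}} up front is the open question of this issue.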