[ https://issues.apache.org/jira/browse/KUDU-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346494#comment-15346494 ]
Tom White commented on KUDU-1493:
---------------------------------
Key columns in Kudu must be declared before other, non-key columns. To cope
with this constraint, the write side of the Spark-Kudu integration is careful to
map Spark dataframe field indexes to Kudu column indexes:
https://github.com/apache/incubator-kudu/blob/master/java/kudu-spark/src/main/scala/org/kududb/spark/kudu/KuduContext.scala#L152-L154
However, on the read side, Spark dataframe indexes and Kudu column indexes are conflated:
https://github.com/apache/incubator-kudu/blob/master/java/kudu-spark/src/main/scala/org/kududb/spark/kudu/KuduRDD.scala#L114-L128
The read fails if the key columns are not the leading fields of the dataframe
schema. Worse, if the permuted fields happen to have the same type, data will
be read into the wrong fields.
The fix is to do the reverse mapping (Kudu column index to Spark field index)
on the read side.
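
To illustrate the idea, here is a minimal, dependency-free sketch of mapping by column name rather than by position. It is not the actual KuduRDD code or the eventual patch: the object name, method names, and the example schemas (Spark fields A, B, C; Kudu keys A, C stored first) are all hypothetical stand-ins.

```scala
// Hypothetical sketch only: plain Scala collections stand in for the
// Spark schema and the Kudu row; none of this is the kudu-spark API.
object ReadSideMapping {
  // Assumed Kudu column order: key columns (A, C) first, then non-key B.
  val kuduColumns: Vector[String] = Vector("A", "C", "B")
  // Assumed Spark dataframe field order.
  val sparkFields: Vector[String] = Vector("A", "B", "C")

  // For each Spark field position, find the index of the same-named
  // Kudu column instead of assuming the two positions coincide.
  def sparkToKuduIndex(spark: Seq[String], kudu: Seq[String]): Vector[Int] =
    spark.map { name =>
      val idx = kudu.indexOf(name)
      require(idx >= 0, s"Column '$name' not found in Kudu schema")
      idx
    }.toVector

  // Reorder one row of values from Kudu column order into Spark field order.
  def toSparkOrder(kuduRow: Seq[Any], mapping: Seq[Int]): Vector[Any] =
    mapping.map(i => kuduRow(i)).toVector
}
```

With these assumed schemas the mapping is (0, 2, 1), so a row held in Kudu order (A, C, B) comes back in Spark order (A, B, C). Looking columns up by name also avoids the case where two permuted fields share a type and the mix-up goes unnoticed.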
> Spark read fails if key columns are not leading columns
> -------------------------------------------------------
>
> Key: KUDU-1493
> URL: https://issues.apache.org/jira/browse/KUDU-1493
> Project: Kudu
> Issue Type: Bug
> Components: spark
> Affects Versions: 0.9.0
> Reporter: Tom White
>
> If the Spark dataframe schema is (A, B, C), then reading will fail if the Kudu
> keys are (A, C). Keys (A, B) work fine.