[
https://issues.apache.org/jira/browse/KUDU-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348067#comment-15348067
]
Tom White commented on KUDU-1493:
---------------------------------
Hi [~andygrove73] - thanks for taking a look. What's confusing is that the
{{StructType}} you used to write with may have a different field order from the
{{StructType}} you get when reading back, so you should always use the
one from the dataframe when reading. If you access fields by name then there's
no problem, but I rely on ordering since I am flattening schemas (and rows) to
work around the fact that Kudu doesn't support nested or complex types... In
the end I worked around this by reordering the row I get back to conform to my
original {{StructType}}. It works, but I feel like it breaks the principle of
least surprise.
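
To illustrate the reordering workaround, here is a minimal sketch in plain Python with no Spark dependency. The helper name {{reorder_row}}, the sample values, and the assumption that the read-back schema lists the key columns first, i.e. (A, C, B), are all hypothetical and only meant to show the idea, not the actual connector behaviour.

```python
from typing import Sequence


def reorder_row(row: Sequence, read_fields: Sequence[str],
                write_fields: Sequence[str]) -> tuple:
    """Rearrange a row's values from read order into the original write order."""
    if sorted(read_fields) != sorted(write_fields):
        raise ValueError("schemas must contain the same field names")
    # Map each field name to its position in the schema returned on read.
    position = {name: i for i, name in enumerate(read_fields)}
    # Pick the values out in the order of the schema used for writing.
    return tuple(row[position[name]] for name in write_fields)


# Hypothetical example: written as (A, B, C) with Kudu keys (A, C).
# Assumption: the schema returned on read is (A, C, B).
write_fields = ["A", "B", "C"]
read_fields = ["A", "C", "B"]
row_from_kudu = (1, 3.0, "two")  # values in read order: A, C, B
restored = reorder_row(row_from_kudu, read_fields, write_fields)
```

The lookup is by field name, so it does not matter which order the read side returns, as long as both schemas contain the same names.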
I wondered if there was a way for the dataframe to return the original
{{StructType}} used for writing, but there's no obvious place to store it.
Perhaps there is a way though?
> Spark read fails if key columns are not leading columns
> -------------------------------------------------------
>
> Key: KUDU-1493
> URL: https://issues.apache.org/jira/browse/KUDU-1493
> Project: Kudu
> Issue Type: Bug
> Components: spark
> Affects Versions: 0.9.0
> Reporter: Tom White
> Assignee: Andy Grove
>
> If the Spark dataframe schema is (A, B, C) then reading will fail if the Kudu
> keys are (A, C). Keys (A, B) work fine.