[jira] [Commented] (KUDU-1493) Spark read fails if key columns are not leading columns

Tom White (JIRA) Fri, 24 Jun 2016 02:35:49 -0700

    [ 
https://issues.apache.org/jira/browse/KUDU-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348067#comment-15348067
 ]


Tom White commented on KUDU-1493:
---------------------------------

Hi [~andygrove73] - thanks for taking a look. What's confusing is that the 
{{StructType}} you write with may have a different field order from the 
{{StructType}} you get back when reading, so you should always use the one from 
the dataframe when reading. If you access fields by name then there's no 
problem, but I rely on ordering because I am flattening schemas (and rows) to 
work around the fact that Kudu doesn't support nested or complex types. In the 
end I worked around this by reordering the rows I get back to conform to my 
original {{StructType}}. It works, but it feels like it breaks the principle of 
least surprise.
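A minimal sketch of that reordering workaround, in plain Python rather than Spark. It assumes a simplified model where a schema is just a list of field names and a row is a tuple; `reorder_row` is a hypothetical helper, not part of the Spark or Kudu APIs. Each value is looked up by name in the schema that came back from the read and emitted in the original write order. For example, with keys (A, C) the Kudu table presumably lists its key columns first, so a read may return (A, C, B) instead of the original (A, B, C).

```python
def reorder_row(row, read_fields, original_fields):
    """Return the values of `row` in the order of `original_fields`.

    `row` is a tuple whose values follow `read_fields` (the schema the read
    returned). Each field is looked up by name, so the result matches the
    original write-time field order. Hypothetical helper for illustration.
    """
    position = {name: i for i, name in enumerate(read_fields)}
    missing = [name for name in original_fields if name not in position]
    if missing:
        # A field from the original schema is absent from the read schema.
        raise KeyError(f"fields not present in read schema: {missing}")
    return tuple(row[position[name]] for name in original_fields)


# Written as (a, b, c); read back with the key columns (a, c) first.
print(reorder_row((1, 3, 2), ["a", "c", "b"], ["a", "b", "c"]))  # (1, 2, 3)
```

In PySpark, the same effect can usually be had at the dataframe level by selecting columns by name in the original order, e.g. `df.select(*original_schema.fieldNames())`, which avoids per-row work.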

I wondered if there was a way for the dataframe to return the original 
{{StructType}} used for writing, but there's no obvious place to store it. 
Perhaps there is a way though?

> Spark read fails if key columns are not leading columns
> -------------------------------------------------------
>
>                 Key: KUDU-1493
>                 URL: https://issues.apache.org/jira/browse/KUDU-1493
>             Project: Kudu
>          Issue Type: Bug
>          Components: spark
>    Affects Versions: 0.9.0
>            Reporter: Tom White
>            Assignee: Andy Grove
>
> If the Spark dataframe schema is (A, B, C) then reading will fail if the Kudu 
> keys are (A, C). Keys (A, B) work fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
