[jira] [Comment Edited] (KUDU-1493) Spark read fails if key columns are not leading columns

Andy Grove (JIRA) Fri, 24 Jun 2016 07:42:19 -0700

    [ https://issues.apache.org/jira/browse/KUDU-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15348352#comment-15348352 ]


Andy Grove edited comment on KUDU-1493 at 6/24/16 2:41 PM:
-----------------------------------------------------------

A single application could write many DataFrames with different column 
orderings to the same table. The read operation should always return the 
columns in the order specified in the projection. If no projection is 
provided, I would expect the columns to be returned in the order in which 
they are defined in the Kudu schema. As far as I know, this is the current 
behavior, and in my opinion it is correct.

If you rely on ordering, you should apply a projection to the RDD that you 
read from Kudu, e.g. "SELECT c, b, a FROM kudu_table" if using Spark SQL, 
rather than "SELECT * FROM kudu_table".

Databases usually make no guarantees about row or column ordering unless you 
are explicit in your query.
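
The projection advice above can be illustrated with a small, self-contained 
sketch. It does not use Kudu or Spark: an in-memory SQLite table is an assumed 
stand-in for kudu_table, and the helper column_order is hypothetical, 
introduced only for this example. It runs the two queries quoted above and 
reports the column order of each result.

```python
import sqlite3


def column_order(conn, query):
    """Return the column names of a query result, in result order."""
    cursor = conn.execute(query)
    # cursor.description holds one tuple per result column; the name is first.
    return [col[0] for col in cursor.description]


# Assumption: SQLite stands in for the Kudu table. The schema is (a, b, c)
# with key columns (a, c), mirroring the case described in the report.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kudu_table (a INTEGER, b TEXT, c INTEGER, PRIMARY KEY (a, c))"
)
conn.execute("INSERT INTO kudu_table VALUES (1, 'x', 2)")

# Explicit projection: the caller chooses the column order.
explicit = column_order(conn, "SELECT c, b, a FROM kudu_table")

# No projection: the order falls back to the table definition.
implicit = column_order(conn, "SELECT * FROM kudu_table")

print(explicit)
print(implicit)
```

In this stand-in, the explicit projection yields the columns as c, b, a, while 
"SELECT *" yields them in schema order, independent of which columns are keys. 
Whether the Kudu Spark connector behaves the same way is what this issue is 
about, so the sketch shows only the expected contract, not the connector.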




was (Author: andygrove):
One application could write many DataFrames with different column ordering to 
the same table. The read operation should always return the columns in the 
order that you specify in your projection. If you don't provide a projection 
then I would expect the columns to be returned in the order they are defined in 
the kudu schema. As far as I know, this is the current behavior and is correct, 
in my opinion.

If you rely on ordering you should apply a projection onto the RDD that you 
read from Kudu e.g. "SELECT c, b, a FROM kudu_table" if using SparkSQL, rather 
than "SELECT * FROM kudu_table".

SQL databases usually make no guarantees about row or column ordering unless 
you are explicit in your query.



> Spark read fails if key columns are not leading columns
> -------------------------------------------------------
>
>                 Key: KUDU-1493
>                 URL: https://issues.apache.org/jira/browse/KUDU-1493
>             Project: Kudu
>          Issue Type: Bug
>          Components: spark
>    Affects Versions: 0.9.0
>            Reporter: Tom White
>            Assignee: Andy Grove
>
> If the Spark dataframe schema is (A, B, C) then reading will fail if the Kudu 
> keys are (A, C). Keys (A, B) work fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
