[
https://issues.apache.org/jira/browse/DRILL-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aman Sinha reassigned DRILL-1401:
---------------------------------
Assignee: Aman Sinha
> Wrong result with csv data when projecting columns not part of the Join or
> Filter
> ---------------------------------------------------------------------------------
>
> Key: DRILL-1401
> URL: https://issues.apache.org/jira/browse/DRILL-1401
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 0.5.0
> Reporter: Chun Chang
> Assignee: Aman Sinha
> Attachments: aggregate_100r.csv
>
>
> #Wed Sep 10 13:35:05 PDT 2014
> git.commit.id.abbrev=686eb9e
> join (inner,full outer,left,right) does not work if the join is directly
> applied to csv files. It will produce either null values on the projected
> columns or NumberFormatException depends on joining condition applied on
> which columns (first or second columns). If you create views on the csv
> files, then join works on the views.
> For example, the following join give null values:
> 0: jdbc:drill:schema=dfs> select cast(`aggregate_100r.csv`.columns[0] as
> int), cast(`aggregate_100r.csv`.columns[1] as int),
> cast(`join_100r.csv`.columns[1] as int) from `aggregate_100r.csv` inner join
> `join_100r.csv` on cast(`aggregate_100r.csv`.columns[0] as int) =
> cast(`join_100r.csv`.columns[0] as int);
> +------------+------------+------------+
> | EXPR$0 | EXPR$1 | EXPR$2 |
> +------------+------------+------------+
> | 0 | null | null |
> | 0 | null | null |
> | 0 | null | null |
> | 0 | null | null |
> | 0 | null | null |
> | 0 | null | null |
> | 0 | null | null |
> | 0 | null | null |
> | 0 | null | null |
> | 0 | null | null |
> | 0 | null | null |
> | 0 | null | null |
> | 0 | null | null |
> | 0 | null | null |
> | 0 | null | null |
> | 0 | null | null |
> | 1 | null | null |
> | 1 | null | null |
> | 1 | null | null |
> | 1 | null | null |
> The following give NumberFormatException:
> 0: jdbc:drill:schema=dfs> select cast(`aggregate_100r.csv`.columns[0] as
> int), cast(`aggregate_100r.csv`.columns[1] as int),
> cast(`join_100r.csv`.columns[1] as int) from `aggregate_100r.csv` inner join
> `join_100r.csv` on cast(`aggregate_100r.csv`.columns[1] as int) =
> cast(`join_100r.csv`.columns[1] as int);
> Query failed: Failure while running fragment.
> [3f67299e-f312-445b-8d6b-74984a820f0c]
> Error: exception while executing query: Failure while trying to get next
> result batch. (state=,code=0)
> The following with views works:
> 0: jdbc:drill:schema=dfs> select `aggregate_100r_v`.c0,
> `aggregate_100r_v`.c1, `join_100r_v`.c1 from `aggregate_100r_v` inner join
> `join_100r_v` on `aggregate_100r_v`.c0 = `join_100r_v`.c0;
> +------------+------------+------------+
> | c0 | c1 | c10 |
> +------------+------------+------------+
> | 0 | 0 | 0 |
> | 0 | 0 | 2 |
> | 0 | 0 | 1 |
> | 0 | 0 | 0 |
> | 0 | 0 | 0 |
> | 0 | 0 | 2 |
> | 0 | 0 | 1 |
> | 0 | 0 | 0 |
> | 0 | 1 | 0 |
> | 0 | 1 | 2 |
> | 0 | 1 | 1 |
> | 0 | 1 | 0 |
> | 0 | 2 | 0 |
> | 0 | 2 | 2 |
> | 0 | 2 | 1 |
> | 0 | 2 | 0 |
> | 1 | 1 | 1 |
> | 1 | 1 | 2 |
> | 1 | 1 | 1 |
> | 1 | 1 | 0 |
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)