[
https://issues.apache.org/jira/browse/DRILL-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099390#comment-14099390
]
Aman Sinha commented on DRILL-792:
----------------------------------
This issue appears to have been fixed: I cannot reproduce it on the latest
master branch (commit 687b9b0). Since no EXPLAIN plan is attached to this
bug, I cannot compare the plans. I ran the following queries for
validation:
Join a Hive table with a parquet file:
0: jdbc:drill:zk=local> select count(*) from hive.uservisits uservisits,
dfs.`/Users/asinha/data/url-data/rankings.parquet` rankings where
uservisits.destinationurl = rankings.pageURL;
+------------+
|   EXPR$0   |
+------------+
| 46         |
+------------+
Validate the above result by joining the uservisits and rankings tables, both
as Hive tables:
0: jdbc:drill:zk=local> select count(*) from hive.uservisits uservisits,
hive.rankings rankings where uservisits.destinationurl = rankings.pageurl;
+------------+
|   EXPR$0   |
+------------+
| 46         |
+------------+
Note that the above queries return the same COUNT values.
Also check the actual pageRank column data from the join:
0: jdbc:drill:zk=local> select rankings.pageRank from hive.uservisits
uservisits, dfs.`/Users/asinha/data/url-data/rankings.parquet` rankings where
uservisits.destinationurl = rankings.pageURL order by rankings.pageRank limit
10;
+------------+
|  pageRank  |
+------------+
| 8          |
| 8          |
| 9          |
| 9          |
| 9          |
| 9          |
| 9          |
| 11         |
| 11         |
| 12         |
+------------+
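If a plan comparison is needed later, the plan for the Hive-to-parquet join
can be captured by prefixing the query with EXPLAIN PLAN FOR. A sketch,
reusing the first query and the local parquet path from above (the path is
specific to this environment and would need adjusting elsewhere):

```sql
-- Capture the query plan instead of running the join itself.
explain plan for
select count(*) from hive.uservisits uservisits,
dfs.`/Users/asinha/data/url-data/rankings.parquet` rankings
where uservisits.destinationurl = rankings.pageURL;
```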
> Joining a hive table with parquet file is returning an empty result set
> -----------------------------------------------------------------------
>
> Key: DRILL-792
> URL: https://issues.apache.org/jira/browse/DRILL-792
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Reporter: Rahul Challapalli
> Assignee: Aman Sinha
> Priority: Critical
> Fix For: 0.5.0
>
> Attachments: 792.ddl, rankings.parquet, rankings.txt,
> uservisits.parquet, uservisits.txt
>
>
> git.commit.id.abbrev=70fab8c
> 1. Joining a hive table with a parquet file returns an empty result. See the
> query below:
> select rankings.pageRank pagerank from `dfs/parquet/rankings/` rankings inner
> join hive.uservisits uservisits on rankings.pageURL =
> uservisits.destinationurl
> 2. Joining a hive table with another hive table seems to work fine:
> select rankings.pagerank pagerank from hive.rankings rankings inner join
> hive.uservisits uservisits on rankings.pageurl = uservisits.destinationurl
> I attached the required parquet and text files along with the hive
> ddl. Let me know if you need more information.