[ 
https://issues.apache.org/jira/browse/DRILL-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14099390#comment-14099390
 ] 

Aman Sinha commented on DRILL-792:
----------------------------------

This issue appears to have been fixed.  I cannot reproduce it on the latest 
master branch (commit 687b9b0).  Since no EXPLAIN plan is attached to this 
bug, I cannot compare the plans.  I ran the following queries for 
validation: 

Join a Hive table with a Parquet file:

0: jdbc:drill:zk=local> select count(*) from hive.uservisits uservisits, 
dfs.`/Users/asinha/data/url-data/rankings.parquet` rankings where 
uservisits.destinationurl = rankings.pageURL;
+------------+
|   EXPR$0   |
+------------+
| 46         |
+------------+

Validate the above result by joining the uservisits and rankings tables with 
both read from Hive:

0: jdbc:drill:zk=local> select count(*) from hive.uservisits uservisits, 
hive.rankings rankings where uservisits.destinationurl = rankings.pageurl;
+------------+
|   EXPR$0   |
+------------+
| 46         |
+------------+

Note that both queries return the same count (46).

I also checked the actual pageRank values returned by the Hive-Parquet join:

0: jdbc:drill:zk=local> select rankings.pageRank from hive.uservisits 
uservisits, dfs.`/Users/asinha/data/url-data/rankings.parquet` rankings where 
uservisits.destinationurl = rankings.pageURL order by rankings.pageRank limit 
10;
+------------+
|  pageRank  |
+------------+
| 8          |
| 8          |
| 9          |
| 9          |
| 9          |
| 9          |
| 9          |
| 11         |
| 11         |
| 12         |
+------------+
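
If a plan from the failing build (70fab8c) becomes available, it could be 
compared against one from the current build.  A minimal sketch of how such a 
plan might be captured for the first query, assuming Drill's EXPLAIN PLAN FOR 
statement and the same local file path used above (the path is specific to 
this test setup):

```sql
-- Ask Drill for the plan of the Hive-to-Parquet join instead of running it.
-- The dfs path below is the local test path from the queries above.
explain plan for
select count(*)
from hive.uservisits uservisits,
     dfs.`/Users/asinha/data/url-data/rankings.parquet` rankings
where uservisits.destinationurl = rankings.pageURL;
```

Running the same statement on both builds and comparing the output should 
show whether the join or scan operators differ.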

> Joining a hive table with parquet file is returning an empty result set
> -----------------------------------------------------------------------
>
>                 Key: DRILL-792
>                 URL: https://issues.apache.org/jira/browse/DRILL-792
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>            Reporter: Rahul Challapalli
>            Assignee: Aman Sinha
>            Priority: Critical
>             Fix For: 0.5.0
>
>         Attachments: 792.ddl, rankings.parquet, rankings.txt, 
> uservisits.parquet, uservisits.txt
>
>
> git.commit.id.abbrev=70fab8c
> 1. Joining a hive table with parquet results in an empty output. Check below 
> query
> select rankings.pageRank pagerank from `dfs/parquet/rankings/` rankings inner 
> join hive.uservisits uservisits on rankings.pageURL = 
> uservisits.destinationurl
> 2. Joining hive table with hive table seems to work fine
> select rankings.pagerank pagerank from hive.rankings rankings inner join 
> hive.uservisits uservisits on rankings.pageurl = uservisits.destinationurl
> I attached the parquet and text files required along with the required hive 
> ddl. Let me know if you need more information.



