[ 
https://issues.apache.org/jira/browse/DRILL-4211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120887#comment-16120887
 ] 

Timothy Farkas commented on DRILL-4211:
---------------------------------------

There seem to be two things going wrong here:

# Doing joins on two tables that have columns with the same names causes 
incorrect nulls in the results DRILL-5713
# Column aliases are not pushed down to the JDBCReader, so the columns returned 
from postgres have the original names. The Project operators in Drill then 
proceed to use the aliased column names to do operations on batches. The 
aliased names ofcourse don't exist in the schema, so nulls are returned for the 
aliased column.

I'll make this ticket focus on the aliasing issue, and I'll fix the column name 
conflict issue in DRILL-5713

> Inconsistent results from a joined sql statement to postgres tables
> -------------------------------------------------------------------
>
>                 Key: DRILL-4211
>                 URL: https://issues.apache.org/jira/browse/DRILL-4211
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.3.0, 1.11.0
>         Environment: Postgres db stroage
>            Reporter: Robert Hamilton-Smith
>            Assignee: Timothy Farkas
>              Labels: newbie
>
> When making an sql statement that incorporates a join to a table and then a 
> self join to that table to get a parent value , Drill brings back 
> inconsistent results. 
> Here is the sql in postgres with correct output:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from transactions trx
> join categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join categories w1 on (cat.categoryparentguid = w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL;
> {code}
> Output:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|food&Dining|
> |id1|restaurants|food&Dining|
> |id2|Coffee Shops|food&Dining|
> |id2|Coffee Shops|food&Dining|
> When run in Drill with correct storage prefix:
> {code:sql}
> select trx.categoryguid,
> cat.categoryname, w1.categoryname as parentcat
> from db.schema.transactions trx
> join db.schema.categories cat on (cat.CATEGORYGUID = trx.CATEGORYGUID)
> join db.schema.wpfm_categories w1 on (cat.categoryparentguid = 
> w1.categoryguid)
> where cat.categoryparentguid IS NOT NULL
> {code}
> Results are:
> ||categoryid||categoryname||parentcategory||
> |id1|restaurants|null|
> |id1|restaurants|null|
> |id2|Coffee Shops|null|
> |id2|Coffee Shops|null|
> Physical plan is:
> {code:sql}
> 00-00    Screen : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) 
> categoryname, VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = 
> {110.0 rows, 110.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64293
> 00-01      Project(categoryguid=[$0], categoryname=[$1], parentcat=[$2]) : 
> rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, 
> VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 
> 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64292
> 00-02        Project(categoryguid=[$9], categoryname=[$41], parentcat=[$47]) 
> : rowType = RecordType(VARCHAR(50) categoryguid, VARCHAR(50) categoryname, 
> VARCHAR(50) parentcat): rowcount = 100.0, cumulative cost = {100.0 rows, 
> 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64291
> 00-03          Jdbc(sql=[SELECT *
> FROM "public"."transactions"
> INNER JOIN (SELECT *
> FROM "public"."categories"
> WHERE "categoryparentguid" IS NOT NULL) AS "t" ON 
> "transactions"."categoryguid" = "t"."categoryguid"
> INNER JOIN "public"."categories" AS "categories0" ON "t"."categoryparentguid" 
> = "categories0"."categoryguid"]) : rowType = RecordType(VARCHAR(255) 
> transactionguid, VARCHAR(255) relatedtransactionguid, VARCHAR(255) 
> transactioncode, DECIMAL(1, 0) transactionpending, VARCHAR(50) 
> transactionrefobjecttype, VARCHAR(255) transactionrefobjectguid, 
> VARCHAR(1024) transactionrefobjectvalue, TIMESTAMP(6) transactiondate, 
> VARCHAR(256) transactiondescription, VARCHAR(50) categoryguid, VARCHAR(3) 
> transactioncurrency, DECIMAL(15, 3) transactionoldbalance, DECIMAL(13, 3) 
> transactionamount, DECIMAL(15, 3) transactionnewbalance, VARCHAR(512) 
> transactionnotes, DECIMAL(2, 0) transactioninstrumenttype, VARCHAR(20) 
> transactioninstrumentsubtype, VARCHAR(20) transactioninstrumentcode, 
> VARCHAR(50) transactionorigpartyguid, VARCHAR(255) 
> transactionorigaccountguid, VARCHAR(50) transactionrecpartyguid, VARCHAR(255) 
> transactionrecaccountguid, VARCHAR(256) transactionstatementdesc, DECIMAL(1, 
> 0) transactionsplit, DECIMAL(1, 0) transactionduplicated, DECIMAL(1, 0) 
> transactionrecategorized, TIMESTAMP(6) transactioncreatedat, TIMESTAMP(6) 
> transactionupdatedat, VARCHAR(50) transactionmatrulerefobjtype, VARCHAR(50) 
> transactionmatrulerefobjguid, VARCHAR(50) transactionmatrulerefobjvalue, 
> VARCHAR(50) transactionuserruleguid, DECIMAL(2, 0) transactionsplitorder, 
> TIMESTAMP(6) transactionprocessedat, TIMESTAMP(6) 
> transactioncategoryassignat, VARCHAR(50) transactionsystemcategoryguid, 
> VARCHAR(50) transactionorigmandateid, VARCHAR(100) fingerprint, VARCHAR(50) 
> categoryguid0, VARCHAR(50) categoryparentguid, DECIMAL(3, 0) categorytype, 
> VARCHAR(50) categoryname, VARCHAR(50) categorydescription, VARCHAR(50) 
> partyguid, VARCHAR(50) categoryguid1, VARCHAR(50) categoryparentguid0, 
> DECIMAL(3, 0) categorytype0, VARCHAR(50) categoryname0, VARCHAR(50) 
> categorydescription0, VARCHAR(50) partyguid0): rowcount = 100.0, cumulative 
> cost = {100.0 rows, 100.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 64259
> {code}
> I worked around it by creating a view on postgres but not ideal. Thanks in 
> advance.
> First Drill Jira Bug.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to