[ https://issues.apache.org/jira/browse/LENS-598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Angad Singh updated LENS-598:
-----------------------------
    Description: 
Scenario:

We have a cube cubeX and two dimensions, dimension1 and dimension2.
table1 and table2 are dimtables of dimension1; table3 is a dimtable of dimension2.
joinchain1 goes from cubeX through dimension1 to dimension2, and joinchain2 goes 
from cubeX to dimension1 directly.

The problem is that when a cube query selects columns from cubeX, dimension1, 
and dimension2, Lens should consider the requirements of both joinchain1 and 
joinchain2 when selecting a dimtable for dimension1, but it does not.

At times it selects the wrong dimtable of dimension1 (table2), which cannot be 
joined to table3, so the join to dimension2 is dropped from the generated query.
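The expected selection logic can be sketched as follows. This is illustrative Python, not Lens's actual code; the table names and column sets are hypothetical, mirroring the example below (table1 carries bundleid, which is needed to continue the chain to dimension2, while table2 does not):

```python
# Illustrative sketch (NOT Lens's implementation): when several join chains
# pass through the same dimension, the chosen dimtable must supply the join
# columns required by ALL of those chains, not just one of them.

# Hypothetical column sets, mirroring the example in this report.
DIMTABLES = {
    "dimension1": {
        "table2": {"userid", "appid"},     # lacks bundleid
        "table1": {"userid", "bundleid"},  # can join onward to table3
    },
}

def pick_dimtable(dimension, join_chains):
    """Pick a dimtable satisfying the union of column requirements of
    every join chain that touches this dimension."""
    required = set()
    for chain in join_chains:
        required |= chain.get(dimension, set())
    for name, columns in DIMTABLES[dimension].items():
        if required <= columns:
            return name
    return None

# joinchain1: cubeX -> dimension1 -> dimension2 (needs bundleid on dimension1)
# joinchain2: cubeX -> dimension1 (needs only userid)
joinchain1 = {"dimension1": {"userid", "bundleid"}}
joinchain2 = {"dimension1": {"userid"}}

# Considering both chains forces table1; considering only joinchain2's
# requirements wrongly allows table2, which is the bug described below.
print(pick_dimtable("dimension1", [joinchain1, joinchain2]))  # table1
print(pick_dimtable("dimension1", [joinchain2]))              # table2
```

The bug is reproduced by the second call: if the requirements of joinchain1 are ignored, table2 looks like a valid candidate even though it breaks the path to dimension2.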

Example query:
{noformat}
cube select apps_owned_title from user j where apps_owned.source='raf'
{noformat}

Correctly generated Hive query (when dimtable table1 is selected):
{code:sql}
SELECT ( apps_owned_metadata . apptitle ) FROM 
user.uh1_hdfs_user_attributestore_er_cubetable_dimension_gender_state j join 
user.uh1_hdfs_user_attributestore_er_cubetable_dimension_bundleid_install 
apps_owned on j.userid = apps_owned.userid and (apps_owned.dt = 'latest') join 
user.uh1_hdfs_hive_dimtable_uac_app apps_owned_metadata on apps_owned.bundleid 
= apps_owned_metadata.bundleid and (apps_owned_metadata.dt = 'latest') WHERE 
(((( apps_owned . source ) =  'raf' ) AND ((j.dt = 'latest'))))
{code}

Incorrectly generated query (when dimtable table2 is selected; note the missing 
join to apps_owned_metadata, which is still referenced in the WHERE clause):
{code:sql}
SELECT ( apps_owned_metadata . apptitle ) FROM 
user.uh1_hdfs_user_attributestore_er_cubetable_dimension_gender_state j join 
user.uh1_hdfs_user_attributestore_er_cubetable_dimension_app_install apps_owned 
on j.userid = apps_owned.userid and (apps_owned.dt = 'latest') WHERE (((( 
apps_owned . source ) =  'raf' ) AND ((apps_owned_metadata.dt = 'latest')) AND 
((j.dt = 'latest')))) 
{code}

Exception in the Lens server logs as a result of the above query:
{noformat}
10 Jun 2015 08:56:27,850 [QuerySubmitter] INFO  
org.apache.lens.driver.hive.HiveDriver  - Hive driver query:INSERT OVERWRITE 
DIRECTORY "/tmp/lensreports/hdfsout/1be8556b-64b2-42f0-a32b-45d3be75537f"  
SELECT ( apps_owned_metadata . apptitle ) FROM 
user.uh1_hdfs_user_attributestore_er_cubetable_dimension_gender_state j join 
user.uh1_hdfs_user_attributestore_er_cubetable_dimension_app_install apps_owned 
on j.userid = apps_owned.userid and (apps_owned.dt = 'latest') WHERE (((( 
apps_owned . source ) =  'raf' ) AND ((apps_owned_metadata.dt = 'latest')) AND 
((j.dt = 'latest')))) 
10 Jun 2015 08:56:27,851 [QuerySubmitter] INFO  
org.apache.lens.driver.hive.HiveDriver  - whetherCalculatePriority: true
10 Jun 2015 08:56:27,852 [QuerySubmitter] INFO  
org.apache.lens.server.api.priority.QueryPriorityDecider  - Deciding Priority 
VERY_HIGH since cost = 0.0
10 Jun 2015 08:56:27,852 [QuerySubmitter] WARN  
org.apache.hadoop.conf.Configuration  - mapred.job.priority is deprecated. 
Instead, use mapreduce.job.priority
10 Jun 2015 08:56:27,852 [QuerySubmitter] INFO  
org.apache.lens.driver.hive.HiveDriver  - set priority to VERY_HIGH
10 Jun 2015 08:56:28,468 [QuerySubmitter] INFO  
org.apache.lens.driver.hive.HiveDriver  - No hive operation available for 
1be8556b-64b2-42f0-a32b-45d3be75537f
10 Jun 2015 08:56:28,469 [QuerySubmitter] ERROR 
org.apache.lens.server.query.QueryExecutionServiceImpl  - Error launching query 
1be8556b-64b2-42f0-a32b-45d3be75537f
org.apache.lens.server.api.error.LensException: Error executing async query
        at 
org.apache.lens.driver.hive.HiveDriver.executeAsync(HiveDriver.java:524)
        at 
org.apache.lens.server.query.QueryExecutionServiceImpl$QuerySubmitter.run(QueryExecutionServiceImpl.java:497)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling 
statement: FAILED: SemanticException [Error 10004]: Line 1:409 Invalid table 
alias or column reference 'apps_owned_metadata': (possible column names are: 
j.userid, j.timestamp, j.gender, j.gender_state_score, 
j.gender_state_confidence, j.dt, apps_owned.userid, apps_owned.timestamp, 
apps_owned.appid, apps_owned.app_install_confidence, apps_owned.dt, 
apps_owned.source)
        at 
org.apache.hive.service.cli.thrift.ThriftCLIServiceClient.checkStatus(ThriftCLIServiceClient.java:52)
        at 
org.apache.hive.service.cli.thrift.ThriftCLIServiceClient.executeStatementInternal(ThriftCLIServiceClient.java:151)
        at 
org.apache.hive.service.cli.thrift.ThriftCLIServiceClient.executeStatementAsync(ThriftCLIServiceClient.java:139)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at 
org.apache.hive.service.cli.thrift.RetryingThriftCLIServiceClient.invokeInternal(RetryingThriftCLIServiceClient.java:301)
        at 
org.apache.hive.service.cli.thrift.RetryingThriftCLIServiceClient.invoke(RetryingThriftCLIServiceClient.java:329)
        at com.sun.proxy.$Proxy59.executeStatementAsync(Unknown Source)
        at 
org.apache.hive.service.cli.thrift.RetryingThriftCLIServiceClient$CLIServiceClientWrapper.executeStatementAsync(RetryingThriftCLIServiceClient.java:118)
        at 
org.apache.lens.driver.hive.HiveDriver.executeAsync(HiveDriver.java:515)
        ... 2 more
{noformat}

Note: in the above example, cubeX is "user", dimension1 is "user_owned_apps", 
and dimension2 is "hive_dimension_uac_app". table1 is 
"user_attributestore_er_cubetable_dimension_bundleid_install", table2 is 
"user_attributestore_er_cubetable_dimension_app_install", and table3 is 
"hive_dimtable_uac_app".


> Dimtable selection not honoring requirements of multiple inter-related join 
> chains properly
> -------------------------------------------------------------------------------------------
>
>                 Key: LENS-598
>                 URL: https://issues.apache.org/jira/browse/LENS-598
>             Project: Apache Lens
>          Issue Type: Bug
>            Reporter: Angad Singh
>            Assignee: Amareshwari Sriramadasu
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)