Angad Singh created LENS-598:
--------------------------------
Summary: Dimtable selection not honoring requirements of multiple
inter-related join chains properly
Key: LENS-598
URL: https://issues.apache.org/jira/browse/LENS-598
Project: Apache Lens
Issue Type: Bug
Reporter: Angad Singh
Assignee: Amareshwari Sriramadasu
Scenario:
We have a cubeX, dimension1, dimension2
table1, table2 are dimtables of dimension1
table3 is a dimtable of dimension2
We have joinchain1 from cubeX to dimension1 to dimension2, and joinchain2 from
cubeX to dimension1 directly.
Problem is that when a cube query is run which selects columns from cubex,
dimension1 and dimension2, lens should ideally look at requirements of both
joinchain1 and joinchain2 when selecting dimtables of dimension1, but it is not.
At times when it is failing to join table3, it is selecting the wrong dimtable
of dimension1 (table2).
E.g.:
Query:
{noformat}
cube select apps_owned_title from user j where apps_owned.source='raf'
{noformat}
correctly generated hive query (when it selected dimtable table1):
{code:sql}
SELECT ( apps_owned_metadata . apptitle ) FROM
user.uh1_hdfs_user_attributestore_er_cubetable_dimension_gender_state j join
user.uh1_hdfs_user_attributestore_er_cubetable_dimension_bundleid_install
apps_owned on j.userid = apps_owned.userid and (apps_owned.dt = 'latest') join
user.uh1_hdfs_hive_dimtable_uac_app apps_owned_metadata on apps_owned.bundleid
= apps_owned_metadata.bundleid and (apps_owned_metadata.dt = 'latest') WHERE
(((( apps_owned . source ) = 'raf' ) AND ((j.dt = 'latest'))))
{code}
incorrectly generated query:
{code:sql}
SELECT ( apps_owned_metadata . apptitle ) FROM
user.uh1_hdfs_user_attributestore_er_cubetable_dimension_gender_state j join
user.uh1_hdfs_user_attributestore_er_cubetable_dimension_app_install apps_owned
on j.userid = apps_owned.userid and (apps_owned.dt = 'latest') WHERE ((((
apps_owned . source ) = 'raf' ) AND ((apps_owned_metadata.dt = 'latest')) AND
((j.dt = 'latest'))))
{code}
Exception in lensserver logs as a result of above query:
{noformat}
10 Jun 2015 08:56:27,850 [QuerySubmitter] INFO
org.apache.lens.driver.hive.HiveDriver - Hive driver query:INSERT OVERWRITE
DIRECTORY "/tmp/lensreports/hdfsout/1be8556b-64b2-42f0-a32b-45d3be75537f"
SELECT ( apps_owned_metadata . apptitle ) FROM
user.uh1_hdfs_user_attributestore_er_cubetable_dimension_gender_state j join
user.uh1_hdfs_user_attributestore_er_cubetable_dimension_app_install apps_owned
on j.userid = apps_owned.userid and (apps_owned.dt = 'latest') WHERE ((((
apps_owned . source ) = 'raf' ) AND ((apps_owned_metadata.dt = 'latest')) AND
((j.dt = 'latest'))))
10 Jun 2015 08:56:27,851 [QuerySubmitter] INFO
org.apache.lens.driver.hive.HiveDriver - whetherCalculatePriority: true
10 Jun 2015 08:56:27,852 [QuerySubmitter] INFO
org.apache.lens.server.api.priority.QueryPriorityDecider - Deciding Priority
VERY_HIGH since cost = 0.0
10 Jun 2015 08:56:27,852 [QuerySubmitter] WARN
org.apache.hadoop.conf.Configuration - mapred.job.priority is deprecated.
Instead, use mapreduce.job.priority
10 Jun 2015 08:56:27,852 [QuerySubmitter] INFO
org.apache.lens.driver.hive.HiveDriver - set priority to VERY_HIGH
10 Jun 2015 08:56:28,468 [QuerySubmitter] INFO
org.apache.lens.driver.hive.HiveDriver - No hive operation available for
1be8556b-64b2-42f0-a32b-45d3be75537f
10 Jun 2015 08:56:28,469 [QuerySubmitter] ERROR
org.apache.lens.server.query.QueryExecutionServiceImpl - Error launching query
1be8556b-64b2-42f0-a32b-45d3be75537f
org.apache.lens.server.api.error.LensException: Error executing async query
at
org.apache.lens.driver.hive.HiveDriver.executeAsync(HiveDriver.java:524)
at
org.apache.lens.server.query.QueryExecutionServiceImpl$QuerySubmitter.run(QueryExecutionServiceImpl.java:497)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling
statement: FAILED: SemanticException [Error 10004]: Line 1:409 Invalid table
alias or column reference 'apps_owned_metadata': (possible column names are:
j.userid, j.timestamp, j.gender, j.gender_state_score,
j.gender_state_confidence, j.dt, apps_owned.userid, apps_owned.timestamp,
apps_owned.appid, apps_owned.app_install_confidence, apps_owned.dt,
apps_owned.source)
at
org.apache.hive.service.cli.thrift.ThriftCLIServiceClient.checkStatus(ThriftCLIServiceClient.java:52)
at
org.apache.hive.service.cli.thrift.ThriftCLIServiceClient.executeStatementInternal(ThriftCLIServiceClient.java:151)
at
org.apache.hive.service.cli.thrift.ThriftCLIServiceClient.executeStatementAsync(ThriftCLIServiceClient.java:139)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.hive.service.cli.thrift.RetryingThriftCLIServiceClient.invokeInternal(RetryingThriftCLIServiceClient.java:301)
at
org.apache.hive.service.cli.thrift.RetryingThriftCLIServiceClient.invoke(RetryingThriftCLIServiceClient.java:329)
at com.sun.proxy.$Proxy59.executeStatementAsync(Unknown Source)
at
org.apache.hive.service.cli.thrift.RetryingThriftCLIServiceClient$CLIServiceClientWrapper.executeStatementAsync(RetryingThriftCLIServiceClient.java:118)
at
org.apache.lens.driver.hive.HiveDriver.executeAsync(HiveDriver.java:515)
... 2 more
{noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)