[jira] [Commented] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug
[ https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332713#comment-16332713 ]

ASF subversion and git services commented on ASTERIXDB-2199:

Commit 0a7233d8f810cb1da46361ab3cd1ed57e8efdf65 in asterixdb's branch refs/heads/master from [~sjaco002]
[ https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;h=0a7233d ]

[ASTERIXDB-2199][COMP] Fix PushFieldAccessRule for nested partitioning keys

Fixes an issue where nested partitioning keys were ignored by PushFieldAccessRule
Added Test and fixed changed plans

Change-Id: I874c1fd15719b6bdeb7b0913fbafc04a58d32ed4
Reviewed-on: https://asterix-gerrit.ics.uci.edu/2246
Integration-Tests: Jenkins
Tested-by: Jenkins
Contrib: Jenkins
Reviewed-by: Ildar Absalyamov

> Nested primary key and hash repartitioning bug
> ---
>
> Key: ASTERIXDB-2199
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2199
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: *DB - AsterixDB
> Reporter: Shiva Jahangiri
> Assignee: Steven Jacobs
> Priority: Major
>
> If a join is happening on primary keys of two tables, no hash partitioning
> should happen. Having the following DDL(Note that primary key of Friendship2
> is string):
> DROP DATAVERSE Facebook IF EXISTS;
> CREATE DATAVERSE Facebook;
> Use Facebook;
> CREATE TYPE FriendshipType AS closed {
>   id:string,
>   friends :[string]
> };
> CREATE DATASET Friendship2(FriendshipType)
> PRIMARY KEY id;
> insert into Friendship2([ {"id":"1","friends" : [ "2","3","4"]},
> {"id":"2","friends" : [ "4","5","6"]}
> ]);
> By running the following query:
> Use Facebook;
> select * from Friendship2 first, Friendship2 second where first.id =
> second.id;
> we can see that there is no hash partitioning happening in optimized logical
> plan which is correct as join is happening on the primary key of both
> relations and data is already partitioned on primary key:
> {
>   "operator":"distribute-result",
>   "expressions":"$$9",
>   "operatorId" : "1.1",
>   "physical-operator":"DISTRIBUTE_RESULT",
>   "execution-mode":"PARTITIONED",
>   "inputs":[
>     {
>       "operator":"exchange",
>       "operatorId" : "1.2",
>       "physical-operator":"ONE_TO_ONE_EXCHANGE",
>       "execution-mode":"PARTITIONED",
>       "inputs":[
>         {
>           "operator":"project",
>           "variables" :["$$9"],
>           "operatorId" : "1.3",
>           "physical-operator":"STREAM_PROJECT",
>           "execution-mode":"PARTITIONED",
>           "inputs":[
>             {
>               "operator":"assign",
>               "variables" :["$$9"],
>               "expressions":"{ first : $$first, second : $$second}",
>               "operatorId" : "1.4",
>               "physical-operator":"ASSIGN",
>               "execution-mode":"PARTITIONED",
>               "inputs":[
>                 {
>                   "operator":"project",
>                   "variables" :["$$first","$$second"],
>                   "operatorId" : "1.5",
>                   "physical-operator":"STREAM_PROJECT",
>                   "execution-mode":"PARTITIONED",
>                   "inputs":[
>                     {
>                       "operator":"exchange",
>                       "operatorId" : "1.6",
>                       "physical-operator":"ONE_TO_ONE_EXCHANGE",
>                       "execution-mode":"PARTITIONED",
>                       "inputs":[
>                         {
>                           "operator":"join",
>                           "condition":"eq($$10, $$11)",
>                           "operatorId" : "1.7",
>                           "physical-operator":"HYBRID_HASH_JOIN [$$10][$$11]",
>                           "execution-mode":"PARTITIONED",
>                           "inputs":[
>                             {
>                               "operator":"exchange",
>                               "operatorId" : "1.8",
>                               "physical-operator":"ONE_TO_ONE_EXCHANGE",
>                               "execution-mode":"PARTITIONED",
>                               "inputs":[
>                                 {
>                                   "operator":"data-scan",
>                                   "variables" :["$$10","$$first"],
>                                   "data-source":"Facebook.Friendship2",
>                                   "operatorId" : "1.9",
>
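The rule stated in the issue description (a join on the keys the data is already partitioned on needs no hash repartitioning) can be sketched roughly as follows. This is a simplified illustration, not AsterixDB's optimizer code: the function name `exchange_for_join_input` and the representation of keys as tuples of field names are assumptions made for the example, and the real optimizer reasons over physical properties and equivalence classes rather than plain tuple equality.

```python
from typing import Sequence, Tuple

# Hypothetical representation: a key field is a path of field names,
# e.g. ("id",) for id or ("person", "id") for person.id.
FieldPath = Tuple[str, ...]


def exchange_for_join_input(partitioning_key: Sequence[FieldPath],
                            join_key: Sequence[FieldPath]) -> str:
    """Pick the exchange a join input needs (simplified assumption).

    If the input is already partitioned on exactly the join key, tuples
    that can match are co-located and a one-to-one exchange is enough.
    Otherwise the input has to be hash repartitioned on the join key.
    """
    if tuple(partitioning_key) == tuple(join_key):
        return "ONE_TO_ONE_EXCHANGE"
    return "HASH_PARTITION_EXCHANGE"


# Flat key from the issue description: Friendship2 is partitioned on id
# and the join is on id.
print(exchange_for_join_input([("id",)], [("id",)]))

# A nested key whose full path is not recognized: the data is partitioned
# on person.id, but the join key is only seen as person.
print(exchange_for_join_input([("person", "id")], [("person",)]))
```

Under these assumptions, the nested-key case falls through to a hash exchange whenever the full key path is not matched, which is the kind of unnecessary repartitioning this issue reports.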
[jira] [Commented] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug
[ https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310011#comment-16310011 ] Steven Jacobs commented on ASTERIXDB-2199: -- [~wyk] Can you go ahead and file this separately and assign to me? I'll go ahead and add it to the same CR though since it's in the same place of the code. > Nested primary key and hash repartitioning bug > --- > > Key: ASTERIXDB-2199 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2199 > Project: Apache AsterixDB > Issue Type: Bug > Components: *DB - AsterixDB >Reporter: Shiva Jahangiri >Assignee: Steven Jacobs > > If a join is happening on primary keys of two tables, no hash partitioning > should happen. Having the following DDL(Note that primary key of Friendship2 > is string): > DROP DATAVERSE Facebook IF EXISTS; > CREATE DATAVERSE Facebook; > Use Facebook; > CREATE TYPE FriendshipType AS closed { > id:string, > friends :[string] > }; > CREATE DATASET Friendship2(FriendshipType) > PRIMARY KEY id; > insert into Friendship2([ {"id":"1","friends" : [ "2","3","4"]}, > {"id":"2","friends" : [ "4","5","6"]} > ]); > By running the following query: > Use Facebook; > select * from Friendship2 first, Friendship2 second where first.id = > second.id; > we can see that there is no hash partitioning happening in optimized logical > plan which is correct as join is happening on the primary key of both > relations and data is already partitioned on primary key: > { > "operator":"distribute-result", > "expressions":"$$9", > "operatorId" : "1.1", > "physical-operator":"DISTRIBUTE_RESULT", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"exchange", > "operatorId" : "1.2", > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ > { >"operator":"project", >"variables" :["$$9"], >"operatorId" : "1.3", >"physical-operator":"STREAM_PROJECT", >"execution-mode":"PARTITIONED", >"inputs":[ >{ > "operator":"assign", > "variables" 
:["$$9"], > "expressions":"{ first : $$first, second : $$second}", > "operatorId" : "1.4", > "physical-operator":"ASSIGN", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"project", > "variables" :["$$first","$$second"], > "operatorId" : "1.5", > "physical-operator":"STREAM_PROJECT", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"exchange", > "operatorId" : "1.6", > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ > { >"operator":"join", >"condition":"eq($$10, $$11)", >"operatorId" : "1.7", >"physical-operator":"HYBRID_HASH_JOIN > [$$10][$$11]", >"execution-mode":"PARTITIONED", >"inputs":[ >{ > "operator":"exchange", > "operatorId" : "1.8", > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"data-scan", > "variables" :["$$10","$$first"], > "data-source":"Facebook.Friendship2", > "operatorId" : "1.9", > > "physical-operator":"DATASOURCE_SCAN", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"exchange", > "operatorId" : "1.10", > > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ >
[jira] [Commented] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug
[ https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16306655#comment-16306655 ]

Wail Alkowaileet commented on ASTERIXDB-2199:

Yes, similar to the issue in this JIRA. However, the fix doesn't recognize the key expression if it's in the form
{noformat}
$assignVar.getField(0)
{noformat}
where $assignVar = $unnestVar.getField(0). It only recognizes the key if it is an inlined nested field access
{noformat}
$unnestVar.getField(0).getField(0)
{noformat}
where $unnestVar is the scan variable from data-scan.

> Nested primary key and hash repartitioning bug > --- > > Key: ASTERIXDB-2199 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2199 > Project: Apache AsterixDB > Issue Type: Bug > Components: *DB - AsterixDB >Reporter: Shiva Jahangiri >Assignee: Steven Jacobs > > If a join is happening on primary keys of two tables, no hash partitioning > should happen. Having the following DDL(Note that primary key of Friendship2 > is string): > DROP DATAVERSE Facebook IF EXISTS; > CREATE DATAVERSE Facebook; > Use Facebook; > CREATE TYPE FriendshipType AS closed { > id:string, > friends :[string] > }; > CREATE DATASET Friendship2(FriendshipType) > PRIMARY KEY id; > insert into Friendship2([ {"id":"1","friends" : [ "2","3","4"]}, > {"id":"2","friends" : [ "4","5","6"]} > ]); > By running the following query: > Use Facebook; > select * from Friendship2 first, Friendship2 second where first.id = > second.id; > we can see that there is no hash partitioning happening in optimized logical > plan which is correct as join is happening on the primary key of both > relations and data is already partitioned on primary key: > { > "operator":"distribute-result", > "expressions":"$$9", > "operatorId" : "1.1", > "physical-operator":"DISTRIBUTE_RESULT", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"exchange", > "operatorId" : "1.2", > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", 
> "inputs":[ > { >"operator":"project", >"variables" :["$$9"], >"operatorId" : "1.3", >"physical-operator":"STREAM_PROJECT", >"execution-mode":"PARTITIONED", >"inputs":[ >{ > "operator":"assign", > "variables" :["$$9"], > "expressions":"{ first : $$first, second : $$second}", > "operatorId" : "1.4", > "physical-operator":"ASSIGN", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"project", > "variables" :["$$first","$$second"], > "operatorId" : "1.5", > "physical-operator":"STREAM_PROJECT", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"exchange", > "operatorId" : "1.6", > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ > { >"operator":"join", >"condition":"eq($$10, $$11)", >"operatorId" : "1.7", >"physical-operator":"HYBRID_HASH_JOIN > [$$10][$$11]", >"execution-mode":"PARTITIONED", >"inputs":[ >{ > "operator":"exchange", > "operatorId" : "1.8", > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"data-scan", > "variables" :["$$10","$$first"], > "data-source":"Facebook.Friendship2", > "operatorId" : "1.9", > > "physical-operator":"DATASOURCE_SCAN", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"exchange", > "operatorId" : "1.10", >
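The difference Wail describes between the inlined form and the form split across an assign variable can be illustrated with a small sketch. It is not the PushFieldAccessRule code: the helper `resolve_path` and the dictionary of assign definitions are hypothetical, chosen only to show how a chain of assigns could be followed back to the scan variable so that both forms yield the same field path.

```python
from typing import Dict, Optional, Tuple

# Hypothetical model: each assigned variable maps to (base_variable, index),
# standing for base_variable.getField(index).
FieldAccess = Tuple[str, int]


def resolve_path(var: str,
                 assigns: Dict[str, FieldAccess],
                 scan_var: str) -> Optional[Tuple[int, ...]]:
    """Follow assign definitions from var back to scan_var.

    Returns the field-index path from the scan variable (outermost access
    first), or None if var does not lead back to scan_var.
    """
    path = []
    seen = set()
    while var != scan_var:
        if var in seen or var not in assigns:
            return None  # cycle, or a variable not derived from the scan
        seen.add(var)
        base, index = assigns[var]
        path.append(index)
        var = base
    return tuple(reversed(path))


# $assignVar = $unnestVar.getField(0); $key = $assignVar.getField(0)
assigns = {
    "$assignVar": ("$unnestVar", 0),
    "$key": ("$assignVar", 0),
}

# Assumed index path of the nested primary key within the record.
primary_key_path = (0, 0)

# The split form resolves to the same path as the inlined
# $unnestVar.getField(0).getField(0).
print(resolve_path("$key", assigns, "$unnestVar") == primary_key_path)
```

A rule that compares only the immediate expression against the key would see `$assignVar.getField(0)` and miss the match; resolving through the assign first is one way the two forms could be treated alike.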
[jira] [Commented] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug
[ https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16306415#comment-16306415 ] Steven Jacobs commented on ASTERIXDB-2199: -- I'm guessing that this bug happens in Master as well? If so, I guess it would be another issue? > Nested primary key and hash repartitioning bug > --- > > Key: ASTERIXDB-2199 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2199 > Project: Apache AsterixDB > Issue Type: Bug > Components: *DB - AsterixDB >Reporter: Shiva Jahangiri >Assignee: Steven Jacobs > > If a join is happening on primary keys of two tables, no hash partitioning > should happen. Having the following DDL(Note that primary key of Friendship2 > is string): > DROP DATAVERSE Facebook IF EXISTS; > CREATE DATAVERSE Facebook; > Use Facebook; > CREATE TYPE FriendshipType AS closed { > id:string, > friends :[string] > }; > CREATE DATASET Friendship2(FriendshipType) > PRIMARY KEY id; > insert into Friendship2([ {"id":"1","friends" : [ "2","3","4"]}, > {"id":"2","friends" : [ "4","5","6"]} > ]); > By running the following query: > Use Facebook; > select * from Friendship2 first, Friendship2 second where first.id = > second.id; > we can see that there is no hash partitioning happening in optimized logical > plan which is correct as join is happening on the primary key of both > relations and data is already partitioned on primary key: > { > "operator":"distribute-result", > "expressions":"$$9", > "operatorId" : "1.1", > "physical-operator":"DISTRIBUTE_RESULT", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"exchange", > "operatorId" : "1.2", > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ > { >"operator":"project", >"variables" :["$$9"], >"operatorId" : "1.3", >"physical-operator":"STREAM_PROJECT", >"execution-mode":"PARTITIONED", >"inputs":[ >{ > "operator":"assign", > "variables" :["$$9"], > "expressions":"{ first : $$first, second : $$second}", 
> "operatorId" : "1.4", > "physical-operator":"ASSIGN", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"project", > "variables" :["$$first","$$second"], > "operatorId" : "1.5", > "physical-operator":"STREAM_PROJECT", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"exchange", > "operatorId" : "1.6", > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ > { >"operator":"join", >"condition":"eq($$10, $$11)", >"operatorId" : "1.7", >"physical-operator":"HYBRID_HASH_JOIN > [$$10][$$11]", >"execution-mode":"PARTITIONED", >"inputs":[ >{ > "operator":"exchange", > "operatorId" : "1.8", > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"data-scan", > "variables" :["$$10","$$first"], > "data-source":"Facebook.Friendship2", > "operatorId" : "1.9", > > "physical-operator":"DATASOURCE_SCAN", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"exchange", > "operatorId" : "1.10", > > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ > { >
[jira] [Commented] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug
[ https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305941#comment-16305941 ]

Wail Alkowaileet commented on ASTERIXDB-2199:

[~dtabass] It's a different issue as Steven pointed out...
[~sjaco002] I found another bug in the current fix in (https://asterix-gerrit.ics.uci.edu/#/c/2246/). I'm not sure if I should file it as a different issue.
The issue arises when there's a common expression with the key getField() expression and the key getField() is no longer inlined.

Reproduce DDL:
{noformat}
DROP DATAVERSE Facebook IF EXISTS;
CREATE DATAVERSE Facebook;
Use Facebook;
CREATE TYPE PersonType AS closed {
  id:string,
  name:string
};
CREATE TYPE FriendshipType AS closed {
  person : PersonType,
  Friends :[PersonType]
};
/* Creating Datasets */
CREATE DATASET Person(PersonType)
PRIMARY KEY id;
CREATE DATASET Friendship(FriendshipType)
PRIMARY KEY person.id;
{noformat}

Query:
{noformat}
Use Facebook;
select first.person.name as n1, second.person.name as n2
from Friendship first, Friendship second
where first.person.id = second.person.id;
{noformat}

Plan:
{noformat}
distribute result [$$29]
-- DISTRIBUTE_RESULT |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
    project ([$$29])
    -- STREAM_PROJECT |PARTITIONED|
      assign [$$29] <- [{"n1": $$36, "n2": $$37}]
      -- ASSIGN |PARTITIONED|
        project ([$$36, $$37])
        -- STREAM_PROJECT |PARTITIONED|
          exchange
          -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
            join (eq($$34, $$35))
            -- HYBRID_HASH_JOIN [$$34][$$35] |PARTITIONED|
              exchange
              -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                project ([$$36, $$34])
                -- STREAM_PROJECT |PARTITIONED|
                  assign [$$36, $$34] <- [$$37, $$35]
                  -- ASSIGN |PARTITIONED|
                    exchange
                    -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                      replicate
                      -- REPLICATE |PARTITIONED|
                        exchange
                        -- HASH_PARTITION_EXCHANGE [$$35] |PARTITIONED|
                          project ([$$37, $$35])
                          -- STREAM_PROJECT |PARTITIONED|
                            assign [$$37, $$35] <- [$$31.getField(1), $$31.getField(0)]
                            -- ASSIGN |PARTITIONED|
                              project ([$$31])
                              -- STREAM_PROJECT |PARTITIONED|
                                assign [$$31] <- [$$second.getField(0)]
                                -- ASSIGN |PARTITIONED|
                                  project ([$$second])
                                  -- STREAM_PROJECT |PARTITIONED|
                                    exchange
                                    -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                                      data-scan []<-[$$33, $$second] <- Facebook.Friendship
                                      -- DATASOURCE_SCAN |PARTITIONED|
                                        exchange
                                        -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                                          empty-tuple-source
                                          -- EMPTY_TUPLE_SOURCE |PARTITIONED|
              exchange
              -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                replicate
                -- REPLICATE |PARTITIONED|
                  exchange
                  -- HASH_PARTITION_EXCHANGE [$$35] |PARTITIONED|
                    project ([$$37, $$35])
                    -- STREAM_PROJECT |PARTITIONED|
                      assign [$$37, $$35] <- [$$31.getField(1), $$31.getField(0)]
                      -- ASSIGN |PARTITIONED|
                        project ([$$31])
                        -- STREAM_PROJECT |PARTITIONED|
                          assign [$$31] <- [$$second.getField(0)]
                          -- ASSIGN |PARTITIONED|
                            project ([$$second])
                            -- STREAM_PROJECT |PARTITIONED|
                              exchange
                              -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                                data-scan []<-[$$33, $$second] <- Facebook.Friendship
                                -- DATASOURCE_SCAN |PARTITIONED|
                                  exchange
                                  -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                                    empty-tuple-source
                                    -- EMPTY_TUPLE_SOURCE |PARTITIONED|
{noformat}

> Nested primary key and hash repartitioning bug
> ---
>
> Key: ASTERIXDB-2199
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2199
> Project: Apache AsterixDB
> Issue Type: Bug
>
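When checking plans like the one above for unwanted repartitioning, a quick text scan can help. The following is a rough sketch that only relies on the textual plan format shown in this thread; the helper name `hash_exchange_keys` is made up for the example, and it is not part of AsterixDB or its test framework.

```python
import re
from typing import List


def hash_exchange_keys(plan_text: str) -> List[str]:
    """Return the key list of every HASH_PARTITION_EXCHANGE in a textual plan.

    An empty result means the plan does not hash repartition at all.
    """
    return re.findall(r"HASH_PARTITION_EXCHANGE\s*\[([^\]]*)\]", plan_text)


# Fragment taken from the plan in the comment above.
fragment = """
join (eq($$34, $$35))
-- HYBRID_HASH_JOIN [$$34][$$35] |PARTITIONED|
  exchange
  -- HASH_PARTITION_EXCHANGE [$$35] |PARTITIONED|
"""

print(hash_exchange_keys(fragment))
```

For a join on the primary key of both inputs, the expectation in this issue is that such a scan finds nothing.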
[jira] [Commented] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug
[ https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305926#comment-16305926 ]

Steven Jacobs commented on ASTERIXDB-2199:

Mike: See my comment from yesterday. The fix for this issue is waiting for review. Wail brought up something that might be a separate issue; he is going to check whether it exists.

> Nested primary key and hash repartitioning bug > --- > > Key: ASTERIXDB-2199 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2199 > Project: Apache AsterixDB > Issue Type: Bug > Components: *DB - AsterixDB >Reporter: Shiva Jahangiri >Assignee: Steven Jacobs > > If a join is happening on primary keys of two tables, no hash partitioning > should happen. Having the following DDL(Note that primary key of Friendship2 > is string): > DROP DATAVERSE Facebook IF EXISTS; > CREATE DATAVERSE Facebook; > Use Facebook; > CREATE TYPE FriendshipType AS closed { > id:string, > friends :[string] > }; > CREATE DATASET Friendship2(FriendshipType) > PRIMARY KEY id; > insert into Friendship2([ {"id":"1","friends" : [ "2","3","4"]}, > {"id":"2","friends" : [ "4","5","6"]} > ]); > By running the following query: > Use Facebook; > select * from Friendship2 first, Friendship2 second where first.id = > second.id; > we can see that there is no hash partitioning happening in optimized logical > plan which is correct as join is happening on the primary key of both > relations and data is already partitioned on primary key: > { > "operator":"distribute-result", > "expressions":"$$9", > "operatorId" : "1.1", > "physical-operator":"DISTRIBUTE_RESULT", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"exchange", > "operatorId" : "1.2", > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ > { >"operator":"project", >"variables" :["$$9"], >"operatorId" : "1.3", >"physical-operator":"STREAM_PROJECT", >"execution-mode":"PARTITIONED", >"inputs":[ >{ > 
"operator":"assign", > "variables" :["$$9"], > "expressions":"{ first : $$first, second : $$second}", > "operatorId" : "1.4", > "physical-operator":"ASSIGN", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"project", > "variables" :["$$first","$$second"], > "operatorId" : "1.5", > "physical-operator":"STREAM_PROJECT", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"exchange", > "operatorId" : "1.6", > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ > { >"operator":"join", >"condition":"eq($$10, $$11)", >"operatorId" : "1.7", >"physical-operator":"HYBRID_HASH_JOIN > [$$10][$$11]", >"execution-mode":"PARTITIONED", >"inputs":[ >{ > "operator":"exchange", > "operatorId" : "1.8", > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"data-scan", > "variables" :["$$10","$$first"], > "data-source":"Facebook.Friendship2", > "operatorId" : "1.9", > > "physical-operator":"DATASOURCE_SCAN", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"exchange", > "operatorId" : "1.10", > > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[
[jira] [Commented] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug
[ https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305921#comment-16305921 ] Michael J. Carey commented on ASTERIXDB-2199: - I think Steven may have fixed it already? > Nested primary key and hash repartitioning bug > --- > > Key: ASTERIXDB-2199 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2199 > Project: Apache AsterixDB > Issue Type: Bug > Components: *DB - AsterixDB >Reporter: Shiva Jahangiri >Assignee: Steven Jacobs > > If a join is happening on primary keys of two tables, no hash partitioning > should happen. Having the following DDL(Note that primary key of Friendship2 > is string): > DROP DATAVERSE Facebook IF EXISTS; > CREATE DATAVERSE Facebook; > Use Facebook; > CREATE TYPE FriendshipType AS closed { > id:string, > friends :[string] > }; > CREATE DATASET Friendship2(FriendshipType) > PRIMARY KEY id; > insert into Friendship2([ {"id":"1","friends" : [ "2","3","4"]}, > {"id":"2","friends" : [ "4","5","6"]} > ]); > By running the following query: > Use Facebook; > select * from Friendship2 first, Friendship2 second where first.id = > second.id; > we can see that there is no hash partitioning happening in optimized logical > plan which is correct as join is happening on the primary key of both > relations and data is already partitioned on primary key: > { > "operator":"distribute-result", > "expressions":"$$9", > "operatorId" : "1.1", > "physical-operator":"DISTRIBUTE_RESULT", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"exchange", > "operatorId" : "1.2", > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ > { >"operator":"project", >"variables" :["$$9"], >"operatorId" : "1.3", >"physical-operator":"STREAM_PROJECT", >"execution-mode":"PARTITIONED", >"inputs":[ >{ > "operator":"assign", > "variables" :["$$9"], > "expressions":"{ first : $$first, second : $$second}", > "operatorId" : "1.4", > 
"physical-operator":"ASSIGN", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"project", > "variables" :["$$first","$$second"], > "operatorId" : "1.5", > "physical-operator":"STREAM_PROJECT", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"exchange", > "operatorId" : "1.6", > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ > { >"operator":"join", >"condition":"eq($$10, $$11)", >"operatorId" : "1.7", >"physical-operator":"HYBRID_HASH_JOIN > [$$10][$$11]", >"execution-mode":"PARTITIONED", >"inputs":[ >{ > "operator":"exchange", > "operatorId" : "1.8", > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"data-scan", > "variables" :["$$10","$$first"], > "data-source":"Facebook.Friendship2", > "operatorId" : "1.9", > > "physical-operator":"DATASOURCE_SCAN", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"exchange", > "operatorId" : "1.10", > > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ > { > >
[jira] [Commented] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug
[ https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305872#comment-16305872 ] Wail Alkowaileet commented on ASTERIXDB-2199: - Will look into it.. Thanks! > Nested primary key and hash repartitioning bug > --- > > Key: ASTERIXDB-2199 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2199 > Project: Apache AsterixDB > Issue Type: Bug > Components: *DB - AsterixDB >Reporter: Shiva Jahangiri >Assignee: Steven Jacobs > > If a join is happening on primary keys of two tables, no hash partitioning > should happen. Having the following DDL(Note that primary key of Friendship2 > is string): > DROP DATAVERSE Facebook IF EXISTS; > CREATE DATAVERSE Facebook; > Use Facebook; > CREATE TYPE FriendshipType AS closed { > id:string, > friends :[string] > }; > CREATE DATASET Friendship2(FriendshipType) > PRIMARY KEY id; > insert into Friendship2([ {"id":"1","friends" : [ "2","3","4"]}, > {"id":"2","friends" : [ "4","5","6"]} > ]); > By running the following query: > Use Facebook; > select * from Friendship2 first, Friendship2 second where first.id = > second.id; > we can see that there is no hash partitioning happening in optimized logical > plan which is correct as join is happening on the primary key of both > relations and data is already partitioned on primary key: > { > "operator":"distribute-result", > "expressions":"$$9", > "operatorId" : "1.1", > "physical-operator":"DISTRIBUTE_RESULT", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"exchange", > "operatorId" : "1.2", > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ > { >"operator":"project", >"variables" :["$$9"], >"operatorId" : "1.3", >"physical-operator":"STREAM_PROJECT", >"execution-mode":"PARTITIONED", >"inputs":[ >{ > "operator":"assign", > "variables" :["$$9"], > "expressions":"{ first : $$first, second : $$second}", > "operatorId" : "1.4", > "physical-operator":"ASSIGN", > 
"execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"project", > "variables" :["$$first","$$second"], > "operatorId" : "1.5", > "physical-operator":"STREAM_PROJECT", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"exchange", > "operatorId" : "1.6", > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ > { >"operator":"join", >"condition":"eq($$10, $$11)", >"operatorId" : "1.7", >"physical-operator":"HYBRID_HASH_JOIN > [$$10][$$11]", >"execution-mode":"PARTITIONED", >"inputs":[ >{ > "operator":"exchange", > "operatorId" : "1.8", > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"data-scan", > "variables" :["$$10","$$first"], > "data-source":"Facebook.Friendship2", > "operatorId" : "1.9", > > "physical-operator":"DATASOURCE_SCAN", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"exchange", > "operatorId" : "1.10", > > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ > { > > "operator":"empty-tuple-source", >
[jira] [Commented] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug
[ https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16304721#comment-16304721 ] Steven Jacobs commented on ASTERIXDB-2199: -- [~wyk] Wail: I have a CR waiting for review to address Shiva's filed issue (https://asterix-gerrit.ics.uci.edu/#/c/2246/). The solution doesn't involve EquivalenceClassUtils.addEquivalenceClassesForPrimaryIndexAccess(), but I think I see the code block that you are talking about, and maybe there is a potential optimization issue there as well. If you can find a query that produces a bug through this method, can you file it as a separate issue? > Nested primary key and hash repartitioning bug > --- > > Key: ASTERIXDB-2199 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2199 > Project: Apache AsterixDB > Issue Type: Bug > Components: *DB - AsterixDB >Reporter: Shiva Jahangiri >Assignee: Steven Jacobs > > If a join is happening on primary keys of two tables, no hash partitioning > should happen. 
Having the following DDL(Note that primary key of Friendship2 > is string): > DROP DATAVERSE Facebook IF EXISTS; > CREATE DATAVERSE Facebook; > Use Facebook; > CREATE TYPE FriendshipType AS closed { > id:string, > friends :[string] > }; > CREATE DATASET Friendship2(FriendshipType) > PRIMARY KEY id; > insert into Friendship2([ {"id":"1","friends" : [ "2","3","4"]}, > {"id":"2","friends" : [ "4","5","6"]} > ]); > By running the following query: > Use Facebook; > select * from Friendship2 first, Friendship2 second where first.id = > second.id; > we can see that there is no hash partitioning happening in optimized logical > plan which is correct as join is happening on the primary key of both > relations and data is already partitioned on primary key: > { > "operator":"distribute-result", > "expressions":"$$9", > "operatorId" : "1.1", > "physical-operator":"DISTRIBUTE_RESULT", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"exchange", > "operatorId" : "1.2", > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ > { >"operator":"project", >"variables" :["$$9"], >"operatorId" : "1.3", >"physical-operator":"STREAM_PROJECT", >"execution-mode":"PARTITIONED", >"inputs":[ >{ > "operator":"assign", > "variables" :["$$9"], > "expressions":"{ first : $$first, second : $$second}", > "operatorId" : "1.4", > "physical-operator":"ASSIGN", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"project", > "variables" :["$$first","$$second"], > "operatorId" : "1.5", > "physical-operator":"STREAM_PROJECT", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"exchange", > "operatorId" : "1.6", > "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ > { >"operator":"join", >"condition":"eq($$10, $$11)", >"operatorId" : "1.7", >"physical-operator":"HYBRID_HASH_JOIN > [$$10][$$11]", >"execution-mode":"PARTITIONED", >"inputs":[ >{ > "operator":"exchange", > "operatorId" : "1.8", 
> "physical-operator":"ONE_TO_ONE_EXCHANGE", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"data-scan", > "variables" :["$$10","$$first"], > "data-source":"Facebook.Friendship2", > "operatorId" : "1.9", > > "physical-operator":"DATASOURCE_SCAN", > "execution-mode":"PARTITIONED", > "inputs":[ > { > "operator":"exchange", >
[jira] [Commented] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug
[ https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16304360#comment-16304360 ]

Wail Alkowaileet commented on ASTERIXDB-2199:

I noticed the same issue... I think the issue is in:
{noformat}
EquivalenceClassUtils.addEquivalenceClassesForPrimaryIndexAccess()
{noformat}
It assumes:
{noformat}
$second.getField(0)
{noformat}
is the primary key not
{noformat}
$second.getField(0).getField(0)
{noformat}

Query using the same DDL:
{noformat}
USE Facebook;
EXPLAIN SELECT * FROM Friendship first, Friendship second
WHERE first.person = second.person;
{noformat}

Plan shows no HASH_PARTITION_EXCHANGE?
{noformat}
distribute result [$$23]
-- DISTRIBUTE_RESULT |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
    project ([$$23])
    -- STREAM_PROJECT |PARTITIONED|
      assign [$$23] <- [{"first": $$first, "second": $$second}]
      -- ASSIGN |PARTITIONED|
        project ([$$first, $$second])
        -- STREAM_PROJECT |PARTITIONED|
          exchange
          -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
            join (eq($$26, $$27))
            -- HYBRID_HASH_JOIN [$$26][$$27] |PARTITIONED|
              exchange
              -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                project ([$$first, $$26])
                -- STREAM_PROJECT |PARTITIONED|
                  assign [$$first, $$26] <- [$$second, $$27]
                  -- ASSIGN |PARTITIONED|
                    exchange
                    -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                      replicate
                      -- REPLICATE |PARTITIONED|
                        exchange
                        -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                          assign [$$27] <- [$$second.getField(0)]
                          -- ASSIGN |PARTITIONED|
                            project ([$$second])
                            -- STREAM_PROJECT |PARTITIONED|
                              exchange
                              -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                                data-scan []<-[$$25, $$second] <- Facebook.Friendship
                                -- DATASOURCE_SCAN |PARTITIONED|
                                  exchange
                                  -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                                    empty-tuple-source
                                    -- EMPTY_TUPLE_SOURCE |PARTITIONED|
              exchange
              -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                replicate
                -- REPLICATE |PARTITIONED|
                  exchange
                  -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                    assign [$$27] <- [$$second.getField(0)]
                    -- ASSIGN |PARTITIONED|
                      project ([$$second])
                      -- STREAM_PROJECT |PARTITIONED|
                        exchange
                        -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                          data-scan []<-[$$25, $$second] <- Facebook.Friendship
                          -- DATASOURCE_SCAN |PARTITIONED|
                            exchange
                            -- ONE_TO_ONE_EXCHANGE |PARTITIONED|
                              empty-tuple-source
                              -- EMPTY_TUPLE_SOURCE |PARTITIONED|
{noformat}

> Nested primary key and hash repartitioning bug
> ---
>
> Key: ASTERIXDB-2199
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2199
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: *DB - AsterixDB
> Reporter: Shiva Jahangiri
> Assignee: Steven Jacobs
>
> If a join is happening on primary keys of two tables, no hash partitioning
> should happen. Having the following DDL(Note that primary key of Friendship2
> is string):
> DROP DATAVERSE Facebook IF EXISTS;
> CREATE DATAVERSE Facebook;
> Use Facebook;
> CREATE TYPE FriendshipType AS closed {
>   id:string,
>   friends :[string]
> };
> CREATE DATASET Friendship2(FriendshipType)
> PRIMARY KEY id;
> insert into Friendship2([ {"id":"1","friends" : [ "2","3","4"]},
> {"id":"2","friends" : [ "4","5","6"]}
> ]);
> By running the following query:
> Use Facebook;
> select * from Friendship2 first, Friendship2 second where first.id =
> second.id;
> we can see that there is no hash partitioning happening in optimized logical
> plan which is correct as join is happening on the primary key of both
> relations and data is already partitioned on primary key:
> {
>   "operator":"distribute-result",
>   "expressions":"$$9",
>   "operatorId" : "1.1",
>   "physical-operator":"DISTRIBUTE_RESULT",
>   "execution-mode":"PARTITIONED",
>   "inputs":[
>     {
>       "operator":"exchange",
>       "operatorId" : "1.2",
>       "physical-operator":"ONE_TO_ONE_EXCHANGE",
>
[jira] [Commented] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug
[ https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16291656#comment-16291656 ]

Michael J. Carey commented on ASTERIXDB-2199:

@Steven: I think you added support for indexing nested fields, right? And enabled nested fields to be used as the key fields for a dataset? If so - maybe you could have a first look at this? It looks like their partitioning properties in the optimizer are not being noticed?

> Nested primary key and hash repartitioning bug
> ---
>
> Key: ASTERIXDB-2199
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2199
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: *DB - AsterixDB
> Reporter: Shiva Jahangiri
> Assignee: Steven Jacobs
>
> If a join is happening on primary keys of two tables, no hash partitioning should happen. Having the following DDL(Note that primary key of Friendship2 is string):
> DROP DATAVERSE Facebook IF EXISTS;
> CREATE DATAVERSE Facebook;
> Use Facebook;
> CREATE TYPE FriendshipType AS closed {
>   id:string,
>   friends :[string]
> };
> CREATE DATASET Friendship2(FriendshipType)
> PRIMARY KEY id;
> insert into Friendship2([ {"id":"1","friends" : [ "2","3","4"]},
> {"id":"2","friends" : [ "4","5","6"]}
> ]);
> By running the following query:
> Use Facebook;
> select * from Friendship2 first, Friendship2 second where first.id = second.id;
> we can see that there is no hash partitioning happening in optimized logical plan which is correct as join is happening on the primary key of both relations and data is already partitioned on primary key:
> {
>   "operator":"distribute-result",
>   "expressions":"$$9",
>   "operatorId" : "1.1",
>   "physical-operator":"DISTRIBUTE_RESULT",
>   "execution-mode":"PARTITIONED",
>   "inputs":[
>     {
>       "operator":"exchange",
>       "operatorId" : "1.2",
>       "physical-operator":"ONE_TO_ONE_EXCHANGE",
>       "execution-mode":"PARTITIONED",
>       "inputs":[
>         {
>           "operator":"project",
>           "variables" :["$$9"],
>           "operatorId" : "1.3",
>           "physical-operator":"STREAM_PROJECT",
>           "execution-mode":"PARTITIONED",
>           "inputs":[
>             {
>               "operator":"assign",
>               "variables" :["$$9"],
>               "expressions":"{ first : $$first, second : $$second}",
>               "operatorId" : "1.4",
>               "physical-operator":"ASSIGN",
>               "execution-mode":"PARTITIONED",
>               "inputs":[
>                 {
>                   "operator":"project",
>                   "variables" :["$$first","$$second"],
>                   "operatorId" : "1.5",
>                   "physical-operator":"STREAM_PROJECT",
>                   "execution-mode":"PARTITIONED",
>                   "inputs":[
>                     {
>                       "operator":"exchange",
>                       "operatorId" : "1.6",
>                       "physical-operator":"ONE_TO_ONE_EXCHANGE",
>                       "execution-mode":"PARTITIONED",
>                       "inputs":[
>                         {
>                           "operator":"join",
>                           "condition":"eq($$10, $$11)",
>                           "operatorId" : "1.7",
>                           "physical-operator":"HYBRID_HASH_JOIN [$$10][$$11]",
>                           "execution-mode":"PARTITIONED",
>                           "inputs":[
>                             {
>                               "operator":"exchange",
>                               "operatorId" : "1.8",
>                               "physical-operator":"ONE_TO_ONE_EXCHANGE",
>                               "execution-mode":"PARTITIONED",
>                               "inputs":[
>                                 {
>                                   "operator":"data-scan",
>                                   "variables" :["$$10","$$first"],
>                                   "data-source":"Facebook.Friendship2",
>                                   "operatorId" : "1.9",
>                                   "physical-operator":"DATASOURCE_SCAN",
>                                   "execution-mode":"PARTITIONED",
>                                   "inputs":[
>                                     {
>                                       "operator":"exchange",
>                                       "operatorId" : "1.10",
>                                       "physical-operator":"ONE_TO_ONE_EXCHANGE",
>