[jira] [Commented] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug

2018-01-19 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16332713#comment-16332713
 ] 

ASF subversion and git services commented on ASTERIXDB-2199:


Commit 0a7233d8f810cb1da46361ab3cd1ed57e8efdf65 in asterixdb's branch 
refs/heads/master from [~sjaco002]
[ https://git-wip-us.apache.org/repos/asf?p=asterixdb.git;h=0a7233d ]

[ASTERIXDB-2199][COMP] Fix PushFieldAccessRule for nested partitioning keys

Fixes an issue where nested partitioning keys
were ignored by PushFieldAccessRule

Added Test and fixed changed plans

Change-Id: I874c1fd15719b6bdeb7b0913fbafc04a58d32ed4
Reviewed-on: https://asterix-gerrit.ics.uci.edu/2246
Integration-Tests: Jenkins 
Tested-by: Jenkins 
Contrib: Jenkins 
Reviewed-by: Ildar Absalyamov 


> Nested primary key and hash repartitioning bug 
> ---
>
> Key: ASTERIXDB-2199
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2199
> Project: Apache AsterixDB
>  Issue Type: Bug
>  Components: *DB - AsterixDB
>Reporter: Shiva Jahangiri
>Assignee: Steven Jacobs
>Priority: Major
>
> If a join is on the primary keys of two datasets, no hash repartitioning 
> should happen. Consider the following DDL (note that the primary key of 
> Friendship2 is a string):
> DROP DATAVERSE Facebook IF EXISTS;
> CREATE DATAVERSE Facebook;
> Use Facebook;
> CREATE TYPE FriendshipType AS closed {
>   id:string,
>   friends :[string]
> };
> CREATE DATASET Friendship2(FriendshipType)
> PRIMARY KEY id; 
> insert into Friendship2([ {"id":"1","friends" : [ "2","3","4"]}, 
> {"id":"2","friends" : [ "4","5","6"]}
> ]);
> Running the following query:
> Use Facebook;
> select * from Friendship2 first, Friendship2 second where first.id = second.id;
> shows that no hash partitioning happens in the optimized logical plan. This 
> is correct, because the join is on the primary key of both relations and the 
> data is already partitioned on the primary key:
> {
>  "operator":"distribute-result",
>  "expressions":"$$9",
>  "operatorId" : "1.1",
>  "physical-operator":"DISTRIBUTE_RESULT",
>  "execution-mode":"PARTITIONED",
>  "inputs":[
>  {
>   "operator":"exchange",
>   "operatorId" : "1.2",
>   "physical-operator":"ONE_TO_ONE_EXCHANGE",
>   "execution-mode":"PARTITIONED",
>   "inputs":[
>   {
>    "operator":"project",
>    "variables" :["$$9"],
>    "operatorId" : "1.3",
>    "physical-operator":"STREAM_PROJECT",
>    "execution-mode":"PARTITIONED",
>    "inputs":[
>    {
>     "operator":"assign",
>     "variables" :["$$9"],
>     "expressions":"{ first : $$first,  second : $$second}",
>     "operatorId" : "1.4",
>     "physical-operator":"ASSIGN",
>     "execution-mode":"PARTITIONED",
>     "inputs":[
>     {
>      "operator":"project",
>      "variables" :["$$first","$$second"],
>      "operatorId" : "1.5",
>      "physical-operator":"STREAM_PROJECT",
>      "execution-mode":"PARTITIONED",
>      "inputs":[
>      {
>       "operator":"exchange",
>       "operatorId" : "1.6",
>       "physical-operator":"ONE_TO_ONE_EXCHANGE",
>       "execution-mode":"PARTITIONED",
>       "inputs":[
>       {
>        "operator":"join",
>        "condition":"eq($$10, $$11)",
>        "operatorId" : "1.7",
>        "physical-operator":"HYBRID_HASH_JOIN [$$10][$$11]",
>        "execution-mode":"PARTITIONED",
>        "inputs":[
>        {
>         "operator":"exchange",
>         "operatorId" : "1.8",
>         "physical-operator":"ONE_TO_ONE_EXCHANGE",
>         "execution-mode":"PARTITIONED",
>         "inputs":[
>         {
>          "operator":"data-scan",
>          "variables" :["$$10","$$first"],
>          "data-source":"Facebook.Friendship2",
>          "operatorId" : "1.9",
> 
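
The claim in the issue description above (a join on the primary keys of both inputs needs no hash repartitioning) can be illustrated with a small Python sketch. This is not AsterixDB code: the helper names partition_of, partition_by_key and local_join, the partition count, and the CRC32-based hash are all assumptions made for illustration. The point is only that when both inputs are partitioned on the join key with the same function, matching records already sit in the same partition, so a partition-local join finds every match.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical number of storage partitions


def partition_of(key: str, n: int = NUM_PARTITIONS) -> int:
    """Deterministic stand-in for the storage layer's hash partitioner."""
    return zlib.crc32(key.encode("utf-8")) % n


def partition_by_key(records, key_field, n=NUM_PARTITIONS):
    """Distribute records into n partitions by hashing key_field."""
    parts = [[] for _ in range(n)]
    for rec in records:
        parts[partition_of(rec[key_field], n)].append(rec)
    return parts


def local_join(left_parts, right_parts, key_field):
    """Join partition i of the left input only with partition i of the right."""
    out = []
    for lpart, rpart in zip(left_parts, right_parts):
        index = {}
        for r in rpart:
            index.setdefault(r[key_field], []).append(r)
        for l in lpart:
            for r in index.get(l[key_field], []):
                out.append((l, r))
    return out


# Same two records as in the issue's insert statement.
friendship2 = [
    {"id": "1", "friends": ["2", "3", "4"]},
    {"id": "2", "friends": ["4", "5", "6"]},
]

# Both sides of the self-join are already partitioned on the primary key.
parts = partition_by_key(friendship2, "id")
joined = local_join(parts, parts, "id")
```

No record crosses a partition boundary in local_join, which is the situation a ONE_TO_ONE_EXCHANGE below the join describes.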

[jira] [Commented] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug

2018-01-03 Thread Steven Jacobs (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16310011#comment-16310011
 ] 

Steven Jacobs commented on ASTERIXDB-2199:
--

[~wyk] Can you go ahead and file this separately and assign it to me? I'll 
add the fix to the same CR, though, since it's in the same part of the code.


[jira] [Commented] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug

2017-12-29 Thread Wail Alkowaileet (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16306655#comment-16306655
 ] 

Wail Alkowaileet commented on ASTERIXDB-2199:
-

Yes, similar to the issue in this JIRA. However, the fix doesn't recognize the 
key expression if it's in the form
{noformat}
$assignVar.getField(0)
{noformat}
where $assignVar = $unnestVar.getField(0).
It only recognizes it if it is an inlined nested field access
{noformat}
$unnestVar.getField(0).getField(0)
{noformat}
where $unnestVar is the scan variable from the data-scan.
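
The two forms described above denote the same field once the assign is followed back to the scan variable. A minimal Python sketch of that resolution step, using a toy expression model (a variable name plus a tuple of getField indexes) rather than the actual Algebricks expression classes; resolve_path and is_key_access are hypothetical names, and this is not how the fix in the CR is implemented.

```python
from typing import Dict, Optional, Tuple

# Toy model: an expression is (variable_name, (field_index, ...)), i.e. a
# chain of getField(i) calls applied to a variable.
Expr = Tuple[str, Tuple[int, ...]]


def resolve_path(expr: Expr, assigns: Dict[str, Expr],
                 scan_var: str) -> Optional[Tuple[int, ...]]:
    """Rewrite expr in terms of scan_var by following assign definitions.

    Returns the full getField index path from the scan variable, or None
    if the expression does not bottom out at scan_var.
    """
    var, path = expr
    seen = set()
    while var != scan_var:
        if var in seen or var not in assigns:
            return None  # cyclic or unknown definition
        seen.add(var)
        base_var, base_path = assigns[var]
        # Prepend the definition's own field accesses to the current path.
        var, path = base_var, base_path + path
    return path


def is_key_access(expr, assigns, scan_var, key_path) -> bool:
    """True if expr, after resolution, is exactly the key's field path."""
    return resolve_path(expr, assigns, scan_var) == tuple(key_path)


# $assignVar = $unnestVar.getField(0)
assigns = {"$assignVar": ("$unnestVar", (0,))}
inlined = ("$unnestVar", (0, 0))    # $unnestVar.getField(0).getField(0)
via_assign = ("$assignVar", (0,))   # $assignVar.getField(0)
```

With the key path (0, 0), both inlined and via_assign resolve to the same path, so a check based on the resolved path treats them alike.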


[jira] [Commented] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug

2017-12-29 Thread Steven Jacobs (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16306415#comment-16306415
 ] 

Steven Jacobs commented on ASTERIXDB-2199:
--

I'm guessing that this bug happens in Master as well? If so, I guess it would 
be another issue?


[jira] [Commented] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug

2017-12-28 Thread Wail Alkowaileet (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305941#comment-16305941
 ] 

Wail Alkowaileet commented on ASTERIXDB-2199:
-

[~dtabass] It's a different issue, as Steven pointed out.
[~sjaco002] I found another bug in the current fix 
(https://asterix-gerrit.ics.uci.edu/#/c/2246/). I'm not sure if I should file 
it as a different issue.
The problem occurs when the key getField() expression shares a common 
subexpression with another expression, so the key getField() is no longer 
inlined.

To reproduce:
DDL:
{noformat}
DROP DATAVERSE Facebook IF EXISTS;
CREATE DATAVERSE Facebook;

Use Facebook;

CREATE TYPE PersonType AS closed {
  id: string,
  name: string
};

CREATE TYPE FriendshipType AS closed {
  person: PersonType,
  Friends: [PersonType]
};

/* Creating Datasets */

CREATE DATASET Person(PersonType)
PRIMARY KEY id;

CREATE DATASET Friendship(FriendshipType)
PRIMARY KEY person.id;
{noformat}

Query:
{noformat}
Use Facebook;

select first.person.name as n1, second.person.name as n2
from Friendship first, Friendship second
where first.person.id = second.person.id;
{noformat}

Plan:
{noformat}
distribute result [$$29]
-- DISTRIBUTE_RESULT  |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
    project ([$$29])
    -- STREAM_PROJECT  |PARTITIONED|
      assign [$$29] <- [{"n1": $$36, "n2": $$37}]
      -- ASSIGN  |PARTITIONED|
        project ([$$36, $$37])
        -- STREAM_PROJECT  |PARTITIONED|
          exchange
          -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
            join (eq($$34, $$35))
            -- HYBRID_HASH_JOIN [$$34][$$35]  |PARTITIONED|
              exchange
              -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                project ([$$36, $$34])
                -- STREAM_PROJECT  |PARTITIONED|
                  assign [$$36, $$34] <- [$$37, $$35]
                  -- ASSIGN  |PARTITIONED|
                    exchange
                    -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                      replicate
                      -- REPLICATE  |PARTITIONED|
                        exchange
                        -- HASH_PARTITION_EXCHANGE [$$35]  |PARTITIONED|
                          project ([$$37, $$35])
                          -- STREAM_PROJECT  |PARTITIONED|
                            assign [$$37, $$35] <- [$$31.getField(1), $$31.getField(0)]
                            -- ASSIGN  |PARTITIONED|
                              project ([$$31])
                              -- STREAM_PROJECT  |PARTITIONED|
                                assign [$$31] <- [$$second.getField(0)]
                                -- ASSIGN  |PARTITIONED|
                                  project ([$$second])
                                  -- STREAM_PROJECT  |PARTITIONED|
                                    exchange
                                    -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                      data-scan []<-[$$33, $$second] <- Facebook.Friendship
                                      -- DATASOURCE_SCAN  |PARTITIONED|
                                        exchange
                                        -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                          empty-tuple-source
                                          -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
              exchange
              -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                replicate
                -- REPLICATE  |PARTITIONED|
                  exchange
                  -- HASH_PARTITION_EXCHANGE [$$35]  |PARTITIONED|
                    project ([$$37, $$35])
                    -- STREAM_PROJECT  |PARTITIONED|
                      assign [$$37, $$35] <- [$$31.getField(1), $$31.getField(0)]
                      -- ASSIGN  |PARTITIONED|
                        project ([$$31])
                        -- STREAM_PROJECT  |PARTITIONED|
                          assign [$$31] <- [$$second.getField(0)]
                          -- ASSIGN  |PARTITIONED|
                            project ([$$second])
                            -- STREAM_PROJECT  |PARTITIONED|
                              exchange
                              -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                data-scan []<-[$$33, $$second] <- Facebook.Friendship
                                -- DATASOURCE_SCAN  |PARTITIONED|
                                  exchange
                                  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                    empty-tuple-source
                                    -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
{noformat}
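
To check a plan like the one above for repartitioning, one can scan its text for HASH_PARTITION_EXCHANGE operators. A minimal Python sketch, assuming the textual plan format shown in this thread; the function name hash_partition_keys and the short sample excerpt are illustrative only and are not an AsterixDB API.

```python
import re

# Matches physical-operator lines such as
#   -- HASH_PARTITION_EXCHANGE [$$35]  |PARTITIONED|
HASH_EXCHANGE = re.compile(r"--\s*HASH_PARTITION_EXCHANGE\s*\[([^\]]*)\]")


def hash_partition_keys(plan_text: str):
    """Return the key list of every HASH_PARTITION_EXCHANGE in a plan dump."""
    keys = []
    for line in plan_text.splitlines():
        m = HASH_EXCHANGE.search(line)
        if m:
            keys.append([k.strip() for k in m.group(1).split(",") if k.strip()])
    return keys


# Shortened excerpt in the same format as the plan in this comment.
sample_plan = """\
join (eq($$34, $$35))
-- HYBRID_HASH_JOIN [$$34][$$35]  |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
    replicate
    -- REPLICATE  |PARTITIONED|
      exchange
      -- HASH_PARTITION_EXCHANGE [$$35]  |PARTITIONED|
"""

found = hash_partition_keys(sample_plan)
```

An empty result means the plan text contains no hash repartitioning step; a non-empty one lists the variables each exchange repartitions on.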



[jira] [Commented] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug

2017-12-28 Thread Steven Jacobs (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305926#comment-16305926
 ] 

Steven Jacobs commented on ASTERIXDB-2199:
--

Mike: See my comment from yesterday. The fix for this issue is waiting for 
review. Wail brought up something that might be a separate issue; he is going 
to check whether it exists.


[jira] [Commented] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug

2017-12-28 Thread Michael J. Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305921#comment-16305921
 ] 

Michael J. Carey commented on ASTERIXDB-2199:
-

I think Steven may have fixed it already?






[jira] [Commented] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug

2017-12-28 Thread Wail Alkowaileet (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16305872#comment-16305872
 ] 

Wail Alkowaileet commented on ASTERIXDB-2199:
-

Will look into it..

Thanks!


[jira] [Commented] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug

2017-12-27 Thread Steven Jacobs (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16304721#comment-16304721
 ] 

Steven Jacobs commented on ASTERIXDB-2199:
--

[~wyk] Wail: I have a CR waiting for review to address Shiva's filed issue 
(https://asterix-gerrit.ics.uci.edu/#/c/2246/). The solution doesn't involve 
EquivalenceClassUtils.addEquivalenceClassesForPrimaryIndexAccess(), but I think 
I see the code block that you are talking about, and maybe there is a potential 
optimization issue there as well. If you can find a query that produces a bug 
through this method, can you file it as a separate issue?


[jira] [Commented] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug

2017-12-27 Thread Wail Alkowaileet (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16304360#comment-16304360
 ] 

Wail Alkowaileet commented on ASTERIXDB-2199:
-

I noticed the same issue. I think the problem is in:
{noformat}
EquivalenceClassUtils.addEquivalenceClassesForPrimaryIndexAccess()
{noformat}
It assumes that
{noformat}
$second.getField(0)
{noformat}
is the primary key, rather than
{noformat}
$second.getField(0).getField(0)
{noformat}

Query using the same DDL:
{noformat}
USE Facebook;
EXPLAIN
SELECT * 
FROM Friendship first, Friendship second
WHERE first.person = second.person;
{noformat}

Plan shows no HASH_PARTITION_EXCHANGE?
{noformat}
distribute result [$$23]
-- DISTRIBUTE_RESULT  |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
    project ([$$23])
    -- STREAM_PROJECT  |PARTITIONED|
      assign [$$23] <- [{"first": $$first, "second": $$second}]
      -- ASSIGN  |PARTITIONED|
        project ([$$first, $$second])
        -- STREAM_PROJECT  |PARTITIONED|
          exchange
          -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
            join (eq($$26, $$27))
            -- HYBRID_HASH_JOIN [$$26][$$27]  |PARTITIONED|
              exchange
              -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                project ([$$first, $$26])
                -- STREAM_PROJECT  |PARTITIONED|
                  assign [$$first, $$26] <- [$$second, $$27]
                  -- ASSIGN  |PARTITIONED|
                    exchange
                    -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                      replicate
                      -- REPLICATE  |PARTITIONED|
                        exchange
                        -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                          assign [$$27] <- [$$second.getField(0)]
                          -- ASSIGN  |PARTITIONED|
                            project ([$$second])
                            -- STREAM_PROJECT  |PARTITIONED|
                              exchange
                              -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                data-scan []<-[$$25, $$second] <- Facebook.Friendship
                                -- DATASOURCE_SCAN  |PARTITIONED|
                                  exchange
                                  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                                    empty-tuple-source
                                    -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
              exchange
              -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                replicate
                -- REPLICATE  |PARTITIONED|
                  exchange
                  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                    assign [$$27] <- [$$second.getField(0)]
                    -- ASSIGN  |PARTITIONED|
                      project ([$$second])
                      -- STREAM_PROJECT  |PARTITIONED|
                        exchange
                        -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                          data-scan []<-[$$25, $$second] <- Facebook.Friendship
                          -- DATASOURCE_SCAN  |PARTITIONED|
                            exchange
                            -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
                              empty-tuple-source
                              -- EMPTY_TUPLE_SOURCE  |PARTITIONED|
{noformat}
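
For anyone who wants to check a plan like this mechanically, the following is a rough sketch of a text scan over the plan output. It is a hypothetical helper, not part of AsterixDB: the class and method names are made up, and it assumes only the "-- OPERATOR  |MODE|" line layout shown in the plan above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; assumes physical operators appear on lines starting with "-- ".
class PlanExchangeCheck {
    // Collects physical operator names from lines of the form "-- NAME ... |MODE|".
    public static List<String> physicalOperators(String plan) {
        List<String> names = new ArrayList<>();
        for (String line : plan.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.startsWith("-- ")) {
                // The operator name is the first token after the "-- " prefix.
                names.add(trimmed.substring(3).trim().split("\\s+")[0]);
            }
        }
        return names;
    }

    // True if the plan contains a hash repartitioning step.
    public static boolean hasHashPartitionExchange(String plan) {
        return physicalOperators(plan).contains("HASH_PARTITION_EXCHANGE");
    }

    public static void main(String[] args) {
        String plan = String.join("\n",
                "join (eq($$26, $$27))",
                "-- HYBRID_HASH_JOIN [$$26][$$27]  |PARTITIONED|",
                "  exchange",
                "  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|");
        System.out.println(physicalOperators(plan));
        System.out.println(hasHashPartitionExchange(plan));
    }
}
```

A plain string scan is enough here because the check only needs operator names, not the tree structure of the plan.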


[jira] [Commented] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug

2017-12-14 Thread Michael J. Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16291656#comment-16291656
 ] 

Michael J. Carey commented on ASTERIXDB-2199:
-

@Steven: I think you added support for indexing nested fields, right? And 
enabled nested fields to be used as the key fields for a dataset? If so, 
maybe you could have a first look at this? It looks like the optimizer is 
not noticing their partitioning properties.

> Nested primary key and hash repartitioning bug 
> ---
>
> Key: ASTERIXDB-2199
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2199
> Project: Apache AsterixDB
>  Issue Type: Bug
>  Components: *DB - AsterixDB
>Reporter: Shiva Jahangiri
>Assignee: Steven Jacobs
>
> If a join is happening on primary keys of two tables, no hash partitioning 
> should happen. Having the following DDL(Note that primary key of Friendship2 
> is string):
> DROP DATAVERSE Facebook IF EXISTS;
> CREATE DATAVERSE Facebook;
> Use Facebook;
> CREATE TYPE FriendshipType AS closed {
>   id:string,
>   friends :[string]
> };
> CREATE DATASET Friendship2(FriendshipType)
> PRIMARY KEY id; 
> insert into Friendship2([ {"id":"1","friends" : [ "2","3","4"]}, 
> {"id":"2","friends" : [ "4","5","6"]}
> ]);
> By running the following query:
> Use Facebook;
> select * from Friendship2 first, Friendship2 second where first.id = 
> second.id;
> we can see that there is no hash partitioning happening in optimized logical 
> plan which is correct as join is happening on the primary key of both 
> relations and data is already partitioned on primary key:
> {
>  "operator":"distribute-result",
>  "expressions":"$$9",
>  "operatorId" : "1.1",
>  "physical-operator":"DISTRIBUTE_RESULT",
>  "execution-mode":"PARTITIONED",
>  "inputs":[
>  {
>   "operator":"exchange",
>   "operatorId" : "1.2",
>   "physical-operator":"ONE_TO_ONE_EXCHANGE",
>   "execution-mode":"PARTITIONED",
>   "inputs":[
>   {
>"operator":"project",
>"variables" :["$$9"],
>"operatorId" : "1.3",
>"physical-operator":"STREAM_PROJECT",
>"execution-mode":"PARTITIONED",
>"inputs":[
>{
> "operator":"assign",
> "variables" :["$$9"],
> "expressions":"{ first : $$first,  second : $$second}",
> "operatorId" : "1.4",
> "physical-operator":"ASSIGN",
> "execution-mode":"PARTITIONED",
> "inputs":[
> {
>  "operator":"project",
>  "variables" :["$$first","$$second"],
>  "operatorId" : "1.5",
>  "physical-operator":"STREAM_PROJECT",
>  "execution-mode":"PARTITIONED",
>  "inputs":[
>  {
>   "operator":"exchange",
>   "operatorId" : "1.6",
>   "physical-operator":"ONE_TO_ONE_EXCHANGE",
>   "execution-mode":"PARTITIONED",
>   "inputs":[
>   {
>"operator":"join",
>"condition":"eq($$10, $$11)",
>"operatorId" : "1.7",
>"physical-operator":"HYBRID_HASH_JOIN [$$10][$$11]",
>"execution-mode":"PARTITIONED",
>"inputs":[
>{
> "operator":"exchange",
> "operatorId" : "1.8",
> "physical-operator":"ONE_TO_ONE_EXCHANGE",
> "execution-mode":"PARTITIONED",
> "inputs":[
> {
>  "operator":"data-scan",
>  "variables" :["$$10","$$first"],
>  "data-source":"Facebook.Friendship2",
>  "operatorId" : "1.9",
>  "physical-operator":"DATASOURCE_SCAN",
>  "execution-mode":"PARTITIONED",
>  "inputs":[
>  {
>   "operator":"exchange",
>   "operatorId" : "1.10",
>   "physical-operator":"ONE_TO_ONE_EXCHANGE",
>