[jira] [Updated] (ASTERIXDB-2788) Plan's stages are not created correctly for bushy plans in compile time

2020-10-15 Thread Shiva Jahangiri (Jira)


 [ 
https://issues.apache.org/jira/browse/ASTERIXDB-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shiva Jahangiri updated ASTERIXDB-2788:
---
Summary: Plan's stages are not created correctly for bushy plans in compile 
time  (was: Plan's stages do not create correct stages for bushy plans in 
compile time)

> Plan's stages are not created correctly for bushy plans in compile time
> ---
>
> Key: ASTERIXDB-2788
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2788
> Project: Apache AsterixDB
>  Issue Type: Bug
>Reporter: Shiva Jahangiri
>Priority: Major
>
> PlanStagesGenerator generates "pipelines" instead of stages, so when 
> calculating the maximum required memory it computes the memory needed for the 
> largest pipeline rather than the largest stage. This does not affect linear 
> plans, but in bushy plans the independent pipelines (independent builds) that 
> can be co-scheduled should be merged into one stage. Merging several small 
> pipelines can then produce a stage larger than what was previously considered 
> the largest stage (namely, the largest pipeline).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ASTERIXDB-2788) Plan's stages do not create correct stages for bushy plans in compile time

2020-10-15 Thread Shiva Jahangiri (Jira)
Shiva Jahangiri created ASTERIXDB-2788:
--

 Summary: Plan's stages do not create correct stages for bushy 
plans in compile time
 Key: ASTERIXDB-2788
 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2788
 Project: Apache AsterixDB
  Issue Type: Bug
Reporter: Shiva Jahangiri


PlanStagesGenerator generates "pipelines" instead of stages, so when 
calculating the maximum required memory it computes the memory needed for the 
largest pipeline rather than the largest stage. This does not affect linear 
plans, but in bushy plans the independent pipelines (independent builds) that 
can be co-scheduled should be merged into one stage. Merging several small 
pipelines can then produce a stage larger than what was previously considered 
the largest stage (namely, the largest pipeline).
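
To make the difference concrete, here is a minimal, self-contained sketch 
(plain Java with a hypothetical Pipeline model, not the actual 
PlanStagesGenerator API): the per-pipeline maximum misses the case where 
co-scheduled build pipelines add up to a larger stage.

import java.util.List;

// Hypothetical model: a pipeline has its own memory requirement; a stage is a
// set of pipelines that run at the same time (co-scheduled), so its
// requirement is the SUM over its member pipelines.
public class StageMemoryDemo {
    record Pipeline(String name, long requiredFrames) {}

    public static void main(String[] args) {
        // Bushy plan: three independent build pipelines that can be co-scheduled.
        List<Pipeline> builds = List.of(
                new Pipeline("build-A", 40),
                new Pipeline("build-B", 35),
                new Pipeline("build-C", 30));
        Pipeline probe = new Pipeline("probe", 60);

        // What the current generator effectively computes: the largest pipeline.
        long largestPipeline = Math.max(probe.requiredFrames(),
                builds.stream().mapToLong(Pipeline::requiredFrames).max().orElse(0));

        // What it should compute: the largest stage. The co-scheduled builds
        // form ONE stage, so their requirements add up.
        long buildStage = builds.stream().mapToLong(Pipeline::requiredFrames).sum();
        long largestStage = Math.max(buildStage, probe.requiredFrames());

        System.out.println("largest pipeline = " + largestPipeline); // 60
        System.out.println("largest stage    = " + largestStage);    // 105
    }
}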



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ASTERIXDB-2784) Join memory requirement for large objects

2020-09-28 Thread Shiva Jahangiri (Jira)


[ 
https://issues.apache.org/jira/browse/ASTERIXDB-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203463#comment-17203463
 ] 

Shiva Jahangiri commented on ASTERIXDB-2784:


The fix for this issue is in my next patch for hybrid hash join, where I define 
the NoGrow-NoSteal and Grow-Steal policies. Feel free to assign it to me.

> Join memory requirement for large objects
> -
>
> Key: ASTERIXDB-2784
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2784
> Project: Apache AsterixDB
>  Issue Type: Improvement
>  Components: COMP - Compiler, RT - Runtime
>Reporter: Chen Luo
>Priority: Major
>
> Currently the compiler assumes the minimum number of join frames is 5 [1]. 
> However, this does not guarantee that a join will always succeed in the 
> presence of large objects. The actual join memory requirement is MAX(5, 
> #partitions * large object size). The reason is that in the spill policy [2], 
> we only spill a partition if it hasn't been spilled before. As a result, when 
> we are writing to an empty partition, it is possible that each of the other 
> partitions holds one large object (which could be larger than the frame size) 
> but no partition can be spilled. Thus, the join memory requirement becomes 
> #partitions * large object size in this case.
> [1] 
> https://github.com/apache/asterixdb/blob/master/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/AbstractJoinPOperator.java#L29
> [2] 
> https://github.com/apache/asterixdb/blob/37dfed60fb47afcc86de6d17704a8f100217057d/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/PreferToSpillFullyOccupiedFramePolicy.java#L55
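
A back-of-the-envelope sketch of that bound (plain Java with made-up numbers; 
only the MAX(5, #partitions * large object size) formula comes from the 
description above):

public class JoinMemoryBound {
    // Frames needed so the join can always make progress in the worst case
    // described above: every partition pinned by one unspillable large object.
    static long requiredFrames(int partitions, int framesPerLargeObject, int assumedMin) {
        return Math.max(assumedMin, (long) partitions * framesPerLargeObject);
    }

    public static void main(String[] args) {
        int partitions = 8;            // number of join partitions
        int framesPerLargeObject = 3;  // one record can span several frames
        int assumedMin = 5;            // the compiler's current assumption [1]
        // 8 partitions * 3 frames = 24 frames, far above the assumed 5.
        System.out.println(requiredFrames(partitions, framesPerLargeObject, assumedMin));
    }
}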



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ASTERIXDB-2656) No need for frame constraint in probe phase of hybrid hash join

2019-10-11 Thread Shiva Jahangiri (Jira)
Shiva Jahangiri created ASTERIXDB-2656:
--

 Summary: No need for frame constraint in probe phase of hybrid 
hash join
 Key: ASTERIXDB-2656
 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2656
 Project: Apache AsterixDB
  Issue Type: Bug
  Components: *DB - AsterixDB
Reporter: Shiva Jahangiri


In the optimized hybrid hash join, there is a constraint that a spilled 
partition cannot hold more than one frame. While this seems necessary during 
the build phase, to keep as much memory as possible for in-memory partitions, 
it is an unnecessary constraint during the probe phase.

During the probe, we should let the spilled partitions grow and use the 
leftover frames not used by in-memory partitions. This also fixes a bug that 
hybrid hash join currently has.

During the probe phase, when there is not enough memory for inserting a large 
record, AsterixDB may either flush the record directly to disk or spill the 
largest spilled partition to disk, depending on which one is larger. When the 
biggest spilled partition is the victim chosen to flush and release its frames, 
the large record cannot use the freed frames: it belongs to a spilled partition 
(otherwise it would have been probed), so the one-frame limit still applies to 
it.
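
A simplified sketch of the proposed policy (hypothetical names, not the 
OptimizedHybridHashJoin API): spilled partitions stay at one frame during the 
build, but during the probe they may grow into whatever frames the in-memory 
partitions left unused.

enum Phase { BUILD, PROBE }

class FrameBudget {
    private final int totalFrames;
    private int usedFrames;

    FrameBudget(int totalFrames, int alreadyUsed) {
        this.totalFrames = totalFrames;
        this.usedFrames = alreadyUsed;
    }

    // Ask for one more frame on behalf of a spilled partition.
    boolean tryGrowSpilledPartition(Phase phase, int framesHeldByPartition) {
        if (phase == Phase.BUILD && framesHeldByPartition >= 1) {
            return false; // build: keep spilled partitions at a single frame
        }
        if (usedFrames >= totalFrames) {
            return false; // no leftover frames anywhere
        }
        usedFrames++;     // probe: reuse a frame the in-memory partitions left over
        return true;
    }
}

public class ProbeGrowthDemo {
    public static void main(String[] args) {
        FrameBudget budget = new FrameBudget(32, 20); // 12 frames left over
        System.out.println(budget.tryGrowSpilledPartition(Phase.BUILD, 1)); // false
        System.out.println(budget.tryGrowSpilledPartition(Phase.PROBE, 1)); // true
    }
}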



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ASTERIXDB-2648) Inconsistency in dataset distribution for Broadcast Hint

2019-09-26 Thread Shiva Jahangiri (Jira)
Shiva Jahangiri created ASTERIXDB-2648:
--

 Summary: Inconsistency in dataset distribution for Broadcast Hint
 Key: ASTERIXDB-2648
 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2648
 Project: Apache AsterixDB
  Issue Type: Bug
  Components: COMP - Compiler
Reporter: Shiva Jahangiri


The Broadcast hint (/*+ bcast */) uses the order of the datasets in the WHERE 
clause of a join to choose the dataset to broadcast. This is not consistent 
with other join types such as Hybrid Hash Join and Indexed Nested Loop Join. 
It can also confuse both the user and AsterixDB if the query has more than one 
join condition in the WHERE clause and the order of the datasets in those 
conditions is not consistent.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ASTERIXDB-2582) Out of budget memory usage in reload buffer of hybrid hash join

2019-06-04 Thread Shiva Jahangiri (JIRA)
Shiva Jahangiri created ASTERIXDB-2582:
--

 Summary: Out of budget memory usage in reload buffer of hybrid hash 
join
 Key: ASTERIXDB-2582
 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2582
 Project: Apache AsterixDB
  Issue Type: Bug
  Components: *DB - AsterixDB
Affects Versions: 0.9.4.1
Reporter: Shiva Jahangiri


In the optimized hybrid hash join, the 
"makeSpaceForHashTableAndBringBackSpilledPartitions" method uses a reload 
buffer of variable size to bring some partitions back in. However, that buffer 
is not taken from the join buffer manager; as such, it is allocated from 
outside of the join budget.
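
An illustrative sketch of the fix direction (a hypothetical BudgetedReload 
class, not the Hyracks buffer-manager API): every byte of the variable-sized 
reload buffer is charged against the same budget the join's buffer manager 
enforces, so a reload can never push the join over its budget.

import java.nio.ByteBuffer;

public class BudgetedReload {
    private long remainingBytes; // what is left of the join budget

    BudgetedReload(long budgetBytes) {
        this.remainingBytes = budgetBytes;
    }

    // Returns null when the reload would exceed the budget; the caller must
    // then free frames first, or skip bringing the partition back in.
    ByteBuffer allocateReloadBuffer(int bytes) {
        if (bytes > remainingBytes) {
            return null;
        }
        remainingBytes -= bytes;
        return ByteBuffer.allocate(bytes);
    }

    void release(ByteBuffer buffer) {
        remainingBytes += buffer.capacity();
    }
}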



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ASTERIXDB-2581) Need of refactoring and unit test for hybrid hash join

2019-06-03 Thread Shiva Jahangiri (JIRA)
Shiva Jahangiri created ASTERIXDB-2581:
--

 Summary: Need of refactoring and unit test for hybrid hash join
 Key: ASTERIXDB-2581
 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2581
 Project: Apache AsterixDB
  Issue Type: Bug
  Components: *DB - AsterixDB
Affects Versions: 0.9.4.1
Reporter: Shiva Jahangiri


Currently, the whole logic of the optimized hybrid hash join is implemented in 
two classes, "OptimizedHybridHashJoin" and 
"OptimizedHybridHashJoinOperatorDescriptor". There are no unit tests for either 
of them, and the overall structure of the code makes it too difficult to add 
any. A few bugs have already been found manually, which suggests more may be 
hiding in the code due to the lack of unit testing. We need to refactor these 
two classes into smaller modules and write unit tests for them.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ASTERIXDB-2577) Not keeping one frame for each spilled partition before starting the probe phase in hybrid hash join

2019-05-26 Thread Shiva Jahangiri (JIRA)


 [ 
https://issues.apache.org/jira/browse/ASTERIXDB-2577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shiva Jahangiri updated ASTERIXDB-2577:
---
Summary: Not keeping one frame for each spilled partition before starting 
the probe phase in hybrid hash join  (was: Not keeping one frame for each 
spilled partition during probe phase)

> Not keeping one frame for each spilled partition before starting the probe 
> phase in hybrid hash join
> 
>
> Key: ASTERIXDB-2577
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2577
> Project: Apache AsterixDB
>  Issue Type: Bug
>  Components: *DB - AsterixDB
>Affects Versions: 0.9.4.1
>Reporter: Shiva Jahangiri
>Priority: Major
>
> In the probe() method of the optimized hybrid hash join, if insertion fails 
> on the current spilled partition, we try to find the biggest spilled 
> partition and flush it as a victim. If we cannot find any spilled partition 
> with size > 0, then we ASSUME that the record is large and flush it as a big 
> object. By running the customerOrderCIDHybridHashJoin_Case3() test in 
> TPCHCustomerOrderHashJoinTest, it can be seen that the record size is 206 
> bytes (so it is smaller than a frame), but neither the spilled partitions nor 
> the buffer manager has any frames (this is the problem: there should be one 
> frame for each spilled partition). In this case, we flush the record as a 
> large object. This means that every single record that is supposed to be 
> inserted into a spilled partition during the probe will get flushed 
> separately.
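
A condensed sketch of the probe-side decision described above (hypothetical 
names and a plain array, not the real probe() signature). The fall-through is 
where the bug bites: with no frame reserved per spilled partition, even an 
ordinary 206-byte record ends up on the large-object path.

public class ProbeVictimDemo {
    // Pick the spilled partition currently holding the most frames.
    static int chooseVictim(int[] framesPerSpilledPartition) {
        int victim = -1;
        for (int pid = 0; pid < framesPerSpilledPartition.length; pid++) {
            if (framesPerSpilledPartition[pid] > 0
                    && (victim < 0 || framesPerSpilledPartition[pid]
                            > framesPerSpilledPartition[victim])) {
                victim = pid;
            }
        }
        return victim; // -1: no victim, so the record is flushed as "large"
    }

    public static void main(String[] args) {
        // The situation from customerOrderCIDHybridHashJoin_Case3: every
        // spilled partition holds zero frames, so chooseVictim returns -1 and
        // a small record is flushed as a large object.
        int[] framesPerSpilledPartition = {0, 0, 0, 0};
        System.out.println(chooseVictim(framesPerSpilledPartition)); // -1
    }
}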



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ASTERIXDB-2577) Not keeping one frame for each spilled partition during probe phase

2019-05-26 Thread Shiva Jahangiri (JIRA)


 [ 
https://issues.apache.org/jira/browse/ASTERIXDB-2577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shiva Jahangiri updated ASTERIXDB-2577:
---
Summary: Not keeping one frame for each spilled partition during probe 
phase  (was: Flushing small records during the probe in optimized hhj  as large 
objects)

> Not keeping one frame for each spilled partition during probe phase
> ---
>
> Key: ASTERIXDB-2577
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2577
> Project: Apache AsterixDB
>  Issue Type: Bug
>  Components: *DB - AsterixDB
>Affects Versions: 0.9.4.1
>Reporter: Shiva Jahangiri
>Priority: Major
>
> In the probe() method of the optimized hybrid hash join, if insertion fails 
> on the current spilled partition, we try to find the biggest spilled 
> partition and flush it as a victim. If we cannot find any spilled partition 
> with size > 0, then we ASSUME that the record is large and flush it as a big 
> object. By running the customerOrderCIDHybridHashJoin_Case3() test in 
> TPCHCustomerOrderHashJoinTest, it can be seen that the record size is 206 
> bytes (so it is smaller than a frame), but neither the spilled partitions nor 
> the buffer manager has any frames (this is the problem: there should be one 
> frame for each spilled partition). In this case, we flush the record as a 
> large object. This means that every single record that is supposed to be 
> inserted into a spilled partition during the probe will get flushed 
> separately.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ASTERIXDB-2577) Flushing small records during the probe in optimized hhj as large objects

2019-05-26 Thread Shiva Jahangiri (JIRA)


 [ 
https://issues.apache.org/jira/browse/ASTERIXDB-2577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shiva Jahangiri updated ASTERIXDB-2577:
---
Description: 
In the probe() method of the optimized hybrid hash join, if insertion fails on 
the current spilled partition, we try to find the biggest spilled partition and 
flush it as a victim. If we cannot find any spilled partition with size > 0, 
then we ASSUME that the record is large and flush it as a big object. By 
running the customerOrderCIDHybridHashJoin_Case3() test in 
TPCHCustomerOrderHashJoinTest, it can be seen that the record size is 206 bytes 
(so it is smaller than a frame), but neither the spilled partitions nor the 
buffer manager has any frames (this is the problem: there should be one frame 
for each spilled partition). In this case, we flush the record as a large 
object. This means that every single record that is supposed to be inserted 
into a spilled partition during the probe will get flushed separately.

  was:
In the probe() method of the optimized hybrid hash join, if insertion fails on 
the current spilled partition, we try to find the biggest spilled partition and 
flush it as a victim. If we cannot find any spilled partition with size > 0, 
then we ASSUME that the record is large and flush it as a big object. By 
running the customerOrderCIDHybridHashJoin_Case3() test in 
TPCHCustomerOrderHashJoinTest, it can be seen that the record size is 206 bytes 
(so it is smaller than a frame), but neither the spilled partitions nor the 
buffer manager has any frames (this is the problem: there should be one frame 
for each spilled partition). In this case, we flush the record (without 
checking whether it is large or not) as a large object. This means that every 
single record that is supposed to be inserted into a spilled partition during 
the probe will get flushed separately.


> Flushing small records during the probe in optimized hhj  as large objects
> --
>
> Key: ASTERIXDB-2577
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2577
> Project: Apache AsterixDB
>  Issue Type: Bug
>  Components: *DB - AsterixDB
>Affects Versions: 0.9.4.1
>Reporter: Shiva Jahangiri
>Priority: Major
>
> In the probe() method of the optimized hybrid hash join, if insertion fails 
> on the current spilled partition, we try to find the biggest spilled 
> partition and flush it as a victim. If we cannot find any spilled partition 
> with size > 0, then we ASSUME that the record is large and flush it as a big 
> object. By running the customerOrderCIDHybridHashJoin_Case3() test in 
> TPCHCustomerOrderHashJoinTest, it can be seen that the record size is 206 
> bytes (so it is smaller than a frame), but neither the spilled partitions nor 
> the buffer manager has any frames (this is the problem: there should be one 
> frame for each spilled partition). In this case, we flush the record as a 
> large object. This means that every single record that is supposed to be 
> inserted into a spilled partition during the probe will get flushed 
> separately.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ASTERIXDB-2577) Flushing small records during the probe in optimized hhj as large objects

2019-05-25 Thread Shiva Jahangiri (JIRA)
Shiva Jahangiri created ASTERIXDB-2577:
--

 Summary: Flushing small records during the probe in optimized hhj  
as large objects
 Key: ASTERIXDB-2577
 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2577
 Project: Apache AsterixDB
  Issue Type: Bug
  Components: *DB - AsterixDB
Affects Versions: 0.9.4.1
Reporter: Shiva Jahangiri


In the probe() method of the optimized hybrid hash join, if insertion fails on 
the current spilled partition, we try to find the biggest spilled partition and 
flush it as a victim. If we cannot find any spilled partition with size > 0, 
then we ASSUME that the record is large and flush it as a big object. By 
running the customerOrderCIDHybridHashJoin_Case3() test in 
TPCHCustomerOrderHashJoinTest, it can be seen that the record size is 206 bytes 
(so it is smaller than a frame), but neither the spilled partitions nor the 
buffer manager has any frames (this is the problem: there should be one frame 
for each spilled partition). In this case, we flush the record (without 
checking whether it is large or not) as a large object. This means that every 
single record that is supposed to be inserted into a spilled partition during 
the probe will get flushed separately.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ASTERIXDB-2571) Memory assigned out of the budget in hash join for large objects

2019-05-22 Thread Shiva Jahangiri (JIRA)
Shiva Jahangiri created ASTERIXDB-2571:
--

 Summary: Memory assigned out of the budget in hash join for large 
objects
 Key: ASTERIXDB-2571
 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2571
 Project: Apache AsterixDB
  Issue Type: Bug
  Components: *DB - AsterixDB
Affects Versions: 0.9.4.1
Reporter: Shiva Jahangiri


Currently, in the optimized hybrid hash join, when memory runs out AsterixDB 
flushes the big probe object to disk. To flush a big object, a variable-sized 
frame is created from the Hyracks context, not from the join memory budget. 
This means we may use more than the given budget.

Possible solutions:

1) Provide a way to flush the incoming frame/buffer directly to disk (sketched 
below).

2) Spill enough data to be able to fit the large object (less preferred, as it 
hurts performance).
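
A sketch of option (1) under simplifying assumptions (plain java.nio instead 
of the Hyracks run-file API): the incoming buffer that already holds the big 
object is written straight to the partition's run file, so no variable-sized 
frame has to be allocated outside the budget just to stage it.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class DirectFlushSketch {
    // Write the incoming probe buffer as-is to the partition's run file.
    static void flushIncomingBuffer(ByteBuffer incoming, RandomAccessFile runFile)
            throws IOException {
        FileChannel channel = runFile.getChannel();
        incoming.rewind();
        while (incoming.hasRemaining()) {
            channel.write(incoming); // no intermediate frame is allocated
        }
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer bigObject = ByteBuffer.wrap(new byte[4 * 1024 * 1024]);
        try (RandomAccessFile runFile = new RandomAccessFile("partition-3.run", "rw")) {
            flushIncomingBuffer(bigObject, runFile);
        }
    }
}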



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ASTERIXDB-2561) Out of memory in optimized hybrid hash join due to large records

2019-05-03 Thread Shiva Jahangiri (JIRA)
Shiva Jahangiri created ASTERIXDB-2561:
--

 Summary: Out of memory in optimized hybrid hash join due to large 
records
 Key: ASTERIXDB-2561
 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2561
 Project: Apache AsterixDB
  Issue Type: Bug
  Components: *DB - AsterixDB
Affects Versions: 0.9.4
Reporter: Shiva Jahangiri
 Fix For: 0.9.4.2


Currently, spilled partitions cannot grow by more than one frame, but there is 
no constraint on the size of that one frame. This can lead to out-of-memory 
errors when large records get inserted into spilled partitions, as we are not 
allowed to steal frames from spilled partitions.
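
A tiny back-of-the-envelope sketch (plain Java, hypothetical numbers) of why 
the one-frame rule alone bounds nothing: the single frame is variable sized, 
so one large record per spilled partition adds up to memory that can never be 
stolen back.

public class OneFrameNoBound {
    public static void main(String[] args) {
        int defaultFrameBytes = 32 * 1024;       // normal frame size
        int largeRecordBytes = 8 * 1024 * 1024;  // one 8 MB record
        int spilledPartitions = 16;
        // Still "one frame" each, yet 16 * 8 MB = 128 MB of unstealable memory.
        long worstCase = (long) spilledPartitions
                * Math.max(defaultFrameBytes, largeRecordBytes);
        System.out.println(worstCase + " bytes held by spilled partitions");
    }
}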



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug

2017-12-14 Thread Shiva Jahangiri (JIRA)

 [ 
https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shiva Jahangiri updated ASTERIXDB-2199:
---
Description: 
If a join happens on the primary keys of two tables, no hash partitioning 
should happen. Given the following DDL (note that the primary key of 
Friendship2 is a string):

DROP DATAVERSE Facebook IF EXISTS;
CREATE DATAVERSE Facebook;

USE Facebook;

CREATE TYPE FriendshipType AS closed {
    id: string,
    friends: [string]
};

CREATE DATASET Friendship2(FriendshipType) PRIMARY KEY id;

INSERT INTO Friendship2([
    {"id": "1", "friends": ["2", "3", "4"]},
    {"id": "2", "friends": ["4", "5", "6"]}
]);



By running the following query:


Use Facebook;

select * from Friendship2 first, Friendship2 second where first.id = second.id;



we can see that there is no hash partitioning in the optimized logical plan, 
which is correct, as the join happens on the primary key of both relations and 
the data is already partitioned on the primary key:



{
  "operator": "distribute-result",
  "expressions": "$$9",
  "operatorId": "1.1",
  "physical-operator": "DISTRIBUTE_RESULT",
  "execution-mode": "PARTITIONED",
  "inputs": [
    {
      "operator": "exchange",
      "operatorId": "1.2",
      "physical-operator": "ONE_TO_ONE_EXCHANGE",
      "execution-mode": "PARTITIONED",
      "inputs": [
        {
          "operator": "project",
          "variables": ["$$9"],
          "operatorId": "1.3",
          "physical-operator": "STREAM_PROJECT",
          "execution-mode": "PARTITIONED",
          "inputs": [
            {
              "operator": "assign",
              "variables": ["$$9"],
              "expressions": "{ first : $$first, second : $$second }",
              "operatorId": "1.4",
              "physical-operator": "ASSIGN",
              "execution-mode": "PARTITIONED",
              "inputs": [
                {
                  "operator": "project",
                  "variables": ["$$first", "$$second"],
                  "operatorId": "1.5",
                  "physical-operator": "STREAM_PROJECT",
                  "execution-mode": "PARTITIONED",
                  "inputs": [
                    {
                      "operator": "exchange",
                      "operatorId": "1.6",
                      "physical-operator": "ONE_TO_ONE_EXCHANGE",
                      "execution-mode": "PARTITIONED",
                      "inputs": [
                        {
                          "operator": "join",
                          "condition": "eq($$10, $$11)",
                          "operatorId": "1.7",
                          "physical-operator": "HYBRID_HASH_JOIN [$$10][$$11]",
                          "execution-mode": "PARTITIONED",
                          "inputs": [
                            {
                              "operator": "exchange",
                              "operatorId": "1.8",
                              "physical-operator": "ONE_TO_ONE_EXCHANGE",
                              "execution-mode": "PARTITIONED",
                              "inputs": [
                                {
                                  "operator": "data-scan",
                                  "variables": ["$$10", "$$first"],
                                  "data-source": "Facebook.Friendship2",
                                  "operatorId": "1.9",
                                  "physical-operator": "DATASOURCE_SCAN",
                                  "execution-mode": "PARTITIONED",
                                  "inputs": [
                                    {
                                      "operator": "exchange",
                                      "operatorId": "1.10",
                                      "physical-operator": "ONE_TO_ONE_EXCHANGE",
                                      "execution-mode": "PARTITIONED",
                                      "inputs": [
                                        {
                                          "operator": "empty-tuple-source",
                                          "operatorId": "1.11",
                                          "physical-operator": "EMPTY_TUPLE_SOURCE",
                                          "execution-mode": "PARTITIONED"
                                        }
                                      ]
                                    }
                                  ]
                                }
                              ]
                            },
                            {

[jira] [Created] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug

2017-12-14 Thread Shiva Jahangiri (JIRA)
Shiva Jahangiri created ASTERIXDB-2199:
--

 Summary: Nested primary key and hash repartitioning bug 
 Key: ASTERIXDB-2199
 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2199
 Project: Apache AsterixDB
  Issue Type: Bug
  Components: *DB - AsterixDB
Reporter: Shiva Jahangiri


If a join happens on the primary keys of two tables, no hash partitioning 
should happen. Given the following DDL (note that the primary key of 
Friendship2 is a string):

DROP DATAVERSE Facebook IF EXISTS;
CREATE DATAVERSE Facebook;

USE Facebook;

CREATE TYPE FriendshipType AS closed {
    id: string,
    friends: [string]
};

CREATE DATASET Friendship2(FriendshipType) PRIMARY KEY id;

INSERT INTO Friendship2([
    {"id": "1", "friends": ["2", "3", "4"]},
    {"id": "2", "friends": ["4", "5", "6"]}
]);

By running the following query:


Use Facebook;

select * from Friendship2 first, Friendship2 second where first.id = second.id;


we can see that there is no hash partitioning in the optimized logical plan, 
which is correct, as the join happens on the primary key of both relations and 
the data is already partitioned on the primary key:

{
  "operator": "distribute-result",
  "expressions": "$$9",
  "operatorId": "1.1",
  "physical-operator": "DISTRIBUTE_RESULT",
  "execution-mode": "PARTITIONED",
  "inputs": [
    {
      "operator": "exchange",
      "operatorId": "1.2",
      "physical-operator": "ONE_TO_ONE_EXCHANGE",
      "execution-mode": "PARTITIONED",
      "inputs": [
        {
          "operator": "project",
          "variables": ["$$9"],
          "operatorId": "1.3",
          "physical-operator": "STREAM_PROJECT",
          "execution-mode": "PARTITIONED",
          "inputs": [
            {
              "operator": "assign",
              "variables": ["$$9"],
              "expressions": "{ first : $$first, second : $$second }",
              "operatorId": "1.4",
              "physical-operator": "ASSIGN",
              "execution-mode": "PARTITIONED",
              "inputs": [
                {
                  "operator": "project",
                  "variables": ["$$first", "$$second"],
                  "operatorId": "1.5",
                  "physical-operator": "STREAM_PROJECT",
                  "execution-mode": "PARTITIONED",
                  "inputs": [
                    {
                      "operator": "exchange",
                      "operatorId": "1.6",
                      "physical-operator": "ONE_TO_ONE_EXCHANGE",
                      "execution-mode": "PARTITIONED",
                      "inputs": [
                        {
                          "operator": "join",
                          "condition": "eq($$10, $$11)",
                          "operatorId": "1.7",
                          "physical-operator": "HYBRID_HASH_JOIN [$$10][$$11]",
                          "execution-mode": "PARTITIONED",
                          "inputs": [
                            {
                              "operator": "exchange",
                              "operatorId": "1.8",
                              "physical-operator": "ONE_TO_ONE_EXCHANGE",
                              "execution-mode": "PARTITIONED",
                              "inputs": [
                                {
                                  "operator": "data-scan",
                                  "variables": ["$$10", "$$first"],
                                  "data-source": "Facebook.Friendship2",
                                  "operatorId": "1.9",
                                  "physical-operator": "DATASOURCE_SCAN",
                                  "execution-mode": "PARTITIONED",
                                  "inputs": [
                                    {
                                      "operator": "exchange",
                                      "operatorId": "1.10",
                                      "physical-operator": "ONE_TO_ONE_EXCHANGE",
                                      "execution-mode": "PARTITIONED",
                                      "inputs": [
                                        {
                                          "operator": "empty-tuple-source",
                                          "operatorId": "1.11",
                                          "physical-operator": "EMPTY_TUPLE_SOURCE",
                                          "execution-mode": "PARTITIONED"
                                        }
                                      ]
                                    }
                                  ]

[jira] [Created] (ASTERIXDB-2037) Retrieve plan format variables from the UI in QueryServiceServlet

2017-08-13 Thread Shiva Jahangiri (JIRA)
Shiva Jahangiri created ASTERIXDB-2037:
--

 Summary: Retrieve plan format variables from the UI in 
QueryServiceServlet
 Key: ASTERIXDB-2037
 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2037
 Project: Apache AsterixDB
  Issue Type: Bug
Reporter: Shiva Jahangiri
Priority: Minor


Currently we have no way to determine which source the plan format values 
should come from in QueryServiceServlet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ASTERIXDB-1921) Replication bugs and their effects in optimized logical plan

2017-05-25 Thread Shiva Jahangiri (JIRA)
Shiva Jahangiri created ASTERIXDB-1921:
--

 Summary: Replication bugs and their effects in optimized logical 
plan
 Key: ASTERIXDB-1921
 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1921
 Project: Apache AsterixDB
  Issue Type: Bug
  Components: AsterixDB, Optimizer
Reporter: Shiva Jahangiri


We were trying to see some replication in the optimized logical plan by running 
the following query on the example in the AQL primer; however, printing the 
optimized logical plan, printing the Hyracks job, and executing the query all 
threw a null pointer exception:
Query:
use dataverse TinySocial;

let $temp := (for $message in dataset GleambookMessages
              where $message.authorId >= 0
              return $message)
return {
  "count1": count(
    for $t1 in $temp
    for $user in dataset GleambookUsers
    where $t1.authorId = $user.id and $user.id > 0
    return { "user": $user, "message": $t1 }),
  "count2": count(
    for $t2 in $temp
    for $user in dataset GleambookUsers
    where $t2.authorId = $user.id and $user.id < 11
    return { "user": $user, "message": $t2 })
}


Error:
Internal error. Please check instance logs for further details. 
[NullPointerException]

It happened when replication was involved, as this query ran well with either 
count1 or count2, but not with both.

What we tried next, to track the bug down, was the following query, which is 
the same query as above without using replication:

use dataverse TinySocial;

{
  "count1": count(
    for $t1 in (for $message in dataset GleambookMessages
                where $message.authorId >= 0 return $message)
    for $user in dataset GleambookUsers
    where $t1.authorId = $user.id and $user.id > 0
    return { "user": $user, "message": $t1 }),
  "count2": count(
    for $t2 in (for $message in dataset GleambookMessages
                where $message.authorId >= 0 return $message)
    for $user in dataset GleambookUsers
    where $t2.authorId = $user.id and $user.id < 11
    return { "user": $user, "message": $t2 })
}

This query produced the result and the optimized logical plan successfully.

We continued by trying a simpler query that uses replication, as follows:

use dataverse TinySocial;

let $temp := (for $message in dataset GleambookMessages
              where $message.authorId = 1
              return $message)
return {
  "copy1": (for $m in $temp where $m.messageId <= 10 return $m),
  "copy2": (for $m in $temp where $m.messageId > 10 return $m)
}

This produces the following optimized logical plan:

distribute result [$$8]
-- DISTRIBUTE_RESULT  |UNPARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |UNPARTITIONED|
    project ([$$8])
    -- STREAM_PROJECT  |UNPARTITIONED|
      assign [$$8] <- [{"copy1": $$11, "copy2": $$14}]
      -- ASSIGN  |UNPARTITIONED|
        project ([$$11, $$14])
        -- STREAM_PROJECT  |UNPARTITIONED|
          subplan {
            aggregate [$$14] <- [listify($$m)]
            -- AGGREGATE  |UNPARTITIONED|
              select (gt($$18, 10))
              -- STREAM_SELECT  |UNPARTITIONED|
                assign [$$18] <- [$$m.getField(0)]
                -- ASSIGN  |UNPARTITIONED|
                  unnest $$m <- scan-collection($$7)
                  -- UNNEST  |UNPARTITIONED|
                    nested tuple source
                    -- NESTED_TUPLE_SOURCE  |UNPARTITIONED|
          }
          -- SUBPLAN  |UNPARTITIONED|
            subplan {
              aggregate [$$11] <- [listify($$m)]
              -- AGGREGATE  |UNPARTITIONED|
                select (le($$17, 10))
                -- STREAM_SELECT  |UNPARTITIONED|
                  assign [$$17] <- [$$m.getField(0)]
                  -- ASSIGN  |UNPARTITIONED|
                    unnest $$m <- scan-collection($$7)
                    -- UNNEST  |UNPARTITIONED|
                      nested tuple source
                      -- NESTED_TUPLE_SOURCE  |UNPARTITIONED|
            }
            -- SUBPLAN  |UNPARTITIONED|
              aggregate [$$7] <- [listify($$message)] // --> "why listifying?"
              -- AGGREGATE  |UNPARTITIONED|
                exchange
                -- RANDOM_MERGE_EXCHANGE  |PARTITIONED|
                  select (eq($$message.getField(1), 1))
                  -- STREAM_SELECT  |PARTITIONED|
                    project ([$$message])
                    -- STREAM_PROJECT  |PARTITIONED|
                      exchange