[jira] [Updated] (ASTERIXDB-2788) Plan's stages are not created correctly for bushy plans in compile time
[ https://issues.apache.org/jira/browse/ASTERIXDB-2788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shiva Jahangiri updated ASTERIXDB-2788:
---
Summary: Plan's stages are not created correctly for bushy plans in compile time (was: Plan's stages do not create correct stages for bushy plans in compile time)

> Plan's stages are not created correctly for bushy plans in compile time
> ---
>
> Key: ASTERIXDB-2788
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2788
> Project: Apache AsterixDB
> Issue Type: Bug
> Reporter: Shiva Jahangiri
> Priority: Major
>
> PlanStagesGenerator generates "pipelines" instead of stages, so when computing the maximum required memory it computes the memory needed for the largest pipeline, not the largest stage. This does not affect linear plans, but in bushy plans the independent pipelines (independent builds) that can be co-scheduled should be merged into one stage. Merging several small pipelines can then yield a stage that needs more memory than the largest single pipeline, which is what was previously treated as the largest stage.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
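For illustration, a minimal Java sketch of the pipeline-vs-stage distinction described in ASTERIXDB-2788; the types and the merging rule here are assumptions for the sketch, not PlanStagesGenerator code. Because co-scheduled pipelines hold their memory at the same time, a stage needs the sum of its pipelines' budgets, and the plan's requirement is the maximum over stages:

import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; illustrative types, not AsterixDB classes.
final class StageMemorySketch {
    static final class Pipeline {
        final long requiredMemory;
        Pipeline(long requiredMemory) { this.requiredMemory = requiredMemory; }
    }

    /** A stage is a set of pipelines that are co-scheduled (run together). */
    static final class Stage {
        final List<Pipeline> pipelines = new ArrayList<>();
        long requiredMemory() {
            // Co-scheduled pipelines hold their memory simultaneously,
            // so a stage needs the SUM of its pipelines' memory.
            return pipelines.stream().mapToLong(p -> p.requiredMemory).sum();
        }
    }

    /** Current (incorrect for bushy plans) estimate: the largest single pipeline. */
    static long maxPipelineMemory(List<Pipeline> pipelines) {
        return pipelines.stream().mapToLong(p -> p.requiredMemory).max().orElse(0);
    }

    /** Intended estimate: the largest stage after merging co-scheduled pipelines. */
    static long maxStageMemory(List<Stage> stages) {
        return stages.stream().mapToLong(Stage::requiredMemory).max().orElse(0);
    }
}

For a bushy plan whose two independent builds need 40 and 35 frames, the per-pipeline maximum is 40 frames, while the co-scheduled stage actually needs 75.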
[jira] [Created] (ASTERIXDB-2788) Plan's stages do not create correct stages for bushy plans in compile time
Shiva Jahangiri created ASTERIXDB-2788:
--
Summary: Plan's stages do not create correct stages for bushy plans in compile time
Key: ASTERIXDB-2788
URL: https://issues.apache.org/jira/browse/ASTERIXDB-2788
Project: Apache AsterixDB
Issue Type: Bug
Reporter: Shiva Jahangiri

PlanStagesGenerator generates "pipelines" instead of stages, so when computing the maximum required memory it computes the memory needed for the largest pipeline, not the largest stage. This does not affect linear plans, but in bushy plans the independent pipelines (independent builds) that can be co-scheduled should be merged into one stage. Merging several small pipelines can then yield a stage that needs more memory than the largest single pipeline, which is what was previously treated as the largest stage.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ASTERIXDB-2784) Join memory requirement for large objects
[ https://issues.apache.org/jira/browse/ASTERIXDB-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17203463#comment-17203463 ]

Shiva Jahangiri commented on ASTERIXDB-2784:
The fix for this issue is in my next patch for hybrid hash join, where I define NoGrow-NoSteal and Grow-Steal policies. Feel free to assign it to me.

> Join memory requirement for large objects
> -
>
> Key: ASTERIXDB-2784
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2784
> Project: Apache AsterixDB
> Issue Type: Improvement
> Components: COMP - Compiler, RT - Runtime
> Reporter: Chen Luo
> Priority: Major
>
> Currently the compiler assumes the minimum number of join frames is 5 [1]. However, this does not guarantee that a join will always succeed in the presence of large objects. The actual join memory requirement is MAX(5, #partitions * large-object size in frames). The reason is that in the spill policy [2], we only spill a partition if it hasn't been spilled before. As a result, when we are writing to an empty partition, it is possible that each of the other partitions holds one large object (which could be larger than the frame size) but no partition can be spilled. Thus, the join memory requirement becomes #partitions * large-object size in this case.
> [1] https://github.com/apache/asterixdb/blob/master/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/AbstractJoinPOperator.java#L29
> [2] https://github.com/apache/asterixdb/blob/37dfed60fb47afcc86de6d17704a8f100217057d/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/PreferToSpillFullyOccupiedFramePolicy.java#L55

-- This message was sent by Atlassian Jira (v8.3.4#803005)
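The requirement stated in ASTERIXDB-2784 can be written as a small helper. This is a hypothetical sketch of the arithmetic only; the class and method names are assumptions, not AsterixDB code:

// Hypothetical illustration of the memory requirement described above.
public final class JoinMemoryRequirement {
    // Compiler's current assumption: a join needs at least 5 frames.
    private static final int MIN_JOIN_FRAMES = 5;

    /**
     * Worst case under the spill policy: each of the other partitions may
     * hold one large object that cannot be spilled, so the join needs
     * MAX(5, numPartitions * framesPerLargeObject) frames.
     */
    public static int requiredFrames(int numPartitions, long maxObjectBytes, int frameBytes) {
        // Frames needed to hold one large object, rounded up.
        int framesPerLargeObject = (int) ((maxObjectBytes + frameBytes - 1) / frameBytes);
        return Math.max(MIN_JOIN_FRAMES, numPartitions * framesPerLargeObject);
    }
}

For example, with 20 partitions and objects spanning 2 frames each, the requirement is 40 frames, not the assumed minimum of 5.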
[jira] [Created] (ASTERIXDB-2656) No need for frame constraint in probe phase of hybrid hash join
Shiva Jahangiri created ASTERIXDB-2656:
--
Summary: No need for frame constraint in probe phase of hybrid hash join
Key: ASTERIXDB-2656
URL: https://issues.apache.org/jira/browse/ASTERIXDB-2656
Project: Apache AsterixDB
Issue Type: Bug
Components: *DB - AsterixDB
Reporter: Shiva Jahangiri

In the optimized hybrid hash join there is a constraint that a spilled partition cannot hold more than one frame. While this seems necessary during the build phase, where it keeps as much memory as possible for the in-memory partitions, it is an unnecessary constraint during the probe phase. During the probe we should let the spilled partitions grow and use the leftover frames not used by the in-memory partitions.

This also fixes a bug that the hybrid hash join currently has. During the probe phase, when there is not enough memory to insert a large record, AsterixDB either flushes the record directly to disk or spills the largest spilled partition, depending on which one is larger. When the biggest spilled partition is the victim chosen to flush and release frames, the large record cannot use the freed frames, because it belongs to a spilled partition (otherwise it would have been probed), and the one-frame limit therefore applies to it.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
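A minimal sketch of the policy change proposed in ASTERIXDB-2656; the enum, class, and method names are illustrative assumptions, not the actual OptimizedHybridHashJoin API:

// Hypothetical sketch: per-phase frame limits for spilled partitions.
enum JoinPhase { BUILD, PROBE }

final class SpilledPartitionPolicy {
    /** Max frames a spilled partition may hold in the given phase. */
    static int frameLimit(JoinPhase phase, int leftoverFrames) {
        switch (phase) {
            case BUILD:
                // Build phase: cap spilled partitions at one frame so the
                // in-memory partitions keep as much of the budget as possible.
                return 1;
            case PROBE:
            default:
                // Probe phase (proposed): let spilled partitions grow into
                // whatever frames the in-memory partitions are not using.
                return 1 + leftoverFrames;
        }
    }
}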
[jira] [Created] (ASTERIXDB-2648) Inconsistency in dataset distribution for Broadcast Hint
Shiva Jahangiri created ASTERIXDB-2648:
--
Summary: Inconsistency in dataset distribution for Broadcast Hint
Key: ASTERIXDB-2648
URL: https://issues.apache.org/jira/browse/ASTERIXDB-2648
Project: Apache AsterixDB
Issue Type: Bug
Components: COMP - Compiler
Reporter: Shiva Jahangiri

The broadcast hint (/*+ bcast */) uses the order of the datasets in a join's WHERE clause to choose which dataset to broadcast. This is not consistent with other join types such as hybrid hash join and indexed nested-loop join. It can also confuse both the user and AsterixDB when the query has more than one join condition in the WHERE clause and the order of the datasets is not consistent across those conditions.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ASTERIXDB-2582) Out-of-budget memory usage in reload buffer of hybrid hash join
Shiva Jahangiri created ASTERIXDB-2582:
--
Summary: Out-of-budget memory usage in reload buffer of hybrid hash join
Key: ASTERIXDB-2582
URL: https://issues.apache.org/jira/browse/ASTERIXDB-2582
Project: Apache AsterixDB
Issue Type: Bug
Components: *DB - AsterixDB
Affects Versions: 0.9.4.1
Reporter: Shiva Jahangiri

In the optimized hybrid hash join, the "makeSpaceForHashTableAndBringBackSpilledPartitions" method uses a variable-sized reload buffer to bring some partitions back. However, that buffer is not taken from the join buffer manager; as such, it is allocated outside of the join budget.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
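A hypothetical sketch of the fix direction for ASTERIXDB-2582: charge the reload buffer to the join budget instead of allocating it from outside. The interfaces below are illustrative assumptions, not the actual buffer-manager API:

// Hypothetical budget-aware allocation for the reload buffer.
interface JoinBufferManager {
    /** Returns a buffer charged against the join budget, or null if over budget. */
    java.nio.ByteBuffer acquire(int sizeBytes);
    void release(java.nio.ByteBuffer buffer);
}

final class PartitionReloader {
    private final JoinBufferManager budget;

    PartitionReloader(JoinBufferManager budget) {
        this.budget = budget;
    }

    boolean reload(int frameSizeBytes) {
        // Take the reload buffer from the join budget, not from outside it.
        java.nio.ByteBuffer reloadBuffer = budget.acquire(frameSizeBytes);
        if (reloadBuffer == null) {
            return false; // No budget left: the partition must stay spilled.
        }
        try {
            // ... read spilled frames into reloadBuffer and re-insert them ...
            return true;
        } finally {
            budget.release(reloadBuffer);
        }
    }
}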
[jira] [Created] (ASTERIXDB-2581) Need for refactoring and unit tests for hybrid hash join
Shiva Jahangiri created ASTERIXDB-2581:
--
Summary: Need for refactoring and unit tests for hybrid hash join
Key: ASTERIXDB-2581
URL: https://issues.apache.org/jira/browse/ASTERIXDB-2581
Project: Apache AsterixDB
Issue Type: Bug
Components: *DB - AsterixDB
Affects Versions: 0.9.4.1
Reporter: Shiva Jahangiri

Currently, the whole logic of the optimized hybrid hash join is implemented in two classes, OptimizedHybridHashJoin and OptimizedHybridHashJoinOperatorDescriptor. There is no unit test for either of them, and the overall structure of the code makes it too difficult to add unit tests. A few bugs have already been found manually, which suggests more may be hiding in the code due to the lack of unit testing. We need to refactor these two classes into smaller modules and add unit tests for them.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
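A minimal sketch of the kind of unit test that refactoring would enable. PartitionPolicy is a hypothetical module extracted from OptimizedHybridHashJoin, defined inline here for the sketch; it is not an existing class:

import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class PartitionPolicyTest {

    /** Toy stand-in for an extracted, independently testable module. */
    static final class PartitionPolicy {
        private final int[] spilledFrames;

        PartitionPolicy(int numPartitions) {
            this.spilledFrames = new int[numPartitions];
        }

        void recordSpill(int partition, int frames) {
            spilledFrames[partition] += frames;
        }

        /** Victim = spilled partition holding the most frames, or -1 if none. */
        int chooseVictim() {
            int victim = -1;
            int max = 0;
            for (int p = 0; p < spilledFrames.length; p++) {
                if (spilledFrames[p] > max) {
                    max = spilledFrames[p];
                    victim = p;
                }
            }
            return victim;
        }
    }

    @Test
    public void victimSelectionPrefersLargestSpilledPartition() {
        PartitionPolicy policy = new PartitionPolicy(4);
        policy.recordSpill(1, 3);
        policy.recordSpill(2, 5);
        assertEquals(2, policy.chooseVictim());
    }
}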
[jira] [Updated] (ASTERIXDB-2577) Not keeping one frame for each spilled partition before starting the probe phase in hybrid hash join
[ https://issues.apache.org/jira/browse/ASTERIXDB-2577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shiva Jahangiri updated ASTERIXDB-2577:
---
Summary: Not keeping one frame for each spilled partition before starting the probe phase in hybrid hash join (was: Not keeping one frame for each spilled partition during probe phase)

> Not keeping one frame for each spilled partition before starting the probe phase in hybrid hash join
>
> Key: ASTERIXDB-2577
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2577
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: *DB - AsterixDB
> Affects Versions: 0.9.4.1
> Reporter: Shiva Jahangiri
> Priority: Major
>
> In the probe() method of the optimized hybrid hash join, if insertion fails on the current spilled partition, we try to find the biggest spilled partition and flush it as a victim. If we cannot find any spilled partition with size > 0, we ASSUME that the record is large and flush it as a big object. Running the customerOrderCIDHybridHashJoin_Case3() test in TPCHCustomerOrderHashJoinTest shows a record of 206 bytes (smaller than a frame), yet neither the spilled partitions nor the buffer manager holds any frame (this is the problem: there should be one frame for each spilled partition). In this case we flush the record as a large object, which means that every record meant for a spilled partition during the probe gets flushed separately.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
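A hypothetical sketch of the probe-side failure path described in ASTERIXDB-2577; the class, interface, and method names are illustrative, not the actual OptimizedHybridHashJoin code:

// Sketch of the victim-selection loop and the problematic fallback.
final class ProbeInsertSketch {
    interface Partitions {
        boolean insert(int partition, byte[] record);  // fails if no frame available
        int biggestSpilledPartition();                 // -1 if none has size > 0
        void flushPartition(int partition);
        void flushAsLargeObject(int partition, byte[] record);
    }

    static void insertProbeRecord(Partitions parts, int partition, byte[] record) {
        while (!parts.insert(partition, record)) {
            int victim = parts.biggestSpilledPartition();
            if (victim >= 0) {
                // Flush the biggest spilled partition and retry the insert.
                parts.flushPartition(victim);
            } else {
                // No spilled partition with size > 0: the code ASSUMES the
                // record is large and flushes it as a big object. With no
                // frame reserved per spilled partition, even a 206-byte
                // record takes this path, so every probe record bound for a
                // spilled partition is flushed separately.
                parts.flushAsLargeObject(partition, record);
                return;
            }
        }
    }
}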
[jira] [Updated] (ASTERIXDB-2577) Not keeping one frame for each spilled partition during probe phase
[ https://issues.apache.org/jira/browse/ASTERIXDB-2577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shiva Jahangiri updated ASTERIXDB-2577:
---
Summary: Not keeping one frame for each spilled partition during probe phase (was: Flushing small records during the probe in optimized hhj as large objects)

> Not keeping one frame for each spilled partition during probe phase
> ---
>
> Key: ASTERIXDB-2577
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2577
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: *DB - AsterixDB
> Affects Versions: 0.9.4.1
> Reporter: Shiva Jahangiri
> Priority: Major
>
> In the probe() method of the optimized hybrid hash join, if insertion fails on the current spilled partition, we try to find the biggest spilled partition and flush it as a victim. If we cannot find any spilled partition with size > 0, we ASSUME that the record is large and flush it as a big object. Running the customerOrderCIDHybridHashJoin_Case3() test in TPCHCustomerOrderHashJoinTest shows a record of 206 bytes (smaller than a frame), yet neither the spilled partitions nor the buffer manager holds any frame (this is the problem: there should be one frame for each spilled partition). In this case we flush the record as a large object, which means that every record meant for a spilled partition during the probe gets flushed separately.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ASTERIXDB-2577) Flushing small records during the probe in optimized hhj as large objects
[ https://issues.apache.org/jira/browse/ASTERIXDB-2577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shiva Jahangiri updated ASTERIXDB-2577:
---
Description:
In the probe() method of the optimized hybrid hash join, if insertion fails on the current spilled partition, we try to find the biggest spilled partition and flush it as a victim. If we cannot find any spilled partition with size > 0, we ASSUME that the record is large and flush it as a big object. Running the customerOrderCIDHybridHashJoin_Case3() test in TPCHCustomerOrderHashJoinTest shows a record of 206 bytes (smaller than a frame), yet neither the spilled partitions nor the buffer manager holds any frame (this is the problem: there should be one frame for each spilled partition). In this case we flush the record as a large object, which means that every record meant for a spilled partition during the probe gets flushed separately.

was:
In the probe() method of the optimized hybrid hash join, if insertion fails on the current spilled partition, we try to find the biggest spilled partition and flush it as a victim. If we cannot find any spilled partition with size > 0, we ASSUME that the record is large and flush it as a big object. Running the customerOrderCIDHybridHashJoin_Case3() test in TPCHCustomerOrderHashJoinTest shows a record of 206 bytes (smaller than a frame), yet neither the spilled partitions nor the buffer manager holds any frame (this is the problem: there should be one frame for each spilled partition). In this case we flush the record (without checking whether it is large or not) as a large object, which means that every record meant for a spilled partition during the probe gets flushed separately.

> Flushing small records during the probe in optimized hhj as large objects
> --
>
> Key: ASTERIXDB-2577
> URL: https://issues.apache.org/jira/browse/ASTERIXDB-2577
> Project: Apache AsterixDB
> Issue Type: Bug
> Components: *DB - AsterixDB
> Affects Versions: 0.9.4.1
> Reporter: Shiva Jahangiri
> Priority: Major
>
> In the probe() method of the optimized hybrid hash join, if insertion fails on the current spilled partition, we try to find the biggest spilled partition and flush it as a victim. If we cannot find any spilled partition with size > 0, we ASSUME that the record is large and flush it as a big object. Running the customerOrderCIDHybridHashJoin_Case3() test in TPCHCustomerOrderHashJoinTest shows a record of 206 bytes (smaller than a frame), yet neither the spilled partitions nor the buffer manager holds any frame (this is the problem: there should be one frame for each spilled partition). In this case we flush the record as a large object, which means that every record meant for a spilled partition during the probe gets flushed separately.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ASTERIXDB-2577) Flushing small records during the probe in optimized hhj as large objects
Shiva Jahangiri created ASTERIXDB-2577:
--
Summary: Flushing small records during the probe in optimized hhj as large objects
Key: ASTERIXDB-2577
URL: https://issues.apache.org/jira/browse/ASTERIXDB-2577
Project: Apache AsterixDB
Issue Type: Bug
Components: *DB - AsterixDB
Affects Versions: 0.9.4.1
Reporter: Shiva Jahangiri

In the probe() method of the optimized hybrid hash join, if insertion fails on the current spilled partition, we try to find the biggest spilled partition and flush it as a victim. If we cannot find any spilled partition with size > 0, we ASSUME that the record is large and flush it as a big object. Running the customerOrderCIDHybridHashJoin_Case3() test in TPCHCustomerOrderHashJoinTest shows a record of 206 bytes (smaller than a frame), yet neither the spilled partitions nor the buffer manager holds any frame (this is the problem: there should be one frame for each spilled partition). In this case we flush the record (without checking whether it is large or not) as a large object, which means that every record meant for a spilled partition during the probe gets flushed separately.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ASTERIXDB-2571) Memory assigned outside the budget in hash join for large objects
Shiva Jahangiri created ASTERIXDB-2571:
--
Summary: Memory assigned outside the budget in hash join for large objects
Key: ASTERIXDB-2571
URL: https://issues.apache.org/jira/browse/ASTERIXDB-2571
Project: Apache AsterixDB
Issue Type: Bug
Components: *DB - AsterixDB
Affects Versions: 0.9.4.1
Reporter: Shiva Jahangiri

Currently, when the optimized hybrid hash join runs out of memory, AsterixDB flushes the big probe object to disk. To flush a big object, a variable-sized frame is created from the Hyracks context, not from the join memory budget. This means we may use more memory than the given budget.

Possible solutions:
1) Provide a way to flush the incoming frame/buffer directly to disk.
2) Spill enough data to fit the large object (less preferred, as it hurts performance).

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
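A hypothetical sketch of option (1) above: write the large probe object straight from the incoming frame to the run file, so no extra variable-sized frame is allocated outside the join budget. This is illustrative plain-Java I/O, not the Hyracks run-file API:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Sketch: flush a record region of the incoming frame without copying it
// into a freshly allocated big frame.
final class DirectFlushSketch {
    static void flushRecordDirectly(ByteBuffer incomingFrame, int recordStart,
                                    int recordLength, FileChannel runFile) throws IOException {
        // Slice the record region of the incoming frame: no copy into a new
        // variable-sized frame, hence no out-of-budget allocation.
        ByteBuffer record = incomingFrame.duplicate();
        record.position(recordStart).limit(recordStart + recordLength);
        while (record.hasRemaining()) {
            runFile.write(record);
        }
    }
}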
[jira] [Created] (ASTERIXDB-2561) Out of memory in optimized hybrid hash join due to large records
Shiva Jahangiri created ASTERIXDB-2561:
--
Summary: Out of memory in optimized hybrid hash join due to large records
Key: ASTERIXDB-2561
URL: https://issues.apache.org/jira/browse/ASTERIXDB-2561
Project: Apache AsterixDB
Issue Type: Bug
Components: *DB - AsterixDB
Affects Versions: 0.9.4
Reporter: Shiva Jahangiri
Fix For: 0.9.4.2

Currently, spilled partitions cannot grow beyond one frame, but there is no constraint on the size of that one frame. This can lead to out-of-memory errors when large records are inserted into spilled partitions, as we are not allowed to steal frames from spilled partitions. For example, with a 32 KB frame size, a single 16 MB record turns its partition's one "frame" into 512 frames' worth of memory that the budget never accounted for.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug
[ https://issues.apache.org/jira/browse/ASTERIXDB-2199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shiva Jahangiri updated ASTERIXDB-2199:
---
Description:
If a join is happening on the primary keys of two tables, no hash partitioning should happen. Given the following DDL (note that the primary key of Friendship2 is a string):

DROP DATAVERSE Facebook IF EXISTS;
CREATE DATAVERSE Facebook;
Use Facebook;

CREATE TYPE FriendshipType AS closed {
  id: string,
  friends: [string]
};

CREATE DATASET Friendship2(FriendshipType) PRIMARY KEY id;

insert into Friendship2([
  {"id": "1", "friends": ["2", "3", "4"]},
  {"id": "2", "friends": ["4", "5", "6"]}
]);

and running the following query:

Use Facebook;
select * from Friendship2 first, Friendship2 second where first.id = second.id;

we can see that no hash partitioning happens in the optimized logical plan, which is correct, since the join is on the primary key of both relations and the data is already partitioned on the primary key:

{
  "operator": "distribute-result",
  "expressions": "$$9",
  "operatorId": "1.1",
  "physical-operator": "DISTRIBUTE_RESULT",
  "execution-mode": "PARTITIONED",
  "inputs": [
    {
      "operator": "exchange",
      "operatorId": "1.2",
      "physical-operator": "ONE_TO_ONE_EXCHANGE",
      "execution-mode": "PARTITIONED",
      "inputs": [
        {
          "operator": "project",
          "variables": ["$$9"],
          "operatorId": "1.3",
          "physical-operator": "STREAM_PROJECT",
          "execution-mode": "PARTITIONED",
          "inputs": [
            {
              "operator": "assign",
              "variables": ["$$9"],
              "expressions": "{ first : $$first, second : $$second}",
              "operatorId": "1.4",
              "physical-operator": "ASSIGN",
              "execution-mode": "PARTITIONED",
              "inputs": [
                {
                  "operator": "project",
                  "variables": ["$$first", "$$second"],
                  "operatorId": "1.5",
                  "physical-operator": "STREAM_PROJECT",
                  "execution-mode": "PARTITIONED",
                  "inputs": [
                    {
                      "operator": "exchange",
                      "operatorId": "1.6",
                      "physical-operator": "ONE_TO_ONE_EXCHANGE",
                      "execution-mode": "PARTITIONED",
                      "inputs": [
                        {
                          "operator": "join",
                          "condition": "eq($$10, $$11)",
                          "operatorId": "1.7",
                          "physical-operator": "HYBRID_HASH_JOIN [$$10][$$11]",
                          "execution-mode": "PARTITIONED",
                          "inputs": [
                            {
                              "operator": "exchange",
                              "operatorId": "1.8",
                              "physical-operator": "ONE_TO_ONE_EXCHANGE",
                              "execution-mode": "PARTITIONED",
                              "inputs": [
                                {
                                  "operator": "data-scan",
                                  "variables": ["$$10", "$$first"],
                                  "data-source": "Facebook.Friendship2",
                                  "operatorId": "1.9",
                                  "physical-operator": "DATASOURCE_SCAN",
                                  "execution-mode": "PARTITIONED",
                                  "inputs": [
                                    {
                                      "operator": "exchange",
                                      "operatorId": "1.10",
                                      "physical-operator": "ONE_TO_ONE_EXCHANGE",
                                      "execution-mode": "PARTITIONED",
                                      "inputs": [
                                        {
                                          "operator": "empty-tuple-source",
                                          "operatorId": "1.11",
                                          "physical-operator": "EMPTY_TUPLE_SOURCE",
                                          "execution-mode": "PARTITIONED"
                                        }
                                      ]
                                    }
                                  ]
                                }
                              ]
                            },
                            {
[jira] [Created] (ASTERIXDB-2199) Nested primary key and hash repartitioning bug
Shiva Jahangiri created ASTERIXDB-2199:
--
Summary: Nested primary key and hash repartitioning bug
Key: ASTERIXDB-2199
URL: https://issues.apache.org/jira/browse/ASTERIXDB-2199
Project: Apache AsterixDB
Issue Type: Bug
Components: *DB - AsterixDB
Reporter: Shiva Jahangiri

If a join is happening on the primary keys of two tables, no hash partitioning should happen. Given the following DDL (note that the primary key of Friendship2 is a string):

DROP DATAVERSE Facebook IF EXISTS;
CREATE DATAVERSE Facebook;
Use Facebook;

CREATE TYPE FriendshipType AS closed {
  id: string,
  friends: [string]
};

CREATE DATASET Friendship2(FriendshipType) PRIMARY KEY id;

insert into Friendship2([
  {"id": "1", "friends": ["2", "3", "4"]},
  {"id": "2", "friends": ["4", "5", "6"]}
]);

and running the following query:

Use Facebook;
select * from Friendship2 first, Friendship2 second where first.id = second.id;

we can see that no hash partitioning happens in the optimized logical plan, which is correct, since the join is on the primary key of both relations and the data is already partitioned on the primary key:

{
  "operator": "distribute-result",
  "expressions": "$$9",
  "operatorId": "1.1",
  "physical-operator": "DISTRIBUTE_RESULT",
  "execution-mode": "PARTITIONED",
  "inputs": [
    {
      "operator": "exchange",
      "operatorId": "1.2",
      "physical-operator": "ONE_TO_ONE_EXCHANGE",
      "execution-mode": "PARTITIONED",
      "inputs": [
        {
          "operator": "project",
          "variables": ["$$9"],
          "operatorId": "1.3",
          "physical-operator": "STREAM_PROJECT",
          "execution-mode": "PARTITIONED",
          "inputs": [
            {
              "operator": "assign",
              "variables": ["$$9"],
              "expressions": "{ first : $$first, second : $$second}",
              "operatorId": "1.4",
              "physical-operator": "ASSIGN",
              "execution-mode": "PARTITIONED",
              "inputs": [
                {
                  "operator": "project",
                  "variables": ["$$first", "$$second"],
                  "operatorId": "1.5",
                  "physical-operator": "STREAM_PROJECT",
                  "execution-mode": "PARTITIONED",
                  "inputs": [
                    {
                      "operator": "exchange",
                      "operatorId": "1.6",
                      "physical-operator": "ONE_TO_ONE_EXCHANGE",
                      "execution-mode": "PARTITIONED",
                      "inputs": [
                        {
                          "operator": "join",
                          "condition": "eq($$10, $$11)",
                          "operatorId": "1.7",
                          "physical-operator": "HYBRID_HASH_JOIN [$$10][$$11]",
                          "execution-mode": "PARTITIONED",
                          "inputs": [
                            {
                              "operator": "exchange",
                              "operatorId": "1.8",
                              "physical-operator": "ONE_TO_ONE_EXCHANGE",
                              "execution-mode": "PARTITIONED",
                              "inputs": [
                                {
                                  "operator": "data-scan",
                                  "variables": ["$$10", "$$first"],
                                  "data-source": "Facebook.Friendship2",
                                  "operatorId": "1.9",
                                  "physical-operator": "DATASOURCE_SCAN",
                                  "execution-mode": "PARTITIONED",
                                  "inputs": [
                                    {
                                      "operator": "exchange",
                                      "operatorId": "1.10",
                                      "physical-operator": "ONE_TO_ONE_EXCHANGE",
                                      "execution-mode": "PARTITIONED",
                                      "inputs": [
                                        {
                                          "operator": "empty-tuple-source",
                                          "operatorId": "1.11",
                                          "physical-operator": "EMPTY_TUPLE_SOURCE",
                                          "execution-mode": "PARTITIONED"
                                        }
                                      ]
                                    }
                                  ]
[jira] [Created] (ASTERIXDB-2037) Retrieve plan format variables from the UI in QueryServiceServlet
Shiva Jahangiri created ASTERIXDB-2037:
--
Summary: Retrieve plan format variables from the UI in QueryServiceServlet
Key: ASTERIXDB-2037
URL: https://issues.apache.org/jira/browse/ASTERIXDB-2037
Project: Apache AsterixDB
Issue Type: Bug
Reporter: Shiva Jahangiri
Priority: Minor

Currently we have no way to determine which source these values should come from in QueryServiceServlet.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (ASTERIXDB-1921) Replication bugs and their effects in the optimized logical plan
Shiva Jahangiri created ASTERIXDB-1921:
--
Summary: Replication bugs and their effects in the optimized logical plan
Key: ASTERIXDB-1921
URL: https://issues.apache.org/jira/browse/ASTERIXDB-1921
Project: Apache AsterixDB
Issue Type: Bug
Components: AsterixDB, Optimizer
Reporter: Shiva Jahangiri

We were trying to observe replication in the optimized logical plan by running the following query against the example from the AQL primer; however, "print optimized logical plan", "print hyracks job", and "execute query" all threw a NullPointerException.

Query:

use dataverse TinySocial;

let $temp := (for $message in dataset GleambookMessages
              where $message.authorId >= 0
              return $message)
return {
  "count1": count(
    for $t1 in $temp
    for $user in dataset GleambookUsers
    where $t1.authorId = $user.id and $user.id > 0
    return { "user": $user, "message": $t1 }),
  "count2": count(
    for $t2 in $temp
    for $user in dataset GleambookUsers
    where $t2.authorId = $user.id and $user.id < 11
    return { "user": $user, "message": $t2 })
}

Error: Internal error. Please check instance logs for further details. [NullPointerException]

The error occurred when replication was involved, as the query ran well with either count1 or count2 alone, but not with both. To track the bug down, we next tried the following query, which is the same as the one above but without replication:

use dataverse TinySocial;

{
  "count1": count(
    for $t1 in (for $message in dataset GleambookMessages
                where $message.authorId >= 0
                return $message)
    for $user in dataset GleambookUsers
    where $t1.authorId = $user.id and $user.id > 0
    return { "user": $user, "message": $t1 }),
  "count2": count(
    for $t2 in (for $message in dataset GleambookMessages
                where $message.authorId >= 0
                return $message)
    for $user in dataset GleambookUsers
    where $t2.authorId = $user.id and $user.id < 11
    return { "user": $user, "message": $t2 })
}

This query produced its result and the optimized logical plan successfully. We continued with a simpler query that uses replication, as follows:

use dataverse TinySocial;

let $temp := (for $message in dataset GleambookMessages
              where $message.authorId = 1
              return $message)
return {
  "copy1": (for $m in $temp where $m.messageId <= 10 return $m),
  "copy2": (for $m in $temp where $m.messageId > 10 return $m)
}

which produces the following optimized logical plan:

distribute result [$$8]
-- DISTRIBUTE_RESULT  |UNPARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |UNPARTITIONED|
    project ([$$8])
    -- STREAM_PROJECT  |UNPARTITIONED|
      assign [$$8] <- [{"copy1": $$11, "copy2": $$14}]
      -- ASSIGN  |UNPARTITIONED|
        project ([$$11, $$14])
        -- STREAM_PROJECT  |UNPARTITIONED|
          subplan {
            aggregate [$$14] <- [listify($$m)]
            -- AGGREGATE  |UNPARTITIONED|
              select (gt($$18, 10))
              -- STREAM_SELECT  |UNPARTITIONED|
                assign [$$18] <- [$$m.getField(0)]
                -- ASSIGN  |UNPARTITIONED|
                  unnest $$m <- scan-collection($$7)
                  -- UNNEST  |UNPARTITIONED|
                    nested tuple source
                    -- NESTED_TUPLE_SOURCE  |UNPARTITIONED|
          }
          -- SUBPLAN  |UNPARTITIONED|
          subplan {
            aggregate [$$11] <- [listify($$m)]
            -- AGGREGATE  |UNPARTITIONED|
              select (le($$17, 10))
              -- STREAM_SELECT  |UNPARTITIONED|
                assign [$$17] <- [$$m.getField(0)]
                -- ASSIGN  |UNPARTITIONED|
                  unnest $$m <- scan-collection($$7)
                  -- UNNEST  |UNPARTITIONED|
                    nested tuple source
                    -- NESTED_TUPLE_SOURCE  |UNPARTITIONED|
          }
          -- SUBPLAN  |UNPARTITIONED|
          aggregate [$$7] <- [listify($$message)]  // --> "why listifying?"
          -- AGGREGATE  |UNPARTITIONED|
            exchange
            -- RANDOM_MERGE_EXCHANGE  |PARTITIONED|
              select (eq($$message.getField(1), 1))
              -- STREAM_SELECT  |PARTITIONED|
                project ([$$message])
                -- STREAM_PROJECT  |PARTITIONED|
                  exchange