[jira] [Created] (ASTERIXDB-2813) Limit the number of flush/merge threads
Chen Luo created ASTERIXDB-2813: --- Summary: Limit the number of flush/merge threads Key: ASTERIXDB-2813 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2813 Project: Apache AsterixDB Issue Type: Improvement Components: STO - Storage Reporter: Chen Luo Assignee: Chen Luo Currently, AsterixDB uses one thread to execute each flush and merge operation. This can result in a large number of I/O threads in some cases, e.g., when writing to many datasets at the same time. A better solution is to enforce a limit on the number of flush and merge threads; when no thread is available, a newly created flush or merge operation should be delayed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
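The proposed limit can be sketched with a fixed-size thread pool: operations submitted beyond the thread cap wait in the pool's queue, i.e. they are delayed rather than rejected. The class and method names below are hypothetical, not AsterixDB's actual I/O scheduler:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a fixed-size pool caps the number of concurrent
// flush/merge I/O threads; excess operations queue until a thread frees up.
class BoundedIoScheduler {
    private final ExecutorService pool;

    BoundedIoScheduler(int maxIoThreads) {
        this.pool = Executors.newFixedThreadPool(maxIoThreads);
    }

    // Flush and merge operations are submitted as plain Runnables.
    Future<?> schedule(Runnable flushOrMerge) {
        return pool.submit(flushOrMerge);
    }

    void shutdown() {
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With maxIoThreads = 2, a third concurrently submitted flush simply queues until one of the first two finishes, which matches the "delayed" behavior described above.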
[jira] [Resolved] (ASTERIXDB-2540) Optimize Performance Stability of Storage
[ https://issues.apache.org/jira/browse/ASTERIXDB-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2540. - Resolution: Implemented > Optimize Performance Stability of Storage > - > > Key: ASTERIXDB-2540 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2540 > Project: Apache AsterixDB > Issue Type: Improvement > Components: STO - Storage >Reporter: Chen Luo >Assignee: Chen Luo >Priority: Major > > This is one of a series of improvements to optimize the performance stability > of our storage subsystem, which suffers from a number of problems. The end > result is that there are periodic write stalls during data ingestion, > even though the ingestion speed is relatively low. > This improvement will deal with the following issues: > 1. Bypass all queuing of disk writes during LSM flush and merge operations. > Queuing (by BufferCache and IOManager) causes serious problems for the > fairness of disk writes: a small flush operation can be severely > interfered with by a large merge operation and take much longer to > finish. > 2. Perform regular disk forces during flush and merge operations (every 16MB by > default). This helps limit the I/O queue length of the file > system and provides fairness to queries and other writers. This optimization > has been implemented in most storage systems today, including Couchbase > Server. > 3. Optionally, add support for rate limiting of disk writes to ensure the > performance stability of queries. The user can configure the maximum disk > write bandwidth for each dataset. This ensures that the system can provide > stable performance for both queries and writes, even with large background > merges. -- This message was sent by Atlassian Jira (v8.3.4#803005)
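Item 3 (rate limiting) can be sketched as follows; WriteRateLimiter and its methods are illustrative names, not the actual AsterixDB implementation. A flush or merge thread reports each write and pauses just long enough to stay under the configured bandwidth:

```java
// Hypothetical sketch of per-dataset disk-write rate limiting: track how
// many bytes have been written since the limiter started, compute the
// earliest time at which that many bytes are allowed under the cap, and
// pause the writer for the difference.
class WriteRateLimiter {
    private final long bytesPerSecond;
    private final long startNanos;
    private long bytesWritten;

    WriteRateLimiter(long bytesPerSecond) {
        this.bytesPerSecond = bytesPerSecond;
        this.startNanos = System.nanoTime();
    }

    // Returns the pause (in ms) required to stay under the bandwidth cap
    // after writing newBytes more bytes.
    long pauseMillisFor(long newBytes) {
        bytesWritten += newBytes;
        long elapsedMs = (System.nanoTime() - startNanos) / 1_000_000;
        long earliestMs = bytesWritten * 1000 / bytesPerSecond;
        return Math.max(0, earliestMs - elapsedMs);
    }

    // Called by the flush/merge thread after each chunk of disk writes.
    void acquire(long newBytes) throws InterruptedException {
        long pause = pauseMillisFor(newBytes);
        if (pause > 0) Thread.sleep(pause);
    }
}
```

A large background merge calling acquire() per buffer would be smoothed out over time instead of saturating the device, which is the stability property the issue asks for.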
[jira] [Resolved] (ASTERIXDB-2708) Optimize Primary Point Searches Via Batching and Stateful Cursors
[ https://issues.apache.org/jira/browse/ASTERIXDB-2708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2708. - Resolution: Implemented > Optimize Primary Point Searches Via Batching and Stateful Cursors > - > > Key: ASTERIXDB-2708 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2708 > Project: Apache AsterixDB > Issue Type: Improvement > Components: RT - Runtime, STO - Storage >Reporter: Chen Luo >Assignee: Chen Luo >Priority: Major > > Currently, primary index point searches can be expensive, especially when > a query is not selective, for a few reasons: > * We enter and exit LSM components for each search key > * We always traverse from root to leaf when searching for a key > To optimize primary point searches, we introduce a number of optimizations > here: > * Introduce a batched point search cursor that enters an LSM index once for a > batch of keys to amortize the cost > * Introduce a stateful BTree search algorithm that reuses the previous search > history to speed up subsequent searches. Specifically, we keep track of the > last leaf page ID and the last key index. If the next search key > still falls in the last leaf page, we do not have to traverse from root to > leaf again. Moreover, instead of binary search, we use exponential > search to reduce the search cost when there are many keys. -- This message was sent by Atlassian Jira (v8.3.4#803005)
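The stateful-search idea can be illustrated on a sorted array standing in for a BTree leaf page (a simplified model, not the actual cursor code): remember where the previous key landed, widen the search range exponentially from there, and only then binary-search within that range:

```java
import java.util.Arrays;

// Simplified model of the stateful search: consecutive, nearly-sorted probe
// keys start from the last hit instead of the beginning of the page, and
// exponential search bounds the range before the final binary search.
// Assumes a non-empty sorted array.
class StatefulSearcher {
    private final int[] keys;   // sorted "leaf page" contents
    private int lastIndex;      // search history: index of the last hit

    StatefulSearcher(int[] sortedKeys) { this.keys = sortedKeys; }

    int search(int key) {
        // Reuse the previous position when probe keys arrive in order;
        // fall back to the start of the page otherwise.
        int lo = (keys[lastIndex] <= key) ? lastIndex : 0;
        int step = 1;
        // Exponential search: double the step until it overshoots the key.
        while (lo + step < keys.length && keys[lo + step] < key) step *= 2;
        int hi = Math.min(keys.length - 1, lo + step);
        int idx = Arrays.binarySearch(keys, lo, hi + 1, key);
        if (idx >= 0) lastIndex = idx;
        return idx;  // >= 0 if found, negative insertion point otherwise
    }
}
```

For sorted probe batches, each lookup touches only the few entries between the last hit and the new key, rather than O(log n) over the whole page from scratch.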
[jira] [Resolved] (ASTERIXDB-2715) Dynamic Memory Component Architecture
[ https://issues.apache.org/jira/browse/ASTERIXDB-2715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2715. - Resolution: Implemented > Dynamic Memory Component Architecture > - > > Key: ASTERIXDB-2715 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2715 > Project: Apache AsterixDB > Issue Type: Improvement > Components: STO - Storage >Reporter: Chen Luo >Assignee: Chen Luo >Priority: Major > > AsterixDB uses a static memory component management architecture that divides > the write memory budget evenly among the active datasets. This leads to low > memory utilization and cannot support a large number of active datasets > efficiently. To address this problem, we introduce a dynamic memory > component architecture, which has the following design decisions: > * All write memory pages are managed via a global virtual buffer cache > (global VBC). Each memory component simply requests pages from this global > VBC upon writes and returns pages upon flush. Thus, memory allocation is > fully dynamic and on-demand, and there is no need to pre-allocate write > memory. > * The global VBC keeps track of the primary LSM-trees across all > partitions. Whenever the write memory is nearly full, it selects one primary > LSM-tree and flushes it, along with its secondary indexes, to disk. Currently > we only flush one LSM-tree partition at a time. The reclaimed > memory can then be used by other components, which in turn increases memory > utilization. > * For datasets with filters, using large memory components may hurt query > performance. Thus, we additionally introduce a parameter to control the > maximum memory component size for filtered datasets. -- This message was sent by Atlassian Jira (v8.3.4#803005)
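A minimal model of the flush-selection policy (hypothetical structure and names, not the real global VBC): datasets draw pages from one shared budget, and once the budget is reached the cache nominates the largest consumer as the flush victim so its pages can be reclaimed by everyone else:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a shared write-memory budget: allocation is on-demand per
// dataset, and when the budget is exhausted the dataset holding the most
// write memory is picked to flush. The victim policy is an assumption for
// illustration; the real VBC tracks LSM-trees per partition.
class GlobalWriteMemory {
    private final int pageBudget;
    private final Map<String, Integer> pagesUsed = new HashMap<>();

    GlobalWriteMemory(int pageBudget) { this.pageBudget = pageBudget; }

    // Returns the dataset that should be flushed, or null if there is room.
    String requestPage(String dataset) {
        pagesUsed.merge(dataset, 1, Integer::sum);
        int total = pagesUsed.values().stream().mapToInt(Integer::intValue).sum();
        if (total < pageBudget) return null;
        // Victim policy: flush the dataset holding the most write memory.
        String victim = null;
        for (Map.Entry<String, Integer> e : pagesUsed.entrySet()) {
            if (victim == null || e.getValue() > pagesUsed.get(victim)) {
                victim = e.getKey();
            }
        }
        return victim;
    }

    // Flushing a dataset returns all of its pages to the shared budget.
    void flushed(String dataset) { pagesUsed.remove(dataset); }
}
```

The key property this models is the on-demand allocation: no dataset has a pre-carved share, so a single busy dataset can use nearly the whole budget until pressure forces a flush.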
[jira] [Resolved] (ASTERIXDB-2755) Test failure in MultiPartitionLSMIndexTest.testAllocateWhileFlushIsScheduled_0
[ https://issues.apache.org/jira/browse/ASTERIXDB-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2755. - Resolution: Fixed > Test failure in MultiPartitionLSMIndexTest.testAllocateWhileFlushIsScheduled_0 > -- > > Key: ASTERIXDB-2755 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2755 > Project: Apache AsterixDB > Issue Type: Bug > Components: STO - Storage >Reporter: Ian Maxon >Assignee: Chen Luo >Priority: Major > Attachments: info.log.gz > > > Failed in the asterix-gerrit-asterix-app-openjdk11 job. I think the patch it > failed on shouldn't have affected any of this. Details below & logs attached: > Error Message > expected: but was:<[1,1]> > Stacktrace > java.lang.AssertionError: expected: but was:<[1,1]> > at > org.apache.asterix.test.dataflow.MultiPartitionLSMIndexTest.testAllocateWhileFlushIsScheduled(MultiPartitionLSMIndexTest.java:380) > Standard Output > Proceed to flush > Standard Error > java.lang.AssertionError: expected: but was:<[1,1]> > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.failNotEquals(Assert.java:835) > at org.junit.Assert.assertEquals(Assert.java:120) > at org.junit.Assert.assertEquals(Assert.java:146) > at > org.apache.asterix.test.dataflow.MultiPartitionLSMIndexTest.testAllocateWhileFlushIsScheduled(MultiPartitionLSMIndexTest.java:376) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at > org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100) > at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103) > at > org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at org.junit.runners.Suite.runChild(Suite.java:128) > at org.junit.runners.Suite.runChild(Suite.java:27) > at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) > at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) > at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) > at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) > at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) > at org.junit.runners.ParentRunner.run(ParentRunner.java:413) > at > org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365) > at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273) > at > org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238) > at > org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159) > at > org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:379) > at > org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:340) > at > org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:
[jira] [Resolved] (ASTERIXDB-2786) Intermittent failure in GlobalVirtualBufferCacheTest.testFlushes
[ https://issues.apache.org/jira/browse/ASTERIXDB-2786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2786. - Resolution: Fixed > Intermittent failure in GlobalVirtualBufferCacheTest.testFlushes > > > Key: ASTERIXDB-2786 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2786 > Project: Apache AsterixDB > Issue Type: Bug > Components: STO - Storage >Affects Versions: 0.9.6 >Reporter: Murtadha Makki Al Hubail >Assignee: Chen Luo >Priority: Major > Fix For: 0.9.6 > > > GlobalVirtualBufferCacheTest.testFlushes is intermittently failing with: > {noformat} > 20:02:27.531 [main] ExecutionTestUtil - Starting setup > 20:02:27.537 [main] ExecutionTestUtil - initializing pseudo cluster > 20:02:30.760 [main] ExecutionTestUtil - initializing HDFS > 20:02:37.559 [main] GlobalVirtualBufferCacheTest - > java.util.ConcurrentModificationException > 20:02:37.564 [main] GlobalVirtualBufferCacheTest - HYR0105: Cannot drop > in-use index (ds) > java.lang.AssertionError > at > org.apache.asterix.test.dataflow.GlobalVirtualBufferCacheTest.testFlushes(GlobalVirtualBufferCacheTest.java:171) > {noformat} > and > {noformat} > ava.lang.AssertionError: HYR0105: Cannot drop in-use index (ds) > at > org.apache.asterix.test.dataflow.GlobalVirtualBufferCacheTest.deinitializeTest(GlobalVirtualBufferCacheTest.java:131) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Assigned] (ASTERIXDB-2784) Join memory requirement for large objects
[ https://issues.apache.org/jira/browse/ASTERIXDB-2784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo reassigned ASTERIXDB-2784: --- Assignee: Shiva Jahangiri > Join memory requirement for large objects > - > > Key: ASTERIXDB-2784 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2784 > Project: Apache AsterixDB > Issue Type: Improvement > Components: COMP - Compiler, RT - Runtime >Reporter: Chen Luo >Assignee: Shiva Jahangiri >Priority: Major > > Currently the compiler assumes the minimum number of join frames is 5 [1]. > However, this does not guarantee a join will always succeed in the case of large > objects. The actual join memory requirement is MAX(5, #partitions * > large-object size). The reason is that in the spill policy [2], we only > spill a partition if it hasn't been spilled before. As a result, when we are > writing to an empty partition, it is possible that each of the other partitions > holds one large object (which could be larger than the frame size) but no > partition can be spilled. Thus, the join memory requirement becomes > #partitions * large-object size in this case. > [1] > https://github.com/apache/asterixdb/blob/master/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/AbstractJoinPOperator.java#L29 > [2] > https://github.com/apache/asterixdb/blob/37dfed60fb47afcc86de6d17704a8f100217057d/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/PreferToSpillFullyOccupiedFramePolicy.java#L55 -- This message was sent by Atlassian Jira (v8.3.4#803005)
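A quick sanity check of the bound described above (purely illustrative, not compiler code): with P partitions each potentially pinning one unspillable large object of S frames, a safe build-memory reservation is max(5, P * S) frames rather than the fixed minimum of 5:

```java
// Illustrative helper (hypothetical name): the worst case is every other
// partition holding one large object that cannot be spilled, so the
// reservation must cover #partitions * large-object size, with the
// existing minimum of 5 frames as a floor.
class JoinMemoryBound {
    static int minJoinFrames(int numPartitions, int largeObjectFrames) {
        return Math.max(5, numPartitions * largeObjectFrames);
    }
}
```

For example, 8 partitions with 3-frame large objects need 24 frames, while small-object workloads fall back to the existing minimum of 5.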
[jira] [Resolved] (ASTERIXDB-2783) Serious Hash Collision in Hash Join/Groupby
[ https://issues.apache.org/jira/browse/ASTERIXDB-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2783. - Resolution: Fixed > Serious Hash Collision in Hash Join/Groupby > --- > > Key: ASTERIXDB-2783 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2783 > Project: Apache AsterixDB > Issue Type: Bug > Components: RT - Runtime >Reporter: Chen Luo >Assignee: Chen Luo >Priority: Critical > > The current implementation of Hash Join/Groupby suffers from a serious hash > collision problem. In these two operators, we first use the hash exchange > operator to assign each key to an NC partition (hash1(key)%P, where P is the > number of partitions), and then build a hash table at each NC partition > (hash2(key)%N, where N is the hash table size). However, our implementation > currently uses the same hash function for both steps (hash1 == hash2). This > is simply incorrect and can lead to a lot of hash collisions. > To see this problem, consider what happens at NC partition 0. After hash > partitioning, for each key assigned to this partition, we know that > hash(key)%P == 0. Unless the greatest common divisor of P and N is 1, there > will be a lot of hash collisions! For example, suppose P = 16 and N is a > multiple of 4. Since hash(key) is a multiple of 16, we know for sure that > hash(key)%N must be a multiple of 4 as well! This implies that slots whose > indexes are not multiples of 4 will always be empty, while all entries will > be inserted into slots that are multiples of 4. > > To fix this problem, we can simply use a different hash function for hash > join/groupby. -- This message was sent by Atlassian Jira (v8.3.4#803005)
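The effect is easy to reproduce with a tiny self-contained model (not AsterixDB code): route keys to partition 0 with h % P, then reuse the same h for the table slot. With P = 16 and a table of N = 64 slots, partition 0's keys can only ever reach 4 of the 64 slots:

```java
import java.util.HashSet;
import java.util.Set;

// Self-contained model of the collision problem: when the partitioning
// hash and the table hash are the same, the keys that land in partition 0
// all satisfy h % P == 0, so they can only occupy table slots that are
// multiples of gcd-determined strides.
class SameHashDemo {
    // Counts how many distinct table slots partition 0's keys can reach
    // when the same hash value is reused for both steps.
    static int reachableSlots(int p, int n, int numKeys) {
        Set<Integer> slots = new HashSet<>();
        for (int h = 0; h < numKeys * p; h++) {
            if (h % p == 0) {      // key routed to partition 0
                slots.add(h % n);  // same hash reused for the table slot
            }
        }
        return slots.size();
    }
}
```

Making N coprime with P (or, as the fix says, using an independent second hash function) restores full slot coverage: with P = 16 and N = 63, all 63 slots become reachable.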
[jira] [Created] (ASTERIXDB-2784) Join memory requirement for large objects
Chen Luo created ASTERIXDB-2784: --- Summary: Join memory requirement for large objects Key: ASTERIXDB-2784 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2784 Project: Apache AsterixDB Issue Type: Improvement Components: COMP - Compiler, RT - Runtime Reporter: Chen Luo Currently the compiler assumes the minimum number of join frames is 5 [1]. However, this does not guarantee a join will always succeed in the case of large objects. The actual join memory requirement is MAX(5, #partitions * large-object size). The reason is that in the spill policy [2], we only spill a partition if it hasn't been spilled before. As a result, when we are writing to an empty partition, it is possible that each of the other partitions holds one large object (which could be larger than the frame size) but no partition can be spilled. Thus, the join memory requirement becomes #partitions * large-object size in this case. [1] https://github.com/apache/asterixdb/blob/master/hyracks-fullstack/algebricks/algebricks-core/src/main/java/org/apache/hyracks/algebricks/core/algebra/operators/physical/AbstractJoinPOperator.java#L29 [2] https://github.com/apache/asterixdb/blob/37dfed60fb47afcc86de6d17704a8f100217057d/hyracks-fullstack/hyracks/hyracks-dataflow-std/src/main/java/org/apache/hyracks/dataflow/std/buffermanager/PreferToSpillFullyOccupiedFramePolicy.java#L55 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ASTERIXDB-2783) Serious Hash Collision in Hash Join/Groupby
Chen Luo created ASTERIXDB-2783: --- Summary: Serious Hash Collision in Hash Join/Groupby Key: ASTERIXDB-2783 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2783 Project: Apache AsterixDB Issue Type: Bug Components: RT - Runtime Reporter: Chen Luo Assignee: Chen Luo The current implementation of Hash Join/Groupby suffers from a serious hash collision problem. In these two operators, we first use the hash exchange operator to assign each key to an NC partition (hash1(key)%P, where P is the number of partitions), and then build a hash table at each NC partition (hash2(key)%N, where N is the hash table size). However, our implementation currently uses the same hash function for both steps (hash1 == hash2). This is simply incorrect and can lead to a lot of hash collisions. To see this problem, consider what happens at NC partition 0. After hash partitioning, for each key assigned to this partition, we know that hash(key)%P == 0. Unless the greatest common divisor of P and N is 1, there will be a lot of hash collisions! For example, suppose P = 16 and N is a multiple of 4. Since hash(key) is a multiple of 16, we know for sure that hash(key)%N must be a multiple of 4 as well! This implies that slots whose indexes are not multiples of 4 will always be empty, while all entries will be inserted into slots that are multiples of 4. To fix this problem, we can simply use a different hash function for hash join/groupby. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ASTERIXDB-2779) Join Condition Is Not Identified for TPC-H Q18
[ https://issues.apache.org/jira/browse/ASTERIXDB-2779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193962#comment-17193962 ] Chen Luo commented on ASTERIXDB-2779: - [~shivaj] This issue may be of interest to you if you're running TPC-H workloads. > Join Condition Is Not Identified for TPC-H Q18 > -- > > Key: ASTERIXDB-2779 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2779 > Project: Apache AsterixDB > Issue Type: Bug > Components: COMP - Compiler >Reporter: Chen Luo >Priority: Major > Attachments: tpch_ddl.sql > > > The query optimizer fails to identify the join condition for TPC-H Q18, and > thus produces a query plan with a cartesian product. The DDLs are attached > below. > The original Q18 is as follows: > > {code:java} > use tpch; > WITH tmp AS > ( > SELECT l_orderkey l_orderkey, sum(l.l_quantity) t_sum_quantity > FROM LineItem AS l > GROUP BY l.l_orderkey as l_orderkey > )SELECT c_name c_name, c_custkey c_custkey, o_orderkey o_orderkey, >o_orderdate o_orderdate, o_totalprice o_totalprice, >sum(l.l_quantity) sum_quantity > FROM Customer c, > Orders o, > tmp t, > LineItem l > WHERE c.c_custkey = o.o_custkey AND o.o_orderkey = t.l_orderkey AND > t.t_sum_quantity > 300 > AND l.l_orderkey = t.l_orderkey > GROUP BY c.c_name AS c_name,c.c_custkey AS c_custkey, > o.o_orderkey AS o_orderkey,o.o_orderdate AS o_orderdate, > o.o_totalprice AS o_totalprice > ORDER BY o_totalprice DESC,o_orderdate > LIMIT 100 > ; > {code} > However, the join condition is correctly identified after Q18 is refactored > as follows: > > > {code:java} > use tpch; > WITH tmp AS > ( > SELECT l_orderkey, sum(l.l_quantity) t_sum_quantity > FROM LineItem AS l > GROUP BY l.l_orderkey as l_orderkey > HAVING sum(l.l_quantity)>300 > )SELECT c_name, c_custkey, o_orderkey, >o_orderdate, o_totalprice, >sum(l.l_quantity) sum_quantity > FROM Customer c JOIN Orders o ON c.c_custkey = o.o_custkey > JOIN tmp t ON o.o_orderkey = t.l_orderkey >JOIN LineItem l 
ON t.l_orderkey = l.l_orderkey > GROUP BY c.c_name AS c_name,c.c_custkey AS c_custkey, > o.o_orderkey AS o_orderkey,o.o_orderdate AS o_orderdate, > o.o_totalprice AS o_totalprice > ORDER BY o_totalprice DESC,o_orderdate > LIMIT 100 > ; > {code} > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ASTERIXDB-2779) Join Condition Is Not Identified for TPC-H Q18
Chen Luo created ASTERIXDB-2779: --- Summary: Join Condition Is Not Identified for TPC-H Q18 Key: ASTERIXDB-2779 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2779 Project: Apache AsterixDB Issue Type: Bug Components: COMP - Compiler Reporter: Chen Luo Attachments: tpch_ddl.sql The query optimizer fails to identify the join condition for TPC-H Q18, and thus produces a query plan with a cartesian product. The DDLs are attached below. The original Q18 is as follows: {code:java} use tpch; WITH tmp AS ( SELECT l_orderkey l_orderkey, sum(l.l_quantity) t_sum_quantity FROM LineItem AS l GROUP BY l.l_orderkey as l_orderkey )SELECT c_name c_name, c_custkey c_custkey, o_orderkey o_orderkey, o_orderdate o_orderdate, o_totalprice o_totalprice, sum(l.l_quantity) sum_quantity FROM Customer c, Orders o, tmp t, LineItem l WHERE c.c_custkey = o.o_custkey AND o.o_orderkey = t.l_orderkey AND t.t_sum_quantity > 300 AND l.l_orderkey = t.l_orderkey GROUP BY c.c_name AS c_name,c.c_custkey AS c_custkey, o.o_orderkey AS o_orderkey,o.o_orderdate AS o_orderdate, o.o_totalprice AS o_totalprice ORDER BY o_totalprice DESC,o_orderdate LIMIT 100 ; {code} However, the join condition is correctly identified after Q18 is refactored as follows: {code:java} use tpch; WITH tmp AS ( SELECT l_orderkey, sum(l.l_quantity) t_sum_quantity FROM LineItem AS l GROUP BY l.l_orderkey as l_orderkey HAVING sum(l.l_quantity)>300 )SELECT c_name, c_custkey, o_orderkey, o_orderdate, o_totalprice, sum(l.l_quantity) sum_quantity FROM Customer c JOIN Orders o ON c.c_custkey = o.o_custkey JOIN tmp t ON o.o_orderkey = t.l_orderkey JOIN LineItem l ON t.l_orderkey = l.l_orderkey GROUP BY c.c_name AS c_name,c.c_custkey AS c_custkey, o.o_orderkey AS o_orderkey,o.o_orderdate AS o_orderdate, o.o_totalprice AS o_totalprice ORDER BY o_totalprice DESC,o_orderdate LIMIT 100 ; {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ASTERIXDB-2776) Transaction logs not truncated during rebalancing
[ https://issues.apache.org/jira/browse/ASTERIXDB-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo updated ASTERIXDB-2776: Description: During rebalancing, we upsert a lot of records into the new dataset, which also produces a lot of log records. However, the transaction log is never truncated on the metadata node since the rebalance operation itself is a metadata transaction. (was: During rebalancing, we upsert a lot of records into the new dataset, which also produce a lot of log records. However, the transaction log is never truncated because the storage still uses the LSN of the old dataset.) > Transaction logs not truncated during rebalancing > - > > Key: ASTERIXDB-2776 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2776 > Project: Apache AsterixDB > Issue Type: Wish > Components: CLUS - Cluster management, STO - Storage >Reporter: Chen Luo >Priority: Major > > During rebalancing, we upsert a lot of records into the new dataset, which > also produces a lot of log records. However, the transaction log is never > truncated on the metadata node since the rebalance operation itself is a > metadata transaction. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ASTERIXDB-2776) Transaction logs not truncated during rebalancing
[ https://issues.apache.org/jira/browse/ASTERIXDB-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo updated ASTERIXDB-2776: Issue Type: Bug (was: Wish) > Transaction logs not truncated during rebalancing > - > > Key: ASTERIXDB-2776 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2776 > Project: Apache AsterixDB > Issue Type: Bug > Components: CLUS - Cluster management, STO - Storage >Reporter: Chen Luo >Priority: Major > > During rebalancing, we upsert a lot of records into the new dataset, which > also produces a lot of log records. However, the transaction log is never > truncated on the metadata node since the rebalance operation itself is a > metadata transaction. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ASTERIXDB-2776) Transaction logs not truncated during rebalancing
Chen Luo created ASTERIXDB-2776: --- Summary: Transaction logs not truncated during rebalancing Key: ASTERIXDB-2776 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2776 Project: Apache AsterixDB Issue Type: Wish Components: CLUS - Cluster management, STO - Storage Reporter: Chen Luo During rebalancing, we upsert a lot of records into the new dataset, which also produces a lot of log records. However, the transaction log is never truncated because the storage still uses the LSN of the old dataset. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ASTERIXDB-2766) fulltext index issues a bug.
[ https://issues.apache.org/jira/browse/ASTERIXDB-2766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164593#comment-17164593 ] Chen Luo commented on ASTERIXDB-2766: - We do not support building an inverted index on non-fixed-size primary keys. The dataset has a primary key (measureId) of string type, which does not have a fixed size. The front-end should prevent creating the inverted index in the first place (though it is unclear why it didn't). > fulltext index issues a bug. > > > Key: ASTERIXDB-2766 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2766 > Project: Apache AsterixDB > Issue Type: Bug > Components: HYR - Hyracks >Affects Versions: 0.9.4.1 > Environment: Windows/Linux >Reporter: Wenhai Li >Assignee: Ian Maxon >Priority: Major > > Recently, we wanted to utilize AsterixDB's fulltext search to find related records > based on a token. We have the following issues. The problem is quite strange: > # If we DID NOT load records into the dataset, we cannot see the error. > # Once we load records (even only one record), the following problem appears. > Did I generate wrong records? 
> > Best, > > problem: > Caused by: java.lang.ArithmeticException: / by zero > at > org.apache.hyracks.storage.am.lsm.invertedindex.ondisk.FixedSizeElementInvertedListCursor.setInvListInfo(FixedSizeElementInvertedListCursor.java:370) > ~[hyracks-storage-am-lsm-invertedindex-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT] > at > org.apache.hyracks.storage.am.lsm.invertedindex.api.InvertedListCursor.doOpen(InvertedListCursor.java:55) > ~[hyracks-storage-am-lsm-invertedindex-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT] > at > org.apache.hyracks.storage.common.EnforcedIndexCursor.open(EnforcedIndexCursor.java:54) > ~[hyracks-storage-common-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT] > at > org.apache.hyracks.storage.am.lsm.invertedindex.ondisk.OnDiskInvertedIndex.openInvertedListCursor(OnDiskInvertedIndex.java:213) > ~[hyracks-storage-am-lsm-invertedindex-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT] > at > org.apache.hyracks.storage.am.lsm.invertedindex.search.TOccurrenceSearcher.search(TOccurrenceSearcher.java:56) > ~[hyracks-storage-am-lsm-invertedindex-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT] > at > org.apache.hyracks.storage.am.lsm.invertedindex.ondisk.OnDiskInvertedIndex$OnDiskInvertedIndexAccessor.search(OnDiskInvertedIndex.java:498) > ~[hyracks-storage-am-lsm-invertedindex-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT] > at > org.apache.hyracks.storage.am.lsm.invertedindex.impls.LSMInvertedIndexSearchCursor.doHasNext(LSMInvertedIndexSearchCursor.java:162) > ~[hyracks-storage-am-lsm-invertedindex-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT] > at > org.apache.hyracks.storage.common.EnforcedIndexCursor.hasNext(EnforcedIndexCursor.java:69) > ~[hyracks-storage-common-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT] > at > org.apache.hyracks.storage.am.common.dataflow.IndexSearchOperatorNodePushable.writeSearchResults(IndexSearchOperatorNodePushable.java:241) > ~[hyracks-storage-am-common-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT] > at > 
org.apache.hyracks.storage.am.common.dataflow.IndexSearchOperatorNodePushable.nextFrame(IndexSearchOperatorNodePushable.java:290) > ~[hyracks-storage-am-common-0.3.4-SNAPSHOT.jar:0.3.4-SNAPSHOT] > > Schema: > USE Personicle; > DROP DATASET GeneralMeasurement IF EXISTS; > DROP TYPE GeneralMeasurementType IF EXISTS; > CREATE TYPE GeneralMeasurementType AS OPEN { > measureId: string, --primary key string for measurement, UUID > deviceId: string, > timestamp: bigint, > userName: string?, > beginAt: datetime?, > endAt: datetime?, > category: string, > attribute: string?, > activity: string?, > description: string? > }; > CREATE DATASET GeneralMeasurement(GeneralMeasurementType) PRIMARY KEY > measureId; > CREATE INDEX GeneralMeasurementDeviceIdIdx ON GeneralMeasurement(deviceId, > timestamp) type btree; > CREATE INDEX GeneralMeasurementAttributeIdx ON GeneralMeasurement(attribute) > type fulltext; > > USE Personicle; > load dataset GeneralMeasurement using localfs > (("path"="127.0.0.1:///f:/Work/Personicle/example/BigFoodLog.adm"),("input-format"="text-input-format"),("input-format"="text-input-format"),("format"="adm")); > > sampling record: > {"attribute":"1acc5da443f34eb1870f22873b8b489f","category":"foodlog","comments":"爱谷鸿 > ate 254.10932950875716g 矿泉水","description":"爱谷鸿 ate 254.10932950875716g > 矿泉水","deviceId":"c5223137c9284b649e0bf6bd0c37fe2f","endAt":datetime("2017-10-21T17:54:45"),"foodName":"矿泉水","latitude":22.300131276012458,"longitude":113.67809523288565,"measureId":"f59149a0cd834192b72f77c478fa2b40","preference_star":9,"startAt":datetime("2017-10-21T17:54:35"),"timestamp":1508579675000,"total_calories":417.7053493274758,"userName":"爱谷鸿","weight":254.10932950875716} > > query: > > USE Personicle
[jira] [Created] (ASTERIXDB-2765) Identify Secondary Index-Only Plans without WHERE conditions
Chen Luo created ASTERIXDB-2765: --- Summary: Identify Secondary Index-Only Plans without WHERE conditions Key: ASTERIXDB-2765 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2765 Project: Apache AsterixDB Issue Type: Wish Reporter: Chen Luo Assignee: Dmitry Lychagin Currently, a secondary index is only picked if its secondary key is used in the WHERE clause. This limits the applicability of secondary index plans. For example, consider the dataset User(+id+, name, salary) with a secondary index on salary. The following query {code:java} select max(salary) from User {code} will not pick the secondary index, even though a secondary index-only plan is applicable. This query is equivalent to the following query (assuming all salaries are non-negative), where the secondary index will be picked. {code:java} select max(salary) from User where salary >= 0 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ASTERIXDB-2764) Support Secondary Index Compression
Chen Luo created ASTERIXDB-2764: --- Summary: Support Secondary Index Compression Key: ASTERIXDB-2764 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2764 Project: Apache AsterixDB Issue Type: Wish Components: IDX - Indexes, STO - Storage Reporter: Chen Luo Currently only the primary indexes are compressed while secondary indexes are not. The secondary indexes can become quite big, especially with covering indexes for index-only plans. It would be useful to compress secondary indexes as well to reduce storage space and improve query performance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
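As a rough illustration of why secondary indexes compress well (an assumption-laden toy, not AsterixDB's actual page-compression scheme): sorted secondary-index entries of the form (secondary key, primary key) share long byte prefixes, which general-purpose compressors exploit.

```python
import zlib

# Toy demonstration: 1000 sorted (salary, primary-key) entries encoded
# as fixed-width byte strings. The format is made up for illustration.
entries = b"".join(b"%08d|user%08d" % (50000 + i, i) for i in range(1000))
compressed = zlib.compress(entries)
print(len(entries), len(compressed))
```

The compressed size comes out far below half the raw size, which is the kind of saving a covering index with many entries would see.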
[jira] [Created] (ASTERIXDB-2763) Support Included Fields in Secondary Indexes
Chen Luo created ASTERIXDB-2763: --- Summary: Support Included Fields in Secondary Indexes Key: ASTERIXDB-2763 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2763 Project: Apache AsterixDB Issue Type: Wish Components: IDX - Indexes Reporter: Chen Luo Assignee: Dmitry Lychagin Currently, secondary indexes in AsterixDB do not support included fields, i.e., fields that are not part of the secondary key but are nevertheless stored in the secondary index. This feature would help users build covering indexes to better exploit index-only plans. -- This message was sent by Atlassian Jira (v8.3.4#803005)
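A covering index with included fields can be pictured with a toy model (made-up structures, not AsterixDB internals): the secondary index on `salary` also stores the included field `name`, so a query touching only those fields never visits the primary index.

```python
# Toy model of a covering index. Everything here is illustrative.

primary_index = {1: {"name": "a", "salary": 100, "dept": "x"},
                 2: {"name": "b", "salary": 200, "dept": "y"}}

# secondary key -> (primary key, included fields)
secondary_on_salary = {100: (1, {"name": "a"}),
                       200: (2, {"name": "b"})}

def name_by_salary(salary):
    _pk, included = secondary_on_salary[salary]
    return included["name"]  # answered from the secondary index alone

print(name_by_salary(200))
```

Without the included field, the lookup would have to chase `_pk` back into `primary_index`, which is exactly the extra hop index-only plans avoid.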
[jira] [Updated] (ASTERIXDB-2761) Join keys not sorted in index nested loop join
[ https://issues.apache.org/jira/browse/ASTERIXDB-2761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo updated ASTERIXDB-2761: Attachment: plan > Join keys not sorted in index nested loop join > -- > > Key: ASTERIXDB-2761 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2761 > Project: Apache AsterixDB > Issue Type: Bug > Components: COMP - Compiler >Reporter: Chen Luo >Priority: Minor > Attachments: plan, tpch_ddl.sql > > > When testing a query (TPC-H Q3), I found that the join keys are sometimes > not sorted if the index nested-loop join was chosen. The DDLs are attached > below. > The example query: > {code:sql} > use tpch; > SELECT l_orderkey AS l_orderkey, >sum(l.l_extendedprice * (1 - l.l_discount)) AS revenue, >o_orderdate AS o_orderdate, >o_shippriority AS o_shippriority > FROM Customer AS c, > Orders AS o, > LineItem AS l > where c.c_mktsegment = 'BUILDING' AND c.c_custkey = o.o_custkey > AND o.o_orderkey /*+ indexnl */= l.l_orderkey > AND o.o_orderdate < '1995-03-15' AND l.l_shipdate > '1995-03-15' > /* +hash */ > GROUP BY l.l_orderkey AS l_orderkey, > o.o_orderdate AS o_orderdate, > o.o_shippriority AS o_shippriority > ORDER BY revenue DESC,o_orderdate > LIMIT 10; > {code} > The optimized query plan is attached below. The LineItem table was searched > directly using o_orderkeys without sorting them first. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ASTERIXDB-2761) Join keys not sorted in index nested loop join
Chen Luo created ASTERIXDB-2761: --- Summary: Join keys not sorted in index nested loop join Key: ASTERIXDB-2761 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2761 Project: Apache AsterixDB Issue Type: Bug Components: COMP - Compiler Reporter: Chen Luo Attachments: tpch_ddl.sql When testing a query (TPC-H Q3), I found that the join keys are sometimes not sorted if the index nested-loop join was chosen. The DDLs are attached below. The example query: {code:sql} use tpch; SELECT l_orderkey AS l_orderkey, sum(l.l_extendedprice * (1 - l.l_discount)) AS revenue, o_orderdate AS o_orderdate, o_shippriority AS o_shippriority FROM Customer AS c, Orders AS o, LineItem AS l where c.c_mktsegment = 'BUILDING' AND c.c_custkey = o.o_custkey AND o.o_orderkey /*+ indexnl */= l.l_orderkey AND o.o_orderdate < '1995-03-15' AND l.l_shipdate > '1995-03-15' /* +hash */ GROUP BY l.l_orderkey AS l_orderkey, o.o_orderdate AS o_orderdate, o.o_shippriority AS o_shippriority ORDER BY revenue DESC,o_orderdate LIMIT 10; {code} The optimized query plan is attached below. The LineItem table was searched directly using o_orderkeys without sorting them first. 
``` distribute result [$$105] -- DISTRIBUTE_RESULT |UNPARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |UNPARTITIONED| limit 10 -- STREAM_LIMIT |UNPARTITIONED| project ([$$105]) -- STREAM_PROJECT |PARTITIONED| assign [$$105] <- [{"l_orderkey": $$l_orderkey, "revenue": $$117, "o_orderdate": $$o_orderdate, "o_shippriority": $$o_shippriority}] -- ASSIGN |PARTITIONED| exchange -- SORT_MERGE_EXCHANGE [$$117(DESC), $$o_orderdate(ASC) ] |PARTITIONED| limit 10 -- STREAM_LIMIT |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| order (topK: 10) (DESC, $$117) (ASC, $$o_orderdate) -- STABLE_SORT [topK: 10] [$$117(DESC), $$o_orderdate(ASC)] |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| group by ([$$l_orderkey := $$125; $$o_orderdate := $$126; $$o_shippriority := $$127]) decor ([]) { aggregate [$$117] <- [global-sql-sum-serial($$124)] -- AGGREGATE |LOCAL| nested tuple source -- NESTED_TUPLE_SOURCE |LOCAL| } -- EXTERNAL_GROUP_BY[$$125, $$126, $$127] |PARTITIONED| exchange -- HASH_PARTITION_EXCHANGE [$$125, $$126, $$127] |PARTITIONED| group by ([$$125 := $$113; $$126 := $$110; $$127 := $$108]) decor ([]) { aggregate [$$124] <- [local-sql-sum-serial(numeric-multiply($$122, numeric-subtract(1, $$123)))] -- AGGREGATE |LOCAL| nested tuple source -- NESTED_TUPLE_SOURCE |LOCAL| } -- EXTERNAL_GROUP_BY[$$113, $$110, $$108] |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| project ([$$122, $$123, $$113, $$110, $$108]) -- STREAM_PROJECT |PARTITIONED| select (gt($$l.getField(10), "1995-03-15")) -- STREAM_SELECT |PARTITIONED| assign [$$123, $$122] <- [$$l.getField(6), $$l.getField(5)] -- ASSIGN |PARTITIONED| project ([$$108, $$110, $$113, $$l]) -- STREAM_PROJECT |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| unnest-map [$$113, $$114, $$l] <- index-search("LineItem", 0, "tpch", "LineItem", TRUE, TRUE, 1, $$112, 1, $$112, TRUE, TRUE, TRUE) -- BTREE_SEARCH |PARTITIONED| exchange -- BROADCAST_EXCHANGE |PARTITIONED| project ([$$112, $$108, $$110]) -- 
STREAM_PROJECT |PARTITIONED| exchange -- ONE_TO_ONE_EXCHANGE |PARTITIONED| join (eq($$111, $$119)) -- HYBRID_HASH_JOIN [$$111][$$119] |PARTITIONED|
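The performance concern behind this bug can be sketched with a toy cost model (illustrative Python, not Hyracks code): probing a B-tree with sorted keys visits each leaf page once, while unsorted probes bounce between pages. The page size and key ranges below are made-up constants.

```python
import random

# Toy model: count how often consecutive index probes land on a
# different B-tree leaf page.

PAGE_SIZE = 100  # keys per leaf page (illustrative)

def leaf_page_switches(probe_keys):
    switches, last_page = 0, None
    for key in probe_keys:
        page = key // PAGE_SIZE
        if page != last_page:
            switches += 1
            last_page = page
    return switches

random.seed(0)
keys = random.sample(range(10000), 2000)
print(leaf_page_switches(keys), leaf_page_switches(sorted(keys)))
```

With sorted probes the switch count is bounded by the number of leaf pages (100 here), while random probes switch pages on nearly every lookup, which is why sorting the o_orderkeys before the index search matters.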
[jira] [Commented] (ASTERIXDB-2756) NullPointerException in DatasetResourceReference.parse()
[ https://issues.apache.org/jira/browse/ASTERIXDB-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152930#comment-17152930 ] Chen Luo commented on ASTERIXDB-2756: - Thanks! I think we should simply skip replication if the dataset resource is already gone. I assigned the issue to you since you may have a better idea of how to handle it. > NullPointerException in DatasetResourceReference.parse() > > > Key: ASTERIXDB-2756 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2756 > Project: Apache AsterixDB > Issue Type: Bug > Components: STO - Storage >Affects Versions: 0.9.5 >Reporter: Murtadha Makki Al Hubail >Assignee: Murtadha Makki Al Hubail >Priority: Major > Fix For: 0.9.5 > > > The following NPE was encountered during a test that creates 10 datasets, then > ingests some data, then creates another 10 datasets and ingests more data: > {noformat} > java.lang.NullPointerException: null > at > org.apache.asterix.common.storage.DatasetResourceReference.parse(DatasetResourceReference.java:55) > ~[asterix-common.jar:7.0.0-] > at > org.apache.asterix.common.storage.DatasetResourceReference.of(DatasetResourceReference.java:38) > ~[asterix-common.jar:7.0.0-] > at > org.apache.asterix.transaction.management.resource.PersistentLocalResourceRepository.getLocalResourceReference(PersistentLocalResourceRepository.java:352) > ~[asterix-transactions.jar:7.0.0-] > at > org.apache.asterix.replication.management.IndexReplicationManager.skip(IndexReplicationManager.java:135) > ~[asterix-replication.jar:7.0.0-] > at > org.apache.asterix.replication.management.IndexReplicationManager.process(IndexReplicationManager.java:104) > ~[asterix-replication.jar:7.0.0-] > at > org.apache.asterix.replication.management.IndexReplicationManager.access$200(IndexReplicationManager.java:45) > ~[asterix-replication.jar:7.0.0-] > at > 
org.apache.asterix.replication.management.IndexReplicationManager$ReplicationJobsProcessor.run(IndexReplicationManager.java:175) > ~[asterix-replication.jar:7.0.0-] > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > ~[?:?] > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > ~[?:?] > {noformat} > It looks like due to LSM lifecycle changes, some file of a disk component was > deleted before we get a chance to check if it needs to be replicated or not. -- This message was sent by Atlassian Jira (v8.3.4#803005)
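The handling proposed in this thread (skipping the replication job when the dataset resource has already been dropped) can be sketched as follows; this is a minimal Python sketch, and the repository and path shapes are hypothetical stand-ins rather than the actual AsterixDB classes:

```python
# Minimal sketch: before replicating, check whether the dataset's local
# resource still exists, and skip the job if it was dropped concurrently.
# LocalResourceRepository and should_skip are illustrative names only.

class LocalResourceRepository:
    def __init__(self, resources):
        self._resources = dict(resources)  # resource path -> metadata

    def get(self, path):
        # Returns None when the dataset was dropped and its files deleted.
        return self._resources.get(path)

def should_skip(repo, job_path):
    resource = repo.get(job_path)
    if resource is None:
        return True  # dataset gone: nothing to replicate, avoid the NPE
    return False
```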
[jira] [Assigned] (ASTERIXDB-2756) NullPointerException in DatasetResourceReference.parse()
[ https://issues.apache.org/jira/browse/ASTERIXDB-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo reassigned ASTERIXDB-2756: --- Assignee: Murtadha Makki Al Hubail (was: Chen Luo) > NullPointerException in DatasetResourceReference.parse() > > > Key: ASTERIXDB-2756 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2756 > Project: Apache AsterixDB > Issue Type: Bug > Components: STO - Storage >Affects Versions: 0.9.5 >Reporter: Murtadha Makki Al Hubail >Assignee: Murtadha Makki Al Hubail >Priority: Major > Fix For: 0.9.5 > > > The following NPE was encountered during a test that creates 10 datasets then > ingestion some data then create another 10 datasets and ingestion more data: > {noformat} > java.lang.NullPointerException: null > at > org.apache.asterix.common.storage.DatasetResourceReference.parse(DatasetResourceReference.java:55) > ~[asterix-common.jar:7.0.0-] > at > org.apache.asterix.common.storage.DatasetResourceReference.of(DatasetResourceReference.java:38) > ~[asterix-common.jar:7.0.0-] > at > org.apache.asterix.transaction.management.resource.PersistentLocalResourceRepository.getLocalResourceReference(PersistentLocalResourceRepository.java:352) > ~[asterix-transactions.jar:7.0.0-] > at > org.apache.asterix.replication.management.IndexReplicationManager.skip(IndexReplicationManager.java:135) > ~[asterix-replication.jar:7.0.0-] > at > org.apache.asterix.replication.management.IndexReplicationManager.process(IndexReplicationManager.java:104) > ~[asterix-replication.jar:7.0.0-] > at > org.apache.asterix.replication.management.IndexReplicationManager.access$200(IndexReplicationManager.java:45) > ~[asterix-replication.jar:7.0.0-] > at > org.apache.asterix.replication.management.IndexReplicationManager$ReplicationJobsProcessor.run(IndexReplicationManager.java:175) > ~[asterix-replication.jar:7.0.0-] > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > ~[?:?] 
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > ~[?:?] > {noformat} > It looks like due to LSM lifecycle changes, some file of a disk component was > deleted before we get a chance to check if it needs to be replicated or not. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ASTERIXDB-2756) NullPointerException in DatasetResourceReference.parse()
[ https://issues.apache.org/jira/browse/ASTERIXDB-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17152864#comment-17152864 ] Chen Luo commented on ASTERIXDB-2756: - I checked the code but I don't think the recent LSM changes have affected replication. Any idea when this bug happens? It looks like the replication job is executed after a dataset is dropped, because IndexReplicationManager looks at the dataset resource instead of a specific LSM component resource. If so, shouldn't we simply skip the replication job? > NullPointerException in DatasetResourceReference.parse() > > > Key: ASTERIXDB-2756 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2756 > Project: Apache AsterixDB > Issue Type: Bug > Components: STO - Storage >Affects Versions: 0.9.5 >Reporter: Murtadha Makki Al Hubail >Assignee: Chen Luo >Priority: Major > Fix For: 0.9.5 > > > The following NPE was encountered during a test that creates 10 datasets then > ingestion some data then create another 10 datasets and ingestion more data: > {noformat} > java.lang.NullPointerException: null > at > org.apache.asterix.common.storage.DatasetResourceReference.parse(DatasetResourceReference.java:55) > ~[asterix-common.jar:7.0.0-] > at > org.apache.asterix.common.storage.DatasetResourceReference.of(DatasetResourceReference.java:38) > ~[asterix-common.jar:7.0.0-] > at > org.apache.asterix.transaction.management.resource.PersistentLocalResourceRepository.getLocalResourceReference(PersistentLocalResourceRepository.java:352) > ~[asterix-transactions.jar:7.0.0-] > at > org.apache.asterix.replication.management.IndexReplicationManager.skip(IndexReplicationManager.java:135) > ~[asterix-replication.jar:7.0.0-] > at > org.apache.asterix.replication.management.IndexReplicationManager.process(IndexReplicationManager.java:104) > ~[asterix-replication.jar:7.0.0-] > at > org.apache.asterix.replication.management.IndexReplicationManager.access$200(IndexReplicationManager.java:45) > 
~[asterix-replication.jar:7.0.0-] > at > org.apache.asterix.replication.management.IndexReplicationManager$ReplicationJobsProcessor.run(IndexReplicationManager.java:175) > ~[asterix-replication.jar:7.0.0-] > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > ~[?:?] > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > ~[?:?] > {noformat} > It looks like due to LSM lifecycle changes, some file of a disk component was > deleted before we get a chance to check if it needs to be replicated or not. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ASTERIXDB-2715) Dynamic Memory Component Architecture
Chen Luo created ASTERIXDB-2715: --- Summary: Dynamic Memory Component Architecture Key: ASTERIXDB-2715 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2715 Project: Apache AsterixDB Issue Type: Improvement Components: STO - Storage Reporter: Chen Luo Assignee: Chen Luo AsterixDB uses a static memory component management architecture by dividing the write memory budget evenly among the active datasets. This leads to low memory utilization and cannot support a large number of active datasets efficiently. To address this problem, we introduce a dynamic memory component architecture, which has the following design decisions: * All write memory pages are managed via a global virtual buffer cache (global VBC). Each memory component simply requests pages from this global VBC upon writes and returns pages upon flushes. Thus, memory allocation is fully dynamic and on-demand, and there is no need to pre-allocate write memory. * The global VBC keeps track of the primary LSM-trees across all partitions. Whenever the write memory is nearly full, it selects one primary LSM-tree and flushes it, as well as its secondary indexes, to disk. Currently we only flush one LSM-tree partition at a time. By doing so, the reclaimed memory can be used by other components, which in turn increases memory utilization. * For datasets with filters, using large memory components may hurt query performance. Thus, we additionally introduce a parameter to control the maximum memory component size for filtered datasets. -- This message was sent by Atlassian Jira (v8.3.4#803005)
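The flush-victim selection described above can be sketched as follows; this is a hedged Python sketch under assumed names (GlobalVBC, request_page), not the actual implementation:

```python
# Sketch of the global virtual buffer cache (VBC) described above: write
# memory is handed out on demand, and when the global budget is nearly
# full one primary index is chosen as the flush victim so its pages can
# be reclaimed. Class and method names are assumptions for illustration.

class GlobalVBC:
    def __init__(self, budget_pages, flush_threshold=0.9):
        self.budget = budget_pages
        self.threshold = flush_threshold
        self.used = {}  # primary index name -> pages currently held

    def request_page(self, index):
        """Grant one page; return a flush victim if memory is nearly full."""
        self.used[index] = self.used.get(index, 0) + 1
        if sum(self.used.values()) >= self.threshold * self.budget:
            return self.pick_victim()   # caller schedules a flush
        return None

    def pick_victim(self):
        # One plausible policy: flush the index holding the most pages,
        # which reclaims the most memory for everyone else.
        return max(self.used, key=self.used.get)

    def release_pages(self, index):
        # Flush finished: the component's pages go back to the pool.
        self.used.pop(index, None)
```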
[jira] [Created] (ASTERIXDB-2708) Optimize Primary Point Searches Via Batching and Stateful Cursors
Chen Luo created ASTERIXDB-2708: --- Summary: Optimize Primary Point Searches Via Batching and Stateful Cursors Key: ASTERIXDB-2708 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2708 Project: Apache AsterixDB Issue Type: Improvement Components: RT - Runtime, STO - Storage Reporter: Chen Luo Assignee: Chen Luo Currently, primary index point searches can be expensive, especially when a query is not selective, for a few reasons: * Enter and exit LSM components for each search key * Always traverse from root to leaf when searching a key To optimize primary point searches, we introduce a number of optimizations here: * Introduce a batched point search cursor that enters an LSM index for a batch of keys to amortize the cost * Introduce a stateful BTree search algorithm that reuses the previous search history to speed up subsequent searches. Specifically, we keep track of the last leaf page ID and the last key index. For the next search key, if it still exists in the last leaf page, we do not have to traverse from root to leaf again. Moreover, instead of using binary search, we use exponential search to reduce the search cost in case there are a lot of keys. -- This message was sent by Atlassian Jira (v8.3.4#803005)
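The stateful leaf search with exponential search can be illustrated with a small sketch; it assumes keys arrive in ascending order (as in a batched cursor), and the structures and names are illustrative only:

```python
import bisect

# Sketch of a stateful point-search over one sorted "leaf page": remember
# the last matching position; when the next key is still on the same leaf,
# gallop right from that position (exponential search) and finish with a
# bounded binary search instead of descending from the root again.
# Structures and names are illustrative, not the actual BTree code.

class StatefulLeafCursor:
    def __init__(self, leaf_keys):
        self.leaf = leaf_keys   # sorted keys of the last-visited leaf page
        self.last_pos = 0       # index of the previous hit

    def search(self, key):
        """Return the index of key in the leaf, or -1 if absent.
        Assumes search keys arrive in ascending (batched) order."""
        n = len(self.leaf)
        lo, step = self.last_pos, 1
        # exponential search: double the step until we overshoot the key
        while lo + step < n and self.leaf[lo + step] < key:
            step *= 2
        hi = min(lo + step, n - 1)
        i = bisect.bisect_left(self.leaf, key, lo, hi + 1)
        if i < n and self.leaf[i] == key:
            self.last_pos = i   # remember the hit for the next key
            return i
        return -1
```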
[jira] [Assigned] (ASTERIXDB-2493) In-memory LSM filter is not thread safe
[ https://issues.apache.org/jira/browse/ASTERIXDB-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo reassigned ASTERIXDB-2493: --- Assignee: Chen Luo > In-memory LSM filter is not thread safe > --- > > Key: ASTERIXDB-2493 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2493 > Project: Apache AsterixDB > Issue Type: Bug >Affects Versions: 0.9.4 >Reporter: Wail Y. Alkowaileet >Assignee: Chen Luo >Priority: Critical > > To reproduce the issue: > 1- Setup a cluster with a single NC and a single partition. > 2- Set a breakpoint at > [LSMComponentFilter.java#L71|https://github.com/apache/asterixdb/blob/6b31f73565a3b16e0dd1fce9ea010e640c53ca79/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/LSMComponentFilter.java#L71] > 3- DDL: > {code:sql} > DROP DATAVERSE ThreadSafe IF EXISTS; > CREATE DATAVERSE ThreadSafe; > USE ThreadSafe; > CREATE TYPE FilterTestType AS { > uid: uuid, > created: int > }; > CREATE DATASET FilterTest(FilterTestType) > PRIMARY KEY uid AUTOGENERATED WITH FILTER ON created; > {code} > 4- Initiate two insert queries: > {code:sql} > USE ThreadSafe; > INSERT INTO FilterTest ( > {"created": 1} > ) > INSERT INTO FilterTest ( > {"created": 0} > ) > {code} > 5- Let the insert with "created = 0" to update the minTuple (L79) > 6- Now, let the insert with "created = 1" to update minTuple > 7- Do the same for the max. > After (7) both min and max should equal to 1 > 8- Flush the component: > > [http://localhost:19002/connector?dataverseName=ThreadSafe&datasetName=FilterTest] > 9- Execute search query: > {code:sql} > USE ThreadSafe; > SELECT * > FROM FilterTest > WHERE created = 0; > {code} > The query returns an empty result -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ASTERIXDB-2493) In-memory LSM filter is not thread safe
[ https://issues.apache.org/jira/browse/ASTERIXDB-2493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2493. - Resolution: Fixed Fixed by https://asterix-gerrit.ics.uci.edu/c/asterixdb/+/5363 > In-memory LSM filter is not thread safe > --- > > Key: ASTERIXDB-2493 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2493 > Project: Apache AsterixDB > Issue Type: Bug >Affects Versions: 0.9.4 >Reporter: Wail Y. Alkowaileet >Assignee: Chen Luo >Priority: Critical > > To reproduce the issue: > 1- Setup a cluster with a single NC and a single partition. > 2- Set a breakpoint at > [LSMComponentFilter.java#L71|https://github.com/apache/asterixdb/blob/6b31f73565a3b16e0dd1fce9ea010e640c53ca79/hyracks-fullstack/hyracks/hyracks-storage-am-lsm-common/src/main/java/org/apache/hyracks/storage/am/lsm/common/impls/LSMComponentFilter.java#L71] > 3- DDL: > {code:sql} > DROP DATAVERSE ThreadSafe IF EXISTS; > CREATE DATAVERSE ThreadSafe; > USE ThreadSafe; > CREATE TYPE FilterTestType AS { > uid: uuid, > created: int > }; > CREATE DATASET FilterTest(FilterTestType) > PRIMARY KEY uid AUTOGENERATED WITH FILTER ON created; > {code} > 4- Initiate two insert queries: > {code:sql} > USE ThreadSafe; > INSERT INTO FilterTest ( > {"created": 1} > ) > INSERT INTO FilterTest ( > {"created": 0} > ) > {code} > 5- Let the insert with "created = 0" to update the minTuple (L79) > 6- Now, let the insert with "created = 1" to update minTuple > 7- Do the same for the max. > After (7) both min and max should equal to 1 > 8- Flush the component: > > [http://localhost:19002/connector?dataverseName=ThreadSafe&datasetName=FilterTest] > 9- Execute search query: > {code:sql} > USE ThreadSafe; > SELECT * > FROM FilterTest > WHERE created = 0; > {code} > The query returns an empty result -- This message was sent by Atlassian Jira (v8.3.4#803005)
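The race in the repro steps above is a lost update: the read-compare-write on the filter's minTuple/maxTuple is not atomic, so two writers can interleave. A minimal sketch of a thread-safe version, where Python's threading.Lock stands in for the Java synchronization and the class shape is illustrative:

```python
import threading

# Sketch of a thread-safe component filter: the read-compare-write on the
# min/max tuples is done under a lock so two concurrent writers cannot
# interleave and lose an update. Illustrative, not the actual Java fix.

class ComponentFilter:
    def __init__(self):
        self._lock = threading.Lock()
        self.min_tuple = None
        self.max_tuple = None

    def update(self, value):
        with self._lock:   # compare and assign atomically
            if self.min_tuple is None or value < self.min_tuple:
                self.min_tuple = value
            if self.max_tuple is None or value > self.max_tuple:
                self.max_tuple = value
```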
[jira] [Created] (ASTERIXDB-2666) One-Phase Log Replay Approach
Chen Luo created ASTERIXDB-2666: --- Summary: One-Phase Log Replay Approach Key: ASTERIXDB-2666 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2666 Project: Apache AsterixDB Issue Type: Wish Components: STO - Storage, TX - Transactions Reporter: Chen Luo AsterixDB currently uses a classical two-phase log replay approach during recovery by first identifying committed writes and then applying these committed writes to LSM-trees. This is a standard approach for general-purpose transaction processing systems, but for AsterixDB, we can design something better. AsterixDB uses a record-level transaction model where each write is committed as soon as possible by "entity commit". To exploit this property, we can design a one-phase log replay approach as follows: * Start from the log head based on the low watermark LSN * Whenever we see an update log record, store that log record in memory (for each job) * Whenever we see an entity commit or abort record, redo (for commits) or discard (for aborts) the corresponding update log record immediately and remove it from memory The key property here is that the window between an update log record and a commit log record is very short - we commit on a frame basis. Thus, this will speed up the recovery process by only using one log read pass and avoiding storing all entity commits in memory. We only need a small amount of memory, based on the window between updates and commits, during the recovery process. -- This message was sent by Atlassian Jira (v8.3.4#803005)
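The replay loop described by the bullets above can be sketched as follows; the log-record shape is an assumption for illustration:

```python
# Sketch of the one-phase replay loop: buffer update records per
# (job, entity), redo them the moment the entity commit arrives, and
# discard them on abort. The log-record shape is assumed for illustration.

def replay(log):
    pending = {}   # (job, entity) -> buffered update records
    redone = []
    for rec in log:
        key = (rec["job"], rec["entity"])
        if rec["type"] == "UPDATE":
            pending.setdefault(key, []).append(rec)
        elif rec["type"] == "ENTITY_COMMIT":
            for upd in pending.pop(key, []):
                redone.append(upd["op"])   # redo immediately, free memory
        elif rec["type"] == "ABORT":
            pending.pop(key, None)         # uncommitted: discard
    return redone
```

The buffered set stays small because the window between an update and its entity commit is short.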
[jira] [Resolved] (ASTERIXDB-2654) Potential LSM failure related to max file size
[ https://issues.apache.org/jira/browse/ASTERIXDB-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2654. - Resolution: Not A Bug This is not a bug because the OS file size limit should be increased. Reopen this issue if changing the OS file size limit is not a viable solution. > Potential LSM failure related to max file size > -- > > Key: ASTERIXDB-2654 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2654 > Project: Apache AsterixDB > Issue Type: Bug > Components: *DB - AsterixDB, IDX - Indexes, STO - Storage >Affects Versions: 0.9.5 >Reporter: Michael J. Carey >Assignee: Chen Luo >Priority: Critical > Fix For: 0.9.5 > > > The new "unlimited" merge policy (aka concurrent) needs to gracefully handle > the condition where the system has components that can no longer be merged > due to OS file size limits!!! (Pointed out by M. Blow just now.) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ASTERIXDB-2654) Potential LSM failure related to max file size
[ https://issues.apache.org/jira/browse/ASTERIXDB-2654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948081#comment-16948081 ] Chen Luo commented on ASTERIXDB-2654: - It's not a good idea to limit the maximum disk component size because eventually the disk will run out of space under an update-heavy workload, and thus we have to keep merging. Why don't we simply change the OS file size limit (https://access.redhat.com/solutions/61334)? PS - Cloudberry also did something similar to change the number of open files in order to run AsterixDB (https://github.com/ISG-ICS/cloudberry/issues/625). > Potential LSM failure related to max file size > -- > > Key: ASTERIXDB-2654 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2654 > Project: Apache AsterixDB > Issue Type: Bug > Components: *DB - AsterixDB, IDX - Indexes, STO - Storage >Affects Versions: 0.9.5 >Reporter: Michael J. Carey >Assignee: Chen Luo >Priority: Critical > Fix For: 0.9.5 > > > The new "unlimited" merge policy (aka concurrent) needs to gracefully handle > the condition where the system has components that can no longer be merged > due to OS file size limits!!! (Pointed out by M. Blow just now.) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ASTERIXDB-2600) Introduce ConcurrentMergePolicy
[ https://issues.apache.org/jira/browse/ASTERIXDB-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2600. - Resolution: Implemented > Introduce ConcurrentMergePolicy > --- > > Key: ASTERIXDB-2600 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2600 > Project: Apache AsterixDB > Issue Type: Improvement >Reporter: Chen Luo >Assignee: Chen Luo >Priority: Major > > The default PrefixMergePolicy has a number of problems: it only schedules 1 > merge at a time; it stops merging components when they are too large. Both > problems will lead to undesirable performance behavior. This issue introduces > a ConcurrentMergePolicy that fixes these problems. > It has the following four parameters: > * minComponentMergeCount: the minimum number of components to trigger a merge > * maxComponentMergeCount: the maximum number of components per merge > * maxComponentCount: the maximum number of components tolerated in total; > when this number is reached, flush will be stopped > * sizeRatio: a merge is scheduled if the size of the oldest component <= > sizeRatio * total size of youngest components. A larger size ratio(>1) > implies fewer merges, but this leads to more components. A smaller size ratio > (<1) implies more merges, but this improves query performance and space > utilization. > Concurrent merges will be scheduled as well. Given a sequence of components > ordered by newest to oldest, this policy first finds a longest prefix of this > sequence so that no component is being merged. The merge decision will then > be based on this prefix. -- This message was sent by Atlassian Jira (v8.3.4#803005)
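The sizeRatio test above can be sketched as a small decision function; the parameter names follow the description, while the function itself is a hypothetical illustration, not the actual policy code:

```python
# Sketch of the sizeRatio decision: given component sizes ordered newest
# first, merge a prefix of k components when the oldest of them is no
# larger than sizeRatio times the total size of the younger ones.
# Parameter names mirror the description; the function is illustrative.

def decide_merge(sizes, min_merge_count=3, max_merge_count=10, size_ratio=1.2):
    """Return how many components (newest first) to merge, or 0."""
    n = min(len(sizes), max_merge_count)
    for k in range(n, min_merge_count - 1, -1):   # prefer larger merges
        oldest = sizes[k - 1]
        younger_total = sum(sizes[:k - 1])
        if oldest <= size_ratio * younger_total:
            return k
    return 0
```

A size_ratio above 1 tolerates a relatively large oldest component and so merges less often; a ratio below 1 merges more aggressively, as the description notes.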
[jira] [Resolved] (ASTERIXDB-2541) Introduce GreedyScheduler
[ https://issues.apache.org/jira/browse/ASTERIXDB-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2541. - Resolution: Implemented > Introduce GreedyScheduler > - > > Key: ASTERIXDB-2541 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2541 > Project: Apache AsterixDB > Issue Type: Improvement > Components: STO - Storage >Reporter: Chen Luo >Assignee: Chen Luo >Priority: Major > > Our current AsynchronousScheduler tries to schedule all merge operations at > the same time without any control. This is not optimal in terms of minimizing the > number of disk components, which directly impacts query performance. > Here we introduce GreedyScheduler to minimize the number of disk components > over time. It keeps track of all merge operations of an LSM index, and only > activates the merge operation with the smallest number of remaining I/Os. It > can be proven that if the number of components is the same for all merge > operations, then this GreedyScheduler is strictly optimal. Otherwise, this > will still be a good heuristic. > In order for GreedyScheduler to work, we need the following two changes: > * Keep track of the number of scanned pages of index cursors so that we will > know how many pages are left; > * Introduce a mechanism to activate/deactivate merge operations > NOTE: GreedyScheduler should only be used during runtime (with a controlled > data arrival process) so that it can reduce the number of disk components at > its best effort. It CANNOT be used when benchmarking the system by writing as > fast as possible since large merges will be starved. The measured write > throughput will be high but unsustainable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
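The greedy rule can be sketched as follows; the names and the I/O accounting are illustrative assumptions, not the actual scheduler API:

```python
# Sketch of the greedy rule: among the pending merges of an index, only
# the one with the fewest remaining I/Os is active; the rest stay paused
# until it finishes. Method names and I/O accounting are illustrative.

class GreedyScheduler:
    def __init__(self):
        self.merges = {}   # merge id -> estimated remaining page I/Os

    def submit(self, merge_id, total_ios):
        self.merges[merge_id] = total_ios

    def record_progress(self, merge_id, ios_done):
        # cursors report scanned pages, so remaining work is known
        self.merges[merge_id] = max(0, self.merges[merge_id] - ios_done)

    def finish(self, merge_id):
        self.merges.pop(merge_id, None)

    def active_merge(self):
        """The merge currently allowed to run; None if nothing is pending."""
        if not self.merges:
            return None
        return min(self.merges, key=self.merges.get)
```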
[jira] [Resolved] (ASTERIXDB-2522) Skip logging WAIT record during lock conflicts
[ https://issues.apache.org/jira/browse/ASTERIXDB-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2522. - Resolution: Implemented > Skip logging WAIT record during lock conflicts > -- > > Key: ASTERIXDB-2522 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2522 > Project: Apache AsterixDB > Issue Type: Improvement > Components: TX - Transactions >Reporter: Chen Luo >Assignee: Chen Luo >Priority: Minor > > Currently, our deadlock-free locking protocol > (https://cwiki.apache.org/confluence/display/ASTERIXDB/Deadlock-Free+Locking+Protocol) > is conservative. It works as follows: > A writer thread (i.e., transactor) first tries to acquire an X lock on a primary > key. > If the try lock fails, it should release all previously held locks before > acquiring the X lock. To release previous locks, the transactor pushes > partial frames so that previous records can be committed, and further logs a > WAIT record to wait for the log flusher to force all previous log records and > unlock previous locks. > However, the WAIT record is actually not necessary. After committing previous > records, the locks will eventually be released by the log flusher thread. As > a result, deadlock still cannot happen in this case. -- This message was sent by Atlassian Jira (v8.3.4#803005)
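The try-lock path without the WAIT record can be sketched like this; Python locks stand in for the lock manager, and the commit_previous callback stands in for pushing partial frames:

```python
import threading
import time

# Sketch of the try-lock path without the WAIT record: try the X lock; on
# conflict, commit prior records (push partial frames), release all
# previously held locks, then block on the lock directly. The callback
# and lock objects are stand-ins for the real lock manager and flusher.

def acquire_x_lock(lock, held, commit_previous):
    if lock.acquire(blocking=False):
        held.append(lock)               # fast path: no conflict
        return
    commit_previous()                   # prior records become committable
    for h in held:
        h.release()                     # drop earlier locks: no deadlock
    held.clear()
    lock.acquire()                      # safe to block; no WAIT record logged
    held.append(lock)
```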
[jira] [Resolved] (ASTERIXDB-2310) Use Primary Key Index to Enforce Insert Key Uniqueness
[ https://issues.apache.org/jira/browse/ASTERIXDB-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2310. - Resolution: Implemented > Use Primary Key Index to Enforce Insert Key Uniqueness > -- > > Key: ASTERIXDB-2310 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2310 > Project: Apache AsterixDB > Issue Type: Improvement > Components: ING - Ingestion, STO - Storage >Reporter: Chen Luo >Assignee: Chen Luo >Priority: Major > Labels: triaged > > Currently, when ingesting data using INSERT operations, we always check the > primary index to ensure key uniqueness. However, this means that in most cases > all ingested records may be accessed, which will slow down ingestion > performance a lot when the records cannot be cached. To handle this, we can > enforce key uniqueness by checking the primary key index, which is much > smaller and can be more easily cached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
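The idea is to probe a keys-only structure instead of the full primary index. A toy sketch, where a dict and a set stand in for the primary index and the primary key index:

```python
# Toy sketch of the optimization: a keys-only primary key index (a set)
# answers the uniqueness probe, so the full records in the primary index
# need not be touched on insert. Names are illustrative.

class Dataset:
    def __init__(self):
        self.primary = {}       # key -> full record (large, may be uncached)
        self.pk_index = set()   # keys only (small, cache-friendly)

    def insert(self, key, record):
        if key in self.pk_index:            # uniqueness check via PK index
            raise KeyError(f"duplicate key: {key}")
        self.pk_index.add(key)
        self.primary[key] = record
```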
[jira] [Created] (ASTERIXDB-2600) Introduce ConcurrentMergePolicy
Chen Luo created ASTERIXDB-2600: --- Summary: Introduce ConcurrentMergePolicy Key: ASTERIXDB-2600 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2600 Project: Apache AsterixDB Issue Type: Improvement Reporter: Chen Luo Assignee: Chen Luo The default PrefixMergePolicy has a number of problems: it only schedules 1 merge at a time; it stops merging components when they are too large. Both problems will lead to undesirable performance behavior. This issue introduces a ConcurrentMergePolicy that fixes these problems. It has the following four parameters: * minComponentMergeCount: the minimum number of components to trigger a merge * maxComponentMergeCount: the maximum number of components per merge * maxComponentCount: the maximum number of components tolerated in total; when this number is reached, flush will be stopped * sizeRatio: a merge is scheduled if the size of the oldest component <= sizeRatio * total size of youngest components. A larger size ratio(>1) implies fewer merges, but this leads to more components. A smaller size ratio (<1) implies more merges, but this improves query performance and space utilization. Concurrent merges will be scheduled as well. Given a sequence of components ordered by newest to oldest, this policy first finds a longest prefix of this sequence so that no component is being merged. The merge decision will then be based on this prefix. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ASTERIXDB-2543) Curl HTTP API Usages for multi-statements
[ https://issues.apache.org/jira/browse/ASTERIXDB-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo updated ASTERIXDB-2543: Attachment: create_insert.sh > Curl HTTP API Usages for multi-statements > - > > Key: ASTERIXDB-2543 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2543 > Project: Apache AsterixDB > Issue Type: Bug >Reporter: Chen Luo >Priority: Major > Attachments: create_insert.sh > > > Cloudberry has many scripts (similar to the uploaded one) for setting up DB > environment. With the recent change > https://asterix-gerrit.ics.uci.edu/#/c/3267/, this script does not work. > I was trying to revise it to make it work, but two problems surface: > # Only the 1st statement gets executed; > # Double quotes are always ignored by the system. > How would the uploaded script be revised to work properly? We should at least > update our wiki docs to have a more complicated example like this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ASTERIXDB-2543) Curl HTTP API Usages for multi-statements
[ https://issues.apache.org/jira/browse/ASTERIXDB-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo updated ASTERIXDB-2543: Attachment: (was: create_insert.sh) > Curl HTTP API Usages for multi-statements > - > > Key: ASTERIXDB-2543 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2543 > Project: Apache AsterixDB > Issue Type: Bug >Reporter: Chen Luo >Priority: Major > Attachments: create_insert.sh > > > Cloudberry has many scripts (similar to the uploaded one) for setting up DB > environment. With the recent change > https://asterix-gerrit.ics.uci.edu/#/c/3267/, this script does not work. > I was trying to revise it to make it work, but two problems surface: > # Only the 1st statement gets executed; > # Double quotes are always ignored by the system. > How would the uploaded script be revised to work properly? We should at least > update our wiki docs to have a more complicated example like this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ASTERIXDB-2543) Curl HTTP API Usages for multi-statements
Chen Luo created ASTERIXDB-2543: --- Summary: Curl HTTP API Usages for multi-statements Key: ASTERIXDB-2543 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2543 Project: Apache AsterixDB Issue Type: Bug Reporter: Chen Luo Attachments: create_insert.sh Cloudberry has many scripts (similar to the uploaded one) for setting up the DB environment. With the recent change https://asterix-gerrit.ics.uci.edu/#/c/3267/, this script no longer works. I was trying to revise it to make it work, but two problems surfaced: # Only the 1st statement gets executed; # Double quotes are always ignored by the system. How should the uploaded script be revised to work properly? We should at least update our wiki docs to include a more complicated example like this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (ASTERIXDB-2542) Insert duplicate check is not atomic
[ https://issues.apache.org/jira/browse/ASTERIXDB-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo closed ASTERIXDB-2542. --- Resolution: Not A Problem The conflict is resolved when writers insert into the (same) memory component. > Insert duplicate check is not atomic > > > Key: ASTERIXDB-2542 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2542 > Project: Apache AsterixDB > Issue Type: Improvement > Components: STO - Storage, TX - Transactions >Reporter: Chen Luo >Assignee: Chen Luo >Priority: Major > > To insert a record into the primary index, the primary index will first > search itself and only perform the insertion if the old record does not > exist. However, the index search operation and the insertion operation are > not atomic because index search does not lock the primary key; it is possible > that two writers that insert records with the same key can both pass the > uniqueness test and insert the record twice. > We should lock the primary key during index search, as we do for > upsert. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ASTERIXDB-2542) Insert duplicate check is not atomic
Chen Luo created ASTERIXDB-2542: --- Summary: Insert duplicate check is not atomic Key: ASTERIXDB-2542 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2542 Project: Apache AsterixDB Issue Type: Improvement Components: STO - Storage, TX - Transactions Reporter: Chen Luo Assignee: Chen Luo To insert a record into the primary index, the primary index will first search itself and only perform the insertion if the old record does not exist. However, the index search operation and the insertion operation are not atomic because index search does not lock the primary key; it is possible that two writers that insert records with the same key can both pass the uniqueness test and insert the record twice. We should lock the primary key during index search, as we do for upsert. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
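The proposed fix (taking the primary-key lock before the existence check) can be sketched as follows; the per-key lock table is an illustrative stand-in for the real lock manager:

```python
import threading

# Sketch of the proposed fix: take the X lock on the primary key before
# the existence check so that search-then-insert is atomic, as upsert
# already does. The per-key lock table stands in for the lock manager.

class PrimaryIndex:
    def __init__(self):
        self._records = {}
        self._key_locks = {}
        self._guard = threading.Lock()   # protects the lock table itself

    def _lock_for(self, key):
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def insert(self, key, record):
        with self._lock_for(key):        # X-lock the key first
            if key in self._records:     # check + insert are now atomic
                raise KeyError(f"duplicate key: {key}")
            self._records[key] = record
```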
[jira] [Updated] (ASTERIXDB-2541) Introduce GreedyScheduler
[ https://issues.apache.org/jira/browse/ASTERIXDB-2541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo updated ASTERIXDB-2541: Description: Our current AsynchronousScheduler tries to schedule all merge operations at the same time without any control. This is not optimal in terms of minimizing the number of disk components, which directly impacts query performance. Here we introduce GreedyScheduler to minimize the number of disk components over time. It keeps track of all merge operations of an LSM index, and only activates the merge operation with the smallest number of remaining I/Os. It can be proven that if the number of components is the same for all merge operations, then this GreedyScheduler is strictly optimal. Otherwise, this will still be a good heuristic. In order for GreedyScheduler to work, we need the following two changes: * Keep track of the number of scanned pages of index cursors so that we will know how many pages are left; * Introduce a mechanism to activate/deactivate merge operations NOTE: GreedyScheduler should only be used during runtime (with a controlled data arrival process) so that it can reduce the number of disk components at its best effort. It CANNOT be used when benchmarking the system by writing as fast as possible since large merges will be starved. The measured write throughput will be high but unsustainable. was: Our current AsynchronousScheduler tries to schedule all merge operations at the same time without any control. This is not optimal in terms of minimizing the number of disk components, which directly impacts query performance. Here we introduce GreedyScheduler to minimize the number of disk components over time. It keeps track of all merge operations of an LSM index, and only activates the merge operation with the smallest number of remaining I/Os. It can be proven that if the number of components is the same for all merge operations, then this GreedyScheduler is strictly optimal. 
Otherwise, this will still be a good heuristic. In order for GreedyScheduler to work, we need the following two changes: * Keep track of the number of scanned pages of index cursors so that we will know how many pages are left; * Introduce a mechanism to activate/deactivate merge operations > Introduce GreedyScheduler > - > > Key: ASTERIXDB-2541 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2541 > Project: Apache AsterixDB > Issue Type: Improvement > Components: STO - Storage >Reporter: Chen Luo >Assignee: Chen Luo >Priority: Major > > Our current AsynchronousScheduler tries to schedule all merge operations at > the same time without any control. This is not optimal in terms of minimizing the > number of disk components, which directly impacts query performance. > Here we introduce GreedyScheduler to minimize the number of disk components > over time. It keeps track of all merge operations of an LSM index, and only > activates the merge operation with the smallest number of remaining I/Os. It > can be proven that if the number of components is the same for all merge > operations, then this GreedyScheduler is strictly optimal. Otherwise, this > will still be a good heuristic. > In order for GreedyScheduler to work, we need the following two changes: > * Keep track of the number of scanned pages of index cursors so that we will > know how many pages are left; > * Introduce a mechanism to activate/deactivate merge operations > NOTE: GreedyScheduler should only be used during runtime (with a controlled > data arrival process) so that it can reduce the number of disk components at > its best effort. It CANNOT be used when benchmarking the system by writing as > fast as possible since large merges will be starved. The measured write > throughput will be high but unsustainable. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ASTERIXDB-2541) Introduce GreedyScheduler
Chen Luo created ASTERIXDB-2541: --- Summary: Introduce GreedyScheduler Key: ASTERIXDB-2541 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2541 Project: Apache AsterixDB Issue Type: Improvement Components: STO - Storage Reporter: Chen Luo Assignee: Chen Luo Our current AsynchronousScheduler tries to schedule all merge operations at the same time without any control. This is not optimal in terms of minimizing the number of disk components, which directly impacts query performance. Here we introduce GreedyScheduler to minimize the number of disk components over time. It keeps track of all merge operations of an LSM index, and only activates the merge operation with the smallest number of remaining I/Os. It can be proven that if the number of components is the same for all merge operations, then this GreedyScheduler is strictly optimal. Otherwise, this will still be a good heuristic. In order for GreedyScheduler to work, we need the following two changes: * Keep track of the number of scanned pages of index cursors so that we know how many pages are left; * Introduce a mechanism to activate/deactivate merge operations -- This message was sent by Atlassian JIRA (v7.6.3#76005)
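The greedy idea above can be sketched in a few lines. This is a toy model under stated assumptions, not the actual AsterixDB scheduler: `MergeOp`, `schedule`, and `progress` are hypothetical names, and "remaining I/Os" is abstracted to a page count.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the GreedyScheduler idea: among the pending merge operations
// of one LSM index, only the merge with the fewest remaining page I/Os is
// active; all others stay paused until priorities change. All names here are
// illustrative, not the AsterixDB scheduler API.
public class GreedyMergeScheduler {

    /** A pending merge with a decreasing count of pages left to process. */
    static final class MergeOp {
        final String name;
        long remainingPages;
        boolean active;

        MergeOp(String name, long remainingPages) {
            this.name = name;
            this.remainingPages = remainingPages;
        }
    }

    private final List<MergeOp> pending = new ArrayList<>();

    void schedule(MergeOp op) {
        pending.add(op);
        reschedule();
    }

    /** Called whenever a merge makes progress, so the choice stays current. */
    void progress(MergeOp op, long pagesDone) {
        op.remainingPages = Math.max(0, op.remainingPages - pagesDone);
        if (op.remainingPages == 0) {
            pending.remove(op);
        }
        reschedule();
    }

    /** Activate only the merge with the smallest number of remaining I/Os. */
    private void reschedule() {
        MergeOp best = null;
        for (MergeOp op : pending) {
            op.active = false;
            if (best == null || op.remainingPages < best.remainingPages) {
                best = op;
            }
        }
        if (best != null) {
            best.active = true;
        }
    }

    MergeOp activeOp() {
        for (MergeOp op : pending) {
            if (op.active) {
                return op;
            }
        }
        return null;
    }
}
```

Finishing the smallest merge first is what minimizes the component count integrated over time; under an uncontrolled arrival process the same rule starves large merges, which is exactly what the NOTE warns about.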
[jira] [Created] (ASTERIXDB-2540) Optimize Performance Stability of Storage
Chen Luo created ASTERIXDB-2540: --- Summary: Optimize Performance Stability of Storage Key: ASTERIXDB-2540 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2540 Project: Apache AsterixDB Issue Type: Improvement Components: STO - Storage Reporter: Chen Luo Assignee: Chen Luo This is one of a series of improvements to optimize the performance stability of our storage subsystem, which suffers from a number of problems. The end result is that there are periodic write stalls during data ingestion, even though the ingestion speed is relatively low. This improvement will deal with the following issues: 1. Bypass all queuing of disk writes during LSM flush and merge operations. Queuing (by BufferCache and IOManager) causes serious fairness problems for disk writes. Thus, a small flush operation could be severely interfered with by a large merge operation and would take a much longer time to finish. 2. Perform regular disk forces during flush and merge operations (every 16MB by default). This helps limit the I/O queue length of the file system and provides fairness to queries and other writers. This optimization has been implemented in most storage systems today, including Couchbase Server. 3. Optionally, add support for rate limiting of disk writes to ensure the performance stability of queries. The user can configure the maximum disk write bandwidth for each dataset. This ensures that the system can provide stable performance for both queries and writes, even with large background merges. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
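The "regular disk forces" idea in point 2 amounts to simple bookkeeping: instead of letting a flush or merge accumulate an unbounded amount of dirty data in the OS, the writer forces the file to disk after every N bytes. The sketch below shows only that bookkeeping; the class and method names are illustrative, not AsterixDB's API, and the caller is expected to invoke `FileChannel.force(false)` whenever `onWrite()` returns true.

```java
// Sketch of the periodic-force policy: count bytes written and signal a
// disk force after every `interval` bytes (16 MB by default, per the issue).
// Keeping forces frequent bounds the file system's I/O queue, so small
// flushes and queries are not stuck behind megabytes of a merge's dirty pages.
public class PeriodicForcePolicy {
    static final long DEFAULT_INTERVAL = 16L * 1024 * 1024; // 16 MB

    private final long interval;
    private long bytesSinceForce;
    private long forceCount;

    PeriodicForcePolicy(long interval) {
        this.interval = interval;
    }

    /** Account for n written bytes; returns true when a disk force is due. */
    boolean onWrite(long n) {
        bytesSinceForce += n;
        if (bytesSinceForce >= interval) {
            bytesSinceForce = 0;
            forceCount++;
            return true;
        }
        return false;
    }

    long forceCount() {
        return forceCount;
    }
}
```

A hypothetical write loop would wrap each page write as `if (policy.onWrite(page.remaining())) channel.force(false);`.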
[jira] [Created] (ASTERIXDB-2522) Skip logging WAIT record during lock conflicts
Chen Luo created ASTERIXDB-2522: --- Summary: Skip logging WAIT record during lock conflicts Key: ASTERIXDB-2522 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2522 Project: Apache AsterixDB Issue Type: Improvement Components: TX - Transactions Reporter: Chen Luo Assignee: Chen Luo Currently, our deadlock-free locking protocol (https://cwiki.apache.org/confluence/display/ASTERIXDB/Deadlock-Free+Locking+Protocol) is conservative. It works as follows: a writer thread (i.e., a transactor) first tries to acquire an X lock on a primary key. If the try lock fails, it must release all previously held locks before acquiring the X lock. To release the previous locks, the transactor pushes partial frames so that previous records can be committed, and further logs a WAIT record to wait for the log flusher to force all previous log records and release the previous locks. However, the WAIT record is actually not necessary. After the previous records are committed, the locks will eventually be released by the log flusher thread. As a result, deadlock still cannot happen in this case. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ASTERIXDB-2468) Projection pushdown is not performed correctly for COUNT()
[ https://issues.apache.org/jira/browse/ASTERIXDB-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2468. - Resolution: Implemented Assignee: Chen Luo > Projection pushdown is not performed correctly for COUNT() > -- > > Key: ASTERIXDB-2468 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2468 > Project: Apache AsterixDB > Issue Type: Bug > Components: COMP - Compiler >Reporter: Chen Luo >Assignee: Chen Luo >Priority: Major > > The following query performs projection pushdown correctly to project out > records so that only pks are sorted: > {code} > select count(*) from (select * from %s.%s where sid>=%d AND sid<=%d order by > id) tmp; > {code} > However, when a dataset variable is referenced by count(), projection > pushdown is not working anymore and full records are sorted > {code} > select count(tmp) from (select * from %s.%s where sid>=%d AND sid<=%d order > by id) tmp; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ASTERIXDB-2468) Projection pushdown is not performed correctly for COUNT()
[ https://issues.apache.org/jira/browse/ASTERIXDB-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664366#comment-16664366 ] Chen Luo commented on ASTERIXDB-2468: - [~dlychagin-cb] Can you confirm whether this is correct behavior or a bug? > Projection pushdown is not performed correctly for COUNT() > -- > > Key: ASTERIXDB-2468 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2468 > Project: Apache AsterixDB > Issue Type: Bug > Components: COMP - Compiler >Reporter: Chen Luo >Priority: Major > > The following query performs projection pushdown correctly to project out > records so that only pks are sorted: > {code} > select count(*) from (select * from %s.%s where sid>=%d AND sid<=%d order by > id) tmp; > {code} > However, when a dataset variable is referenced by count(), projection > pushdown no longer works and full records are sorted: > {code} > select count(tmp) from (select * from %s.%s where sid>=%d AND sid<=%d order > by id) tmp; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ASTERIXDB-2468) Projection pushdown is not performed correctly for COUNT()
Chen Luo created ASTERIXDB-2468: --- Summary: Projection pushdown is not performed correctly for COUNT() Key: ASTERIXDB-2468 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2468 Project: Apache AsterixDB Issue Type: Bug Components: COMP - Compiler Reporter: Chen Luo The following query performs projection pushdown correctly to project out records so that only pks are sorted: {code} select count(*) from (select * from %s.%s where sid>=%d AND sid<=%d order by id) tmp; {code} However, when a dataset variable is referenced by count(), projection pushdown no longer works and full records are sorted: {code} select count(tmp) from (select * from %s.%s where sid>=%d AND sid<=%d order by id) tmp; {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ASTERIXDB-2467) Concurrency control protocol is not correct when there are bad tuples
[ https://issues.apache.org/jira/browse/ASTERIXDB-2467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2467. - Resolution: Fixed Assignee: Chen Luo > Concurrency control protocol is not correct when there are bad tuples > - > > Key: ASTERIXDB-2467 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2467 > Project: Apache AsterixDB > Issue Type: Bug > Components: STO - Storage >Reporter: Chen Luo >Assignee: Chen Luo >Priority: Major > > The current deadlock-free locking protocol pushes partial frames when the try > locking fails. However, when there are bad tuples (e.g., duplicates) in the > frame, the state of partial frames is always reset after the bad tuple is > removed from frame. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ASTERIXDB-2467) Concurrency control protocol is not correct when there are bad tuples
Chen Luo created ASTERIXDB-2467: --- Summary: Concurrency control protocol is not correct when there are bad tuples Key: ASTERIXDB-2467 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2467 Project: Apache AsterixDB Issue Type: Bug Components: STO - Storage Reporter: Chen Luo The current deadlock-free locking protocol pushes partial frames when try-locking fails. However, when there are bad tuples (e.g., duplicates) in the frame, the state of the partial frames is always reset after the bad tuple is removed from the frame. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (ASTERIXDB-2464) createFile() followed by deleteFile() will not unregister the file
[ https://issues.apache.org/jira/browse/ASTERIXDB-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16649510#comment-16649510 ] Chen Luo edited comment on ASTERIXDB-2464 at 10/14/18 7:47 PM: --- -Why this would cause problems? Line 1026 is mainly for concurrency control to prevent a file being deleted concurrently multiple times.- I see! If a file is created but never opened, then it won't be deleted... was (Author: luochen01): Why this would cause problems? Line 1026 is mainly for concurrency control to prevent a file being deleted concurrently multiple times. > createFile() followed by deleteFile() will not unregister the file > -- > > Key: ASTERIXDB-2464 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2464 > Project: Apache AsterixDB > Issue Type: Bug > Components: HYR - Hyracks, STO - Storage >Reporter: Wail Alkowaileet >Assignee: Wail Alkowaileet >Priority: Major > > In > [BufferCache.java#L817|https://github.com/apache/asterixdb/blob/adfb63361a1808aadb1782aee03acc4d9af8eb0c/hyracks-fullstack/hyracks/hyracks-storage-common/src/main/java/org/apache/hyracks/storage/common/buffercache/BufferCache.java#L817], > we create the file and register it. However, it does not belong to a > FileHandle. > In > [BufferCache.java#L1026|https://github.com/apache/asterixdb/blob/adfb63361a1808aadb1782aee03acc4d9af8eb0c/hyracks-fullstack/hyracks/hyracks-storage-common/src/main/java/org/apache/hyracks/storage/common/buffercache/BufferCache.java#L1026], > the BufferCache will not unregister the file as it's not in the fileInfoMap. > This probably the reason for the sporadic _the file is already mapped_ issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ASTERIXDB-2464) createFile() followed by deleteFile() will not unregister the file
[ https://issues.apache.org/jira/browse/ASTERIXDB-2464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16649510#comment-16649510 ] Chen Luo commented on ASTERIXDB-2464: - Why would this cause problems? Line 1026 is mainly for concurrency control, to prevent a file from being deleted concurrently multiple times. > createFile() followed by deleteFile() will not unregister the file > -- > > Key: ASTERIXDB-2464 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2464 > Project: Apache AsterixDB > Issue Type: Bug > Components: HYR - Hyracks, STO - Storage >Reporter: Wail Alkowaileet >Assignee: Wail Alkowaileet >Priority: Major > > In > [BufferCache.java#L817|https://github.com/apache/asterixdb/blob/adfb63361a1808aadb1782aee03acc4d9af8eb0c/hyracks-fullstack/hyracks/hyracks-storage-common/src/main/java/org/apache/hyracks/storage/common/buffercache/BufferCache.java#L817], > we create the file and register it. However, it does not belong to a > FileHandle. > In > [BufferCache.java#L1026|https://github.com/apache/asterixdb/blob/adfb63361a1808aadb1782aee03acc4d9af8eb0c/hyracks-fullstack/hyracks/hyracks-storage-common/src/main/java/org/apache/hyracks/storage/common/buffercache/BufferCache.java#L1026], > the BufferCache will not unregister the file as it's not in the fileInfoMap. > This is probably the reason for the sporadic _the file is already mapped_ issue? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ASTERIXDB-2452) ListifyUnnestingFunctionRule didn't recompute type environment properly after firing
[ https://issues.apache.org/jira/browse/ASTERIXDB-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2452. - Resolution: Fixed > ListifyUnnestingFunctionRule didn't recompute type environment properly after > firing > > > Key: ASTERIXDB-2452 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2452 > Project: Apache AsterixDB > Issue Type: Bug >Reporter: Chen Luo >Assignee: Chen Luo >Priority: Major > > The following query trigger a NPE during query optimization > {code} > set import-private-functions 'true' > let $nullstring := [null, null, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] > let $prefix1 := subset-collection($nullstring, 0, > prefix-len-jaccard(len($nullstring), 0.1f)) > let $prefix4 := subset-collection($nullstring, 0, > prefix-len-jaccard(len($nullstring), 0.4f)) > let $joinpair := > for $s in $prefix4 > for $r in $prefix1 > where $s = $r > return $s > return [$joinpair] > {code} > The problem is that after ListifyUnnestingFunctionRule is fired, the parent > operator's type environment is not recomputed, and thus it still points to > the old operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ASTERIXDB-2453) Improve the Constant Merge Policy
[ https://issues.apache.org/jira/browse/ASTERIXDB-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2453. - Resolution: Fixed > Improve the Constant Merge Policy > - > > Key: ASTERIXDB-2453 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2453 > Project: Apache AsterixDB > Issue Type: Bug > Components: STO - Storage >Reporter: Chen Luo >Assignee: Chen Luo >Priority: Major > > The current constant merge policy has a very high merge cost (O(n*n), where n > is the number of records), and is thus seldom used in practice. However, it > still has a desirable property that read cost is always bounded. From the > user's perspective, this policy is also easy to tune - only a single > parameter of the number of components. > To improve the write cost of the constant merge policy, we will adopt the > idea of Binomial policy proposed by https://arxiv.org/abs/1407.3008. This > policy significantly improves the merge cost to O(K*n^(1+1/K)), where K is > the maximum number of components, and n is the total number of records (or > flushes). Another desirable property is that this policy only has write cost > O(n log n) (similar to the current prefix policy) when n is relatively small > (the number of flushes < 4^K). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ASTERIXDB-2453) Improve the Constant Merge Policy
[ https://issues.apache.org/jira/browse/ASTERIXDB-2453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo updated ASTERIXDB-2453: Description: The current constant merge policy has a very high merge cost (O(n*n), where n is the number of records), and is thus seldom used in practice. However, it still has a desirable property that read cost is always bounded. From the user's perspective, this policy is also easy to tune - only a single parameter of the number of components. To improve the write cost of the constant merge policy, we will adopt the idea of Binomial policy proposed by https://arxiv.org/abs/1407.3008. This policy significantly improves the merge cost to O(K*n^(1+1/K)), where K is the maximum number of components, and n is the total number of records (or flushes). Another desirable property is that this policy only has write cost O(n log n) (similar to the current prefix policy) when n is relatively small (the number of flushes < 4^K). was: The current constant merge policy has a very high merge cost (O(n*n), where n is the number of records), and is thus seldom used in practice. However, it still has a desirable property that read cost is always bounded. From the user's perspective, this policy is also easy to tune - only a single parameter of the number of components. To improve the write cost of the constant merge policy, we will adopt the idea of Binomial policy proposed by https://arxiv.org/abs/1407.3008. This policy significantly improves the merge cost to O(K*n^{1+1/K}), where K is the bound of number of components, and n is the total number of records. Another desirable property is that this policy only has write cost O(n log n) (similar to the current prefix policy) when n is relatively small (the number of flushes < 4^K). 
> Improve the Constant Merge Policy > - > > Key: ASTERIXDB-2453 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2453 > Project: Apache AsterixDB > Issue Type: Bug > Components: STO - Storage >Reporter: Chen Luo >Assignee: Chen Luo >Priority: Major > > The current constant merge policy has a very high merge cost (O(n*n), where n > is the number of records), and is thus seldom used in practice. However, it > still has a desirable property that read cost is always bounded. From the > user's perspective, this policy is also easy to tune - only a single > parameter of the number of components. > To improve the write cost of the constant merge policy, we will adopt the > idea of Binomial policy proposed by https://arxiv.org/abs/1407.3008. This > policy significantly improves the merge cost to O(K*n^(1+1/K)), where K is > the maximum number of components, and n is the total number of records (or > flushes). Another desirable property is that this policy only has write cost > O(n log n) (similar to the current prefix policy) when n is relatively small > (the number of flushes < 4^K). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ASTERIXDB-2453) Improve the Constant Merge Policy
Chen Luo created ASTERIXDB-2453: --- Summary: Improve the Constant Merge Policy Key: ASTERIXDB-2453 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2453 Project: Apache AsterixDB Issue Type: Bug Components: STO - Storage Reporter: Chen Luo Assignee: Chen Luo The current constant merge policy has a very high merge cost (O(n*n), where n is the number of records), and is thus seldom used in practice. However, it still has the desirable property that the read cost is always bounded. From the user's perspective, this policy is also easy to tune - there is only a single parameter, the maximum number of components. To improve the write cost of the constant merge policy, we will adopt the idea of the Binomial policy proposed by https://arxiv.org/abs/1407.3008. This policy significantly improves the merge cost to O(K*n^{1+1/K}), where K is the bound on the number of components, and n is the total number of records. Another desirable property is that this policy only has write cost O(n log n) (similar to the current prefix policy) when n is relatively small (the number of flushes < 4^K). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
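The O(n*n) merge cost of the existing constant policy can be seen with a toy simulation (component sizes abstracted to record counts; this is an illustration, not AsterixDB's merge-policy code): with bound K, a merge of all components happens roughly every K flushes, and each merge rewrites nearly all n records ingested so far, giving on the order of n/K merges of O(n) records each, i.e. O(n^2) for a fixed K.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy simulation of a constant merge policy: each flush creates a new disk
// component; once the component count exceeds the bound K, all components
// are merged into one. `mergedRecords` accumulates the number of records
// rewritten by merges, i.e. the total merge cost. Illustrative names only.
public class ConstantMergePolicySim {
    final int maxComponents;
    final Deque<Long> components = new ArrayDeque<>();
    long mergedRecords; // records rewritten by merges = total merge cost

    ConstantMergePolicySim(int maxComponents) {
        this.maxComponents = maxComponents;
    }

    /** A flush adds a component; merge everything once the bound is exceeded. */
    void flush(long records) {
        components.addLast(records);
        if (components.size() > maxComponents) {
            long total = 0;
            while (!components.isEmpty()) {
                total += components.removeLast();
            }
            mergedRecords += total; // every record in every component is rewritten
            components.addLast(total);
        }
    }
}
```

With K = 3 and ten flushes of one record each, merges fire after flushes 4, 7, and 10 and rewrite 4, 7, and 10 records respectively: the cost of each merge grows linearly with all data ingested so far, which is the quadratic behavior the Binomial policy avoids.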
[jira] [Created] (ASTERIXDB-2452) ListifyUnnestingFunctionRule didn't recompute type environment properly after firing
Chen Luo created ASTERIXDB-2452: --- Summary: ListifyUnnestingFunctionRule didn't recompute type environment properly after firing Key: ASTERIXDB-2452 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2452 Project: Apache AsterixDB Issue Type: Bug Reporter: Chen Luo Assignee: Chen Luo The following query triggers an NPE during query optimization: {code} set import-private-functions 'true' let $nullstring := [null, null, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10] let $prefix1 := subset-collection($nullstring, 0, prefix-len-jaccard(len($nullstring), 0.1f)) let $prefix4 := subset-collection($nullstring, 0, prefix-len-jaccard(len($nullstring), 0.4f)) let $joinpair := for $s in $prefix4 for $r in $prefix1 where $s = $r return $s return [$joinpair] {code} The problem is that after ListifyUnnestingFunctionRule is fired, the parent operator's type environment is not recomputed, and thus it still points to the old operator. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ASTERIXDB-2312) JsonLogicalPlanPrinter does not handle Upsert properly
[ https://issues.apache.org/jira/browse/ASTERIXDB-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2312. - Resolution: Fixed > JsonLogicalPlanPrinter does not handle Upsert properly > -- > > Key: ASTERIXDB-2312 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2312 > Project: Apache AsterixDB > Issue Type: Bug > Components: COMP - Compiler >Reporter: Chen Luo >Assignee: Chen Luo >Priority: Minor > Labels: triaged > > In case of UPSERT operator, the JsonLogicalPlanPrinter outputs a wrong json > plan, and JsonLogicalPlanTest throws > com.fasterxml.jackson.core.JsonParseException. > Failed run: > https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-asterix-app/3666/org.apache.asterix$asterix-app/testReport/org.apache.asterix.test.jsonplan/JsonOptimizedLogicalPlanTest/test_JsonLogicalPlanTest_11__src_test_resources_optimizerts_queries_primary_key_index_upsert_primary_key_index_with_secondary_sqlpp_/ -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ASTERIXDB-2367) Reduce Blocking when Create/Open/Delete Files from BufferCache
[ https://issues.apache.org/jira/browse/ASTERIXDB-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2367. - Resolution: Fixed > Reduce Blocking when Create/Open/Delete Files from BufferCache > -- > > Key: ASTERIXDB-2367 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2367 > Project: Apache AsterixDB > Issue Type: Improvement >Reporter: Chen Luo >Assignee: Chen Luo >Priority: Major > > Currently, all file create/open/delete operations are synchronized on the > same object in the buffer cache. Certain operations inside the synchronized > block can be very expensive (e.g., sweepAndFlush and closeFile/openFile from > IOManager) and thus block other operations. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ASTERIXDB-2334) A range-search on a composite index doesn't work as expected.
[ https://issues.apache.org/jira/browse/ASTERIXDB-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2334. - Resolution: Fixed > A range-search on a composite index doesn't work as expected. > - > > Key: ASTERIXDB-2334 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2334 > Project: Apache AsterixDB > Issue Type: Bug >Reporter: Taewoo Kim >Assignee: Chen Luo >Priority: Critical > > A range-search query on a composite primary-index doesn't work as expected. > > The DDL and INSERT statements > {code:java} > DROP DATAVERSE earthquake IF EXISTS; > CREATE DATAVERSE earthquake; > USE earthquake; > CREATE TYPE QzExternalTypeNew AS { > stationid: string, > pointid: string, > itemid: string, > samplerate: string, > startdate: string, > obsvalue: string > }; > CREATE DATASET qz9130all(QzExternalTypeNew) PRIMARY KEY > stationid,pointid,itemid,samplerate,startdate; > INSERT INTO qz9130all( > {"stationid":"01","pointid":"5","itemid":"9130","samplerate":"01","startdate":"20080509","obsvalue":"9"} > ); > INSERT INTO qz9130all( > {"stationid":"01","pointid":"5","itemid":"9130","samplerate":"01","startdate":"20080510","obsvalue":"9"} > ); > INSERT INTO qz9130all( > {"stationid":"01","pointid":"5","itemid":"9130","samplerate":"01","startdate":"20080511","obsvalue":"9"} > ); > INSERT INTO qz9130all( > {"stationid":"01","pointid":"5","itemid":"9130","samplerate":"01","startdate":"20080512","obsvalue":"9"} > ); > INSERT INTO qz9130all( > {"stationid":"01","pointid":"5","itemid":"9130","samplerate":"01","startdate":"20080513","obsvalue":"9"} > ); > INSERT INTO qz9130all( > {"stationid":"01","pointid":"5","itemid":"9130","samplerate":"01","startdate":"20080514","obsvalue":"9"} > ); > INSERT INTO qz9130all( > {"stationid":"01","pointid":"5","itemid":"9130","samplerate":"01","startdate":"20080515","obsvalue":"9"} > ); > {code} > > The query > {code:java} > SELECT startdate > FROM qz9130all > WHERE samplerate='01' and stationid='01' and
pointid='5' and itemid='9130' > and startdate >= '20080510' and startdate < '20080513' > ORDER BY startdate;{code} > > The result > {code:java} > { "startdate": "20080510" } > { "startdate": "20080511" } > { "startdate": "20080512" } > { "startdate": "20080513" }{code} > > The last row should be filtered. As the following plan shows, there's no > SELECT operator. The optimizer thinks that the primary-index search can > generate the final answer. But, it doesn't. There are false positive results. > {code:java} > distribute result [$$25] > -- DISTRIBUTE_RESULT |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > project ([$$25]) > -- STREAM_PROJECT |PARTITIONED| > assign [$$25] <- [{"startdate": $$32}] > -- ASSIGN |PARTITIONED| > exchange > -- SORT_MERGE_EXCHANGE [$$32(ASC) ] |PARTITIONED| > order (ASC, $$32) > -- STABLE_SORT [$$32(ASC)] |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > project ([$$32]) > -- STREAM_PROJECT |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > unnest-map [$$28, $$29, $$30, $$31, $$32, $$qz9130all] <- > index-search("qz9130all", 0, "earthquake", "qz9130all", FALSE, FALSE, 5, > $$38, $$39, $$40, $$41, $$42, 5, $$43, $$44, $$45, $$46, $$47, TRUE, TRUE, > TRUE) > -- BTREE_SEARCH |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > assign [$$38, $$39, $$40, $$41, $$42, $$43, $$44, $$45, > $$46, $$47] <- ["01", "5", "9130", "01", "20080510", "01", "5", "9130", "01", > "20080513"] > -- ASSIGN |PARTITIONED| > empty-tuple-source > -- EMPTY_TUPLE_SOURCE |PARTITIONED|{code} > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
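The false positive in the plan above follows from how a B-tree compares composite keys: the index-search uses an inclusive high key ("01","5","9130","01","20080513") (the TRUE flags), so a lexicographic field-by-field comparison admits the boundary row even though the query asks for startdate < '20080513'. Without a residual SELECT above the search, that row leaks into the result. The helper below is an illustration of the comparison, not Hyracks code; names are hypothetical.

```java
// Demonstrates why the inclusive-range index search returns the boundary row
// for a strict (<) predicate on the last key field: an inclusive composite
// range check cannot express strictness on one field, so a residual filter
// is still required after the B-tree search.
public class CompositeKeyRange {
    /** Lexicographic comparison of two composite keys, field by field. */
    static int compare(String[] a, String[] b) {
        for (int i = 0; i < a.length; i++) {
            int c = a[i].compareTo(b[i]);
            if (c != 0) {
                return c;
            }
        }
        return 0;
    }

    /** Inclusive range check, mirroring the index-search in the plan. */
    static boolean inInclusiveRange(String[] key, String[] low, String[] high) {
        return compare(key, low) >= 0 && compare(key, high) <= 0;
    }
}
```

The boundary tuple ("01","5","9130","01","20080513") passes the inclusive range check but fails the residual predicate startdate < '20080513', which is exactly the filtering step the reported plan is missing.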
[jira] [Resolved] (ASTERIXDB-2344) Predicate/LIMIT pushdown for primary scans
[ https://issues.apache.org/jira/browse/ASTERIXDB-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2344. - Resolution: Implemented > Predicate/LIMIT pushdown for primary scans > -- > > Key: ASTERIXDB-2344 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2344 > Project: Apache AsterixDB > Issue Type: Improvement > Components: COMP - Compiler, STO - Storage >Reporter: Chen Luo >Assignee: Chen Luo >Priority: Major > > Currently we don't have limit/predicate pushdown for simple select queries, > e.g., > {code} > select * > from ds_tweet > where friends_count < 10 > limit 5; > {code} > It'll be nice to have: > 1. push down predicates to the dataset scan operator > (IndexSearchOperatorNodePushable) such that only qualifying records are > returned to the outside; > 2. push down LIMIT to the dataset scan operator when possible so that the query > can be terminated once enough records are fetched; -- This message was sent by Atlassian JIRA (v7.6.3#76005)
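Both pushdowns described above can be sketched as one scan loop. This is a minimal model, not the IndexSearchOperatorNodePushable API: the predicate is applied as records are read, and the scan stops as soon as the limit is reached instead of materializing the whole dataset.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Sketch of predicate + LIMIT pushdown into the scan itself: non-qualifying
// records never leave the scan, and the cursor is abandoned early once
// `limit` qualifying records have been collected. Names are illustrative.
public class PushdownScan {
    static <T> List<T> scan(Iterator<T> cursor, Predicate<T> pred, int limit) {
        List<T> out = new ArrayList<>();
        while (cursor.hasNext() && out.size() < limit) { // LIMIT: early termination
            T rec = cursor.next();
            if (pred.test(rec)) { // predicate applied at the scan, not above it
                out.add(rec);
            }
        }
        return out;
    }
}
```

For the example query, this would stop reading ds_tweet after five records with friends_count < 10 are found, rather than scanning the full dataset and filtering afterwards.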
[jira] [Resolved] (ASTERIXDB-2429) Primary Key Index is not properly maintained by upsert feeds
[ https://issues.apache.org/jira/browse/ASTERIXDB-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2429. - Resolution: Fixed > Primary Key Index is not properly maintained by upsert feeds > > > Key: ASTERIXDB-2429 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2429 > Project: Apache AsterixDB > Issue Type: Bug > Components: ING - Ingestion, STO - Storage >Reporter: Chen Luo >Assignee: Chen Luo >Priority: Major > > During upsert, the primary key index is not properly maintained and is always > empty. The problem is that when upserting secondary indexes, we generate new > values and old values to clean up secondaries. However, since there is no > secondary key for the primary key index, the old value would always points to > the new primary key. As a result, LSMSecondaryUpsertOperator would always > think the secondary key values do not change, ignoring maintaining the > primary key index. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ASTERIXDB-2432) HTTP server hangs with a lot of threads waiting for completion
Chen Luo created ASTERIXDB-2432: --- Summary: HTTP server hangs with a lot of threads waiting for completion Key: ASTERIXDB-2432 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2432 Project: Apache AsterixDB Issue Type: Bug Components: API - HTTP API Reporter: Chen Luo Cloudberry experienced an issue where the HTTP port 19002 (for the HTTP API) is not responsive, while everything else works fine. It seems that a lot of CC threads are waiting for job completions, and the HTTP server hangs. The complete stack trace is attached: {code} 2018-07-30 10:56:14 Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.0-b70 mixed mode): "Attach Listener" #5244 daemon prio=9 os_prio=0 tid=0x7f0268349800 nid=0x4bf7 waiting on condition [0x] java.lang.Thread.State: RUNNABLE Locked ownable synchronizers: - None "HttpExecutor(port:19001)-15" #5193 prio=10 os_prio=0 tid=0x7f023c00c000 nid=0x4054 waiting on condition [0x7f01fb703000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0006ca6ba290> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:744) Locked ownable synchronizers: - None [... "HttpExecutor(port:19001)-14" through "HttpExecutor(port:19001)-12" show the identical WAITING (parking) stack; dump truncated ...]
[jira] [Assigned] (ASTERIXDB-2429) Primary Key Index is not properly maintained by upsert feeds
[ https://issues.apache.org/jira/browse/ASTERIXDB-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo reassigned ASTERIXDB-2429: --- Assignee: Chen Luo > Primary Key Index is not properly maintained by upsert feeds > > > Key: ASTERIXDB-2429 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2429 > Project: Apache AsterixDB > Issue Type: Bug > Components: ING - Ingestion, STO - Storage >Reporter: Chen Luo >Assignee: Chen Luo >Priority: Major > > During upsert, the primary key index is not properly maintained and is always > empty. The problem is that when upserting secondary indexes, we generate new > values and old values to clean up secondaries. However, since there is no > secondary key for the primary key index, the old value always points to > the new primary key. As a result, LSMSecondaryUpsertOperator always thinks > the secondary key values have not changed, and so it skips maintaining the > primary key index. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ASTERIXDB-2429) Primary Key Index is not properly maintained by upsert feeds
Chen Luo created ASTERIXDB-2429: --- Summary: Primary Key Index is not properly maintained by upsert feeds Key: ASTERIXDB-2429 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2429 Project: Apache AsterixDB Issue Type: Bug Components: ING - Ingestion, STO - Storage Reporter: Chen Luo During upsert, the primary key index is not properly maintained and is always empty. The problem is that when upserting secondary indexes, we generate new values and old values to clean up secondaries. However, since there is no secondary key for the primary key index, the old value always points to the new primary key. As a result, LSMSecondaryUpsertOperator always thinks the secondary key values have not changed, and so it skips maintaining the primary key index. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ASTERIXDB-2421) Recovery fails with component ID mismatch
[ https://issues.apache.org/jira/browse/ASTERIXDB-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2421. - Resolution: Duplicate > Recovery fails with component ID mismatch > -- > > Key: ASTERIXDB-2421 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2421 > Project: Apache AsterixDB > Issue Type: Bug >Reporter: Ian Maxon >Assignee: Chen Luo >Priority: Major > > It seems local recovery can fail based on the local component IDs appearing > to be ahead of the redo id somehow: > > java.lang.IllegalStateException: Illegal state of component Id. Max disk > component Id [1532110283703,1532129243741] should be less than redo flush > component Id [1532129243740,1532129243740] > at > org.apache.asterix.app.nc.RecoveryManager.redoFlush(RecoveryManager.java:797) > ~[asterix-app-0.9.4-SNAPSHOT.jar:0.9.4-SNAPSHOT] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ASTERIXDB-2421) Recovery fails with component ID mismatch
[ https://issues.apache.org/jira/browse/ASTERIXDB-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16554710#comment-16554710 ] Chen Luo commented on ASTERIXDB-2421: - This bug should have been fixed by a later patch, but Cloudberry is using a relatively old version. The old issue was reported here https://issues.apache.org/jira/browse/ASTERIXDB-2309, and the fix is in https://asterix-gerrit.ics.uci.edu/#/c/2437/. This bug happens when there are multiple partitions in a single node; during recovery, we previously didn't check the partition of a FLUSH record properly. > Recovery fails with component ID mismatch > -- > > Key: ASTERIXDB-2421 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2421 > Project: Apache AsterixDB > Issue Type: Bug >Reporter: Ian Maxon >Assignee: Chen Luo >Priority: Major > > It seems local recovery can fail based on the local component IDs appearing > to be ahead of the redo id somehow: > > java.lang.IllegalStateException: Illegal state of component Id. Max disk > component Id [1532110283703,1532129243741] should be less than redo flush > component Id [1532129243740,1532129243740] > at > org.apache.asterix.app.nc.RecoveryManager.redoFlush(RecoveryManager.java:797) > ~[asterix-app-0.9.4-SNAPSHOT.jar:0.9.4-SNAPSHOT] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ASTERIXDB-2421) Recovery fails with component ID mismatch
[ https://issues.apache.org/jira/browse/ASTERIXDB-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo reassigned ASTERIXDB-2421: --- Assignee: Chen Luo (was: Ian Maxon) > Recovery fails with component ID mismatch > -- > > Key: ASTERIXDB-2421 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2421 > Project: Apache AsterixDB > Issue Type: Bug >Reporter: Ian Maxon >Assignee: Chen Luo >Priority: Major > > It seems local recovery can fail based on the local component IDs appearing > to be ahead of the redo id somehow: > > java.lang.IllegalStateException: Illegal state of component Id. Max disk > component Id [1532110283703,1532129243741] should be less than redo flush > component Id [1532129243740,1532129243740] > at > org.apache.asterix.app.nc.RecoveryManager.redoFlush(RecoveryManager.java:797) > ~[asterix-app-0.9.4-SNAPSHOT.jar:0.9.4-SNAPSHOT] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ASTERIXDB-2392) Improve Key Comparison During LSM Merge
Chen Luo created ASTERIXDB-2392: --- Summary: Improve Key Comparison During LSM Merge Key: ASTERIXDB-2392 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2392 Project: Apache AsterixDB Issue Type: Improvement Reporter: Chen Luo Assignee: Chen Luo Merging search results from multiple LSM components using a priority queue is relatively expensive. To optimize this, we can use a specialized comparator that skips type checking, since all keys of an index are of the same type. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
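The optimization above can be sketched as follows. This is a hypothetical Python model, not AsterixDB's actual Java comparator: each serialized key carries a type tag, and a fully generic comparator re-checks the tags on every comparison, whereas, knowing that all keys of one index share a type, the check can be hoisted out of the merge loop.

```python
import heapq

def merge_generic(runs):
    """Generic path: validates the type tag for every element (toy model)."""
    def checked(run):
        for tag, value in run:
            if tag != "int":                 # per-element type dispatch
                raise TypeError("mixed key types")
            yield value
    return list(heapq.merge(*map(checked, runs)))

def merge_specialized(runs):
    """Specialized path: one up-front check, then tag-free raw comparisons."""
    assert all(tag == "int" for run in runs for tag, _ in run)
    return list(heapq.merge(*[[v for _, v in run] for run in runs]))

# Two sorted runs of tagged keys, as a component's sorted output might look.
runs = [[("int", 1), ("int", 4)], [("int", 2), ("int", 3)]]
```

Both paths produce the same globally ordered output; the specialized one simply pays the type check once per run instead of once per comparison.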
[jira] [Closed] (ASTERIXDB-2379) Inconsistent naming for LSM Component timestamps
[ https://issues.apache.org/jira/browse/ASTERIXDB-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo closed ASTERIXDB-2379. --- Resolution: Not A Problem > Inconsistent naming for LSM Component timestamps > > > Key: ASTERIXDB-2379 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2379 > Project: Apache AsterixDB > Issue Type: Improvement > Components: STO - Storage >Reporter: Chen Luo >Priority: Major > > The current naming of LSM components is inconsistent. > AbstractLSMIndexFileManager.getComponentStartTime/getComponentEndTime assumes > a component is named "beginTS-endTS". However, physically a component is > named "endTS-beginTS". This leads to recovery problems because certain > valid components could be erroneously deleted by the index checkpointing > check. > The problem is caused by the merge operation. For example, in > LSMBTree.getMergeFileReferences, firstComponent is actually newer than > lastComponent, which breaks the assumption of the time range. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ASTERIXDB-2379) Inconsistent naming for LSM Component timestamps
[ https://issues.apache.org/jira/browse/ASTERIXDB-2379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo updated ASTERIXDB-2379: Description: The current naming of LSM components is inconsistent. AbstractLSMIndexFileManager.getComponentStartTime/getComponentEndTime assumes a component is named "beginTS-endTS". However, physically a component is named "endTS-beginTS". This leads to recovery problems because certain valid components could be erroneously deleted by the index checkpointing check. The problem is caused by the merge operation. For example, in LSMBTree.getMergeFileReferences, firstComponent is actually newer than lastComponent, which breaks the assumption of the time range. was: The current naming of LSM components are inconsistent. AbstractLSMIndexFileManager.getComponentStartTime/getComponentEndTime assumes a component is named as "beginTS-endTS". However, physically a component is named as "endTS-beginTS". The problem is caused by merge operation. For example, in LSMBTree.getMergeFileReferences, firstComponent is actually newer than lastComponent, which breaks the assumption of time range. > Inconsistent naming for LSM Component timestamps > > > Key: ASTERIXDB-2379 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2379 > Project: Apache AsterixDB > Issue Type: Improvement > Components: STO - Storage >Reporter: Chen Luo >Priority: Major > > The current naming of LSM components is inconsistent. > AbstractLSMIndexFileManager.getComponentStartTime/getComponentEndTime assumes > a component is named "beginTS-endTS". However, physically a component is > named "endTS-beginTS". This leads to recovery problems because certain > valid components could be erroneously deleted by the index checkpointing > check. > The problem is caused by the merge operation. For example, in > LSMBTree.getMergeFileReferences, firstComponent is actually newer than > lastComponent, which breaks the assumption of the time range. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ASTERIXDB-2379) Inconsistent naming for LSM Component timestamps
Chen Luo created ASTERIXDB-2379: --- Summary: Inconsistent naming for LSM Component timestamps Key: ASTERIXDB-2379 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2379 Project: Apache AsterixDB Issue Type: Improvement Components: STO - Storage Reporter: Chen Luo The current naming of LSM components is inconsistent. AbstractLSMIndexFileManager.getComponentStartTime/getComponentEndTime assumes a component is named "beginTS-endTS". However, physically a component is named "endTS-beginTS". The problem is caused by the merge operation. For example, in LSMBTree.getMergeFileReferences, firstComponent is actually newer than lastComponent, which breaks the assumption of the time range. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ASTERIXDB-2367) Reduce Blocking when Create/Open/Delete Files from BufferCache
Chen Luo created ASTERIXDB-2367: --- Summary: Reduce Blocking when Create/Open/Delete Files from BufferCache Key: ASTERIXDB-2367 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2367 Project: Apache AsterixDB Issue Type: Improvement Reporter: Chen Luo Assignee: Chen Luo Currently, all file create/open/delete operations are synchronized using the same object in the buffer cache. Certain operations inside the synchronized block can be very expensive (e.g., sweepAndFlush and closeFile/openFile from IOManager) and thus block other operations. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
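One common remedy for a single global monitor, sketched here in Python as an illustration only (the actual BufferCache fix may differ), is lock striping: a short-held map lock hands out one lock per file, so expensive per-file work serializes only operations on the same file, not all file operations.

```python
import threading

class FileRegistry:
    """Hypothetical lock-striping sketch: per-file locks instead of one
    global synchronized object."""

    def __init__(self):
        self._map_lock = threading.Lock()    # held only for dict lookups
        self._file_locks = {}                # file name -> per-file lock
        self.open_files = set()

    def _lock_for(self, name):
        with self._map_lock:                 # cheap, constant-time section
            return self._file_locks.setdefault(name, threading.Lock())

    def open_file(self, name):
        with self._lock_for(name):           # potentially slow work is
            self.open_files.add(name)        # serialized per file only

    def delete_file(self, name):
        with self._lock_for(name):
            self.open_files.discard(name)

reg = FileRegistry()
reg.open_file("idx_a"); reg.open_file("idx_b"); reg.delete_file("idx_a")
```

With this structure, a slow sweep of one file's pages would hold only that file's lock, letting create/open/delete on other files proceed concurrently.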
[jira] [Resolved] (ASTERIXDB-2357) ADMParser String Creation Improvements
[ https://issues.apache.org/jira/browse/ASTERIXDB-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2357. - Resolution: Fixed > ADMParser String Creation Improvements > -- > > Key: ASTERIXDB-2357 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2357 > Project: Apache AsterixDB > Issue Type: Improvement > Components: EXT - External data >Reporter: Chen Luo >Assignee: Chen Luo >Priority: Minor > > The current ADMParser heavily relies on string operations, which results in a > lot of string objects being created. It's better to operate on char[] > directly to avoid object creation and memory copying overheads. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ASTERIXDB-2364) Insert statement doesn't recognize the dataverse in full dataset name
[ https://issues.apache.org/jira/browse/ASTERIXDB-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16436725#comment-16436725 ] Chen Luo commented on ASTERIXDB-2364: - I think the correct grammar should be `test`.`employee` instead of `test.employee`. Backquotes (`xxx`) are used to escape conflicting names/characters, so `test.employee` is treated as a single identifier rather than two. > Insert statement doesn't recognize the dataverse in full dataset name > - > > Key: ASTERIXDB-2364 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2364 > Project: Apache AsterixDB > Issue Type: Bug >Reporter: Xikui Wang >Priority: Minor > > drop dataverse test if exists; > create dataverse test; > create type test.Emp as > closed { > id : integer, > fname : string, > lname : string, > age : integer, > dept : string > }; > create dataset test.employee(Emp) primary key id; > create index idx_employee_first_name on test.employee (fname) type btree; > insert into test.employee > select element {'id':(x.id + > 1),'fname':x.fname,'lname':x.lname,'age':x.age,'dept':x.dept} > from `test.employee` as x > ; > This fails. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ASTERIXDB-2365) Various LSM IO Operations Shouldn't Simply Share the Same Write Queue
Chen Luo created ASTERIXDB-2365: --- Summary: Various LSM IO Operations Shouldn't Simply Share the Same Write Queue Key: ASTERIXDB-2365 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2365 Project: Apache AsterixDB Issue Type: Improvement Components: STO - Storage Reporter: Chen Luo Currently the BufferCache has a single writer queue for all write operations (mainly for LSM component bulkloading). However, without a smarter scheduling algorithm, this causes unnecessary stalls for short-duration operations (e.g., flush) when some heavy operation (e.g., a large merge) is ongoing. We need some sort of I/O scheduling algorithm to ensure that short-duration operations are not blocked by longer-running ones. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
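The scheduling idea can be sketched as a two-queue dispatcher (a hypothetical design, not the implemented one): flush writes get strict priority over merge writes, so a short flush never waits behind a long merge's queued pages. A production scheduler would also need anti-starvation (e.g., aging) for merges.

```python
from collections import deque

class WriteScheduler:
    """Toy two-queue write scheduler: flushes preempt queued merge writes."""

    def __init__(self):
        self.flush_q, self.merge_q = deque(), deque()
        self.dispatched = []

    def submit(self, op, kind):
        # Route each write to its class-specific queue.
        (self.flush_q if kind == "flush" else self.merge_q).append(op)

    def dispatch_next(self):
        # Strict priority: drain flush writes before any merge write.
        q = self.flush_q if self.flush_q else self.merge_q
        if q:
            self.dispatched.append(q.popleft())

sched = WriteScheduler()
sched.submit("merge-page", "merge")
sched.submit("flush-page", "flush")   # arrives later, but is short work
sched.dispatch_next()
sched.dispatch_next()
```

Even though the merge page was submitted first, the flush page is dispatched ahead of it, which is exactly the fairness property the ticket asks for.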
[jira] [Created] (ASTERIXDB-2357) ADMParser String Creation Improvements
Chen Luo created ASTERIXDB-2357: --- Summary: ADMParser String Creation Improvements Key: ASTERIXDB-2357 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2357 Project: Apache AsterixDB Issue Type: Improvement Components: EXT - External data Reporter: Chen Luo Assignee: Chen Luo The current ADMParser heavily relies on string operations, which results in a lot of string objects being created. It's better to operate on char[] directly to avoid object creation and memory copying overheads. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ASTERIXDB-2350) Improve IN subquery
[ https://issues.apache.org/jira/browse/ASTERIXDB-2350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo updated ASTERIXDB-2350: Summary: Improve IN subquery (was: Improvement of IN subquery) > Improve IN subquery > --- > > Key: ASTERIXDB-2350 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2350 > Project: Apache AsterixDB > Issue Type: Improvement > Components: COMP - Compiler >Reporter: Chen Luo >Priority: Minor > > By default, IN is translated into a hybrid hash join followed by a sort group > by, which is expensive for simple IN queries with constant values. > For example > {code} > select * > from lineitem > where l_partkey in [1, 2 ,3]; > {code} > is translated into > {code} > distribute result [$$16] > -- DISTRIBUTE_RESULT |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > project ([$$16]) > -- STREAM_PROJECT |PARTITIONED| > assign [$$16] <- [{"lineitem": $$lineitem}] > -- ASSIGN |PARTITIONED| > project ([$$lineitem]) > -- STREAM_PROJECT |PARTITIONED| > select ($$14) > -- STREAM_SELECT |PARTITIONED| > project ([$$14, $$lineitem]) > -- STREAM_PROJECT |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > group by ([$$20 := $$17]) decor ([$$lineitem]) { > aggregate [$$14] <- [non-empty-stream()] > -- AGGREGATE |LOCAL| > select (not(is-missing($$19))) > -- STREAM_SELECT |LOCAL| > nested tuple source > -- NESTED_TUPLE_SOURCE |LOCAL| >} > -- PRE_CLUSTERED_GROUP_BY[$$17] |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > order (ASC, $$17) > -- STABLE_SORT [$$17(ASC)] |PARTITIONED| > exchange > -- HASH_PARTITION_EXCHANGE [$$17] |PARTITIONED| > project ([$$lineitem, $$19, $$17]) > -- STREAM_PROJECT |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > left outer join (eq($$18, $#1)) > -- HYBRID_HASH_JOIN [$$18][$#1] |PARTITIONED| > exchange > -- HASH_PARTITION_EXCHANGE [$$18] |PARTITIONED| > assign [$$18] <- [$$lineitem.getField(1)] > -- ASSIGN |PARTITIONED| > exchange > -- 
ONE_TO_ONE_EXCHANGE |PARTITIONED| > data-scan []<-[$$17, $$lineitem] <- > Default.lineitem > -- DATASOURCE_SCAN |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > empty-tuple-source > -- EMPTY_TUPLE_SOURCE |PARTITIONED| > exchange > -- HASH_PARTITION_EXCHANGE [$#1] |PARTITIONED| > assign [$$19] <- [TRUE] > -- ASSIGN |UNPARTITIONED| > unnest $#1 <- scan-collection(array: [ 1, > 2, 3 ]) > -- UNNEST |UNPARTITIONED| > empty-tuple-source > -- EMPTY_TUPLE_SOURCE |UNPARTITIONED| > {code} > While the following query > {code} > select * > from lineitem > where l_partkey = 1 OR l_partkey=2 OR l_partkey =3; > {code} > is translated into > {code} > distribute result [$$18] > -- DISTRIBUTE_RESULT |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > project ([$$18]) > -- STREAM_PROJECT |PARTITIONED| > assign [$$18] <- [{"lineitem": $$lineitem}] > -- ASSIGN |PARTITIONED| > project ([$$lineitem]) > -- STREAM_PROJECT |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > join (eq($$21, $$19)) > -- HYBRID_HASH_JOIN [$$19][$$21] |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > assign [$$19] <- [$$lineitem.getField(1)] > -- ASSIGN |PARTITIONED| > project ([$$lineitem]) > -- STREAM_PROJECT |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PA
[jira] [Created] (ASTERIXDB-2350) Improvement of IN subquery
Chen Luo created ASTERIXDB-2350: --- Summary: Improvement of IN subquery Key: ASTERIXDB-2350 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2350 Project: Apache AsterixDB Issue Type: Improvement Components: COMP - Compiler Reporter: Chen Luo By default, IN is translated into a hybrid hash join followed by a sort group by, which is expensive for simple IN queries with constant values. For example
{code}
select *
from lineitem
where l_partkey in [1, 2 ,3];
{code}
is translated into
{code}
distribute result [$$16]
-- DISTRIBUTE_RESULT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
project ([$$16])
-- STREAM_PROJECT |PARTITIONED|
assign [$$16] <- [{"lineitem": $$lineitem}]
-- ASSIGN |PARTITIONED|
project ([$$lineitem])
-- STREAM_PROJECT |PARTITIONED|
select ($$14)
-- STREAM_SELECT |PARTITIONED|
project ([$$14, $$lineitem])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
group by ([$$20 := $$17]) decor ([$$lineitem]) {
aggregate [$$14] <- [non-empty-stream()]
-- AGGREGATE |LOCAL|
select (not(is-missing($$19)))
-- STREAM_SELECT |LOCAL|
nested tuple source
-- NESTED_TUPLE_SOURCE |LOCAL|
}
-- PRE_CLUSTERED_GROUP_BY[$$17] |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
order (ASC, $$17)
-- STABLE_SORT [$$17(ASC)] |PARTITIONED|
exchange
-- HASH_PARTITION_EXCHANGE [$$17] |PARTITIONED|
project ([$$lineitem, $$19, $$17])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
left outer join (eq($$18, $#1))
-- HYBRID_HASH_JOIN [$$18][$#1] |PARTITIONED|
exchange
-- HASH_PARTITION_EXCHANGE [$$18] |PARTITIONED|
assign [$$18] <- [$$lineitem.getField(1)]
-- ASSIGN |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
data-scan []<-[$$17, $$lineitem] <- Default.lineitem
-- DATASOURCE_SCAN |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
empty-tuple-source
-- EMPTY_TUPLE_SOURCE |PARTITIONED|
exchange
-- HASH_PARTITION_EXCHANGE [$#1] |PARTITIONED|
assign [$$19] <- [TRUE]
-- ASSIGN |UNPARTITIONED|
unnest $#1 <- scan-collection(array: [ 1, 2, 3 ])
-- UNNEST |UNPARTITIONED|
empty-tuple-source
-- EMPTY_TUPLE_SOURCE |UNPARTITIONED|
{code}
While the following query
{code}
select *
from lineitem
where l_partkey = 1 OR l_partkey=2 OR l_partkey =3;
{code}
is translated into
{code}
distribute result [$$18]
-- DISTRIBUTE_RESULT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
project ([$$18])
-- STREAM_PROJECT |PARTITIONED|
assign [$$18] <- [{"lineitem": $$lineitem}]
-- ASSIGN |PARTITIONED|
project ([$$lineitem])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
join (eq($$21, $$19))
-- HYBRID_HASH_JOIN [$$19][$$21] |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
assign [$$19] <- [$$lineitem.getField(1)]
-- ASSIGN |PARTITIONED|
project ([$$lineitem])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
data-scan []<-[$$20, $$lineitem] <- Default.lineitem
-- DATASOURCE_SCAN |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
empty-tuple-source
-- EMPTY_TUPLE_SOURCE |PARTITIONED|
exchange
-- BROADCAST_EXCHANGE |PARTITIONED|
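The improvement this ticket suggests, rewriting an IN over a constant list into a disjunction of equality predicates so the cheaper plan applies, can be sketched as a toy string-level rewrite (illustrative only; a real compiler rule would operate on the logical plan, not on SQL text):

```python
def rewrite_in_to_or(field, constants):
    """Rewrite `field IN [c1, c2, ...]` into `field = c1 OR field = c2 ...`,
    avoiding the hash join + sort group by plan for constant lists."""
    return " OR ".join(f"{field} = {c}" for c in constants)

predicate = rewrite_in_to_or("l_partkey", [1, 2, 3])
```

Applied to the example above, the rewritten predicate is exactly the second, cheaper query's WHERE clause.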
[jira] [Resolved] (ASTERIXDB-2339) Improve Inverted Index Merge Performance
[ https://issues.apache.org/jira/browse/ASTERIXDB-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2339. - Resolution: Implemented > Improve Inverted Index Merge Performance > > > Key: ASTERIXDB-2339 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2339 > Project: Apache AsterixDB > Issue Type: Improvement > Components: STO - Storage >Reporter: Chen Luo >Assignee: Chen Luo >Priority: Major > > Currently, the merge of an inverted index is implemented as a full range scan, > i.e., token+key pairs are generated and fed into a priority queue to obtain a > global ordering. However, it is typical for a token to correspond to tens > or hundreds (or even more) keys. As a result, many token comparisons are > wasted because the tokens are often the same. To improve this, we > can have two priority queues, one for tokens and one for keys. For each > token, we merge its inverted lists using the key priority queue. After > that, we fetch the next token from the token queue and merge its inverted > lists again. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ASTERIXDB-2280) RTree on an optional nested field can't be built.
[ https://issues.apache.org/jira/browse/ASTERIXDB-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2280. - Resolution: Fixed > RTree on an optional nested field can't be built. > - > > Key: ASTERIXDB-2280 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2280 > Project: Apache AsterixDB > Issue Type: Bug > Components: IDX - Indexes >Reporter: Taewoo Kim >Assignee: Chen Luo >Priority: Major > Labels: triaged > > If there is an optional nested field, we can't build an RTree index. > > {code:java} > use twitter; > create type typePlace if not exists as open{ > country : string, > country_code : string, > full_name : string, > id : string, > name : string, > place_type : string, > bounding_box : rectangle > }; > create type typeTweet2 if not exists as open { > create_at : datetime, > id: int64, > text: string, > in_reply_to_status : int64, > in_reply_to_user : int64, > favorite_count : int64, > coordinate: point?, > retweet_count : int64, > lang : string, > is_retweet: boolean, > hashtags : {{ string }} ?, > user_mentions : {{ int64 }} ? , > place : typePlace? > }; > create dataset ds_test(typeTweet2) primary key id with filter on create_at; > // success > CREATE INDEX dsTwIphoneIdx ON ds_test(create_at) TYPE BTREE; > // success > CREATE INDEX dsTwIphoneIdxCo ON ds_test(coordinate) TYPE RTREE; > // fail > CREATE INDEX dsTwIphoneIdxBBox ON ds_test(place.bounding_box) TYPE RTREE; > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ASTERIXDB-2347) Documentation is missing for "storage.max.active.writable.datasets"
Chen Luo created ASTERIXDB-2347: --- Summary: Documentation is missing for "storage.max.active.writable.datasets" Key: ASTERIXDB-2347 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2347 Project: Apache AsterixDB Issue Type: Bug Components: DOC - Documentation Reporter: Chen Luo Assignee: Murtadha Hubail Recently we changed the configuration for specifying the memory component budget. However, this configuration is still not documented on our website. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (ASTERIXDB-2344) Predicate/LIMIT pushdown for primary scans
[ https://issues.apache.org/jira/browse/ASTERIXDB-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo updated ASTERIXDB-2344: Description: Currently we don't have limit/predicate pushdown for simple select queries, e.g., {code} select * from ds_tweet where friends_count < 10 limit 5; {code} It'll be nice to have: 1. push down predicates to the dataset scan operator (IndexSearchOperatorNodePushable) such that only qualifying records are returned to the outside; 2. push down LIMIT to the dataset scan operator when possible so that the query can be terminated once enough records are fetched; was: Currently we don't have limit/predicate pushdown for simple SPJ queries, e.g., {code} select * from ds_tweet where friends_count < 10 limit 5; {code} It'll be nice to have: 1. push down predicates to the dataset scan operator (IndexSearchOperatorNodePushable) such that only qualifying records are returned to the outside; 2. push down LIMIT to the dataset scan operator when possible so that the query can be terminated once enough records are fetched; > Predicate/LIMIT pushdown for primary scans > -- > > Key: ASTERIXDB-2344 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2344 > Project: Apache AsterixDB > Issue Type: Improvement > Components: COMP - Compiler, STO - Storage >Reporter: Chen Luo >Assignee: Chen Luo >Priority: Major > > Currently we don't have limit/predicate pushdown for simple select queries, > e.g., > {code} > select * > from ds_tweet > where friends_count < 10 > limit 5; > {code} > It'll be nice to have: > 1. push down predicates to the dataset scan operator > (IndexSearchOperatorNodePushable) such that only qualifying records are > returned to the outside; > 2. push down LIMIT to the dataset scan operator when possible so that the query > can be terminated once enough records are fetched; -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ASTERIXDB-2344) Predicate/LIMIT pushdown for primary scans
Chen Luo created ASTERIXDB-2344: --- Summary: Predicate/LIMIT pushdown for primary scans Key: ASTERIXDB-2344 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2344 Project: Apache AsterixDB Issue Type: Improvement Components: COMP - Compiler, STO - Storage Reporter: Chen Luo Assignee: Chen Luo Currently we don't have limit/predicate pushdown for simple SPJ queries, e.g., {code} select * from ds_tweet where friends_count < 10 limit 5; {code} It'll be nice to have: 1. push down predicates to the dataset scan operator (IndexSearchOperatorNodePushable) such that only qualifying records are returned to the outside; 2. push down LIMIT to the dataset scan operator when possible so that the query can be terminated once enough records are fetched; -- This message was sent by Atlassian JIRA (v7.6.3#76005)
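The two pushdowns can be modeled in a few lines (a hypothetical Python model of the scan operator's behavior, not the actual IndexSearchOperatorNodePushable code): the predicate is evaluated inside the scan so non-qualifying rows never leave it, and the LIMIT terminates the scan early.

```python
def scan_with_pushdown(rows, predicate, limit):
    """Toy primary scan with predicate and LIMIT pushed down: stops reading
    as soon as `limit` qualifying rows have been produced."""
    out, scanned = [], 0
    for row in rows:
        scanned += 1
        if predicate(row):          # predicate pushdown: filter in the scan
            out.append(row)
            if len(out) == limit:   # LIMIT pushdown: early termination
                break
    return out, scanned

rows = [{"friends_count": n % 20} for n in range(1000)]
hits, scanned = scan_with_pushdown(rows, lambda r: r["friends_count"] < 10, 5)
```

For the example query above, the scan stops after a handful of records instead of reading the whole dataset.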
[jira] [Closed] (ASTERIXDB-2342) Inconsistent Configuration with NCService
[ https://issues.apache.org/jira/browse/ASTERIXDB-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo closed ASTERIXDB-2342. --- Resolution: Not A Bug > Inconsistent Configuration with NCService > - > > Key: ASTERIXDB-2342 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2342 > Project: Apache AsterixDB > Issue Type: Bug > Components: CONF - Configuration, DOC - Documentation >Reporter: Chen Luo >Priority: Major > > Both the documentation website and the NCService code assume there is a [ncservice] > section in the configuration file to configure NCService-related settings. > However, [ncservice] is missing in the class > org.apache.hyracks.api.config.Section. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ASTERIXDB-2342) Inconsistent Configuration with NCService
Chen Luo created ASTERIXDB-2342: --- Summary: Inconsistent Configuration with NCService Key: ASTERIXDB-2342 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2342 Project: Apache AsterixDB Issue Type: Bug Components: CONF - Configuration, DOC - Documentation Reporter: Chen Luo Both the documentation website and the NCService code assume there is a [ncservice] section in the configuration file to configure NCService-related settings. However, [ncservice] is missing in the class org.apache.hyracks.api.config.Section. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (ASTERIXDB-2341) RSSRecordReaderTest Failure
[ https://issues.apache.org/jira/browse/ASTERIXDB-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo resolved ASTERIXDB-2341. - Resolution: Duplicate Duplicate of ASTERIXDB-2216 > RSSRecordReaderTest Failure > --- > > Key: ASTERIXDB-2341 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2341 > Project: Apache AsterixDB > Issue Type: Bug > Components: EXT - External data >Reporter: Chen Luo >Priority: Minor > > I saw a test failure at > https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-notopic/8618/, which > was caused by http://lorem-rss.herokuapp.com/feed being down. > Why would our test case depend on an external website? Can this test case be > improved so that it passes even when that host is down? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (ASTERIXDB-2341) RSSRecordReaderTest Failure
Chen Luo created ASTERIXDB-2341: --- Summary: RSSRecordReaderTest Failure Key: ASTERIXDB-2341 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2341 Project: Apache AsterixDB Issue Type: Bug Components: EXT - External data Reporter: Chen Luo I saw a test failure at https://asterix-jenkins.ics.uci.edu/job/asterix-gerrit-notopic/8618/, which was caused by http://lorem-rss.herokuapp.com/feed being down. Why would our test case depend on an external website? Can this test case be improved so that it passes even when that host is down? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (ASTERIXDB-2334) A range-search on a composite index doesn't work as expected.
[ https://issues.apache.org/jira/browse/ASTERIXDB-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chen Luo reassigned ASTERIXDB-2334: --- Assignee: Chen Luo (was: Dmitry Lychagin) > A range-search on a composite index doesn't work as expected. > - > > Key: ASTERIXDB-2334 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2334 > Project: Apache AsterixDB > Issue Type: Bug >Reporter: Taewoo Kim >Assignee: Chen Luo >Priority: Critical > > A range-search query on a composite primary-index doesn't work as expected. > > The DDL and INSERT statements > {code:java} > DROP DATAVERSE earthquake IF EXISTS; > CREATE DATAVERSE earthquake; > USE earthquake; > CREATE TYPE QzExternalTypeNew AS { > stationid: string, > pointid: string, > itemid: string, > samplerate: string, > startdate: string, > obsvalue: string > }; > CREATE DATASET qz9130all(QzExternalTypeNew) PRIMARY KEY > stationid,pointid,itemid,samplerate,startdate; > INSERT INTO qz9130all( > {"stationid":"01","pointid":"5","itemid":"9130","samplerate":"01","startdate":"20080509","obsvalue":"9"} > ); > INSERT INTO qz9130all( > {"stationid":"01","pointid":"5","itemid":"9130","samplerate":"01","startdate":"20080510","obsvalue":"9"} > ); > INSERT INTO qz9130all( > {"stationid":"01","pointid":"5","itemid":"9130","samplerate":"01","startdate":"20080511","obsvalue":"9"} > ); > INSERT INTO qz9130all( > {"stationid":"01","pointid":"5","itemid":"9130","samplerate":"01","startdate":"20080512","obsvalue":"9"} > ); > INSERT INTO qz9130all( > {"stationid":"01","pointid":"5","itemid":"9130","samplerate":"01","startdate":"20080513","obsvalue":"9"} > ); > INSERT INTO qz9130all( > {"stationid":"01","pointid":"5","itemid":"9130","samplerate":"01","startdate":"20080514","obsvalue":"9"} > ); > INSERT INTO qz9130all( > {"stationid":"01","pointid":"5","itemid":"9130","samplerate":"01","startdate":"20080515","obsvalue":"9"} > ); > {code} > > The query > {code:java} > SELECT startdate > FROM qz9130all > WHERE 
samplerate='01' and stationid='01' and pointid='5' and itemid='9130' > and startdate >= '20080510' and startdate < '20080513' > ORDER BY startdate;{code} > > The result > {code:java} > { "startdate": "20080510" } > { "startdate": "20080511" } > { "startdate": "20080512" } > { "startdate": "20080513" }{code} > > The last row should be filtered. As the following plan shows, there's no > SELECT operator. The optimizer thinks that the primary-index search can > generate the final answer. But, it doesn't. There are false positive results. > {code:java} > distribute result [$$25] > -- DISTRIBUTE_RESULT |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > project ([$$25]) > -- STREAM_PROJECT |PARTITIONED| > assign [$$25] <- [{"startdate": $$32}] > -- ASSIGN |PARTITIONED| > exchange > -- SORT_MERGE_EXCHANGE [$$32(ASC) ] |PARTITIONED| > order (ASC, $$32) > -- STABLE_SORT [$$32(ASC)] |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > project ([$$32]) > -- STREAM_PROJECT |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > unnest-map [$$28, $$29, $$30, $$31, $$32, $$qz9130all] <- > index-search("qz9130all", 0, "earthquake", "qz9130all", FALSE, FALSE, 5, > $$38, $$39, $$40, $$41, $$42, 5, $$43, $$44, $$45, $$46, $$47, TRUE, TRUE, > TRUE) > -- BTREE_SEARCH |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > assign [$$38, $$39, $$40, $$41, $$42, $$43, $$44, $$45, > $$46, $$47] <- ["01", "5", "9130", "01", "20080510", "01", "5", "9130", "01", > "20080513"] > -- ASSIGN |PARTITIONED| > empty-tuple-source > -- EMPTY_TUPLE_SOURCE |PARTITIONED|{code} > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
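The false positive in the issue above can be reproduced in miniature. Below is a hypothetical Python sketch (not AsterixDB code) of the composite-key range scan: if the B-tree search treats the high key as inclusive in its last component (as the TRUE flags on the BTREE_SEARCH above presumably indicate), the extra '20080513' row comes back, while the query's predicate startdate < '20080513' excludes it.

```python
# Hypothetical illustration of the composite-key scan in the issue.
# The primary key order is (stationid, pointid, itemid, samplerate, startdate),
# so startdate is the last (index 4) component of each tuple.
rows = [("01", "5", "9130", "01", "20080509"),
        ("01", "5", "9130", "01", "20080510"),
        ("01", "5", "9130", "01", "20080511"),
        ("01", "5", "9130", "01", "20080512"),
        ("01", "5", "9130", "01", "20080513"),
        ("01", "5", "9130", "01", "20080514"),
        ("01", "5", "9130", "01", "20080515")]

low = ("01", "5", "9130", "01", "20080510")
high = ("01", "5", "9130", "01", "20080513")

# What the generated plan effectively computes (both bounds inclusive):
plan_result = [r for r in rows if low <= r <= high]

# What the query actually asks for (startdate strictly below '20080513'):
query_result = [r for r in rows if low <= r and r[4] < "20080513"]

assert [r[4] for r in plan_result] == ["20080510", "20080511",
                                       "20080512", "20080513"]
assert [r[4] for r in query_result] == ["20080510", "20080511", "20080512"]
```

With no SELECT operator above the index search, nothing removes the boundary row, matching the four-row result reported in the issue.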
[jira] [Created] (ASTERIXDB-2339) Improve Inverted Index Merge Performance
Chen Luo created ASTERIXDB-2339: --- Summary: Improve Inverted Index Merge Performance Key: ASTERIXDB-2339 URL: https://issues.apache.org/jira/browse/ASTERIXDB-2339 Project: Apache AsterixDB Issue Type: Improvement Components: STO - Storage Reporter: Chen Luo Assignee: Chen Luo Currently, the merge of inverted indexes is implemented as a full range scan, i.e., token+key pairs are generated and fed into a priority queue to obtain a global ordering. However, a token typically corresponds to tens or hundreds (or even many more) of keys. As a result, many token comparisons are wasted, because consecutive pairs often carry the same token. To improve this, we can use two priority queues, one for tokens and one for keys. For each token, we merge its inverted lists using the key priority queue; we then fetch the next token from the token queue and merge its inverted lists in turn. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
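The two-queue idea proposed above can be sketched in Python (a hypothetical stand-in, not the AsterixDB implementation): each on-disk component maps a token to a sorted inverted list of keys; a token-level queue picks each distinct token once, and a key-level k-way merge only touches that token's inverted lists, so the token is never compared per key.

```python
import heapq

# Hypothetical in-memory stand-ins for two on-disk inverted-index components:
# each maps token -> sorted inverted list of primary keys.
components = [
    {"apple": [1, 4, 9], "banana": [2, 7]},
    {"apple": [3, 5], "cherry": [6]},
]

def merge_components(components):
    """Two-level merge: pick each distinct token once (token-level queue),
    then k-way-merge only that token's inverted lists (key-level queue)."""
    # A sorted set stands in for the token priority queue of the proposal.
    for token in sorted({t for c in components for t in c}):
        lists = [c[token] for c in components if token in c]
        # heapq.merge is a lazy k-way merge -- the key-level priority queue.
        # The token is handled once per component, not once per (token, key).
        yield token, list(heapq.merge(*lists))

merged = dict(merge_components(components))
# e.g., apple's lists [1, 4, 9] and [3, 5] merge to [1, 3, 4, 5, 9]
```

In the single-queue scheme, every (token, key) pair pays a token comparison in the heap; here the key-level merge compares only keys within one token's lists.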
[jira] [Commented] (ASTERIXDB-2334) A range-search on a composite index doesn't work as expected.
[ https://issues.apache.org/jira/browse/ASTERIXDB-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16411711#comment-16411711 ] Chen Luo commented on ASTERIXDB-2334: - In this case, the SELECT operator should not be needed, because 'startdate' is the last composite key; the query just works as a normal range search after the prefix is determined. There might be something wrong during the composite-key search. > A range-search on a composite index doesn't work as expected. > - > > Key: ASTERIXDB-2334 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2334 > Project: Apache AsterixDB > Issue Type: Bug >Reporter: Taewoo Kim >Assignee: Dmitry Lychagin >Priority: Critical > > A range-search query on a composite primary-index doesn't work as expected. (The full DDL, query, and plan are quoted in the issue above.) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (ASTERIXDB-2336) Sorting big tuples fails due to memory budget
[ https://issues.apache.org/jira/browse/ASTERIXDB-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16409857#comment-16409857 ] Chen Luo commented on ASTERIXDB-2336: - I think this is intentional, to enforce memory budgeting during sort. A record cannot exceed the memory budget for sort; otherwise the actual memory usage would be unbounded. If this happens, should the user just increase the memory budget for sort? > Sorting big tuples fails due to memory budget > - > > Key: ASTERIXDB-2336 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2336 > Project: Apache AsterixDB > Issue Type: Bug >Reporter: Ali Alsuliman >Assignee: Ali Alsuliman >Priority: Major > > Currently, when sorting data in a partition and merging the sorted run files, the system throws an exception if one of the run files contains a huge tuple. The reason is that such a huge tuple takes almost the entire budget allocated for sorting. For such cases, allow exceeding the budget in order to sort the data. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
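Why a single huge tuple breaks the merge phase can be shown with a minimal budget check. This is a hypothetical sketch (frame size, names, and the budget model are assumptions, not AsterixDB's actual sort code): an external sort holds at most a fixed number of fixed-size frames in memory, so a record that alone needs more frames than the whole budget can never be staged for merging.

```python
# Minimal sketch of a sort memory-budget check (hypothetical, not AsterixDB).
FRAME_SIZE = 32 * 1024          # assumed frame size in bytes

def frames_needed(record_size, frame_size=FRAME_SIZE):
    # Ceiling division: a record spanning part of a frame still occupies it.
    return -(-record_size // frame_size)

def check_record(record_size, budget_frames):
    """Raise if one record alone would exceed the whole sort budget:
    merging N run files still needs an in-memory slot per run, so no
    schedule can fit such a record within the budget."""
    if frames_needed(record_size) > budget_frames:
        raise MemoryError(
            f"record of {record_size} bytes needs "
            f"{frames_needed(record_size)} frames, "
            f"but the sort budget is only {budget_frames} frames")

check_record(100 * 1024, budget_frames=16)   # fits: needs 4 of 16 frames
# check_record(1 << 20, budget_frames=16)    # would raise: needs 32 frames
```

Under this model, the two ways out are exactly those discussed in the issue: raise the budget so the record fits, or let the sort temporarily exceed its budget for oversized records.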
[jira] [Commented] (ASTERIXDB-2331) Plan branch repeated
[ https://issues.apache.org/jira/browse/ASTERIXDB-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16398060#comment-16398060 ] Chen Luo commented on ASTERIXDB-2331: - I think this is the intended behavior of the index-only plan, where the results of the secondary-index search are divided into two branches based on the try-lock results (one branch needs to perform a primary-index lookup). > Plan branch repeated > > > Key: ASTERIXDB-2331 > URL: https://issues.apache.org/jira/browse/ASTERIXDB-2331 > Project: Apache AsterixDB > Issue Type: Bug > Components: COMP - Compiler >Reporter: Wail Alkowaileet >Priority: Major > > I didn't investigate, but it looks like an unmaintained Split output. > DDL > {noformat} > DROP DATAVERSE SocialNetworkData IF EXISTS; > CREATE DATAVERSE SocialNetworkData; > USE SocialNetworkData; > create type ChirpMessageType as { > chirpid: int64, > send_time: datetime > }; > create type GleambookUserType as { > id: int64, > user_since: datetime > }; > create type GleambookMessageType as { > message_id: int64, > author_id: int64, > send_time: datetime > }; > create dataset GleambookMessages(GleambookMessageType) > primary key message_id; > create dataset GleambookUsers(GleambookUserType) > primary key id; > create dataset ChirpMessages(ChirpMessageType) > primary key chirpid; > create index usrSinceIx on GleambookUsers(user_since); > create index sndTimeIx on ChirpMessages(send_time); > create index authorIdIx on GleambookMessages(author_id); > {noformat} > Query: > {noformat} > USE SocialNetworkData; > EXPLAIN > SELECT g.message_id > FROM GleambookUsers as u, GleambookMessages as g > WHERE u.id/*+indexnl*/ = g.author_id > AND u.user_since = datetime("2013-04-16T09:45:46") > {noformat} > Plan: > {noformat} > distribute result [$$28] > -- DISTRIBUTE_RESULT |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > union ($$54, $$55, $$28) > -- UNION_ALL |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE 
|PARTITIONED| > project ([$$54]) > -- STREAM_PROJECT |PARTITIONED| > assign [$$54] <- [{"message_id": $$52}] > -- ASSIGN |PARTITIONED| > project ([$$52]) > -- STREAM_PROJECT |PARTITIONED| > select (eq($$29, $$53.getField(1))) > -- STREAM_SELECT |PARTITIONED| > project ([$$29, $$52, $$53]) > -- STREAM_PROJECT |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > unnest-map [$$52, $$53] <- > index-search("GleambookMessages", 0, "SocialNetworkData", > "GleambookMessages", TRUE, FALSE, 1, $$46, 1, $$46, TRUE, TRUE, TRUE) > -- BTREE_SEARCH |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > project ([$$29, $$46]) > -- STREAM_PROJECT |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > split ($$47) > -- SPLIT |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > project ([$$29, $$46, $$47]) > -- STREAM_PROJECT |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE |PARTITIONED| > unnest-map [$$45, $$46, $$47] <- > index-search("authorIdIx", 0, "SocialNetworkData", "GleambookMessages", TRUE, > TRUE, 1, $$29, 1, $$29, TRUE, TRUE, TRUE) > -- BTREE_SEARCH |PARTITIONED| > exchange > -- BROADCAST_EXCHANGE |PARTITIONED| > union ($$43, $$38, $$29) > -- UNION_ALL |PARTITIONED| > exchange > -- ONE_TO_ONE_EXCHANGE > |PARTITIONED| > project ([$$43]) > -- STREAM_PROJECT |PARTITIONED| > select (eq($$33, datetime: { > 2013-04-16T09:45:46.000Z })) > -- STREAM_SELECT |PARTITIONED| > project ([$$43, $$33]) > -- STREAM_PROJECT > |PARTITIONED| > assign [$$33] <- > [$$44.getField(1)] >