[ 
https://issues.apache.org/jira/browse/ASTERIXDB-1812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Luo updated ASTERIXDB-1812:
--------------------------------
    Description: 
The dataset is a sample tweet dataset provided by Cloudberry that contains 
324,000 tweets (about 300 MB). Issuing the following query always fails with an 
OutOfMemoryError.

Query:
```
select * from twitter.ds_tweet t
group by t.test;
```
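
One plausible reading of why this query needs so much memory, offered as a sketch and not as a confirmed diagnosis: if no tweet has a `test` field, every record gets the same (missing) grouping key, so the whole dataset collapses into a single group that `select *` then has to emit in one piece. The Python below only models that collapse. `group_by_field`, `MISSING`, and the sample records are hypothetical names for illustration, not AsterixDB code.

```python
from collections import defaultdict

# Sentinel standing in for a missing field value (hypothetical, illustration only).
MISSING = object()

def group_by_field(records, field):
    """Group records by `field`; records lacking the field share one MISSING key."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec.get(field, MISSING)].append(rec)
    return dict(groups)

# Tiny stand-in for the tweet dataset: none of the records has a "test" field.
tweets = [{"id": i, "text": "tweet %d" % i} for i in range(1000)]

groups = group_by_field(tweets, "test")
largest = max(len(members) for members in groups.values())
print(len(groups), largest)  # prints "1 1000": one group holding every record
```

Under this reading, the memory needed for the single group grows with the size of the whole dataset and not with the size of a typical group.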
Stacktrace:
```
org.apache.hyracks.api.exceptions.HyracksException: Job failed on account of:
HYR0003: java.lang.OutOfMemoryError: Java heap space

        at org.apache.hyracks.control.cc.job.JobRun.waitForCompletion(JobRun.java:211)
        at org.apache.hyracks.control.cc.work.WaitForJobCompletionWork$1.run(WaitForJobCompletionWork.java:48)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: HYR0003: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hyracks.control.common.utils.ExceptionUtils.setNodeIds(ExceptionUtils.java:62)
        at org.apache.hyracks.control.nc.Task.run(Task.java:330)
        ... 3 more
Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.runInParallel(SuperActivityOperatorNodePushable.java:228)
        at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.initialize(SuperActivityOperatorNodePushable.java:84)
        at org.apache.hyracks.control.nc.Task.run(Task.java:273)
        ... 3 more
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.runInParallel(SuperActivityOperatorNodePushable.java:222)
        ... 5 more
Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
        at org.apache.hyracks.control.nc.resources.memory.FrameManager.allocateFrame(FrameManager.java:57)
        at org.apache.hyracks.control.nc.resources.memory.FrameManager.reallocateFrame(FrameManager.java:73)
        at org.apache.hyracks.control.nc.Joblet.reallocateFrame(Joblet.java:242)
        at org.apache.hyracks.control.nc.Task.reallocateFrame(Task.java:136)
        at org.apache.hyracks.api.comm.VSizeFrame.ensureFrameSize(VSizeFrame.java:53)
        at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.canHoldNewTuple(AbstractFrameAppender.java:104)
        at org.apache.hyracks.dataflow.common.comm.io.FrameTupleAppender.append(FrameTupleAppender.java:49)
        at org.apache.hyracks.dataflow.common.comm.util.FrameUtils.appendToWriter(FrameUtils.java:159)
        at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.appendToFrameFromTupleBuilder(AbstractOneInputOneOutputOneFramePushRuntime.java:82)
        at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.appendToFrameFromTupleBuilder(AbstractOneInputOneOutputOneFramePushRuntime.java:78)
        at org.apache.hyracks.algebricks.runtime.operators.std.AssignRuntimeFactory$1.nextFrame(AssignRuntimeFactory.java:150)
        at org.apache.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor$2.nextFrame(AlgebricksMetaOperatorDescriptor.java:134)
        at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:92)
        at org.apache.hyracks.dataflow.common.comm.io.FrameTupleAppenderWrapper.write(FrameTupleAppenderWrapper.java:50)
        at org.apache.hyracks.dataflow.std.group.preclustered.PreclusteredGroupWriter.close(PreclusteredGroupWriter.java:189)
        at org.apache.hyracks.dataflow.std.group.preclustered.PreclusteredGroupOperatorNodePushable.close(PreclusteredGroupOperatorNodePushable.java:77)
        at org.apache.hyracks.dataflow.std.sort.AbstractExternalSortRunMerger.process(AbstractExternalSortRunMerger.java:165)
        at org.apache.hyracks.dataflow.std.sort.AbstractSorterOperatorDescriptor$MergeActivity$1.initialize(AbstractSorterOperatorDescriptor.java:181)
        at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.lambda$initialize$0(SuperActivityOperatorNodePushable.java:86)
        at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable$$Lambda$17/1550206216.runAction(Unknown Source)
        at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.lambda$runInParallel$2(SuperActivityOperatorNodePushable.java:216)
        at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable$$Lambda$18/914923531.call(Unknown Source)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        ... 3 more

Feb 25, 2017 9:11:13 AM org.apache.asterix.api.http.servlet.APIServlet doPost
SEVERE: Job failed on account of:
HYR0003: java.lang.OutOfMemoryError: Java heap space

org.apache.hyracks.api.exceptions.HyracksException: Job failed on account of:
HYR0003: java.lang.OutOfMemoryError: Java heap space

        at org.apache.hyracks.control.cc.job.JobRun.waitForCompletion(JobRun.java:211)
        at org.apache.hyracks.control.cc.work.WaitForJobCompletionWork$1.run(WaitForJobCompletionWork.java:48)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: HYR0003: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hyracks.control.common.utils.ExceptionUtils.setNodeIds(ExceptionUtils.java:62)
        at org.apache.hyracks.control.nc.Task.run(Task.java:330)
        ... 3 more
Caused by: org.apache.hyracks.api.exceptions.HyracksDataException: java.lang.OutOfMemoryError: Java heap space
        at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.runInParallel(SuperActivityOperatorNodePushable.java:228)
        at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.initialize(SuperActivityOperatorNodePushable.java:84)
        at org.apache.hyracks.control.nc.Task.run(Task.java:273)
        ... 3 more
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        at java.util.concurrent.FutureTask.get(FutureTask.java:192)
        at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.runInParallel(SuperActivityOperatorNodePushable.java:222)
        ... 5 more
Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
        at java.nio.ByteBuffer.allocate(ByteBuffer.java:335)
        at org.apache.hyracks.control.nc.resources.memory.FrameManager.allocateFrame(FrameManager.java:57)
        at org.apache.hyracks.control.nc.resources.memory.FrameManager.reallocateFrame(FrameManager.java:73)
        at org.apache.hyracks.control.nc.Joblet.reallocateFrame(Joblet.java:242)
        at org.apache.hyracks.control.nc.Task.reallocateFrame(Task.java:136)
        at org.apache.hyracks.api.comm.VSizeFrame.ensureFrameSize(VSizeFrame.java:53)
        at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.canHoldNewTuple(AbstractFrameAppender.java:104)
        at org.apache.hyracks.dataflow.common.comm.io.FrameTupleAppender.append(FrameTupleAppender.java:49)
        at org.apache.hyracks.dataflow.common.comm.util.FrameUtils.appendToWriter(FrameUtils.java:159)
        at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.appendToFrameFromTupleBuilder(AbstractOneInputOneOutputOneFramePushRuntime.java:82)
        at org.apache.hyracks.algebricks.runtime.operators.base.AbstractOneInputOneOutputOneFramePushRuntime.appendToFrameFromTupleBuilder(AbstractOneInputOneOutputOneFramePushRuntime.java:78)
        at org.apache.hyracks.algebricks.runtime.operators.std.AssignRuntimeFactory$1.nextFrame(AssignRuntimeFactory.java:150)
        at org.apache.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor$2.nextFrame(AlgebricksMetaOperatorDescriptor.java:134)
        at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:92)
        at org.apache.hyracks.dataflow.common.comm.io.FrameTupleAppenderWrapper.write(FrameTupleAppenderWrapper.java:50)
        at org.apache.hyracks.dataflow.std.group.preclustered.PreclusteredGroupWriter.close(PreclusteredGroupWriter.java:189)
        at org.apache.hyracks.dataflow.std.group.preclustered.PreclusteredGroupOperatorNodePushable.close(PreclusteredGroupOperatorNodePushable.java:77)
        at org.apache.hyracks.dataflow.std.sort.AbstractExternalSortRunMerger.process(AbstractExternalSortRunMerger.java:165)
        at org.apache.hyracks.dataflow.std.sort.AbstractSorterOperatorDescriptor$MergeActivity$1.initialize(AbstractSorterOperatorDescriptor.java:181)
        at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.lambda$initialize$0(SuperActivityOperatorNodePushable.java:86)
        at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable$$Lambda$17/1550206216.runAction(Unknown Source)
        at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable.lambda$runInParallel$2(SuperActivityOperatorNodePushable.java:216)
        at org.apache.hyracks.api.rewriter.runtime.SuperActivityOperatorNodePushable$$Lambda$18/914923531.call(Unknown Source)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        ... 3 more
```
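
In a chained Java trace like the one above, the innermost `Caused by:` entry names the original failure and its first `at` frame shows where it was thrown. The following Python is a minimal sketch for pulling that out of a pasted trace, including one whose frames were wrapped after `at` by a mail client. `root_cause` is a hypothetical helper, and it assumes the innermost `Caused by:` header itself sits on a single line.

```python
import re

def root_cause(trace):
    """Return (exception, message, first_frame) for the innermost 'Caused by:'."""
    # Undo mail-style wrapping where "at" and the frame ended up on separate lines.
    text = re.sub(r"\bat[ \t]*\n[ \t]*", "at ", trace)
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    causes = [i for i, ln in enumerate(lines) if ln.startswith("Caused by:")]
    if not causes:
        return None
    last = causes[-1]
    header = lines[last][len("Caused by:"):].strip()
    exception, _, message = header.partition(": ")
    # The first frame after the header is where the exception was thrown.
    frame = next((ln[3:] for ln in lines[last + 1:] if ln.startswith("at ")), None)
    return exception, message, frame

sample = """Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
        at java.util.concurrent.FutureTask.report(FutureTask.java:122)
        ... 5 more
Caused by: java.lang.OutOfMemoryError: Java heap space
        at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
        at
org.apache.hyracks.control.nc.resources.memory.FrameManager.allocateFrame(FrameManager.java:57)
"""

print(root_cause(sample))
```

For the trace in this report, the innermost cause is the `java.lang.OutOfMemoryError` raised while `FrameManager.reallocateFrame` was allocating a larger heap buffer.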

Steps to reproduce:
1. Install a local AsterixDB cluster by following 
https://asterixdb.apache.org/docs/0.9.0/install.html#Section1SingleMachineAsterixDBInstallation
2. Load the sample data from Cloudberry.
2.1 Download the Cloudberry project from https://github.com/ISG-ICS/cloudberry
2.2 Go to the Cloudberry directory and ingest the sample tweets using 
"bin/ingestTwitterToLocalCluster.sh". You might need to change the AsterixDB 
cluster IP address at line 23 and the cluster instance name at line 86.
3. Issue the following SQL++ query:
select * from twitter.ds_tweet t
group by t.test;


> OutOfMemoryError when grouping by a non-existent field with 300k records 
> (tweets)
> ---------------------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1812
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1812
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: AsterixDB, Hyracks
>         Environment: Linux 16.04
> AsterixDB 0.9.0 with 2 NC nodes and 1 CC node (all using the default 
> configuration from 
> https://asterixdb.apache.org/docs/0.9.0/install.html#Section1SingleMachineAsterixDBInstallation)
>            Reporter: Chen Luo
>


