[ https://issues.apache.org/jira/browse/ASTERIXDB-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen Luo updated ASTERIXDB-1874:
--------------------------------
    Description: 
Basically, I have two datasets, ds_tweet and US_population, and I performed a
left outer join after a group-by using SQL++. Executing the query throws an
ArrayIndexOutOfBoundsException.

The detailed stacktrace is as follows:
{code}
java.lang.ArrayIndexOutOfBoundsException: 2
org.apache.hyracks.api.exceptions.HyracksDataException: java.lang.ArrayIndexOutOfBoundsException: 2
        at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:50)
        at org.apache.hyracks.control.common.utils.ExceptionUtils.setNodeIds(ExceptionUtils.java:62)
        at org.apache.hyracks.control.nc.Task.run(Task.java:330)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
        at org.apache.asterix.builders.RecordBuilder.addField(RecordBuilder.java:166)
        at org.apache.asterix.runtime.evaluators.constructors.OpenRecordConstructorDescriptor$2$1.evaluate(OpenRecordConstructorDescriptor.java:103)
        at org.apache.hyracks.algebricks.runtime.operators.std.AssignRuntimeFactory$1.produceTuple(AssignRuntimeFactory.java:168)
        at org.apache.hyracks.algebricks.runtime.operators.std.AssignRuntimeFactory$1.nextFrame(AssignRuntimeFactory.java:137)
        at org.apache.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor$2.nextFrame(AlgebricksMetaOperatorDescriptor.java:134)
        at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:92)
        at org.apache.hyracks.dataflow.std.join.InMemoryHashJoin.completeJoin(InMemoryHashJoin.java:200)
        at org.apache.hyracks.dataflow.std.join.OptimizedHybridHashJoin.completeProbe(OptimizedHybridHashJoin.java:551)
        at org.apache.hyracks.dataflow.std.join.OptimizedHybridHashJoinOperatorDescriptor$ProbeAndJoinActivityNode$1.close(OptimizedHybridHashJoinOperatorDescriptor.java:429)
        at org.apache.hyracks.control.nc.Task.pushFrames(Task.java:367)
        at org.apache.hyracks.control.nc.Task.run(Task.java:308)
        ... 3 more
{code}

Steps to reproduce:
1. I used the sample Twitter dataset for Cloudberry, which can be found at
https://github.com/ISG-ICS/cloudberry. To load it, enter the project
directory and execute "./script/ingestTwitterToLocalCluster.sh".
2. Create the US_population dataset using the following commands (SQL++):
{code}
use twitter;
create type typePopulation if not exists as open {
    id: int64,
    create_at: date,
    stateID: int64,
    population: int64
};

create dataset US_population(typePopulation) if not exists primary key id;
{code}

3. Execute the following query (SQL++):
{code}
select t1.state, t1.count, l0.state
from (select state, coll_count(g) as `count`
         from twitter.ds_tweet t
         group by t.geo_tag.stateID as `state` group as g) t1
left outer join twitter.US_population l0 on t1.state = l0.state;
{code}
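For reference, here is a minimal Java sketch of what the step-3 query is meant to compute, under stated assumptions: Tweet, Population, Row and run are hypothetical names, not AsterixDB classes; each tweet is reduced to its geo_tag.stateID; and a SQL++ MISSING value from the unmatched side of the left outer join is modeled as null. It only mirrors the shape of the query (group, count, then left outer join on the grouped state) and says nothing about why the runtime fails. Note that US_population is still empty right after step 2, and typePopulation declares stateID rather than state, so with the steps above every group would be expected to come back without a population match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/**
 * Hypothetical in-memory model of the step-3 query's intended semantics.
 * Tweet, Population and Row are illustrative types, not AsterixDB classes.
 */
public class GroupThenLeftOuterJoin {
    // Assumption: a tweet is reduced to its geo_tag.stateID.
    record Tweet(long stateID) {}
    // 'state' may be absent because typePopulation is an open type (null = missing).
    record Population(long id, Long state) {}
    // One output row: grouped state, tweet count, and the matching l0.state
    // (null stands in for SQL++ MISSING when no population row matches).
    record Row(long state, long count, Long populationState) {}

    static List<Row> run(List<Tweet> tweets, List<Population> populations) {
        // Inner query: group by stateID and count each group (coll_count(g)).
        // A TreeMap keeps the groups in ascending state order.
        Map<Long, Long> counts = tweets.stream()
            .collect(Collectors.groupingBy(Tweet::stateID, TreeMap::new, Collectors.counting()));
        List<Row> out = new ArrayList<>();
        for (Map.Entry<Long, Long> e : counts.entrySet()) {
            boolean matched = false;
            for (Population p : populations) {
                // Join condition: t1.state = l0.state
                if (p.state() != null && p.state().equals(e.getKey())) {
                    out.add(new Row(e.getKey(), e.getValue(), p.state()));
                    matched = true;
                }
            }
            // Left outer join: keep the group even when nothing matches.
            if (!matched) {
                out.add(new Row(e.getKey(), e.getValue(), null));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tweet> tweets = List.of(new Tweet(6), new Tweet(6), new Tweet(48));
        // Empty population list, as US_population is right after step 2.
        System.out.println(run(tweets, List.of()));
    }
}
```

Against an empty population list, each state group is returned once with its count and a null populationState, which is the result one would expect the query to produce instead of an exception.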

  was:
Basically, I have two datasets, ds_tweet and US_population, and I performed a
left outer join after a group-by using SQL++. Executing the query throws an
ArrayIndexOutOfBoundsException.

The detailed stacktrace is as follows:
{code}
java.lang.ArrayIndexOutOfBoundsException: 2
org.apache.hyracks.api.exceptions.HyracksDataException: java.lang.ArrayIndexOutOfBoundsException: 2
        at org.apache.hyracks.api.exceptions.HyracksDataException.create(HyracksDataException.java:50)
        at org.apache.hyracks.control.common.utils.ExceptionUtils.setNodeIds(ExceptionUtils.java:62)
        at org.apache.hyracks.control.nc.Task.run(Task.java:330)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
        at org.apache.asterix.builders.RecordBuilder.addField(RecordBuilder.java:166)
        at org.apache.asterix.runtime.evaluators.constructors.OpenRecordConstructorDescriptor$2$1.evaluate(OpenRecordConstructorDescriptor.java:103)
        at org.apache.hyracks.algebricks.runtime.operators.std.AssignRuntimeFactory$1.produceTuple(AssignRuntimeFactory.java:168)
        at org.apache.hyracks.algebricks.runtime.operators.std.AssignRuntimeFactory$1.nextFrame(AssignRuntimeFactory.java:137)
        at org.apache.hyracks.algebricks.runtime.operators.meta.AlgebricksMetaOperatorDescriptor$2.nextFrame(AlgebricksMetaOperatorDescriptor.java:134)
        at org.apache.hyracks.dataflow.common.comm.io.AbstractFrameAppender.write(AbstractFrameAppender.java:92)
        at org.apache.hyracks.dataflow.std.join.InMemoryHashJoin.completeJoin(InMemoryHashJoin.java:200)
        at org.apache.hyracks.dataflow.std.join.OptimizedHybridHashJoin.completeProbe(OptimizedHybridHashJoin.java:551)
        at org.apache.hyracks.dataflow.std.join.OptimizedHybridHashJoinOperatorDescriptor$ProbeAndJoinActivityNode$1.close(OptimizedHybridHashJoinOperatorDescriptor.java:429)
        at org.apache.hyracks.control.nc.Task.pushFrames(Task.java:367)
        at org.apache.hyracks.control.nc.Task.run(Task.java:308)
        ... 3 more
{code}

Steps to reproduce:
1. I used the sample Twitter dataset for Cloudberry, which can be found at
https://github.com/ISG-ICS/cloudberry. To load it, enter the project
directory and execute "./script/ingestTwitterToLocalCluster.sh".
2. Create the US_population dataset using the following commands (SQL++):
{code}
create type typePopulation if not exists as open {
    id: int64,
    create_at: date,
    stateID: int64,
    population: int64
};

create dataset US_population(typePopulation) if not exists primary key id;
{code}

3. Execute the following query (SQL++):
{code}
select t1.state, t1.count, l0.state
from (select state, coll_count(g) as `count`
         from twitter.ds_tweet t
         group by t.geo_tag.stateID as `state` group as g) t1
left outer join twitter.US_population l0 on t1.state = l0.state;
{code}


> ArrayIndexOutOfBoundsException when joining a dataset after groupby
> -------------------------------------------------------------------
>
>                 Key: ASTERIXDB-1874
>                 URL: https://issues.apache.org/jira/browse/ASTERIXDB-1874
>             Project: Apache AsterixDB
>          Issue Type: Bug
>          Components: Hyracks
>            Reporter: Chen Luo
>            Priority: Minor
>



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
