[jira] [Updated] (DRILL-5480) Empty batch returning from HBase may cause SchemChangeException or incorrect query result

Jinfeng Ni (JIRA) Fri, 05 May 2017 16:19:24 -0700

     [ 
https://issues.apache.org/jira/browse/DRILL-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jinfeng Ni updated DRILL-5480:
------------------------------
    Description: 
The following repo was provided by [~haozhu].

1. Create a Hbase table with 4 regions
{code}
create 'myhbase', 'cf1','cf2', {SPLITS => ['a', 'b', 'c']}
put 'myhbase','a','cf1:col1','somedata'
put 'myhbase','b','cf1:col2','somedata'
put 'myhbase','c','cf2:col1','somedata'
{code}

One region has cf1.col1.  One region has column family 'cf1', but does not have 
'col1' under 'cf1'. One region has only column family 'cf2'. And last region is 
complete empty.

2. Prepare a csv file.
{code}
select * from dfs.tmp.`joinhbase.csv`;
+-------------------+
|      columns      |
+-------------------+
| ["1","somedata"]  |
| ["2","somedata"]  |
| ["3","somedata"]  |
{code}

Now run the following query on drill 1.11.0-SNAPSHOT:

{code}
select cast(H.row_key as varchar(10)) as keyCol, CONVERT_FROM(H.cf1.col1, 
'UTF8') as col1
from 
hbase.myhbase H JOIN dfs.tmp.`joinhbase.csv` C
ON CONVERT_FROM(H.cf1.col1, 'UTF8')= C.columns[1]
;
{code}

The correct query result show be:
{code}
+---------+-----------+
| keyCol  |   col1    |
+---------+-----------+
| a       | somedata  |
| a       | somedata  |
| a       | somedata  |
+---------+-----------+
{code}

Turn off broadcast join, then we will see SchemaChangeException, or incorrect 
result randomly. By 'randomly', it means in the same session, the same query 
would hit SchemaChangeException in one run, while gets incorrect result in a 
second run. 

{code}
alter session set `planner.enable_broadcast_join`=false;
{code}

{code}

select cast(H.row_key as varchar(10)) as keyCol, CONVERT_FROM(H.cf1.col1, 
'UTF8') as col1
. . . . . . . . . . . . . . . . . .> from
. . . . . . . . . . . . . . . . . .> hbase.myhbase H JOIN 
dfs.tmp.`joinhbase.csv` C
. . . . . . . . . . . . . . . . . .> ON CONVERT_FROM(H.cf1.col1, 'UTF8')= 
C.columns[1]
. . . . . . . . . . . . . . . . . .> ;
Error: SYSTEM ERROR: SchemaChangeException: Hash join does not support schema 
changes
{code}

{code}

+---------+-------+
| keyCol  | col1  |
+---------+-------+
+---------+-------+
No rows selected (0.302 seconds)
{code}


  was:
The following repo was provided by [~haozhu].

1. Create a Hbase table with 4 regions
{code}
create 'myhbase', 'cf1','cf2', {SPLITS => ['a', 'b', 'c']}
put 'myhbase','a','cf1:col1','somedata'
put 'myhbase','b','cf1:col2','somedata'
put 'myhbase','c','cf2:col1','somedata'
{code}

One region has cf1.col1.  One region has column family 'cf1', but does not have 
'col1' under 'cf1'. One region has only column family 'cf2'. And last region is 
complete empty.

2. Prepare a csv file.
{code}
select * from dfs.tmp.`joinhbase.csv`;
+-------------------+
|      columns      |
+-------------------+
| ["1","somedata"]  |
| ["2","somedata"]  |
| ["3","somedata"]  |
{code}

Now run the following query on drill 1.11.0-SNAPSHOT:

{code}
select cast(H.row_key as varchar(10)) as keyCol, CONVERT_FROM(H.cf1.col1, 
'UTF8') as col1
from 
hbase.myhbase H JOIN dfs.tmp.`joinhbase.csv` C
ON CONVERT_FROM(H.cf1.col1, 'UTF8')= C.columns[1]
;
{code}

The correct query result show be:
{code}
+---------+-----------+
| keyCol  |   col1    |
+---------+-----------+
| a       | somedata  |
| a       | somedata  |
| a       | somedata  |
+---------+-----------+
{code}

Turn off broadcast join, then we will see SchemaChangeException, or incorrect 
result randomly.

{code}
alter session set `planner.enable_broadcast_join`=false;
{code}

{code}

select cast(H.row_key as varchar(10)) as keyCol, CONVERT_FROM(H.cf1.col1, 
'UTF8') as col1
. . . . . . . . . . . . . . . . . .> from
. . . . . . . . . . . . . . . . . .> hbase.myhbase H JOIN 
dfs.tmp.`joinhbase.csv` C
. . . . . . . . . . . . . . . . . .> ON CONVERT_FROM(H.cf1.col1, 'UTF8')= 
C.columns[1]
. . . . . . . . . . . . . . . . . .> ;
Error: SYSTEM ERROR: SchemaChangeException: Hash join does not support schema 
changes
{code}

{code}

+---------+-------+
| keyCol  | col1  |
+---------+-------+
+---------+-------+
No rows selected (0.302 seconds)
{code}



> Empty batch returning from HBase may cause SchemChangeException or incorrect 
> query result
> -----------------------------------------------------------------------------------------
>
>                 Key: DRILL-5480
>                 URL: https://issues.apache.org/jira/browse/DRILL-5480
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>
> The following repo was provided by [~haozhu].
> 1. Create a Hbase table with 4 regions
> {code}
> create 'myhbase', 'cf1','cf2', {SPLITS => ['a', 'b', 'c']}
> put 'myhbase','a','cf1:col1','somedata'
> put 'myhbase','b','cf1:col2','somedata'
> put 'myhbase','c','cf2:col1','somedata'
> {code}
> One region has cf1.col1.  One region has column family 'cf1', but does not 
> have 'col1' under 'cf1'. One region has only column family 'cf2'. And last 
> region is complete empty.
> 2. Prepare a csv file.
> {code}
> select * from dfs.tmp.`joinhbase.csv`;
> +-------------------+
> |      columns      |
> +-------------------+
> | ["1","somedata"]  |
> | ["2","somedata"]  |
> | ["3","somedata"]  |
> {code}
> Now run the following query on drill 1.11.0-SNAPSHOT:
> {code}
> select cast(H.row_key as varchar(10)) as keyCol, CONVERT_FROM(H.cf1.col1, 
> 'UTF8') as col1
> from 
> hbase.myhbase H JOIN dfs.tmp.`joinhbase.csv` C
> ON CONVERT_FROM(H.cf1.col1, 'UTF8')= C.columns[1]
> ;
> {code}
> The correct query result show be:
> {code}
> +---------+-----------+
> | keyCol  |   col1    |
> +---------+-----------+
> | a       | somedata  |
> | a       | somedata  |
> | a       | somedata  |
> +---------+-----------+
> {code}
> Turn off broadcast join, then we will see SchemaChangeException, or incorrect 
> result randomly. By 'randomly', it means in the same session, the same query 
> would hit SchemaChangeException in one run, while gets incorrect result in a 
> second run. 
> {code}
> alter session set `planner.enable_broadcast_join`=false;
> {code}
> {code}
> select cast(H.row_key as varchar(10)) as keyCol, CONVERT_FROM(H.cf1.col1, 
> 'UTF8') as col1
> . . . . . . . . . . . . . . . . . .> from
> . . . . . . . . . . . . . . . . . .> hbase.myhbase H JOIN 
> dfs.tmp.`joinhbase.csv` C
> . . . . . . . . . . . . . . . . . .> ON CONVERT_FROM(H.cf1.col1, 'UTF8')= 
> C.columns[1]
> . . . . . . . . . . . . . . . . . .> ;
> Error: SYSTEM ERROR: SchemaChangeException: Hash join does not support schema 
> changes
> {code}
> {code}
> +---------+-------+
> | keyCol  | col1  |
> +---------+-------+
> +---------+-------+
> No rows selected (0.302 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Updated] (DRILL-5480) Empty batch returning from HBase may cause SchemChangeException or incorrect query result

Reply via email to