[ https://issues.apache.org/jira/browse/DRILL-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jinfeng Ni resolved DRILL-5480.
-------------------------------
    Resolution: Fixed
    Fix Version/s: 1.12.0

Marking it resolved, as the issue was fixed as part of DRILL-5466. Commit id: fde0a1df1734e0742b49aabdd28b02202ee2b044

> Empty batch returned from HBase may cause SchemaChangeException even when
> data does not have a different schema
> -------------------------------------------------------------------------
>
>             Key: DRILL-5480
>             URL: https://issues.apache.org/jira/browse/DRILL-5480
>         Project: Apache Drill
>      Issue Type: Bug
>        Reporter: Jinfeng Ni
>        Assignee: Jinfeng Ni
>         Fix For: 1.12.0
>
>
> The following repro was provided by [~haozhu].
> 1. Create an HBase table with 4 regions:
> {code}
> create 'myhbase', 'cf1','cf2', {SPLITS => ['a', 'b', 'c']}
> put 'myhbase','a','cf1:col1','somedata'
> put 'myhbase','b','cf1:col2','somedata'
> put 'myhbase','c','cf2:col1','somedata'
> {code}
> One region has cf1.col1. One region has column family 'cf1' but does not
> have 'col1' under 'cf1'. One region has only column family 'cf2'. And the
> last region is completely empty.
> 2. Prepare a csv file:
> {code}
> select * from dfs.tmp.`joinhbase.csv`;
> +-------------------+
> |      columns      |
> +-------------------+
> | ["1","somedata"]  |
> | ["2","somedata"]  |
> | ["3","somedata"]  |
> +-------------------+
> {code}
> Now run the following query on Drill 1.11.0-SNAPSHOT:
> {code}
> select cast(H.row_key as varchar(10)) as keyCol, CONVERT_FROM(H.cf1.col1, 'UTF8') as col1
> from hbase.myhbase H JOIN dfs.tmp.`joinhbase.csv` C
> ON CONVERT_FROM(H.cf1.col1, 'UTF8') = C.columns[1];
> {code}
> The correct query result should be:
> {code}
> +---------+-----------+
> | keyCol  |   col1    |
> +---------+-----------+
> | a       | somedata  |
> | a       | somedata  |
> | a       | somedata  |
> +---------+-----------+
> {code}
> Turn off broadcast join, and we will see a SchemaChangeException or an
> incorrect result, at random.
> By 'randomly', I mean that in the same session, the same query may hit a
> SchemaChangeException in one run, while returning an incorrect result in a
> second run.
> {code}
> alter session set `planner.enable_broadcast_join` = false;
> {code}
> {code}
> select cast(H.row_key as varchar(10)) as keyCol, CONVERT_FROM(H.cf1.col1, 'UTF8') as col1
> from hbase.myhbase H JOIN dfs.tmp.`joinhbase.csv` C
> ON CONVERT_FROM(H.cf1.col1, 'UTF8') = C.columns[1];
> Error: SYSTEM ERROR: SchemaChangeException: Hash join does not support schema changes
> {code}
> {code}
> +---------+-------+
> | keyCol  | col1  |
> +---------+-------+
> +---------+-------+
> No rows selected (0.302 seconds)
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
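The failure mode described above — an empty batch that advertises a different schema than the data batches — can be illustrated with a minimal sketch. This is hedged, illustrative Python, not Drill code: the class and method names are invented for the illustration, and it only models the idea that an operator which compares each incoming batch's schema against the first one it saw will reject an empty batch with an empty schema, even though no real data ever changed shape.

```python
# Illustrative sketch (NOT Drill internals): a join-like operator that,
# as described in this issue, rejects any batch whose schema differs from
# the first schema it received. An empty batch from a region lacking
# cf1.col1 still contributes a (different, empty) schema.

class SchemaChangeException(Exception):
    pass

class HashJoinLike:
    """Hypothetical stand-in for a hash-join build side."""

    def __init__(self):
        self.schema = None   # schema of the first batch received
        self.rows = []

    def consume(self, batch_schema, batch_rows):
        if self.schema is None:
            self.schema = batch_schema
        elif batch_schema != self.schema:
            # Fires even when the "change" is just an empty batch
            # reporting no columns.
            raise SchemaChangeException(
                "Hash join does not support schema changes")
        self.rows.extend(batch_rows)

# Region scan order is non-deterministic, so which batch arrives first
# varies between runs: one ordering raises, another silently proceeds
# with the wrong (empty) first schema -- matching the "random" behavior
# reported above.
join = HashJoinLike()
join.consume(("row_key", "cf1.col1"), [("a", "somedata")])
try:
    join.consume((), [])   # empty batch, empty schema
except SchemaChangeException as e:
    print(e)               # -> Hash join does not support schema changes
```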