[ https://issues.apache.org/jira/browse/DRILL-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14226090#comment-14226090 ]
Steven Phillips commented on DRILL-1736:
----------------------------------------
I am testing my fix using a text file:
100|[10, 1000]
101|[20, 1200]
and running this query:
with tmp as (select b, flatten(convert_from(c, 'json')) as f from (select
columns[0] as b, columns[1] as c from t))
select * from tmp where cast(tmp.f as int) = 10;
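For reference, the result this query is expected to return on the two-row test file can be reproduced outside Drill. The following is a minimal Python sketch, not Drill code; the function name `flatten_and_filter` and the in-memory `SAMPLE` string are assumptions made for illustration.

```python
import json

# The two rows of the test file, pipe-delimited as in the text file above.
SAMPLE = "100|[10, 1000]\n101|[20, 1200]\n"


def flatten_and_filter(text, wanted):
    """Emulate the query: split columns, parse JSON, flatten, then filter."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # columns[0] as b, columns[1] as c
        b, c = line.split("|", 1)
        # convert_from(c, 'json') followed by flatten: one output row per element
        for f in json.loads(c):
            # cast(tmp.f as int) = wanted
            if int(f) == wanted:
                rows.append((b, f))
    return rows


print(flatten_and_filter(SAMPLE, 10))
```

Under these assumptions the sketch prints `[('100', 10)]`, i.e. the single row with b = 100 and f = 10 that the Drill query should produce once the cast after flatten works.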
My patch fixes part of the problem, but there is another problem that I did not
address. With my changes, this is the new error:
org.apache.drill.exec.exception.SchemaChangeException: Failure while trying to
materialize incoming schema. Errors:
Error in expression at index -1. Error: Only ProjectRecordBatch could have
complex writer function. You are using complex writer function convert_fromJSON
in a non-project operation!. Full expression: --UNKNOWN EXPRESSION--.
Error in expression at index -1. Error: Missing function implementation:
[flatten(LATE-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--..
at org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer(FilterRecordBatch.java:197) ~[drill-java-exec-0.7.0-incubating-SNAPSHOT-rebuffed.jar:0.7.0-incubating-SNAPSHOT]
at org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema(FilterRecordBatch.java:117) ~[drill-java-exec-0.7.0-incubating-SNAPSHOT-rebuffed.jar:0.7.0-incubating-SNAPSHOT]
...
and this is the physical plan:
Drill Physical :
00-00    Screen: rowcount = 1.0, cumulative cost = {6.1 rows, 34.1 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 799
00-01      Project(b=[$0], f=[$2]): rowcount = 1.0, cumulative cost = {6.0 rows, 34.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 798
00-02        Flatten(flattenField=[$2]): rowcount = 1.0, cumulative cost = {5.0 rows, 26.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 797
00-03          Project(EXPR$0=[$0], EXPR$1=[$1], EXPR$2=[CONVERT_FROM($1, 'json')]): rowcount = 1.0, cumulative cost = {4.0 rows, 25.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 796
00-04            SelectionVectorRemover: rowcount = 1.0, cumulative cost = {3.0 rows, 13.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 795
00-05              Filter(condition=[=(CAST(FLATTEN(CONVERT_FROM($1, 'json'))):INTEGER NOT NULL, 10)]): rowcount = 1.0, cumulative cost = {2.0 rows, 12.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 794
00-06                Project(b=[ITEM($0, 0)], c=[ITEM($0, 1)]): rowcount = 1.0, cumulative cost = {1.0 rows, 8.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 793
00-07                  Scan(groupscan=[EasyGroupScan [selectionRoot=/tmp/t, numFiles=1, columns=[`columns`[0], `columns`[1]], files=[file:/tmp/t/file.tbl]]]): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 792
The problem appears to be that the convert_from expression ends up inside a
filter operator (00-05 in the plan above), but complex writer functions are
currently only allowed inside a project operator.
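The restriction can be illustrated with a toy check over the plan above. This is a hedged Python sketch, not Drill's actual validation logic: the `COMPLEX_WRITER_FUNCTIONS` set, the `PLAN` tuples, and the function `find_violations` are all hypothetical names, and the operator expressions are abbreviated from the plan.

```python
# Hypothetical model: function names treated as complex writer functions.
COMPLEX_WRITER_FUNCTIONS = {"CONVERT_FROM"}

# (operator id, operator type, expression text), abbreviated from the plan.
PLAN = [
    ("00-01", "Project", "b=[$0], f=[$2]"),
    ("00-02", "Flatten", "flattenField=[$2]"),
    ("00-03", "Project", "EXPR$2=[CONVERT_FROM($1, 'json')]"),
    ("00-05", "Filter",
     "=(CAST(FLATTEN(CONVERT_FROM($1, 'json'))):INTEGER NOT NULL, 10)"),
    ("00-06", "Project", "b=[ITEM($0, 0)], c=[ITEM($0, 1)]"),
]


def find_violations(plan):
    """Return operators other than Project that call a complex writer function."""
    violations = []
    for op_id, op_type, expr in plan:
        if op_type == "Project":
            continue  # complex writer functions are allowed here
        for fn in sorted(COMPLEX_WRITER_FUNCTIONS):
            if fn + "(" in expr:
                violations.append((op_id, op_type, fn))
    return violations


print(find_violations(PLAN))
```

In this toy model only the Filter at 00-05 is flagged, which matches the operator named in the stack trace (FilterRecordBatch); the Project at 00-03 uses the same function without complaint.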
> Cannot cast to other data types after using flatten + convert_from('json')
> --------------------------------------------------------------------------
>
> Key: DRILL-1736
> URL: https://issues.apache.org/jira/browse/DRILL-1736
> Project: Apache Drill
> Issue Type: Bug
> Components: Functions - Drill
> Affects Versions: 0.6.0, 0.7.0
> Reporter: Hao Zhu
> Assignee: Steven Phillips
> Fix For: 0.7.0
>
>
> 1. This SQL looks good.
> {code}
> select cast(row_key as int) as b, flatten(convert_from(mat.i.n , 'json')) as
> d from dfs.root.`table/mat` as mat;
> +------------+------------+
> | b | d |
> +------------+------------+
> | 100 | 10 |
> | 100 | 1000 |
> | 101 | 20 |
> | 101 | 1200 |
> +------------+------------+
> 4 rows selected (0.196 seconds)
> {code}
> 2. Cannot cast column 'd' to another data type.
> {code}
> with tmp as
> (select cast(row_key as int) as b, flatten(convert_from(mat.i.n , 'json')) as
> d from dfs.root.`table/mat` as mat)
> select * from tmp where cast(tmp.d as int)=10;
>
> Query failed: Failure while running fragment., Failure while trying to
> materialize incoming schema. Errors:
> Error in expression at index -1. Error: Missing function implementation:
> [castINT(MAP-REQUIRED)]. Full expression: --UNKNOWN EXPRESSION--.. [
> 744bffba-5ad9-40f4-a47e-25dc83565716 on n4a:31010 ]
> (org.apache.drill.exec.exception.SchemaChangeException) Failure while
> trying to materialize incoming schema. Errors:
> Error in expression at index -1. Error: Missing function implementation:
> [castINT(MAP-REQUIRED)]. Full expression: --UNKNOWN EXPRESSION--..
>
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():194
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():114
> org.apache.drill.exec.record.AbstractSingleRecordBatch.buildSchema():110
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.buildSchema():80
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.buildSchema():64
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.buildSchema():80
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.buildSchema():269
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.buildSchema():80
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.buildSchema():95
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():111
> org.apache.drill.exec.work.WorkManager$RunnableWrapper.run():249
> .......():0
> Error: exception while executing query: Failure while executing query.
> (state=,code=0)
> {code}
> 3. Still cannot change the data type after creating a view.
> {code}
> create or replace view testview as select cast(row_key as int) as b,
> flatten(convert_from(mat.i.n , 'json')) as d from dfs.root.`table/mat` as mat;
>
> describe testview;
> +-------------+------------+-------------+
> | COLUMN_NAME | DATA_TYPE | IS_NULLABLE |
> +-------------+------------+-------------+
> | b | INTEGER | NO |
> | d | ANY | NO |
> +-------------+------------+-------------+
> 2 rows selected (0.505 seconds)
> select * from testview where cast(d as int)=10;
>
> Query failed: Failure while running fragment., Failure while trying to
> materialize incoming schema. Errors:
> Error in expression at index -1. Error: Missing function implementation:
> [castINT(MAP-REQUIRED)]. Full expression: --UNKNOWN EXPRESSION--.. [
> e3a92573-3947-416e-b0ea-aa6dc4d47a20 on n4a:31010 ]
> (org.apache.drill.exec.exception.SchemaChangeException) Failure while
> trying to materialize incoming schema. Errors:
> Error in expression at index -1. Error: Missing function implementation:
> [castINT(MAP-REQUIRED)]. Full expression: --UNKNOWN EXPRESSION--..
>
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.generateSV2Filterer():194
> org.apache.drill.exec.physical.impl.filter.FilterRecordBatch.setupNewSchema():114
> org.apache.drill.exec.record.AbstractSingleRecordBatch.buildSchema():110
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.buildSchema():80
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.buildSchema():64
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.buildSchema():80
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.buildSchema():269
> org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.buildSchema():80
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.buildSchema():95
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():111
> org.apache.drill.exec.work.WorkManager$RunnableWrapper.run():249
> .......():0
> {code}