[jira] [Commented] (DRILL-4859) Missing function implementation: [repeated_count(INT-OPTIONAL)
[ https://issues.apache.org/jira/browse/DRILL-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431428#comment-15431428 ]

jean-claude commented on DRILL-4859:

To better understand the ProjectorGen classes, I decided to simplify the problem. Instead of applying REPEATED_COUNT to fields that are sometimes a list or a map, I am applying it to the "label" field, which is of type VARCHAR. In this case the generated ProjectorGen class is simpler to understand. However, to make this work I needed to add REPEATED_COUNT functions for VARCHAR-REQUIRED and VARCHAR-OPTIONAL. It can now handle
{code}
select repeated_count(label) from dfs.`/Users/jccote/repeated_count.json`;
{code}

> Missing function implementation: [repeated_count(INT-OPTIONAL)
> --
>
> Key: DRILL-4859
> URL: https://issues.apache.org/jira/browse/DRILL-4859
> Project: Apache Drill
> Issue Type: Bug
> Reporter: jean-claude
> Priority: Minor
>
> Given a JSON file with many thousands of empty maps {} and a few lines like
> this at the end of the file
> {code}
> {}
> {}
> {}
> {}
> {}
> {"listArray":[], "intArray": [1,2,3,4], "mapArray": [{"name": "foo"},{"name": "foo"}], "label": "foo"}
> {"listArray":[[1,2],[3,4],[5,6]], "intArray": [1,2,3,4], "mapArray": [{"name": "foo"}], "label": "foo"}
> {"listArray":[[1,2]], "intArray": [1,2,3,4], "mapArray": [], "label": "foo"}
> {"listArray":[[1,2],[3,4]], "intArray": [1,2,3,4], "mapArray": [{"name": "foo"},{"name": "foo"},{"name": "foo"}], "label": "foo"}
> {"listArray":[[1,2],[3,4],[5,6]], "intArray": [1,2,3,4], "mapArray": [{"name": "foo"},{"name": "foo"}], "label": "foo"}
> {}
> {"intArray": [1,2,3,4], "mapArray": [{"name": "foo"},{"name": "foo"},{"name": "foo"},{"name": "foo"}], "label": "foo"}
> {"intArray": [1,2,3,4], "mapArray": [{"name": "foo"},{"name": "foo"}], "label": "foo"}
> {code}
> If you perform a repeated_count on the intArray field, the JSON reader will
> assume a plain INT column. You can do this with any type of array, even the
> mapArray above; you get the same error.
> {code}
> 0: jdbc:drill:zk=local> select repeated_count(intArray) from dfs.`/Users/jccote/repeated_count.json`;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema. Errors:
>
> Error in expression at index -1. Error: Missing function implementation: [repeated_count(INT-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: f2e25b81-a53c-4fc6-9bf3-2f7b2fa68d60 on 192.168.1.3:31010]
> (org.apache.drill.exec.exception.SchemaChangeException) Failure while trying to materialize incoming schema. Errors:
>
> Error in expression at index -1. Error: Missing function implementation: [repeated_count(INT-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--..
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema():424
>     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
>     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
>     org.apache.drill.exec.record.AbstractRecordBatch.next():162
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
>     org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
>     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
>     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1657
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>     java.lang.Thread.run():745 (state=,code=0)
> {code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
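
For anyone who wants to reproduce the report, the input file described in the issue can be generated with a short script. This is only a sketch: the function name `write_repro_file`, the output file name, and the default of 5000 empty maps are assumptions for illustration (the report says only "many thousands"). The non-empty records are copied from the issue description.

```python
import json

# Records copied from the issue description: the lines that follow the
# long run of empty maps at the end of the file.
FOO = {"name": "foo"}
TAIL_RECORDS = [
    {"listArray": [], "intArray": [1, 2, 3, 4], "mapArray": [FOO, FOO], "label": "foo"},
    {"listArray": [[1, 2], [3, 4], [5, 6]], "intArray": [1, 2, 3, 4], "mapArray": [FOO], "label": "foo"},
    {"listArray": [[1, 2]], "intArray": [1, 2, 3, 4], "mapArray": [], "label": "foo"},
    {"listArray": [[1, 2], [3, 4]], "intArray": [1, 2, 3, 4], "mapArray": [FOO, FOO, FOO], "label": "foo"},
    {"listArray": [[1, 2], [3, 4], [5, 6]], "intArray": [1, 2, 3, 4], "mapArray": [FOO, FOO], "label": "foo"},
    {},
    {"intArray": [1, 2, 3, 4], "mapArray": [FOO, FOO, FOO, FOO], "label": "foo"},
    {"intArray": [1, 2, 3, 4], "mapArray": [FOO, FOO], "label": "foo"},
]


def write_repro_file(path, empty_count=5000):
    """Write `empty_count` empty maps, then the tail records, one JSON object per line.

    `empty_count` is an assumed value; the issue does not give an exact number.
    Returns the total number of lines written.
    """
    with open(path, "w", encoding="utf-8") as out:
        for _ in range(empty_count):
            out.write("{}\n")
        for record in TAIL_RECORDS:
            out.write(json.dumps(record) + "\n")
    return empty_count + len(TAIL_RECORDS)


if __name__ == "__main__":
    total = write_repro_file("repeated_count.json")
    print(f"wrote {total} lines")
```

The resulting file can then be queried with the select statements quoted above, adjusting the dfs path to wherever the file was written.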
[jira] [Commented] (DRILL-4859) Missing function implementation: [repeated_count(INT-OPTIONAL)
[ https://issues.apache.org/jira/browse/DRILL-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431360#comment-15431360 ]

jean-claude commented on DRILL-4859:

It looks like the JSON reader emits INT fields when it does not know anything about the selected field. As such, I have added repeated_count functions for INT, INT-OPTIONAL, BIGINT, and BIGINT-OPTIONAL. I then ran the query above with enable_union_type enabled. It now computes a count of zero for the first few thousand rows, until it reaches a map that has the given field. At that point I get this error:
{code}
| 0 |
| 0 |
| 0 |
Error: SYSTEM ERROR: DrillRuntimeException: Unable to cast union to LIST
Fragment 0:0
[Error Id: d03e7bb0-365e-40d6-aa79-4c826754c675 on 192.168.1.3:31010]
(org.apache.drill.common.exceptions.DrillRuntimeException) Unable to cast union to LIST
    org.apache.drill.exec.expr.fn.ExceptionFunction.throwException():51
    org.apache.drill.exec.test.generated.ProjectorGen2.doEval():113
    org.apache.drill.exec.test.generated.ProjectorGen2.projectRecords():62
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():199
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
    org.apache.drill.exec.physical.impl.BaseRootExec.next():104
    org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
    org.apache.drill.exec.physical.impl.BaseRootExec.next():94
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
    org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
    java.security.AccessController.doPrivileged():-2
    javax.security.auth.Subject.doAs():422
    org.apache.hadoop.security.UserGroupInformation.doAs():1657
    org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
    org.apache.drill.common.SelfCleaningRunnable.run():38
    java.util.concurrent.ThreadPoolExecutor.runWorker():1142
    java.util.concurrent.ThreadPoolExecutor$Worker.run():617
    java.lang.Thread.run():745 (state=,code=0)
{code}
[jira] [Commented] (DRILL-4859) Missing function implementation: [repeated_count(INT-OPTIONAL)
[ https://issues.apache.org/jira/browse/DRILL-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431292#comment-15431292 ]

jean-claude commented on DRILL-4859:

You can also turn on the union type feature for JSON:
{code}
0: jdbc:drill:zk=local> ALTER SESSION SET `exec.enable_union_type` = true;
{code}
The result is the same. Using the listArray or the mapArray from the file above:
{code}
0: jdbc:drill:zk=local> select repeated_count(listArray) from dfs.`/Users/jccote/repeated_count.json`;
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema. Errors:
Error in expression at index -1. Error: Missing function implementation: [repeated_count(INT-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--..
Fragment 0:0
[Error Id: d92a02d5-ede9-41db-99a1-cf042ebf6061 on 192.168.1.3:31010]
(org.apache.drill.exec.exception.SchemaChangeException) Failure while trying to materialize incoming schema. Errors:
Error in expression at index -1. Error: Missing function implementation: [repeated_count(INT-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--..
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema():424
    org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
    org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
    org.apache.drill.exec.record.AbstractRecordBatch.next():162
{code}
This is the same error.