[jira] [Commented] (DRILL-4859) Missing function implementation: [repeated_count(INT-OPTIONAL)

2016-08-22 Thread jean-claude (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431428#comment-15431428
 ] 

jean-claude commented on DRILL-4859:


To better understand the ProjectorGen classes, I decided to simplify the 
problem. Instead of applying REPEATED_COUNT to fields that are sometimes a 
list or a map, I am applying it to the "label" field, which is of type 
VARCHAR. The ProjectorGen class generated in this case is simpler to 
understand. To make this work, however, I had to add REPEATED_COUNT 
functions for VARCHAR-REQUIRED and VARCHAR-OPTIONAL. It can now handle
{code}
select repeated_count(label) from dfs.`/Users/jccote/repeated_count.json`;
{code}


> Missing function implementation: [repeated_count(INT-OPTIONAL)
> --
>
> Key: DRILL-4859
> URL: https://issues.apache.org/jira/browse/DRILL-4859
> Project: Apache Drill
> Issue Type: Bug
> Reporter: jean-claude
> Priority: Minor
>
> Given a JSON file with many thousands of empty maps {} and a few lines like 
> this at the end of the file
> {code}
> {}
> {}
> {}
> {}
> {}
> {"listArray":[], "intArray": [1,2,3,4], "mapArray": [{"name": "foo"},{"name": 
> "foo"}], "label": "foo"}
> {"listArray":[[1,2],[3,4],[5,6]], "intArray": [1,2,3,4], "mapArray": 
> [{"name": "foo"}], "label": "foo"}
> {"listArray":[[1,2]], "intArray": [1,2,3,4], "mapArray": [], "label": "foo"}
> {"listArray":[[1,2],[3,4]], "intArray": [1,2,3,4], "mapArray": [{"name": 
> "foo"},{"name": "foo"},{"name": "foo"}], "label": "foo"}
> {"listArray":[[1,2],[3,4],[5,6]], "intArray": [1,2,3,4], "mapArray": 
> [{"name": "foo"},{"name": "foo"}], "label": "foo"}
> {}
> {"intArray": [1,2,3,4], "mapArray": [{"name": "foo"},{"name": "foo"},{"name": 
> "foo"},{"name": "foo"}], "label": "foo"}
> {"intArray": [1,2,3,4], "mapArray": [{"name": "foo"},{"name": "foo"}], 
> "label": "foo"}
> {code}
> If you perform a repeated_count on the intArray field, the JSON reader will 
> assume a plain INT column. You can do this with any type of array, even the 
> mapArray above, and you get the same error.
> {code}
> 0: jdbc:drill:zk=local> select repeated_count(intArray) from 
> dfs.`/Users/jccote/repeated_count.json`;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to 
> materialize incoming schema.  Errors:
>  
> Error in expression at index -1.  Error: Missing function implementation: 
> [repeated_count(INT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: f2e25b81-a53c-4fc6-9bf3-2f7b2fa68d60 on 192.168.1.3:31010]
>   (org.apache.drill.exec.exception.SchemaChangeException) Failure while 
> trying to materialize incoming schema.  Errors:
>  
> Error in expression at index -1.  Error: Missing function implementation: 
> [repeated_count(INT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema():424
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
> org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():422
> org.apache.hadoop.security.UserGroupInformation.doAs():1657
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1142
> java.util.concurrent.ThreadPoolExecutor$Worker.run():617
> java.lang.Thread.run():745 (state=,code=0)
> {code}
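
For anyone who wants to reproduce this, the input file described in the quoted issue can be regenerated with a short script. This is only a sketch: the default of 5000 empty maps is an assumption (the issue only says "many thousands"), the names `SAMPLE_ROWS`, `build_lines` and `write_file` are made up for illustration, and Python is used purely for convenience.

```python
import json

# Rows copied from the sample in the issue description, including the
# stray empty map that appears between the populated rows.
FOO = {"name": "foo"}
SAMPLE_ROWS = [
    {"listArray": [], "intArray": [1, 2, 3, 4], "mapArray": [FOO, FOO], "label": "foo"},
    {"listArray": [[1, 2], [3, 4], [5, 6]], "intArray": [1, 2, 3, 4], "mapArray": [FOO], "label": "foo"},
    {"listArray": [[1, 2]], "intArray": [1, 2, 3, 4], "mapArray": [], "label": "foo"},
    {"listArray": [[1, 2], [3, 4]], "intArray": [1, 2, 3, 4], "mapArray": [FOO, FOO, FOO], "label": "foo"},
    {"listArray": [[1, 2], [3, 4], [5, 6]], "intArray": [1, 2, 3, 4], "mapArray": [FOO, FOO], "label": "foo"},
    {},
    {"intArray": [1, 2, 3, 4], "mapArray": [FOO, FOO, FOO, FOO], "label": "foo"},
    {"intArray": [1, 2, 3, 4], "mapArray": [FOO, FOO], "label": "foo"},
]


def build_lines(empty_count=5000):
    """Return one JSON document per line: empty maps first, sample rows last."""
    lines = ["{}"] * empty_count  # assumed count, see the note above
    lines.extend(json.dumps(row) for row in SAMPLE_ROWS)
    return lines


def write_file(path, empty_count=5000):
    """Write the reproduction file to `path` (any location Drill can read)."""
    with open(path, "w") as handle:
        handle.write("\n".join(build_lines(empty_count)) + "\n")
```

Calling `write_file("repeated_count.json")` and pointing the dfs query at the resulting file should give an input of the same shape as the one in the issue.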



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4859) Missing function implementation: [repeated_count(INT-OPTIONAL)

2016-08-22 Thread jean-claude (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431360#comment-15431360
 ] 

jean-claude commented on DRILL-4859:


It looks like the JSON reader emits INT fields when it does not know anything 
about the selected field.

As such I have added repeated_count functions for INT, INT-OPTIONAL, BIGINT, 
and BIGINT-OPTIONAL. I then ran the query above with `exec.enable_union_type` 
enabled.

It now computes a count of zero for the first few thousand rows, until it 
reaches a map that has the given field. At that point I get this error:
{code}
| 0  |
| 0  |
| 0  |
Error: SYSTEM ERROR: DrillRuntimeException: Unable to cast union to LIST

Fragment 0:0

[Error Id: d03e7bb0-365e-40d6-aa79-4c826754c675 on 192.168.1.3:31010]

  (org.apache.drill.common.exceptions.DrillRuntimeException) Unable to cast 
union to LIST
org.apache.drill.exec.expr.fn.ExceptionFunction.throwException():51
org.apache.drill.exec.test.generated.ProjectorGen2.doEval():113
org.apache.drill.exec.test.generated.ProjectorGen2.projectRecords():62
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.doWork():199
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():93
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.physical.impl.BaseRootExec.next():104
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
org.apache.drill.exec.physical.impl.BaseRootExec.next():94
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1657
org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745 (state=,code=0)
{code}
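
One way to picture why the type changes partway through the file is a toy model of per-batch type inference. This is a rough sketch and not Drill's actual code: the type labels, the batch size, and the function names `infer_batch_type` and `schema_per_batch` are all assumptions made for illustration. The only behaviour taken from this thread is that a field the reader has not seen yet is treated as INT-OPTIONAL.

```python
def infer_batch_type(batch, field):
    """Guess a type label for `field` from one batch of decoded JSON rows."""
    values = [row[field] for row in batch if field in row]
    if not values:
        # Assumption: a field never seen in this batch is materialized as a
        # nullable INT, as the INT-OPTIONAL in the error message suggests.
        return "INT-OPTIONAL"
    if all(isinstance(value, list) for value in values):
        return "REPEATED"
    return "SCALAR"


def schema_per_batch(rows, field, batch_size):
    """Split `rows` into fixed-size batches and infer the type in each one."""
    batches = [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]
    return [infer_batch_type(batch, field) for batch in batches]


# Eight empty maps followed by one populated row, read four rows at a time.
rows = [{}] * 8 + [{"intArray": [1, 2, 3, 4]}]
print(schema_per_batch(rows, "intArray", 4))
# prints ['INT-OPTIONAL', 'INT-OPTIONAL', 'REPEATED']
```

In this model the function is first resolved against INT-OPTIONAL, and only a later batch reveals that the field is an array, which matches the point where the query above stops returning zeros and fails.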



[jira] [Commented] (DRILL-4859) Missing function implementation: [repeated_count(INT-OPTIONAL)

2016-08-22 Thread jean-claude (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431292#comment-15431292
 ] 

jean-claude commented on DRILL-4859:


You can also turn on the union feature for JSON
{code}
0: jdbc:drill:zk=local> ALTER SESSION SET `exec.enable_union_type` = true;
{code}

The result is the same.

Using the listArray or the mapArray above:
{code}
0: jdbc:drill:zk=local> select repeated_count(listArray) from 
dfs.`/Users/jccote/repeated_count.json`;
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize 
incoming schema.  Errors:
 
Error in expression at index -1.  Error: Missing function implementation: 
[repeated_count(INT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..

Fragment 0:0

[Error Id: d92a02d5-ede9-41db-99a1-cf042ebf6061 on 192.168.1.3:31010]

  (org.apache.drill.exec.exception.SchemaChangeException) Failure while trying 
to materialize incoming schema.  Errors:
 
Error in expression at index -1.  Error: Missing function implementation: 
[repeated_count(INT-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..

org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema():424
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
org.apache.drill.exec.record.AbstractRecordBatch.next():162
{code}

This produces the same error.



