[jira] [Commented] (DRILL-5079) PreparedStatement dynamic parameters to avoid SQL Injection test
[ https://issues.apache.org/jira/browse/DRILL-5079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15927668#comment-15927668 ]

Tobias commented on DRILL-5079:
-------------------------------

We would also like this, for the reason above. Additionally, we would like the plan for the statement (or at least parts of the physical plan) to be cached, since planning is a significant part of total query time for short-running queries.

> PreparedStatement dynamic parameters to avoid SQL Injection test
> ----------------------------------------------------------------
>
>                 Key: DRILL-5079
>                 URL: https://issues.apache.org/jira/browse/DRILL-5079
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Client - JDBC
>    Affects Versions: 1.8.0
>            Reporter: Wahyu Sudrajat
>            Priority: Critical
>              Labels: security
>
> Capability to use PreparedStatement with dynamic parameters to prevent SQL injection.
> For example:
> select * from PEOPLE where FIRST_NAME = ? and LAST_NAME = ? limit 100
> As of now, Drill returns:
> Error Message: PreparedStatementCallback; uncategorized SQLException for SQL []; SQL state [null]; error code [0]; Failed to create prepared statement: PLAN ERROR: Cannot convert RexNode to equivalent Drill expression. RexNode Class: org.apache.calcite.rex.RexDynamicParam, RexNode Digest: ?0

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
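For context, what the ticket asks for is the standard JDBC pattern. A minimal sketch of the client side, assuming this works once Drill supports dynamic parameters (the class, method, and `FIRST_NAME` result-column handling are ours; a Drill JDBC URL would look roughly like `jdbc:drill:drillbit=host:31010`, host hypothetical):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class ParamQuery {
    // The parameterized SQL from the ticket; values are bound by the driver,
    // never concatenated into the string, so injection is impossible by design.
    public static final String SQL =
        "select * from PEOPLE where FIRST_NAME = ? and LAST_NAME = ? limit 100";

    /** Runs the query with the given parameters over any JDBC connection. */
    public static List<String> findPeople(Connection conn, String first, String last)
            throws Exception {
        List<String> names = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setString(1, first);  // bound parameter, not string splicing
            ps.setString(2, last);   // a value like O'Brien needs no escaping
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString("FIRST_NAME"));
                }
            }
        }
        return names;
    }
}
```

Note the SQL text is a constant: this is also what makes the requested plan caching plausible, since the statement shape never changes across parameter values.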
[jira] [Created] (DRILL-5358) Error if Parquet file changes during query
Tobias created DRILL-5358:
------------------------------

             Summary: Error if Parquet file changes during query
                 Key: DRILL-5358
                 URL: https://issues.apache.org/jira/browse/DRILL-5358
             Project: Apache Drill
          Issue Type: Bug
          Components: Metadata, Storage - Parquet
    Affects Versions: 1.9.0
            Reporter: Tobias

We have a scenario where we generate our own Parquet files every X seconds. These files are laid out in a date-based structure, and only the file for today gets updated. The process is as follows:
1. Generate the Parquet file in a temp directory.
2. When generation is finished, mv the file into a Drill workspace (data/2017/03/10/data.parquet, ..).
3. Restart the process.

We have noticed that if the file is moved in while a query is running, Drill throws an error saying the Parquet magic number is incorrect. This is due to the file length being cached and reused, so what seems to happen is:
1. Drill plans the query.
2. The file gets changed under Drill's feet.
3. Drill executes the query and tries to read an incorrect offset of the changed file.

Is there any way to fix or avoid this scenario?

Another side effect of constantly generating a new file is that the metadata cache gets discarded for the whole workspace despite only one file changing. Is there a way to avoid that?

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
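The write-then-mv step described above can be made explicitly atomic with java.nio.file, so readers never observe a half-written file. This is only a sketch of the publishing side (class and method names are ours); it does not fix the planning/execution race inside Drill, since the query can still straddle the rename, but it guarantees the file is always either the old or the new complete version:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class AtomicPublish {
    /**
     * Writes the data to a temp file in the destination's own directory, then
     * atomically renames it over the destination. The temp file is created
     * next to the destination because ATOMIC_MOVE only works within the same
     * filesystem; on POSIX filesystems the rename replaces an existing target.
     */
    public static void publish(Path dest, byte[] data) throws Exception {
        Path tmp = Files.createTempFile(dest.getParent(), ".tmp-", ".parquet");
        Files.write(tmp, data);
        Files.move(tmp, dest, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

A plain `mv` across filesystems (e.g. temp dir on another volume) silently degrades to copy-then-delete, which is exactly the partial-file window this avoids.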
[jira] [Commented] (DRILL-4517) Reading empty Parquet file fails with java.lang.IllegalArgumentException
[ https://issues.apache.org/jira/browse/DRILL-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15199442#comment-15199442 ]

Tobias commented on DRILL-4517:
-------------------------------

Is this fixed on head (1.7) as mentioned in DRILL-2223? If so, we can build our own version.

> Reading empty Parquet file fails with java.lang.IllegalArgumentException
> -------------------------------------------------------------------------
>
>                 Key: DRILL-4517
>                 URL: https://issues.apache.org/jira/browse/DRILL-4517
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Server
>            Reporter: Tobias
>
> When querying a Parquet file that has a schema but no rows, the Drill server will fail with the error below. This looks similar to DRILL-3557.
> {noformat}
> ParquetMetaData{FileMetaData{schema: message TRANSACTION_REPORT {
>   required int64 MEMBER_ACCOUNT_ID;
>   required int64 TIMESTAMP_IN_HOUR;
>   optional int64 APPLICATION_ID;
> }
> , metadata: {}}, blocks: []}
> {noformat}
> {noformat}
> Caused by: java.lang.IllegalArgumentException: MinorFragmentId 0 has no read entries assigned
>     at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92) ~[guava-14.0.1.jar:na]
>     at org.apache.drill.exec.store.parquet.ParquetGroupScan.getSpecificScan(ParquetGroupScan.java:707) ~[drill-java-exec-1.5.0.jar:1.5.0]
>     at org.apache.drill.exec.store.parquet.ParquetGroupScan.getSpecificScan(ParquetGroupScan.java:105) ~[drill-java-exec-1.5.0.jar:1.5.0]
>     at org.apache.drill.exec.planner.fragment.Materializer.visitGroupScan(Materializer.java:68) ~[drill-java-exec-1.5.0.jar:1.5.0]
>     at org.apache.drill.exec.planner.fragment.Materializer.visitGroupScan(Materializer.java:35) ~[drill-java-exec-1.5.0.jar:1.5.0]
>     at org.apache.drill.exec.physical.base.AbstractGroupScan.accept(AbstractGroupScan.java:60) ~[drill-java-exec-1.5.0.jar:1.5.0]
>     at org.apache.drill.exec.planner.fragment.Materializer.visitOp(Materializer.java:102) ~[drill-java-exec-1.5.0.jar:1.5.0]
>     at org.apache.drill.exec.planner.fragment.Materializer.visitOp(Materializer.java:35) ~[drill-java-exec-1.5.0.jar:1.5.0]
>     at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitProject(AbstractPhysicalVisitor.java:77) ~[drill-java-exec-1.5.0.jar:1.5.0]
>     at org.apache.drill.exec.physical.config.Project.accept(Project.java:51) ~[drill-java-exec-1.5.0.jar:1.5.0]
>     at org.apache.drill.exec.planner.fragment.Materializer.visitStore(Materializer.java:82) ~[drill-java-exec-1.5.0.jar:1.5.0]
>     at org.apache.drill.exec.planner.fragment.Materializer.visitStore(Materializer.java:35) ~[drill-java-exec-1.5.0.jar:1.5.0]
>     at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitScreen(AbstractPhysicalVisitor.java:195) ~[drill-java-exec-1.5.0.jar:1.5.0]
>     at org.apache.drill.exec.physical.config.Screen.accept(Screen.java:97) ~[drill-java-exec-1.5.0.jar:1.5.0]
>     at org.apache.drill.exec.planner.fragment.SimpleParallelizer.generateWorkUnit(SimpleParallelizer.java:355) ~[drill-java-exec-1.5.0.jar:1.5.0]
>     at org.apache.drill.exec.planner.fragment.SimpleParallelizer.getFragments(SimpleParallelizer.java:134) ~[drill-java-exec-1.5.0.jar:1.5.0]
>     at org.apache.drill.exec.work.foreman.Foreman.getQueryWorkUnit(Foreman.java:518) [drill-java-exec-1.5.0.jar:1.5.0]
>     at org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:405) [drill-java-exec-1.5.0.jar:1.5.0]
>     at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:926) [drill-java-exec-1.5.0.jar:1.5.0]
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Created] (DRILL-4517) Reading empty Parquet file fails with java.lang.IllegalArgumentException
Tobias created DRILL-4517:
------------------------------

             Summary: Reading empty Parquet file fails with java.lang.IllegalArgumentException
                 Key: DRILL-4517
                 URL: https://issues.apache.org/jira/browse/DRILL-4517
             Project: Apache Drill
          Issue Type: Bug
          Components: Server
            Reporter: Tobias

When querying a Parquet file that has a schema but no rows, the Drill server will fail with the error below. This looks similar to DRILL-3557.

{noformat}
ParquetMetaData{FileMetaData{schema: message TRANSACTION_REPORT {
  required int64 MEMBER_ACCOUNT_ID;
  required int64 TIMESTAMP_IN_HOUR;
  optional int64 APPLICATION_ID;
}
, metadata: {}}, blocks: []}
{noformat}

{noformat}
Caused by: java.lang.IllegalArgumentException: MinorFragmentId 0 has no read entries assigned
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92) ~[guava-14.0.1.jar:na]
    at org.apache.drill.exec.store.parquet.ParquetGroupScan.getSpecificScan(ParquetGroupScan.java:707) ~[drill-java-exec-1.5.0.jar:1.5.0]
    at org.apache.drill.exec.store.parquet.ParquetGroupScan.getSpecificScan(ParquetGroupScan.java:105) ~[drill-java-exec-1.5.0.jar:1.5.0]
    at org.apache.drill.exec.planner.fragment.Materializer.visitGroupScan(Materializer.java:68) ~[drill-java-exec-1.5.0.jar:1.5.0]
    at org.apache.drill.exec.planner.fragment.Materializer.visitGroupScan(Materializer.java:35) ~[drill-java-exec-1.5.0.jar:1.5.0]
    at org.apache.drill.exec.physical.base.AbstractGroupScan.accept(AbstractGroupScan.java:60) ~[drill-java-exec-1.5.0.jar:1.5.0]
    at org.apache.drill.exec.planner.fragment.Materializer.visitOp(Materializer.java:102) ~[drill-java-exec-1.5.0.jar:1.5.0]
    at org.apache.drill.exec.planner.fragment.Materializer.visitOp(Materializer.java:35) ~[drill-java-exec-1.5.0.jar:1.5.0]
    at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitProject(AbstractPhysicalVisitor.java:77) ~[drill-java-exec-1.5.0.jar:1.5.0]
    at org.apache.drill.exec.physical.config.Project.accept(Project.java:51) ~[drill-java-exec-1.5.0.jar:1.5.0]
    at org.apache.drill.exec.planner.fragment.Materializer.visitStore(Materializer.java:82) ~[drill-java-exec-1.5.0.jar:1.5.0]
    at org.apache.drill.exec.planner.fragment.Materializer.visitStore(Materializer.java:35) ~[drill-java-exec-1.5.0.jar:1.5.0]
    at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitScreen(AbstractPhysicalVisitor.java:195) ~[drill-java-exec-1.5.0.jar:1.5.0]
    at org.apache.drill.exec.physical.config.Screen.accept(Screen.java:97) ~[drill-java-exec-1.5.0.jar:1.5.0]
    at org.apache.drill.exec.planner.fragment.SimpleParallelizer.generateWorkUnit(SimpleParallelizer.java:355) ~[drill-java-exec-1.5.0.jar:1.5.0]
    at org.apache.drill.exec.planner.fragment.SimpleParallelizer.getFragments(SimpleParallelizer.java:134) ~[drill-java-exec-1.5.0.jar:1.5.0]
    at org.apache.drill.exec.work.foreman.Foreman.getQueryWorkUnit(Foreman.java:518) [drill-java-exec-1.5.0.jar:1.5.0]
    at org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:405) [drill-java-exec-1.5.0.jar:1.5.0]
    at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:926) [drill-java-exec-1.5.0.jar:1.5.0]
{noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
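Until the server tolerates schema-only files, one possible client-side workaround is to sweep suspect zero-row files out of the workspace before querying. Detecting "no rows" reliably requires parsing the Parquet footer; the sketch below (names and threshold are ours, purely a heuristic) instead flags files smaller than a size cutoff, since a file with no row groups carries only the 4-byte `PAR1` header, a small footer, and the 4-byte `PAR1` trailer:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class EmptyParquetFilter {
    /**
     * Returns Parquet files in dir smaller than minBytes, i.e. candidates
     * that likely contain a schema but no row groups. The threshold is a
     * heuristic: tune it to just below the smallest real file your schema
     * produces, so only footer-only files are caught.
     */
    public static List<Path> findSuspectFiles(Path dir, long minBytes)
            throws IOException {
        List<Path> suspects = new ArrayList<>();
        try (DirectoryStream<Path> files =
                 Files.newDirectoryStream(dir, "*.parquet")) {
            for (Path p : files) {
                if (Files.size(p) < minBytes) {
                    suspects.add(p);
                }
            }
        }
        return suspects;
    }
}
```

The returned paths could then be moved to a quarantine directory outside the workspace; a proper fix would read the footer's row-group count instead of the file size.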
[jira] [Created] (DRILL-4505) Can't group by or sort across files with different schema
Tobias created DRILL-4505:
------------------------------

             Summary: Can't group by or sort across files with different schema
                 Key: DRILL-4505
                 URL: https://issues.apache.org/jira/browse/DRILL-4505
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Parquet
    Affects Versions: 1.5.0
         Environment: Java 1.8
            Reporter: Tobias

We are currently trying out the support for querying across Parquet files with different schemas. Simple selects work well, but when we want to do a sort or group by, Drill returns:

"UNSUPPORTED_OPERATION ERROR: Sort doesn't currently support sorts with changing schemas Fragment 0:0 [Error Id: ff490670-64c1-4fb8-990e-a02aa44ac010 on zookeeper-1:31010]"

This happens despite the new columns not even being included in the query. The expected result would be to treat the columns that do not exist in certain files as either null or a default value and allow them to be grouped and sorted.

Example:

SELECT APPLICATION_ID, dir0 AS year_
FROM dfs.`/PRO/UTC/1`
WHERE dir2 >= '2016-01-01' AND dir2 < '2016-04-02'

works with a changing schema, but

SELECT max(APPLICATION_ID), dir0 AS year_
FROM dfs.`/PRO/UTC/1`
WHERE dir2 >= '2016-01-01' AND dir2 < '2016-04-02'
GROUP BY dir0

does not. For us this hampers any possibility of having an evolving schema with moderately complex queries.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
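A workaround sometimes suggested for changing schemas is to normalize the projection in a subquery so the aggregation sees a single type for the evolving column. A sketch of how the failing query above might be rewritten (untested against this dataset; the BIGINT cast target is an assumption based on the int64 column type, and it may not help in every Drill version):

{noformat}
SELECT max(app_id), year_
FROM (
  SELECT CAST(APPLICATION_ID AS BIGINT) AS app_id, dir0 AS year_
  FROM dfs.`/PRO/UTC/1`
  WHERE dir2 >= '2016-01-01' AND dir2 < '2016-04-02'
) t
GROUP BY year_
{noformat}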