[jira] [Updated] (DRILL-3150) Error when filtering non-existent field with a string
[ https://issues.apache.org/jira/browse/DRILL-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Gilmore updated DRILL-3150: Attachment: DRILL-3150.1.patch.txt > Error when filtering non-existent field with a string > - > > Key: DRILL-3150 > URL: https://issues.apache.org/jira/browse/DRILL-3150 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.0.0 >Reporter: Adam Gilmore >Assignee: Adam Gilmore >Priority: Critical > Fix For: 1.1.0 > > Attachments: DRILL-3150.1.patch.txt > > > The following query throws an exception: > {code} > select count(*) from cp.`employee.json` where `blah` = 'test' > {code} > "blah" does not exist as a field in the JSON. The expected behaviour would > be to filter out all rows as that field is not present (thus cannot equal the > string 'test'). > Instead, the following exception occurs: > {code} > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: test > Fragment 0:0 > [Error Id: 5d6c9a82-8f87-41b2-a496-67b360302b76 on > ip-10-1-50-208.ec2.internal:31010] > {code} > Apart from the fact the real error message is hidden, the issue is that we're > trying to cast the varchar to int ('test' to an int). This seems to be > because the projection out of the scan when a field is not found becomes > INT:OPTIONAL. > The filter should not fail on this - if the varchar fails to convert to an > int, the filter should just simply not allow any records through. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3150) Error when filtering non-existent field with a string
[ https://issues.apache.org/jira/browse/DRILL-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Gilmore updated DRILL-3150: Attachment: (was: DRILL-3150.1.patch.txt) > Error when filtering non-existent field with a string > - > > Key: DRILL-3150 > URL: https://issues.apache.org/jira/browse/DRILL-3150 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.0.0 >Reporter: Adam Gilmore >Assignee: Adam Gilmore >Priority: Critical > Fix For: 1.1.0 > > > The following query throws an exception: > {code} > select count(*) from cp.`employee.json` where `blah` = 'test' > {code} > "blah" does not exist as a field in the JSON. The expected behaviour would > be to filter out all rows as that field is not present (thus cannot equal the > string 'test'). > Instead, the following exception occurs: > {code} > org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: test > Fragment 0:0 > [Error Id: 5d6c9a82-8f87-41b2-a496-67b360302b76 on > ip-10-1-50-208.ec2.internal:31010] > {code} > Apart from the fact the real error message is hidden, the issue is that we're > trying to cast the varchar to int ('test' to an int). This seems to be > because the projection out of the scan when a field is not found becomes > INT:OPTIONAL. > The filter should not fail on this - if the varchar fails to convert to an > int, the filter should just simply not allow any records through. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
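The semantics DRILL-3150 asks for can be sketched in a few lines: a varchar that fails to cast to int should compare as a non-match (SQL-null-like), not raise an error. This is an illustrative sketch with hypothetical names, not Drill's actual filter code.

```java
// Sketch (hypothetical names, not Drill's actual code): a lenient
// varchar-to-int comparison for a filter. A value that cannot be parsed
// as an int matches nothing instead of raising an error.
public class LenientFilter {
    // Returns null when the string is not a valid int (SQL-null semantics).
    static Integer tryCastToInt(String s) {
        try {
            return Integer.valueOf(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // A "WHERE intColumn = varcharLiteral" predicate: an unparseable
    // literal compares false, so no rows pass and no error is thrown.
    static boolean passesFilter(Integer column, String literal) {
        Integer v = tryCastToInt(literal);
        return v != null && v.equals(column);
    }

    public static void main(String[] args) {
        System.out.println(passesFilter(1, "test")); // false: filtered out, no error
        System.out.println(passesFilter(42, "42"));  // true
    }
}
```

Under these semantics, `where blah = 'test'` on a missing (INT:OPTIONAL) column simply returns zero rows.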
[jira] [Updated] (DRILL-3216) Fix existing(+) INFORMATION_SCHEMA.COLUMNS columns
[ https://issues.apache.org/jira/browse/DRILL-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Barclay (Drill) updated DRILL-3216: -- Description: [Editing in progress] Change logical null from {{-1}} to actual {{NULL}}: - Change column {{CHARACTER_MAXIMUM_LENGTH}}. - Change column {{NUMERIC_PRECISION}}. - Change column {{NUMERIC_PRECISION_RADIX}}. - Change column {{NUMERIC_SCALE}}. Change column {{ORDINAL_POSITION}} from zero-based to one-based. Change column {{DATA_TYPE}} from short names (e.g., "CHAR") to specified names (e.g., "CHARACTER"). - "CHAR" -> "CHARACTER" - "VARCHAR" -> "CHARACTER VARYING" - "VARBINARY" -> "BINARY VARYING" - "... ARRAY" -> "ARRAY" - "(...) MAP" -> "MAP" - "STRUCT (...)" -> "STRUCT" Fix data type names "INTERVAL_DAY_TIME" and "INTERVAL_YEAR_MONTH" to "INTERVAL": - Change column {{DATA_TYPE}} to list "INTERVAL" for interval types. - Add column {{INTERVAL_TYPE}}. Move {{CHAR}} length from {{NUMERIC_PRECISION}} to {{CHARACTER_MAXIMUM_LENGTH}} (same as {{VARCHAR}} length): - Change column {{NUMERIC_PRECISION}} from length to logical null for CHAR. - Change column {{CHARACTER_MAXIMUM_LENGTH}} from logical null to length for CHAR. Move {{BINARY}} and {{VARBINARY}} length from {{NUMERIC_PRECISION}} to {{CHARACTER_MAXIMUM_LENGTH}} (same as CHAR and VARCHAR length): - Change column {{NUMERIC_PRECISION}} from length to logical null for BINARY and VARBINARY. - Change column {{CHARACTER_MAXIMUM_LENGTH}} from logical null to length for BINARY and VARBINARY. To correct ordinal position of some existing columns: - Add column {{COLUMN_DEFAULT}}. - Add column {{CHARACTER_OCTET_LENGTH}}. - Reorder column {{NUMERIC_PRECISION}}. Move date/time and interval precisions from {{NUMERIC_PRECISION}} to {{DATETIME_PRECISION}} and {{INTERVAL_PRECISION}}: - Change column {{NUMERIC_PRECISION}} to logically null for date/time and interval types. - Add column {{DATETIME_PRECISION}}. - Add column {{INTERVAL_PRECISION}}. 
Implement {{NUMERIC_PRECISION_RADIX}}: - Change column {{NUMERIC_PRECISION_RADIX}} from always logically null to appropriate values (2, 10, NULL). Add missing numeric precision and scale values (for non-DECIMAL types): - Change NUMERIC_SCALE from logical null to zero for integer types. - Change NUMERIC_PRECISION from logical null to precision for non-DECIMAL numeric types. Update implementation of JDBC's {{DatabaseMeta.getColumns()}} (at least enough to not break; maybe also to use newly available data to fix some partial implementations). was: [Editing in progress] Change logical null from {{-1}} to actual {{NULL}}: - Change column {{CHARACTER_MAXIMUM_LENGTH}}. - Change column {{NUMERIC_PRECISION}}. - Change column {{NUMERIC_PRECISION_RADIX}}. - Change column {{NUMERIC_SCALE}}. Change column {{ORDINAL_POSITION}} from zero-based to one-based. Change column {{DATA_TYPE}} from short names (e.g., "CHAR") to specified names (e.g., "CHARACTER"). - "CHAR" -> "CHARACTER" - "VARCHAR" -> "CHARACTER VARYING" - "VARBINARY" -> "BINARY VARYING" - "... ARRAY" -> "ARRAY" - "(...) MAP" -> "MAP" - "STRUCT (...)" -> "STRUCT" Fix data type names "INTERVAL_DAY_TIME" and "INTERVAL_YEAR_MONTH" to "INTERVAL": - Change column {{DATA_TYPE}} to list "INTERVAL" for interval types. - Add column {{INTERVAL_TYPE}}. Move {{CHAR}} length from {{NUMERIC_PRECISION}} to {{CHARACTER_MAXIMUM_LENGTH}} (same as {{VARCHAR}} length): - Change column {{NUMERIC_PRECISION}} from length to logical null for CHAR. - Change column {{CHARACTER_MAXIMUM_LENGTH}} from logical null to length for CHAR. Move {{BINARY}} and {{VARBINARY}} length from {{NUMERIC_PRECISION}} to {{CHARACTER_MAXIMUM_LENGTH}} (same as CHAR and VARCHAR length): - Change column {{NUMERIC_PRECISION}} from length to logical null for BINARY and VARBINARY. - Change column {{CHARACTER_MAXIMUM_LENGTH}} from logical null to length for BINARY and VARBINARY. To correct ordinal position of some existing columns: - Add column {{COLUMN_DEFAULT}}. 
- Add column {{CHARACTER_OCTET_LENGTH}}. - Reorder column {{NUMERIC_PRECISION}}. Move date/time and interval precisions from {{NUMERIC_PRECISION}} to {{DATETIME_PRECISION}} and {{INTERVAL_PRECISION}}: - Change column {{NUMERIC_PRECISION}} to logically null for date/time and interval types. - Add column {{DATETIME_PRECISION}}. - Add column {{INTERVAL_PRECISION}}. Implement {{NUMERIC_PRECISION_RADIX}}: - Change column {{NUMERIC_PRECISION_RADIX}} from always logically null to appropriate values (2, 10, NULL). Add missing numeric precision and scale values (for non-DECIMAL types): - Change NUMERIC_SCALE from logical null to zero for integer types. - Change NUMERIC_PRECISION from logical null to precision for non-DECIMAL numeric types. > Fix existing(+) INFORMATION_SCHEMA.COLUMNS columns > -- > > Key: DRILL-3216 > URL: https://issues.ap
[jira] [Created] (DRILL-3243) Need a better error message - Use of alias in window function definition
Khurram Faraaz created DRILL-3243: - Summary: Need a better error message - Use of alias in window function definition Key: DRILL-3243 URL: https://issues.apache.org/jira/browse/DRILL-3243 Project: Apache Drill Issue Type: Bug Components: Execution - Flow Affects Versions: 1.0.0 Reporter: Khurram Faraaz Assignee: Chris Westin Need a better error message when an alias on a window function definition is used in a query's predicate. For example, given OVER(PARTITION BY columns[0] ORDER BY columns[1]) tmp, if the alias "tmp" is used in the predicate we need a message that says column "tmp" does not exist; that is how it works in Postgres 9.3. Postgres 9.3 {code} postgres=# select count(*) OVER(partition by type order by id) `tmp` from airports where tmp is not null; ERROR: column "tmp" does not exist LINE 1: ...ect count(*) OVER(partition by type order by id) `tmp` from ... ^ {code} Drill 1.0 {code} 0: jdbc:drill:schema=dfs.tmp> select count(*) OVER(partition by columns[2] order by columns[0]) tmp from `airports.csv` where tmp is not null; Error: SYSTEM ERROR: java.lang.IllegalArgumentException: Selected column(s) must have name 'columns' or must be plain '*' Fragment 0:0 [Error Id: 66987b81-fe50-422d-95e4-9ce61c873584 on centos-02.qa.lab:31010] (state=,code=0) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3242) Enhance RPC layer to offload all request work onto a separate thread.
Jacques Nadeau created DRILL-3242: - Summary: Enhance RPC layer to offload all request work onto a separate thread. Key: DRILL-3242 URL: https://issues.apache.org/jira/browse/DRILL-3242 Project: Apache Drill Issue Type: Improvement Components: Execution - RPC Reporter: Jacques Nadeau Assignee: Jacques Nadeau Fix For: 1.1.0 Right now, the app is responsible for ensuring that only very small amounts of work are done on the RPC thread. In some cases, the app doesn't do this correctly. Additionally, in high-load situations these small amounts of work become non-trivial. As such, we need to make the RPC layer protect itself from slow requests/responses. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
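The offloading DRILL-3242 proposes can be sketched with plain java.util.concurrent: request work is handed to a worker pool instead of running on the I/O (RPC) thread, so a slow handler cannot stall the event loop. Class and method names here are illustrative assumptions, not Drill's actual RPC classes.

```java
// Sketch (hypothetical names): offload request handling from the RPC/I-O
// thread to a dedicated worker pool.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

public class OffloadingRpcHandler {
    static final ExecutorService workerPool = Executors.newFixedThreadPool(4);

    // Called from the I/O thread; the (possibly slow) handler runs on the
    // worker pool. We block on the result only for demonstration purposes.
    static String runHandler(Supplier<String> handler) {
        try {
            return workerPool.submit(handler::get).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The handler just reports which thread it ran on.
        String handlerThread = runHandler(() -> Thread.currentThread().getName());
        // The handler did NOT run on the calling (I/O) thread.
        System.out.println(Thread.currentThread().getName().equals(handlerThread)); // false
        workerPool.shutdown();
    }
}
```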
[jira] [Created] (DRILL-3241) Query with window function runs out of direct memory and does not report back to client that it did
Victoria Markman created DRILL-3241: --- Summary: Query with window function runs out of direct memory and does not report back to client that it did Key: DRILL-3241 URL: https://issues.apache.org/jira/browse/DRILL-3241 Project: Apache Drill Issue Type: Bug Components: Execution - Relational Operators Affects Versions: 1.0.0 Reporter: Victoria Markman Assignee: Chris Westin Even though the query ran out of memory and was cancelled on the server, the client (sqlline) was never notified of the event, and it appears to the user that the query is hung. Configuration: Single drillbit configured with: DRILL_MAX_DIRECT_MEMORY="2G" DRILL_HEAP="1G" TPCDS100 parquet files Query: {code} select sum(ss_quantity) over(partition by ss_store_sk order by ss_sold_date_sk) from store_sales; {code} drillbit.log {code} 2015-06-01 21:42:29,514 [BitServer-5] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. Connection: /10.10.88.133:31012 <--> /10.10.88.133:38887 (data server). Closing connection. 
io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct buffer memory at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233) ~[netty-codec-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) [netty-transport-4.0.27.Final.jar:4.0.27.Final] at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na] at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na] at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na] at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final] at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71] Caused by: java.lang.OutOfMemoryError: Direct buffer memory at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.7.0_71] at java.nio.DirectByteBuffer.(DirectByteBuffer.java:123) ~[na:1.7.0_71] at 
java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306) ~[na:1.7.0_71] at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:437) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final] at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:179) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final] at io.netty.buffer.PoolArena.allocate(PoolArena.java:168) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final] at io.netty.buffer.PoolArena.reallocate(PoolArena.java:280) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final] at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:110) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final] at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:251) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final] at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:849) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final] at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:841) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final] at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:831) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final] at io.netty.buffer.WrappedByteBuf.writeBytes(WrappedByteBuf.java:600) ~[netty-buffer-4.0.27.Final.jar:4.0.27.Final] at io.netty.buffer.UnsafeDirectLittleEndian.writeBytes(UnsafeDirectLittleEndian.java:28) ~[drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:4.0.27.Final] at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:92) ~[netty-codec-4.0.27.Final.jar:4.0.27.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:227) ~[
[jira] [Created] (DRILL-3240) Fetch hadoop maven profile specific Hive version in Hive storage plugin
Venki Korukanti created DRILL-3240: -- Summary: Fetch hadoop maven profile specific Hive version in Hive storage plugin Key: DRILL-3240 URL: https://issues.apache.org/jira/browse/DRILL-3240 Project: Apache Drill Issue Type: Improvement Components: Storage - Hive, Tools, Build & Test Affects Versions: 0.4.0 Reporter: Venki Korukanti Assignee: Venki Korukanti Priority: Minor Fix For: 1.1.0 Currently we always fetch the Apache Hive libs irrespective of the Hadoop vendor profile used in {{mvn clean install}}. This jira is to allow specifying the custom version of Hive in hadoop vendor profile. Note: Hive storage plugin assumes there are no major differences in Hive APIs between different vendor specific custom Hive builds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
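The change DRILL-3240 describes might look like the following pom.xml sketch: each Hadoop vendor profile overrides a Hive version property consumed by the Hive storage plugin's dependencies. The property name and version strings below are illustrative assumptions, not Drill's actual build files.

```xml
<!-- Hypothetical sketch: default Hive version, overridden per vendor profile. -->
<properties>
  <hive.version>1.0.0</hive.version>
</properties>

<profiles>
  <profile>
    <id>some-vendor</id>
    <properties>
      <!-- vendor-specific custom Hive build (placeholder version) -->
      <hive.version>1.0.0-vendor</hive.version>
    </properties>
  </profile>
</profiles>
```

With this shape, `mvn clean install -P some-vendor` would pull the vendor's Hive libs instead of the Apache ones.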
[jira] [Created] (DRILL-3239) Join between empty hive tables throws an IllegalStateException
Rahul Challapalli created DRILL-3239: Summary: Join between empty hive tables throws an IllegalStateException Key: DRILL-3239 URL: https://issues.apache.org/jira/browse/DRILL-3239 Project: Apache Drill Issue Type: Bug Components: Storage - Hive Reporter: Rahul Challapalli Assignee: Venki Korukanti Attachments: error.log git.commit.id.abbrev=6f54223 Created 2 Hive tables on top of TPC-H data in ORC format. The tables are empty. The query below returns 0 rows from Hive; however, it fails with an IllegalStateException from Drill: {code} select * from customer c, orders o where c.c_custkey = o.o_custkey; Error: SYSTEM ERROR: java.lang.IllegalStateException: You tried to do a batch data read operation when you were in a state of NONE. You can only do this type of operation when you are in a state of OK or OK_NEW_SCHEMA. Fragment 0:0 [Error Id: 8483cab2-d771-4337-ae65-1db41eb5720d on qa-node191.qa.lab:31010] (state=,code=0) {code} Below is the Hive DDL I used {code} create table if not exists tpch01_orc.customer ( c_custkey int, c_name string, c_address string, c_nationkey int, c_phone string, c_acctbal double, c_mktsegment string, c_comment string ) STORED AS orc LOCATION '/drill/testdata/Tpch0.01/orc/customer'; create table if not exists tpch01_orc.orders ( o_orderkey int, o_custkey int, o_orderstatus string, o_totalprice double, o_orderdate date, o_orderpriority string, o_clerk string, o_shippriority int, o_comment string ) STORED AS orc LOCATION '/drill/testdata/Tpch0.01/orc/orders'; {code} I have attached the log files. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3239) Join between empty hive tables throws an IllegalStateException
[ https://issues.apache.org/jira/browse/DRILL-3239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rahul Challapalli updated DRILL-3239: - Attachment: error.log > Join between empty hive tables throws an IllegalStateException > -- > > Key: DRILL-3239 > URL: https://issues.apache.org/jira/browse/DRILL-3239 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Hive >Reporter: Rahul Challapalli >Assignee: Venki Korukanti > Attachments: error.log > > > git.commit.id.abbrev=6f54223 > Created 2 hive tables on top of tpch data in orc format. The tables are > empty. Below query returns 0 rows from hive. However it fails with an > IllegalStateException from drill > {code} > select * from customer c, orders o where c.c_custkey = o.o_custkey; > Error: SYSTEM ERROR: java.lang.IllegalStateException: You tried to do a batch > data read operation when you were in a state of NONE. You can only do this > type of operation when you are in a state of OK or OK_NEW_SCHEMA. > Fragment 0:0 > [Error Id: 8483cab2-d771-4337-ae65-1db41eb5720d on qa-node191.qa.lab:31010] > (state=,code=0) > {code} > Below is the hive ddl I used > {code} > create table if not exists tpch01_orc.customer ( > c_custkey int, > c_name string, > c_address string, > c_nationkey int, > c_phone string, > c_acctbal double, > c_mktsegment string, > c_comment string > ) > STORED AS orc > LOCATION '/drill/testdata/Tpch0.01/orc/customer'; > create table if not exists tpch01_orc.orders ( > o_orderkey int, > o_custkey int, > o_orderstatus string, > o_totalprice double, > o_orderdate date, > o_orderpriority string, > o_clerk string, > o_shippriority int, > o_comment string > ) > STORED AS orc > LOCATION '/drill/testdata/Tpch0.01/orc/orders'; > {code} > I attached the log files -- This message was sent by Atlassian JIRA (v6.3.4#6332)
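The error message in DRILL-3239 describes an iterator-state contract: reading batch data is only legal after next() has returned OK or OK_NEW_SCHEMA, and NONE (what an empty ORC table yields immediately) means no data may be read. A minimal sketch of that contract, with assumed names rather than Drill's actual RecordBatch API:

```java
// Sketch (hypothetical names): the state check the empty-table join violates.
public class BatchReader {
    enum IterOutcome { NONE, OK, OK_NEW_SCHEMA }

    private IterOutcome state = IterOutcome.NONE;

    // An empty table returns NONE on the very first call.
    IterOutcome next(boolean hasMoreData) {
        state = hasMoreData ? IterOutcome.OK : IterOutcome.NONE;
        return state;
    }

    // Callers must not read a batch in state NONE.
    void readBatch() {
        if (state != IterOutcome.OK && state != IterOutcome.OK_NEW_SCHEMA) {
            throw new IllegalStateException(
                "You tried to do a batch data read operation when you were in a state of "
                + state + ".");
        }
        // ... read value vectors here ...
    }
}
```

The fix for the join operator is to check for NONE before attempting the read rather than letting the IllegalStateException escape as a SYSTEM ERROR.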
[jira] [Updated] (DRILL-3238) Cannot Plan Exception is raised when the same window partition is defined in select & window clauses
[ https://issues.apache.org/jira/browse/DRILL-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Victoria Markman updated DRILL-3238: Labels: window_functions (was: ) > Cannot Plan Exception is raised when the same window partition is defined in > select & window clauses > > > Key: DRILL-3238 > URL: https://issues.apache.org/jira/browse/DRILL-3238 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Reporter: Sean Hsuan-Yi Chu >Assignee: Sean Hsuan-Yi Chu > Labels: window_functions > > While this works: > {code} > select sum(a2) over(partition by a2 order by a2), count(*) over(partition by > a2 order by a2) > from t > {code} > , this fails > {code} > select sum(a2) over(w), count(*) over(partition by a2 order by a2) > from t > window w as (partition by a2 order by a2) > {code} > Notice these two queries are logically the same thing if we plug-in the > window definition back into the SELECT-CLAUSE in the 2nd query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3238) Cannot Plan Exception is raised when the same window partition is defined in select & window clauses
[ https://issues.apache.org/jira/browse/DRILL-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568055#comment-14568055 ] Victoria Markman commented on DRILL-3238: - Interestingly, this case ("over w" without parentheses) works: {code} select sum(a2) over w, count(*) over(partition by a2 order by a2) from t2 window w as (partition by a2 order by a2); {code} I did not realize that "over w" is supported grammar ... > Cannot Plan Exception is raised when the same window partition is defined in > select & window clauses > > > Key: DRILL-3238 > URL: https://issues.apache.org/jira/browse/DRILL-3238 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Reporter: Sean Hsuan-Yi Chu >Assignee: Sean Hsuan-Yi Chu > > While this works: > {code} > select sum(a2) over(partition by a2 order by a2), count(*) over(partition by > a2 order by a2) > from t > {code} > , this fails > {code} > select sum(a2) over(w), count(*) over(partition by a2 order by a2) > from t > window w as (partition by a2 order by a2) > {code} > Notice these two queries are logically the same thing if we plug-in the > window definition back into the SELECT-CLAUSE in the 2nd query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3238) Cannot Plan Exception is raised when the same window partition is defined in select & window clauses
[ https://issues.apache.org/jira/browse/DRILL-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Hsuan-Yi Chu updated DRILL-3238: - Description: While this works: {code} select sum(a2) over(partition by a2 order by a2), count(*) over(partition by a2 order by a2) from t {code} , this fails {code} select sum(a2) over(w), count(*) over(partition by a2 order by a2) from t window w as (partition by a2 order by a2) {code} Notice these two queries are logically the same thing if we plug-in the window definition back into the SELECT-CLAUSE in the 2nd query. > Cannot Plan Exception is raised when the same window partition is defined in > select & window clauses > > > Key: DRILL-3238 > URL: https://issues.apache.org/jira/browse/DRILL-3238 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Reporter: Sean Hsuan-Yi Chu >Assignee: Sean Hsuan-Yi Chu > > While this works: > {code} > select sum(a2) over(partition by a2 order by a2), count(*) over(partition by > a2 order by a2) > from t > {code} > , this fails > {code} > select sum(a2) over(w), count(*) over(partition by a2 order by a2) > from t > window w as (partition by a2 order by a2) > {code} > Notice these two queries are logically the same thing if we plug-in the > window definition back into the SELECT-CLAUSE in the 2nd query. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3238) Cannot Plan Exception is raised when the same window partition is defined in select & window clauses
Sean Hsuan-Yi Chu created DRILL-3238: Summary: Cannot Plan Exception is raised when the same window partition is defined in select & window clauses Key: DRILL-3238 URL: https://issues.apache.org/jira/browse/DRILL-3238 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Reporter: Sean Hsuan-Yi Chu Assignee: Sean Hsuan-Yi Chu -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3155) Composite vectors leak memory
[ https://issues.apache.org/jira/browse/DRILL-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3155: --- Assignee: Hanifi Gunes (was: Mehant Baid) > Composite vectors leak memory > - > > Key: DRILL-3155 > URL: https://issues.apache.org/jira/browse/DRILL-3155 > Project: Apache Drill > Issue Type: Bug >Reporter: Mehant Baid >Assignee: Hanifi Gunes > Fix For: 1.1.0 > > Attachments: DRILL-3155-1.patch, DRILL-3155-2.patch > > > While allocating memory for variable width vectors we first allocate the > necessary memory for the actual data followed by the memory needed for the > offset vector. However if the first allocation for the data buffer succeeds > and the one for the offset vector fails we don't release the buffer allocated > for the data causing memory leaks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3155) Composite vectors leak memory
[ https://issues.apache.org/jira/browse/DRILL-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3155: --- Summary: Composite vectors leak memory (was: Variable width vectors leak memory) > Composite vectors leak memory > - > > Key: DRILL-3155 > URL: https://issues.apache.org/jira/browse/DRILL-3155 > Project: Apache Drill > Issue Type: Bug >Reporter: Mehant Baid >Assignee: Mehant Baid > Fix For: 1.1.0 > > Attachments: DRILL-3155-1.patch, DRILL-3155-2.patch > > > While allocating memory for variable width vectors we first allocate the > necessary memory for the actual data followed by the memory needed for the > offset vector. However if the first allocation for the data buffer succeeds > and the one for the offset vector fails we don't release the buffer allocated > for the data causing memory leaks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3155) Composite vectors leak memory
[ https://issues.apache.org/jira/browse/DRILL-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mehant Baid updated DRILL-3155: --- Attachment: DRILL-3155-2.patch DRILL-3155-1.patch First patch is a minor refactoring patch moving the classes into the correct package. Second patch is the one that fixes the issue. > Composite vectors leak memory > - > > Key: DRILL-3155 > URL: https://issues.apache.org/jira/browse/DRILL-3155 > Project: Apache Drill > Issue Type: Bug >Reporter: Mehant Baid >Assignee: Mehant Baid > Fix For: 1.1.0 > > Attachments: DRILL-3155-1.patch, DRILL-3155-2.patch > > > While allocating memory for variable width vectors we first allocate the > necessary memory for the actual data followed by the memory needed for the > offset vector. However if the first allocation for the data buffer succeeds > and the one for the offset vector fails we don't release the buffer allocated > for the data causing memory leaks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
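The leak pattern described in DRILL-3155 (first allocation succeeds, second fails, first is never released) has a standard fix: roll back the first allocation when the second throws. A toy sketch with a hypothetical allocator, not Drill's DrillBuf API:

```java
// Sketch (hypothetical allocator): release the data buffer if the
// offset-vector allocation fails, instead of leaking it.
import java.util.ArrayList;
import java.util.List;

public class VariableWidthVector {
    // Toy allocator that fails after a fixed number of allocations.
    static class Allocator {
        int allocationsLeft;
        final List<byte[]> live = new ArrayList<>();
        Allocator(int allocationsLeft) { this.allocationsLeft = allocationsLeft; }
        byte[] buffer(int size) {
            if (allocationsLeft-- <= 0) throw new OutOfMemoryError("no memory");
            byte[] b = new byte[size];
            live.add(b);
            return b;
        }
        void release(byte[] b) { live.remove(b); }
    }

    static void allocate(Allocator a, int dataSize, int offsetsSize) {
        byte[] data = a.buffer(dataSize);        // data buffer first
        try {
            byte[] offsets = a.buffer(offsetsSize); // offset vector second; may fail
        } catch (OutOfMemoryError e) {
            a.release(data);                     // the fix: don't leak the data buffer
            throw e;
        }
    }
}
```

After a failed allocate(), the allocator's live list is empty: nothing leaked.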
[jira] [Updated] (DRILL-3216) Fix existing(+) INFORMATION_SCHEMA.COLUMNS columns
[ https://issues.apache.org/jira/browse/DRILL-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Barclay (Drill) updated DRILL-3216: -- Description: [Editing in progress] Change logical null from {{-1}} to actual {{NULL}}: - Change column {{CHARACTER_MAXIMUM_LENGTH}}. - Change column {{NUMERIC_PRECISION}}. - Change column {{NUMERIC_PRECISION_RADIX}}. - Change column {{NUMERIC_SCALE}}. Change column {{ORDINAL_POSITION}} from zero-based to one-based. Change column {{DATA_TYPE}} from short names (e.g., "CHAR") to specified names (e.g., "CHARACTER"). - "CHAR" -> "CHARACTER" - "VARCHAR" -> "CHARACTER VARYING" - "VARBINARY" -> "BINARY VARYING" - "... ARRAY" -> "ARRAY" - "(...) MAP" -> "MAP" - "STRUCT (...)" -> "STRUCT" Fix data type names "INTERVAL_DAY_TIME" and "INTERVAL_YEAR_MONTH" to "INTERVAL": - Change column {{DATA_TYPE}} to list "INTERVAL" for interval types. - Add column {{INTERVAL_TYPE}}. Move {{CHAR}} length from {{NUMERIC_PRECISION}} to {{CHARACTER_MAXIMUM_LENGTH}} (same as {{VARCHAR}} length): - Change column {{NUMERIC_PRECISION}} from length to logical null for CHAR. - Change column {{CHARACTER_MAXIMUM_LENGTH}} from logical null to length for CHAR. Move {{BINARY}} and {{VARBINARY}} length from {{NUMERIC_PRECISION}} to {{CHARACTER_MAXIMUM_LENGTH}} (same as CHAR and VARCHAR length): - Change column {{NUMERIC_PRECISION}} from length to logical null for BINARY and VARBINARY. - Change column {{CHARACTER_MAXIMUM_LENGTH}} from logical null to length for BINARY and VARBINARY. To correct ordinal position of some existing columns: - Add column {{COLUMN_DEFAULT}}. - Add column {{CHARACTER_OCTET_LENGTH}}. - Reorder column {{NUMERIC_PRECISION}}. Move date/time and interval precisions from {{NUMERIC_PRECISION}} to {{DATETIME_PRECISION}} and {{INTERVAL_PRECISION}}: - Change column {{NUMERIC_PRECISION}} to logically null for date/time and interval types. - Add column {{DATETIME_PRECISION}}. - Add column {{INTERVAL_PRECISION}}. 
Implement {{NUMERIC_PRECISION_RADIX}}: - Change column {{NUMERIC_PRECISION_RADIX}} from always logically null to appropriate values (2, 10, NULL). Add missing numeric precision and scale values (for non-DECIMAL types): - Change NUMERIC_SCALE from logical null to zero for integer types. - Change NUMERIC_PRECISION from logical null to precision for non-DECIMAL numeric types. was: [Editing in progress] Change logical null from {{-1}} to actual {{NULL}}: - Change column {{CHARACTER_MAXIMUM_LENGTH}}. - Change column {{NUMERIC_PRECISION}}. - Change column {{NUMERIC_PRECISION_RADIX}}. - Change column {{NUMERIC_SCALE}}. Change column {{ORDINAL_POSITION}} from zero-based to one-based. Change column {{DATA_TYPE}} from short names (e.g., "CHAR") to specified names (e.g., "CHARACTER"). Fix data type names "INTERVAL_DAY_TIME" and "INTERVAL_YEAR_MONTH" to "INTERVAL": - Change column {{DATA_TYPE}} to list "INTERVAL" for interval types. - Add column {{INTERVAL_TYPE}}. Move {{CHAR}} length from {{NUMERIC_PRECISION}} to {{CHARACTER_MAXIMUM_LENGTH}} (same as {{VARCHAR}} length): - Change column {{NUMERIC_PRECISION}} from length to logical null for CHAR. - Change column {{CHARACTER_MAXIMUM_LENGTH}} from logical null to length for CHAR. Move {{BINARY}} and {{VARBINARY}} length from {{NUMERIC_PRECISION}} to {{CHARACTER_MAXIMUM_LENGTH}} (same as CHAR and VARCHAR length): - Change column {{NUMERIC_PRECISION}} from length to logical null for BINARY and VARBINARY. - Change column {{CHARACTER_MAXIMUM_LENGTH}} from logical null to length for BINARY and VARBINARY. To correct ordinal position of some existing columns: - Add column {{COLUMN_DEFAULT}}. - Add column {{CHARACTER_OCTET_LENGTH}}. - Reorder column {{NUMERIC_PRECISION}}. Move date/time and interval precisions from {{NUMERIC_PRECISION}} to {{DATETIME_PRECISION}} and {{INTERVAL_PRECISION}}: - Change column {{NUMERIC_PRECISION}} to logically null for date/time and interval types. - Add column {{DATETIME_PRECISION}}. 
- Add column {{INTERVAL_PRECISION}}. Implement {{NUMERIC_PRECISION_RADIX}}: - Change column {{NUMERIC_PRECISION_RADIX}} from always logically null to appropriate values (2, 10, NULL). Add missing numeric precision and scale values (for non-DECIMAL types): - Change NUMERIC_SCALE from logical null to zero for integer types. - Change NUMERIC_PRECISION from logical null to precision for non-DECIMAL numeric types. > Fix existing(+) INFORMATION_SCHEMA.COLUMNS columns > -- > > Key: DRILL-3216 > URL: https://issues.apache.org/jira/browse/DRILL-3216 > Project: Apache Drill > Issue Type: Bug >Reporter: Daniel Barclay (Drill) > > [Editing in progress] > Change logical null from {{-1}} to actual {{NULL}}: > - Change column {{CHARACTER_MAXIMUM_LENGTH}}. > - Change column {{NUMERIC_PRECISION}}. > - Change column {{NUMERIC_PRECIS
[jira] [Commented] (DRILL-2658) Add ilike and regex substring functions
[ https://issues.apache.org/jira/browse/DRILL-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567881#comment-14567881 ]

Patrick Toole commented on DRILL-2658:
--------------------------------------

It appears the alias version of "substr" (substring) does not work:

0: jdbc:drill:> select substr('a','b') from sys.version;
+---------+
| EXPR$0  |
+---------+
| null    |
+---------+
1 row selected (0.288 seconds)

0: jdbc:drill:> select substring('a','b') from sys.version;
Error: PARSE ERROR: From line 1, column 8 to line 1, column 25: Cannot apply 'SUBSTRING' to arguments of type 'SUBSTRING( FROM )'. Supported form(s): 'SUBSTRING( FROM )' 'SUBSTRING( FROM FOR )' 'SUBSTRING( FROM )' 'SUBSTRING( FROM FOR )' 'SUBSTRING( FROM )' 'SUBSTRING( FROM FOR )' 'SUBSTRING( FROM )' 'SUBSTRING( FROM FOR )'

> Add ilike and regex substring functions
> ---------------------------------------
>
>                 Key: DRILL-2658
>                 URL: https://issues.apache.org/jira/browse/DRILL-2658
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Functions - Drill
>            Reporter: Steven Phillips
>            Assignee: Steven Phillips
>             Fix For: 1.0.0
>
>         Attachments: DRILL-2658.patch, DRILL-2658.patch
>
>
> This will not modify the parser, so Postgres syntax such as:
> "... where c ILIKE '%ABC%'"
> will not be currently supported. It will simply be a function:
> "... where ILIKE(c, '%ABC%')"
> Same for substring:
> "select substr(c, 'abc')..."
> will be equivalent to Postgres
> "select substr(c from 'abc')",
> but 'abc' will be treated as a Java regex pattern.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
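Per the quoted description, DRILL-2658's `substr(c, 'abc')` treats the second argument as a Java regex and returns the first match, or null when nothing matches (consistent with the `substr('a','b')` result shown above). A minimal Python sketch of that semantics; the helper name is illustrative, not Drill's implementation:

```python
import re

def regex_substr(value, pattern):
    """Return the first substring of `value` matching `pattern`, or None.

    Illustrative sketch of the DRILL-2658 behavior, where the second
    argument is treated as a regex rather than a start position.
    """
    if value is None:
        return None
    m = re.search(pattern, value)
    return m.group(0) if m else None
```

So `regex_substr('abcdef', 'b.d')` yields the matched fragment, while `regex_substr('a', 'b')` yields `None`, mirroring the null result in the session above.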
[jira] [Commented] (DRILL-2658) Add ilike and regex substring functions
[ https://issues.apache.org/jira/browse/DRILL-2658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567866#comment-14567866 ]

Patrick Toole commented on DRILL-2658:
--------------------------------------

This form breaks other downstream items:

SELECT * FROM table_name WHERE column_name ilike '4\t' ESCAPE '\'

> Add ilike and regex substring functions
> ---------------------------------------
>
>                 Key: DRILL-2658
>                 URL: https://issues.apache.org/jira/browse/DRILL-2658
>             Project: Apache Drill
>          Issue Type: New Feature
>          Components: Functions - Drill
>            Reporter: Steven Phillips
>            Assignee: Steven Phillips
>             Fix For: 1.0.0
>
>         Attachments: DRILL-2658.patch, DRILL-2658.patch
>
>
> This will not modify the parser, so Postgres syntax such as:
> "... where c ILIKE '%ABC%'"
> will not be currently supported. It will simply be a function:
> "... where ILIKE(c, '%ABC%')"
> Same for substring:
> "select substr(c, 'abc')..."
> will be equivalent to Postgres
> "select substr(c from 'abc')",
> but 'abc' will be treated as a Java regex pattern.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
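The `ILIKE(c, pattern)` function form amounts to a case-insensitive LIKE, and the ESCAPE clause in the comment above is part of what the parser-level form has to handle. A rough Python sketch of LIKE-pattern translation with an optional escape character (illustrative only, not Drill's code):

```python
import re

def like_to_regex(pattern, escape=None):
    """Translate a SQL LIKE pattern into an anchored regex string.

    `%` becomes `.*`, `_` becomes `.`; a character following the escape
    character is matched literally. Sketch of ILIKE-with-ESCAPE handling.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if escape and ch == escape and i + 1 < len(pattern):
            # Escaped character: match it literally, skip the escape.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == '%':
            out.append('.*')
        elif ch == '_':
            out.append('.')
        else:
            out.append(re.escape(ch))
        i += 1
    return '^' + ''.join(out) + '$'

def ilike(value, pattern, escape=None):
    """Case-insensitive LIKE, per the function form proposed in DRILL-2658."""
    return re.match(like_to_regex(pattern, escape), value, re.IGNORECASE) is not None
```

With the escape character set, a pattern such as `50\%` matches the literal string `50%` rather than treating `%` as a wildcard.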
[jira] [Created] (DRILL-3237) Come up with enhanced AbstractRecordBatch and AbstractSingleRecordBatch to better handle type promotion and schema change
Jacques Nadeau created DRILL-3237: - Summary: Come up with enhanced AbstractRecordBatch and AbstractSingleRecordBatch to better handle type promotion and schema change Key: DRILL-3237 URL: https://issues.apache.org/jira/browse/DRILL-3237 Project: Apache Drill Issue Type: Sub-task Reporter: Jacques Nadeau -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3236) Enhance JSON writer to write EmbeddedType
Jacques Nadeau created DRILL-3236: - Summary: Enhance JSON writer to write EmbeddedType Key: DRILL-3236 URL: https://issues.apache.org/jira/browse/DRILL-3236 Project: Apache Drill Issue Type: Sub-task Reporter: Jacques Nadeau -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3235) Enhance JSON reader to leverage EmbeddedType
Jacques Nadeau created DRILL-3235: - Summary: Enhance JSON reader to leverage EmbeddedType Key: DRILL-3235 URL: https://issues.apache.org/jira/browse/DRILL-3235 Project: Apache Drill Issue Type: Sub-task Reporter: Jacques Nadeau -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3234) Drill fails to implicit cast hive tinyint and smallint data as int
Krystal created DRILL-3234:
---------------------------

             Summary: Drill fails to implicit cast hive tinyint and smallint data as int
                 Key: DRILL-3234
                 URL: https://issues.apache.org/jira/browse/DRILL-3234
             Project: Apache Drill
          Issue Type: Bug
          Components: Storage - Hive
    Affects Versions: 1.0.0
            Reporter: Krystal
            Assignee: Venki Korukanti

I have the following hive table:

describe `hive.default`.voter_hive;
+----------------+------------+--------------+
| COLUMN_NAME    | DATA_TYPE  | IS_NULLABLE  |
+----------------+------------+--------------+
| voter_id       | SMALLINT   | YES          |
| name           | VARCHAR    | YES          |
| age            | TINYINT    | YES          |
| registration   | VARCHAR    | YES          |
| contributions  | DECIMAL    | YES          |
| voterzone      | INTEGER    | YES          |
| create_time    | TIMESTAMP  | YES          |
+----------------+------------+--------------+

If I include just the voter_id and age fields in the select, the query works fine. However, if I include them in the where clause, the query fails. For example:

select voter_id, name, age from voter_hive where age < 30;
Error: SYSTEM ERROR: org.apache.drill.exec.exception.SchemaChangeException: Failure while trying to materialize incoming schema. Errors: Error in expression at index -1. Error: Missing function implementation: [castINT(TINYINT-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--..

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
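The failing lookup `[castINT(TINYINT-OPTIONAL)]` suggests implicit-cast resolution has no TINYINT-to-INT entry in the cast-function registry. A toy sketch of how a widening chain could answer such lookups; the type names and the chain itself are illustrative assumptions, not Drill's actual rules:

```python
# Hypothetical numeric widening chain: each type may be implicitly
# widened to the next. A missing link (e.g. no way to reach INT from
# TINYINT) would surface as a "missing function implementation" error.
WIDENING = {
    'TINYINT': 'SMALLINT',
    'SMALLINT': 'INT',
    'INT': 'BIGINT',
    'BIGINT': 'FLOAT8',
}

def can_implicitly_cast(src, dst):
    """Walk the widening chain from `src`; True if `dst` is reachable."""
    t = src
    while t in WIDENING:
        t = WIDENING[t]
        if t == dst:
            return True
    return False
```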
[jira] [Created] (DRILL-3233) Update code generation & function code to support reading and writing embedded type
Jacques Nadeau created DRILL-3233: - Summary: Update code generation & function code to support reading and writing embedded type Key: DRILL-3233 URL: https://issues.apache.org/jira/browse/DRILL-3233 Project: Apache Drill Issue Type: Sub-task Reporter: Jacques Nadeau -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3232) Modify existing vectors to allow type promotion
Jacques Nadeau created DRILL-3232: - Summary: Modify existing vectors to allow type promotion Key: DRILL-3232 URL: https://issues.apache.org/jira/browse/DRILL-3232 Project: Apache Drill Issue Type: Sub-task Reporter: Jacques Nadeau Support the ability for existing vectors to be promoted similar to supported implicit casting rules. For example: INT > DOUBLE > STRING > EMBEDDED -- This message was sent by Atlassian JIRA (v6.3.4#6332)
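The promotion order in the description (INT > DOUBLE > STRING > EMBEDDED) can be modeled as a small chain: two vectors of different types promote to whichever type sits later in the chain. A hypothetical sketch, not Drill's resolution code:

```python
# Promotion order taken from the issue text; EMBEDDED is the fully
# general fallback when no narrower common type exists.
PROMOTION_ORDER = ['INT', 'DOUBLE', 'STRING', 'EMBEDDED']

def promote(a, b):
    """Smallest type in the chain that both `a` and `b` promote to."""
    return PROMOTION_ORDER[max(PROMOTION_ORDER.index(a),
                               PROMOTION_ORDER.index(b))]
```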
[jira] [Created] (DRILL-3231) Throw better error messages for schema changes
Hanifi Gunes created DRILL-3231: --- Summary: Throw better error messages for schema changes Key: DRILL-3231 URL: https://issues.apache.org/jira/browse/DRILL-3231 Project: Apache Drill Issue Type: Bug Components: Execution - Data Types Affects Versions: 1.0.0 Reporter: Hanifi Gunes Assignee: Hanifi Gunes This task is concerned with making error messages more intelligible, especially for the case of schema changes. {code:title=current error message} Error: DATA_READ ERROR: Error parsing JSON - You tried to write a BigInt type when you are using a ValueWriter of type NullableFloat8WriterImpl. {code} The proposed message should be non-technical, possibly with more context that helps investigate the problem, such as the line and column number and the name of the field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3216) Fix existing(+) INFORMATION_SCHEMA.COLUMNS columns
[ https://issues.apache.org/jira/browse/DRILL-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Barclay (Drill) updated DRILL-3216: -- Summary: Fix existing(+) INFORMATION_SCHEMA.COLUMNS columns (was: Fix existing INFORMATION_SCHEMA.COLUMNS columns) > Fix existing(+) INFORMATION_SCHEMA.COLUMNS columns > -- > > Key: DRILL-3216 > URL: https://issues.apache.org/jira/browse/DRILL-3216 > Project: Apache Drill > Issue Type: Bug >Reporter: Daniel Barclay (Drill) > > [Editing in progress] > Change logical null from {{-1}} to actual {{NULL}}: > - Change column {{CHARACTER_MAXIMUM_LENGTH}}. > - Change column {{NUMERIC_PRECISION}}. > - Change column {{NUMERIC_PRECISION_RADIX}}. > - Change column {{NUMERIC_SCALE}}. > Change column {{ORDINAL_POSITION}} from zero-based to one-based. > Change column {{DATA_TYPE}} from short names (e.g., "CHAR") to specified > names (e.g., "CHARACTER"). > Fix data type names "INTERVAL_DAY_TIME" and "INTERVAL_YEAR_MONTH" to > "INTERVAL": > - Change column {{DATA_TYPE}} to list "INTERVAL" for interval types. > - Add column {{INTERVAL_TYPE}}. > Move {{CHAR}} length from {{NUMERIC_PRECISION}} to > {{CHARACTER_MAXIMUM_LENGTH}} (same as {{VARCHAR}} length): > - Change column {{NUMERIC_PRECISION}} from length to logical null for CHAR. > - Change column {{CHARACTER_MAXIMUM_LENGTH}} from logical null to length for > CHAR. > Move {{BINARY}} and {{VARBINARY}} length from {{NUMERIC_PRECISION}} to > {{CHARACTER_MAXIMUM_LENGTH}} (same as CHAR and VARCHAR length): > - Change column {{NUMERIC_PRECISION}} from length to logical null for BINARY > and VARBINARY. > - Change column {{CHARACTER_MAXIMUM_LENGTH}} from logical null to length for > BINARY and VARBINARY. > To correct ordinal position of some existing columns: > - Add column {{COLUMN_DEFAULT}}. > - Add column {{CHARACTER_OCTET_LENGTH}}. > - Reorder column {{NUMERIC_PRECISION}}. 
> Move date/time and interval precisions from {{NUMERIC_PRECISION}} to > {{DATETIME_PRECISION}} and {{INTERVAL_PRECISION}}: > - Change column {{NUMERIC_PRECISION}} to logically null for date/time and > interval types. > - Add column {{DATETIME_PRECISION}}. > - Add column {{INTERVAL_PRECISION}}. > Implement {{NUMERIC_PRECISION_RADIX}}: > - Change column {{NUMERIC_PRECISION_RADIX}} from always logically null to > appropriate values (2, 10, NULL). > Add missing numeric precision and scale values (for non-DECIMAL types): > - Change NUMERIC_SCALE from logical null to zero for integer types. > - Change NUMERIC_PRECISION from logical null to precision for non-DECIMAL > numeric types. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
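The target state of the DRILL-3216 changes can be summarized as a per-type metadata table: real NULLs instead of -1, radix 2 for binary-precision types and 10 for decimal-precision types, and scale 0 for integer types. The concrete precision numbers below are illustrative assumptions, not values taken from any patch:

```python
# Sketch of intended INFORMATION_SCHEMA.COLUMNS numeric metadata:
# type -> (NUMERIC_PRECISION, NUMERIC_PRECISION_RADIX, NUMERIC_SCALE).
# None stands for a true SQL NULL (the fix replaces the old -1 sentinel).
NUMERIC_META = {
    'INTEGER':          (32, 2, 0),     # integer types: scale 0, radix 2
    'BIGINT':           (64, 2, 0),
    'DOUBLE PRECISION': (53, 2, None),  # approximate numeric: no scale
    'DECIMAL':          (None, 10, None),  # precision/scale per declaration
    'CHARACTER':        (None, None, None),  # length -> CHARACTER_MAXIMUM_LENGTH
}

def radix(type_name):
    """NUMERIC_PRECISION_RADIX for a type: 2, 10, or None (NULL)."""
    return NUMERIC_META[type_name][1]
```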
[jira] [Resolved] (DRILL-3130) Project can be pushed below union all / union to improve performance
[ https://issues.apache.org/jira/browse/DRILL-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Hsuan-Yi Chu resolved DRILL-3130. -- Resolution: Fixed Fix Version/s: 1.1.0 Target Version/s: (was: Future) > Project can be pushed below union all / union to improve performance > > > Key: DRILL-3130 > URL: https://issues.apache.org/jira/browse/DRILL-3130 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Reporter: Sean Hsuan-Yi Chu >Assignee: Sean Hsuan-Yi Chu > Fix For: 1.1.0 > > > A query such as > {code} > Select a from > (select a, b, c, ..., union all select a, b, c, ...) > {code} > will perform Union-All over all the specified columns on the two sides, > despite the fact that only one column is asked for at the end. Ideally, we > should apply a ProjectPushDown rule for Union & Union-All so that they do not > generate results that will be discarded at the end. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
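The ProjectPushDown idea in relational terms: projecting after a UNION ALL is equivalent to projecting each branch first, so the union never materializes the columns that would be discarded. A plain-Python sketch of the equivalence, using dicts for rows:

```python
def project(rows, cols):
    """Keep only the named columns of each row."""
    return [{c: row[c] for c in cols} for row in rows]

def project_after_union(left, right, cols):
    """Naive plan: union both inputs, then prune columns."""
    return project(left + right, cols)

def project_pushed_down(left, right, cols):
    """Rewritten plan: prune columns in each branch, then union."""
    return project(left, cols) + project(right, cols)
```

Both plans produce identical rows; the pushed-down form simply carries less data through the union.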
[jira] [Commented] (DRILL-3130) Project can be pushed below union all / union to improve performance
[ https://issues.apache.org/jira/browse/DRILL-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567769#comment-14567769 ] Sean Hsuan-Yi Chu commented on DRILL-3130: -- Review completed at: https://reviews.apache.org/r/34528/ Commit#: bca20655283d351d5f5c4090e9047419ff22c75e > Project can be pushed below union all / union to improve performance > > > Key: DRILL-3130 > URL: https://issues.apache.org/jira/browse/DRILL-3130 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Reporter: Sean Hsuan-Yi Chu >Assignee: Sean Hsuan-Yi Chu > > A query such as > {code} > Select a from > (select a, b, c, ..., union all select a, b, c, ...) > {code} > will perform Union-All over all the specified columns on the two sides, > despite the fact that only one column is asked for at the end. Ideally, we > should apply a ProjectPushDown rule for Union & Union-All so that they do not > generate results that will be discarded at the end. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Issue Comment Deleted] (DRILL-3216) Fix existing INFORMATION_SCHEMA.COLUMNS columns
[ https://issues.apache.org/jira/browse/DRILL-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Barclay (Drill) updated DRILL-3216: -- Comment: was deleted (was: Note: Regarding INFORMATION_SCHEMA.COLUMNS.DATA_TYPE: Need to analyze: 1) ISO/IEC 9075-2:2011(E) section 9.9, "Type name determination" (which defines some kind of type name (a return value) which uses short forms, e.g., "VARCHAR"), and uses of (explicit or implicit references to) it, versus 2) DEFINITION_SCHEMA's DATA_TYPE_DESCRIPTOR base table's constraint DATA_TYPE_DESCRIPTOR_DATA_TYPE_CHECK_COMBINATIONS (which clearly requires long forms, e.g., 'CHARACTER VARYING').) > Fix existing INFORMATION_SCHEMA.COLUMNS columns > --- > > Key: DRILL-3216 > URL: https://issues.apache.org/jira/browse/DRILL-3216 > Project: Apache Drill > Issue Type: Bug >Reporter: Daniel Barclay (Drill) > > [Editing in progress] > Change logical null from {{-1}} to actual {{NULL}}: > - Change column {{CHARACTER_MAXIMUM_LENGTH}}. > - Change column {{NUMERIC_PRECISION}}. > - Change column {{NUMERIC_PRECISION_RADIX}}. > - Change column {{NUMERIC_SCALE}}. > Change column {{ORDINAL_POSITION}} from zero-based to one-based. > Change column {{DATA_TYPE}} from short names (e.g., "CHAR") to specified > names (e.g., "CHARACTER"). > Fix data type names "INTERVAL_DAY_TIME" and "INTERVAL_YEAR_MONTH" to > "INTERVAL": > - Change column {{DATA_TYPE}} to list "INTERVAL" for interval types. > - Add column {{INTERVAL_TYPE}}. > Move {{CHAR}} length from {{NUMERIC_PRECISION}} to > {{CHARACTER_MAXIMUM_LENGTH}} (same as {{VARCHAR}} length): > - Change column {{NUMERIC_PRECISION}} from length to logical null for CHAR. > - Change column {{CHARACTER_MAXIMUM_LENGTH}} from logical null to length for > CHAR. 
> Move {{BINARY}} and {{VARBINARY}} length from {{NUMERIC_PRECISION}} to > {{CHARACTER_MAXIMUM_LENGTH}} (same as CHAR and VARCHAR length): > - Change column {{NUMERIC_PRECISION}} from length to logical null for BINARY > and VARBINARY. > - Change column {{CHARACTER_MAXIMUM_LENGTH}} from logical null to length for > BINARY and VARBINARY. > To correct ordinal position of some existing columns: > - Add column {{COLUMN_DEFAULT}}. > - Add column {{CHARACTER_OCTET_LENGTH}}. > - Reorder column {{NUMERIC_PRECISION}}. > Move date/time and interval precisions from {{NUMERIC_PRECISION}} to > {{DATETIME_PRECISION}} and {{INTERVAL_PRECISION}}: > - Change column {{NUMERIC_PRECISION}} to logically null for date/time and > interval types. > - Add column {{DATETIME_PRECISION}}. > - Add column {{INTERVAL_PRECISION}}. > Implement {{NUMERIC_PRECISION_RADIX}}: > - Change column {{NUMERIC_PRECISION_RADIX}} from always logically null to > appropriate values (2, 10, NULL). > Add missing numeric precision and scale values (for non-DECIMAL types): > - Change NUMERIC_SCALE from logical null to zero for integer types. > - Change NUMERIC_PRECISION from logical null to precision for non-DECIMAL > numeric types. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2746) Filter is not pushed into subquery past UNION ALL
[ https://issues.apache.org/jira/browse/DRILL-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567767#comment-14567767 ]

Sean Hsuan-Yi Chu commented on DRILL-2746:
------------------------------------------

Review completed at: https://reviews.apache.org/r/34528/
Commit#: bca20655283d351d5f5c4090e9047419ff22c75e

> Filter is not pushed into subquery past UNION ALL
> -------------------------------------------------
>
>                 Key: DRILL-2746
>                 URL: https://issues.apache.org/jira/browse/DRILL-2746
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization
>    Affects Versions: 0.9.0
>            Reporter: Victoria Markman
>            Assignee: Sean Hsuan-Yi Chu
>             Fix For: 1.1.0
>
>
> I expected to see the filter pushed to at least the left side of UNION ALL; instead it is applied after UNION ALL.
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select * from (select a1, b1, c1 from t1 union all select a2, b2, c2 from t2 ) where a1 = 10;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(a1=[$0], b1=[$1], c1=[$2])
> 00-02        SelectionVectorRemover
> 00-03          Filter(condition=[=($0, 10)])
> 00-04            UnionAll(all=[true])
> 00-06              Project(a1=[$2], b1=[$1], c1=[$0])
> 00-08                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/predicates/t1]], selectionRoot=/drill/testdata/predicates/t1, numFiles=1, columns=[`a1`, `b1`, `c1`]]])
> 00-05              Project(a2=[$1], b2=[$0], c2=[$2])
> 00-07                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/predicates/t2]], selectionRoot=/drill/testdata/predicates/t2, numFiles=1, columns=[`a2`, `b2`, `c2`]]])
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
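The identity behind the fix: Filter(UnionAll(L, R)) is equivalent to UnionAll(Filter(L), Filter(R)), so the predicate can run below the union, as the reporter expected for at least the left side. Demonstrated on plain lists (a sketch, not planner code):

```python
def union_all_then_filter(left, right, pred):
    """Original plan shape: union both inputs, then apply the filter."""
    return [row for row in left + right if pred(row)]

def filter_then_union_all(left, right, pred):
    """Pushed-down plan shape: filter each branch, then union."""
    return ([row for row in left if pred(row)]
            + [row for row in right if pred(row)])
```

Because UNION ALL preserves every row and a filter only drops rows, the two orderings yield the same multiset of results; pushing the filter down just shrinks the union's input.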
[jira] [Resolved] (DRILL-2746) Filter is not pushed into subquery past UNION ALL
[ https://issues.apache.org/jira/browse/DRILL-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Hsuan-Yi Chu resolved DRILL-2746.
--------------------------------------
       Resolution: Fixed
    Fix Version/s:     (was: 1.2.0)
                   1.1.0

> Filter is not pushed into subquery past UNION ALL
> -------------------------------------------------
>
>                 Key: DRILL-2746
>                 URL: https://issues.apache.org/jira/browse/DRILL-2746
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization
>    Affects Versions: 0.9.0
>            Reporter: Victoria Markman
>            Assignee: Sean Hsuan-Yi Chu
>             Fix For: 1.1.0
>
>
> I expected to see the filter pushed to at least the left side of UNION ALL; instead it is applied after UNION ALL.
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select * from (select a1, b1, c1 from t1 union all select a2, b2, c2 from t2 ) where a1 = 10;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(a1=[$0], b1=[$1], c1=[$2])
> 00-02        SelectionVectorRemover
> 00-03          Filter(condition=[=($0, 10)])
> 00-04            UnionAll(all=[true])
> 00-06              Project(a1=[$2], b1=[$1], c1=[$0])
> 00-08                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/predicates/t1]], selectionRoot=/drill/testdata/predicates/t1, numFiles=1, columns=[`a1`, `b1`, `c1`]]])
> 00-05              Project(a2=[$1], b2=[$0], c2=[$2])
> 00-07                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/predicates/t2]], selectionRoot=/drill/testdata/predicates/t2, numFiles=1, columns=[`a2`, `b2`, `c2`]]])
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (DRILL-3216) Fix existing INFORMATION_SCHEMA.COLUMNS columns
[ https://issues.apache.org/jira/browse/DRILL-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Barclay (Drill) updated DRILL-3216: -- Description: [Editing in progress] Change logical null from {{-1}} to actual {{NULL}}: - Change column {{CHARACTER_MAXIMUM_LENGTH}}. - Change column {{NUMERIC_PRECISION}}. - Change column {{NUMERIC_PRECISION_RADIX}}. - Change column {{NUMERIC_SCALE}}. Change column {{ORDINAL_POSITION}} from zero-based to one-based. Change column {{DATA_TYPE}} from short names (e.g., "CHAR") to specified names (e.g., "CHARACTER"). Fix data type names "INTERVAL_DAY_TIME" and "INTERVAL_YEAR_MONTH" to "INTERVAL": - Change column {{DATA_TYPE}} to list "INTERVAL" for interval types. - Add column {{INTERVAL_TYPE}}. Move {{CHAR}} length from {{NUMERIC_PRECISION}} to {{CHARACTER_MAXIMUM_LENGTH}} (same as {{VARCHAR}} length): - Change column {{NUMERIC_PRECISION}} from length to logical null for CHAR. - Change column {{CHARACTER_MAXIMUM_LENGTH}} from logical null to length for CHAR. Move {{BINARY}} and {{VARBINARY}} length from {{NUMERIC_PRECISION}} to {{CHARACTER_MAXIMUM_LENGTH}} (same as CHAR and VARCHAR length): - Change column {{NUMERIC_PRECISION}} from length to logical null for BINARY and VARBINARY. - Change column {{CHARACTER_MAXIMUM_LENGTH}} from logical null to length for BINARY and VARBINARY. To correct ordinal position of some existing columns: - Add column {{COLUMN_DEFAULT}}. - Add column {{CHARACTER_OCTET_LENGTH}}. - Reorder column {{NUMERIC_PRECISION}}. Move date/time and interval precisions from {{NUMERIC_PRECISION}} to {{DATETIME_PRECISION}} and {{INTERVAL_PRECISION}}: - Change column {{NUMERIC_PRECISION}} to logically null for date/time and interval types. - Add column {{DATETIME_PRECISION}}. - Add column {{INTERVAL_PRECISION}}. Implement {{NUMERIC_PRECISION_RADIX}}: - Change column {{NUMERIC_PRECISION_RADIX}} from always logically null to appropriate values (2, 10, NULL). 
Add missing numeric precision and scale values (for non-DECIMAL types): - Change NUMERIC_SCALE from logical null to zero for integer types. - Change NUMERIC_PRECISION from logical null to precision for non-DECIMAL numeric types. was: [Editing in progress] Change logical null from {{-1}} to actual {{NULL}}: - Change column {{CHARACTER_MAXIMUM_LENGTH}}. - Change column {{NUMERIC_PRECISION}}. - Change column {{NUMERIC_PRECISION_RADIX}}. - Change column {{NUMERIC_SCALE}}. Change column {{DATA_TYPE}} from short names (e.g., "CHAR") to specified names (e.g., "CHARACTER"). Change column {{ORDINAL_POSITION}} from zero-based to one-based. Move {{CHAR}} length from {{NUMERIC_PRECISION}} to {{CHARACTER_MAXIMUM_LENGTH}} (same as {{VARCHAR}} length): - Change column {{NUMERIC_PRECISION}} from length to logical null for CHAR. - Change column {{CHARACTER_MAXIMUM_LENGTH}} from logical null to length for CHAR. Move {{BINARY}} and {{VARBINARY}} length from {{NUMERIC_PRECISION}} to {{CHARACTER_MAXIMUM_LENGTH}} (same as CHAR and VARCHAR length): - Change column {{NUMERIC_PRECISION}} from length to logical null for BINARY and VARBINARY. - Change column {{CHARACTER_MAXIMUM_LENGTH}} from logical null to length for BINARY and VARBINARY. Fix data type names "INTERVAL_DAY_TIME" and "INTERVAL_YEAR_MONTH" to "INTERVAL": - Change column {{DATA_TYPE}} to list "INTERVAL" for interval types. - Add column {{INTERVAL_TYPE}}. To correct ordinal position of some existing columns: - Add column {{COLUMN_DEFAULT}}. - Add column {{CHARACTER_OCTET_LENGTH}}. - Reorder column {{NUMERIC_PRECISION}}. Move date/time and interval precisions from {{NUMERIC_PRECISION}} to {{DATETIME_PRECISION}} and {{INTERVAL_PRECISION}}: - Change column {{NUMERIC_PRECISION}} to logically null for date/time and interval types. - Add column {{DATETIME_PRECISION}}. - Add column {{INTERVAL_PRECISION}}. 
Implement {{NUMERIC_PRECISION_RADIX}}: - Change column {{NUMERIC_PRECISION_RADIX}} from always logically null to appropriate values (2, 10, NULL). Add missing numeric precision and scale values (for non-DECIMAL types): - Change NUMERIC_SCALE from logical null to zero for integer types. - Change NUMERIC_PRECISION from logical null to precision for non-DECIMAL numeric types. > Fix existing INFORMATION_SCHEMA.COLUMNS columns > --- > > Key: DRILL-3216 > URL: https://issues.apache.org/jira/browse/DRILL-3216 > Project: Apache Drill > Issue Type: Bug >Reporter: Daniel Barclay (Drill) > > [Editing in progress] > Change logical null from {{-1}} to actual {{NULL}}: > - Change column {{CHARACTER_MAXIMUM_LENGTH}}. > - Change column {{NUMERIC_PRECISION}}. > - Change column {{NUMERIC_PRECISION_RADIX}}. > - Change column {{NUMERIC_SCALE}}. > Change column {{ORDINAL_POSITION}} from zero-based to one-based. > Change column {{DATA_TYPE}} from short names (e.g., "CHAR"
[jira] [Updated] (DRILL-3216) Fix existing INFORMATION_SCHEMA.COLUMNS columns
[ https://issues.apache.org/jira/browse/DRILL-3216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Barclay (Drill) updated DRILL-3216: -- Description: [Editing in progress] Change logical null from {{-1}} to actual {{NULL}}: - Change column {{CHARACTER_MAXIMUM_LENGTH}}. - Change column {{NUMERIC_PRECISION}}. - Change column {{NUMERIC_PRECISION_RADIX}}. - Change column {{NUMERIC_SCALE}}. Change column {{DATA_TYPE}} from short names (e.g., "CHAR") to specified names (e.g., "CHARACTER"). Change column {{ORDINAL_POSITION}} from zero-based to one-based. Move {{CHAR}} length from {{NUMERIC_PRECISION}} to {{CHARACTER_MAXIMUM_LENGTH}} (same as {{VARCHAR}} length): - Change column {{NUMERIC_PRECISION}} from length to logical null for CHAR. - Change column {{CHARACTER_MAXIMUM_LENGTH}} from logical null to length for CHAR. Move {{BINARY}} and {{VARBINARY}} length from {{NUMERIC_PRECISION}} to {{CHARACTER_MAXIMUM_LENGTH}} (same as CHAR and VARCHAR length): - Change column {{NUMERIC_PRECISION}} from length to logical null for BINARY and VARBINARY. - Change column {{CHARACTER_MAXIMUM_LENGTH}} from logical null to length for BINARY and VARBINARY. Fix data type names "INTERVAL_DAY_TIME" and "INTERVAL_YEAR_MONTH" to "INTERVAL": - Change column {{DATA_TYPE}} to list "INTERVAL" for interval types. - Add column {{INTERVAL_TYPE}}. To correct ordinal position of some existing columns: - Add column {{COLUMN_DEFAULT}}. - Add column {{CHARACTER_OCTET_LENGTH}}. - Reorder column {{NUMERIC_PRECISION}}. Move date/time and interval precisions from {{NUMERIC_PRECISION}} to {{DATETIME_PRECISION}} and {{INTERVAL_PRECISION}}: - Change column {{NUMERIC_PRECISION}} to logically null for date/time and interval types. - Add column {{DATETIME_PRECISION}}. - Add column {{INTERVAL_PRECISION}}. Implement {{NUMERIC_PRECISION_RADIX}}: - Change column {{NUMERIC_PRECISION_RADIX}} from always logically null to appropriate values (2, 10, NULL). 
Add missing numeric precision and scale values (for non-DECIMAL types): - Change NUMERIC_SCALE from logical null to zero for integer types. - Change NUMERIC_PRECISION from logical null to precision for non-DECIMAL numeric types. was: [Editing in progress] Change logical null from {{-1}} to actual {{NULL}}: - Change column {{CHARACTER_MAXIMUM_LENGTH}}. - Change column {{NUMERIC_PRECISION}}. - Change column {{NUMERIC_PRECISION_RADIX}}. - Change column {{NUMERIC_SCALE}}. Change column {{ORDINAL_POSITION}} from zero-based to one-based. Move {{CHAR}} length from {{NUMERIC_PRECISION}} to {{CHARACTER_MAXIMUM_LENGTH}} (same as {{VARCHAR}} length): - Change column {{NUMERIC_PRECISION}} from length to logical null for CHAR. - Change column {{CHARACTER_MAXIMUM_LENGTH}} from logical null to length for CHAR. Move {{BINARY}} and {{VARBINARY}} length from {{NUMERIC_PRECISION}} to {{CHARACTER_MAXIMUM_LENGTH}} (same as CHAR and VARCHAR length): - Change column {{NUMERIC_PRECISION}} from length to logical null for BINARY and VARBINARY. - Change column {{CHARACTER_MAXIMUM_LENGTH}} from logical null to length for BINARY and VARBINARY. Fix data type names "INTERVAL_DAY_TIME" and "INTERVAL_YEAR_MONTH" to "INTERVAL": - Change column {{DATA_TYPE}} to list "INTERVAL" for interval types. - Add column {{INTERVAL_TYPE}}. To correct ordinal position of some existing columns: - Add column {{COLUMN_DEFAULT}}. - Add column {{CHARACTER_OCTET_LENGTH}}. - Reorder column {{NUMERIC_PRECISION}}. Move date/time and interval precisions from {{NUMERIC_PRECISION}} to {{DATETIME_PRECISION}} and {{INTERVAL_PRECISION}}: - Change column {{NUMERIC_PRECISION}} to logically null for date/time and interval types. - Add column {{DATETIME_PRECISION}}. - Add column {{INTERVAL_PRECISION}}. Implement {{NUMERIC_PRECISION_RADIX}}: - Change column {{NUMERIC_PRECISION_RADIX}} from always logically null to appropriate values (2, 10, NULL). 
Add missing numeric precision and scale values (for non-DECIMAL types): - Change NUMERIC_SCALE from logical null to zero for integer types. - Change NUMERIC_PRECISION from logical null to precision for non-DECIMAL numeric types. > Fix existing INFORMATION_SCHEMA.COLUMNS columns > --- > > Key: DRILL-3216 > URL: https://issues.apache.org/jira/browse/DRILL-3216 > Project: Apache Drill > Issue Type: Bug >Reporter: Daniel Barclay (Drill) > > [Editing in progress] > Change logical null from {{-1}} to actual {{NULL}}: > - Change column {{CHARACTER_MAXIMUM_LENGTH}}. > - Change column {{NUMERIC_PRECISION}}. > - Change column {{NUMERIC_PRECISION_RADIX}}. > - Change column {{NUMERIC_SCALE}}. > Change column {{DATA_TYPE}} from short names (e.g., "CHAR") to specified > names (e.g., "CHARACTER"). > Change column {{ORDINAL_POSITION}} from zero-based to one-based. > Move {{CHAR}} length from {{NUMERIC_PRECISION}} to > {
[jira] [Commented] (DRILL-2530) getColumns() doesn't return right COLUMN_SIZE for INTERVAL types
[ https://issues.apache.org/jira/browse/DRILL-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14567635#comment-14567635 ] Daniel Barclay (Drill) commented on DRILL-2530: --- Note: A change a while ago improved MetaImpl.getColumns() a little bit to return upper-bound values for COLUMN_SIZE for INTERVAL_YEAR_MONTH and INTERVAL_DAY_TIME (using maximum possible precisions in lieu of having actual precisions). > getColumns() doesn't return right COLUMN_SIZE for INTERVAL types > > > Key: DRILL-2530 > URL: https://issues.apache.org/jira/browse/DRILL-2530 > Project: Apache Drill > Issue Type: Bug > Components: Client - JDBC >Reporter: Daniel Barclay (Drill) >Assignee: Daniel Barclay (Drill) > Fix For: 1.2.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3230) Local file system plug-in must be disabled in distributed mode
Abhishek Girish created DRILL-3230: -- Summary: Local file system plug-in must be disabled in distributed mode Key: DRILL-3230 URL: https://issues.apache.org/jira/browse/DRILL-3230 Project: Apache Drill Issue Type: Bug Components: Client - HTTP Reporter: Abhishek Girish Assignee: Jacques Nadeau The local file system plug-in (the "file:///" connection string in the dfs storage plug-in) does not behave as expected for either CTAS or querying files when Drill is configured in distributed mode (multiple drillbits across nodes). In the case of CTAS, parquet files will be written to a specific node's local file system, depending on which drillbit the client connects to. And if the table is moderate to large in size, Drill may process it in a distributed manner and write data to more than one node - the data is partitioned across different nodes. In the case of queries, the behavior again depends on which drillbit the client connects to. Hence the behavior seen would be inconsistent - queries would return only partial results, depending on the drillbit connected to. My suggestion would be that the local file system plugin be disabled in distributed mode. With multiple drillbits and a centralized plugin for the local file system, consistent behavior cannot be expected. It should either be disabled when distributed mode is detected, or we could add support for multiple namespaces (using the IPs of nodes) with local file systems (which might still not fix all issues). Or maybe there are other ways to resolve this that I am overlooking or not aware of. There have been many issues seen on the user ML where inconsistent behaviors have been observed by users. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-3229) Create a new EmbeddedVector
Jacques Nadeau created DRILL-3229: - Summary: Create a new EmbeddedVector Key: DRILL-3229 URL: https://issues.apache.org/jira/browse/DRILL-3229 Project: Apache Drill Issue Type: Sub-task Reporter: Jacques Nadeau EmbeddedVector will leverage a binary encoding that holds type information for each individual field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
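One way to read "a binary encoding holding type information for each individual field" is a tagged-union layout: each stored value carries a one-byte type tag followed by its payload. The tag values and layout below are invented for illustration; they are not the DRILL-3229 design:

```python
import struct

# Hypothetical per-value tags; real EmbeddedVector encoding is TBD in the issue.
TAG_INT, TAG_FLOAT8 = 0, 1

def encode(value):
    """Pack a value as: 1-byte type tag + 8-byte little-endian payload."""
    if isinstance(value, int):
        return struct.pack('<Bq', TAG_INT, value)
    return struct.pack('<Bd', TAG_FLOAT8, value)

def decode(buf):
    """Read the tag byte, then interpret the payload accordingly."""
    tag = buf[0]
    if tag == TAG_INT:
        return struct.unpack('<q', buf[1:9])[0]
    return struct.unpack('<d', buf[1:9])[0]
```

The point of the tag is that a single vector can hold an INT in one row and a FLOAT8 in the next without a schema change, which is what the embedded-type umbrella (DRILL-3228) is after.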
[jira] [Created] (DRILL-3228) Implement Embedded Type
Jacques Nadeau created DRILL-3228: - Summary: Implement Embedded Type Key: DRILL-3228 URL: https://issues.apache.org/jira/browse/DRILL-3228 Project: Apache Drill Issue Type: Task Components: Execution - Codegen, Execution - Data Types, Execution - Relational Operators, Functions - Drill Reporter: Jacques Nadeau Assignee: Jacques Nadeau An Umbrella for the implementation of Embedded types within Drill. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3209) [Umbrella] Plan reads of Hive tables as native Drill reads when a native reader for the underlying table format exists
[ https://issues.apache.org/jira/browse/DRILL-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse updated DRILL-3209: --- Description: All reads against Hive are currently done through the Hive Serde interface, while this provides the most flexibility the API is not optimized for maximum performance while reading the data into Drill's native data structures. For Parquet and Text file backed tables, we can plan these reads as Drill native reads. Currently reads of these file types provide untyped data. While parquet has metadata in the file we currently do not make use of the type information while planning. For text files we read all of the files as lists of varchars. In both of these cases, casts will need to be injected to provide the same datatypes provided by the reads through the SerDe interface. > [Umbrella] Plan reads of Hive tables as native Drill reads when a native > reader for the underlying table format exists > -- > > Key: DRILL-3209 > URL: https://issues.apache.org/jira/browse/DRILL-3209 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization, Storage - Hive >Reporter: Jason Altekruse >Assignee: Jason Altekruse > > All reads against Hive are currently done through the Hive Serde interface, > while this provides the most flexibility the API is not optimized for maximum > performance while reading the data into Drill's native data structures. For > Parquet and Text file backed tables, we can plan these reads as Drill native > reads. Currently reads of these file types provide untyped data. While > parquet has metadata in the file we currently do not make use of the type > information while planning. For text files we read all of the files as lists > of varchars. In both of these cases, casts will need to be injected to > provide the same datatypes provided by the reads through the SerDe interface. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3209) [Umbrella] Plan reads of Hive tables as native Drill reads when a native reader for the underlying table format exists
[ https://issues.apache.org/jira/browse/DRILL-3209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Altekruse updated DRILL-3209: --- Description: All reads against Hive are currently done through the Hive Serde interface. While this provides the most flexibility, the API is not optimized for maximum performance while reading the data into Drill's native data structures. For Parquet and Text file backed tables, we can plan these reads as Drill native reads. Currently reads of these file types provide untyped data. While parquet has metadata in the file we currently do not make use of the type information while planning. For text files we read all of the files as lists of varchars. In both of these cases, casts will need to be injected to provide the same datatypes provided by the reads through the SerDe interface. (was: All reads against Hive are currently done through the Hive Serde interface, while this provides the most flexibility the API is not optimized for maximum performance while reading the data into Drill's native data structures. For Parquet and Text file backed tables, we can plan these reads as Drill native reads. Currently reads of these file types provide untyped data. While parquet has metadata in the file we currently do not make use of the type information while planning. For text files we read all of the files as lists of varchars. In both of these cases, casts will need to be injected to provide the same datatypes provided by the reads through the SerDe interface.) > [Umbrella] Plan reads of Hive tables as native Drill reads when a native > reader for the underlying table format exists > -- > > Key: DRILL-3209 > URL: https://issues.apache.org/jira/browse/DRILL-3209 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization, Storage - Hive >Reporter: Jason Altekruse >Assignee: Jason Altekruse > > All reads against Hive are currently done through the Hive Serde interface. 
> While this provides the most flexibility, the API is not optimized for > maximum performance while reading the data into Drill's native data > structures. For Parquet and Text file backed tables, we can plan these reads > as Drill native reads. Currently reads of these file types provide untyped > data. While parquet has metadata in the file we currently do not make use of > the type information while planning. For text files we read all of the files > as lists of varchars. In both of these cases, casts will need to be injected > to provide the same datatypes provided by the reads through the SerDe > interface. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
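The cast injection described in DRILL-3209 can be sketched in SQL. Drill exposes a delimited text file's fields as a varchar array named `columns`, so a plan that replaces a SerDe read with a native read would need casts to restore the Hive schema (the file path and column types below are hypothetical):
{code}
-- Native read of a text-backed table: every field arrives as a varchar
-- in the `columns` array, so casts recover the typed Hive columns.
SELECT CAST(columns[0] AS INT) AS employee_id,
       CAST(columns[1] AS VARCHAR(100)) AS full_name
FROM dfs.`/hypothetical/path/employees.csv`;
{code}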