[jira] [Resolved] (DRILL-3745) Hive CHAR not supported
[ https://issues.apache.org/jira/browse/DRILL-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arina Ielchiieva resolved DRILL-3745.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.6.0

> Hive CHAR not supported
> -----------------------
>
>                 Key: DRILL-3745
>                 URL: https://issues.apache.org/jira/browse/DRILL-3745
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: Nathaniel Auvil
>            Assignee: Arina Ielchiieva
>              Labels: doc-impacting
>             Fix For: 1.6.0
>
> It doesn’t look like Drill 1.1.0 supports the Hive CHAR type?
> In Hive:
> create table development.foo
> (
>   bad CHAR(10)
> );
> And then in sqlline:
> > use `hive.development`;
> > select * from foo;
> Error: PARSE ERROR: Unsupported Hive data type CHAR.
> Following Hive data types are supported in Drill INFORMATION_SCHEMA:
> BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP,
> BINARY, DECIMAL, STRING, VARCHAR, LIST, MAP, STRUCT and UNION
> [Error Id: 58bf3940-3c09-4ad2-8f52-d052dffd4b17 on dtpg05:31010]
> (state=,code=0)
> This was originally found when getting failures trying to connect via JDBC
> using Squirrel. We have the Hive plugin enabled with tables using CHAR.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
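The pre-fix failure mode can be sketched as a type-mapping lookup over the supported-type list quoted in the error message. This is a minimal illustrative sketch, not Drill's actual code; the function name and the treatment of CHAR like VARCHAR (suggested by the doc-impacting 1.6.0 resolution) are assumptions:

```python
# Hive types Drill reported as supported in the PARSE ERROR above.
SUPPORTED_HIVE_TYPES = {
    "BOOLEAN", "BYTE", "SHORT", "INT", "LONG", "FLOAT", "DOUBLE",
    "DATE", "TIMESTAMP", "BINARY", "DECIMAL", "STRING", "VARCHAR",
    "LIST", "MAP", "STRUCT", "UNION",
}

def drill_type_for(hive_type: str) -> str:
    """Hypothetical mapper: resolve a Hive column type to a Drill type."""
    # Strip any length/precision argument: "CHAR(10)" -> "CHAR".
    base = hive_type.split("(")[0].strip().upper()
    if base == "CHAR":
        # Assumed post-fix behavior: CHAR is accepted and handled
        # like VARCHAR; before the fix this branch did not exist and
        # CHAR fell through to the error below.
        return "VARCHAR"
    if base not in SUPPORTED_HIVE_TYPES:
        raise ValueError(f"Unsupported Hive data type {base}.")
    return base
```

With this sketch, `drill_type_for("CHAR(10)")` resolves instead of raising, which is the user-visible effect of the fix.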
[jira] [Commented] (DRILL-2223) Empty parquet file created with Limit 0 query errors out when querying
[ https://issues.apache.org/jira/browse/DRILL-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200792#comment-15200792 ]

Khurram Faraaz commented on DRILL-2223:
---------------------------------------

Yes, this is not reproducible on Drill 1.7.0. However, the CTAS that used a
LIMIT 0 query reports success, and because every successful CTAS creates a
valid parquet file, one would expect the CTAS to create an empty parquet file
that carries the metadata in the parquet footer with no actual data, since the
query was a LIMIT 0 query. Instead, querying the table afterwards fails:

{noformat}
0: jdbc:drill:schema=dfs.tmp> create table t_2223 as select firstName, lastName, isAlive, age, height_cm, address, phoneNumbers, hobbies from `employee.json` LIMIT 0;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 0                          |
+-----------+----------------------------+
1 row selected (0.31 seconds)
0: jdbc:drill:schema=dfs.tmp> select * from t_2223;
Error: VALIDATION ERROR: From line 1, column 15 to line 1, column 20: Table 't_2223' not found
SQL Query null
[Error Id: 18273406-da54-415d-b8fe-aa96c6cc3c85 on centos-01.qa.lab:31010] (state=,code=0)
{noformat}

> Empty parquet file created with Limit 0 query errors out when querying
> ----------------------------------------------------------------------
>
>                 Key: DRILL-2223
>                 URL: https://issues.apache.org/jira/browse/DRILL-2223
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 0.7.0
>            Reporter: Aman Sinha
>             Fix For: Future
>
> Doing a CTAS with limit 0 creates a 0 length parquet file which errors out
> during querying. This should at least write the schema information and
> metadata which will allow queries to run.
> {code}
> 0: jdbc:drill:zk=local> create table tt_nation2 as select n_nationkey, n_name, n_regionkey from cp.`tpch/nation.parquet` limit 0;
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 0_0       | 0                          |
> +-----------+----------------------------+
> 1 row selected (0.315 seconds)
> 0: jdbc:drill:zk=local> select n_nationkey from tt_nation2;
> Query failed: RuntimeException: file:/tmp/tt_nation2/0_0_0.parquet is not a
> Parquet file (too small)
> {code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
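The "is not a Parquet file (too small)" failure above comes from a structural minimum: a valid Parquet file starts and ends with the 4-byte magic `PAR1` and carries a footer (schema metadata plus a 4-byte footer length) before the trailing magic, so even a zero-row file has a minimum size. A zero-length CTAS output fails that check. A minimal sketch of the reader's sanity check (illustrative, not the actual parquet-mr code):

```python
def looks_like_parquet(data: bytes) -> bool:
    # A structurally valid Parquet file begins and ends with the magic
    # bytes "PAR1"; between the trailing footer-length field and both
    # magics, the smallest possible file is 12 bytes. A 0-byte file
    # (the LIMIT 0 CTAS output) is rejected as "too small".
    MAGIC = b"PAR1"
    return len(data) >= 12 and data.startswith(MAGIC) and data.endswith(MAGIC)
```

This is why the expected fix is to always write the footer with the schema, even when zero records are written.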
[jira] [Commented] (DRILL-4510) IllegalStateException: Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was holding vector class org.apache.dr
[ https://issues.apache.org/jira/browse/DRILL-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200148#comment-15200148 ]

ASF GitHub Bot commented on DRILL-4510:
---------------------------------------

GitHub user hsuanyi opened a pull request:

    https://github.com/apache/drill/pull/433

    DRILL-4510: Force Union-All to happen in a single node

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hsuanyi/incubator-drill DRILL-4510

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/433.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #433

commit 6c025c90ad23d16b0aefca5f03c033362f93
Author: Hsuan-Yi Chu
Date:   2016-03-16T04:52:17Z

    DRILL-4510: Force Union-All to happen in a single node

> IllegalStateException: Failure while reading vector. Expected vector class
> of org.apache.drill.exec.vector.NullableIntVector but was holding vector
> class org.apache.drill.exec.vector.NullableVarCharVector
> --------------------------------------------------------------------------
>
>                 Key: DRILL-4510
>                 URL: https://issues.apache.org/jira/browse/DRILL-4510
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>            Reporter: Chun Chang
>            Assignee: Sean Hsuan-Yi Chu
>            Priority: Critical
>
> Hit the following regression running advanced automation.
Regression happened > between commit b979bebe83d7017880b0763adcbf8eb80acfcee8 and > 1f23b89623c72808f2ee866cec9b4b8a48929d68 > {noformat} > Execution Failures: > /root/drillAutomation/framework-master/framework/resources/Advanced/tpcds/tpcds_sf100/original/query66.sql > Query: > -- start query 66 in stream 0 using template query66.tpl > SELECT w_warehouse_name, >w_warehouse_sq_ft, >w_city, >w_county, >w_state, >w_country, >ship_carriers, >year1, >Sum(jan_sales) AS jan_sales, >Sum(feb_sales) AS feb_sales, >Sum(mar_sales) AS mar_sales, >Sum(apr_sales) AS apr_sales, >Sum(may_sales) AS may_sales, >Sum(jun_sales) AS jun_sales, >Sum(jul_sales) AS jul_sales, >Sum(aug_sales) AS aug_sales, >Sum(sep_sales) AS sep_sales, >Sum(oct_sales) AS oct_sales, >Sum(nov_sales) AS nov_sales, >Sum(dec_sales) AS dec_sales, >Sum(jan_sales / w_warehouse_sq_ft) AS jan_sales_per_sq_foot, >Sum(feb_sales / w_warehouse_sq_ft) AS feb_sales_per_sq_foot, >Sum(mar_sales / w_warehouse_sq_ft) AS mar_sales_per_sq_foot, >Sum(apr_sales / w_warehouse_sq_ft) AS apr_sales_per_sq_foot, >Sum(may_sales / w_warehouse_sq_ft) AS may_sales_per_sq_foot, >Sum(jun_sales / w_warehouse_sq_ft) AS jun_sales_per_sq_foot, >Sum(jul_sales / w_warehouse_sq_ft) AS jul_sales_per_sq_foot, >Sum(aug_sales / w_warehouse_sq_ft) AS aug_sales_per_sq_foot, >Sum(sep_sales / w_warehouse_sq_ft) AS sep_sales_per_sq_foot, >Sum(oct_sales / w_warehouse_sq_ft) AS oct_sales_per_sq_foot, >Sum(nov_sales / w_warehouse_sq_ft) AS nov_sales_per_sq_foot, >Sum(dec_sales / w_warehouse_sq_ft) AS dec_sales_per_sq_foot, >Sum(jan_net) AS jan_net, >Sum(feb_net) AS feb_net, >Sum(mar_net) AS mar_net, >Sum(apr_net) AS apr_net, >Sum(may_net) AS may_net, >Sum(jun_net) AS jun_net, >Sum(jul_net) AS jul_net, >Sum(aug_net) AS aug_net, >Sum(sep_net) AS sep_net, >Sum(oct_net) AS oct_net, >Sum(nov_net) AS nov_net, >Sum(dec_net) AS dec_net > FROM (SELECT w_warehouse_name, >w_warehouse_sq_ft, >
[jira] [Resolved] (DRILL-4372) Drill Operators and Functions should correctly expose their types within Calcite
[ https://issues.apache.org/jira/browse/DRILL-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jinfeng Ni resolved DRILL-4372. --- Resolution: Fixed Fix Version/s: 1.7.0 Fixed in commit: c0293354ec79b42ff27ce4ad2113a2ff52a934bd > Drill Operators and Functions should correctly expose their types within > Calcite > > > Key: DRILL-4372 > URL: https://issues.apache.org/jira/browse/DRILL-4372 > Project: Apache Drill > Issue Type: Sub-task > Components: Query Planning & Optimization >Reporter: Sean Hsuan-Yi Chu >Assignee: Sean Hsuan-Yi Chu > Fix For: 1.7.0 > > > Currently, for most operators / functions, Drill would always claim the > return types being nullable-any. > However, in many cases (such as Hive, View, etc.), the types of input columns > are known. So, along with resolving to the correct operators / functions, we > can infer the output types at planning. > Having this mechanism can help speed up many applications, especially where > schemas alone are sufficient (e.g., Limit-0). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
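The benefit described above can be sketched as a toy return-type resolver. The rules below are purely illustrative assumptions, not Drill's or Calcite's actual inference logic; the point is only that once input column types are known, the planner can report a concrete output type instead of nullable-ANY, which is what lets schema-only (e.g. LIMIT 0) queries be answered at planning time:

```python
def inferred_return_type(fn: str, arg_types: tuple) -> str:
    """Hypothetical planner-side inference: concrete type when the
    inputs are known, ANY as the old catch-all fallback."""
    if fn == "concat" and all(t == "VARCHAR" for t in arg_types):
        return "VARCHAR"
    if fn == "+" and all(t == "INT" for t in arg_types):
        return "INT"
    # Pre-fix behavior for most operators: claim nullable-ANY.
    return "ANY"
```

With such rules, a LIMIT 0 query over columns of known type can return its schema without executing any fragment.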
[jira] [Commented] (DRILL-4510) IllegalStateException: Failure while reading vector. Expected vector class of org.apache.drill.exec.vector.NullableIntVector but was holding vector class org.apache.dr
[ https://issues.apache.org/jira/browse/DRILL-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197687#comment-15197687 ] Chun Chang commented on DRILL-4510: --- Now, with this commit id, 1.7.0-SNAPSHOT 050ff9679d99b5cdacc86f5501802c3d2a6dd3e3, the error message becomes: {noformat} Failed with exception java.sql.SQLException: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema changes Fragment 2:0 [Error Id: ebd53ba1-f281-4441-8b20-105bd8bb2e06 on atsqa6c88.qa.lab:31010] at org.apache.drill.jdbc.impl.DrillCursor.nextRowInternally(DrillCursor.java:247) at org.apache.drill.jdbc.impl.DrillCursor.next(DrillCursor.java:321) at oadd.net.hydromatic.avatica.AvaticaResultSet.next(AvaticaResultSet.java:187) at org.apache.drill.jdbc.impl.DrillResultSetImpl.next(DrillResultSetImpl.java:172) at org.apache.drill.test.framework.DrillTestJdbc.executeQuery(DrillTestJdbc.java:203) at org.apache.drill.test.framework.DrillTestJdbc.run(DrillTestJdbc.java:93) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: oadd.org.apache.drill.common.exceptions.UserRemoteException: UNSUPPORTED_OPERATION ERROR: Hash aggregate does not support schema changes Fragment 2:0 [Error Id: ebd53ba1-f281-4441-8b20-105bd8bb2e06 on atsqa6c88.qa.lab:31010] at oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:119) at oadd.org.apache.drill.exec.rpc.user.UserClient.handleReponse(UserClient.java:113) at oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:46) at oadd.org.apache.drill.exec.rpc.BasicClientWithConnection.handle(BasicClientWithConnection.java:31) at 
oadd.org.apache.drill.exec.rpc.RpcBus.handle(RpcBus.java:67) at oadd.org.apache.drill.exec.rpc.RpcBus$RequestEvent.run(RpcBus.java:374) at oadd.org.apache.drill.common.SerializedExecutor$RunnableProcessor.run(SerializedExecutor.java:89) at oadd.org.apache.drill.exec.rpc.RpcBus$SameExecutor.execute(RpcBus.java:252) at oadd.org.apache.drill.common.SerializedExecutor.execute(SerializedExecutor.java:123) at oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:285) at oadd.org.apache.drill.exec.rpc.RpcBus$InboundHandler.decode(RpcBus.java:257) at oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:89) at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) at oadd.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254) at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) at oadd.io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) at oadd.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242) at oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) at oadd.io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) at 
oadd.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) at oadd.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) at oadd.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) at oadd.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131) at oadd.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511) at oadd.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468) at
[jira] [Comment Edited] (DRILL-4398) SYSTEM ERROR: IllegalStateException: Memory was leaked by query
[ https://issues.apache.org/jira/browse/DRILL-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201810#comment-15201810 ]

Matt Keranen edited comment on DRILL-4398 at 3/18/16 5:25 PM:
--------------------------------------------------------------

Getting a similar error in 1.6.0 with CTAS into Parquet from csv data stored in HDFS:

{noformat}
Error: SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory leaked: (523264)
Allocator(op:1:12:5:ExternalSort) 2000/523264/343731840/357913941 (res/actual/peak/limit)

Fragment 1:12

[Error Id: be0fef1f-e02a-422e-808f-2fe171ae7875 on es05:31010] (state=,code=0)
{noformat}

was (Author: mattk):
Getting a similar error in 1.6.0 with CTAS into Parquet from csv data stored in HDFS.

> SYSTEM ERROR: IllegalStateException: Memory was leaked by query
> ---------------------------------------------------------------
>
>                 Key: DRILL-4398
>                 URL: https://issues.apache.org/jira/browse/DRILL-4398
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - JDBC
>    Affects Versions: 1.5.0
>            Reporter: N Campbell
>            Assignee: Taras Supyk
>
> Several queries fail with memory leaked errors
> select tjoin2.rnum, tjoin1.c1, tjoin2.c1 as c1j2, tjoin2.c2 as c2j2 from
> postgres.public.tjoin1 full outer join postgres.public.tjoin2 on tjoin1.c1 =
> tjoin2.c1
> select tjoin1.rnum, tjoin1.c1, tjoin2.c1 as c1j2, tjoin2.c2 from
> postgres.public.tjoin1, lateral ( select tjoin2.c1, tjoin2.c2 from
> postgres.public.tjoin2 where tjoin1.c1=tjoin2.c1) tjoin2
> SYSTEM ERROR: IllegalStateException: Memory was leaked by query.
Memory > leaked: (40960) > Allocator(op:0:0:3:JdbcSubScan) 100/40960/135168/100 > (res/actual/peak/limit) > create table TJOIN1 (RNUM integer not null , C1 integer, C2 integer); > insert into TJOIN1 (RNUM, C1, C2) values ( 0, 10, 15); > insert into TJOIN1 (RNUM, C1, C2) values ( 1, 20, 25); > insert into TJOIN1 (RNUM, C1, C2) values ( 2, NULL, 50); > create table TJOIN2 (RNUM integer not null , C1 integer, C2 char(2)); > insert into TJOIN2 (RNUM, C1, C2) values ( 0, 10, 'BB'); > insert into TJOIN2 (RNUM, C1, C2) values ( 1, 15, 'DD'); > insert into TJOIN2 (RNUM, C1, C2) values ( 2, NULL, 'EE'); > insert into TJOIN2 (RNUM, C1, C2) values ( 3, 10, 'FF'); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
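The "Memory was leaked" errors above are raised when an operator's allocator is closed while allocations are still outstanding, and the message reports the reservation/actual/peak/limit counters seen in the logs. A toy accountant illustrating that check (names and structure are illustrative assumptions, not Drill's actual BufferAllocator):

```python
class Allocator:
    """Toy per-operator memory accountant: close() with a nonzero
    outstanding balance raises, mirroring the error shape above."""
    def __init__(self, name: str, reservation: int, limit: int):
        self.name, self.reservation, self.limit = name, reservation, limit
        self.actual = 0   # bytes currently allocated
        self.peak = 0     # high-water mark
    def allocate(self, n: int) -> None:
        self.actual += n
        self.peak = max(self.peak, self.actual)
    def release(self, n: int) -> None:
        self.actual -= n
    def close(self) -> None:
        if self.actual != 0:
            raise RuntimeError(
                "Memory was leaked by query. Memory leaked: (%d) "
                "Allocator(%s) %d/%d/%d/%d (res/actual/peak/limit)"
                % (self.actual, self.name, self.reservation,
                   self.actual, self.peak, self.limit))
```

In the JdbcSubScan case above, 40960 bytes were still allocated at close, against a 100-byte reservation and limit.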
[jira] [Commented] (DRILL-4338) Concurrent query remains in CANCELLATION_REQUESTED state
[ https://issues.apache.org/jira/browse/DRILL-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197905#comment-15197905 ] Khurram Faraaz commented on DRILL-4338: --- Problem is reproducible on Drill 1.6.0, JDK 7 and git commit ID : 64ab0a8ec9d98bf96f4d69274dddc180b8efe263 > Concurrent query remains in CANCELLATION_REQUESTED state > - > > Key: DRILL-4338 > URL: https://issues.apache.org/jira/browse/DRILL-4338 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Flow >Affects Versions: 1.4.0 > Environment: 4 node cluster CentOS >Reporter: Khurram Faraaz > Attachments: ConcurrencyTest.java, > query_In_cancellation_requested_state.png > > > Execute a query concurrently through a Java program and while the java > program is under execution (executing SQL queries concurrently) issue Ctrl-C > on the prompt where the java program was being executed. > Here are two observations, > (1) There is an Exception in drillbit.log. > (2) Once Ctrl-C was issued to the java program, queries that were under > execution at that point of time, move from FAILED state to > CANCELLATION_REQUESTED state, they do not end up in CANCELED state. Ideally > that last state of these queries should be CANCELED state and not > CANCELLATION_REQUESTED. > Snippet from drillbit.log > {noformat} > 2016-02-02 06:21:21,903 [294fb51d-8a4c-c099-dc90-97434056e3d7:frag:0:0] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 294fb51d-8a4c-c099-dc90-97434056e3d7:0:0: State change requested > AWAITING_ALLOCATION --> RUNNING > 2016-02-02 06:21:21,903 [294fb51d-8a4c-c099-dc90-97434056e3d7:frag:0:0] INFO > o.a.d.e.w.f.FragmentStatusReporter - > 294fb51d-8a4c-c099-dc90-97434056e3d7:0:0: State to report: RUNNING > 2016-02-02 06:21:48,560 [UserServer-1] ERROR > o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. > Connection: /10.10.100.201:31010 <--> /10.10.100.201:45087 (user client). > Closing connection. 
> java.io.IOException: syscall:read(...)() failed: Connection reset by peer > 2016-02-02 06:21:48,562 [UserServer-1] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 294fb51d-8a4c-c099-dc90-97434056e3d7:0:0: State change requested RUNNING --> > FAILED > 2016-02-02 06:21:48,562 [UserServer-1] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 294fb51d-f424-6adc-d668-1659e4353698:0:0: State change requested RUNNING --> > FAILED > 2016-02-02 06:21:48,562 [UserServer-1] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 294fb51d-c7f6-2c8f-0689-af9de21a6d20:0:0: State change requested RUNNING --> > FAILED > 2016-02-02 06:21:48,563 [UserServer-1] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 294fb51e-5de9-0919-be56-52f75a0532f1:0:0: State change requested RUNNING --> > FAILED > 2016-02-02 06:21:48,573 [CONTROL-rpc-event-queue] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 294fb51d-f424-6adc-d668-1659e4353698:0:0: State change requested FAILED --> > CANCELLATION_REQUESTED > 2016-02-02 06:21:48,573 [CONTROL-rpc-event-queue] WARN > o.a.d.e.w.fragment.FragmentExecutor - > 294fb51d-f424-6adc-d668-1659e4353698:0:0: Ignoring unexpected state > transition FAILED --> CANCELLATION_REQUESTED > 2016-02-02 06:21:48,580 [CONTROL-rpc-event-queue] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 294fb51e-5de9-0919-be56-52f75a0532f1:0:0: State change requested FAILED --> > CANCELLATION_REQUESTED > 2016-02-02 06:21:48,580 [CONTROL-rpc-event-queue] WARN > o.a.d.e.w.fragment.FragmentExecutor - > 294fb51e-5de9-0919-be56-52f75a0532f1:0:0: Ignoring unexpected state > transition FAILED --> CANCELLATION_REQUESTED > 2016-02-02 06:21:48,588 [CONTROL-rpc-event-queue] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 294fb51d-c7f6-2c8f-0689-af9de21a6d20:0:0: State change requested FAILED --> > CANCELLATION_REQUESTED > 2016-02-02 06:21:48,588 [CONTROL-rpc-event-queue] WARN > o.a.d.e.w.fragment.FragmentExecutor - > 294fb51d-c7f6-2c8f-0689-af9de21a6d20:0:0: Ignoring unexpected state > transition FAILED --> 
CANCELLATION_REQUESTED > 2016-02-02 06:21:48,596 [CONTROL-rpc-event-queue] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 294fb51d-8a4c-c099-dc90-97434056e3d7:0:0: State change requested FAILED --> > CANCELLATION_REQUESTED > 2016-02-02 06:21:48,596 [CONTROL-rpc-event-queue] WARN > o.a.d.e.w.fragment.FragmentExecutor - > 294fb51d-8a4c-c099-dc90-97434056e3d7:0:0: Ignoring unexpected state > transition FAILED --> CANCELLATION_REQUESTED > 2016-02-02 06:21:48,597 [UserServer-1] INFO > o.a.d.e.w.fragment.FragmentExecutor - > 294fb51d-f424-6adc-d668-1659e4353698:0:0: State change requested FAILED --> > FAILED > 2016-02-02 06:21:48,599 [UserServer-1] WARN > o.a.d.exec.rpc.RpcExceptionHandler - Exception occurred with closed channel. > Connection: /10.10.100.201:31010 <-->
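The log lines above show fragments reaching FAILED and then ignoring a later CANCELLATION_REQUESTED ("Ignoring unexpected state transition"). A hypothetical sketch of such a transition guard (not Drill's actual FragmentExecutor code; the transition table is an assumption inferred from the logs) makes the reported symptom concrete: once FAILED is treated as terminal, a late cancellation request cannot move the fragment to CANCELLED:

```python
# Assumed transition table; terminal states accept no further moves.
VALID_TRANSITIONS = {
    "AWAITING_ALLOCATION": {"RUNNING", "FAILED"},
    "RUNNING": {"FINISHED", "FAILED", "CANCELLATION_REQUESTED"},
    "CANCELLATION_REQUESTED": {"CANCELLED", "FAILED"},
    "FAILED": set(),
    "FINISHED": set(),
    "CANCELLED": set(),
}

def request_transition(current: str, requested: str) -> str:
    """Apply a requested state change, ignoring invalid ones
    (cf. "Ignoring unexpected state transition FAILED -->
    CANCELLATION_REQUESTED" in drillbit.log)."""
    if requested in VALID_TRANSITIONS.get(current, set()):
        return requested
    return current  # logged and ignored
```

This matches the observation in the report: the last state of the queries should be CANCELED, but the fragments are already FAILED when cancellation arrives.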
[jira] [Commented] (DRILL-4317) Exceptions on SELECT and CTAS with large CSV files
[ https://issues.apache.org/jira/browse/DRILL-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197383#comment-15197383 ] Deneche A. Hakim commented on DRILL-4317: - [~hgunes] can you please review ? thanks > Exceptions on SELECT and CTAS with large CSV files > -- > > Key: DRILL-4317 > URL: https://issues.apache.org/jira/browse/DRILL-4317 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Text & CSV >Affects Versions: 1.4.0, 1.5.0 > Environment: 4 node cluster, Hadoop 2.7.0, 14.04.1-Ubuntu >Reporter: Matt Keranen >Assignee: Hanifi Gunes > Fix For: 1.7.0 > > > Selecting from a CSV file or running a CTAS into Parquet generates exceptions. > Source file is ~650MB, a table of 4 key columns followed by 39 numeric data > columns, otherwise a fairly simple format. Example: > {noformat} > 2015-10-17 > 00:00,f5e9v8u2,err,fr7,226020793,76.094,26307,226020793,76.094,26307, > 2015-10-17 > 00:00,c3f9x5z2,err,mi1,1339159295,216.004,177690,1339159295,216.004,177690, > 2015-10-17 > 00:00,r5z2f2i9,err,mi1,7159994629,39718.011,65793,6142021303,30687.811,64630,143777403,40.521,146,75503742,41.905,89,170771174,168.165,198,192565529,370.475,222,97577280,318.068,120,62631452,288.253,68,32371173,189.527,39,41712265,299.184,46,39046408,363.418,47,34182318,465.343,43,127834582,6485.341,145 > 2015-10-17 > 00:00,j9s6i8t2,err,fr7,20580443899,277445.055,67826,2814893469,85447.816,54275,2584757097,608.001,2044,1395571268,769.113,1051,3070616988,3000.005,2284,3413811671,6489.060,2569,1772235156,5806.214,1339,1097879284,5064.120,858,691884865,4035.397,511,672967845,4815.875,518,789163614,7306.684,599,813910495,10632.464,627,1462752147,143470.306,1151 > {noformat} > A "SELECT from `/path/to/file.csv`" runs for 10's of minutes and eventually > results in: > {noformat} > java.lang.IndexOutOfBoundsException: index: 547681, length: 1 (expected: > range(0, 547681)) > at > io.netty.buffer.AbstractByteBuf.checkIndex(AbstractByteBuf.java:1134) > at > 
io.netty.buffer.PooledUnsafeDirectByteBuf.getBytes(PooledUnsafeDirectByteBuf.java:136) > at io.netty.buffer.WrappedByteBuf.getBytes(WrappedByteBuf.java:289) > at > io.netty.buffer.UnsafeDirectLittleEndian.getBytes(UnsafeDirectLittleEndian.java:26) > at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586) > at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586) > at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586) > at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586) > at > org.apache.drill.exec.vector.VarCharVector$Accessor.get(VarCharVector.java:443) > at > org.apache.drill.exec.vector.accessor.VarCharAccessor.getBytes(VarCharAccessor.java:125) > at > org.apache.drill.exec.vector.accessor.VarCharAccessor.getString(VarCharAccessor.java:146) > at > org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:136) > at > org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:94) > at > org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:148) > at > org.apache.drill.jdbc.impl.TypeConvertingSqlAccessor.getObject(TypeConvertingSqlAccessor.java:795) > at > org.apache.drill.jdbc.impl.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:179) > at > net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.getObject(DrillResultSetImpl.java:420) > at sqlline.Rows$Row.(Rows.java:157) > at sqlline.IncrementalRows.hasNext(IncrementalRows.java:63) > at > sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87) > at sqlline.TableOutputFormat.print(TableOutputFormat.java:118) > at sqlline.SqlLine.print(SqlLine.java:1593) > at sqlline.Commands.execute(Commands.java:852) > at sqlline.Commands.sql(Commands.java:751) > at sqlline.SqlLine.dispatch(SqlLine.java:746) > at sqlline.SqlLine.begin(SqlLine.java:621) > at sqlline.SqlLine.start(SqlLine.java:375) > at 
sqlline.SqlLine.main(SqlLine.java:268) > {noformat} > A CTAS on the same file with storage as Parquet results in: > {noformat} > Error: SYSTEM ERROR: IllegalArgumentException: length: -260 (expected: >= 0) > Fragment 1:2 > [Error Id: 1807615e-4385-4f85-8402-5900aaa568e9 on es07:31010] > (java.lang.IllegalArgumentException) length: -260 (expected: >= 0) > io.netty.buffer.AbstractByteBuf.checkIndex():1131 > io.netty.buffer.PooledUnsafeDirectByteBuf.nioBuffer():344 >
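Both stack traces above fail the same bounds contract: the SELECT hits `index: 547681, length: 1 (expected: range(0, 547681))` (a read one byte past the end of a 547681-byte buffer), and the CTAS hits `length: -260 (expected: >= 0)` (a negative read length). A small sketch of that contract, mirroring netty's `AbstractByteBuf.checkIndex` behavior (illustrative Python, not the Java implementation):

```python
def check_index(index: int, length: int, capacity: int) -> None:
    """Reject any read of `length` bytes at `index` that does not lie
    entirely within [0, capacity) -- negative lengths included."""
    if index < 0 or length < 0 or index + length > capacity:
        raise IndexError(
            f"index: {index}, length: {length} (expected: range(0, {capacity}))")
```

Both observed failures are therefore symptoms of the reader computing an offset or length outside the vector's buffer, not of corrupt input data.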
[jira] [Updated] (DRILL-4518) Two or more columns present in IN predicate, query returns wrong results.
[ https://issues.apache.org/jira/browse/DRILL-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Khurram Faraaz updated DRILL-4518:
----------------------------------
    Attachment: f_20160316.json

attached data file used in test.

> Two or more columns present in IN predicate, query returns wrong results.
> --------------------------------------------------------------------------
>
>                 Key: DRILL-4518
>                 URL: https://issues.apache.org/jira/browse/DRILL-4518
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.7.0
>         Environment: 4 node cluster CentOS
>            Reporter: Khurram Faraaz
>         Attachments: f_20160316.json
>
> Two or more columns present in IN predicate, query returns wrong results.
> Drill 1.7.0-SNAPSHOT git commit ID: 245da979
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> alter system set `store.json.all_text_mode`=true;
> +-------+------------------------------------+
> |  ok   |              summary               |
> +-------+------------------------------------+
> | true  | store.json.all_text_mode updated.  |
> +-------+------------------------------------+
> 1 row selected (0.15 seconds)
> 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM `f_20160316.json` t WHERE (t.c1) IN (1234,345643);
> +-------+
> |  c1   |
> +-------+
> | 1234  |
> +-------+
> 1 row selected (0.292 seconds)
> 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM `f_20160316.json` t WHERE (t.c2) IN (1234,345643);
> +-------+
> |  c1   |
> +-------+
> | null  |
> +-------+
> 1 row selected (0.224 seconds)
> 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM `f_20160316.json` t WHERE (t.c1,t.c2) IN (1234,345643);
> Error: VALIDATION ERROR: From line 1, column 35 to line 1, column 68: Values passed to IN operator must have compatible types
> SQL Query null
> [Error Id: 740e94a7-b61b-4dbf-96f3-8166c4f94164 on centos-04.qa.lab:31010] (state=,code=0)
> {noformat}
> Stack trace from drillbit.log for above failure.
> 2016-03-17 06:57:40,227 [2915aa9b-381a-119d-2814-711fea9dd07c:foreman] INFO > o.a.drill.exec.work.foreman.Foreman - Query text for query id > 2915aa9b-381a-119d-2814-711fea9dd07c: SELECT * FROM `f_20160316.json` t WHERE > (t.c1,t.c2) IN (1234,345643) > 2016-03-17 06:57:40,286 [2915aa9b-381a-119d-2814-711fea9dd07c:foreman] INFO > o.a.d.exec.planner.sql.SqlConverter - User Error Occurred > org.apache.drill.common.exceptions.UserException: VALIDATION ERROR: From line > 1, column 35 to line 1, column 68: Values passed to IN operator must have > compatible types > SQL Query null > [Error Id: 740e94a7-b61b-4dbf-96f3-8166c4f94164 ] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) > ~[drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] > at > org.apache.drill.exec.planner.sql.SqlConverter.validate(SqlConverter.java:157) > [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] > at > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode(DefaultSqlHandler.java:581) > [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] > at > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:192) > [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] > at > org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:164) > [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] > at > org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:94) > [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] > at > org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:927) > [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] > at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:251) > [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > [na:1.7.0_45] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > 
[na:1.7.0_45] > at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] > Caused by: org.apache.calcite.runtime.CalciteContextException: From line 1, > column 35 to line 1, column 68: Values passed to IN operator must have > compatible types > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) ~[na:1.7.0_45] > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) > ~[na:1.7.0_45] > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > ~[na:1.7.0_45] > at java.lang.reflect.Constructor.newInstance(Constructor.java:526) > ~[na:1.7.0_45] > at >
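The validation error in the session above is an arity mismatch: a multi-column IN compares a row constructor against a list of same-arity rows, while the failing query compared the two-column row `(t.c1, t.c2)` against the scalars `(1234, 345643)`. The semantics can be modeled as tuple membership (data values below are illustrative, not from the attached f_20160316.json):

```python
# Each row is the tuple (c1, c2); all_text_mode makes both columns strings.
rows = [("1234", "345643"), ("7777", "8888")]

# Well-formed multi-column IN: the targets have the same arity as the
# row constructor, e.g. WHERE (c1, c2) IN (('1234','345643'), ('0','1')).
targets = {("1234", "345643"), ("0", "1")}
matches = [r for r in rows if r in targets]
```

Mixing a 2-tuple row with scalar targets has no well-defined comparison, which is what the "Values passed to IN operator must have compatible types" check rejects.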
[jira] [Updated] (DRILL-2282) Eliminate spaces, special characters from names in function templates
[ https://issues.apache.org/jira/browse/DRILL-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vitalii Diravka updated DRILL-2282: --- Issue Type: Improvement (was: Bug) > Eliminate spaces, special characters from names in function templates > - > > Key: DRILL-2282 > URL: https://issues.apache.org/jira/browse/DRILL-2282 > Project: Apache Drill > Issue Type: Improvement > Components: Functions - Drill >Reporter: Mehant Baid >Assignee: Vitalii Diravka > Fix For: 1.7.0 > > Attachments: DRILL-2282-updated.patch, DRILL-2282.patch > > > Having spaces in the name of the functions causes issues while deserializing > such expressions when we try to read the plan fragment. As part of this JIRA > would like to clean up all the templates to not include special characters in > their names. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3745) Hive CHAR not supported
[ https://issues.apache.org/jira/browse/DRILL-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197552#comment-15197552 ]

Arina Ielchiieva commented on DRILL-3745:
-----------------------------------------

Commit id - dd4f03b.

> Hive CHAR not supported
> -----------------------
>
>                 Key: DRILL-3745
>                 URL: https://issues.apache.org/jira/browse/DRILL-3745
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: Nathaniel Auvil
>            Assignee: Arina Ielchiieva
>              Labels: doc-impacting
>
> It doesn’t look like Drill 1.1.0 supports the Hive CHAR type?
> In Hive:
> create table development.foo
> (
>   bad CHAR(10)
> );
> And then in sqlline:
> > use `hive.development`;
> > select * from foo;
> Error: PARSE ERROR: Unsupported Hive data type CHAR.
> Following Hive data types are supported in Drill INFORMATION_SCHEMA:
> BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP,
> BINARY, DECIMAL, STRING, VARCHAR, LIST, MAP, STRUCT and UNION
> [Error Id: 58bf3940-3c09-4ad2-8f52-d052dffd4b17 on dtpg05:31010]
> (state=,code=0)
> This was originally found when getting failures trying to connect via JDBC
> using Squirrel. We have the Hive plugin enabled with tables using CHAR.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4443) MIN/MAX on VARCHAR throw a NullPointerException
[ https://issues.apache.org/jira/browse/DRILL-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Ollala updated DRILL-4443:
---------------------------------
    Reviewer: Khurram Faraaz

> MIN/MAX on VARCHAR throw a NullPointerException
> -----------------------------------------------
>
>                 Key: DRILL-4443
>                 URL: https://issues.apache.org/jira/browse/DRILL-4443
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.6.0
>         Environment: 4 node cluster CentOS
>            Reporter: Khurram Faraaz
>            Assignee: Deneche A. Hakim
>            Priority: Critical
>             Fix For: 1.6.0
>
>         Attachments: DRILL_4443.parquet, test4443.csv
>
> Using a simple csv file that contains at least 2 groups of rows:
> {noformat}
> a,
> a,
> a,
> b,
> {noformat}
> Running a query with min/max throws a NullPointerException:
> {noformat}
> SELECT MIN(columns[1]) FROM `test4443.csv` GROUP BY columns[0];
> Error: SYSTEM ERROR: NullPointerException
> ...
> {noformat}
> {noformat}
> SELECT MAX(columns[1]) FROM `test4443.csv` GROUP BY columns[0];
> Error: SYSTEM ERROR: NullPointerException
> ...
> {noformat}
> The problem is caused by {{VarCharAggrFunctions.java}} that is not resetting
> its internal buffer properly.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
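The bug class described above (an aggregate function failing to reset its internal buffer between groups) can be illustrated with a toy grouped-MIN aggregator. This is an illustrative sketch, not Drill's actual VarCharAggrFunctions code, and the group values are made up:

```python
class VarCharMinAggregator:
    """Toy MIN-over-strings aggregator. If reset() is skipped between
    groups, the previous group's value leaks into the next result --
    the kind of stale-state bug the report attributes to the buffer
    not being reset properly."""
    def __init__(self):
        self.reset()
    def reset(self) -> None:
        self.value = None  # must be cleared for every new group
    def add(self, v: str) -> None:
        if self.value is None or v < self.value:
            self.value = v

# Correct use over grouped rows: one reset() per group.
groups = {"a": ["z", "x"], "b": ["y"]}
agg = VarCharMinAggregator()
results = {}
for key, vals in groups.items():
    agg.reset()
    for v in vals:
        agg.add(v)
    results[key] = agg.value
```

With the reset in place each group's MIN is computed independently; dropping the `agg.reset()` call would instead carry group "a"'s minimum into group "b".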
[jira] [Updated] (DRILL-4474) Inconsistent behavior while using COUNT in select (Apache Drill 1.2.0)
[ https://issues.apache.org/jira/browse/DRILL-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4474: - Reviewer: Khurram Faraaz > Inconsistent behavior while using COUNT in select (Apache Drill 1.2.0) > -- > > Key: DRILL-4474 > URL: https://issues.apache.org/jira/browse/DRILL-4474 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.2.0, 1.5.0 > Environment: m3.xlarge AWS instances ( 3 nodes) > CentOS6.5 x64 >Reporter: Shankar >Assignee: Jacques Nadeau >Priority: Blocker > Fix For: 1.6.0 > > > {quote} > * We are using Drill to retrieve business data from game analytics. > * We are running the queries below on a table of size 50GB (parquet). > * We have found some major inconsistencies in the data when we use the COUNT function. > * Below are the case-by-case queries and their output. {color:blue}*Please > analyse them carefully for a clear understanding of the behaviour.*{color} > * Please let me know how to resolve this (or whether an earlier JIRA has > already been created). > * Hopefully this can be fixed in a later version. If not, please do the needful. > {quote} > -- > CASE-1 (Wrong result) > -- > {color:red} > {quote} > {noformat} > 0: jdbc:drill:> select > . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = > 'Click' then sessionid end) as cnt > . . . . . . . > from dfs.tmp.a_games_log_visit_base t > . . . . . . . > ; > +---+ > | count | > +---+ > | 27645752 | > +---+ > 1 row selected (0.281 seconds) > {noformat} > {quote} > {color} > -- > CASE-2 (Wrong result) > -- > {color:red} > {quote} > {noformat} > 0: jdbc:drill:> select > . . . . . . . > count(sessionid), > . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = > 'Click' then sessionid end) as cnt > . . . . . . . > from dfs.tmp.a_games_log_visit_base t > . . . . . . . 
> ; > +---+---+ > | EXPR$0 | cnt | > +---+---+ > | 37772844 | 2108 | > +---+---+ > 1 row selected (12.597 seconds) > {noformat} > {quote} > {color} > -- > CASE-3 (Wrong result, only first count is correct) > -- > {color:red} > {quote} > {noformat} > 0: jdbc:drill:> select > . . . . . . . > count(distinct sessionid), > . . . . . . . > count(case when t.id = '/confirmDrop/btnYes/' and t.event = > 'Click' then sessionid end) as cnt > . . . . . . . > from dfs.tmp.a_games_log_visit_base t > . . . . . . . > ; > +-+---+ > | EXPR$0 |cnt| > +-+---+ > | 201941 | 37772844 | > +-+---+ > 1 row selected (8.259 seconds) > {noformat} > {quote} > {color} > -- > CASE-4 (Correct result) > -- > {color:green} > {quote} > {noformat} > 0: jdbc:drill:> select > . . . . . . . > count(distinct case when t.id = '/confirmDrop/btnYes/' and > t.event = 'Click' then sessionid end) as cnt > . . . . . . . > from dfs.tmp.a_games_log_visit_base t > . . . . . . . > ; > +--+ > | cnt | > +--+ > | 525 | > +--+ > 1 row selected (14.318 seconds) > {noformat} > {quote} > {color} > -- > CASE-5 (Correct result) > -- > {color:green} > {quote} > {noformat} > 0: jdbc:drill:> select > . . . . . . . > count(sessionid), > . . . . . . . > count(distinct sessionid) > . . . . . . . > from
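For reference, COUNT over a CASE expression counts only the rows where the CASE yields a non-NULL value, so the conditional count should come out the same no matter which other aggregates share the select list. A minimal sketch of what the cases above are each computing (table and column names are taken from the report; results are not reproduced here):

```sql
-- COUNT over a CASE counts rows where the CASE yields non-NULL,
-- i.e. rows matching both predicates. Adding count(sessionid) or
-- count(distinct sessionid) alongside it must not change that value.
SELECT
  COUNT(sessionid)          AS total_rows,        -- non-NULL sessionids
  COUNT(DISTINCT sessionid) AS distinct_sessions, -- unique sessionids
  COUNT(CASE WHEN t.id = '/confirmDrop/btnYes/'
              AND t.event = 'Click'
             THEN sessionid END) AS cnt           -- matching rows only
FROM dfs.tmp.a_games_log_visit_base t;
```

Any pair of these queries disagreeing on a shared column, as in the cases above, indicates a planner or execution bug rather than a semantic subtlety.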
[jira] [Updated] (DRILL-4519) File system directory-based partition pruning doesn't work correctly with parquet metadata
[ https://issues.apache.org/jira/browse/DRILL-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miroslav Holubec updated DRILL-4519: Description: We have parquet files in folders with the following convention YYYY/MM/DD/HH. Without Drill's parquet metadata, directory pruning works seamlessly. {noformat} select dir0, dir1, dir2 from hdfs.test.indexed; dir0 = YYYY, dir1 = MM, dir2 = DD, dir3 = HH {noformat} After creating metadata and executing the same query, dir0 contains the HH folder name instead of the yearly folder name. dir1...3 are null. {noformat} refresh table metadata hdfs.test.indexed; select dir0, dir1, dir2 from hdfs.test.indexed; dir0 = HH, dir1 = null, dir2 = null, dir3 = null {noformat} was: We have parquet files in folders with the following convention YYYY/MM/DD/HH. Without Drill's parquet metadata, directory pruning works seamlessly. {noformat} select dir0, dir1, dir2 from hdfs.test.indexed; dir0 = YYYY, dir1 = MM, dir2 = DD, dir3 = HH {noformat} After creating metadata and executing the same query, dir0 contains the HH folder name instead of the yearly folder name. dir1...3 are null. {noformat} select dir0, dir1, dir2 from hdfs.test.indexed; dir0 = HH, dir1 = null, dir2 = null, dir3 = null {noformat} > File system directory-based partition pruning doesn't work correctly with > parquet metadata > -- > > Key: DRILL-4519 > URL: https://issues.apache.org/jira/browse/DRILL-4519 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.4.0, 1.5.0 >Reporter: Miroslav Holubec > > We have parquet files in folders with the following convention YYYY/MM/DD/HH. > Without Drill's parquet metadata, directory pruning works seamlessly. > {noformat} > select dir0, dir1, dir2 from hdfs.test.indexed; > dir0 = YYYY, dir1 = MM, dir2 = DD, dir3 = HH > {noformat} > After creating metadata and executing the same query, dir0 contains the HH folder > name instead of the yearly folder name. dir1...3 are null. 
> {noformat} > refresh table metadata hdfs.test.indexed; > select dir0, dir1, dir2 from hdfs.test.indexed; > dir0 = HH, dir1 = null, dir2 = null, dir3 = null > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-3562) Query fails when using flatten on JSON data where some documents have an empty array
[ https://issues.apache.org/jira/browse/DRILL-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201352#comment-15201352 ] Ian Hellstrom commented on DRILL-3562: -- Is this a duplicate of [DRILL-2217|http://issues.apache.org/jira/browse/DRILL-2217]? > Query fails when using flatten on JSON data where some documents have an > empty array > > > Key: DRILL-3562 > URL: https://issues.apache.org/jira/browse/DRILL-3562 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JSON >Affects Versions: 1.1.0 >Reporter: Philip Deegan > Fix For: Future > > > Drill query fails when using flatten when some records contain an empty array > {noformat} > SELECT COUNT(*) FROM (SELECT FLATTEN(t.a.b.c) AS c FROM dfs.`flat.json` t) > flat WHERE flat.c.d.e = 'f' limit 1; > {noformat} > Succeeds on > { "a": { "b": { "c": [ { "d": { "e": "f" } } ] } } } > Fails on > { "a": { "b": { "c": [] } } } > Error > {noformat} > Error: SYSTEM ERROR: ClassCastException: Cannot cast > org.apache.drill.exec.vector.NullableIntVector to > org.apache.drill.exec.vector.complex.RepeatedValueVector > {noformat} > Is it possible to ignore the empty arrays, or do they need to be populated > with dummy data? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
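One workaround sometimes suggested for flatten-over-empty-array failures (not verified against this exact bug) is to filter out records whose array is empty before FLATTEN runs, e.g. with Drill's repeated_count function:

```sql
-- Sketch of a possible workaround: exclude records whose array is
-- empty so FLATTEN never receives an empty repeated value.
SELECT COUNT(*)
FROM (
  SELECT FLATTEN(t.a.b.c) AS c
  FROM dfs.`flat.json` t
  WHERE repeated_count(t.a.b.c) > 0
) flat
WHERE flat.c.d.e = 'f' LIMIT 1;
```

This avoids populating the data with dummy entries, at the cost of an extra predicate evaluation per record.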
[jira] [Commented] (DRILL-4376) Wrong results when doing a count(*) on part of directories with metadata cache
[ https://issues.apache.org/jira/browse/DRILL-4376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197701#comment-15197701 ] ASF GitHub Bot commented on DRILL-4376: --- Github user adeneche closed the pull request at: https://github.com/apache/drill/pull/422 > Wrong results when doing a count(*) on part of directories with metadata cache > -- > > Key: DRILL-4376 > URL: https://issues.apache.org/jira/browse/DRILL-4376 > Project: Apache Drill > Issue Type: Bug > Components: Metadata >Affects Versions: 1.4.0 >Reporter: Deneche A. Hakim >Assignee: Deneche A. Hakim >Priority: Critical > Fix For: 1.7.0 > > > First create some parquet tables in multiple subfolders: > {noformat} > create table dfs.tmp.`test/201501` as select employee_id, full_name from > cp.`employee.json` limit 2; > create table dfs.tmp.`test/201502` as select employee_id, full_name from > cp.`employee.json` limit 2; > create table dfs.tmp.`test/201601` as select employee_id, full_name from > cp.`employee.json` limit 2; > create table dfs.tmp.`test/201602` as select employee_id, full_name from > cp.`employee.json` limit 2; > {noformat} > Running the following query gives the expected count: > {noformat} > select count(*) from dfs.tmp.`test/20160*`; > +-+ > | EXPR$0 | > +-+ > | 4 | > +-+ > {noformat} > But once you create the metadata cache files, the query no longer returns the > correct results: > {noformat} > refresh table metadata dfs.tmp.`test`; > select count(*) from dfs.tmp.`test/20160*`; > +-+ > | EXPR$0 | > +-+ > | 2 | > +-+ > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4515) Fix a documentation error related to text file splitting
Deneche A. Hakim created DRILL-4515: --- Summary: Fix a documentation error related to text file splitting Key: DRILL-4515 URL: https://issues.apache.org/jira/browse/DRILL-4515 Project: Apache Drill Issue Type: Improvement Components: Documentation Reporter: Deneche A. Hakim In this documentation page: http://drill.apache.org/docs/text-files-csv-tsv-psv/ We can read the following: {quote} Using a distributed file system, such as HDFS, instead of a local file system to query the files also improves performance because currently Drill does not split files on block splits. {quote} Drill actually attempts to split files on block boundaries when running on HDFS and MapRFS. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4518) When two or more columns are present in an IN predicate, the query returns wrong results.
Khurram Faraaz created DRILL-4518: - Summary: When two or more columns are present in an IN predicate, the query returns wrong results. Key: DRILL-4518 URL: https://issues.apache.org/jira/browse/DRILL-4518 Project: Apache Drill Issue Type: Bug Components: Query Planning & Optimization Affects Versions: 1.7.0 Environment: 4 node cluster CentOS Reporter: Khurram Faraaz When two or more columns are present in an IN predicate, the query returns wrong results. Drill 1.7.0-SNAPSHOT git commit ID: 245da979 {noformat} 0: jdbc:drill:schema=dfs.tmp> alter system set `store.json.all_text_mode`=true; +---++ | ok | summary | +---++ | true | store.json.all_text_mode updated. | +---++ 1 row selected (0.15 seconds) 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM `f_20160316.json` t WHERE (t.c1) IN (1234,345643); +---+ | c1 | +---+ | 1234 | +---+ 1 row selected (0.292 seconds) 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM `f_20160316.json` t WHERE (t.c2) IN (1234,345643); +---+ | c1 | +---+ | null | +---+ 1 row selected (0.224 seconds) 0: jdbc:drill:schema=dfs.tmp> SELECT * FROM `f_20160316.json` t WHERE (t.c1,t.c2) IN (1234,345643); Error: VALIDATION ERROR: From line 1, column 35 to line 1, column 68: Values passed to IN operator must have compatible types SQL Query null [Error Id: 740e94a7-b61b-4dbf-96f3-8166c4f94164 on centos-04.qa.lab:31010] (state=,code=0) Stack trace from drillbit.log for above failure. 
2016-03-17 06:57:40,227 [2915aa9b-381a-119d-2814-711fea9dd07c:foreman] INFO o.a.drill.exec.work.foreman.Foreman - Query text for query id 2915aa9b-381a-119d-2814-711fea9dd07c: SELECT * FROM `f_20160316.json` t WHERE (t.c1,t.c2) IN (1234,345643) 2016-03-17 06:57:40,286 [2915aa9b-381a-119d-2814-711fea9dd07c:foreman] INFO o.a.d.exec.planner.sql.SqlConverter - User Error Occurred org.apache.drill.common.exceptions.UserException: VALIDATION ERROR: From line 1, column 35 to line 1, column 68: Values passed to IN operator must have compatible types SQL Query null [Error Id: 740e94a7-b61b-4dbf-96f3-8166c4f94164 ] at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543) ~[drill-common-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] at org.apache.drill.exec.planner.sql.SqlConverter.validate(SqlConverter.java:157) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode(DefaultSqlHandler.java:581) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:192) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:164) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:94) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:927) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:251) [drill-java-exec-1.7.0-SNAPSHOT.jar:1.7.0-SNAPSHOT] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_45] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_45] at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45] Caused by: 
org.apache.calcite.runtime.CalciteContextException: From line 1, column 35 to line 1, column 68: Values passed to IN operator must have compatible types at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.7.0_45] at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) ~[na:1.7.0_45] at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.7.0_45] at java.lang.reflect.Constructor.newInstance(Constructor.java:526) ~[na:1.7.0_45] at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:405) ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10] at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:714) ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10] at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:702) ~[calcite-core-1.4.0-drill-r10.jar:1.4.0-drill-r10] at org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError(SqlValidatorImpl.java:3931)
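For context, standard SQL expects row-value constructors on both sides of a multi-column IN, so the failing query compares a two-column row against scalar values. A sketch of the two likely intents (whether Drill's parser accepts the row-constructor form is not verified here):

```sql
-- Intent 1: match either column against the value list.
SELECT * FROM `f_20160316.json` t
WHERE t.c1 IN (1234, 345643) OR t.c2 IN (1234, 345643);

-- Intent 2: match (c1, c2) as a row against rows of the same shape,
-- using the standard row-value-constructor syntax.
SELECT * FROM `f_20160316.json` t
WHERE (t.c1, t.c2) IN ((1234, 345643));
```

Either way, the validator error is arguably correct for the query as written; the wrong-results aspect concerns the single-column cases above it.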
[jira] [Closed] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files
[ https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Victoria Markman closed DRILL-4392. --- > CTAS with partition writes an internal field into generated parquet files > - > > Key: DRILL-4392 > URL: https://issues.apache.org/jira/browse/DRILL-4392 > Project: Apache Drill > Issue Type: Bug >Reporter: Jinfeng Ni >Assignee: Jinfeng Ni >Priority: Blocker > Fix For: 1.6.0 > > > On today's master branch: > {code} > select * from sys.version; > +-+---+-++-++ > | version | commit_id | > commit_message|commit_time > | build_email | build_time | > +-+---+-++-++ > | 1.5.0-SNAPSHOT | 9a3a5c4ff670a50a49f61f97dd838da59a12f976 | DRILL-4382: > Remove dependency on drill-logical from vector package | 16.02.2016 @ > 11:58:48 PST | j...@apache.org | 16.02.2016 @ 17:40:44 PST | > +-+---+-++- > {code} > Parquet table created by Drill's CTAS statement has one internal field > "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R". This additional field would not > impact non-star query, but would cause incorrect result for star query. > {code} > use dfs.tmp; > create table nation_ctas partition by (n_regionkey) as select * from > cp.`tpch/nation.parquet`; > select * from dfs.tmp.nation_ctas limit 6; > +--++--+-++ > | n_nationkey | n_name | n_regionkey | > n_comment > | P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R | > +--++--+-++ > | 5| ETHIOPIA | 0| ven packages wake quickly. > regu >| true | > | 15 | MOROCCO| 0| rns. blithely bold courts > among the closely regular packages use furiously bold platelets? > | false | > | 14 | KENYA | 0| pending excuses haggle > furiously deposits. pending, express pinto beans wake fluffily past t > | false | > | 0| ALGERIA| 0| haggle. carefully final > deposits detect slyly agai > | false | > | 16 | MOZAMBIQUE | 0| s. ironic, unusual > asymptotes wake blithely r >| false | > | 24 | UNITED STATES | 1| y final packages. slow foxes > cajole quickly. quickly silent platelets breach ironic accounts. 
unusual > pinto be | true > {code} > This basically breaks all the parquet files created by Drill's CTAS with > partition support. > It will also fail one of the pre-commit functional tests [1] > [1] > https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/drill3361.q -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4516) Transform SUM(1) query to COUNT(1)
[ https://issues.apache.org/jira/browse/DRILL-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sudip Mukherjee updated DRILL-4516: --- Affects Version/s: 1.4.0 > Transform SUM(1) query to COUNT(1) > -- > > Key: DRILL-4516 > URL: https://issues.apache.org/jira/browse/DRILL-4516 > Project: Apache Drill > Issue Type: Improvement > Components: Query Planning & Optimization >Affects Versions: 1.4.0 >Reporter: Sudip Mukherjee > > If we connect Drill with Tableau, we see query requests like: select > sum(1) from tablename. > This results in pulling all the records out of the underlying datasource and > aggregating them to get a row count. > The behavior can be optimized if the query gets transformed into a count(1) > query, which is likely to be optimized at the datasource level. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
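The proposed rewrite rests on the identity that summing the constant 1 over n rows yields n, which is exactly what COUNT returns; a sketch:

```sql
-- Equivalent over any non-empty table: both return the row count n.
SELECT SUM(1)   FROM tablename;  -- adds 1 per row, forcing a scan
SELECT COUNT(1) FROM tablename;  -- row count, often answerable from
                                 -- metadata or pushed to the datasource
```

One edge case such a rule would need to handle: over zero rows SUM(1) is NULL while COUNT(1) is 0, so a strictly equivalent transformation needs that case special-cased (or wrapped in a COALESCE-style guard).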
[jira] [Commented] (DRILL-4402) pushing unsupported full outer join to Postgres
[ https://issues.apache.org/jira/browse/DRILL-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199542#comment-15199542 ] Taras Supyk commented on DRILL-4402: Looks like this bug is already fixed in new version of calcite. > pushing unsupported full outer join to Postgres > --- > > Key: DRILL-4402 > URL: https://issues.apache.org/jira/browse/DRILL-4402 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JDBC >Affects Versions: 1.5.0 >Reporter: N Campbell >Assignee: Taras Supyk > > Error: DATA_READ ERROR: The JDBC storage plugin failed while trying setup the > SQL query. > sql SELECT * > FROM "public"."tjoin1" > FULL JOIN "public"."tjoin2" ON "tjoin1"."c1" < "tjoin2"."c1" > plugin postgres > Fragment 0:0 > [Error Id: bc54cf76-f4ff-474c-b3df-fa357bdf0ff8 on centos1:31010] > (org.postgresql.util.PSQLException) ERROR: FULL JOIN is only supported with > merge-joinable or hash-joinable join conditions > org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse():2182 > org.postgresql.core.v3.QueryExecutorImpl.processResults():1911 > org.postgresql.core.v3.QueryExecutorImpl.execute():173 > org.postgresql.jdbc.PgStatement.execute():622 > org.postgresql.jdbc.PgStatement.executeWithFlags():458 > org.postgresql.jdbc.PgStatement.executeQuery():374 > org.apache.commons.dbcp.DelegatingStatement.executeQuery():208 > org.apache.commons.dbcp.DelegatingStatement.executeQuery():208 > org.apache.drill.exec.store.jdbc.JdbcRecordReader.setup():177 > org.apache.drill.exec.physical.impl.ScanBatch.():108 > org.apache.drill.exec.physical.impl.ScanBatch.():136 > org.apache.drill.exec.store.jdbc.JdbcBatchCreator.getBatch():40 > org.apache.drill.exec.store.jdbc.JdbcBatchCreator.getBatch():33 > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():147 > org.apache.drill.exec.physical.impl.ImplCreator.getChildren():170 > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():127 > 
org.apache.drill.exec.physical.impl.ImplCreator.getChildren():170 > org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():127 > org.apache.drill.exec.physical.impl.ImplCreator.getChildren():170 > org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():101 > org.apache.drill.exec.physical.impl.ImplCreator.getExec():79 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():230 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1142 > java.util.concurrent.ThreadPoolExecutor$Worker.run():617 > java.lang.Thread.run():745 > SQLState: null > ErrorCode: 0 > create table TJOIN1 (RNUM integer not null , C1 integer, C2 integer); > insert into TJOIN1 (RNUM, C1, C2) values ( 0, 10, 15); > insert into TJOIN1 (RNUM, C1, C2) values ( 1, 20, 25); > insert into TJOIN1 (RNUM, C1, C2) values ( 2, NULL, 50); > create table TJOIN2 (RNUM integer not null , C1 integer, C2 char(2)); > insert into TJOIN2 (RNUM, C1, C2) values ( 0, 10, 'BB'); > insert into TJOIN2 (RNUM, C1, C2) values ( 1, 15, 'DD'); > insert into TJOIN2 (RNUM, C1, C2) values ( 2, NULL, 'EE'); > insert into TJOIN2 (RNUM, C1, C2) values ( 3, 10, 'FF'); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4383) Allow passing custom configuration options to a file system through the storage plugin config
[ https://issues.apache.org/jira/browse/DRILL-4383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Ollala updated DRILL-4383: - Reviewer: Chun Chang > Allow passing custom configuration options to a file system through the > storage plugin config > - > > Key: DRILL-4383 > URL: https://issues.apache.org/jira/browse/DRILL-4383 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Other >Reporter: Jason Altekruse >Assignee: Jason Altekruse > Fix For: 1.6.0 > > > A similar feature already exists in the Hive and Hbase plugins; it simply > provides a key/value map for passing custom configuration options to the > underlying storage system. > This would be useful for the filesystem plugin to configure S3 without > needing to create a core-site.xml file or restart Drill. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (DRILL-4484) NPE when querying empty directory
[ https://issues.apache.org/jira/browse/DRILL-4484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deneche A. Hakim resolved DRILL-4484. - Resolution: Fixed Fixed in 71608ca > NPE when querying empty directory > --- > > Key: DRILL-4484 > URL: https://issues.apache.org/jira/browse/DRILL-4484 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization >Affects Versions: 1.5.0 >Reporter: Victoria Markman >Assignee: Deneche A. Hakim > Fix For: 1.7.0 > > > {code} > 0: jdbc:drill:drillbit=localhost> select count(*) from > dfs.`/drill/xyz/201604*`; > Error: VALIDATION ERROR: null > SQL Query null > 0: jdbc:drill:drillbit=localhost> select count(*) from > dfs.`/drill/xyz/20160401`; > Error: VALIDATION ERROR: null > SQL Query null > [Error Id: 87366a2d-fc90-42f3-a076-aed5efdd27cb on atsqa4-133.qa.lab:31010] > (state=,code=0) > 0: jdbc:drill:drillbit=localhost> select count(*) from > dfs.`/drill/xyz/20160401/`; > Error: VALIDATION ERROR: null > SQL Query null > [Error Id: ac122243-488e-4fb8-b89f-dc01c7e5c63a on atsqa4-133.qa.lab:31010] > (state=,code=0) > {code} > {code} > [Mon Mar 07 15:00:19 root@/drill/xyz ] # ls -lR > .: > total 5 > -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet > drwxr-xr-x 2 root root 2 Feb 26 16:31 20160101 > drwxr-xr-x 2 root root 2 Feb 26 16:31 20160102 > drwxr-xr-x 2 root root 2 Feb 26 16:31 20160103 > drwxr-xr-x 2 root root 2 Feb 26 16:31 20160104 > drwxr-xr-x 2 root root 2 Feb 26 16:31 20160105 > drwxr-xr-x 2 root root 1 Feb 26 16:31 20160201 > drwxr-xr-x 2 root root 3 Feb 26 16:31 20160202 > drwxr-xr-x 2 root root 4 Feb 26 16:31 20160301 > drwxr-xr-x 2 root root 0 Feb 26 16:31 20160401 > ./20160101: > total 1 > -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet > ./20160102: > total 1 > -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet > ./20160103: > total 1 > -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet > ./20160104: > total 1 > -rw-r--r-- 1 root root 395 Feb 26 16:31 
0_0_0.parquet > ./20160105: > total 1 > -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet > ./20160201: > total 0 > ./20160202: > total 1 > -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet > -rw-r--r-- 1 root root 395 Feb 26 16:31 1_0_0.parquet > ./20160301: > total 2 > -rw-r--r-- 1 root root 395 Feb 26 16:31 0_0_0.parquet > -rw-r--r-- 1 root root 395 Feb 26 16:31 1_0_0.parquet > -rw-r--r-- 1 root root 395 Feb 26 16:31 2_0_0.parquet > ./20160401: > total 0 > {code} > Hakim's analysis: > {code} > More details about the NPE, actually it's an IllegalArgumentException: what > happens is that during planning no file meets the wildcard selection and the > query should fail during planning with a "Table not found" message, instead > execution starts and the scanners fail because no files were assigned to them > {code} > Drill version: > {code} > #Generated by Git-Commit-Id-Plugin > #Mon Mar 07 19:38:24 UTC 2016 > git.commit.id.abbrev=a2fec78 > git.commit.user.email=adene...@gmail.com > git.commit.message.full=DRILL-4457\: Difference in results returned by window > function over BIGINT data\n\nthis closes \#410\n > git.commit.id=a2fec78695df979e240231cb9d32c7f18274a333 > git.commit.message.short=DRILL-4457\: Difference in results returned by > window function over BIGINT data > git.commit.user.name=adeneche > git.build.user.name=Unknown > git.commit.id.describe=0.9.0-625-ga2fec78-dirty > git.build.user.email=Unknown > git.branch=master > git.commit.time=07.03.2016 @ 17\:38\:42 UTC > git.build.time=07.03.2016 @ 19\:38\:24 UTC > git.remote.origin.url=https\://github.com/apache/drill > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4521) Drill doesn't correctly treat VARIANCE and STDDEV as two phase aggregates
Jacques Nadeau created DRILL-4521: - Summary: Drill doesn't correctly treat VARIANCE and STDDEV as two phase aggregates Key: DRILL-4521 URL: https://issues.apache.org/jira/browse/DRILL-4521 Project: Apache Drill Issue Type: Bug Reporter: Jacques Nadeau Assignee: MinJi Kim These are supposed to be synonyms with STDDEV_POP and VARIANCE_POP but they are handled differently. This causes the reduce aggregates rule to not reduce these and thus they are handled as single phase aggregates. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
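For background, a two-phase plan is possible because population variance decomposes into per-fragment partial sums; a sketch of the decomposition (the names t, x, and fragment_key are illustrative, not Drill internals):

```sql
-- Phase 1 (per fragment): cnt = COUNT(x), s = SUM(x), ss = SUM(x*x).
-- Phase 2 (merge): combine the partials, then
--   VAR_POP(x) = E[x^2] - (E[x])^2
SELECT SUM(ss) / SUM(cnt)
       - (SUM(s) / SUM(cnt)) * (SUM(s) / SUM(cnt)) AS var_pop
FROM (
  SELECT COUNT(x) AS cnt, SUM(x) AS s, SUM(x * x) AS ss
  FROM t
  GROUP BY fragment_key   -- stands in for the per-fragment partials
) partials;
```

Note the naive sum-of-squares form can be numerically unstable for large values; engines often merge per-partition means and squared-deviation terms (Welford/Chan style) instead, but either way the function must be registered as decomposable for the reduce-aggregates rule to fire.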
[jira] [Commented] (DRILL-4517) Reading empty Parquet file fails with java.lang.IllegalArgumentException
[ https://issues.apache.org/jira/browse/DRILL-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199442#comment-15199442 ] Tobias commented on DRILL-4517: --- Is this fixed on head (1.7) as mentioned in DRILL-2223? If so, we can build our own version. > Reading empty Parquet file fails with java.lang.IllegalArgumentException > - > > Key: DRILL-4517 > URL: https://issues.apache.org/jira/browse/DRILL-4517 > Project: Apache Drill > Issue Type: Bug > Components: Server >Reporter: Tobias > > When querying a Parquet file that has a schema but no rows, the Drill server > will fail with the error below. > This looks similar to DRILL-3557 > {noformat} > {{ParquetMetaData{FileMetaData{schema: message TRANSACTION_REPORT { > required int64 MEMBER_ACCOUNT_ID; > required int64 TIMESTAMP_IN_HOUR; > optional int64 APPLICATION_ID; > } > , metadata: {}}}, blocks: []} > {noformat} > {noformat} > Caused by: java.lang.IllegalArgumentException: MinorFragmentId 0 has no read > entries assigned > at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:92) > ~[guava-14.0.1.jar:na] > at > org.apache.drill.exec.store.parquet.ParquetGroupScan.getSpecificScan(ParquetGroupScan.java:707) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.store.parquet.ParquetGroupScan.getSpecificScan(ParquetGroupScan.java:105) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.planner.fragment.Materializer.visitGroupScan(Materializer.java:68) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.planner.fragment.Materializer.visitGroupScan(Materializer.java:35) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.physical.base.AbstractGroupScan.accept(AbstractGroupScan.java:60) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.planner.fragment.Materializer.visitOp(Materializer.java:102) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > 
org.apache.drill.exec.planner.fragment.Materializer.visitOp(Materializer.java:35) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitProject(AbstractPhysicalVisitor.java:77) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.physical.config.Project.accept(Project.java:51) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.planner.fragment.Materializer.visitStore(Materializer.java:82) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.planner.fragment.Materializer.visitStore(Materializer.java:35) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitScreen(AbstractPhysicalVisitor.java:195) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.physical.config.Screen.accept(Screen.java:97) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.planner.fragment.SimpleParallelizer.generateWorkUnit(SimpleParallelizer.java:355) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.planner.fragment.SimpleParallelizer.getFragments(SimpleParallelizer.java:134) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.work.foreman.Foreman.getQueryWorkUnit(Foreman.java:518) > [drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:405) > [drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:926) > [drill-java-exec-1.5.0.jar:1.5.0] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-4520) Error parsing JSON (a column with different datatypes)
Shankar created DRILL-4520: -- Summary: Error parsing JSON (a column with different datatypes) Key: DRILL-4520 URL: https://issues.apache.org/jira/browse/DRILL-4520 Project: Apache Drill Issue Type: Test Reporter: Shankar I am stuck; could you please help me resolve the error below? I am running a query on Drill 1.6.0 in a cluster on JSON log data (150GB log file, 1 JSON document per line). {quote} Suggested solutions, in my opinion: 1. Either Drill should be able to ignore those lines (ANY data type) while reading or creating the table (CTAS). 2. Or the data should get stored as-is with the ANY data type if any fields in the data differ in their data types. This would be useful when other columns (excluding the ANY data type columns) carry important information. {quote} h4. -- test.json -- About the data: 1. I have extracted just 3 lines from the logs for test purposes. 2. In the data, the field called "ajaxUrl" differs in datatype: sometimes it contains a string, sometimes an array of JSON objects, and sometimes null. 3. Some events in the 150 GB JSON file differ in structure like this; I would say only about 0.1% of events (per 150 GB JSON file) are such events. 
{noformat}
{"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus1","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457658600032}
{"gameId":"https://daemon2.com/tournDetails.do?type=myGames=1556148_callback=jQuery213043","ajaxData":null,"metadata":null,"ajaxUrl":[{"R":0,"rNo":1,"gid":4,"wal":0,"d":{"gid":4,"pt":3,"wc":2326,"top":"1","reg":true,"brkt":1457771400268,"sk":"2507001010530109","id":56312439,"a":0,"st":145777140,"e":"0.0","j":0,"n":"Loot Qualifier 1","tc":94,"et":0,"syst":1457771456,"rc":14577,"s":5,"t":1,"tk":false,"prnId":56311896,"jc":1,"tp":"10.0","ro":14540,"rp":0,"isprn":false},"fl":"192.168.35.42","aaid":"5828"}],"selectedItem":null,"sessionid":"D18104E8CA3071C7A8F4E141B127","timestamp":1457771458873}
{"ajaxData":null,"metadata":null,"ajaxUrl":"/player/updatebonus2","selectedItem":null,"sessionid":"BC497C7C39B3C90AC9E6E9E8194C3","timestamp":1457958600032}
{noformat}
h4. -- Select Query (ERROR) --
{noformat}
select `timestamp`, sessionid, gameid, ajaxUrl, ajaxData from dfs.`/tmp/test.json` t ;
{noformat}
{color:red}
Error: DATA_READ ERROR: Error parsing JSON - You tried to start when you are using a ValueWriter of type NullableVarCharWriterImpl.
File /tmp/test.json
Record 2
Fragment 0:0
{color}
h4. -- Select Query (works fine with UNION type) --
Tried the UNION type (an experimental feature): set `exec.enable_union_type` = true;
{noformat}
set `exec.enable_union_type` = true;
+-------+---------------------------------+
|  ok   |             summary             |
+-------+---------------------------------+
| true  | exec.enable_union_type updated. |
+-------+---------------------------------+
1 row selected (0.193 seconds)

select `timestamp`, sessionid, gameid, ajaxUrl, ajaxData from dfs.`/tmp/test.json` t ;
+----------------+--------------------------------+----------------------------------------------------------------------------------+-----------------------+-----------+
| timestamp      | sessionid                      | gameid                                                                           | ajaxUrl               | ajaxData  |
+----------------+--------------------------------+----------------------------------------------------------------------------------+-----------------------+-----------+
| 1457658600032  | BC497C7C39B3C90AC9E6E9E8194C3  | null                                                                             | /player/updatebonus1  | null      |
| 1457771458873  | D18104E8CA3071C7A8F4E141B127   | https://daemon2.com/tournDetails.do?type=myGames=1556148_callback=jQuery213043   | []                    | null      |
| 1457958600032  | BC497C7C39B3C90AC9E6E9E8194C3  | null                                                                             | /player/updatebonus2  | null      |
+----------------+--------------------------------+----------------------------------------------------------------------------------+-----------------------+-----------+
3 rows selected (0.965 seconds)
{noformat}
h4. -- CTAS Query (ERROR) --
{noformat}
set `exec.enable_union_type` = true;
+-------+---------------------------------+
|  ok   |             summary             |
+-------+---------------------------------+
| true  | exec.enable_union_type updated. |
+-------+---------------------------------+
1 row selected (0.193 seconds)

create table dfs.tmp.test1 AS select `timestamp`, sessionid, gameid, ajaxUrl,
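For anyone triaging this: the mixed-type conflict can be spotted outside Drill with a few lines of Python that scan a JSONL sample for fields whose JSON type varies across records (a sketch of the check, not Drill's actual type-resolution code):

```python
import json
from collections import defaultdict

def mixed_type_columns(lines):
    # Collect the set of JSON types observed for each top-level field.
    seen = defaultdict(set)
    for line in lines:
        for key, value in json.loads(line).items():
            seen[key].add(type(value).__name__)
    # Keep fields with more than one non-null type -- roughly the conflict
    # that trips Drill's readers when exec.enable_union_type is off.
    return {k: sorted(t) for k, t in seen.items() if len(t - {"NoneType"}) > 1}

records = [
    '{"ajaxUrl": "/player/updatebonus1", "ajaxData": null}',
    '{"ajaxUrl": [{"R": 0, "rNo": 1}], "ajaxData": null}',
]
print(mixed_type_columns(records))  # {'ajaxUrl': ['list', 'str']}
```

Running something like this over a sample of the 150 GB file would identify every column that needs the UNION type (or needs to be excluded from the CTAS) up front.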
[jira] [Commented] (DRILL-4203) Parquet File : Date is stored wrongly
[ https://issues.apache.org/jira/browse/DRILL-4203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199633#comment-15199633 ] Laurent Breuillard commented on DRILL-4203: --- Hi all, Is there any news about the pull request for this fix? I saw it is flagged for 1.6.0, but the issue is still unresolved and release 1.6.0 has been available since March 16, 2016. Thank you > Parquet File : Date is stored wrongly > - > > Key: DRILL-4203 > URL: https://issues.apache.org/jira/browse/DRILL-4203 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.4.0 >Reporter: Stéphane Trou >Assignee: Jason Altekruse >Priority: Critical > > Hello, > I have some problems when I try to read parquet files produced by Drill with > Spark: all dates are corrupted. > I think the problem comes from Drill :) > {code} > cat /tmp/date_parquet.csv > Epoch,1970-01-01 > {code} > {code} > 0: jdbc:drill:zk=local> select columns[0] as name, cast(columns[1] as date) > as epoch_date from dfs.tmp.`date_parquet.csv`; > +--------+-------------+ > | name | epoch_date | > +--------+-------------+ > | Epoch | 1970-01-01 | > +--------+-------------+ > {code} > {code} > 0: jdbc:drill:zk=local> create table dfs.tmp.`buggy_parquet` as select > columns[0] as name, cast(columns[1] as date) as epoch_date from > dfs.tmp.`date_parquet.csv`; > +-----------+----------------------------+ > | Fragment | Number of records written | > +-----------+----------------------------+ > | 0_0 | 1 | > +-----------+----------------------------+ > {code} > When I read the file with parquet-tools, I found > {code} > java -jar parquet-tools-1.8.1.jar head /tmp/buggy_parquet/ > name = Epoch > epoch_date = 4881176 > {code} > According to > [https://github.com/Parquet/parquet-format/blob/master/LogicalTypes.md#date], > epoch_date should be equal to 0.
> Meta : > {code} > java -jar parquet-tools-1.8.1.jar meta /tmp/buggy_parquet/ > file:file:/tmp/buggy_parquet/0_0_0.parquet > creator: parquet-mr version 1.8.1-drill-r0 (build > 6b605a4ea05b66e1a6bf843353abcb4834a4ced8) > extra: drill.version = 1.4.0 > file schema: root > > name:OPTIONAL BINARY O:UTF8 R:0 D:1 > epoch_date: OPTIONAL INT32 O:DATE R:0 D:1 > row group 1: RC:1 TS:93 OFFSET:4 > > name: BINARY SNAPPY DO:0 FPO:4 SZ:52/50/0,96 VC:1 > ENC:RLE,BIT_PACKED,PLAIN > epoch_date: INT32 SNAPPY DO:0 FPO:56 SZ:45/43/0,96 VC:1 > ENC:RLE,BIT_PACKED,PLAIN > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
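For what it's worth, the corrupted value above is consistent with a constant shift: 4881176 is exactly twice the Julian day number of the Unix epoch (2 × 2440588). Under that assumption, a one-off corrector for already-written values would simply subtract the constant (a sketch, not the actual Drill fix):

```python
from datetime import date, timedelta

# 4881176 (the stored value for 1970-01-01 above) == 2 * 2440588,
# i.e. twice the Julian day number of the Unix epoch.
CORRUPTION_SHIFT = 2 * 2440588

def recover_date(stored_days: int) -> date:
    """Map a corrupted INT32 DATE value back to the intended date."""
    return date(1970, 1, 1) + timedelta(days=stored_days - CORRUPTION_SHIFT)

print(recover_date(4881176))  # 1970-01-01
```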
[jira] [Commented] (DRILL-4517) Reading empty Parquet file fails with java.lang.IllegalArgumentException
[ https://issues.apache.org/jira/browse/DRILL-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199373#comment-15199373 ] Khurram Faraaz commented on DRILL-4517: --- This needs to be fixed sooner, that is because we need to use empty parquet files in Union All tests, to verify that empty input on either side of Union All operator is handled properly. > Reading emtpy Parquet file failes with java.lang.IllegalArgumentException > - > > Key: DRILL-4517 > URL: https://issues.apache.org/jira/browse/DRILL-4517 > Project: Apache Drill > Issue Type: Bug > Components: Server >Reporter: Tobias > > When querying a Parquet file that has a schema but no rows the Drill Server > will fail with the below > This looks similar to DRILL-3557 > {noformat} > {{ParquetMetaData{FileMetaData{schema: message TRANSACTION_REPORT { > required int64 MEMBER_ACCOUNT_ID; > required int64 TIMESTAMP_IN_HOUR; > optional int64 APPLICATION_ID; > } > , metadata: {}}}, blocks: []} > {noformat} > {noformat} > Caused by: java.lang.IllegalArgumentException: MinorFragmentId 0 has no read > entries assigned > at > com.google.common.base.Preconditions.checkArgument(Preconditions.java:92) > ~[guava-14.0.1.jar:na] > at > org.apache.drill.exec.store.parquet.ParquetGroupScan.getSpecificScan(ParquetGroupScan.java:707) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.store.parquet.ParquetGroupScan.getSpecificScan(ParquetGroupScan.java:105) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.planner.fragment.Materializer.visitGroupScan(Materializer.java:68) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.planner.fragment.Materializer.visitGroupScan(Materializer.java:35) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.physical.base.AbstractGroupScan.accept(AbstractGroupScan.java:60) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.planner.fragment.Materializer.visitOp(Materializer.java:102) > 
~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.planner.fragment.Materializer.visitOp(Materializer.java:35) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitProject(AbstractPhysicalVisitor.java:77) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.physical.config.Project.accept(Project.java:51) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.planner.fragment.Materializer.visitStore(Materializer.java:82) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.planner.fragment.Materializer.visitStore(Materializer.java:35) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitScreen(AbstractPhysicalVisitor.java:195) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.physical.config.Screen.accept(Screen.java:97) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.planner.fragment.SimpleParallelizer.generateWorkUnit(SimpleParallelizer.java:355) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.planner.fragment.SimpleParallelizer.getFragments(SimpleParallelizer.java:134) > ~[drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.work.foreman.Foreman.getQueryWorkUnit(Foreman.java:518) > [drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:405) > [drill-java-exec-1.5.0.jar:1.5.0] > at > org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:926) > [drill-java-exec-1.5.0.jar:1.5.0] > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
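The failure mode is visible in the metadata dump above: blocks is an empty list, so fragment assignment has nothing to hand to minor fragment 0. A toy model of that precondition (the function and names here are hypothetical; Drill's real assignment logic lives in ParquetGroupScan):

```python
def assign_row_groups(row_groups, num_fragments):
    # Round-robin row groups across minor fragments.
    assignments = {i: [] for i in range(num_fragments)}
    for i, rg in enumerate(row_groups):
        assignments[i % num_fragments].append(rg)
    # The checkArgument in the stack trace: every fragment needs work.
    if not assignments[0]:
        raise ValueError("MinorFragmentId 0 has no read entries assigned")
    return assignments

assign_row_groups(["rg0", "rg1", "rg2"], 2)  # fine: both fragments get work

try:
    assign_row_groups([], 1)  # an empty Parquet file: blocks == []
except ValueError as e:
    print(e)  # MinorFragmentId 0 has no read entries assigned
```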
[jira] [Commented] (DRILL-3745) Hive CHAR not supported
[ https://issues.apache.org/jira/browse/DRILL-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197551#comment-15197551 ] ASF GitHub Bot commented on DRILL-3745: --- Github user arina-ielchiieva closed the pull request at: https://github.com/apache/drill/pull/399 > Hive CHAR not supported > --- > > Key: DRILL-3745 > URL: https://issues.apache.org/jira/browse/DRILL-3745 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.1.0 >Reporter: Nathaniel Auvil >Assignee: Arina Ielchiieva > Labels: doc-impacting > > It doesn’t look like Drill 1.1.0 supports the Hive CHAR type? > In Hive: > create table development.foo > ( > bad CHAR(10) > ); > And then in sqlline: > > use `hive.development`; > > select * from foo; > Error: PARSE ERROR: Unsupported Hive data type CHAR. > Following Hive data types are supported in Drill INFORMATION_SCHEMA: > BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, > BINARY, DECIMAL, STRING, VARCHAR, LIST, MAP, STRUCT and UNION > [Error Id: 58bf3940-3c09-4ad2-8f52-d052dffd4b17 on dtpg05:31010] > (state=,code=0) > This was originally found when getting failures trying to connect via JDBS > using Squirrel. We have the Hive plugin enabled with tables using CHAR. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-4519) File system directory-based partition pruning doesn't work correctly with parquet metadata
[ https://issues.apache.org/jira/browse/DRILL-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Miroslav Holubec updated DRILL-4519: Description: We have parquet files in folders following the convention YYYY/MM/DD/HH. Without drill's parquet metadata, directory pruning works seamlessly. {noformat} select dir0, dir1, dir2 from hdfs.test.indexed; dir0 = YYYY, dir1 = MM, dir2 = DD, dir3 = HH {noformat} After creating metadata and executing the same query, dir0 contains the HH folder name instead of the yearly folder name; dir1...3 are null. {noformat} select dir0, dir1, dir2 from hdfs.test.indexed; dir0 = HH, dir1 = null, dir2 = null, dir3 = null {noformat} was: We have parquet files in folders following the convention YYYY/MM/DD/HH. Without drill's parquet metadata, directory pruning works seamlessly. {noformat} select dir0, dir1, dir2 from hdfs.test.indexed; dir0 = YYYY, dir1 = MM, dir2 = DD, dir3 = HH {noformat} After creating metadata and executing the same query, dir0 contains the HH folder name instead of the yearly folder name; dir1...4 are null. {noformat} select dir0, dir1, dir2 from hdfs.test.indexed; dir0 = HH, dir1 = null, dir2 = null, dir3 = null {noformat} > File system directory-based partition pruning doesn't work correctly with > parquet metadata > -- > > Key: DRILL-4519 > URL: https://issues.apache.org/jira/browse/DRILL-4519 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.4.0, 1.5.0 >Reporter: Miroslav Holubec > > We have parquet files in folders following the convention YYYY/MM/DD/HH. > Without drill's parquet metadata, directory pruning works seamlessly. > {noformat} > select dir0, dir1, dir2 from hdfs.test.indexed; > dir0 = YYYY, dir1 = MM, dir2 = DD, dir3 = HH > {noformat} > After creating metadata and executing the same query, dir0 contains the HH folder > name instead of the yearly folder name; dir1...3 are null.
> {noformat} > select dir0, dir1, dir2 from hdfs.test.indexed; > dir0 = HH, dir1 = null, dir2 = null, dir3 = null > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
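For reference, the dirN columns are simply positional path components below the queried table root, which is easy to model (a sketch of the expected mapping, not of the metadata-cache code where the bug lives; the file name is a hypothetical example):

```python
def dir_columns(relative_path: str) -> dict:
    # Drill's implicit dir0, dir1, ... columns are the successive
    # directory levels of a file's path below the queried root.
    parts = relative_path.strip("/").split("/")[:-1]  # drop the file name
    return {f"dir{i}": part for i, part in enumerate(parts)}

# The YYYY/MM/DD/HH layout from the report:
print(dir_columns("2016/03/17/08/data.parquet"))
# {'dir0': '2016', 'dir1': '03', 'dir2': '17', 'dir3': '08'}
```

With the metadata cache in play, the bug reports dir0 = HH and the rest null, i.e. the hourly leaf directory lands in the first slot of this mapping.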
[jira] [Resolved] (DRILL-4501) Complete MapOrListWriter for all supported data types
[ https://issues.apache.org/jira/browse/DRILL-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aditya Kishore resolved DRILL-4501. --- Resolution: Fixed Resolved by [245da97|https://fisheye6.atlassian.com/changelog/incubator-drill?cs=245da9790813569c5da9404e0fc5e45cc88e22bb]. > Complete MapOrListWriter for all supported data types > - > > Key: DRILL-4501 > URL: https://issues.apache.org/jira/browse/DRILL-4501 > Project: Apache Drill > Issue Type: Improvement > Components: Execution - Data Types >Affects Versions: 1.6.0 >Reporter: Aditya Kishore >Assignee: Aditya Kishore > Fix For: 1.7.0 > > > This interface, at this time, does not include support for many data types. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-2223) Empty parquet file created with Limit 0 query errors out when querying
[ https://issues.apache.org/jira/browse/DRILL-2223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199393#comment-15199393 ] Khurram Faraaz commented on DRILL-2223: --- Like @amansinha100 said, this should at least write the schema information and metadata which will allow queries to run. I believe that is the correct approach to solve this problem. > Empty parquet file created with Limit 0 query errors out when querying > -- > > Key: DRILL-2223 > URL: https://issues.apache.org/jira/browse/DRILL-2223 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Parquet >Affects Versions: 0.7.0 >Reporter: Aman Sinha > Fix For: Future > > > Doing a CTAS with limit 0 creates a 0 length parquet file which errors out > during querying. This should at least write the schema information and > metadata which will allow queries to run. > {code} > 0: jdbc:drill:zk=local> create table tt_nation2 as select n_nationkey, > n_name, n_regionkey from cp.`tpch/nation.parquet` limit 0; > ++---+ > | Fragment | Number of records written | > ++---+ > | 0_0| 0 | > ++---+ > 1 row selected (0.315 seconds) > 0: jdbc:drill:zk=local> select n_nationkey from tt_nation2; > Query failed: RuntimeException: file:/tmp/tt_nation2/0_0_0.parquet is not a > Parquet file (too small) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
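The "too small" error is easy to understand from the Parquet layout: even a zero-row file must carry the 4-byte "PAR1" magic at both ends plus a footer holding the schema, so a 0-length CTAS output can never be valid. A minimal plausibility check (the magic bytes and minimum size come from the Parquet format spec; the function itself is a sketch):

```python
MAGIC = b"PAR1"

def looks_like_parquet(data: bytes) -> bool:
    # A valid file is at least header magic (4) + footer length (4) +
    # footer magic (4) bytes, and starts and ends with "PAR1" --
    # even when it contains zero rows.
    return len(data) >= 12 and data[:4] == MAGIC and data[-4:] == MAGIC

print(looks_like_parquet(b""))  # False: the old CTAS LIMIT 0 output
```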
[jira] [Commented] (DRILL-2610) Local File System Storage Plugin
[ https://issues.apache.org/jira/browse/DRILL-2610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201345#comment-15201345 ] Austin Chungath Vincent commented on DRILL-2610: Interesting, having access to the logs on every node is cool. I am going to try working on this. > Local File System Storage Plugin > > > Key: DRILL-2610 > URL: https://issues.apache.org/jira/browse/DRILL-2610 > Project: Apache Drill > Issue Type: New Feature > Components: Storage - Other >Affects Versions: 0.8.0 >Reporter: Sudheesh Katkam > Fix For: Future > > > Create a storage plugin to query files on the local file system on the nodes > in the cluster. For example, users should be able to query log files in > /var/log/drill/ on all nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4317) Exceptions on SELECT and CTAS with large CSV files
[ https://issues.apache.org/jira/browse/DRILL-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197208#comment-15197208 ] Deneche A. Hakim commented on DRILL-4317: - I found a bug in TextInput.updateLengthBasedOnConstraint() when Drill splits CSV files. In most cases it works fine, but when the split's last line ends with an empty value AND one of the previous rows in the same last batch contains a value in the last column, we see the exception described above. > Exceptions on SELECT and CTAS with large CSV files > -- > > Key: DRILL-4317 > URL: https://issues.apache.org/jira/browse/DRILL-4317 > Project: Apache Drill > Issue Type: Bug > Components: Storage - Text & CSV >Affects Versions: 1.4.0, 1.5.0 > Environment: 4 node cluster, Hadoop 2.7.0, 14.04.1-Ubuntu >Reporter: Matt Keranen >Assignee: Deneche A. Hakim > > Selecting from a CSV file or running a CTAS into Parquet generates exceptions. > Source file is ~650MB, a table of 4 key columns followed by 39 numeric data > columns, otherwise a fairly simple format.
Example: > {noformat} > 2015-10-17 > 00:00,f5e9v8u2,err,fr7,226020793,76.094,26307,226020793,76.094,26307, > 2015-10-17 > 00:00,c3f9x5z2,err,mi1,1339159295,216.004,177690,1339159295,216.004,177690, > 2015-10-17 > 00:00,r5z2f2i9,err,mi1,7159994629,39718.011,65793,6142021303,30687.811,64630,143777403,40.521,146,75503742,41.905,89,170771174,168.165,198,192565529,370.475,222,97577280,318.068,120,62631452,288.253,68,32371173,189.527,39,41712265,299.184,46,39046408,363.418,47,34182318,465.343,43,127834582,6485.341,145 > 2015-10-17 > 00:00,j9s6i8t2,err,fr7,20580443899,277445.055,67826,2814893469,85447.816,54275,2584757097,608.001,2044,1395571268,769.113,1051,3070616988,3000.005,2284,3413811671,6489.060,2569,1772235156,5806.214,1339,1097879284,5064.120,858,691884865,4035.397,511,672967845,4815.875,518,789163614,7306.684,599,813910495,10632.464,627,1462752147,143470.306,1151 > {noformat} > A "SELECT from `/path/to/file.csv`" runs for 10's of minutes and eventually > results in: > {noformat} > java.lang.IndexOutOfBoundsException: index: 547681, length: 1 (expected: > range(0, 547681)) > at > io.netty.buffer.AbstractByteBuf.checkIndex(AbstractByteBuf.java:1134) > at > io.netty.buffer.PooledUnsafeDirectByteBuf.getBytes(PooledUnsafeDirectByteBuf.java:136) > at io.netty.buffer.WrappedByteBuf.getBytes(WrappedByteBuf.java:289) > at > io.netty.buffer.UnsafeDirectLittleEndian.getBytes(UnsafeDirectLittleEndian.java:26) > at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586) > at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586) > at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586) > at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586) > at > org.apache.drill.exec.vector.VarCharVector$Accessor.get(VarCharVector.java:443) > at > org.apache.drill.exec.vector.accessor.VarCharAccessor.getBytes(VarCharAccessor.java:125) > at > org.apache.drill.exec.vector.accessor.VarCharAccessor.getString(VarCharAccessor.java:146) > at > 
org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:136) > at > org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:94) > at > org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:148) > at > org.apache.drill.jdbc.impl.TypeConvertingSqlAccessor.getObject(TypeConvertingSqlAccessor.java:795) > at > org.apache.drill.jdbc.impl.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:179) > at > net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351) > at > org.apache.drill.jdbc.impl.DrillResultSetImpl.getObject(DrillResultSetImpl.java:420) > at sqlline.Rows$Row.(Rows.java:157) > at sqlline.IncrementalRows.hasNext(IncrementalRows.java:63) > at > sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87) > at sqlline.TableOutputFormat.print(TableOutputFormat.java:118) > at sqlline.SqlLine.print(SqlLine.java:1593) > at sqlline.Commands.execute(Commands.java:852) > at sqlline.Commands.sql(Commands.java:751) > at sqlline.SqlLine.dispatch(SqlLine.java:746) > at sqlline.SqlLine.begin(SqlLine.java:621) > at sqlline.SqlLine.start(SqlLine.java:375) > at sqlline.SqlLine.main(SqlLine.java:268) > {noformat} > A CTAS on the same file with storage as Parquet results in: > {noformat} > Error: SYSTEM ERROR: IllegalArgumentException: length: -260 (expected: >= 0) > Fragment 1:2 > [Error Id: 1807615e-4385-4f85-8402-5900aaa568e9 on
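For context, text readers that split large CSV files extend each byte-range split to the next newline so no row is cut in half; the bug described in the comment sits in the length bookkeeping for exactly that extension step. A generic sketch of the scheme (not Drill's TextInput code):

```python
def csv_splits(data: bytes, split_size: int):
    # Divide a buffer into byte-range splits, extending each split past
    # its nominal end to the next newline so rows stay whole.
    splits, start = [], 0
    while start < len(data):
        end = min(start + split_size, len(data))
        newline = data.find(b"\n", end)
        end = len(data) if newline == -1 else newline + 1
        splits.append(data[start:end])
        start = end
    return splits

rows = b"a,1\nb,\nc,3\n"  # note the empty trailing value on row 2
assert b"".join(csv_splits(rows, 5)) == rows  # nothing lost or duplicated
```

The invariant the assertion checks (every byte covered exactly once, every split ending on a row boundary) is what the buggy length update violated for the empty-trailing-value case.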
[jira] [Commented] (DRILL-4436) Result data gets mixed up when various tables have a column "label"
[ https://issues.apache.org/jira/browse/DRILL-4436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15199767#comment-15199767 ] Serge Harnyk commented on DRILL-4436: - I think the problem is more serious. I created two identical tables, Gender2 and Civility2, each with a column "label2" (in BOTH tables). select gender2.label2 as label2 from postgres.public.gender2 join postgres.public.civility2 on genderId = civilityId returns: civilityLabel select civility2.label2 as label2 from postgres.public.gender2 join postgres.public.civility2 on genderId = civilityId returns: null select gender2.label2, civility2.label2 from postgres.public.gender2 join postgres.public.civility2 on genderId = civilityId returns: civilityLabel null The Project step goes wrong whenever we select a column whose name also exists in the second table, regardless of what the name is. > Result data gets mixed up when various tables have a column "label" > --- > > Key: DRILL-4436 > URL: https://issues.apache.org/jira/browse/DRILL-4436 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JDBC >Affects Versions: 1.5.0 > Environment: Drill 1.5.0 with Zookeeper on CentOS 7.0 >Reporter: Vincent Uribe >Assignee: Serge Harnyk > > We have two tables in a MySQL database: > CREATE TABLE `Gender` ( > `genderId` bigint(20) NOT NULL AUTO_INCREMENT, > `label` varchar(15) NOT NULL, > PRIMARY KEY (`genderId`) > ) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=latin1; > CREATE TABLE `Civility` ( > `civilityId` bigint(20) NOT NULL AUTO_INCREMENT, > `abbreviation` varchar(15) NOT NULL, > `label` varchar(60) DEFAULT NULL, > PRIMARY KEY (`civilityId`) > ) ENGINE=InnoDB AUTO_INCREMENT=6 DEFAULT CHARSET=latin1; > With a query on these two tables aliasing Gender.label as 'gender' and > Civility.label as 'civility', we obtain, depending on the query: > * gender in civility > * civility in the gender > * NULL in the other column (gender or civility) > If we drop the table Gender and recreate it like this: > CREATE TABLE `Gender` (
> `genderId` bigint(20) NOT NULL AUTO_INCREMENT, > `label2` varchar(15) NOT NULL, > PRIMARY KEY (`genderId`) > ) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=latin1; > Everything is fine. > I guess something is wrong with the metadata... -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4398) SYSTEM ERROR: IllegalStateException: Memory was leaked by query
[ https://issues.apache.org/jira/browse/DRILL-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15201810#comment-15201810 ] Matt Keranen commented on DRILL-4398: - Getting similar in 1.6.0 with CTAS into Parquet from csv data stored in HDFS. > SYSTEM ERROR: IllegalStateException: Memory was leaked by query > --- > > Key: DRILL-4398 > URL: https://issues.apache.org/jira/browse/DRILL-4398 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JDBC >Affects Versions: 1.5.0 >Reporter: N Campbell >Assignee: Taras Supyk > > Several queries fail with memory leaked errors > select tjoin2.rnum, tjoin1.c1, tjoin2.c1 as c1j2, tjoin2.c2 as c2j2 from > postgres.public.tjoin1 full outer join postgres.public.tjoin2 on tjoin1.c1 = > tjoin2.c1 > select tjoin1.rnum, tjoin1.c1, tjoin2.c1 as c1j2, tjoin2.c2 from > postgres.public.tjoin1, lateral ( select tjoin2.c1, tjoin2.c2 from > postgres.public.tjoin2 where tjoin1.c1=tjoin2.c1) tjoin2 > SYSTEM ERROR: IllegalStateException: Memory was leaked by query. Memory > leaked: (40960) > Allocator(op:0:0:3:JdbcSubScan) 100/40960/135168/100 > (res/actual/peak/limit) > create table TJOIN1 (RNUM integer not null , C1 integer, C2 integer); > insert into TJOIN1 (RNUM, C1, C2) values ( 0, 10, 15); > insert into TJOIN1 (RNUM, C1, C2) values ( 1, 20, 25); > insert into TJOIN1 (RNUM, C1, C2) values ( 2, NULL, 50); > create table TJOIN2 (RNUM integer not null , C1 integer, C2 char(2)); > insert into TJOIN2 (RNUM, C1, C2) values ( 0, 10, 'BB'); > insert into TJOIN2 (RNUM, C1, C2) values ( 1, 15, 'DD'); > insert into TJOIN2 (RNUM, C1, C2) values ( 2, NULL, 'EE'); > insert into TJOIN2 (RNUM, C1, C2) values ( 3, 10, 'FF'); -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4405) invalid Postgres SQL generated for CONCAT (literal, literal)
[ https://issues.apache.org/jira/browse/DRILL-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197767#comment-15197767 ] Serge Harnyk commented on DRILL-4405: - Calcite doesn't have CONCAT() as a function, only the "||" operator. When Drill parses the query it sets DrillSqlOperator as the SqlOperator, and on the inferReturnType step DrillSqlOperator has only two options for the return type: Boolean for MinorType.BIT and "ANY" for everything else. That affects a lot of non-Calcite functions, such as "PI". > invalid Postgres SQL generated for CONCAT (literal, literal) > - > > Key: DRILL-4405 > URL: https://issues.apache.org/jira/browse/DRILL-4405 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JDBC >Affects Versions: 1.5.0 >Reporter: N Campbell >Assignee: Serge Harnyk > > select concat( 'FF' , 'FF' ) from postgres.public.tversion > Error: DATA_READ ERROR: The JDBC storage plugin failed while trying setup the > SQL query. > sql SELECT CAST('' AS ANY) AS "EXPR$0" > FROM "public"."tversion" > plugin postgres > Fragment 0:0 > [Error Id: c3f24106-8d75-4a57-a638-ac5f0aca0769 on centos1:31010] > (org.postgresql.util.PSQLException) ERROR: syntax error at or near "ANY" > Position: 23 > org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse():2182 > org.postgresql.core.v3.QueryExecutorImpl.processResults():1911 > org.postgresql.core.v3.QueryExecutorImpl.execute():173 > org.postgresql.jdbc.PgStatement.execute():622 > org.postgresql.jdbc.PgStatement.executeWithFlags():458 > org.postgresql.jdbc.PgStatement.executeQuery():374 > org.apache.commons.dbcp.DelegatingStatement.executeQuery():208 > org.apache.commons.dbcp.DelegatingStatement.executeQuery():208 > org.apache.drill.exec.store.jdbc.JdbcRecordReader.setup():177 > org.apache.drill.exec.physical.impl.ScanBatch.():108 > org.apache.drill.exec.physical.impl.ScanBatch.():136 > org.apache.drill.exec.store.jdbc.JdbcBatchCreator.getBatch():40 > org.apache.drill.exec.store.jdbc.JdbcBatchCreator.getBatch():33 > 
org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():147 > org.apache.drill.exec.physical.impl.ImplCreator.getChildren():170 > org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():101 > org.apache.drill.exec.physical.impl.ImplCreator.getExec():79 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():230 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1142 > java.util.concurrent.ThreadPoolExecutor$Worker.run():617 > java.lang.Thread.run():745 > SQLState: null > ErrorCode: 0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4409) projecting literal will result in an empty resultset
[ https://issues.apache.org/jira/browse/DRILL-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197798#comment-15197798 ] Serge Harnyk commented on DRILL-4409: - The PostgreSQL JDBC driver returns a metadata code for literal parts of the query that appears to be java.sql.Types.OTHER; PostgreSQL itself doesn't treat literals as VARCHAR or any other string type. The MySQL driver, for example, returns metadata code 12, which is java.sql.Types.VARCHAR. When Drill encounters java.sql.Types.OTHER it skips work on the cell: org/apache/drill/exec/store/jdbc/JdbcRecordReader.java, line 190. I think this is the same reason Taras couldn't reproduce this bug on Oracle. > projecting literal will result in an empty resultset > > > Key: DRILL-4409 > URL: https://issues.apache.org/jira/browse/DRILL-4409 > Project: Apache Drill > Issue Type: Bug > Components: Storage - JDBC >Affects Versions: 1.5.0 >Reporter: N Campbell >Assignee: Serge Harnyk > > A query which projects a literal as shown against a Postgres table will > result in an empty result set being returned. > select 'BB' from postgres.public.tversion -- This message was sent by Atlassian JIRA (v6.3.4#6332)
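The skip can be modeled with the actual java.sql.Types constants (VARCHAR = 12, OTHER = 1111); the copier names below are hypothetical stand-ins for JdbcRecordReader's real dispatch table:

```python
# Real constant values from java.sql.Types:
VARCHAR, INTEGER, OTHER = 12, 4, 1111

COPIERS = {VARCHAR: "VarCharCopier", INTEGER: "IntCopier"}  # hypothetical names

def copier_for(jdbc_type_code):
    # Unknown codes get no copier, so the column is silently left empty --
    # which is what happens to Postgres literals reported as Types.OTHER.
    return COPIERS.get(jdbc_type_code)

print(copier_for(VARCHAR))  # a copier exists (MySQL reports 12 for literals)
print(copier_for(OTHER))    # None (PostgreSQL literals fall here)
```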
[jira] [Created] (DRILL-4517) Reading emtpy Parquet file failes with java.lang.IllegalArgumentException
Tobias created DRILL-4517: - Summary: Reading emtpy Parquet file failes with java.lang.IllegalArgumentException Key: DRILL-4517 URL: https://issues.apache.org/jira/browse/DRILL-4517 Project: Apache Drill Issue Type: Bug Components: Server Reporter: Tobias When querying a Parquet file that has a schema but no rows the Drill Server will fail with the below This looks similar to DRILL-3557 {noformat} {{ParquetMetaData{FileMetaData{schema: message TRANSACTION_REPORT { required int64 MEMBER_ACCOUNT_ID; required int64 TIMESTAMP_IN_HOUR; optional int64 APPLICATION_ID; } , metadata: {}}}, blocks: []} {noformat} {noformat} Caused by: java.lang.IllegalArgumentException: MinorFragmentId 0 has no read entries assigned at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92) ~[guava-14.0.1.jar:na] at org.apache.drill.exec.store.parquet.ParquetGroupScan.getSpecificScan(ParquetGroupScan.java:707) ~[drill-java-exec-1.5.0.jar:1.5.0] at org.apache.drill.exec.store.parquet.ParquetGroupScan.getSpecificScan(ParquetGroupScan.java:105) ~[drill-java-exec-1.5.0.jar:1.5.0] at org.apache.drill.exec.planner.fragment.Materializer.visitGroupScan(Materializer.java:68) ~[drill-java-exec-1.5.0.jar:1.5.0] at org.apache.drill.exec.planner.fragment.Materializer.visitGroupScan(Materializer.java:35) ~[drill-java-exec-1.5.0.jar:1.5.0] at org.apache.drill.exec.physical.base.AbstractGroupScan.accept(AbstractGroupScan.java:60) ~[drill-java-exec-1.5.0.jar:1.5.0] at org.apache.drill.exec.planner.fragment.Materializer.visitOp(Materializer.java:102) ~[drill-java-exec-1.5.0.jar:1.5.0] at org.apache.drill.exec.planner.fragment.Materializer.visitOp(Materializer.java:35) ~[drill-java-exec-1.5.0.jar:1.5.0] at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitProject(AbstractPhysicalVisitor.java:77) ~[drill-java-exec-1.5.0.jar:1.5.0] at org.apache.drill.exec.physical.config.Project.accept(Project.java:51) ~[drill-java-exec-1.5.0.jar:1.5.0] at 
org.apache.drill.exec.planner.fragment.Materializer.visitStore(Materializer.java:82) ~[drill-java-exec-1.5.0.jar:1.5.0] at org.apache.drill.exec.planner.fragment.Materializer.visitStore(Materializer.java:35) ~[drill-java-exec-1.5.0.jar:1.5.0] at org.apache.drill.exec.physical.base.AbstractPhysicalVisitor.visitScreen(AbstractPhysicalVisitor.java:195) ~[drill-java-exec-1.5.0.jar:1.5.0] at org.apache.drill.exec.physical.config.Screen.accept(Screen.java:97) ~[drill-java-exec-1.5.0.jar:1.5.0] at org.apache.drill.exec.planner.fragment.SimpleParallelizer.generateWorkUnit(SimpleParallelizer.java:355) ~[drill-java-exec-1.5.0.jar:1.5.0] at org.apache.drill.exec.planner.fragment.SimpleParallelizer.getFragments(SimpleParallelizer.java:134) ~[drill-java-exec-1.5.0.jar:1.5.0] at org.apache.drill.exec.work.foreman.Foreman.getQueryWorkUnit(Foreman.java:518) [drill-java-exec-1.5.0.jar:1.5.0] at org.apache.drill.exec.work.foreman.Foreman.runPhysicalPlan(Foreman.java:405) [drill-java-exec-1.5.0.jar:1.5.0] at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:926) [drill-java-exec-1.5.0.jar:1.5.0] {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (DRILL-4459) SchemaChangeException while querying hive json table
[ https://issues.apache.org/jira/browse/DRILL-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197507#comment-15197507 ] ASF GitHub Bot commented on DRILL-4459: --- Github user jaltekruse commented on a diff in the pull request: https://github.com/apache/drill/pull/431#discussion_r56354881 --- Diff: contrib/storage-hive/core/src/test/java/org/apache/drill/exec/fn/hive/TestInbuiltHiveUDFs.java --- @@ -43,4 +47,17 @@ public void testEncode() throws Exception { .baselineValues(new Object[] { null }) .go(); } + + @Test // DRILL-4459 + public void testGetJsonObject() throws Exception { +setColumnWidths(new int[]{260}); +String query = "select * from hive.simple_json where GET_JSON_OBJECT(simple_json.json, '$.DocId') = 'DocId2'"; +List results = testSqlWithResults(query); +String expected = "json\n" + "{\"DocId\":\"DocId2\",\"User\":{\"Id\":122,\"Username\":\"larry122\",\"Name\":" + --- End diff -- Can you specify this baseline as a complex object instead of a string? The testBuilder can be used to check results against java POJOs and it includes helper methods listOF/mapOf for building up complex structures. > SchemaChangeException while querying hive json table > > > Key: DRILL-4459 > URL: https://issues.apache.org/jira/browse/DRILL-4459 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill, Functions - Hive >Affects Versions: 1.4.0 > Environment: MapR-Drill 1.4.0 > Hive-1.2.0 >Reporter: Vitalii Diravka >Assignee: Vitalii Diravka > Fix For: 1.7.0 > > > getting the SchemaChangeException while querying json documents stored in > hive table. > {noformat} > Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to > materialize incoming schema. Errors: > > Error in expression at index -1. Error: Missing function implementation: > [castBIT(VAR16CHAR-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--.. 
> {noformat} > minimum reproduce > {noformat} > created sample json documents using the attached script(randomdata.sh) > hive>create table simplejson(json string); > hive>load data local inpath '/tmp/simple.json' into table simplejson; > now query it through Drill. > Drill Version > select * from sys.version; > +---++-+-++ > | commit_id | commit_message | commit_time | build_email | build_time | > +---++-+-++ > | eafe0a245a0d4c0234bfbead10c6b2d7c8ef413d | DRILL-3901: Don't do early > expansion of directory in the non-metadata-cache case because it already > happens during ParquetGroupScan's metadata gathering operation. | 07.10.2015 > @ 17:12:57 UTC | Unknown | 07.10.2015 @ 17:36:16 UTC | > +---++-+-++ > 0: jdbc:drill:zk=> select * from hive.`default`.simplejson where > GET_JSON_OBJECT(simplejson.json, '$.DocId') = 'DocId2759947' limit 1; > Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to > materialize incoming schema. Errors: > > Error in expression at index -1. Error: Missing function implementation: > [castBIT(VAR16CHAR-OPTIONAL)]. Full expression: --UNKNOWN EXPRESSION--.. > Fragment 1:1 > [Error Id: 74f054a8-6f1d-4ddd-9064-3939fcc82647 on ip-10-0-0-233:31010] > (state=,code=0) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (DRILL-3745) Hive CHAR not supported
[ https://issues.apache.org/jira/browse/DRILL-3745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zelaine Fong updated DRILL-3745:
--------------------------------
    Fix Version/s:     (was: 1.6.0)
                   1.7.0

> Hive CHAR not supported
> -----------------------
>
>                 Key: DRILL-3745
>                 URL: https://issues.apache.org/jira/browse/DRILL-3745
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: Nathaniel Auvil
>            Assignee: Arina Ielchiieva
>              Labels: doc-impacting
>             Fix For: 1.7.0
>
>
> It doesn’t look like Drill 1.1.0 supports the Hive CHAR type?
> In Hive:
> create table development.foo
> (
>   bad CHAR(10)
> );
> And then in sqlline:
> > use `hive.development`;
> > select * from foo;
> Error: PARSE ERROR: Unsupported Hive data type CHAR.
> Following Hive data types are supported in Drill INFORMATION_SCHEMA: BOOLEAN, BYTE, SHORT, INT, LONG, FLOAT, DOUBLE, DATE, TIMESTAMP, BINARY, DECIMAL, STRING, VARCHAR, LIST, MAP, STRUCT and UNION
> [Error Id: 58bf3940-3c09-4ad2-8f52-d052dffd4b17 on dtpg05:31010]
> (state=,code=0)
> This was originally found when getting failures trying to connect via JDBC using SQuirreL. We have the Hive plugin enabled with tables using CHAR.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
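The resolution of DRILL-3745 makes Hive CHAR columns readable rather than a parse error. A minimal sketch of the idea, assuming (as the supported-types list in the error message suggests) that the fix routes CHAR through the same character-type handling already used for STRING and VARCHAR — the function name and the mapping table below are illustrative, not Drill's real code:

```java
import java.util.*;

public class HiveTypeMapper {
    // Hypothetical sketch: map a Hive column type name to a Drill type name.
    // Before the fix, the CHAR branch did not exist and the lookup threw
    // "Unsupported Hive data type CHAR".
    static String toDrillType(String hiveType) {
        // Strip precision/length arguments, e.g. "CHAR(10)" -> "CHAR".
        String base = hiveType.toUpperCase(Locale.ROOT).replaceAll("\\(.*\\)", "");
        switch (base) {
            case "CHAR":      // new: treated like the other character types
            case "VARCHAR":
            case "STRING":
                return "VARCHAR";
            case "INT":
                return "INT";
            case "BOOLEAN":
                return "BIT";
            default:
                return base;  // pass through other already-supported types
        }
    }

    public static void main(String[] args) {
        System.out.println(toDrillType("CHAR(10)"));  // VARCHAR
    }
}
```

With a mapping like this in place, the `select * from foo` repro from the report would return the `bad CHAR(10)` column as character data instead of failing at parse time.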
[jira] [Updated] (DRILL-3993) Rebase Drill on Calcite 1.7.0 release
[ https://issues.apache.org/jira/browse/DRILL-3993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau updated DRILL-3993:
----------------------------------
    Summary: Rebase Drill on Calcite 1.7.0 release  (was: Rebase Drill on Calcite 1.5.0 release)

> Rebase Drill on Calcite 1.7.0 release
> -------------------------------------
>
>                 Key: DRILL-3993
>                 URL: https://issues.apache.org/jira/browse/DRILL-3993
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.2.0
>            Reporter: Sudheesh Katkam
>            Assignee: Jacques Nadeau
>
> Calcite keeps moving, and now we need to catch up to Calcite 1.5, and ensure there are no regressions.
> Also, how do we resolve this 'catching up' issue in the long term?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (DRILL-4317) Exceptions on SELECT and CTAS with large CSV files
[ https://issues.apache.org/jira/browse/DRILL-4317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197382#comment-15197382 ]

ASF GitHub Bot commented on DRILL-4317:
---------------------------------------

GitHub user adeneche opened a pull request:

    https://github.com/apache/drill/pull/432

    DRILL-4317: Exceptions on SELECT and CTAS with large CSV files

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/adeneche/incubator-drill DRILL-4317

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/432.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #432

commit 5813c8684c1900a156a82c0651914f97aeb87f6f
Author: adeneche
Date:   2016-03-16T13:47:18Z

    DRILL-4317: Exceptions on SELECT and CTAS with large CSV files

> Exceptions on SELECT and CTAS with large CSV files
> --------------------------------------------------
>
>                 Key: DRILL-4317
>                 URL: https://issues.apache.org/jira/browse/DRILL-4317
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Text & CSV
>    Affects Versions: 1.4.0, 1.5.0
>         Environment: 4 node cluster, Hadoop 2.7.0, 14.04.1-Ubuntu
>            Reporter: Matt Keranen
>            Assignee: Deneche A. Hakim
>
> Selecting from a CSV file or running a CTAS into Parquet generates exceptions.
> Source file is ~650MB, a table of 4 key columns followed by 39 numeric data columns, otherwise a fairly simple format. Example:
> {noformat}
> 2015-10-17 00:00,f5e9v8u2,err,fr7,226020793,76.094,26307,226020793,76.094,26307,
> 2015-10-17 00:00,c3f9x5z2,err,mi1,1339159295,216.004,177690,1339159295,216.004,177690,
> 2015-10-17 00:00,r5z2f2i9,err,mi1,7159994629,39718.011,65793,6142021303,30687.811,64630,143777403,40.521,146,75503742,41.905,89,170771174,168.165,198,192565529,370.475,222,97577280,318.068,120,62631452,288.253,68,32371173,189.527,39,41712265,299.184,46,39046408,363.418,47,34182318,465.343,43,127834582,6485.341,145
> 2015-10-17 00:00,j9s6i8t2,err,fr7,20580443899,277445.055,67826,2814893469,85447.816,54275,2584757097,608.001,2044,1395571268,769.113,1051,3070616988,3000.005,2284,3413811671,6489.060,2569,1772235156,5806.214,1339,1097879284,5064.120,858,691884865,4035.397,511,672967845,4815.875,518,789163614,7306.684,599,813910495,10632.464,627,1462752147,143470.306,1151
> {noformat}
> A "SELECT from `/path/to/file.csv`" runs for 10's of minutes and eventually results in:
> {noformat}
> java.lang.IndexOutOfBoundsException: index: 547681, length: 1 (expected: range(0, 547681))
>         at io.netty.buffer.AbstractByteBuf.checkIndex(AbstractByteBuf.java:1134)
>         at io.netty.buffer.PooledUnsafeDirectByteBuf.getBytes(PooledUnsafeDirectByteBuf.java:136)
>         at io.netty.buffer.WrappedByteBuf.getBytes(WrappedByteBuf.java:289)
>         at io.netty.buffer.UnsafeDirectLittleEndian.getBytes(UnsafeDirectLittleEndian.java:26)
>         at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
>         at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
>         at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
>         at io.netty.buffer.DrillBuf.getBytes(DrillBuf.java:586)
>         at org.apache.drill.exec.vector.VarCharVector$Accessor.get(VarCharVector.java:443)
>         at org.apache.drill.exec.vector.accessor.VarCharAccessor.getBytes(VarCharAccessor.java:125)
>         at org.apache.drill.exec.vector.accessor.VarCharAccessor.getString(VarCharAccessor.java:146)
>         at org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:136)
>         at org.apache.drill.exec.vector.accessor.VarCharAccessor.getObject(VarCharAccessor.java:94)
>         at org.apache.drill.exec.vector.accessor.BoundCheckingAccessor.getObject(BoundCheckingAccessor.java:148)
>         at org.apache.drill.jdbc.impl.TypeConvertingSqlAccessor.getObject(TypeConvertingSqlAccessor.java:795)
>         at org.apache.drill.jdbc.impl.AvaticaDrillSqlAccessor.getObject(AvaticaDrillSqlAccessor.java:179)
>         at net.hydromatic.avatica.AvaticaResultSet.getObject(AvaticaResultSet.java:351)
>         at org.apache.drill.jdbc.impl.DrillResultSetImpl.getObject(DrillResultSetImpl.java:420)
>         at sqlline.Rows$Row.<init>(Rows.java:157)
>         at sqlline.IncrementalRows.hasNext(IncrementalRows.java:63)
>         at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
>         at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
>         at sqlline.SqlLine.print(SqlLine.java:1593)
>         at sqlline.Commands.execute(Commands.java:852)
>         at
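The IndexOutOfBoundsException above reports index: 547681 with expected range(0, 547681) — a one-byte read starting exactly at the buffer's capacity, i.e. an offset that has run one past the end of the underlying buffer. A small self-contained sketch of that bounds arithmetic, modeled on the netty-style exclusive-upper-bound check that produced the message (this only illustrates the failure mode, not the actual Drill fix):

```java
public class BoundsCheck {
    // Valid read positions are [0, capacity); a read of `length` bytes starting
    // at `index` must satisfy index + length <= capacity. Reading 1 byte at
    // index == capacity fails exactly like the report:
    // "index: 547681, length: 1 (expected: range(0, 547681))".
    static void checkIndex(int index, int length, int capacity) {
        if (index < 0 || length < 0 || index + length > capacity) {
            throw new IndexOutOfBoundsException(String.format(
                "index: %d, length: %d (expected: range(0, %d))",
                index, length, capacity));
        }
    }

    public static void main(String[] args) {
        checkIndex(547680, 1, 547681);      // last valid byte: ok
        try {
            checkIndex(547681, 1, 547681);  // one past the end: throws
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

An off-by-one of this shape typically appears when a reader carries a running offset across buffer boundaries (as when scanning a large CSV split across many vectors) without resetting or re-checking it against the current buffer's capacity.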
[jira] [Commented] (DRILL-4392) CTAS with partition writes an internal field into generated parquet files
[ https://issues.apache.org/jira/browse/DRILL-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15200100#comment-15200100 ]

Victoria Markman commented on DRILL-4392:
-----------------------------------------

This is fixed now, test is passing in the latest nightly precommit run:
http://10.10.104.91:8080/view/Nightly/job/Functional-Baseline-104.61/151/consoleFull

> CTAS with partition writes an internal field into generated parquet files
> -------------------------------------------------------------------------
>
>                 Key: DRILL-4392
>                 URL: https://issues.apache.org/jira/browse/DRILL-4392
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>            Priority: Blocker
>             Fix For: 1.6.0
>
>
> On today's master branch:
> {code}
> select * from sys.version;
> | version        | commit_id                                | commit_message                                                     | commit_time               | build_email     | build_time                |
> | 1.5.0-SNAPSHOT | 9a3a5c4ff670a50a49f61f97dd838da59a12f976 | DRILL-4382: Remove dependency on drill-logical from vector package | 16.02.2016 @ 11:58:48 PST | j...@apache.org | 16.02.2016 @ 17:40:44 PST |
> {code}
> A parquet table created by Drill's CTAS statement has one internal field, "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R". This additional field does not impact non-star queries, but causes incorrect results for star queries.
> {code}
> use dfs.tmp;
> create table nation_ctas partition by (n_regionkey) as select * from cp.`tpch/nation.parquet`;
> select * from dfs.tmp.nation_ctas limit 6;
> | n_nationkey | n_name        | n_regionkey | n_comment                                                                                                      | P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R |
> | 5           | ETHIOPIA      | 0           | ven packages wake quickly. regu                                                                                | true                                  |
> | 15          | MOROCCO       | 0           | rns. blithely bold courts among the closely regular packages use furiously bold platelets?                     | false                                 |
> | 14          | KENYA         | 0           | pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t                   | false                                 |
> | 0           | ALGERIA       | 0           | haggle. carefully final deposits detect slyly agai                                                             | false                                 |
> | 16          | MOZAMBIQUE    | 0           | s. ironic, unusual asymptotes wake blithely r                                                                  | false                                 |
> | 24          | UNITED STATES | 1           | y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be | true                                  |
> {code}
> This basically breaks all the parquet files created by Drill's CTAS with partition support.
> It will also fail one of the pre-commit functional tests [1].
> [1] https://github.com/mapr/drill-test-framework/blob/master/framework/resources/Functional/ctas/ctas_auto_partition/general/data/drill3361.q

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
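The bug report explains why only star queries were affected: an explicit column list never names the internal P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R field, while `select *` expands to every column physically present in the file. A hypothetical sketch of that distinction — the real fix stopped writing the field into the parquet file at all; the `expandStar` filter below is purely illustrative:

```java
import java.util.*;

public class StarExpansion {
    static final String PARTITION_COMPARATOR = "P_A_R_T_I_T_I_O_N_C_O_M_P_A_R_A_T_O_R";

    // Hypothetical: when expanding `select *`, internal bookkeeping columns
    // must be filtered out of the file's physical column list. Explicit
    // projections never mention them, which is why only star queries
    // surfaced the extra field.
    static List<String> expandStar(List<String> fileColumns) {
        List<String> out = new ArrayList<>();
        for (String c : fileColumns) {
            if (!c.equals(PARTITION_COMPARATOR)) out.add(c);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> fileColumns = Arrays.asList(
            "n_nationkey", "n_name", "n_regionkey", "n_comment", PARTITION_COMPARATOR);
        System.out.println(expandStar(fileColumns));
        // [n_nationkey, n_name, n_regionkey, n_comment]
    }
}
```

Removing the field at write time (the actual resolution) is the stronger fix, since it also keeps the generated parquet files correct for readers other than Drill.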
[jira] [Commented] (DRILL-3549) Default value for planner.memory.max_query_memory_per_node needs to be increased
[ https://issues.apache.org/jira/browse/DRILL-3549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15197897#comment-15197897 ]

Victoria Markman commented on DRILL-3549:
-----------------------------------------

I hope this setting will not be hard coded, but calculated based on cluster settings + whatever else needs to be taken into consideration ...

> Default value for planner.memory.max_query_memory_per_node needs to be increased
> --------------------------------------------------------------------------------
>
>                 Key: DRILL-3549
>                 URL: https://issues.apache.org/jira/browse/DRILL-3549
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.2.0
>            Reporter: Abhishek Girish
>            Assignee: Deneche A. Hakim
>            Priority: Critical
>              Labels: usability
>             Fix For: 1.7.0
>
>
> The current default value for planner.memory.max_query_memory_per_node is 2147483648 (2 GB). This value is not enough, given the addition of window function support. Most queries on reasonably sized data and cluster setups fail with OOM due to insufficient memory.
> To improve usability, the default needs to be increased to a reasonably sized value (which could be determined based on Drill Max Direct Memory).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
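The comment above hopes the default will be derived from node resources rather than hard coded. A toy sketch of such a derivation, taking a fraction of the drillbit's max direct memory with the current 2 GB default as a floor — the 20% factor and the floor policy are invented for illustration and are not Drill's actual formula:

```java
public class QueryMemoryDefault {
    // Current hard-coded default for planner.memory.max_query_memory_per_node.
    static final long FLOOR_BYTES = 2L << 30; // 2147483648 (2 GB)

    // Hypothetical: derive the per-node query memory limit from the node's
    // max direct memory instead of always using the 2 GB constant.
    static long maxQueryMemoryPerNode(long maxDirectMemoryBytes) {
        long derived = (long) (maxDirectMemoryBytes * 0.20); // assumed 20% share
        return Math.max(FLOOR_BYTES, derived);
    }

    public static void main(String[] args) {
        long direct = 32L << 30; // a node with 32 GB of direct memory
        System.out.println(maxQueryMemoryPerNode(direct)); // 6871947673 (~6.4 GB)
    }
}
```

On a small node the floor keeps today's behavior, while memory-rich nodes automatically get a larger budget for memory-hungry operators such as window functions — which is the usability gap the issue describes.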