[jira] [Resolved] (DRILL-3119) Query stays in "CANCELLATION_REQUESTED" status in UI after OOM of Direct buffer memory
[ https://issues.apache.org/jira/browse/DRILL-3119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roman resolved DRILL-3119.
--------------------------
       Resolution: Duplicate
    Fix Version/s: 1.11.0

> Query stays in "CANCELLATION_REQUESTED" status in UI after OOM of Direct buffer memory
> ---------------------------------------------------------------------------------------
>
>                 Key: DRILL-3119
>                 URL: https://issues.apache.org/jira/browse/DRILL-3119
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.0.0
>            Reporter: Hao Zhu
>            Assignee: Roman
>             Fix For: 1.11.0
>
> Tested in 1.0.0 with the commit below:
> {code}
> > select * from sys.version;
> commit_id:      d8b19759657698581cc0d01d7038797952888123
> commit_message: DRILL-3100: TestImpersonationDisabledWithMiniDFS fails on Windows
> commit_time:    15.05.2015 @ 01:18:03 EDT
> build_email:    Unknown
> build_time:     15.05.2015 @ 03:07:10 EDT
> 1 row selected (0.26 seconds)
> {code}
> How to reproduce:
> 1. Single node cluster.
> 2. Reduce DRILL_MAX_DIRECT_MEMORY="2G".
> 3. Run a hash join which is big enough to trigger OOM, e.g.:
> {code}
> select count(*) from
> (
>   select a.* from dfs.root.`user/hive/warehouse/passwords_csv_big` a,
>     dfs.root.`user/hive/warehouse/passwords_csv_big` b
>   where a.columns[1] = b.columns[1]
> );
> {code}
> After that, drillbit.log shows the OOM:
> {code}
> 2015-05-16 19:24:34,391 [2aa866ba-8939-b184-0ba2-291734329f88:frag:4:4] INFO  o.a.d.e.w.fragment.FragmentExecutor - 2aa866ba-8939-b184-0ba2-291734329f88:4:4: State change requested from RUNNING --> FINISHED for
> 2015-05-16 19:24:34,391 [2aa866ba-8939-b184-0ba2-291734329f88:frag:4:4] INFO  o.a.d.e.w.f.AbstractStatusReporter - State changed for 2aa866ba-8939-b184-0ba2-291734329f88:4:4. New state: FINISHED
> 2015-05-16 19:24:38,561 [BitServer-5] ERROR o.a.d.exec.rpc.RpcExceptionHandler - Exception in RPC communication. Connection: /10.0.0.31:31012 <--> /10.0.0.31:41923 (data server). Closing connection.
> io.netty.handler.codec.DecoderException: java.lang.OutOfMemoryError: Direct buffer memory
>         at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:233) ~[netty-codec-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847) [netty-transport-4.0.27.Final.jar:4.0.27.Final]
>         at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:618) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:329) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:250) [netty-transport-native-epoll-4.0.27.Final-linux-x86_64.jar:na]
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) [netty-common-4.0.27.Final.jar:4.0.27.Final]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
> Caused by: java.lang.OutOfMemoryError: Direct buffer memory
>         at java.nio.Bits.reserveMemory(Bits.java:658) ~[na:1.8.0_45]
>         at java.nio.DirectByteBuff
[jira] [Resolved] (DRILL-3665) Deadlock while executing CTAS that runs out of memory
[ https://issues.apache.org/jira/browse/DRILL-3665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roman resolved DRILL-3665.
--------------------------
       Resolution: Duplicate
    Fix Version/s: (was: Future)
                   1.11.0

> Deadlock while executing CTAS that runs out of memory
> ------------------------------------------------------
>
>                 Key: DRILL-3665
>                 URL: https://issues.apache.org/jira/browse/DRILL-3665
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>    Affects Versions: 1.2.0
>            Reporter: Victoria Markman
>            Assignee: Roman
>            Priority: Critical
>             Fix For: 1.11.0
>
>         Attachments: drillbit.log.drill-3665, jstack.txt
>
> I had a query run out of memory during CTAS, and after that the drillbit was rendered unusable:
> {code}
> 0: jdbc:drill:schema=dfs> create table lineitem as select
> . . . . . . . . . . . . >   cast(columns[0] as int) l_orderkey,
> . . . . . . . . . . . . >   cast(columns[1] as int) l_partkey,
> . . . . . . . . . . . . >   cast(columns[2] as int) l_suppkey,
> . . . . . . . . . . . . >   cast(columns[3] as int) l_linenumber,
> . . . . . . . . . . . . >   cast(columns[4] as double) l_quantity,
> . . . . . . . . . . . . >   cast(columns[5] as double) l_extendedprice,
> . . . . . . . . . . . . >   cast(columns[6] as double) l_discount,
> . . . . . . . . . . . . >   cast(columns[7] as double) l_tax,
> . . . . . . . . . . . . >   cast(columns[8] as varchar(200)) l_returnflag,
> . . . . . . . . . . . . >   cast(columns[9] as varchar(200)) l_linestatus,
> . . . . . . . . . . . . >   cast(columns[10] as date) l_shipdate,
> . . . . . . . . . . . . >   cast(columns[11] as date) l_commitdate,
> . . . . . . . . . . . . >   cast(columns[12] as date) l_receiptdate,
> . . . . . . . . . . . . >   cast(columns[13] as varchar(200)) l_shipinstruct,
> . . . . . . . . . . . . >   cast(columns[14] as varchar(200)) l_shipmode,
> . . . . . . . . . . . . >   cast(columns[15] as varchar(200)) l_comment
> . . . . . . . . . . . . > from `lineitem.dat`;
> Error: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
> Fragment 1:10
> [Error Id: 11084315-5388-4500-b165-642a5f595ebf on atsqa4-133.qa.lab:31010] (state=,code=0)
> {code}
> Here is Drill's behavior after that:
> 1. Tried to run "select * from sys.options" in the same sqlline session - it hangs.
> 2. Was able to start sqlline and connect to the drillbit:
>    - If you try running anything on this connection, it hangs.
>    - Issue ^C --> you will get a result if you are lucky (these queries will appear as "CANCELLATION_REQUESTED" on the WebUI).
>      (I only tried querying sys.memory and sys.options, which possibly have a different code path than queries over actual user data.)
>    - If you are not lucky, you will get the error below:
> {code}
> 0: jdbc:drill:schema=dfs> show files;
> java.lang.RuntimeException: java.sql.SQLException: Unexpected RuntimeException: java.lang.IllegalArgumentException: Buffer has negative reference count.
>         at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
>         at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
>         at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
>         at sqlline.SqlLine.print(SqlLine.java:1583)
>         at sqlline.Commands.execute(Commands.java:852)
>         at sqlline.Commands.sql(Commands.java:751)
>         at sqlline.SqlLine.dispatch(SqlLine.java:738)
>         at sqlline.SqlLine.begin(SqlLine.java:612)
>         at sqlline.SqlLine.start(SqlLine.java:366)
>         at sqlline.SqlLine.main(SqlLine.java:259)
> {code}
> or maybe something like this:
> {code}
> 0: jdbc:drill:schema=dfs> select count(*) from nation group by n_regionkey;
> Error: CONNECTION ERROR: Exceeded timeout (5000) while waiting send intermediate work fragments to remote nodes. Sent 1 and only heard response back from 0 nodes.
> [Error Id: 6abce8e9-78a1-4b3d-bcec-503930482b40 on atsqa4-133.qa.lab:31010] (state=,code=0)
> {code}
> I'm attaching the results of a jstack and drillbit.log; so far I have not been able to reproduce this problem again (working on it).
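The "Buffer has negative reference count" failure is the signature of a reference-count underflow: some path releases a buffer that was already released, and every later user of the connection then sees the same broken buffer. A minimal Python sketch of that pattern (hypothetical class, not Drill's actual buffer code):

```python
# Sketch of a refcount underflow: releasing a buffer once on the normal
# path and again on an error path drives the count below zero.
class RefCountedBuffer:
    def __init__(self):
        self.refcnt = 1

    def retain(self):
        self.refcnt += 1

    def release(self):
        self.refcnt -= 1
        if self.refcnt < 0:
            raise ValueError("Buffer has negative reference count.")

buf = RefCountedBuffer()
buf.release()          # normal release: refcnt 1 -> 0, buffer freed
try:
    buf.release()      # error path releases again: refcnt 0 -> -1
except ValueError as e:
    print(e)           # prints: Buffer has negative reference count.
```

The double release typically happens when an out-of-memory error unwinds through cleanup code that also runs on the success path, which fits the OOM-then-unusable sequence described above.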
-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (DRILL-4685) RpcException: Data not accepted downstream
[ https://issues.apache.org/jira/browse/DRILL-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roman resolved DRILL-4685.
--------------------------
    Resolution: Cannot Reproduce

> RpcException: Data not accepted downstream
> -------------------------------------------
>
>                 Key: DRILL-4685
>                 URL: https://issues.apache.org/jira/browse/DRILL-4685
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - RPC
>    Affects Versions: 1.7.0
>            Reporter: Rahul Challapalli
>            Assignee: Roman
>         Attachments: drillbit-receiver.log, drillbit-sender.log, instance1-profile.json, instance2-profile.json
>
> commit # : d93a3633815ed1c7efd6660eae62b7351a2c9739
> Scenario: Each of the 2 queries below is duplicated 10 times to make a total of 20 queries. I then use 10 concurrent clients to run the queries.
> Query 1:
> {code}
> select count(*) from (
>   select max(length(concat(str1,str2))) max_length from (
>     select
>       substring(regexp_replace(s.enlarged_comment, 'iron..', 'iconic'), 4) str1,
>       substring(regexp_replace(s.enlarged_comment, 'm.*ne', 'iconic'), 4) str2
>     from (
>       select
>         concat(o_comment, o_comment, o_comment, o_comment, o_comment, o_comment, o_comment, o_comment, o_comment, o_comment, o_comment, o_comment, o_comment, o_comment, o_comment) enlarged_comment,
>         o_orderdate,
>         concat(o_clerk, '...') o_clerk
>       from orders_nocompression_256
>       where o_orderdate > date '1900-01-01' + interval '1' year
>     ) s
>     where
>       position('...' in o_clerk) > 0
>       and length(concat(enlarged_comment, o_clerk)) > 100
>     limit 500
>   ) s1
> ) s2 where max_length = 0
> {code}
> Query 2:
> {code}
> Select count(*) from lineitem_nocompression_256
> where
>   (
>     l_tax in (0.02,0.06,0.04,0.05,0.0,0.07,0.08,0.03,0.01)
>     and l_linestatus = 'F'
>   )
>   or (
>     ( length(l_comment) between 0 and 50 )
>     and (
>       substr(l_shipmode, 1, 2) = 'R'
>       or substr(l_shipmode, 1, 2) = 'A'
>       and l_tax > 0.05
>     )
>   )
>   or (
>     ( l_extendedprice between 1.0 and 10.0 )
>     and l_linestatus = 'O'
>   )
>   or (
>     l_extendedprice * l_discount * l_tax < 45.00
>     and l_shipdate > date '1996-03-13' + interval '1' year
>   )
>   or (
>     l_commitdate in (
>       date '1996-02-12', date '1996-02-28', date '1996-03-05', date '1996-03-30', date '1996-03-14', date '1996-02-07', date '1997-01-14', date '1994-01-04'
>     )
>     and l_tax in (
>       0.02,0.06,0.04,0.05,0.0,0.07,0.08,0.03,0.01
>     )
>     and length(l_comment) > 15
>   )
>   or (
>     position('con' in regexp_replace(l_comment, 'm.*ne', 'iconic')) > 10
>     and (
>       length(regexp_replace(concat(lower(l_shipinstruct), lower(l_shipmode), l_comment), 'd.*ne', '')) > 0
>       or l_orderkey > 5
>       or l_partkey > 1500
>       or l_linenumber = 7
>     )
>   );
> {code}
> Out of the 20 queries submitted, 2 copies of the first query failed with the same error.
> Below is the summary of the run:
> {code}
> PASS (8.185 min) /root/drillAutomation/framework/framework/resources/Advanced/concurrency/cpu_heavy/q2.q (connection: 147182110)
> PASS (8.191 min) /root/drillAutomation/framework/framework/resources/Advanced/concurrency/cpu_heavy/q2.q (connection: 671870783)
> PASS (8.287 min) /root/drillAutomation/framework/framework/resources/Advanced/concurrency/cpu_heavy/q2.q (connection: 640915121)
> PASS (8.444 min) /root/drillAutomation/framework/framework/resources/Advanced/concurrency/cpu_heavy/q2.q (connection: 778960233)
> PASS (9.232 min) /root/drillAutomation/framework/framework/resources/Advanced/concurrency/cpu_heavy/q1.q (connection: 2022177583)
> PASS (2.423 min) /root/drillAutomation/framework/framework/resources/Advanced/concurrency/cpu_heavy/q2.q (connection: 778960233)
> PASS (11.67 min) /root/drillAutomation/framework/framework/resources/Advanced/concurrency/cpu_heavy/q1.q (connection: 1673732733)
> PASS (2.693 min) /root/drillAutomation/framework/framework/resources/Advanced/concurrency/cpu_heavy/q2.q (connection: 1673732733)
> PASS (15.32 min) /root/drillAutomation/framework/framework/resources/Advanced/concurrency/cpu_heavy/q2.q (connection: 1651684372)
> [#25] Query failed: oadd.org.apache.drill.common.exceptions.UserRemoteException: SYSTEM ERROR: RpcException: Data not accepted downstream.
> Fragment 2:43
> [Error Id: 04168b77-dfdd-4e6c-9e86-33317c82947b on atsqa6c81.qa.lab:31010]
>         at oadd.org.apache.drill.exec.rpc.user.QueryResultHandler.resultArrived(QueryResultHandler.java:123)
>         at oadd.org.apache.drill.exec.rpc.user.
[jira] [Created] (DRILL-5609) Resources leak on parquet table when the query hangs with CANCELLATION_REQUESTED state
Roman created DRILL-5609:
-----------------------------

             Summary: Resources leak on parquet table when the query hangs with CANCELLATION_REQUESTED state
                 Key: DRILL-5609
                 URL: https://issues.apache.org/jira/browse/DRILL-5609
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Flow
    Affects Versions: 1.11.0
            Reporter: Roman

I tried to run tpcds_sf100-query2 on a parquet table with 10 concurrent threads on a single-node drillbit cluster (I use Drill with the DRILL-5599 fix) and hit a resource leak: the query hung in the CANCELLATION_REQUESTED state.

Steps to reproduce:
1) Start ConcurrencyTest.java with tpcds_sf100-query2 on the parquet table (in attachment);
2) Wait 3-5 seconds, then press Ctrl+C to kill the client;
3) Retry step 2) several times until some queries get stuck in "CANCELLATION_REQUESTED".

Queries will hang until the drillbit is restarted. If we run "top", we can see that the drillbit still uses CPU. Jstack example:

{code:xml}
"26af36b2-7a44-5af8-e0c3-95a4f132fc7a:frag:14:1" #1268 daemon prio=10 os_prio=0 tid=0x7f25a5afa800 nid=0x16f2 runnable [0x7f2535a5a000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.Throwable.fillInStackTrace(Native Method)
        at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
        - locked <0x000728ca82b0> (a java.lang.InterruptedException)
        at java.lang.Throwable.<init>(Throwable.java:250)
        at java.lang.Exception.<init>(Exception.java:54)
        at java.lang.InterruptedException.<init>(InterruptedException.java:57)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
        at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:439)
        at org.apache.drill.exec.store.parquet.columnreaders.AsyncPageReader.clear(AsyncPageReader.java:301)
        at org.apache.drill.exec.store.parquet.columnreaders.ColumnReader.clear(ColumnReader.java:147)
        at org.apache.drill.exec.store.parquet.columnreaders.ReadState.close(ReadState.java:179)
        at org.apache.drill.exec.store.parquet.columnreaders.ParquetRecordReader.close(ParquetRecordReader.java:318)
        at org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:209)
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
        at org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
        at org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:133)
        at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:105)
        at org.apache.drill.exec.physical.impl.broadcastsender.BroadcastSenderRootExec.innerNext(BroadcastSenderRootExec.java:95)
        at org.apache.drill.exec.physical.impl.BaseRootExec.next(BaseRootExec.java:95)
        at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:234)
        at org.apache.drill.exec.work.fragment.FragmentExecutor$1.run(FragmentExecutor.java:227)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1595)
        at org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:227)
        at org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
{code}

I added drillbit.log and the full jstack log in attachments.
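The stack shows the fragment's cleanup path parked in LinkedBlockingQueue.take() inside AsyncPageReader.clear(): a blocking take() never returns once the async page producer has stopped, so the cancelled fragment (and the buffers it holds) can never be released. A small Python sketch of the hazard and of a non-blocking drain; this is illustrative only, not the Drill fix:

```python
import queue

q = queue.Queue()
q.put("page-1")  # the producer handed over one page, then stopped

# A second q.get() here would behave like LinkedBlockingQueue.take():
# with no producer left, it blocks forever and cleanup never finishes.

def drain_nonblocking(q):
    """Drain whatever is already queued without ever blocking."""
    drained = []
    while True:
        try:
            drained.append(q.get_nowait())
        except queue.Empty:
            return drained

print(drain_nonblocking(q))  # ['page-1']
```

A cleanup routine that drains with get_nowait() (or a poll with a short timeout) terminates even when the producer side has already been torn down by cancellation.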
[jira] [Created] (DRILL-5044) After the dynamic registration of multiple jars simultaneously not all UDFs were registered
Roman created DRILL-5044:
-----------------------------

             Summary: After the dynamic registration of multiple jars simultaneously not all UDFs were registered
                 Key: DRILL-5044
                 URL: https://issues.apache.org/jira/browse/DRILL-5044
             Project: Apache Drill
          Issue Type: Bug
          Components: Functions - Drill
    Affects Versions: 1.9.0
            Reporter: Roman
            Assignee: Arina Ielchiieva

I tried to register 21 jars simultaneously (with the property 'udf.retry-attempts' = 30) and not all of the jars were registered. As I saw in the output, all functions were registered and the /staging directory was empty, but not all of the jars were moved into the /registry directory. For example, after the simultaneous registration I saw the message "The following UDFs in jar test-1.1.jar have been registered: [test1(VARCHAR-REQUIRED)", but this jar was not in the /registry directory. If I tried to run the function test1, I got this error: "Error: SYSTEM ERROR: SqlValidatorException: No match found for function signature test1()". And if I tried to re-register the jar, I got "Jar with test-1.1.jar name has been already registered".

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (DRILL-5007) Dynamic UDF lazy-init does not work correctly in multi-node cluster
Roman created DRILL-5007:
-----------------------------

             Summary: Dynamic UDF lazy-init does not work correctly in multi-node cluster
                 Key: DRILL-5007
                 URL: https://issues.apache.org/jira/browse/DRILL-5007
             Project: Apache Drill
          Issue Type: Bug
          Components: Functions - Drill
    Affects Versions: 1.9.0
            Reporter: Roman
            Assignee: Arina Ielchiieva

When I registered a jar on the 1st node and ran a long query with the function for the first time, I got this error:
{quote}
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema. Errors: Error in expression at index 0. Error: Missing function implementation: [f(VARCHAR-OPTIONAL, VARCHAR-OPTIONAL)]. Full expression: null..
{quote}
When I tried to run this query a second time, it finished correctly. It seems the other nodes did not get the new function, so lazy-init does not work on those nodes before the query fails.
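The expected lazy-init behavior is that a node which misses a function refreshes its local cache from the shared registry before failing the query, so the first run succeeds instead of only priming the cache for the second run. A rough sketch of that lookup pattern (hypothetical names; this is not Drill's FunctionImplementationRegistry code):

```python
# Shared, cluster-wide function registry (e.g. a ZooKeeper-backed store)
shared_registry = {"f": "impl-of-f"}

# Per-node local cache, empty on a node that has not seen the UDF yet
local_cache = {}

def resolve(name):
    """Resolve a function, lazily syncing from the shared registry on a miss."""
    if name not in local_cache:
        # Lazy init: pull the latest functions before giving up.
        local_cache.update(shared_registry)
    try:
        return local_cache[name]
    except KeyError:
        raise LookupError("Missing function implementation: " + name)

print(resolve("f"))  # succeeds on the first call instead of failing once
```

The bug described above corresponds to the refresh happening only after the materialization failure, so the first query pays for priming the cache.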
[jira] [Created] (DRILL-4995) Allow lazy init when dynamic UDF support is disabled
Roman created DRILL-4995:
-----------------------------

             Summary: Allow lazy init when dynamic UDF support is disabled
                 Key: DRILL-4995
                 URL: https://issues.apache.org/jira/browse/DRILL-4995
             Project: Apache Drill
          Issue Type: Bug
          Components: Functions - Drill
    Affects Versions: 1.9.0
            Reporter: Roman
            Assignee: Arina Ielchiieva

Steps in a 2-node cluster:

On the 1st node:
1. Register the jar.
2. Run the function (success).
3. Disable dynamic UDF support.
4. Run the function again (success).

On the 2nd node:
5. Try to run the function (failed).

On the 1st node the function was initialized before dynamic UDF support was disabled, but on the 2nd node the function was never initialized. So it seems we need to allow lazy initialization even when dynamic UDF support is disabled.
[jira] [Created] (DRILL-4963) Issues when overloading Drill native functions with dynamic UDFs
Roman created DRILL-4963:
-----------------------------

             Summary: Issues when overloading Drill native functions with dynamic UDFs
                 Key: DRILL-4963
                 URL: https://issues.apache.org/jira/browse/DRILL-4963
             Project: Apache Drill
          Issue Type: Bug
          Components: Functions - Drill
    Affects Versions: 1.9.0
            Reporter: Roman

I created a jar file which overloads 3 Drill native functions (LOG(VARCHAR-REQUIRED), CURRENT_DATE(VARCHAR-REQUIRED) and ABS(VARCHAR-REQUIRED,VARCHAR-REQUIRED)) and registered it as a dynamic UDF. If I try to use my functions I get errors:

{code:xml}
SELECT CURRENT_DATE('test') FROM (VALUES(1));
{code}
Error: FUNCTION ERROR: CURRENT_DATE does not support operand types (CHAR)
SQL Query null

{code:xml}
SELECT ABS('test','test') FROM (VALUES(1));
{code}
Error: FUNCTION ERROR: ABS does not support operand types (CHAR,CHAR)
SQL Query null

{code:xml}
SELECT LOG('test') FROM (VALUES(1));
{code}
Error: SYSTEM ERROR: DrillRuntimeException: Failure while materializing expression in constant expression evaluator LOG('test'). Errors: Error in expression at index -1. Error: Missing function implementation: castTINYINT(VARCHAR-REQUIRED). Full expression: UNKNOWN EXPRESSION.

But if I rerun all these queries after the "DrillRuntimeException", they run correctly. It seems that Drill has not updated the function signatures before that error. Also, if I add the jar as a usual UDF (copy the jar to /drill_home/jars/3rdparty and restart the drillbits), all the queries run correctly without errors.
[jira] [Created] (DRILL-4962) Drill registers UDFs from packages which were not included in drill-module.conf
Roman created DRILL-4962:
-----------------------------

             Summary: Drill registers UDFs from packages which were not included in drill-module.conf
                 Key: DRILL-4962
                 URL: https://issues.apache.org/jira/browse/DRILL-4962
             Project: Apache Drill
          Issue Type: Bug
          Components: Functions - Drill
    Affects Versions: 1.8.0
            Reporter: Roman
            Priority: Minor

Example of the UDF package structure:

... package.test
... package.test2
... package.test3

If I add "package.test2" to drill-module.conf, I see the UDFs from this package after registering the jar (as expected). But if I add "package.test" or even "package.te" to drill-module.conf, I see all UDFs from all three packages (test, test2, test3) after registering the jar. So it seems Drill treats the configured package names as plain prefixes rather than as exact packages.
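The observed behavior is consistent with a plain string-prefix match on package names. A small Python sketch of the suspected logic and of a fix that respects package-segment boundaries (hypothetical function names; the actual Drill scanner code is not shown in the report):

```python
packages = ["package.test", "package.test2", "package.test3"]

def naive_match(configured, pkg):
    # Plain prefix match: "package.test" (and even "package.te")
    # also matches package.test2 and package.test3.
    return pkg.startswith(configured)

def segment_match(configured, pkg):
    # Match only the configured package itself or true sub-packages,
    # i.e. require a '.' at the boundary.
    return pkg == configured or pkg.startswith(configured + ".")

print([p for p in packages if naive_match("package.te", p)])
print([p for p in packages if segment_match("package.test", p)])
```

With the naive rule even the partial name "package.te" selects all three packages, matching the report; the segment-aware rule selects only the package actually listed in drill-module.conf.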
[jira] [Created] (DRILL-4927) Add support for Null Equality Joins
Roman created DRILL-4927:
-----------------------------

             Summary: Add support for Null Equality Joins
                 Key: DRILL-4927
                 URL: https://issues.apache.org/jira/browse/DRILL-4927
             Project: Apache Drill
          Issue Type: Improvement
          Components: Query Planning & Optimization
    Affects Versions: 1.8.0
            Reporter: Roman
            Assignee: Roman

A join with an equality condition which allows null=null fails. For example, if we use either of these queries:
{code:sql}
select ... FROM t1, t2 WHERE t1.c1 = t2.c2 OR (t1.c1 IS NULL AND t2.c2 IS NULL);
select ... FROM t1 INNER JOIN t2 ON t1.c1 = t2.c2 OR (t1.c1 IS NULL AND t2.c2 IS NULL);
{code}
we get an "UNSUPPORTED_OPERATION ERROR". We should add support for this kind of condition.
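For reference, standard SQL spells this null-safe equality as `IS NOT DISTINCT FROM`, and engines that support it can plan it as an ordinary equi-join. The semantics can be demonstrated with Python's sqlite3, whose `IS` operator behaves the same way (this is an illustration of the semantics, not Drill syntax):

```python
import sqlite3

# Hypothetical tables t1/t2 mirroring the report: each contains one NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (c1 TEXT)")
conn.execute("CREATE TABLE t2 (c2 TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?)", [("a",), (None,)])
conn.executemany("INSERT INTO t2 VALUES (?)", [("a",), (None,)])

# Plain equality: NULL = NULL is UNKNOWN, so the NULL rows never join.
plain = conn.execute(
    "SELECT count(*) FROM t1 JOIN t2 ON t1.c1 = t2.c2").fetchone()[0]

# Null-safe equality (SQLite's IS; standard SQL: IS NOT DISTINCT FROM):
# NULL matches NULL, so both pairs join.
null_safe = conn.execute(
    "SELECT count(*) FROM t1 JOIN t2 ON t1.c1 IS t2.c2").fetchone()[0]

print(plain, null_safe)  # 1 2
```

The `= ... OR (both IS NULL)` pattern in the report is exactly the expansion of this null-safe operator, which is why recognizing it in the planner makes the join supportable.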
[jira] [Created] (DRILL-4926) Add support for Null Equality Joins
Roman created DRILL-4926:
-----------------------------

             Summary: Add support for Null Equality Joins
                 Key: DRILL-4926
                 URL: https://issues.apache.org/jira/browse/DRILL-4926
             Project: Apache Drill
          Issue Type: Improvement
          Components: Query Planning & Optimization
    Affects Versions: 1.8.0
            Reporter: Roman
            Assignee: Roman

A join with an equality condition which allows null=null fails. For example, if we use either of these queries:
{code:sql}
select ... FROM t1, t2 WHERE t1.c1 = t2.c2 OR (t1.c1 IS NULL AND t2.c2 IS NULL);
select ... FROM t1 INNER JOIN t2 ON t1.c1 = t2.c2 OR (t1.c1 IS NULL AND t2.c2 IS NULL);
{code}
we get an "UNSUPPORTED_OPERATION ERROR". We should add support for this kind of condition.
[jira] [Created] (DRILL-4824) JSON with complex nested data produces incorrect output with missing fields
Roman created DRILL-4824:
-----------------------------

             Summary: JSON with complex nested data produces incorrect output with missing fields
                 Key: DRILL-4824
                 URL: https://issues.apache.org/jira/browse/DRILL-4824
             Project: Apache Drill
          Issue Type: New Feature
          Components: Storage - JSON
    Affects Versions: 1.7.0
            Reporter: Roman
            Assignee: Roman
             Fix For: Future

The output is incorrect in the case of a JSON file with complex nested data. Here is a JSON file:
{code:none|title=example.json|borderStyle=solid}
{ "Field1" : {
} }
{ "Field1" : {
  "InnerField1": {"key1":"value1"},
  "InnerField2": {"key2":"value2"}
} }
{ "Field1" : {
  "InnerField3" : ["value3", "value4"],
  "InnerField4" : ["value5", "value6"]
} }
{code}
Here is the actual result of the command "select Field1 from dfs.`/tmp/example.json`;":
{code:none}
+----------------------------------------------------------------------------------------------------------+
| Field1                                                                                                    |
+----------------------------------------------------------------------------------------------------------+
| {"InnerField1":{},"InnerField2":{},"InnerField3":[],"InnerField4":[]}                                     |
| {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"},"InnerField3":[],"InnerField4":[]}       |
| {"InnerField1":{},"InnerField2":{},"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}   |
+----------------------------------------------------------------------------------------------------------+
{code}
I think there is no need to output the missing fields. In the case of a deeply nested structure we will get a result that is unreadable for the user. So my expected result is:
{code:none}
+------------------------------------------------------------------------+
| Field1                                                                 |
+------------------------------------------------------------------------+
| {}                                                                     |
| {"InnerField1":{"key1":"value1"},"InnerField2":{"key2":"value2"}}      |
| {"InnerField3":["value3","value4"],"InnerField4":["value5","value6"]}  |
+------------------------------------------------------------------------+
{code}
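The padding behavior can be illustrated outside Drill. In the sketch below, the "union fill" rule (pad every record with an empty map or empty list for each inner field seen anywhere in the batch) is my reading of the output above, not Drill's reader code:

```python
import json

# The three Field1 values from example.json, as parsed by the reader.
records = [
    {},
    {"InnerField1": {"key1": "value1"}, "InnerField2": {"key2": "value2"}},
    {"InnerField3": ["value3", "value4"], "InnerField4": ["value5", "value6"]},
]

# Current behavior: every record is padded to the union of all inner
# fields seen in the batch, with a fill value matching the observed type.
fill = {}
for rec in records:
    for key, value in rec.items():
        fill[key] = [] if isinstance(value, list) else {}

padded = [{key: rec.get(key, fill[key]) for key in sorted(fill)}
          for rec in records]

# Proposed behavior: emit each record sparsely, with only its own fields.
sparse = [dict(rec) for rec in records]

print(json.dumps(padded[0]))  # all four inner fields, empty
print(json.dumps(sparse[0]))  # just {}
```

The padded form grows with every field ever seen in the batch, which is what makes deeply nested output unreadable; the sparse form matches the expected result above.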