[jira] [Created] (DRILL-6531) Errors in example for "Aggregate Function Interface" Boaz Ben-Zvi Fri 6/15, 5:54 PM Bridget Bevens
Bridget Bevens created DRILL-6531:
Summary: Errors in example for "Aggregate Function Interface"
Key: DRILL-6531
URL: https://issues.apache.org/jira/browse/DRILL-6531
Project: Apache Drill
Issue Type: Task
Components: Documentation
Reporter: Bridget Bevens
Assignee: Bridget Bevens
Fix For: 1.14.0

Hi Bridget,

There seems to be an error in the example shown in https://drill.apache.org/docs/custom-function-interfaces/ (Custom Function Interfaces - Apache Drill).

The error is logical and does not relate to the main topic (Aggregate Function Interface). Still, it may slightly confuse anyone reading this doc carefully (like me ☺).

The line secondMin.value = min.value; (marked red in the original email) should come before the line min.value = in.value; (marked brown):

{code}
@Override
public void add() {
  if (in.value < min.value) {
    min.value = in.value;
    secondMin.value = min.value;
  }
}
{code}

That is, it should be:

{code}
@Override
public void add() {
  if (in.value < min.value) {
    secondMin.value = min.value;
    min.value = in.value;
  }
}
{code}

This follows from the name of the new function ("the second most minimum"). The current minimum must be saved into secondMin before min is overwritten with the new input.

While on the subject, the reset() function also looks wrong. It needs to reset to high numbers, not zero:

{code}
@Override
public void reset() {
  min.value = 0;        // → 9
  secondMin.value = 0;  // → 9
}
{code}

Thanks,
Boaz

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
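For reference, here is a minimal plain-Java model of the second-minimum aggregate. It applies both fixes from the report. First, the old minimum is copied into secondMin before min is overwritten. Second, reset() starts from Long.MAX_VALUE as the "high number".

The model also adds an else-if branch that the proposed fix still lacks. Without that branch, an input falling between min and secondMin is ignored. For example, the 4 in the input sequence 5, 3, 4 would be missed.

SecondMinSketch and its plain long fields are hypothetical stand-ins for the Drill holders (in, min, secondMin, out). This is not the Drill UDF API.

```java
/**
 * Hypothetical plain-Java model of the "second minimum" aggregate from the
 * Drill docs example. Plain long fields stand in for Drill's holders.
 */
class SecondMinSketch {
    long in;         // current input value (stands in for the input holder)
    long min;        // smallest value seen so far
    long secondMin;  // second-smallest value seen so far

    /** Start from the largest possible values, not zero. */
    void reset() {
        min = Long.MAX_VALUE;
        secondMin = Long.MAX_VALUE;
    }

    /** Fold the current input into the running state. */
    void add() {
        if (in < min) {
            // Save the old minimum before overwriting it.
            secondMin = min;
            min = in;
        } else if (in < secondMin) {
            // Not a new minimum, but smaller than the current runner-up.
            // (This branch is not part of the fix proposed in the report.)
            secondMin = in;
        }
    }

    /** The second-smallest value; Long.MAX_VALUE if fewer than two values were added. */
    long output() {
        return secondMin;
    }
}
```

After reset() and the inputs 5, 3, 4, output() returns 4. With only the line swap, the result would be 5.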
[jira] [Created] (DRILL-6530) JVM crash with a Lateral Unnest query involving multiple json files with one file having a schema change of one column from string to list
Kedar Sankar Behera created DRILL-6530:
Summary: JVM crash with a Lateral Unnest query involving multiple json files with one file having a schema change of one column from string to list
Key: DRILL-6530
URL: https://issues.apache.org/jira/browse/DRILL-6530
Project: Apache Drill
Issue Type: Bug
Affects Versions: 1.14.0
Reporter: Kedar Sankar Behera
Fix For: 1.14.0
Attachments: 0_0_92.json, 0_0_93.json, drillbit.log, drillbit.out, hs_err_pid32076.log

The JVM crashes on a Lateral Unnest query over multiple JSON files when one file changes the type of one column from string to list.

Query:
{code}
SELECT customer.c_custkey, customer.c_acctbal, orders.o_orderkey,
       orders.o_totalprice, orders.o_orderdate, orders.o_shippriority,
       customer.c_address, orders.o_orderpriority, customer.c_comment
FROM customer,
     LATERAL (SELECT O.ord.o_orderkey as o_orderkey,
                     O.ord.o_totalprice as o_totalprice,
                     O.ord.o_orderdate as o_orderdate,
                     O.ord.o_shippriority as o_shippriority,
                     O.ord.o_orderpriority as o_orderpriority
              FROM UNNEST(customer.c_orders) O(ord)) orders;
{code}

The log output was:
{code}
o.a.d.e.p.impl.join.LateralJoinBatch - Output batch still has some space left, getting new batches from left and right
2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_custkey
2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_phone
2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_acctbal
2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_orders
2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_mktsegment
2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_address
2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_nationkey
2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_name
2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_comment
2018-06-21 15:25:16,316 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.d.e.v.c.AbstractContainerVector - Field [o_comment] mutated from [NullableVarCharVector] to [RepeatedVarCharVector]
2018-06-21 15:25:16,318 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG o.a.drill.exec.vector.UInt4Vector - Reallocating vector [[`$offsets$` (UINT4:REQUIRED)]]. # of bytes: [16384] -> [32768]
{code}

On further investigation with [~shamirwasia], we found that the crash happens only when [o_comment] mutates from [NullableVarCharVector] to [RepeatedVarCharVector], not the other way around.

Please find the logs, the stack trace, and the data files attached.
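To reproduce the direction-dependent behaviour without the attached files, a hypothetical helper like the one below could write two minimal JSON files.

- The first file has o_comment as a string, matching the NullableVarCharVector side of the log.
- The second file has o_comment as a list of strings, matching the RepeatedVarCharVector side.

The file names, field values, and reduced column set are assumptions. The names are chosen to sort in the string-then-list order, but whether the scan reads them in that order is also an assumption. The report does not say which attachment (0_0_92.json or 0_0_93.json) has which shape.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that writes minimal repro data for the string-to-list change. */
class SchemaChangeRepro {

    /** One customer row whose nested order has o_comment as a JSON string. */
    static String stringComment(long custKey) {
        return "{\"c_custkey\": " + custKey
            + ", \"c_orders\": [{\"o_orderkey\": 1, \"o_comment\": \"plain string\"}]}";
    }

    /** Same shape, but o_comment is a JSON list of strings. */
    static String listComment(long custKey) {
        return "{\"c_custkey\": " + custKey
            + ", \"c_orders\": [{\"o_orderkey\": 2, \"o_comment\": [\"first\", \"second\"]}]}";
    }

    /**
     * Writes the string-typed file and then the list-typed file, the direction
     * reported to crash. Names sort in that order; scan order is not guaranteed.
     */
    static void write(Path dir) throws IOException {
        Files.createDirectories(dir);
        Files.write(dir.resolve("a_string_comment.json"),
            stringComment(1).getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("b_list_comment.json"),
            listComment(2).getBytes(StandardCharsets.UTF_8));
    }
}
```

Swapping the two file names would give the list-to-string direction, which the report says does not crash.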
[jira] [Created] (DRILL-6529) Project Batch Sizing causes two LargeFileCompilation tests to fail
Karthikeyan Manivannan created DRILL-6529:
Summary: Project Batch Sizing causes two LargeFileCompilation tests to fail
Key: DRILL-6529
URL: https://issues.apache.org/jira/browse/DRILL-6529
Project: Apache Drill
Issue Type: Improvement
Components: Execution - Relational Operators
Reporter: Karthikeyan Manivannan
Assignee: Karthikeyan Manivannan

Timeout failures are seen in TestLargeFileCompilation testExternal_Sort and testTop_N_Sort. These are compilation stress tests: their queries project over 5000 columns and sort over 500 columns. The tests pass when run stand-alone. Something triggers the timeouts only when they run in parallel as part of a full unit test run.
[jira] [Created] (DRILL-6528) Planner setting the wrong number of records to read (Parquet Reader)
salim achouche created DRILL-6528:
Summary: Planner setting the wrong number of records to read (Parquet Reader)
Key: DRILL-6528
URL: https://issues.apache.org/jira/browse/DRILL-6528
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Reporter: salim achouche

- I recently fixed the Flat Parquet reader to honor the number of records to read.
- However, a few tests then failed:
  TestUnionDistinct.testUnionDistinctEmptySides:356 Different number of records returned expected:<5> but was:<1>
  TestUnionAll.testUnionAllEmptySides:355 Different number of records returned expected:<5> but was:<1>
- I debugged one of them and found that the planner was setting the wrong number of rows to read (in this case, one).
- You can set a breakpoint and see this happening:
  Class: ParquetGroupScan
  Method: updateRowGroupInfo(long maxRecords)
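The failure pattern can be illustrated with a toy model. Once a reader honors the records-to-read value it is given, a wrong value from the planner (here, 1) directly truncates the result. A reader that ignored the value would presumably have masked the planner bug.

The class LimitedReaderSketch is hypothetical, as is the convention that a negative limit means "read everything". It does not model ParquetGroupScan or Drill's reader classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of a reader that honors a planner-supplied row limit. */
class LimitedReaderSketch {

    /**
     * Returns at most recordsToRead rows from the source.
     * A negative recordsToRead means "no limit" (an assumed convention).
     */
    static List<Integer> read(List<Integer> rows, long recordsToRead) {
        List<Integer> out = new ArrayList<>();
        for (Integer row : rows) {
            if (recordsToRead >= 0 && out.size() >= recordsToRead) {
                break; // limit reached: stop reading
            }
            out.add(row);
        }
        return out;
    }
}
```

With five source rows, read(rows, -1) returns all five. read(rows, 1) returns one, which mirrors the "expected:<5> but was:<1>" failures above.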
PRs for Drill 1.14
Hi Boaz, I just submitted a PR which should be relatively easy to review, but I’d like to get it into 1.14. It is DRILL-6519. Thanks, — C
[jira] [Created] (DRILL-6527) Update option name for Drill Parquet native reader
Vitalii Diravka created DRILL-6527:
Summary: Update option name for Drill Parquet native reader
Key: DRILL-6527
URL: https://issues.apache.org/jira/browse/DRILL-6527
Project: Apache Drill
Issue Type: Improvement
Components: Storage - Hive, Storage - Parquet
Affects Versions: 1.14.0
Reporter: Vitalii Diravka
Fix For: 1.15.0

The old option name for enabling the Drill native Parquet reader is "store.hive.optimize_scan_with_native_readers". DRILL-6454 introduces a new native reader, so a more precise option name is added for the Parquet native reader as well. The new option name for the Parquet reader is "store.hive.parquet.optimize_scan_with_native_reader". The old name is deprecated and should be removed in the Drill 1.15.0 release.
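A client or test harness that must work across this rename could normalize option names before using them.

The sketch below is hypothetical. OptionNameResolver, its alias table, and resolve() are illustrative only and are not Drill's option-manager API. Only the two option names come from this issue.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical resolver mapping a deprecated option name to its replacement. */
class OptionNameResolver {

    static final String OLD_NAME = "store.hive.optimize_scan_with_native_readers";
    static final String NEW_NAME = "store.hive.parquet.optimize_scan_with_native_reader";

    // Deprecated name -> current name.
    private static final Map<String, String> DEPRECATED =
        Collections.singletonMap(OLD_NAME, NEW_NAME);

    /** Returns the current name for an option, warning if a deprecated name was used. */
    static String resolve(String name) {
        String replacement = DEPRECATED.get(name);
        if (replacement == null) {
            return name; // not deprecated: use as-is
        }
        System.err.println("Option '" + name + "' is deprecated; use '" + replacement + "'");
        return replacement;
    }
}
```

Keeping the aliases in one table means the deprecated entry can be deleted in a single place once the old name is removed.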
[jira] [Created] (DRILL-6526) Refactor FileSystemConfig to disallow direct access from the code its variables
Arina Ielchiieva created DRILL-6526:
Summary: Refactor FileSystemConfig to disallow direct access from the code its variables
Key: DRILL-6526
URL: https://issues.apache.org/jira/browse/DRILL-6526
Project: Apache Drill
Issue Type: Task
Affects Versions: 1.13.0
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva
Fix For: 1.14.0

Refactor FileSystemConfig so that other code cannot access its variables directly.
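Here is a minimal sketch of this kind of refactoring, assuming the class currently exposes public mutable fields. The class FileSystemConfigSketch and its two fields (connection, and workspaces as a String-to-String map) are hypothetical simplifications, not the real FileSystemConfig.

After the refactoring:

- The fields are private and final and are set once in the constructor.
- They are exposed only through getters.
- The map is copied and wrapped read-only, so callers cannot mutate it.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified config class after the refactoring. */
final class FileSystemConfigSketch {

    // Before (assumed): public String connection; public Map<String, String> workspaces;
    private final String connection;
    private final Map<String, String> workspaces;

    FileSystemConfigSketch(String connection, Map<String, String> workspaces) {
        this.connection = connection;
        // Defensive copy, then wrap read-only so callers cannot change internal state.
        this.workspaces = Collections.unmodifiableMap(new HashMap<>(
            workspaces == null ? Collections.<String, String>emptyMap() : workspaces));
    }

    String getConnection() {
        return connection;
    }

    Map<String, String> getWorkspaces() {
        return workspaces;
    }
}
```

Later changes to the map passed into the constructor do not leak into the config. Attempts to modify the returned map throw UnsupportedOperationException.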
Question about SubsetTransformer
Hi, Drill devs,

I'm developing a storage plugin for our own storage engine, and reading the source code has left me confused about the SubsetTransformer class. Its goal seems to be to find all physical implementations of an input and add them directly to the current node as inputs. However, I think this may be unnecessary, since the Volcano planner will search the rules and do the transformation for us. My guess is that it speeds up the optimization process.

--
Liu, Renjie
Software Engineer, MVAD
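To make the pattern in the question concrete, here is a plain-Java model with no Calcite dependency. It reflects one reading of the behavior being asked about and is not Drill's or Calcite's API. Candidate, the string trait sets, and the convertChild function are hypothetical stand-ins.

The model works in two steps:

1. It collects the distinct trait sets of the physical alternatives in the input's subset.
2. It builds one parent alternative per trait set.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical model of "one parent alternative per physical child trait set". */
class SubsetTransformerModel {

    /** A hypothetical candidate plan in the child's subset. */
    static final class Candidate {
        final boolean physical; // true if this alternative is already a physical plan
        final String traits;    // stand-in for a trait set, e.g. "HASH(a)"

        Candidate(boolean physical, String traits) {
            this.physical = physical;
            this.traits = traits;
        }
    }

    /** Builds parent alternatives, one per distinct physical trait set of the child. */
    static <P> List<P> go(List<Candidate> childSubset, Function<String, P> convertChild) {
        // 1. Collect distinct trait sets of the physical alternatives, in encounter order.
        Set<String> traitSets = new LinkedHashSet<>();
        for (Candidate c : childSubset) {
            if (c.physical) {
                traitSets.add(c.traits);
            }
        }
        // 2. Ask for one parent per trait set; null means "cannot convert".
        List<P> produced = new ArrayList<>();
        for (String traits : traitSets) {
            P parent = convertChild.apply(traits);
            if (parent != null) {
                produced.add(parent); // a real rule would register this with the planner
            }
        }
        return produced;
    }
}
```

The model only shows the mechanics. It does not settle whether the Volcano planner would reach the same alternatives on its own.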