[jira] [Created] (DRILL-6531) Errors in example for "Aggregate Function Interface" Boaz Ben-Zvi Fri 6/15, 5:54 PM Bridget Bevens

2018-06-22 Thread Bridget Bevens (JIRA)
Bridget Bevens created DRILL-6531:
-

 Summary: Errors in example for "Aggregate Function Interface" Boaz 
Ben-Zvi Fri 6/15, 5:54 PM Bridget Bevens
 Key: DRILL-6531
 URL: https://issues.apache.org/jira/browse/DRILL-6531
 Project: Apache Drill
  Issue Type: Task
  Components: Documentation
Reporter: Bridget Bevens
Assignee: Bridget Bevens
 Fix For: 1.14.0


Hi Bridget,

 

   There seems to be an error in the example shown in 
https://drill.apache.org/docs/custom-function-interfaces/

The error is logical, not relating to the main topic (Aggregate Function 
Interface), but may slightly confuse anyone carefully reading this doc (like me 
☺)

The error is that the assignment to secondMin.value should come before the assignment to min.value. The example currently reads:

@Override
public void add() {
  if (in.value < min.value) {
    min.value = in.value;
    secondMin.value = min.value;
  }
}

That is, it should be:

@Override
public void add() {
  if (in.value < min.value) {
    secondMin.value = min.value;
    min.value = in.value;
  }
}

  This comes from interpreting the name of the new function (“The second most 
minimum”).

While on the subject, the reset() function also looks wrong (it needs to 
reset to high numbers, not zero):

 

@Override
public void reset() {
  min.value = 0;        →  9
  secondMin.value = 0;  →  9
}
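
The two fixes above can be tried outside Drill with a minimal plain-Java sketch. It does not use the Drill aggregate function interface: the class and method names are hypothetical, and Integer.MAX_VALUE is assumed as the "high number" for reset(). The else-if branch is an addition beyond the quoted snippet. Without it, a value that falls between min and secondMin would never be recorded.

```java
/**
 * Plain-Java sketch of a "second minimum" aggregate. This is not the Drill
 * aggregate function interface; the class and method names are hypothetical.
 */
final class SecondMinSketch {
    private int min;
    private int secondMin;

    SecondMinSketch() {
        reset();
    }

    /** Start from the largest int (an assumed "high number") so any input can replace it. */
    void reset() {
        min = Integer.MAX_VALUE;
        secondMin = Integer.MAX_VALUE;
    }

    /** Fold one input value into the running state. */
    void add(int in) {
        if (in < min) {
            // Save the old minimum first, then overwrite it.
            secondMin = min;
            min = in;
        } else if (in < secondMin) {
            // Addition beyond the quoted snippet: a value between
            // min and secondMin must also update secondMin.
            secondMin = in;
        }
    }

    int secondMin() {
        return secondMin;
    }

    public static void main(String[] args) {
        SecondMinSketch agg = new SecondMinSketch();
        for (int v : new int[] {7, 2, 9, 4}) {
            agg.add(v);
        }
        System.out.println(agg.secondMin()); // prints 4
    }
}
```

With the assignments in the original order, secondMin would simply end up equal to min whenever a new minimum arrives.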

  Thanks,

 

Boaz

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (DRILL-6530) JVM crash with a Lateral Unnest query involving multiple json files with one file having a schema change of one column from string to list

2018-06-22 Thread Kedar Sankar Behera (JIRA)
Kedar Sankar Behera created DRILL-6530:
--

 Summary: JVM crash with a Lateral Unnest query involving multiple 
json files with one file having a schema change of one column from string to 
list
 Key: DRILL-6530
 URL: https://issues.apache.org/jira/browse/DRILL-6530
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.14.0
Reporter: Kedar Sankar Behera
 Fix For: 1.14.0
 Attachments: 0_0_92.json, 0_0_93.json, drillbit.log, drillbit.out, 
hs_err_pid32076.log

The JVM crashes on a Lateral Unnest query over multiple JSON files when one 
file has a schema change in one column from string to list.

Query:
{code}
SELECT customer.c_custkey, customer.c_acctbal, orders.o_orderkey,
       orders.o_totalprice, orders.o_orderdate, orders.o_shippriority,
       customer.c_address, orders.o_orderpriority, customer.c_comment
FROM customer, LATERAL
  (SELECT O.ord.o_orderkey AS o_orderkey,
          O.ord.o_totalprice AS o_totalprice,
          O.ord.o_orderdate AS o_orderdate,
          O.ord.o_shippriority AS o_shippriority,
          O.ord.o_orderpriority AS o_orderpriority
   FROM UNNEST(customer.c_orders) O(ord)) orders;
{code}
The log output was:
{code}
o.a.d.e.p.impl.join.LateralJoinBatch - Output batch still has some space left, 
getting new batches from left and right
2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG 
o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_custkey
2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG 
o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_phone
2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG 
o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_acctbal
2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG 
o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_orders
2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG 
o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_mktsegment
2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG 
o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_address
2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG 
o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_nationkey
2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG 
o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_name
2018-06-21 15:25:16,303 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG 
o.a.d.exec.physical.impl.ScanBatch - set record count 0 for vv c_comment
2018-06-21 15:25:16,316 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG 
o.a.d.e.v.c.AbstractContainerVector - Field [o_comment] mutated from 
[NullableVarCharVector] to [RepeatedVarCharVector]
2018-06-21 15:25:16,318 [24d3da36-bdb8-cb5b-594c-82135bfb84aa:frag:0:0] DEBUG 
o.a.drill.exec.vector.UInt4Vector - Reallocating vector [[`$offsets$` 
(UINT4:REQUIRED)]]. # of bytes: [16384] -> [32768]
{code}

On further investigation with [~shamirwasia], we found that the crash happens 
only when [o_comment] mutates from [NullableVarCharVector] to 
[RepeatedVarCharVector], not the other way around.

The logs, stack trace, and data files are attached.

 





[jira] [Created] (DRILL-6529) Project Batch Sizing causes two LargeFileCompilation tests to fail

2018-06-22 Thread Karthikeyan Manivannan (JIRA)
Karthikeyan Manivannan created DRILL-6529:
-

 Summary: Project Batch Sizing causes two LargeFileCompilation 
tests to fail
 Key: DRILL-6529
 URL: https://issues.apache.org/jira/browse/DRILL-6529
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Relational Operators
Reporter: Karthikeyan Manivannan
Assignee: Karthikeyan Manivannan


Timeout failures are seen in TestLargeFileCompilation testExternal_Sort and 
testTop_N_Sort. These tests are stress tests for compilation where the queries 
cover projections over 5000 columns and sort over 500 columns. These tests pass 
if they are run stand-alone. Something triggers the timeouts when the tests are 
run in parallel as part of a unit test run.

 

 





[jira] [Created] (DRILL-6528) Planner setting the wrong number of records to read (Parquet Reader)

2018-06-22 Thread salim achouche (JIRA)
salim achouche created DRILL-6528:
-

 Summary: Planner setting the wrong number of records to read 
(Parquet Reader)
 Key: DRILL-6528
 URL: https://issues.apache.org/jira/browse/DRILL-6528
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: salim achouche


 - I recently fixed the flat Parquet reader to honor the number of records to read.
 - After that fix, a few tests failed:
TestUnionDistinct.testUnionDistinctEmptySides:356 Different number of records 
returned expected:<5> but was:<1>
TestUnionAll.testUnionAllEmptySides:355 Different number of records returned 
expected:<5> but was:<1>

 - I debugged one of them and found that the planner was setting the wrong number 
of rows to read (in this case, one).
 - You can set a breakpoint and see this happening:
Class: ParquetGroupScan
Method: updateRowGroupInfo(long maxRecords)
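
To illustrate why these failures would only surface once the reader honors the limit, here is a minimal plain-Java sketch. It is not Drill code: the class, the read() helper, and the integer "rows" are all hypothetical stand-ins. A reader that respects maxRecords returns exactly as many rows as the planner asks for, so a wrong limit of one produces one row where five were expected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a reader that honors a record limit. */
final class RecordLimitSketch {

    /** Returns at most maxRecords rows from the input, in order. */
    static List<Integer> read(List<Integer> rows, long maxRecords) {
        List<Integer> out = new ArrayList<>();
        for (Integer row : rows) {
            if (out.size() >= maxRecords) {
                break; // limit reached, stop reading
            }
            out.add(row);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(10, 20, 30, 40, 50);
        // The correct limit returns every expected row.
        System.out.println(read(rows, 5).size()); // prints 5
        // A wrong limit of one truncates the result.
        System.out.println(read(rows, 1).size()); // prints 1
    }
}
```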





PRs for Drill 1.14

2018-06-22 Thread Charles Givre
Hi Boaz, 
I just submitted a PR which should be relatively easy to review, but I’d like 
to get it into 1.14.  It is DRILL-6519.
Thanks,
— C

[jira] [Created] (DRILL-6527) Update option name for Drill Parquet native reader

2018-06-22 Thread Vitalii Diravka (JIRA)
Vitalii Diravka created DRILL-6527:
--

 Summary: Update option name for Drill Parquet native reader
 Key: DRILL-6527
 URL: https://issues.apache.org/jira/browse/DRILL-6527
 Project: Apache Drill
  Issue Type: Improvement
  Components: Storage - Hive, Storage - Parquet
Affects Versions: 1.14.0
Reporter: Vitalii Diravka
 Fix For: 1.15.0


The old option name for enabling the Drill Parquet native reader is 
"store.hive.optimize_scan_with_native_readers".
DRILL-6454 introduces one more native reader, so a more precise option name 
is added for the Parquet native reader as well.
The new option name for the Parquet reader is 
"store.hive.parquet.optimize_scan_with_native_reader".
The old name is deprecated and should be removed starting with the Drill 1.15.0 
release.
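
As a rough client-side illustration of the rename, the sketch below maps the deprecated name to the new one and builds an ALTER SESSION statement. The class and helper names are hypothetical, and nothing here describes how Drill itself handles the deprecated name. Only the two option names come from this issue.

```java
/** Hypothetical client-side helper; not part of Drill. */
final class HiveParquetOptionSketch {
    static final String OLD_NAME = "store.hive.optimize_scan_with_native_readers";
    static final String NEW_NAME = "store.hive.parquet.optimize_scan_with_native_reader";

    /** Replaces the deprecated option name with the new one; other names pass through. */
    static String resolve(String name) {
        return OLD_NAME.equals(name) ? NEW_NAME : name;
    }

    /** Builds an ALTER SESSION statement that sets the (resolved) boolean option. */
    static String alterSession(String name, boolean enabled) {
        return "ALTER SESSION SET `" + resolve(name) + "` = " + enabled;
    }

    public static void main(String[] args) {
        // Passing the old name yields a statement that uses the new name.
        System.out.println(alterSession(OLD_NAME, true));
    }
}
```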





[jira] [Created] (DRILL-6526) Refactor FileSystemConfig to disallow direct access to its variables from the code

2018-06-22 Thread Arina Ielchiieva (JIRA)
Arina Ielchiieva created DRILL-6526:
---

 Summary: Refactor FileSystemConfig to disallow direct access to 
its variables from the code
 Key: DRILL-6526
 URL: https://issues.apache.org/jira/browse/DRILL-6526
 Project: Apache Drill
  Issue Type: Task
Affects Versions: 1.13.0
Reporter: Arina Ielchiieva
Assignee: Arina Ielchiieva
 Fix For: 1.14.0


Refactor FileSystemConfig to disallow direct access to its variables from the code.
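
A minimal sketch of the kind of encapsulation this refactoring asks for, assuming private final fields exposed through getters. The class name, the two fields, and their types are illustrative assumptions, not the actual FileSystemConfig definition.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative stand-in; field names and types are assumptions. */
final class FileSystemConfigSketch {
    private final String connection;
    private final Map<String, String> workspaces;

    FileSystemConfigSketch(String connection, Map<String, String> workspaces) {
        this.connection = connection;
        // Defensive copy wrapped as read-only, so callers cannot mutate internal state.
        this.workspaces = Collections.unmodifiableMap(new LinkedHashMap<>(workspaces));
    }

    String getConnection() {
        return connection;
    }

    Map<String, String> getWorkspaces() {
        return workspaces;
    }

    public static void main(String[] args) {
        Map<String, String> ws = new LinkedHashMap<>();
        ws.put("tmp", "/tmp");
        FileSystemConfigSketch config = new FileSystemConfigSketch("file:///", ws);
        System.out.println(config.getConnection() + " " + config.getWorkspaces());
    }
}
```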





Question about SubsetTransformer

2018-06-22 Thread Renjie Liu
Hi, Drill devs:
I'm developing a storage plugin for our own storage engine, and I'm
confused about the SubsetTransformer class after reading the source code.
Its goal seems to be to find all physical implementations of an input
and add them directly to the current node as inputs. But I think this may be
unnecessary, since the Volcano planner will search the rules and do the
transformation for us. My guess is that it is there to accelerate the
optimization process.
-- 
Liu, Renjie
Software Engineer, MVAD