[jira] [Created] (DRILL-7278) Refactor result set loader projection mechanism

2019-05-25 Thread Paul Rogers (JIRA)
Paul Rogers created DRILL-7278:
--

 Summary: Refactor result set loader projection mechanism
 Key: DRILL-7278
 URL: https://issues.apache.org/jira/browse/DRILL-7278
 Project: Apache Drill
  Issue Type: Improvement
Affects Versions: 1.16.0
Reporter: Paul Rogers
Assignee: Paul Rogers
 Fix For: 1.17.0






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7261) Simplify Easy format config for new scan framework

2019-05-25 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-7261:
---
Reviewer: Arina Ielchiieva

> Simplify Easy format config for new scan framework
> --
>
> Key: DRILL-7261
> URL: https://issues.apache.org/jira/browse/DRILL-7261
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.17.0
>
>
> Rollup of related CSV V3 fixes along with supporting row set framework fixes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7261) Simplify Easy format config for new scan framework

2019-05-25 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848328#comment-16848328
 ] 

ASF GitHub Bot commented on DRILL-7261:
---

paul-rogers commented on pull request #1796: DRILL-7261: Simplify Easy 
framework config for new scan
URL: https://github.com/apache/drill/pull/1796
 
 
   Most format plugins are created using the Easy format plugin. A recent
   change added support for the "row set" scan framework. After converting
   the text and log reader plugins, it became clear that the setup code
   could be made simpler.
   
   * Add the user name to the "file scan" framework.
   * Pass the file system, split and user name to the batch reader via
the "schema negotiator" rather than via the constructor.
   * Create the traditional "scan batch" scan or the new row-set scan via
   functions instead of classes.
   * Add Easy config option and method to choose the kind of scan
   framework.
   * Add Easy config options for some newer options such as whether the
   plugin supports statistics.
   
   Tested by running all unit tests for the CSV reader which is based on
   the new framework, and by testing the converted log reader (that reader
   is not part of this commit.)
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Simplify Easy format config for new scan framework
> --
>
> Key: DRILL-7261
> URL: https://issues.apache.org/jira/browse/DRILL-7261
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.17.0
>
>
> Rollup of related CSV V3 fixes along with supporting row set framework fixes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (DRILL-7261) Simplify Easy format config for new scan framework

2019-05-25 Thread Paul Rogers (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers updated DRILL-7261:
---
Summary: Simplify Easy format config for new scan framework  (was: Rollup 
of CSV V3 fixes)

> Simplify Easy format config for new scan framework
> --
>
> Key: DRILL-7261
> URL: https://issues.apache.org/jira/browse/DRILL-7261
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.16.0
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.17.0
>
>
> Rollup of related CSV V3 fixes along with supporting row set framework fixes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (DRILL-7277) Bug in planner with redundant order-by

2019-05-25 Thread Ted Dunning (JIRA)


[ 
https://issues.apache.org/jira/browse/DRILL-7277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16848071#comment-16848071
 ] 

Ted Dunning commented on DRILL-7277:


This query:
{{select row_number() over (order by department_id desc) r, department_id
from (select department_id
 from  cp.`employee.json`
 order by department_id desc) ;}}

blows beets as below but putting department_id first in the output doesn't.

{{java.sql.SQLException: [MapR][DrillJDBCDriver](500165) Query execution error. 
Details: SYSTEM ERROR: CannotPlanException: Node 
[rel#26937:Subset#4.LOGICAL.ANY([]).[1 DESC]] could not be implemented; planner 
state:

Root: rel#26937:Subset#4.LOGICAL.ANY([]).[1 DESC]
Original rel:
LogicalProject(subset=[rel#26937:Subset#4.LOGICAL.ANY([]).[1 DESC]], r=[$1], 
department_id=[$0]): rowcount = 100.0, cumulative cost = {100.0 rows, 200.0 
cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 26935
  LogicalWindow(subset=[rel#26934:Subset#3.NONE.ANY([]).[1 DESC]], 
window#0=[window(partition {} order by [0 DESC] rows between UNBOUNDED 
PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])]): rowcount = 100.0, cumulative 
cost = {100.0 rows, 200.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 26933
LogicalSort(subset=[rel#26932:Subset#2.NONE.ANY([]).[0 DESC]], sort0=[$0], 
dir0=[DESC]): rowcount = 100.0, cumulative cost = {100.0 rows, 
1842.0680743952366 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 26931
  LogicalProject(subset=[rel#26930:Subset#1.NONE.ANY([]).[]], 
department_id=[$1]): rowcount = 100.0, cumulative cost = {100.0 rows, 100.0 
cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 26929
EnumerableTableScan(subset=[rel#26928:Subset#0.ENUMERABLE.ANY([]).[]], 
table=[[cp, employee.json]]): rowcount = 100.0, cumulative cost = {100.0 rows, 
101.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 26880

Sets:
Set#0, type: RecordType(DYNAMIC_STAR **, ANY department_id)
rel#26928:Subset#0.ENUMERABLE.ANY([]).[], best=rel#26880, 
importance=0.59049001
rel#26880:EnumerableTableScan.ENUMERABLE.ANY([]).[](table=[cp, 
employee.json]), rowcount=100.0, cumulative cost={100.0 rows, 101.0 cpu, 0.0 
io, 0.0 network, 0.0 memory}
rel#26952:Subset#0.LOGICAL.ANY([]).[], best=rel#26954, 
importance=0.3247695
rel#26954:DrillScanRel.LOGICAL.ANY([]).[](table=[cp, 
employee.json],groupscan=EasyGroupScan [selectionRoot=classpath:/employee.json, 
numFiles=1, columns=[`**`, `department_id`], 
files=[classpath:/employee.json]]), rowcount=463.0, cumulative cost={463.0 
rows, 463.0 cpu, 0.0 io, 0.0 network, 0.0 memory}
Set#1, type: RecordType(ANY department_id)
rel#26930:Subset#1.NONE.ANY([]).[], best=null, importance=0.6561

rel#26929:LogicalProject.NONE.ANY([]).[](input=rel#26928:Subset#0.ENUMERABLE.ANY([]).[],department_id=$1),
 rowcount=100.0, cumulative cost={inf}
rel#26931:LogicalSort.NONE.ANY([]).[0 
DESC](input=rel#26930:Subset#1.NONE.ANY([]).[],sort0=$0,dir0=DESC), 
rowcount=100.0, cumulative cost={inf}
rel#26943:Subset#1.LOGICAL.ANY([]).[], best=rel#26950, importance=0.405
rel#26944:DrillSortRel.LOGICAL.ANY([]).[0 
DESC](input=rel#26943:Subset#1.LOGICAL.ANY([]).[],sort0=$0,dir0=DESC), 
rowcount=463.0, cumulative cost={926.0 rows, 11830.070504167705 cpu, 0.0 io, 
0.0 network, 0.0 memory}
rel#26950:DrillScanRel.LOGICAL.ANY([]).[](table=[cp, 
employee.json],groupscan=EasyGroupScan [selectionRoot=classpath:/employee.json, 
numFiles=1, columns=[`department_id`], files=[classpath:/employee.json]]), 
rowcount=463.0, cumulative cost={463.0 rows, 463.0 cpu, 0.0 io, 0.0 network, 
0.0 memory}

rel#26953:DrillProjectRel.LOGICAL.ANY([]).[](input=rel#26952:Subset#0.LOGICAL.ANY([]).[],department_id=$1),
 rowcount=463.0, cumulative cost={926.0 rows, 4630463.0 cpu, 0.0 io, 0.0 
network, 0.0 memory}
rel#26946:Subset#1.NONE.ANY([]).[0 DESC], best=null, 
importance=0.7291
rel#26931:LogicalSort.NONE.ANY([]).[0 
DESC](input=rel#26930:Subset#1.NONE.ANY([]).[],sort0=$0,dir0=DESC), 
rowcount=100.0, cumulative cost={inf}
rel#26947:Subset#1.LOGICAL.ANY([]).[1 DESC], best=null, importance=0.81
rel#26948:Subset#1.LOGICAL.ANY([]).[0 DESC], best=rel#26944, 
importance=0.405
rel#26944:DrillSortRel.LOGICAL.ANY([]).[0 
DESC](input=rel#26943:Subset#1.LOGICAL.ANY([]).[],sort0=$0,dir0=DESC), 
rowcount=463.0, cumulative cost={926.0 rows, 11830.070504167705 cpu, 0.0 io, 
0.0 network, 0.0 memory}
Set#3, type: RecordType(ANY department_id, BIGINT w0$o0)
rel#26934:Subset#3.NONE.ANY([]).[1 DESC], best=null, importance=0.81
rel#26933:LogicalWindow.NONE.ANY([]).[[1 
DESC]](input=rel#26946:Subset#1.NONE.ANY([]).[0 DESC],window#0=window(partition 
{} order by [0 DESC] rows between UNBOUNDED PRECEDING and CURRENT ROW aggs 
[ROW_NUMBER()])), 

[jira] [Created] (DRILL-7277) Bug in planner with redundant order-by

2019-05-25 Thread Ted Dunning (JIRA)
Ted Dunning created DRILL-7277:
--

 Summary: Bug in planner with redundant order-by
 Key: DRILL-7277
 URL: https://issues.apache.org/jira/browse/DRILL-7277
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 1.14.0
Reporter: Ted Dunning






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)