Re: Window function

2016-11-25 Thread Nitin Pawar
Adding the dev list for comments.

On Wed, Nov 23, 2016 at 7:04 PM, Nitin Pawar 
wrote:

> Hi,
>
> According to DRILL-3596 <https://issues.apache.org/jira/browse/DRILL-3596>,
> the lead and lag functions are limited to an offset of 1.
>
> According to the PostgreSQL documentation:
>
> lag(value any [, offset integer [, default any ]])
> Return type: same type as value
> Returns value evaluated at the row that is offset rows before the current
> row within the partition; if there is no such row, instead return default.
> Both offset and default are evaluated with respect to the current row. If
> omitted, offset defaults to 1 and default to null.
>
>
> Is there any plan to allow arbitrary offsets, rather than restricting the
> offset to 1?
>
> Use case:
>
> I have daily data for a month. Every day I want to compute the delta
> against the same day of the previous week (compare Monday with Monday,
> Tuesday with Tuesday, and so on), so basically lag(col, 7).
>
> --
> Nitin Pawar
>



-- 
Nitin Pawar
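To make the requested semantics concrete, here is a minimal Python model of the PostgreSQL lag/lead behavior quoted above, applied to a single partition whose rows are already in ORDER BY order. The function names and the sample data are illustrative assumptions, not Drill code; negative offsets are simply rejected here rather than modeled.

```python
from typing import Any, List, Sequence


def lag(values: Sequence[Any], offset: int = 1, default: Any = None) -> List[Any]:
    """Model of lag(value, offset, default) over ONE already-ordered partition."""
    if offset < 0:
        raise ValueError("this sketch only models non-negative offsets")
    # Row i reads row i - offset; rows with no such predecessor get `default`.
    return [values[i - offset] if i >= offset else default for i in range(len(values))]


def lead(values: Sequence[Any], offset: int = 1, default: Any = None) -> List[Any]:
    """Model of lead(value, offset, default): reads `offset` rows after the current row."""
    if offset < 0:
        raise ValueError("this sketch only models non-negative offsets")
    n = len(values)
    return [values[i + offset] if i + offset < n else default for i in range(n)]


# Use case from the question: two weeks of daily values, compared with the same weekday.
daily = [10, 12, 11, 13, 15, 9, 8, 14, 15, 10, 16, 18, 12, 7]
same_day_last_week = lag(daily, 7)
weekly_delta = [None if prev is None else cur - prev
                for cur, prev in zip(daily, same_day_last_week)]
print(weekly_delta)
```

With a PARTITION BY clause, the same functions would be applied to each partition separately.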


Re: Window function

2016-12-01 Thread Nitin Pawar
Any help on this?

In class NoFrameSupportTemplate, I see that inIndex is hard-coded to point
to the previous row in the case of lag and the next row in the case of lead.

Is there a way I can modify this and pass the offset as a parameter to pick
the appropriate row?


-- 
Nitin Pawar
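In a batch-at-a-time operator, a lag offset of N means the first N rows of every incoming batch may need values from earlier batches, so pointing at a single hard-coded previous row is not enough. A minimal sketch, assuming a single partition delivered as a list of batches and using a hypothetical `lag_over_batches` helper (this is not the actual NoFrameSupportTemplate code), carries the last N values across batch boundaries in a bounded deque:

```python
from collections import deque
from typing import Any, Deque, Iterable, Iterator, List


def lag_over_batches(batches: Iterable[List[Any]], offset: int,
                     default: Any = None) -> Iterator[List[Any]]:
    """Yield lag(value, offset, default) for each batch of ONE ordered partition."""
    if offset < 1:
        raise ValueError("offset must be >= 1")
    # The last `offset` values seen so far, possibly spanning several earlier batches.
    history: Deque[Any] = deque(maxlen=offset)
    for batch in batches:
        out = []
        for value in batch:
            # Once the deque is full, its oldest entry is exactly `offset` rows back.
            out.append(history[0] if len(history) == offset else default)
            history.append(value)  # maxlen evicts the oldest value automatically
        yield out


# Three batches of one partition, lag offset 2.
print(list(lag_over_batches([[1, 2], [3], [4, 5]], offset=2)))
```

Lead with an offset of N is the mirror image: the output for a row can only be produced once N later rows have arrived, so an implementation would have to hold back the tail of each batch (not shown here).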


Re: Window function

2016-12-02 Thread deneche abdelhakim
Hello Nitin,

It's definitely possible to support offsets other than 1 for Lead and Lag;
the main reason I didn't do it is just lack of time :P

Things that need to be done to make Lag (or Lead) support offsets other
than 1:
- WindowFunction.Lead should extract the offset value from its FunctionCall
argument; you can look at WindowFunctionNtile.numTilesFromExpression() for
an example of how to do that.
- Make sure calls to copyNext() and copyPrev() in NoFrameSupportTemplate
use the offset and not the hard-coded value (you already figured that out).
- Finally, make sure you update UnsupportedOperatorsVisitor to no longer
throw an exception when an offset value other than 1 is passed to Lead or
Lag. Just search for DRILL-3596 in that class and you will find the if
block that needs to be removed.
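Steps 1 and 3 boil down to reading an optional offset argument (defaulting to 1) and validating it at planning time, instead of rejecting everything other than 1. A minimal Python sketch, assuming the call's arguments arrive as a plain sequence `(value[, offset[, default]])` following the PostgreSQL signature quoted earlier in the thread, and that only non-negative integer literals are accepted; `offset_from_args` is a hypothetical name, not Drill's WindowFunction or WindowFunctionNtile API:

```python
from typing import Any, Sequence

DEFAULT_OFFSET = 1  # PostgreSQL's default when the offset argument is omitted


def offset_from_args(args: Sequence[Any]) -> int:
    """Return the LAG/LEAD offset from arguments shaped like (value[, offset[, default]])."""
    if not 1 <= len(args) <= 3:
        raise ValueError("LAG/LEAD take between 1 and 3 arguments")
    if len(args) == 1:
        return DEFAULT_OFFSET
    offset = args[1]
    # Only integer literals are accepted; bool is excluded because it subclasses int.
    if isinstance(offset, bool) or not isinstance(offset, int):
        raise TypeError("offset must be an integer literal")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return offset


print(offset_from_args(["col"]), offset_from_args(["col", 7]))
```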

I think this should be enough to get it to work in the general case. Do you
want to volunteer and get this done? That would be an awesome contribution
to the project.

Thanks



Re: Window function

2016-12-03 Thread Khurram Faraaz
Hakim, thanks for sharing those details and the explanation.
I will file a JIRA, and anyone interested can pick it up and provide the fix
to support OFFSET values greater than one for the LAG window function.

Regards,
Khurram



Re: Window function

2016-12-03 Thread Khurram Faraaz
DRILL-5099 <https://issues.apache.org/jira/browse/DRILL-5099> has been
created to track this.




Re: Window function

2016-12-04 Thread Nitin Pawar
Thanks, Deneche, for the explanation.

Thanks, Khurram, for the ticket.

Let me see if I can pick it up and close it soon.




-- 
Nitin Pawar


[jira] [Created] (DRILL-3360) Window function defined within another window function

2015-06-24 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-3360:
-

 Summary: Window function defined within another window function 
 Key: DRILL-3360
 URL: https://issues.apache.org/jira/browse/DRILL-3360
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.1.0
 Environment: CentOS 4 node cluster
Reporter: Khurram Faraaz
Assignee: Jinfeng Ni


A window function defined within another window function: Postgres 9.3 does
not support this, but Drill accepts it and returns results. We should not
support this kind of query.

From Postgres 9.3:
{code}
postgres=# select rank() over(order by row_number() over(order by col_int)) from vwOnParq_wCst;
ERROR:  window functions are not allowed in window definitions
LINE 1: select rank() over(order by row_number() over(order by col_i...
{code}

From execution on Drill:
{code}
0: jdbc:drill:schema=dfs.tmp> select rank() over(order by row_number() over(order by col_int)) from vwOnParq_wCst;
+---------+
| EXPR$0  |
+---------+
| 1       |
| 2       |
| 3       |
| 4       |
| 5       |
| 6       |
| 7       |
| 8       |
| 9       |
| 10      |
| 11      |
| 12      |
| 13      |
| 14      |
| 15      |
| 16      |
| 17      |
| 18      |
| 19      |
| 20      |
| 21      |
| 22      |
| 23      |
| 24      |
| 25      |
| 26      |
| 27      |
| 28      |
| 29      |
| 30      |
+---------+
30 rows selected (0.377 seconds)
{code}
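The rule Postgres enforces here amounts to a tree walk over each window call's PARTITION BY and ORDER BY expressions. A minimal Python sketch over a toy expression model; `Column`, `Call`, and `WindowCall` are hypothetical classes, not Drill's or Calcite's planner types, and the error text simply mirrors the Postgres message above. It only covers window definitions, not a window function nested in another window function's arguments.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Column:
    name: str


@dataclass
class Call:  # an ordinary (non-window) function call
    func: str
    args: List["Expr"] = field(default_factory=list)


@dataclass
class WindowCall:  # func(args) OVER (PARTITION BY ... ORDER BY ...)
    func: str
    args: List["Expr"] = field(default_factory=list)
    partition_by: List["Expr"] = field(default_factory=list)
    order_by: List["Expr"] = field(default_factory=list)


Expr = Union[Column, Call, WindowCall]


def children(e: Expr) -> List[Expr]:
    if isinstance(e, Column):
        return []
    if isinstance(e, Call):
        return list(e.args)
    return [*e.args, *e.partition_by, *e.order_by]


def contains_window(e: Expr) -> bool:
    return isinstance(e, WindowCall) or any(contains_window(c) for c in children(e))


def validate(e: Expr) -> None:
    """Reject window functions inside any OVER (...) definition, anywhere in the tree."""
    if isinstance(e, WindowCall):
        for item in (*e.partition_by, *e.order_by):
            if contains_window(item):
                raise ValueError("window functions are not allowed in window definitions")
    for c in children(e):
        validate(c)


# rank() OVER (ORDER BY row_number() OVER (ORDER BY col_int))
nested = WindowCall("rank", order_by=[WindowCall("row_number", order_by=[Column("col_int")])])
try:
    validate(nested)
except ValueError as err:
    print(err)
```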



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


how should window functions work

2014-12-11 Thread Abdel Hakim Deneche
As part of DRILL-1487 <https://issues.apache.org/jira/browse/DRILL-1487>, I
did some research about SQL window functions. I am posting here what I've
found so far, to make sure I am getting everything right, and to discuss
how we can fix the current implementation.

A window function performs a calculation across a set of table rows that
are somehow related to the current row. It's similar to an aggregate
function, but doesn't cause rows to be grouped into a single output row.
The OVER clause determines exactly how the rows of the query are split up
for processing by the window function.

For each row, there is a set of rows within its partition called its
"window frame". Many (but not all) window functions act only on the rows of
the window frame, rather than on the whole partition. All aggregate
functions act on the rows of the window frame.

The OVER clause can contain a "partition by" clause, an "order by" clause,
and a frame clause. When "partition by" is omitted, there is a single
partition that contains all rows. If "order by" is supplied, then the
default frame consists of all rows from the start of the partition up
through the current row, plus any following rows that are equal to the
current row according to the "order by" clause. When "order by" is omitted,
the default frame consists of all rows in the partition. A frame clause, when
present, specifies the set of rows constituting the window frame.
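The default-frame rules above can be checked with a small Python model of AVG(...) OVER (...). This is a sketch under simplifying assumptions (rows as dicts, at most one PARTITION BY and one ORDER BY column, ascending order, no explicit frame clause), not Drill's operator; the column names follow the employee.json examples in this message and the data is made up.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def window_avg(rows: List[Row], value: str, partition_by: Optional[str] = None,
               order_by: Optional[str] = None) -> List[float]:
    """AVG(value) OVER (PARTITION BY ... ORDER BY ...) with the DEFAULT frame, in input order."""
    partitions: Dict[Any, List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        # Without PARTITION BY, every row lands in one single partition.
        key = row[partition_by] if partition_by is not None else None
        partitions[key].append(i)

    result: List[float] = [0.0] * len(rows)
    for idxs in partitions.values():
        if order_by is None:
            # No ORDER BY: the default frame is the whole partition.
            avg = sum(rows[i][value] for i in idxs) / len(idxs)
            for i in idxs:
                result[i] = avg
            continue
        for i in idxs:
            # With ORDER BY (ascending): partition start through the current row,
            # plus any following peers that share the current row's ORDER BY value.
            frame = [j for j in idxs if rows[j][order_by] <= rows[i][order_by]]
            result[i] = sum(rows[j][value] for j in frame) / len(frame)
    return result


rows = [
    {"position_id": 1, "salary": 10},
    {"position_id": 1, "salary": 20},
    {"position_id": 1, "salary": 20},
    {"position_id": 2, "salary": 30},
]
print(window_avg(rows, "salary", "position_id", "salary"))
```

The two rows with salary 20 are peers, so they share the same frame and get the same average.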

DRILL-1487 deals with a specific case: which rows belong to the window
frame when an "order by" clause is available. But there are several other
problems related to window functions in Drill:

1/ Omitting "partition by", "order by", or both throws an exception.

2/ Using a different field in "partition by" and "order by" doesn't produce
the expected results. For example, the following query should display the
rows sorted by position_id and salary, but the results are only sorted by
position_id:

SELECT employee_id, position_id, salary, AVG(salary) OVER(PARTITION BY
position_id ORDER BY salary) FROM cp.`employee.json` LIMIT 20;

3/ Frame clauses are ignored; for example, the following query doesn't work
as expected ("ROWS CURRENT ROW" has no effect):

SELECT employee_id, position_id, salary, AVG(salary) OVER(PARTITION BY
position_id ORDER BY position_id ROWS CURRENT ROW) FROM cp.`employee.json`
LIMIT 20;
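In this query every row of a partition has the same position_id, so all rows are peers under ORDER BY position_id. With the default frame, AVG(salary) is then the whole-partition average for every row, while ROWS CURRENT ROW should shrink the frame to the current row and return each row's own salary. A minimal Python sketch of three frames over one already-ordered partition (function names and salaries are illustrative assumptions):

```python
from typing import Any, List, Sequence


def avg_default_frame(keys: Sequence[Any], values: Sequence[float]) -> List[float]:
    """Default frame (RANGE UNBOUNDED PRECEDING to CURRENT ROW): peers are included."""
    out = []
    for k in keys:
        frame = [v for kk, v in zip(keys, values) if kk <= k]
        out.append(sum(frame) / len(frame))
    return out


def avg_rows_running(values: Sequence[float]) -> List[float]:
    """ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: peers are NOT included."""
    out, total = [], 0.0
    for n, v in enumerate(values, start=1):
        total += v
        out.append(total / n)
    return out


def avg_rows_current_row(values: Sequence[float]) -> List[float]:
    """ROWS CURRENT ROW: the frame is only the current row, so AVG(x) equals x."""
    return [float(v) for v in values]


# One partition of PARTITION BY position_id ORDER BY position_id: all keys are equal.
position_ids = [2, 2, 2]
salaries = [100.0, 200.0, 300.0]
print(avg_default_frame(position_ids, salaries))  # [200.0, 200.0, 200.0]
print(avg_rows_running(salaries))                 # [100.0, 150.0, 200.0]
print(avg_rows_current_row(salaries))             # [100.0, 200.0, 300.0]
```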

4/ Starting from Drill 0.7.0, window functions throw an exception (see
DRILL-1844 <https://issues.apache.org/jira/browse/DRILL-1844>)


Re: how should window functions work

2015-01-10 Thread Abdel Hakim Deneche
I've been working on a new Window operator (DRILL-1908
<https://issues.apache.org/jira/browse/DRILL-1908>). I decided to start
from scratch because I had to extend from AbstractRecordBatch instead of
AbstractSingleRecordBatch, but I did reuse lots of code from the existing
window record batch.

The DRILL-1908 comments contain lots of information about the design and the
work I've been doing. The new operator supports OVER clauses with "order
by" clauses, and it also supports OVER clauses WITHOUT "order by" clauses,
although there is a bug (DRILL-1852
<https://issues.apache.org/jira/browse/DRILL-1852>) in optiq-drill that
prevents such queries from being planned.

I added some unit tests, using the new test framework, but I still need to
add more tests to make sure edge cases run fine.

The code as it is requires some cleaning and refactoring, and I still need
to support "frame clauses
<http://www.postgresql.org/docs/9.1/static/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS>",
but I designed the operator with that in mind so technically I just need to
rewrite one function to get frame clauses supported.



select * + window function gives strange column names

2015-05-28 Thread Abdel Hakim Deneche
I have a small JSON file that contains the following columns:
employee_id, position_id, salary, sub, and count.

When I run the following query:

select *, count(*) over(partition by position_id) from `myfile.json`;

I get this list of columns:

T1¦¦employee_id  |  T1¦¦position_id  |  T1¦¦sub  |  T1¦¦salary  |  position_id  |  w0$o0  |  w0$o00  |  EXPR$1 |

Is this correct?


-- 

Abdelhakim Deneche

Software Engineer

  


Now Available - Free Hadoop On-Demand Training



[jira] [Created] (DRILL-1908) new window function implementation

2014-12-30 Thread Deneche A. Hakim (JIRA)
Deneche A. Hakim created DRILL-1908:
---

 Summary: new window function implementation
 Key: DRILL-1908
 URL: https://issues.apache.org/jira/browse/DRILL-1908
 Project: Apache Drill
  Issue Type: Improvement
  Components: Execution - Operators
Reporter: Deneche A. Hakim
Assignee: Deneche A. Hakim
Priority: Critical


In order to fix DRILL-1487, a complete rewrite of the
StreamingWindowFrameRecordBatch may be needed. The purpose of this issue is to
report my progress and share my thoughts with the community in order to get a
proper implementation.





Re: select * + window function gives strange column names

2015-05-28 Thread Abdel Hakim Deneche
Never mind, DRILL-3210
was just filed for a similar problem.




[jira] [Created] (DRILL-3651) Window function should not be allowed in order by clause of over clause of window function

2015-08-14 Thread Victoria Markman (JIRA)
Victoria Markman created DRILL-3651:
---

 Summary: Window function should not be allowed in order by clause 
of over clause of window function
 Key: DRILL-3651
 URL: https://issues.apache.org/jira/browse/DRILL-3651
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Reporter: Victoria Markman
Assignee: Jinfeng Ni
Priority: Minor


This query should not parse; it should throw an error according to the SQL standard.
"ISO/IEC 9075-2:2011(E) 7.11 " 
d) If WDX has a window ordering clause, then WDEF shall not specify  (hope I'm reading it correctly)

{code}
SELECT rank() OVER (ORDER BY rank() OVER (ORDER BY c1)) from t1;
{code}

Instead, drill returns result:
{code}
0: jdbc:drill:schema=dfs> SELECT rank() OVER (ORDER BY rank() OVER (ORDER BY c1)) from t1;
+---------+
| EXPR$0  |
+---------+
| 1       |
| 2       |
| 3       |
| 4       |
| 5       |
| 6       |
| 7       |
| 8       |
| 9       |
| 10      |
+---------+
10 rows selected (0.336 seconds)
{code}

Postgres throws an error in this case:
{code}
postgres=# SELECT rank() OVER (ORDER BY rank() OVER (ORDER BY c1)) from t1;
ERROR:  window functions are not allowed in window definitions
LINE 1: SELECT rank() OVER (ORDER BY rank() OVER (ORDER BY c1)) from...
{code}

Courtesy of the Postgres test suite.





[jira] [Created] (DRILL-4810) CalciteContextException - SELECT star, window function

2016-07-27 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-4810:
-

 Summary: CalciteContextException - SELECT star, window function
 Key: DRILL-4810
 URL: https://issues.apache.org/jira/browse/DRILL-4810
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.8.0
Reporter: Khurram Faraaz


This is a problem seen with regular window functions (not specific to nested 
aggregates).

{noformat}
SELECT *,  OVER...
does not work on Drill
{noformat}

The same query works on Postgres with the same data.

{noformat}
postgres=# select *, MIN(col9) over(partition by col7 order by col8) from fewrwspqq_101 GROUP BY col0,col1,col2,col3,col4,col5,col6,col7,col8,col9;
 col0 | col1  | col2 | col3   | col4        | col5                    | col6       | col7 | col8 | col9 | min
------+-------+------+--------+-------------+-------------------------+------------+------+------+------+------
    1 | 65534 |  256 | 1234.9 | 20:26:18.58 | 2014-03-02 00:28:02.338 | 1952-08-14 | f    | CA   | AXCZ | AXCZ
   13 |   200 |    1 | -65534 | 19:20:30.5  | 2004-06-02 00:28:02.418 | 1972-04-03 | f    | GA   | PXCD | AXCZ
    7 |    17 |  -33 |   33.9 | 13:13:13.13 | 2006-05-02 00:28:02.748 | 1992-12-12 | f    | I
...

Drill throws error
{noformat}
0: jdbc:drill:schema=dfs.tmp> select *, MIN(col9) over(partition by col7 order by col8) from `allTypsUniq.parquet` GROUP BY col0,col1,col2,col3,col4,col5,col6,col7,col8,col9;
Error: VALIDATION ERROR: At line 1, column 8: Expression 
'allTypsUniq.parquet.*' is not being grouped

SQL Query null

[Error Id: f83e438f-5cdf-42f2-a548-5dc17b49da07 on centos-01.qa.lab:31010] 
(state=,code=0)
{noformat}

After expanding the star in the select list, Drill returns results:
{noformat}
0: jdbc:drill:schema=dfs.tmp> select col0,col1,col2,col3,col4,col5,col6,col7,col8,col9, MIN(col9) over(partition by col7 order by col8) from `allTypsUniq.parquet` GROUP BY col0,col1,col2,col3,col4,col5,col6,col7,col8,col9;
+------+-------+-------+----------+--------------+-------------------------+------------+-------+------+------+---------+
| col0 | col1  | col2  | col3     | col4         | col5                    | col6       | col7  | col8 | col9 | EXPR$10 |
+------+-------+-------+----------+--------------+-------------------------+------------+-------+------+------+---------+
| 1    | 65534 | 256.0 | 1234.9   | 20:26:18.580 | 2014-03-02 00:28:02.338 | 1952-08-14 | false | CA   | AXCZ | AXCZ    |
| 13   | 200   | 1.0   | -65534.0 | 19:20:30.500 | 2004-06-02 00:28:02.418 | 1972-04-03 | false | GA   | PXCD | AXCZ    |
| 7    | 17    | -33.0 | 33.9     | 13:13:13.130 | 2006-05-02 00:28:02.748 | 1992-12-12 | false | IA   | UXCB | AXCZ    |
...
{noformat}
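One way to see why expanding the star makes a difference: if `*` is expanded into the table's columns before the grouping check runs, every projected column in this query is also a GROUP BY key, so nothing is left ungrouped. A minimal Python sketch of that ordering, assuming a select list of plain column names (the window expression is left out because it is validated separately) and using hypothetical helper names; it does not reproduce Calcite's validator logic.

```python
from typing import List, Sequence


def expand_star(select_list: Sequence[str], table_columns: Sequence[str]) -> List[str]:
    """Replace each `*` in the select list with the table's columns, in order."""
    expanded: List[str] = []
    for item in select_list:
        expanded.extend(table_columns if item == "*" else [item])
    return expanded


def ungrouped_columns(select_list: Sequence[str], table_columns: Sequence[str],
                      group_by: Sequence[str]) -> List[str]:
    """Plain column references that are not GROUP BY keys, checked AFTER star expansion."""
    keys = set(group_by)
    return [c for c in expand_star(select_list, table_columns) if c not in keys]


columns = [f"col{i}" for i in range(10)]
print(ungrouped_columns(["*"], columns, group_by=columns))      # []
print(ungrouped_columns(["*"], columns, group_by=columns[:9]))  # ['col9']
```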

Stack trace from drillbit.log

{noformat}
2016-07-27 13:52:45,103 [28674351-99f0-2691-02f8-e6d576c85be1:foreman] INFO  
o.a.drill.exec.work.foreman.Foreman - Query text for query id 
28674351-99f0-2691-02f8-e6d576c85be1: select *, MIN(col9) over(partition by 
col7 order by col8) from `allTypsUniq.parquet` GROUP BY 
col0,col1,col2,col3,col4,col5,col6,col7,col8,col9
2016-07-27 13:52:45,132 [28674351-99f0-2691-02f8-e6d576c85be1:foreman] INFO  
o.a.d.exec.planner.sql.SqlConverter - User Error Occurred
org.apache.drill.common.exceptions.UserException: VALIDATION ERROR: At line 1, 
column 8: Expression 'allTypsUniq.parquet.*' is not being grouped

SQL Query null

[Error Id: c6758b36-8cb6-4965-861f-2aaf573c4002 ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.j

[jira] [Created] (DRILL-1844) window function throws an exception

2014-12-11 Thread Deneche A. Hakim (JIRA)
Deneche A. Hakim created DRILL-1844:
---

 Summary: window function throws an exception
 Key: DRILL-1844
 URL: https://issues.apache.org/jira/browse/DRILL-1844
 Project: Apache Drill
  Issue Type: Bug
Affects Versions: 0.7.0, 0.8.0
Reporter: Deneche A. Hakim


Starting from Drill 0.7.0, the following query:
{code}
SELECT employee_id,position_id, salary, avg(salary) OVER (PARTITION BY 
position_id order by position_id) FROM cp.`employee.json` order by employee_id;
{code}
Throws an _IllegalStateException_:
{code}
Attempted to close accountor with 6 buffer(s) still allocatedfor QueryId: 
2b7606b9-eddf-05db-2b8d-d72c5aa54e95, MajorFragmentId: 0, MinorFragmentId: 0.
{code}





Review Request 30051: DRILL-1908: new window function implementation

2015-01-19 Thread abdelhakim deneche

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/
---

Review request for drill.


Bugs: DRILL-1908
https://issues.apache.org/jira/browse/DRILL-1908


Repository: drill-git


Description
---

In order to fix DRILL-1487 a complete rewrite of the 
StreamingWindowFrameRecordBatch was needed. This patch adds a new 
WindowFrameRecordBatch that correctly handles window functions with or without 
order by clauses.
This code still lacks support for frame clauses and may be optimized to reduce 
unneeded frame computations.


Diffs
-

  
  common/src/main/java/org/apache/drill/common/logical/data/AbstractBuilder.java 28424a5
  common/src/main/java/org/apache/drill/common/logical/data/Window.java 6dba77c
  exec/java-exec/src/main/java/org/apache/drill/exec/opt/BasicOptimizer.java 5288f5d
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/WindowPOP.java 17738ee
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameBatchCreator.java 9b8929f
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameRecordBatch.java a3e7940
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameTemplate.java b4e3fed
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFramer.java 9588cef
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameBatchCreator.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameTemplate.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFramer.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrel.java f1a8bc0
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrule.java 00c20b2
  exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/window/TestWindowFrame.java 7c04477

Diff: https://reviews.apache.org/r/30051/diff/


Testing
---

I initially added 10 unit tests to make sure the results are correct both with 
and without an order by clause, but the tests require test files that are 
relatively large (5MB). Luckily, I also wrote a small piece of code to generate 
the test data, and I will later update the patch with tests that generate the 
test data files on the fly.
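Generating such data on the fly could look roughly like the following minimal sketch. It assumes a simple layout of contiguous partitions spread over fixed-size batches. WindowTestDataSketch, Row and generate are hypothetical names, and this is not the patch's own generator.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical on-the-fly generator for window test data (not the patch's generator). */
final class WindowTestDataSketch {

  /** One generated row: partition key, order key and value. */
  static final class Row {
    final int partition;
    final int orderKey;
    final long value;

    Row(int partition, int orderKey, long value) {
      this.partition = partition;
      this.orderKey = orderKey;
      this.value = value;
    }
  }

  /**
   * Produces numBatches batches of batchSize rows, sorted by (partition, orderKey),
   * with rows spread over numPartitions contiguous partitions. Partition sizes need
   * not be multiples of batchSize, so partitions can cross batch boundaries.
   */
  static List<List<Row>> generate(int numBatches, int batchSize, int numPartitions) {
    int total = numBatches * batchSize;
    List<List<Row>> batches = new ArrayList<>();
    List<Row> current = new ArrayList<>();
    for (int i = 0; i < total; i++) {
      // integer division maps row i onto one of numPartitions near-equal ranges
      int partition = (int) ((long) i * numPartitions / total);
      current.add(new Row(partition, i, i % 7));
      if (current.size() == batchSize) {
        batches.add(current);
        current = new ArrayList<>();
      }
    }
    return batches;
  }

  public static void main(String[] args) {
    // 2 batches of 10 rows and 3 partitions: partition 1 covers rows 7..13,
    // so it starts in the first batch and ends in the second.
    List<List<Row>> batches = generate(2, 10, 3);
    System.out.println(batches.get(0).get(9).partition + " " + batches.get(1).get(0).partition); // 1 1
  }
}
```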


Thanks,

abdelhakim deneche



[jira] [Created] (DRILL-3218) Window function usage throws CompileException

2015-05-29 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-3218:
-

 Summary: Window function usage throws CompileException
 Key: DRILL-3218
 URL: https://issues.apache.org/jira/browse/DRILL-3218
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.0.0
 Environment: faec150598840c40827e6493992d81209aa936da
Reporter: Khurram Faraaz
Assignee: Chris Westin



PARTITION BY date ORDER BY timestamp

{code}
0: jdbc:drill:schema=dfs.tmp> SELECT MAX(columns[0]) OVER (PARTITION BY 
columns[6] ORDER BY columns[4]) FROM `allTypData2.csv`;
Error: SYSTEM ERROR: org.codehaus.commons.compiler.CompileException: Line 330, 
Column 31: Unknown variable or type "incoming"

Fragment 0:0

[Error Id: 285af8f1-ddb4-4d3e-a2d7-bfaef20df5e0 on centos-02.qa.lab:31010] 
(state=,code=0)
{code}

I will add more details in a bit.





Re: Review Request 30051: DRILL-1908: new window function implementation

2015-01-20 Thread abdelhakim deneche

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/
---

(Updated Jan. 20, 2015, 5:34 p.m.)


Review request for drill.


Changes
---

- Added several unit tests
- Added a new "window-test-data" module in contrib/data to download the test 
data from S3 (I am using my own bucket for now, and will switch it to the Drill 
bucket later).
- Removed the ignored window tests in "TestWindowFunctions"


Bugs: DRILL-1908
https://issues.apache.org/jira/browse/DRILL-1908


Repository: drill-git


Description
---

To fix DRILL-1487, a complete rewrite of the 
StreamingWindowFrameRecordBatch was needed. This patch adds a new 
WindowFrameRecordBatch that correctly handles window functions with or without 
order by clauses.
This code still lacks support for frame clauses and could be optimized to avoid 
unneeded frame computations.


Diffs (updated)
-

  common/src/main/java/org/apache/drill/common/logical/data/AbstractBuilder.java 28424a5
  common/src/main/java/org/apache/drill/common/logical/data/Window.java 6dba77c
  contrib/data/pom.xml 86075f2
  contrib/data/window-test-data/pom.xml PRE-CREATION
  exec/java-exec/pom.xml 90734a5
  exec/java-exec/src/main/java/org/apache/drill/exec/opt/BasicOptimizer.java 5288f5d
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/WindowPOP.java 17738ee
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/GenerateTestData.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameBatchCreator.java 9b8929f
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameRecordBatch.java a3e7940
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameTemplate.java b4e3fed
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFramer.java 9588cef
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameBatchCreator.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameTemplate.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFramer.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrel.java f1a8bc0
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrule.java 00c20b2
  exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/window/TestWindowFrame.java 7c04477
  exec/java-exec/src/test/java/org/apache/drill/exec/sql/TestWindowFunctions.java 6eff6db

Diff: https://reviews.apache.org/r/30051/diff/


Testing (updated)
---

Unit tests are available to test window functions in multiple scenarios:
- b1.p1: single batch with a single partition
- b1.p2: 2 batches, each containing a different partition
- b2.p4: 2 batches and 4 partitions, one partition has rows in both batches
- b3.p2: 3 batches and 2 partitions, one partition includes the whole 2nd batch 
and has rows in 3 batches
- b4.p4: 4 batches and 4 partitions, the partitions are arranged to test an 
edge case: the 2nd time innerNext() is called, WindowFrameRecordBatch has 
enough saved batches to call its framer.doWork() without the need to call 
next(incoming)

All tests, except the last one, come in 2 variations: with and without an "order 
by" clause.

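The b4.p4 edge case can be modelled with a small, hypothetical simulation. It assumes (as an illustration, not the actual WindowFrameRecordBatch logic) that each batch is reduced to the partition ids of its rows, and that a saved batch can be processed once a row of a different partition, or the end of input, has been seen after its last row. WindowBufferSketch, firstBatchReady and pulls are invented names; innerNext only borrows the operator's method name.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

/** Hypothetical model of the batch-buffering policy; batches are assumed non-empty. */
final class WindowBufferSketch {
  private final Iterator<int[]> upstream;
  private final Deque<int[]> saved = new ArrayDeque<>();
  private boolean upstreamDone = false;
  /** Number of batches pulled from upstream so far (the analogue of next(incoming)). */
  int pulls = 0;

  WindowBufferSketch(List<int[]> batches) {
    this.upstream = batches.iterator();
  }

  /** True if a row after the first saved batch starts a new partition, or input has ended. */
  private boolean firstBatchReady() {
    if (saved.isEmpty()) {
      return false;
    }
    int[] first = saved.peekFirst();
    int lastPartition = first[first.length - 1];
    Iterator<int[]> it = saved.iterator();
    it.next(); // skip the first batch itself
    while (it.hasNext()) {
      for (int pid : it.next()) {
        if (pid != lastPartition) {
          return true;
        }
      }
    }
    return upstreamDone;
  }

  /** Returns the next batch that can be processed, or null when all data is consumed. */
  int[] innerNext() {
    while (!firstBatchReady()) {
      if (upstream.hasNext()) {
        saved.addLast(upstream.next());
        pulls++;
      } else {
        upstreamDone = true;
        if (saved.isEmpty()) {
          return null;
        }
      }
    }
    return saved.pollFirst();
  }

  public static void main(String[] args) {
    List<int[]> batches = Arrays.asList(new int[] {0, 0, 1}, new int[] {1, 1, 1},
        new int[] {1, 2, 2}, new int[] {3, 3, 3});
    WindowBufferSketch op = new WindowBufferSketch(batches);
    op.innerNext();
    System.out.println(op.pulls); // 3
    op.innerNext();
    System.out.println(op.pulls); // 3
  }
}
```

With these four batches, the first innerNext() call pulls three batches before returning {0,0,1}, while the second call returns {1,1,1} without pulling anything, because the already-saved third batch shows that partition 1 has ended. That corresponds to the edge case described for b4.p4.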

Thanks,

abdelhakim deneche



Re: Review Request 30051: DRILL-1908: new window function implementation

2015-01-20 Thread abdelhakim deneche

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/
---

(Updated Jan. 20, 2015, 5:40 p.m.)


Review request for drill.


Changes
---

added a note about GenerateTestData.java


Bugs: DRILL-1908
https://issues.apache.org/jira/browse/DRILL-1908


Repository: drill-git


Description (updated)
---

To fix DRILL-1487, a complete rewrite of the 
StreamingWindowFrameRecordBatch was needed. This patch adds a new 
WindowFrameRecordBatch that correctly handles window functions with or without 
order by clauses.
This code still lacks support for frame clauses and could be optimized to avoid 
unneeded frame computations.

**Note: I forgot to remove GenerateTestData.java from this patch; it was used 
to generate the test data files and is not needed. I will make sure to remove 
it in the upcoming patch update. You can ignore it for now.**


Diffs
-

  common/src/main/java/org/apache/drill/common/logical/data/AbstractBuilder.java 28424a5
  common/src/main/java/org/apache/drill/common/logical/data/Window.java 6dba77c
  contrib/data/pom.xml 86075f2
  contrib/data/window-test-data/pom.xml PRE-CREATION
  exec/java-exec/pom.xml 90734a5
  exec/java-exec/src/main/java/org/apache/drill/exec/opt/BasicOptimizer.java 5288f5d
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/WindowPOP.java 17738ee
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/GenerateTestData.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameBatchCreator.java 9b8929f
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameRecordBatch.java a3e7940
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameTemplate.java b4e3fed
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFramer.java 9588cef
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameBatchCreator.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameTemplate.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFramer.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrel.java f1a8bc0
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrule.java 00c20b2
  exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/window/TestWindowFrame.java 7c04477
  exec/java-exec/src/test/java/org/apache/drill/exec/sql/TestWindowFunctions.java 6eff6db

Diff: https://reviews.apache.org/r/30051/diff/


Testing
---

Unit tests are available to test window functions in multiple scenarios:
- b1.p1: single batch with a single partition
- b1.p2: 2 batches, each containing a different partition
- b2.p4: 2 batches and 4 partitions, one partition has rows in both batches
- b3.p2: 3 batches and 2 partitions, one partition includes the whole 2nd batch 
and has rows in 3 batches
- b4.p4: 4 batches and 4 partitions, the partitions are arranged to test an 
edge case: the 2nd time innerNext() is called, WindowFrameRecordBatch has 
enough saved batches to call its framer.doWork() without the need to call 
next(incoming)

All tests, except the last one, come in 2 variations: with and without an "order 
by" clause.


Thanks,

abdelhakim deneche



Re: Review Request 30051: DRILL-1908: new window function implementation

2015-01-21 Thread abdelhakim deneche

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/
---

(Updated Jan. 21, 2015, 11:03 p.m.)


Review request for drill.


Changes
---

Added a 'window.enable' boolean configuration option (defaults to false); when it 
is set to false, Drill displays "window functions are not supported".
Updated TestWindowFrame to set 'window.enable' to true before launching the 
tests.
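A minimal sketch of the gating behavior described above, written against a plain map rather than Drill's option manager. WindowOptionGateSketch and its methods are hypothetical; only the option name and the error text come from this change description.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical feature gate behind a boolean option (not Drill's SystemOptionManager API). */
final class WindowOptionGateSketch {
  static final String WINDOW_ENABLE = "window.enable";
  private final Map<String, Boolean> options = new HashMap<>();

  WindowOptionGateSketch() {
    options.put(WINDOW_ENABLE, false); // disabled by default
  }

  void set(String name, boolean value) {
    options.put(name, value);
  }

  /** Would be called during planning when a query contains a window function. */
  void checkWindowFunctionsAllowed() {
    if (!options.getOrDefault(WINDOW_ENABLE, false)) {
      throw new UnsupportedOperationException("window functions are not supported");
    }
  }

  public static void main(String[] args) {
    WindowOptionGateSketch gate = new WindowOptionGateSketch();
    try {
      gate.checkWindowFunctionsAllowed();
    } catch (UnsupportedOperationException e) {
      System.out.println(e.getMessage()); // window functions are not supported
    }
    gate.set(WINDOW_ENABLE, true); // analogous to a test enabling the option first
    gate.checkWindowFunctionsAllowed(); // no exception now
  }
}
```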


Bugs: DRILL-1908
https://issues.apache.org/jira/browse/DRILL-1908


Repository: drill-git


Description
---

To fix DRILL-1487, a complete rewrite of the 
StreamingWindowFrameRecordBatch was needed. This patch adds a new 
WindowFrameRecordBatch that correctly handles window functions with or without 
order by clauses.
This code still lacks support for frame clauses and could be optimized to avoid 
unneeded frame computations.

**Note: I forgot to remove GenerateTestData.java from this patch; it was used 
to generate the test data files and is not needed. I will make sure to remove 
it in the upcoming patch update. You can ignore it for now.**


Diffs (updated)
-

  common/src/main/java/org/apache/drill/common/logical/data/AbstractBuilder.java 28424a5
  common/src/main/java/org/apache/drill/common/logical/data/Window.java 6dba77c
  contrib/data/pom.xml 86075f2
  contrib/data/window-test-data/pom.xml PRE-CREATION
  exec/java-exec/pom.xml 90734a5
  exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 190c13f
  exec/java-exec/src/main/java/org/apache/drill/exec/opt/BasicOptimizer.java 5288f5d
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/WindowPOP.java 17738ee
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameBatchCreator.java 9b8929f
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameRecordBatch.java a3e7940
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameTemplate.java b4e3fed
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFramer.java 9588cef
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameBatchCreator.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameTemplate.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFramer.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrel.java f1a8bc0
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrule.java 00c20b2
  exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java f20627d
  exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/window/TestWindowFrame.java 7c04477
  exec/java-exec/src/test/java/org/apache/drill/exec/sql/TestWindowFunctions.java 6eff6db

Diff: https://reviews.apache.org/r/30051/diff/


Testing
---

Unit tests are available to test window functions in multiple scenarios:
- b1.p1: single batch with a single partition
- b1.p2: 2 batches, each containing a different partition
- b2.p4: 2 batches and 4 partitions, one partition has rows in both batches
- b3.p2: 3 batches and 2 partitions, one partition includes the whole 2nd batch 
and has rows in 3 batches
- b4.p4: 4 batches and 4 partitions, the partitions are arranged to test an 
edge case: the 2nd time innerNext() is called, WindowFrameRecordBatch has 
enough saved batches to call its framer.doWork() without the need to call 
next(incoming)

All tests, except the last one, come in 2 variations: with and without an "order 
by" clause.


Thanks,

abdelhakim deneche



Re: Review Request 30051: DRILL-1908: new window function implementation

2015-01-21 Thread abdelhakim deneche

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/
---

(Updated Jan. 21, 2015, 11:05 p.m.)


Review request for drill.


Bugs: DRILL-1908
https://issues.apache.org/jira/browse/DRILL-1908


Repository: drill-git


Description (updated)
---

To fix DRILL-1487, a complete rewrite of the 
StreamingWindowFrameRecordBatch was needed. This patch adds a new 
WindowFrameRecordBatch that correctly handles window functions with or without 
order by clauses.
This code still lacks support for frame clauses and could be optimized to avoid 
unneeded frame computations.


Diffs
-

  common/src/main/java/org/apache/drill/common/logical/data/AbstractBuilder.java 28424a5
  common/src/main/java/org/apache/drill/common/logical/data/Window.java 6dba77c
  contrib/data/pom.xml 86075f2
  contrib/data/window-test-data/pom.xml PRE-CREATION
  exec/java-exec/pom.xml 90734a5
  exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 190c13f
  exec/java-exec/src/main/java/org/apache/drill/exec/opt/BasicOptimizer.java 5288f5d
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/WindowPOP.java 17738ee
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameBatchCreator.java 9b8929f
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameRecordBatch.java a3e7940
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameTemplate.java b4e3fed
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFramer.java 9588cef
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameBatchCreator.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameTemplate.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFramer.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrel.java f1a8bc0
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrule.java 00c20b2
  exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java f20627d
  exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/window/TestWindowFrame.java 7c04477
  exec/java-exec/src/test/java/org/apache/drill/exec/sql/TestWindowFunctions.java 6eff6db

Diff: https://reviews.apache.org/r/30051/diff/


Testing
---

Unit tests are available to test window functions in multiple scenarios:
- b1.p1: single batch with a single partition
- b1.p2: 2 batches, each containing a different partition
- b2.p4: 2 batches and 4 partitions, one partition has rows in both batches
- b3.p2: 3 batches and 2 partitions, one partition includes the whole 2nd batch 
and has rows in 3 batches
- b4.p4: 4 batches and 4 partitions, the partitions are arranged to test an 
edge case: the 2nd time innerNext() is called, WindowFrameRecordBatch has 
enough saved batches to call its framer.doWork() without the need to call 
next(incoming)

All tests, except the last one, come in 2 variations: with and without an "order 
by" clause.


Thanks,

abdelhakim deneche



Re: Review Request 30051: DRILL-1908: new window function implementation

2015-01-23 Thread Chris Westin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/#review69499
---



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java


If I'm understanding this correctly, you're saying that you don't begin to 
process a batch until you've received all of the last partition that starts in 
that batch.

Couldn't this be a problem if the partition is very, very large, possibly 
extending to the end of the data stream?
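To put the concern in numbers, here is a hypothetical Java sketch (not Drill code). It assumes the policy as described: a batch is released only once a row of a different partition, or the end of input, has been seen after its last row, and batches are released in order. It computes the peak number of input batches held at once; BufferPeakSketch and peakBatchesHeld are invented names.

```java
/** Hypothetical estimate of peak buffered batches under the "wait for partition end" policy. */
final class BufferPeakSketch {

  static int peakBatchesHeld(int[] partitionOfRow, int batchSize) {
    int n = partitionOfRow.length;
    int numBatches = (n + batchSize - 1) / batchSize;
    int peak = 0;
    for (int k = 0; k < numBatches; k++) {
      int lastRow = Math.min(n, (k + 1) * batchSize) - 1;
      int pid = partitionOfRow[lastRow];
      // walk forward to the first row of the next partition (simple scan, fine for a sketch)
      int r = lastRow + 1;
      while (r < n && partitionOfRow[r] == pid) {
        r++;
      }
      // the batch holding row r must have arrived; if the partition runs to the end
      // of the data, every remaining batch must have arrived
      int lastNeeded = (r < n) ? r / batchSize : numBatches - 1;
      // batches before k were already released, so k..lastNeeded are held
      peak = Math.max(peak, lastNeeded - k + 1);
    }
    return peak;
  }

  public static void main(String[] args) {
    int[] overAll = new int[40]; // OVER (): every row is in partition 0
    int[] small = new int[40];
    for (int i = 0; i < small.length; i++) {
      small[i] = i / 4; // 10 partitions of 4 rows each
    }
    System.out.println(peakBatchesHeld(overAll, 4)); // 10
    System.out.println(peakBatchesHeld(small, 4));   // 2
  }
}
```

With 40 rows in batches of 4, a single OVER () partition forces all 10 batches to be held, whereas ten 4-row partitions never need more than 2.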



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java


Instead of copying the values, can't we just hang on to the previous batch 
until the frame no longer needs values from it?



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java


Shouldn't these all be static?



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java


static?


- Chris Westin



Re: Review Request 30051: DRILL-1908: new window function implementation

2015-01-23 Thread abdelhakim deneche

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/#review69506
---



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java


Yes, especially if the user writes "over()", in which case we'll have a single 
partition containing all rows.



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java


When a batch is processed, it means we are ready to send a container 
downstream. We need to copy those value vectors to the container because 
they are part of the schema.



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java


probably. Honestly, I copied most of the code generation stuff from 
StreamingWindowRecordBatch which was also copied "as it is" from 
StreamingAggBatch, but I guess you are right, these should be static.


- abdelhakim deneche



Re: Review Request 30051: DRILL-1908: new window function implementation

2015-01-26 Thread Jacques Nadeau


> On Jan. 23, 2015, 11:39 p.m., Chris Westin wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java,
> >  line 273
> > 
> >
> > Shouldn't these all be static?

No. We should actually fix the casing on the names. We originally had these as 
static, but we actually maintain state inside the mappings, which means they 
can't be static. In most of the code they were originally static, but then we 
realized the mistake. We fixed the static part, but I don't think we did a good 
job of fixing the case of all the declarations. He is being consistent with 
what we have elsewhere, but what is elsewhere isn't right stylistically.
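A hypothetical sketch of the distinction (the classes below are illustrative stand-ins, not Drill's code-generation mapping classes): when a mapping object carries mutable state, a static field shares that state across every operator instance, while an instance field keeps it per operator. The naming follows the usual Java convention, UPPER_CASE for a static constant and lowerCamelCase for an instance field, which is the casing issue mentioned above.

```java
/** Hypothetical illustration of why a stateful mapping cannot be a shared static field. */
final class MappingStateSketch {

  /** Stand-in for a code-generation mapping that remembers its current position. */
  static final class Mapping {
    private int depth = 0;

    void enter() { depth++; }

    void exit() { depth--; }

    int depth() { return depth; }
  }

  /** Problematic variant: every generator instance shares one Mapping. */
  static final class SharedGenerator {
    static final Mapping MAPPING = new Mapping();

    Mapping mapping() { return MAPPING; }
  }

  /** Per-instance variant: each generator owns its Mapping. */
  static final class PerInstanceGenerator {
    private final Mapping mapping = new Mapping();

    Mapping mapping() { return mapping; }
  }

  public static void main(String[] args) {
    SharedGenerator a = new SharedGenerator();
    SharedGenerator b = new SharedGenerator();
    a.mapping().enter();
    System.out.println(b.mapping().depth()); // 1: b sees a's state
    PerInstanceGenerator c = new PerInstanceGenerator();
    PerInstanceGenerator d = new PerInstanceGenerator();
    c.mapping().enter();
    System.out.println(d.mapping().depth()); // 0: state is isolated
  }
}
```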


> On Jan. 23, 2015, 11:39 p.m., Chris Westin wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java,
> >  line 299
> > 
> >
> > static?

same as above.


- Jacques


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/#review69499
---



Re: Review Request 30051: DRILL-1908: new window function implementation

2015-01-26 Thread Jacques Nadeau


> On Jan. 24, 2015, 12:20 a.m., abdelhakim deneche wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java,
> >  line 273
> > 
> >
> > probably. Honestly, I copied most of the code generation stuff from 
> > StreamingWindowRecordBatch which was also copied "as it is" from 
> > StreamingAggBatch, but I guess you are right, these should be static.

no, see my other comments.


- Jacques


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/#review69506
---


>  f1a8bc0 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrule.java
>  00c20b2 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
>  f20627d 
>   
> exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/window/TestWindowFrame.java
>  7c04477 
>   
> exec/java-exec/src/test/java/org/apache/drill/exec/sql/TestWindowFunctions.java
>  6eff6db 
> 
> Diff: https://reviews.apache.org/r/30051/diff/
> 
> 
> Testing
> ---
> 
> Unit tests are available to test window functions in multiple scenarios:
> - b1.p1: single batch with a single partition
> - b1.p2: 2 batches, each containing a different partition
> - b2.p4: 2 batches and 4 partitions, one partition has rows in both batches
> - b3.p2: 3 batches and 2 partitions, one partition includes the whole 2nd 
> batch and has rows in 3 batches
> - b4.p4: 4 batches and 4 partitions, the partitions are arranged to test an 
> edge case: the 2nd time innerNext() is called, WindowFrameRecordBatch has 
> enough saved batches to call its framer.doWork() without the need to call 
> next(incoming)
> 
> All tests, except the last one, come in 2 variations: with and without "order 
> by" clause
> 
> 
> Thanks,
> 
> abdelhakim deneche
> 
>



Re: Review Request 30051: DRILL-1908: new window function implementation

2015-01-26 Thread abdelhakim deneche


> On Jan. 23, 2015, 11:39 p.m., Chris Westin wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java,
> >  line 85
> > 
> >
> > If I'm understanding this correctly, you're saying that you don't begin 
> > to process a batch until you've received all of the last partition that 
> > starts in that batch.
> > 
> > Couldn't this be a problem if the partition is very, very large, 
> > possibly extending to the end of the data stream?

Depending on the window definition, we may be able to process and release a 
batch before we receive all rows of its partitions, but this only works for 
some specific cases. A more general solution should save the batches to disk 
until they are ready to be processed.
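A minimal sketch of the buffering rule discussed here, assuming each incoming batch can be modeled as a non-empty array of partition keys sorted by partition. `BatchReadiness` and `readyAfter` are hypothetical names, not part of WindowFrameRecordBatch: a batch becomes processable only once the partition of its last row has ended, i.e. once a row with a different key arrives.

```java
import java.util.List;

/**
 * Hypothetical sketch, not Drill's WindowFrameRecordBatch. Each batch is modeled
 * as a non-empty array of partition keys; rows arrive sorted by partition.
 */
final class BatchReadiness {

  /**
   * Returns the index of the batch whose arrival makes batch {@code i} processable,
   * or {@code batches.size()} if that only happens at end of input. Batch i is ready
   * once the partition of its last row has ended, i.e. a row with a different key arrived.
   */
  static int readyAfter(List<int[]> batches, int i) {
    int[] batch = batches.get(i);
    int lastKey = batch[batch.length - 1]; // partition still "open" at the end of batch i
    for (int b = i + 1; b < batches.size(); b++) {
      for (int key : batches.get(b)) {
        if (key != lastKey) {
          return b; // first batch proving that partition has ended
        }
      }
    }
    return batches.size(); // partition extends to the end of the stream
  }
}
```

With batches {0,0,1}, {1,1,1}, {1,2}, batch 0 only becomes ready when batch 2 arrives, so batches 0 and 1 stay buffered. If partition 1 ran to the end of the stream, every batch would stay in memory, which is the case raised above and the reason spilling to disk is suggested.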


- abdelhakim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/#review69499
---





Re: Review Request 30051: DRILL-1908: new window function implementation

2015-01-26 Thread Jacques Nadeau

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/#review69617
---



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameBatchCreator.java


This should only be in SQL parsing, not here.  Please check with Aman & 
Sean on how they are disabling SQL functionality and then do the same for this 
(based on an option).



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java


You should specifically note that this doesn't cover the situation where we 
have multiple distinct partitions.



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java


This logic seems a little backwards from what I was expecting.  Batches are a 
physical concept.  Shouldn't these be stated based on p0?  For example, it 
seems like we should be able to process p0 the moment we have received b0.  We 
can't do p1 until we get to b2 and see that p1 has ended, etc.  As you've 
written it sounds like we can't do p0 until we know the end of p1.  I guess 
this might make sense if we have small partitions and you're trying to work a 
batch at a time inside the generated code.  This sacrifices more memory for 
potentially better performance.  Is this the reason you're operating this way?
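For contrast, a minimal sketch of the partition-at-a-time alternative described in this comment, under the same simplified model (each batch is an array of partition keys, already sorted). `PartitionEmitter` and `Emitted` are hypothetical, not Drill classes. A partition is emitted as soon as a row with a new key proves it has ended, so p0 can go out while b0 is still being read, while the rows of a partition that spans batches must be buffered until it closes.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of partition-at-a-time processing (not Drill code).
 * Each batch is an array of partition keys, already sorted by partition.
 */
final class PartitionEmitter {

  /** One finished partition: its key, row count, and the batch whose arrival let us emit it. */
  static final class Emitted {
    final int key;
    final int rowCount;
    final int emittedAtBatch; // equals the number of batches when emitted at end of input

    Emitted(int key, int rowCount, int emittedAtBatch) {
      this.key = key;
      this.rowCount = rowCount;
      this.emittedAtBatch = emittedAtBatch;
    }
  }

  static List<Emitted> emit(List<int[]> batches) {
    List<Emitted> out = new ArrayList<>();
    boolean open = false; // is a partition currently being buffered?
    int currentKey = 0;
    int bufferedRows = 0; // rows of the open partition held in memory
    for (int b = 0; b < batches.size(); b++) {
      for (int key : batches.get(b)) {
        if (open && key != currentKey) {
          // A new key proves the open partition has ended: it can be processed now.
          out.add(new Emitted(currentKey, bufferedRows, b));
          bufferedRows = 0;
        }
        open = true;
        currentKey = key;
        bufferedRows++;
      }
    }
    if (open) {
      // The last partition can only be closed at end of input.
      out.add(new Emitted(currentKey, bufferedRows, batches.size()));
    }
    return out;
  }
}
```

With b0 = {0,0,1}, b1 = {1,1}, b2 = {1,2}, p0 is emitted while reading b0 and p1 only once b2 arrives, matching the example in the comment; the memory held is bounded by the largest open partition rather than by whole batches.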



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java


Given that you're not handling schema changes, it would be best to fail 
rather than return an incorrect result.



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java


We should save the failure state in the fragment context here.  I think the 
method is fail(Exception e) or similar.



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java


Let's discuss this.  I think there is a disconnect here.



exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/window/TestWindowFrame.java


Please add a negative test case when using multiple partitions.  In that 
case, the failure should happen in the planning phase, not execution.


- Jacques Nadeau



Re: Review Request 30051: DRILL-1908: new window function implementation

2015-01-26 Thread abdelhakim deneche


> On Jan. 26, 2015, 5:51 p.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java,
> >  line 72
> > 
> >
> > You should specifically note that this doesn't cover the situation 
> > where we have multiple distinct partitions.

What do you mean by "multiple distinct partitions" ?


> On Jan. 26, 2015, 5:51 p.m., Jacques Nadeau wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java,
> >  line 87
> > 
> >
> > This logic seems a little backwards from what I was expecting.  Batches 
> > are a physical concept.  Shouldn't these be stated based on p0?  For example, 
> > it seems like we should be able to process p0 the moment we have received 
> > b0.  We can't do p1 until we get to b2 and see that p1 has ended, etc.  As 
> > you've written it sounds like we can't do p0 until we know the end of p1.  
> > I guess this might make sense if we have small partitions and you're trying 
> > to work a batch at a time inside the generated code.  This sacrifices more 
> > memory for potentially better performance.  Is this the reason you're 
> > operating this way?

This comes from the first iterations of the window operator: processing one 
"full" batch at a time makes it possible to transfer all incoming vectors into 
the outgoing container. Once I started supporting "order by" and window frames, 
vector transfers could no longer be done in the general case. The code could be 
optimized to detect when transfers are possible.

It is possible to process and return partitions as soon as they end, but the 
remaining rows of the processed batch need to be kept in memory until they 
are ready to be processed.
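A rough illustration of the transfer-versus-copy distinction described above, using a hypothetical `IntVector` class (not Drill's ValueVector or TransferPair API). A transfer hands over the whole buffer without copying, which only works when every row of the incoming batch goes out together; when only some rows are ready, those rows have to be copied and the source kept.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a value vector; NOT Drill's ValueVector or TransferPair API. */
final class IntVector {
  private int[] data;

  IntVector(int[] data) {
    this.data = data;
  }

  int size() {
    return data.length;
  }

  int get(int index) {
    return data[index];
  }

  /** Transfer: hand the whole buffer to target without copying; this vector ends up empty. */
  void transferTo(IntVector target) {
    target.data = this.data;
    this.data = new int[0];
  }

  /** Copy: duplicate rows [from, to) into a new vector; this vector keeps all its rows. */
  IntVector copyRange(int from, int to) {
    return new IntVector(Arrays.copyOfRange(data, from, to));
  }
}
```

In the operator's terms, transferring is only safe when the entire incoming batch is sent downstream; once some rows must be retained for a partition or frame that has not finished, the finished rows are copied instead.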


- abdelhakim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/#review69617
---



Re: Review Request 30051: DRILL-1908: new window function implementation

2015-01-26 Thread abdelhakim deneche


> On Jan. 23, 2015, 11:39 p.m., Chris Westin wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java,
> >  line 196
> > 
> >
> > Instead of copying the values, can't we just hang on to the previous 
> > batch until the frame no longer needs values from it?

When a batch is processed, it means we are ready to send a container 
downstream. We need to copy those value vectors to the container because 
they are part of the schema.

PS: copying my previous comment here, to make it easier for others to track


> On Jan. 23, 2015, 11:39 p.m., Chris Westin wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java,
> >  line 273
> > 
> >
> > Shouldn't these all be static?
> 
> Jacques Nadeau wrote:
> No.  We should actually fix the casing on the names. We originally had 
> these as static but we actually maintain state inside the mappings which 
> means they can't be static.  In most of the code they were originally static 
> but then we realized the mistake.  We fixed the static part but I don't think 
> we did a good job of fixing the case of all the declarations.  He is being 
> consistent with what we have elsewhere, but what is elsewhere isn't right 
> stylistically.

I searched for GeneratorMapping in the code and found that both ChainedHashTable 
and HashJoinBatch use the following convention:
- GeneratorMapping fields are declared as "static final"
- MappingSet fields are declared as "final"

Should I do the same?
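To illustrate Jacques' point about state, a minimal sketch using hypothetical stand-in classes (not Drill's actual GeneratorMapping or MappingSet). An immutable mapping can be shared as a `static final` constant, while an object that carries mutable code-generation state must be a per-instance `final` field; if it were static, two operator instances would share and corrupt each other's state. The sketch also follows the casing fix Jacques mentions: constant-style names for static finals, field-style names for instance fields.

```java
/** Hypothetical stand-in, NOT Drill's GeneratorMapping: immutable, safe to share. */
final class ImmutableMapping {
  final String setupMethod;
  final String evalMethod;

  ImmutableMapping(String setupMethod, String evalMethod) {
    this.setupMethod = setupMethod;
    this.evalMethod = evalMethod;
  }
}

/** Hypothetical stand-in, NOT Drill's MappingSet: holds mutable state, one per operator. */
final class StatefulMappingSet {
  private final ImmutableMapping mapping;
  private int scopeDepth; // changes while code is being generated

  StatefulMappingSet(ImmutableMapping mapping) {
    this.mapping = mapping;
  }

  void enterScope() {
    scopeDepth++;
  }

  int scopeDepth() {
    return scopeDepth;
  }

  ImmutableMapping mapping() {
    return mapping;
  }
}

final class ExampleWindowOperator {
  // Immutable and shared by every instance: static final, constant-style name.
  static final ImmutableMapping EVAL_MAPPING = new ImmutableMapping("setup", "eval");

  // Stateful: final (never reassigned) but NOT static, field-style name.
  final StatefulMappingSet evalMappingSet = new StatefulMappingSet(EVAL_MAPPING);
}
```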


- abdelhakim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/#review69499
---



Re: Review Request 30051: DRILL-1908: new window function implementation

2015-01-28 Thread abdelhakim deneche


> On Jan. 26, 2015, 5:51 p.m., Jacques Nadeau wrote:
> > exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/window/TestWindowFrame.java,
> >  line 38
> > 
> >
> > Please add a negative test case when using multiple partitions.  In 
> > that case, the failure should happen in the planning phase, not execution.

WindowPrel.getPhysicalOperator(PhysicalPlanCreator creator) has the following:
```
checkState(windows.size() == 1, "Only one window is expected in WindowPrel");
```
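The quoted line uses a Guava-style `checkState`, which throws `IllegalStateException` when its condition is false. Below is a minimal, self-contained sketch of the negative test Jacques asked for, written in plain Java. The `checkState` helper mirrors Guava's `Preconditions.checkState(boolean, Object)`; `WindowPlanCheck`, `validate`, and the string list standing in for the windows are hypothetical, not Drill's planner classes. Because the real check runs while the physical operator is being created, a failure like this would presumably surface before execution starts.

```java
import java.util.List;

/** Hypothetical planner-side check mirroring the quoted checkState call (not Drill code). */
final class WindowPlanCheck {

  // Plain-Java equivalent of Guava's Preconditions.checkState(boolean, Object).
  static void checkState(boolean expression, Object errorMessage) {
    if (!expression) {
      throw new IllegalStateException(String.valueOf(errorMessage));
    }
  }

  /** 'windows' is a hypothetical stand-in for the window groups WindowPrel receives. */
  static void validate(List<String> windows) {
    checkState(windows.size() == 1, "Only one window is expected in WindowPrel");
  }

  /** Negative test: true when validation rejects a plan with multiple windows. */
  static boolean rejectsMultipleWindows() {
    try {
      validate(List.of("w1", "w2"));
      return false;
    } catch (IllegalStateException expected) {
      return "Only one window is expected in WindowPrel".equals(expected.getMessage());
    }
  }
}
```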


- abdelhakim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/#review69617
---


On Jan. 28, 2015, 7:50 p.m., abdelhakim deneche wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30051/
> ---
> 
> (Updated Jan. 28, 2015, 7:50 p.m.)
> 
> 
> Review request for drill.
> 
> 
> Bugs: DRILL-1908
> https://issues.apache.org/jira/browse/DRILL-1908
> 
> 
> Repository: drill-git
> 
> 
> Description
> ---
> 
> In order to fix DRILL-1487 a complete rewrite of the 
> StreamingWindowFrameRecordBatch was needed. This patch adds a new 
> WindowFrameRecordBatch that correctly handles window functions with or 
> without order by clauses.
> This code still lacks support for frame clauses and may be optimized to 
> reduce unneeded frame computations.
> 
> 
> Diffs
> -
> 
>   
> common/src/main/java/org/apache/drill/common/logical/data/AbstractBuilder.java
>  28424a5 
>   common/src/main/java/org/apache/drill/common/logical/data/Window.java 
> 6dba77c 
>   contrib/data/pom.xml 86075f2 
>   contrib/data/window-test-data/pom.xml PRE-CREATION 
>   exec/java-exec/pom.xml 90734a5 
>   exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 
> 190c13f 
>   exec/java-exec/src/main/java/org/apache/drill/exec/opt/BasicOptimizer.java 
> 5288f5d 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/WindowPOP.java
>  17738ee 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/OverFinder.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameBatchCreator.java
>  9b8929f 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameRecordBatch.java
>  26d23f2 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameTemplate.java
>  e2c7e9e 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFramer.java
>  9588cef 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameBatchCreator.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameTemplate.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFramer.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java
>  3b7adca 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrel.java
>  f1a8bc0 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrule.java
>  00c20b2 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrel.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java
>  PRE-CREATION 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java
>  79603eb 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
>  f20627d 
>   
> exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/window/TestWindowFrame.java
>  a9d2ef8 
>   
> exec/java-exec/src/test/java/org/apache/drill/exec/sql/TestWindowFunctions.java
>  6eff6db 
> 
> Diff: https://reviews.apache.org/r/30051/diff/
> 
> 
> Testing
> ---
> 
> Unit tests are available to test window functions in multiple scenarios:
> - b1.p1: single batch with a single partition
> - b1.p2: 2 batches, each containing a different partition
> - b2.p4: 2 batches and 4 partitions, one partition has rows in both batches
> - b3.p2: 3 batches and 2 partitions, one partition includes the whole 2nd 
> batch and has rows in 3 batches
> - b4.p4: 4 batches and 4 partitions, the partitions are arranged to test an 
> edge case: the 2nd time innerNext() is called, WindowFrameRecordBatch has 
> enough saved batches to call its fram

Re: Review Request 30051: DRILL-1908: new window function implementation

2015-01-28 Thread abdelhakim deneche

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/
---

(Updated Jan. 28, 2015, 7:50 p.m.)


Review request for drill.


Changes
---

- updated GeneratorMapping and MappingSet variables to follow same convention 
used in HashJoinBatch and ChainedHashTable
- check and disable window function in sql parsing
- fail when schema changes
- use fail(Exception e) to report errors
- rename StreamingWindowPrel/StreamingWindowPrule to WindowPrel/WindowPrule


Bugs: DRILL-1908
https://issues.apache.org/jira/browse/DRILL-1908


Repository: drill-git


Description
---

In order to fix DRILL-1487 a complete rewrite of the 
StreamingWindowFrameRecordBatch was needed. This patch adds a new 
WindowFrameRecordBatch that correctly handles window functions with or without 
order by clauses.
This code still lacks support for frame clauses and may be optimized to reduce 
unneeded frame computations.


Diffs (updated)
-

  
common/src/main/java/org/apache/drill/common/logical/data/AbstractBuilder.java 
28424a5 
  common/src/main/java/org/apache/drill/common/logical/data/Window.java 6dba77c 
  contrib/data/pom.xml 86075f2 
  contrib/data/window-test-data/pom.xml PRE-CREATION 
  exec/java-exec/pom.xml 90734a5 
  exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 190c13f 
  exec/java-exec/src/main/java/org/apache/drill/exec/opt/BasicOptimizer.java 
5288f5d 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/WindowPOP.java
 17738ee 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/OverFinder.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameBatchCreator.java
 9b8929f 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameRecordBatch.java
 26d23f2 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameTemplate.java
 e2c7e9e 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFramer.java
 9588cef 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameBatchCreator.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameTemplate.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFramer.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java
 3b7adca 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrel.java
 f1a8bc0 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrule.java
 00c20b2 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrel.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java
 79603eb 
  
exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
 f20627d 
  
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/window/TestWindowFrame.java
 a9d2ef8 
  
exec/java-exec/src/test/java/org/apache/drill/exec/sql/TestWindowFunctions.java 
6eff6db 

Diff: https://reviews.apache.org/r/30051/diff/


Testing
---

Unit tests are available to test window functions in multiple scenarios:
- b1.p1: single batch with a single partition
- b1.p2: 2 batches, each containing a different partition
- b2.p4: 2 batches and 4 partitions, one partition has rows in both batches
- b3.p2: 3 batches and 2 partitions, one partition includes the whole 2nd batch 
and has rows in 3 batches
- b4.p4: 4 batches and 4 partitions, the partitions are arranged to test an 
edge case: the 2nd time innerNext() is called, WindowFrameRecordBatch has 
enough saved batches to call its framer.doWork() without the need to call 
next(incoming)

All tests, except the last one, come in 2 variations: with and without "order 
by" clause
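The scenarios above differ mainly in how partitions straddle batch boundaries. A small sketch that computes, for a given layout, how many batches each partition spans. `PartitionLayout` is hypothetical, and the sizes used in the usage note are illustrative, not the actual test data files. It assumes rows are laid out consecutively, batches and partitions are non-empty, and both sizes sum to the same row count.

```java
/** Hypothetical helper for reasoning about bX.pY layouts (not part of the patch). */
final class PartitionLayout {

  /** For each partition, returns how many batches contain at least one of its rows. */
  static int[] batchesSpanned(int[] batchSizes, int[] partitionLengths) {
    int totalRows = 0;
    for (int size : batchSizes) {
      totalRows += size;
    }
    // batchOf[row] = index of the batch holding that row
    int[] batchOf = new int[totalRows];
    int row = 0;
    for (int b = 0; b < batchSizes.length; b++) {
      for (int i = 0; i < batchSizes[b]; i++) {
        batchOf[row++] = b;
      }
    }
    int[] spans = new int[partitionLengths.length];
    int start = 0;
    for (int p = 0; p < partitionLengths.length; p++) {
      int end = start + partitionLengths[p] - 1; // last row of partition p
      spans[p] = batchOf[end] - batchOf[start] + 1;
      start = end + 1;
    }
    return spans;
  }
}
```

For example, batch sizes {4, 4} with partition lengths {2, 3, 2, 1} give spans {1, 2, 1, 1}, a b2.p4-style layout where one partition has rows in both batches; sizes {3, 3, 3} with lengths {2, 7} give {1, 3}, a b3.p2-style layout where the second partition covers the whole 2nd batch.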


Thanks,

abdelhakim deneche



Re: Review Request 30051: DRILL-1908: new window function implementation

2015-01-29 Thread Chris Westin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/#review70296
---



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/GenerateTestData.java


Make the file prefix a command-line argument so that people besides 
yourself can run this.
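A minimal sketch of what that could look like. GenerateTestData's real `main()` is not shown in the review, so `GenerateTestDataMain`, `parsePrefix`, and `fileFor` are hypothetical, and the dataset names are only examples taken from the test list.

```java
/** Hypothetical sketch; GenerateTestData's actual structure is not shown in the review. */
final class GenerateTestDataMain {

  /** Reads the output file prefix from the command line instead of hard-coding it. */
  static String parsePrefix(String[] args) {
    if (args.length != 1) {
      throw new IllegalArgumentException("usage: GenerateTestData <file-prefix>");
    }
    return args[0];
  }

  /** Builds the output path for one dataset by appending its name to the prefix. */
  static String fileFor(String prefix, String dataset) {
    return prefix + dataset;
  }

  public static void main(String[] args) {
    String prefix = parsePrefix(args);
    for (String dataset : new String[] {"b1.p1", "b2.p4"}) {
      System.out.println("would write " + fileFor(prefix, dataset));
    }
  }
}
```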



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/GenerateTestData.java


Add
final Partition partition = partitions[p];
and then use partition throughout the loop body. This avoids bounds 
checking and dereferencing on every use.
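A sketch of the suggested change. The `Partition` fields and the loop body below are hypothetical, since the actual GenerateTestData code is not quoted. Whether the JIT would already hoist the load is not guaranteed; the local variable makes the single lookup per iteration explicit.

```java
/** Hypothetical Partition type; the real class's fields are not shown in the review. */
final class Partition {
  final int length;
  final int value;

  Partition(int length, int value) {
    this.length = length;
    this.value = value;
  }
}

final class HoistExample {

  /** Sums length * value over all partitions, reading partitions[p] once per iteration. */
  static long weightedTotal(Partition[] partitions) {
    long total = 0;
    for (int p = 0; p < partitions.length; p++) {
      final Partition partition = partitions[p]; // single array access per outer iteration
      for (int row = 0; row < partition.length; row++) {
        total += partition.value;
      }
    }
    return total;
  }
}
```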



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java


The logger should be private. Yes, I know a lot of them aren't, and that's 
a bug.



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameTemplate.java


logger should be private.



exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameTemplate.java


Can container and batches be final?


- Chris Westin


On Jan. 28, 2015, 7:50 p.m., abdelhakim deneche wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30051/
> ---
> 
> (Updated Jan. 28, 2015, 7:50 p.m.)
> 
> 
> Review request for drill.
> 
> 
> Bugs: DRILL-1908
> https://issues.apache.org/jira/browse/DRILL-1908
> 
> 
> Repository: drill-git
> 
> 
> Description
> ---
> 
> In order to fix DRILL-1487 a complete rewrite of the 
> StreamingWindowFrameRecordBatch was needed. This patch adds a new 
> WindowFrameRecordBatch that correctly handles window functions with or 
> without order by clauses.
> This code still lacks support for frame clauses and may be optimized to 
> reduce unneeded frame computations.
> 
> 
> Diffs
> -
> 
>   common/src/main/java/org/apache/drill/common/logical/data/AbstractBuilder.java 28424a5
>   common/src/main/java/org/apache/drill/common/logical/data/Window.java 6dba77c
>   contrib/data/pom.xml 86075f2
>   contrib/data/window-test-data/pom.xml PRE-CREATION
>   exec/java-exec/pom.xml 90734a5
>   exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 190c13f
>   exec/java-exec/src/main/java/org/apache/drill/exec/opt/BasicOptimizer.java 5288f5d
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/WindowPOP.java 17738ee
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/OverFinder.java PRE-CREATION
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameBatchCreator.java 9b8929f
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameRecordBatch.java 26d23f2
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameTemplate.java e2c7e9e
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFramer.java 9588cef
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameBatchCreator.java PRE-CREATION
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java PRE-CREATION
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameTemplate.java PRE-CREATION
>   exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFramer.java PRE-CREATION
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java 3b7adca
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrel.java f1a8bc0
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrule.java 00c20b2
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrel.java PRE-CREATION
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java PRE-CREATION
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java 79603eb
>   exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java f20627d
>   exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/window/TestWindowFrame.java a9d2ef8
>   exec/java-exec/src/test/java/org/apache/drill/exec/sql/TestWindowFunctions.java 6eff6db
> 
> Diff: https://reviews.apache.org

Re: Review Request 30051: DRILL-1908: new window function implementation

2015-01-30 Thread abdelhakim deneche


> On Jan. 30, 2015, 1:12 a.m., Chris Westin wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameTemplate.java,
> >  line 34
> > 
> >
> > Can container and batches be final?
> 
> abdelhakim deneche wrote:
> yes indeed.

Marking batches and container as final means I would need to pass their values 
to the constructor. WindowFrameTemplate is used as the basis for the code 
generation of the framer, and I couldn't find a way to pass arguments to the 
constructor of a generated class instance.
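A simplified sketch of the constraint described above, assuming (as the reply implies) that the generated framer is instantiated through a no-arg constructor. Since state can only be injected after construction, through a setup method, those fields cannot be final. FramerTemplate, GeneratedFramer and FramerFactory are hypothetical stand-ins, not Drill's code-generation classes.

```java
import java.util.List;

// Hypothetical template: a runtime-generated subclass is created reflectively.
abstract class FramerTemplate {
  // Not final: these are assigned in setup(), after reflective construction.
  private List<String> batches;
  private String container;

  void setup(List<String> batches, String container) {
    this.batches = batches;
    this.container = container;
  }

  int batchCount() { return batches.size(); }

  String container() { return container; }
}

// Stand-in for a class that code generation would produce from the template.
class GeneratedFramer extends FramerTemplate {
  public GeneratedFramer() {}
}

class FramerFactory {
  static FramerTemplate create(Class<? extends FramerTemplate> generated,
                               List<String> batches, String container) {
    try {
      // Only a no-arg constructor is assumed to exist on the generated class,
      // so constructor injection (and therefore final fields) is not possible.
      FramerTemplate framer = generated.getDeclaredConstructor().newInstance();
      framer.setup(batches, container);
      return framer;
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("cannot instantiate " + generated.getName(), e);
    }
  }
}
```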


- abdelhakim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/#review70296
---



Re: Review Request 30051: DRILL-1908: new window function implementation

2015-01-30 Thread Chris Westin


> On Jan. 24, 2015, 12:20 a.m., abdelhakim deneche wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java,
> >  line 85
> > 
> >
> > Yes, especially if the user uses "over()" then we'll have one single 
> > partition containing all rows.

Then it seems like we should have some way of limiting this and aborting, and 
submit a ticket to possibly find another way to store the required data (spool 
to disk, like sort?) if this happens in the future.
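One possible shape for such a limit, sketched under assumptions: the operator tracks the bytes it holds for the current partition and fails with a clear message once a cap would be exceeded, instead of buffering without bound (as would happen with "over()" and a single partition containing all rows). The cap, the byte accounting and the exception type are hypothetical; spilling to disk is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical bounded buffer for batches held while waiting for a partition to end.
class BoundedPartitionBuffer {
  private final long maxBufferedBytes;
  private final Deque<Long> batchSizes = new ArrayDeque<>();
  private long bufferedBytes = 0;

  BoundedPartitionBuffer(long maxBufferedBytes) {
    this.maxBufferedBytes = maxBufferedBytes;
  }

  // Buffers one incoming batch, or aborts if the cap would be exceeded.
  void add(long batchBytes) {
    if (bufferedBytes + batchBytes > maxBufferedBytes) {
      throw new IllegalStateException("window partition exceeds buffer limit of "
          + maxBufferedBytes + " bytes; spilling to disk is not supported");
    }
    batchSizes.addLast(batchBytes);
    bufferedBytes += batchBytes;
  }

  // Releases the oldest batch once all of its rows have been processed.
  void releaseOldest() {
    Long size = batchSizes.pollFirst();
    if (size != null) {
      bufferedBytes -= size;
    }
  }

  long bufferedBytes() { return bufferedBytes; }
}
```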


- Chris


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/#review69506
---



Re: Review Request 30051: DRILL-1908: new window function implementation

2015-01-30 Thread abdelhakim deneche


> On Jan. 30, 2015, 1:12 a.m., Chris Westin wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/GenerateTestData.java,
> >  line 121
> > 
> >
> > Make the file prefix a command-line argument so that people besides 
> > yourself can run this.

Argh, I forgot to remove GenerateTestData again!
I just used this to generate the data used in the tests, it was never intended 
to be part of the final code.
Sorry about that


> On Jan. 30, 2015, 1:12 a.m., Chris Westin wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/GenerateTestData.java,
> >  line 172
> > 
> >
> > add
> > final Partition partition = partitions[p];
> > and then use partition throughout the loop body. This avoids bounds 
> > checking and dereferencing on every use.

same as above


> On Jan. 30, 2015, 1:12 a.m., Chris Westin wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java,
> >  line 55
> > 
> >
> > The logger should be private. Yes, I know a lot of them aren't, and 
> > that's a bug.

Good catch. Will fix it asap


> On Jan. 30, 2015, 1:12 a.m., Chris Westin wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameTemplate.java,
> >  line 34
> > 
> >
> > Can container and batches be final?

yes indeed.


- abdelhakim


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/#review70296
---



Re: Review Request 30051: DRILL-1908: new window function implementation

2015-01-30 Thread Jacques Nadeau


> On Jan. 30, 2015, 1:12 a.m., Chris Westin wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameTemplate.java,
> >  line 32
> > 
> >
> > logger should be private.

I disagree.  Our current standard is package private.  If you think we should 
change this throughout the code, we should have a discussion but my preference 
is to maintain the current standard until we decide upon a new one.


- Jacques


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/#review70296
---



Re: Review Request 30051: DRILL-1908: new window function implementation

2015-02-02 Thread Chris Westin


> On Jan. 30, 2015, 1:12 a.m., Chris Westin wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/GenerateTestData.java,
> >  line 121
> > 
> >
> > Make the file prefix a command-line argument so that people besides 
> > yourself can run this.
> 
> abdelhakim deneche wrote:
> Argh, I forgot to remove GenerateTestData again!
> I just used this to generate the data used in the tests, it was never 
> intended to be part of the final code.
> Sorry about that

I would check it in. We might need it again in the future. What if something 
changes and we have to re-generate the test data?


> On Jan. 30, 2015, 1:12 a.m., Chris Westin wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameTemplate.java,
> >  line 32
> > 
> >
> > logger should be private.
> 
> Jacques Nadeau wrote:
> I disagree.  Our current standard is package private.  If you think we 
> should change this throughout the code, we should have a discussion but my 
> preference is to maintain the current standard until we decide upon a new one.

Loggers identify their source in log messages thanks to the class argument 
given to getLogger(). They're meant to be associated with a class in a 
one-to-one manner -- why else would getLogger() have this parameter?

There are no uses of the pattern "otherclass.logger...", so there's no reason 
for them to be package private. However, I have come across a few uses where a 
derived class uses the logger from its base class, and this is confusing. This 
has sent me looking in the wrong place for the source of the message, so we 
shouldn't do it. I've assumed it was accidental, and slipped by because the 
base class's logger wasn't private, so the author was able to use it without 
realizing it. Making them private will prevent that, and ensure that log 
messages correctly identify their real source.

Because we have not written a standard that describes this, and because it goes 
against common best practice elsewhere, I've been converting these to private 
wherever I've come across them. In only a couple of cases has this made me add 
new loggers where a derived class was accidentally using its base class's 
logger.
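A small illustration of the convention argued for here: each class declares its own private logger named after that class, so a derived class cannot silently reuse its base class's logger. Drill itself uses slf4j; this sketch uses java.util.logging only so that it compiles without dependencies, and the class names are hypothetical.

```java
import java.util.logging.Logger;

// Each class owns a private logger named after itself.
class BaseRecordBatch {
  private static final Logger logger = Logger.getLogger(BaseRecordBatch.class.getName());

  String loggerName() { return logger.getName(); }
}

class DerivedRecordBatch extends BaseRecordBatch {
  // BaseRecordBatch.logger is private, so it cannot be reused here by accident;
  // messages logged from this class are attributed to DerivedRecordBatch.
  private static final Logger logger = Logger.getLogger(DerivedRecordBatch.class.getName());

  @Override
  String loggerName() { return logger.getName(); }
}
```

With a package-private logger in the base class, the derived class could have referred to it directly and its messages would have been attributed to BaseRecordBatch, which is the confusion described above.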


- Chris


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/#review70296
---



Re: Review Request 30051: DRILL-1908: new window function implementation

2015-02-12 Thread abdelhakim deneche

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/
---

(Updated Feb. 12, 2015, 6:20 p.m.)


Review request for drill and Jacques Nadeau.


Changes
---

rebased the patch.


Bugs: DRILL-1908
https://issues.apache.org/jira/browse/DRILL-1908


Repository: drill-git


Description
---

In order to fix DRILL-1487 a complete rewrite of the 
StreamingWindowFrameRecordBatch was needed. This patch adds a new 
WindowFrameRecordBatch that correctly handles window functions with or without 
order by clauses.
This code still lacks support for frame clauses and may be optimized to reduce 
unneeded frame computations.
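For reference, the difference between the two cases in standard SQL: without ORDER BY, every row's frame is the whole partition; with ORDER BY and no frame clause, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows with equal ORDER BY values (peers) get the same result. The sketch below computes SUM over one partition that is assumed to be already sorted by its key; it illustrates the semantics only and is not how WindowFrameRecordBatch is implemented.

```java
import java.util.Arrays;

// Semantics sketch for SUM(v) OVER (...) on a single, pre-sorted partition.
class WindowSumSemantics {

  // SUM(v) OVER (PARTITION BY p): every row sees the whole partition.
  static long[] sumWithoutOrderBy(long[] values) {
    long total = 0;
    for (long v : values) {
      total += v;
    }
    long[] out = new long[values.length];
    Arrays.fill(out, total);
    return out;
  }

  // SUM(v) OVER (PARTITION BY p ORDER BY k) with the default RANGE frame:
  // a running sum where all peers (equal k) share the value through the last peer.
  // keys must be sorted ascending and have the same length as values.
  static long[] sumWithOrderBy(long[] keys, long[] values) {
    long[] out = new long[values.length];
    long running = 0;
    int start = 0;
    while (start < keys.length) {
      // Extend to the end of the current peer group, accumulating as we go.
      int end = start;
      while (end < keys.length && keys[end] == keys[start]) {
        running += values[end];
        end++;
      }
      Arrays.fill(out, start, end, running);
      start = end;
    }
    return out;
  }

  public static void main(String[] args) {
    long[] keys = {1, 2, 2, 3};
    long[] values = {10, 20, 30, 40};
    System.out.println(Arrays.toString(sumWithoutOrderBy(values)));    // [100, 100, 100, 100]
    System.out.println(Arrays.toString(sumWithOrderBy(keys, values))); // [10, 60, 60, 100]
  }
}
```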


Diffs (updated)
-

  
  common/src/main/java/org/apache/drill/common/logical/data/AbstractBuilder.java 28424a5
  common/src/main/java/org/apache/drill/common/logical/data/Window.java 6dba77c
  contrib/data/pom.xml 86075f2
  contrib/data/window-test-data/pom.xml PRE-CREATION
  exec/java-exec/pom.xml 06f60fb
  exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 5efcce8
  exec/java-exec/src/main/java/org/apache/drill/exec/opt/BasicOptimizer.java 5288f5d
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/WindowPOP.java 17738ee
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/OverFinder.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameBatchCreator.java 9b8929f
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameRecordBatch.java 26d23f2
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameTemplate.java e2c7e9e
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFramer.java 9588cef
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameBatchCreator.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameTemplate.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFramer.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java 3b7adca
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrel.java f1a8bc0
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrule.java 00c20b2
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrel.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java PRE-CREATION
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java 6b3d301
  exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java aa0a5ad
  exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/window/TestWindowFrame.java a9d2ef8
  exec/java-exec/src/test/java/org/apache/drill/exec/sql/TestWindowFunctions.java 6eff6db

Diff: https://reviews.apache.org/r/30051/diff/


Testing (updated)
---

Unit tests are available to test window functions in multiple scenarios:
- b1.p1: single batch with a single partition
- b1.p2: 2 batches, each containing a different partition
- b2.p4: 2 batches and 4 partitions, one partition has rows in both batches
- b3.p2: 3 batches and 2 partitions, one partition includes the whole 2nd batch 
and has rows in 3 batches
- b4.p4: 4 batches and 4 partitions, the partitions are arranged to test an 
edge case: the 2nd time innerNext() is called, WindowFrameRecordBatch has 
enough saved batches to call its framer.doWork() without the need to call 
next(incoming)

All tests, except the last one, come in 2 variations: with and without "order 
by" clause

All unit tests pass. Functional, sf100 and customer tests don't add any new 
failures.


Thanks,

abdelhakim deneche



Re: Review Request 30051: DRILL-1908: new window function implementation

2015-02-13 Thread abdelhakim deneche

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/
---

(Updated Feb. 13, 2015, 11:09 p.m.)


Review request for drill and Jacques Nadeau.


Changes
---

StreamingWindowRecordBatch will stop the query if it can't allocate value 
vectors
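A minimal sketch of the behavior this change describes, assuming a vector type whose allocation reports failure through a return value. The Vector interface, tryAllocate(), the Outcome enum and the cleanup of already-allocated vectors are all hypothetical, not the patch's actual code.

```java
import java.util.List;

// Hypothetical: allocate every output vector, or release what was allocated and
// tell the caller to stop the query.
class OutputAllocation {

  interface Vector {
    // Returns false instead of throwing when memory is unavailable (assumed).
    boolean tryAllocate();

    void clear();
  }

  enum Outcome { OK, STOP }

  static Outcome allocateAll(List<? extends Vector> vectors) {
    for (int i = 0; i < vectors.size(); i++) {
      if (!vectors.get(i).tryAllocate()) {
        // Free the vectors allocated earlier in this pass before giving up.
        for (int j = 0; j < i; j++) {
          vectors.get(j).clear();
        }
        return Outcome.STOP;
      }
    }
    return Outcome.OK;
  }

  // Test double that succeeds or fails on demand.
  static final class FakeVector implements Vector {
    final boolean succeeds;
    boolean allocated;

    FakeVector(boolean succeeds) { this.succeeds = succeeds; }

    public boolean tryAllocate() {
      allocated = succeeds;
      return succeeds;
    }

    public void clear() { allocated = false; }
  }
}
```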


Bugs: DRILL-1908
https://issues.apache.org/jira/browse/DRILL-1908


Repository: drill-git


Description
---

In order to fix DRILL-1487, a complete rewrite of the 
StreamingWindowFrameRecordBatch was needed. This patch adds a new 
WindowFrameRecordBatch that correctly handles window functions with or without 
order by clauses.
This code still lacks support for frame clauses and could be optimized to avoid 
unneeded frame computations.
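The "with or without order by" distinction follows standard SQL framing. Without ORDER BY, the frame is the whole partition. With ORDER BY, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows that tie on the ordering key (peers) share one value. The following is a minimal, illustrative Java sketch of SUM(v) OVER (PARTITION BY p [ORDER BY o]) for a single partition that is already sorted by o. The names are hypothetical, not Drill's code.

```java
import java.util.Arrays;

/** Illustrative sketch of SUM(v) OVER (...) for one partition already sorted by the order key. */
public class WindowSum {

    /** No ORDER BY: every row sees the total of the whole partition. */
    public static long[] sumWholePartition(long[] values) {
        long total = 0;
        for (long v : values) total += v;
        long[] out = new long[values.length];
        Arrays.fill(out, total);
        return out;
    }

    /**
     * ORDER BY with the default RANGE frame: each row sees the running total
     * up to and including its last peer (rows with an equal order key).
     */
    public static long[] sumRangeToCurrentRow(long[] orderKeys, long[] values) {
        long[] out = new long[values.length];
        long running = 0;
        int i = 0;
        while (i < values.length) {
            // Extend j to the end of the current peer group, accumulating as we go.
            int j = i;
            while (j < values.length && orderKeys[j] == orderKeys[i]) {
                running += values[j];
                j++;
            }
            // All peers get the same cumulative value.
            Arrays.fill(out, i, j, running);
            i = j;
        }
        return out;
    }

    public static void main(String[] args) {
        long[] keys = {1, 2, 2, 3};
        long[] vals = {10, 20, 30, 40};
        System.out.println(Arrays.toString(sumWholePartition(vals)));          // [100, 100, 100, 100]
        System.out.println(Arrays.toString(sumRangeToCurrentRow(keys, vals))); // [10, 60, 60, 100]
    }
}
```

The two rows with order key 2 both get 60 rather than 30 and 60. This peer handling is what separates RANGE framing from a plain running sum.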


Diffs (updated)
-

  
common/src/main/java/org/apache/drill/common/logical/data/AbstractBuilder.java 
28424a5 
  common/src/main/java/org/apache/drill/common/logical/data/Window.java 6dba77c 
  contrib/data/pom.xml 86075f2 
  contrib/data/window-test-data/pom.xml PRE-CREATION 
  exec/java-exec/pom.xml 06f60fb 
  exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 5efcce8 
  exec/java-exec/src/main/java/org/apache/drill/exec/opt/BasicOptimizer.java 
5288f5d 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/WindowPOP.java
 17738ee 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/OverFinder.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameBatchCreator.java
 9b8929f 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameRecordBatch.java
 26d23f2 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameTemplate.java
 e2c7e9e 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFramer.java
 9588cef 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameBatchCreator.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameTemplate.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFramer.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java
 3b7adca 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrel.java
 f1a8bc0 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrule.java
 00c20b2 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrel.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java
 6b3d301 
  
exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
 aa0a5ad 
  
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/window/TestWindowFrame.java
 a9d2ef8 
  
exec/java-exec/src/test/java/org/apache/drill/exec/sql/TestWindowFunctions.java 
6eff6db 

Diff: https://reviews.apache.org/r/30051/diff/


Testing
---

Unit tests are available to test window functions in multiple scenarios:
- b1.p1: single batch with a single partition
- b1.p2: 2 batches, each containing a different partition
- b2.p4: 2 batches and 4 partitions, one partition has rows in both batches
- b3.p2: 3 batches and 2 partitions, one partition includes the whole 2nd batch 
and has rows in 3 batches
- b4.p4: 4 batches and 4 partitions, the partitions are arranged to test an 
edge case: the 2nd time innerNext() is called, WindowFrameRecordBatch has 
enough saved batches to call its framer.doWork() without the need to call 
next(incoming)

All tests, except the last one, come in 2 variations: with and without an 
"order by" clause.

All unit tests pass. Functional, sf100 and customer tests don't add any new 
failures.


Thanks,

abdelhakim deneche



Re: Review Request 30051: DRILL-1908: new window function implementation

2015-03-03 Thread abdelhakim deneche

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30051/
---

(Updated March 3, 2015, 10 p.m.)


Review request for drill and Jacques Nadeau.


Changes
---

Patch rebased onto master.


Bugs: DRILL-1908
https://issues.apache.org/jira/browse/DRILL-1908


Repository: drill-git


Description
---

In order to fix DRILL-1487, a complete rewrite of the 
StreamingWindowFrameRecordBatch was needed. This patch adds a new 
WindowFrameRecordBatch that correctly handles window functions with or without 
order by clauses.
This code still lacks support for frame clauses and could be optimized to avoid 
unneeded frame computations.


Diffs (updated)
-

  
common/src/main/java/org/apache/drill/common/logical/data/AbstractBuilder.java 
28424a5 
  common/src/main/java/org/apache/drill/common/logical/data/Window.java 6dba77c 
  contrib/data/pom.xml 86075f2 
  contrib/data/window-test-data/pom.xml PRE-CREATION 
  exec/java-exec/pom.xml 28eca2b 
  exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java 5efcce8 
  exec/java-exec/src/main/java/org/apache/drill/exec/opt/BasicOptimizer.java 
5288f5d 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/WindowPOP.java
 17738ee 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/OverFinder.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameBatchCreator.java
 9b8929f 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameRecordBatch.java
 87209eb 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFrameTemplate.java
 e2c7e9e 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/StreamingWindowFramer.java
 9588cef 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameBatchCreator.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameTemplate.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFramer.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillRuleSets.java
 3b7adca 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrel.java
 f1a8bc0 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/StreamingWindowPrule.java
 00c20b2 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrel.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java
 PRE-CREATION 
  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/handlers/DefaultSqlHandler.java
 232778a 
  
exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
 3d3e96f 
  
exec/java-exec/src/test/java/org/apache/drill/exec/physical/impl/window/TestWindowFrame.java
 a9d2ef8 
  
exec/java-exec/src/test/java/org/apache/drill/exec/sql/TestWindowFunctions.java 
6eff6db 

Diff: https://reviews.apache.org/r/30051/diff/


Testing
---

Unit tests are available to test window functions in multiple scenarios:
- b1.p1: single batch with a single partition
- b1.p2: 2 batches, each containing a different partition
- b2.p4: 2 batches and 4 partitions, one partition has rows in both batches
- b3.p2: 3 batches and 2 partitions, one partition includes the whole 2nd batch 
and has rows in 3 batches
- b4.p4: 4 batches and 4 partitions, the partitions are arranged to test an 
edge case: the 2nd time innerNext() is called, WindowFrameRecordBatch has 
enough saved batches to call its framer.doWork() without the need to call 
next(incoming)

All tests, except the last one, come in 2 variations: with and without an 
"order by" clause.

All unit tests pass. Functional, sf100 and customer tests don't add any new 
failures.


Thanks,

abdelhakim deneche



[jira] [Created] (DRILL-3619) Add support for NTILE window function

2015-08-10 Thread Deneche A. Hakim (JIRA)
Deneche A. Hakim created DRILL-3619:
---

 Summary: Add support for NTILE window function
 Key: DRILL-3619
 URL: https://issues.apache.org/jira/browse/DRILL-3619
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Deneche A. Hakim
Assignee: Deneche A. Hakim
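The ticket carries no description. For reference, NTILE(k) in standard SQL (and in PostgreSQL) splits each ordered partition into k groups, numbered 1..k, as evenly as possible. Any remainder rows go to the lowest-numbered groups. The following is a minimal Java sketch of that numbering. It is independent of how Drill actually implements the function, and the names are illustrative.

```java
import java.util.Arrays;

/** Illustrative sketch of NTILE(buckets) for a partition of n ordered rows. */
public class Ntile {

    /** Returns the 1-based bucket number for each of the n rows, in order. */
    public static int[] ntile(int n, int buckets) {
        if (buckets <= 0) {
            throw new IllegalArgumentException("bucket count must be positive");
        }
        int[] out = new int[n];
        int base = n / buckets;  // minimum rows per bucket
        int extra = n % buckets; // the first 'extra' buckets get one more row
        int row = 0;
        for (int b = 1; b <= buckets && row < n; b++) {
            int size = base + (b <= extra ? 1 : 0);
            for (int k = 0; k < size; k++) {
                out[row++] = b;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 7 rows into 3 buckets: sizes 3, 2, 2.
        System.out.println(Arrays.toString(ntile(7, 3))); // [1, 1, 1, 2, 2, 3, 3]
        // More buckets than rows: each row gets its own bucket.
        System.out.println(Arrays.toString(ntile(2, 5))); // [1, 2]
    }
}
```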






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-3619) Add support for NTILE window function

2015-08-18 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim resolved DRILL-3619.
-
Resolution: Fixed

Fixed in b55e2328d929df5d361c038f63fdeffadb0e544c

> Add support for NTILE window function
> -
>
> Key: DRILL-3619
> URL: https://issues.apache.org/jira/browse/DRILL-3619
> Project: Apache Drill
>  Issue Type: Sub-task
>  Components: Execution - Relational Operators
>Reporter: Deneche A. Hakim
>Assignee: Deneche A. Hakim
>  Labels: window_function
> Fix For: 1.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-3680) window function query returns Incorrect results

2015-08-20 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-3680:
-

 Summary: window function query returns Incorrect results 
 Key: DRILL-3680
 URL: https://issues.apache.org/jira/browse/DRILL-3680
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.2.0
 Environment: private-branch 
https://github.com/adeneche/incubator-drill/tree/new-window-funcs
Reporter: Khurram Faraaz
Assignee: Chris Westin
Priority: Critical



Query plan from Drill for the query that returns wrong results
{code}
0: jdbc:drill:schema=dfs.tmp> explain plan for select c1 , c2 , lead(c2) OVER ( 
PARTITION BY c2 ORDER BY c1) lead_c2 FROM (SELECT c1 , c2, ntile(3) 
over(PARTITION BY c2 ORDER BY c1) FROM `tblWnulls.parquet`);
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(c1=[$0], c2=[$1], lead_c2=[$2])
00-02        Project(c1=[$0], c2=[$1], lead_c2=[$2])
00-03          Project(c1=[$0], c2=[$1], $2=[$3])
00-04            Window(window#0=[window(partition {1} order by [0] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [LEAD($1)])])
00-05              Window(window#0=[window(partition {1} order by [0] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [NTILE($2)])])
00-06                SelectionVectorRemover
00-07                  Sort(sort0=[$1], sort1=[$0], dir0=[ASC], dir1=[ASC])
00-08                    Project(c1=[$1], c2=[$0])
00-09                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tmp/tblWnulls.parquet]], selectionRoot=maprfs:/tmp/tblWnulls.parquet, numFiles=1, columns=[`c1`, `c2`]]])
{code}

Results returned by Drill.
{code}
0: jdbc:drill:schema=dfs.tmp> select c1 , c2 , lead(c2) OVER ( PARTITION BY c2 
ORDER BY c1) lead_c2 FROM (SELECT c1 , c2, ntile(3) over(PARTITION BY c2 ORDER 
BY c1) FROM `tblWnulls.parquet`);
+-------------+-------+----------+
|     c1      |  c2   | lead_c2  |
+-------------+-------+----------+
| 0           | a     | null     |
| 1           | a     | null     |
| 5           | a     | null     |
| 10          | a     | null     |
| 11          | a     | null     |
| 14          | a     | null     |
| 1           | a     | null     |
| 2           | b     | null     |
| 9           | b     | null     |
| 13          | b     | null     |
| 17          | b     | null     |
| 4           | c     | null     |
| 6           | c     | null     |
| 8           | c     | null     |
| 12          | c     | null     |
| 13          | c     | null     |
| 13          | c     | null     |
| null        | c     | null     |
| 10          | d     | null     |
| 11          | d     | null     |
| 2147483647  | d     | null     |
| 2147483647  | d     | null     |
| null        | d     | null     |
| null        | d     | null     |
| -1          | e     | null     |
| 15          | e     | null     |
| 19          | null  | null     |
| 65536       | null  | null     |
| 100         | null  | null     |
| null        | null  | null     |
+-------------+-------+----------+
30 rows selected (0.339 seconds)
{code}

Results returned by Postgres

{code}
postgres=# select c1 , c2 , lead(c2) OVER ( PARTITION BY c2 ORDER BY c1) 
lead_c2 FROM (SELECT c1 , c2, ntile(3) over(PARTITION BY c2 ORDER BY c1) FROM 
t222) sub_query;
     c1     | c2 | lead_c2 
------------+----+---------
          0 | a  | a
          1 | a  | a
          5 | a  | a
         10 | a  | a
         11 | a  | a
         14 | a  | a
          1 | a  | 
          2 | b  | b
          9 | b  | b
         13 | b  | b
         17 | b  | 
          4 | c  | c
          6 | c  | c
          8 | c  | c
         12 | c  | c
         13 | c  | c
         13 | c  | c
            | c  | 
         10 | d  | d
         11 | d  | d
 2147483647 | d  | d
 2147483647 | d  | d
            | d  | d
            | d  | 
         -1 | e  | e
         15 | e  | 
         19 |    | 
      65536 |    | 
        100 |    | 
            |    | 
{code}
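The Postgres column shows the expected semantics. LEAD(c2) returns the c2 value of the next row in the same partition (ordered by c1), and NULL on the last row of each partition. Drill returned NULL on every row. The following is a minimal Java sketch of LEAD(value, offset) over one already-sorted partition, written only to make the expected column easy to reproduce. The names are hypothetical, and the optional DEFAULT argument of LEAD is not modelled (NULL is always used).

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of LEAD(value, offset) OVER (PARTITION BY ... ORDER BY ...) for one sorted partition. */
public class Lead {

    /** Returns, for each row, the value 'offset' rows ahead, or null past the end of the partition. */
    public static <T> List<T> lead(List<T> partition, int offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("offset must be non-negative");
        }
        List<T> out = new ArrayList<>(partition.size());
        for (int i = 0; i < partition.size(); i++) {
            int target = i + offset;
            out.add(target < partition.size() ? partition.get(target) : null);
        }
        return out;
    }

    public static void main(String[] args) {
        // Partition c2 = 'b' from the report, ordered by c1 = 2, 9, 13, 17.
        System.out.println(lead(List.of("b", "b", "b", "b"), 1)); // [b, b, b, null]
        System.out.println(lead(List.of(2, 9, 13, 17), 1));       // [9, 13, 17, null]
    }
}
```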



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4444) Window function query results in IllegalStateException

2016-02-26 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-:
-

 Summary: Window function query results in IllegalStateException
 Key: DRILL-
 URL: https://issues.apache.org/jira/browse/DRILL-
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.6.0
Reporter: Khurram Faraaz


Window function query results in IllegalStateException
Drill 1.6.0 commit ID: 6d5f4983

{noformat}
0: jdbc:drill:schema=dfs.tmp> SELECT
. . . . . . . . . . . . . . > RANK() OVER (ORDER BY c1 DESC),
. . . . . . . . . . . . . . > AVG(c3) OVER (PARTITION BY c8 ORDER BY 
MIN(c3) DESC NULLS FIRST RANGE BETWEEN CURRENT ROW AND CURRENT ROW)
. . . . . . . . . . . . . . > FROM dfs.tmp.`t_alltype`
. . . . . . . . . . . . . . > WINDOW w AS (PARTITION BY c8 ORDER BY c2 DESC 
NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
Error: SYSTEM ERROR: IllegalStateException: This generator does not support 
mappings beyond

Fragment 0:0

[Error Id: a273f3c1-47a7-450b-b9d7-65c2608089d5 on centos-03.qa.lab:31010] 
(state=,code=0)
{noformat}

Stack trace from drillbit.log

{noformat}
2016-02-26 11:25:32,925 [292fc9d3-28f5-6eb0-ec6a-99d5b90ec968:foreman] INFO  
o.a.drill.exec.work.foreman.Foreman - Query text for query id 
292fc9d3-28f5-6eb0-ec6a-99d5b90ec968: SELECT
RANK() OVER (ORDER BY c1 DESC),
AVG(c3) OVER (PARTITION BY c8 ORDER BY MIN(c3) DESC NULLS FIRST RANGE 
BETWEEN CURRENT ROW AND CURRENT ROW)
FROM dfs.tmp.`t_alltype`
WINDOW w AS (PARTITION BY c8 ORDER BY c2 DESC NULLS FIRST RANGE BETWEEN 
UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
2016-02-26 11:25:33,056 [292fc9d3-28f5-6eb0-ec6a-99d5b90ec968:foreman] INFO  
o.a.d.exec.store.parquet.Metadata - Took 1 ms to get file statuses
2016-02-26 11:25:33,059 [292fc9d3-28f5-6eb0-ec6a-99d5b90ec968:foreman] INFO  
o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 1 
using 1 threads. Time: 2ms total, 2.395101ms avg, 2ms max.
2016-02-26 11:25:33,059 [292fc9d3-28f5-6eb0-ec6a-99d5b90ec968:foreman] INFO  
o.a.d.exec.store.parquet.Metadata - Fetch parquet metadata: Executed 1 out of 1 
using 1 threads. Earliest start: 1.325000 μs, Latest start: 1.325000 μs, 
Average start: 1.325000 μs .
2016-02-26 11:25:33,059 [292fc9d3-28f5-6eb0-ec6a-99d5b90ec968:foreman] INFO  
o.a.d.exec.store.parquet.Metadata - Took 2 ms to read file metadata
2016-02-26 11:25:33,130 [292fc9d3-28f5-6eb0-ec6a-99d5b90ec968:frag:0:0] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 292fc9d3-28f5-6eb0-ec6a-99d5b90ec968:0:0: 
State change requested AWAITING_ALLOCATION --> RUNNING
2016-02-26 11:25:33,130 [292fc9d3-28f5-6eb0-ec6a-99d5b90ec968:frag:0:0] INFO  
o.a.d.e.w.f.FragmentStatusReporter - 292fc9d3-28f5-6eb0-ec6a-99d5b90ec968:0:0: 
State to report: RUNNING
2016-02-26 11:25:33,151 [292fc9d3-28f5-6eb0-ec6a-99d5b90ec968:frag:0:0] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 292fc9d3-28f5-6eb0-ec6a-99d5b90ec968:0:0: 
State change requested RUNNING --> FAILED
2016-02-26 11:25:33,151 [292fc9d3-28f5-6eb0-ec6a-99d5b90ec968:frag:0:0] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 292fc9d3-28f5-6eb0-ec6a-99d5b90ec968:0:0: 
State change requested FAILED --> FINISHED
2016-02-26 11:25:33,152 [292fc9d3-28f5-6eb0-ec6a-99d5b90ec968:frag:0:0] ERROR 
o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: IllegalStateException: This 
generator does not support mappings beyond

Fragment 0:0

[Error Id: a273f3c1-47a7-450b-b9d7-65c2608089d5 on centos-03.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
IllegalStateException: This generator does not support mappings beyond

Fragment 0:0

[Error Id: a273f3c1-47a7-450b-b9d7-65c2608089d5 on centos-03.qa.lab:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
 ~[drill-common-1.6.0-SNAPSHOT.jar:1.6.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:318)
 [drill-java-exec-1.6.0-SNAPSHOT.jar:1.6.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:185)
 [drill-java-exec-1.6.0-SNAPSHOT.jar:1.6.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:287)
 [drill-java-exec-1.6.0-SNAPSHOT.jar:1.6.0-SNAPSHOT]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.6.0-SNAPSHOT.jar:1.6.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_65]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_65]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
Caused by: java.lang.IllegalStateException: This generator does not support 
mappings beyond
at 
org.apache.drill.exec.compile.s

[jira] [Created] (DRILL-4847) Window function query results in OOM Exception.

2016-08-16 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-4847:
-

 Summary: Window function query results in OOM Exception.
 Key: DRILL-4847
 URL: https://issues.apache.org/jira/browse/DRILL-4847
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.8.0
 Environment: 4 node cluster CentOS
Reporter: Khurram Faraaz
Priority: Critical


Window function query results in OOM Exception.

Drill version 1.8.0-SNAPSHOT git commit ID: 38ce31ca
MapRBuildVersion 5.1.0.37549.GA

{noformat}
0: jdbc:drill:schema=dfs.tmp> SELECT clientname, audiencekey, spendprofileid, 
postalcd, provincecd, provincename, postalcode_json, country_json, 
province_json, town_json, dma_json, msa_json, ROW_NUMBER() OVER (PARTITION BY 
spendprofileid  ORDER BY (CASE WHEN postalcd IS NULL THEN 9 ELSE 0 END) ASC, 
provincecd ASC) as rn FROM `MD593.parquet` limit 3;
Error: RESOURCE ERROR: One or more nodes ran out of memory while executing the 
query.

Failure while allocating buffer.
Fragment 0:0

[Error Id: 2287fe71-f0cb-469a-a563-11580fceb1c5 on centos-01.qa.lab:31010] 
(state=,code=0)
{noformat}
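For readers reproducing this query: within each spendprofileid partition, the expression CASE WHEN postalcd IS NULL THEN 9 ELSE 0 END places rows with a NULL postalcd after all others, and ties are then broken by provincecd. The following small Java sketch expresses that ordering as a Comparator. The Row class is hypothetical and holds only the two ordering columns, and the sample values are made up. How Drill orders a NULL provincecd is not shown in the report, so the sketch puts those rows last as an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative sketch of the ORDER BY used inside the ROW_NUMBER() window above. */
public class PostalOrdering {

    /** Hypothetical row holding only the two ordering columns. */
    static final class Row {
        final String postalcd;
        final String provincecd;

        Row(String postalcd, String provincecd) {
            this.postalcd = postalcd;
            this.provincecd = provincecd;
        }

        @Override
        public String toString() {
            return postalcd + "/" + provincecd;
        }
    }

    // CASE WHEN postalcd IS NULL THEN 9 ELSE 0 END ASC, then provincecd ASC.
    // NULL provincecd values are placed last here; this is an assumption, not Drill's documented rule.
    static final Comparator<Row> ORDER =
        Comparator.comparingInt((Row r) -> r.postalcd == null ? 9 : 0)
                  .thenComparing((Row r) -> r.provincecd,
                                 Comparator.nullsLast(Comparator.<String>naturalOrder()));

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>(List.of(
            new Row(null, "AB"), new Row("M5V", "ON"), new Row("T2P", "AB")));
        rows.sort(ORDER);
        System.out.println(rows); // [T2P/AB, M5V/ON, null/AB]
    }
}
```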

Stack trace from drillbit.log

{noformat}
2016-08-16 07:25:44,590 [284d4006-9f9d-b893-9352-4f54f9b1d52a:foreman] INFO  
o.a.drill.exec.work.foreman.Foreman - Query text for query id 
284d4006-9f9d-b893-9352-4f54f9b1d52a: SELECT clientname, audiencekey, 
spendprofileid, postalcd, provincecd, provincename, postalcode_json, 
country_json, province_json, town_json, dma_json, msa_json, ROW_NUMBER() OVER 
(PARTITION BY spendprofileid  ORDER BY (CASE WHEN postalcd IS NULL THEN 9 ELSE 
0 END) ASC, provincecd ASC) as rn FROM `MD593.parquet` limit 3
...
2016-08-16 07:25:46,273 [284d4006-9f9d-b893-9352-4f54f9b1d52a:frag:0:0] INFO  
o.a.d.e.p.i.xsort.ExternalSortBatch - Completed spilling to 
/tmp/drill/spill/284d4006-9f9d-b893-9352-4f54f9b1d52a_majorfragment0_minorfragment0_operator8/2
2016-08-16 07:25:46,283 [284d4006-9f9d-b893-9352-4f54f9b1d52a:frag:0:0] INFO  
o.a.d.e.w.fragment.FragmentExecutor - User Error Occurred
org.apache.drill.common.exceptions.UserException: RESOURCE ERROR: One or more 
nodes ran out of memory while executing the query.

Failure while allocating buffer.

[Error Id: 2287fe71-f0cb-469a-a563-11580fceb1c5 ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
 ~[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:242)
 [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_101]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_101]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101]
Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Failure while 
allocating buffer.
at 
org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:187)
 ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at 
org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.<init>(RepeatedMapVector.java:331)
 ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at 
org.apache.drill.exec.vector.complex.RepeatedMapVector$RepeatedMapTransferPair.<init>(RepeatedMapVector.java:307)
 ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at 
org.apache.drill.exec.vector.complex.RepeatedMapVector.getTransferPair(RepeatedMapVector.java:161)
 ~[vector-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at 
org.apache.drill.exec.record.SimpleVectorWrapper.cloneAndTransfer(SimpleVectorWrapper.java:66)
 ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at 
org.apache.drill.exec.record.VectorContainer.cloneAndTransfer(VectorContainer.java:204)
 ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at 
org.apache.drill.exec.record.VectorContainer.getTransferClone(VectorContainer.java:157)
 ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:569)
 ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.innerNext(ExternalSortBatch.java:414)
 ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:162)
 ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 ~[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.n

[jira] [Created] (DRILL-1910) ROW_NUMBER() window function throws exception during execution

2014-12-30 Thread Victoria Markman (JIRA)
Victoria Markman created DRILL-1910:
---

 Summary: ROW_NUMBER() window function throws exception during 
execution
 Key: DRILL-1910
 URL: https://issues.apache.org/jira/browse/DRILL-1910
 Project: Apache Drill
  Issue Type: New Feature
  Components: Functions - Drill
Affects Versions: 0.8.0
Reporter: Victoria Markman


Window function ROW_NUMBER() is currently:
* Supported in Calcite
* Not supported in DRILL
* Not documented in DRILL documentation

It throws exception during execution:

{code}
0: jdbc:drill:schema=dfs> select a1, row_number() over (partition by a1) from 
`t.json`;
Query failed: Query failed: Unexpected exception during fragment 
initialization: null

Error: exception while executing query: Failure while executing query. 
(state=,code=0)
0: jdbc:drill:schema=dfs> select a1, row_number() over () from `t.json`;
Query failed: Query failed: Unexpected exception during fragment 
initialization: null

Error: exception while executing query: Failure while executing query. 
(state=,code=0)
{code}


According to JIRA, DRILL-1487 is targeted for the 0.8.0 release.
DRILL-1902 is marked as a critical bug as well.

We have two options:
1) Make this function work for 0.8.0
2) Throw an "unsupported" error message in 0.8.0 and implement row_number() 
later

If we decide to go with option #1, can we please file a separate bug for the 
error message in 0.8.0 and turn this one into an enhancement request?

{code}
drill 0.8.0
git.commit.id.abbrev=5f70ba1 
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Window function query takes too long to complete and return results

2015-06-09 Thread Khurram Faraaz
A query that uses window functions takes too long to complete and return
results. It returns close to a million records and took 533.8 seconds
(almost 9 minutes).
The input CSV file has two columns, one integer and one varchar. Please let
me know if this needs to be investigated; I can file a JIRA to track it if
required.

Size of the input CSV file

[root@centos-01 ~]# hadoop fs -ls /tmp/manyDuplicates.csv

-rwxr-xr-x   3 root root   27889455 2015-06-10 01:26 /tmp/manyDuplicates.csv

{code}

select count(*) over(partition by cast(columns[1] as varchar(25)) order by
cast(columns[0] as bigint)) from `manyDuplicates.csv`;

...

1,000,007 rows selected (533.857 seconds)
{code}

There are five distinct values in columns[1] in the CSV file, i.e. five
partitions.

{code}

0: jdbc:drill:schema=dfs.tmp> select distinct columns[1] from
`manyDuplicates.csv`;

+---------+
| EXPR$0  |
+---------+
|         |
|         |
|         |
|         |
|         |
+---------+

5 rows selected (1.906 seconds)
{code}

Here are the counts for each of those values in columns[1]:

{code}

0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from
`manyDuplicates.csv` where columns[1] = '';

+---------+
| EXPR$0  |
+---------+
| 200484  |
+---------+

1 row selected (0.961 seconds)

{code}


{code}

0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from
`manyDuplicates.csv` where columns[1] = '';

+---------+
| EXPR$0  |
+---------+
| 199353  |
+---------+

1 row selected (0.86 seconds)

{code}


{code}

0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from
`manyDuplicates.csv` where columns[1] = '';

+---------+
| EXPR$0  |
+---------+
| 200702  |
+---------+

1 row selected (0.826 seconds)

{code}


{code}

0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from
`manyDuplicates.csv` where columns[1] = '';

+---------+
| EXPR$0  |
+---------+
| 199916  |
+---------+

1 row selected (0.851 seconds)

{code}


{code}

0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from
`manyDuplicates.csv` where columns[1] = '';

+---------+
| EXPR$0  |
+---------+
| 199552  |
+---------+

1 row selected (0.827 seconds)
{code}
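The five per-value counts account for every input row. This is consistent with the window query returning one output row per input row, spread over five partitions of roughly 200,000 rows each:

```latex
200484 + 199353 + 200702 + 199916 + 199552 = 1\,000\,007
```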

Thanks,
Khurram


[jira] [Created] (DRILL-3293) CTAS with window function fails with NPE

2015-06-15 Thread Victoria Markman (JIRA)
Victoria Markman created DRILL-3293:
---

 Summary: CTAS with window function fails with NPE
 Key: DRILL-3293
 URL: https://issues.apache.org/jira/browse/DRILL-3293
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.0.0
Reporter: Victoria Markman
Assignee: Chris Westin


{code}
0: jdbc:drill:schema=dfs> create table wf_t1 as select sum(a1) over(partition 
by a1) from t1;
Error: SYSTEM ERROR:

Fragment 0:0

[Error Id: 96897b46-70c0-4373-9d85-ca7501cb1479 on atsqa4-133.qa.lab:31010] 
(state=,code=0)
{code}

drillbit.log
{code}
[Error Id: bde0d90b-7eaa-4772-9316-9c58a46b01d2 on atsqa4-133.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR:

Fragment 0:0

[Error Id: bde0d90b-7eaa-4772-9316-9c58a46b01d2 on atsqa4-133.qa.lab:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:522)
 ~[drill-common-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:325)
 [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:181)
 [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:294)
 [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_71]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_71]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: java.lang.UnsupportedOperationException: null
at 
org.apache.drill.exec.expr.TypeHelper.getValueVectorClass(TypeHelper.java:674) 
~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.record.VectorContainer.addOrGet(VectorContainer.java:82) 
~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema(ProjectRecordBatch.java:421)
 ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78)
 ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129)
 ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:146)
 ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105)
 ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:95)
 ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
 ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129)
 ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:146)
 ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105)
 ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:95)
 ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.WriterRecordBatch.innerNext(WriterRecordBatch.java:92)
 ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:146)
 ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105)
 ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:95)
 ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
 ~[drill-java-exec-1.

[jira] [Resolved] (DRILL-3293) CTAS with window function fails with UnsupportedOperationException

2015-06-22 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni resolved DRILL-3293.
---
Resolution: Fixed

fixed in commit: ffae1691c0cd526ed1095fbabbc0855d016790d7

> CTAS with window function fails with UnsupportedOperationException
> --
>
> Key: DRILL-3293
> URL: https://issues.apache.org/jira/browse/DRILL-3293
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.0.0
>Reporter: Victoria Markman
>Assignee: Jinfeng Ni
>  Labels: window_function
> Fix For: 1.1.0
>
> Attachments: t1_parquet
>
>
> {code}
> 0: jdbc:drill:schema=dfs> create table wf_t1 as select sum(a1) over(partition 
> by a1) from t1;
> Error: SYSTEM ERROR:
> Fragment 0:0
> [Error Id: 96897b46-70c0-4373-9d85-ca7501cb1479 on atsqa4-133.qa.lab:31010] 
> (state=,code=0)
> {code}
> drillbit.log
> {code}
> [Error Id: bde0d90b-7eaa-4772-9316-9c58a46b01d2 on atsqa4-133.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR:
> Fragment 0:0
> [Error Id: bde0d90b-7eaa-4772-9316-9c58a46b01d2 on atsqa4-133.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:522)
>  ~[drill-common-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:325)
>  [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:181)
>  [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:294)
>  [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
> at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>  [drill-common-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  [na:1.7.0_71]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  [na:1.7.0_71]
> at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: java.lang.UnsupportedOperationException: null
> at 
> org.apache.drill.exec.expr.TypeHelper.getValueVectorClass(TypeHelper.java:674)
>  ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.VectorContainer.addOrGet(VectorContainer.java:82)
>  ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema(ProjectRecordBatch.java:421)
>  ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:78)
>  ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129)
>  ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:146)
>  ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105)
>  ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:95)
>  ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext(AbstractSingleRecordBatch.java:51)
>  ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
> at 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext(ProjectRecordBatch.java:129)
>  ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:146)
>  ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105)
>  ~[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
> at 
> org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:95)
>  ~[drill-java-exec-1.1

[jira] [Created] (DRILL-7404) window function RANGE with compound ORDER BY

2019-10-14 Thread benj (Jira)
benj created DRILL-7404:
---

 Summary: window function RANGE with compound ORDER BY
 Key: DRILL-7404
 URL: https://issues.apache.org/jira/browse/DRILL-7404
 Project: Apache Drill
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 1.16.0
Reporter: benj


While filing CALCITE-3402 (a request to improve window functions), it appeared 
that the Drill documentation is not up to date:

[https://drill.apache.org/docs/aggregate-window-functions/]
{code:java}
frame_clause
If an ORDER BY clause is used for an aggregate function, an explicit frame 
clause is required. The frame clause refines the set of rows in a function's 
window, including or excluding sets of rows within the ordered result. The 
frame clause consists of the ROWS or RANGE keyword and associated specifiers.
{code}
 But it is currently (1.16) possible to write an ORDER BY clause in a window function 
+without+ specifying an explicit RANGE clause.

In this case, an +implicit+ frame clause is used.

Normally the default (implicit) frame is {{RANGE UNBOUNDED PRECEDING}}, which is 
the same as {{RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW}}; this should 
perhaps also be stated explicitly in the documentation.
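The implicit RANGE frame and an explicit ROWS frame only give different results when ORDER BY values tie, because RANGE includes all peers of the current row. The Java sketch below illustrates that difference for a running SUM over one partition whose rows are already sorted by the ORDER BY key. It follows standard SQL frame semantics; `FrameSemantics` and its methods are hypothetical illustration names, not Drill classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of Drill.
class FrameSemantics {

    /** ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: all rows up to and including this row. */
    public static List<Long> rowsRunningSum(long[] values) {
        List<Long> out = new ArrayList<>();
        long sum = 0;
        for (long v : values) {
            sum += v;
            out.add(sum);
        }
        return out;
    }

    /**
     * RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (the implicit frame when only
     * ORDER BY is given): all rows up to the last peer of this row, where peers share
     * the same ORDER BY key. orderKeys must be sorted ascending.
     */
    public static List<Long> rangeRunningSum(long[] orderKeys, long[] values) {
        List<Long> out = new ArrayList<>();
        long sum = 0;
        int i = 0;
        while (i < values.length) {
            // Add the whole peer group [i, j) before emitting anything for it.
            int j = i;
            while (j < values.length && orderKeys[j] == orderKeys[i]) {
                sum += values[j];
                j++;
            }
            // Every peer in the group gets the same running total.
            for (int k = i; k < j; k++) {
                out.add(sum);
            }
            i = j;
        }
        return out;
    }

    public static void main(String[] args) {
        long[] keys = {1, 2, 2, 3};
        long[] vals = {10, 20, 30, 40};
        System.out.println(rowsRunningSum(vals));        // [10, 30, 60, 100]
        System.out.println(rangeRunningSum(keys, vals)); // [10, 60, 60, 100]
    }
}
```

With the tied key 2, the ROWS frame gives the two peers different totals (and which peer gets which total depends on an arbitrary tie order), while the RANGE frame gives both the same total.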

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (DRILL-7420) window function improve ROWS clause/frame possibilities

2019-10-23 Thread benj (Jira)
benj created DRILL-7420:
---

 Summary: window function improve ROWS clause/frame possibilities
 Key: DRILL-7420
 URL: https://issues.apache.org/jira/browse/DRILL-7420
 Project: Apache Drill
  Issue Type: New Feature
Affects Versions: 1.16.0
Reporter: benj


Window frame options are currently limited in Apache Drill.
  
 A ROWS clause is only possible with "BETWEEN UNBOUNDED PRECEDING AND CURRENT 
ROW".
 It would be useful to also support:
 * "BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"
 * "BETWEEN x PRECEDING AND y FOLLOWING"

{code:sql}
/* ROWS clause is only possible with "BETWEEN UNBOUNDED PRECEDING AND CURRENT 
ROW" */
apache drill> SELECT *, sum(a) OVER(ORDER BY b ROWS BETWEEN UNBOUNDED PRECEDING 
AND CURRENT ROW)  FROM (SELECT 1 a, 1 b, 1 c);
+---+---+---+--------+
| a | b | c | EXPR$3 |
+---+---+---+--------+
| 1 | 1 | 1 | 1      |
+---+---+---+--------+
1 row selected (1.357 seconds)

/* ROWS is currently not possible with "BETWEEN UNBOUNDED PRECEDING AND 
UNBOUNDED FOLLOWING" (it is possible with RANGE, but only with a single ORDER BY 
key) */
apache drill> SELECT *, sum(a) OVER(ORDER BY b, c ROWS BETWEEN UNBOUNDED 
PRECEDING AND UNBOUNDED FOLLOWING)  FROM (SELECT 1 a, 1 b, 1 c);
Error: UNSUPPORTED_OPERATION ERROR: This type of window frame is currently not 
supported 
See Apache Drill JIRA: DRILL-3188

/* ROWS is currently not possible with "BETWEEN x PRECEDING AND y FOLLOWING" */
apache drill> SELECT *, sum(a) OVER(ORDER BY b ROWS BETWEEN 1 PRECEDING AND 1 
FOLLOWING)  FROM (SELECT 1 a, 1 b, 1 c);
Error: UNSUPPORTED_OPERATION ERROR: This type of window frame is currently not 
supported 
See Apache Drill JIRA: DRILL-3188
{code}
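To make the requested frame semantics concrete, here is a minimal Java sketch of what SUM over ROWS BETWEEN x PRECEDING AND y FOLLOWING computes for one partition whose values are already in ORDER BY order. `SlidingRowsFrame` is a hypothetical name, and this is not how Drill's window operator works internally. Passing `Integer.MAX_VALUE` for both offsets stands in for the UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING case.

```java
import java.util.Arrays;

// Hypothetical illustration class, not part of Drill.
class SlidingRowsFrame {

    /**
     * SUM(value) OVER (ORDER BY ... ROWS BETWEEN preceding PRECEDING AND following FOLLOWING)
     * for one partition whose values are already in ORDER BY order. The frame is clipped
     * at the partition boundaries. Prefix sums make each row O(1).
     */
    public static long[] sum(long[] values, int preceding, int following) {
        if (preceding < 0 || following < 0) {
            throw new IllegalArgumentException("offsets must be non-negative");
        }
        int n = values.length;
        long[] prefix = new long[n + 1];
        for (int i = 0; i < n; i++) {
            prefix[i + 1] = prefix[i] + values[i];
        }
        long[] out = new long[n];
        for (int i = 0; i < n; i++) {
            // Widen to long so very large offsets (e.g. Integer.MAX_VALUE) cannot overflow.
            int lo = (int) Math.max(0L, (long) i - preceding);
            int hi = (int) Math.min(n - 1L, (long) i + following);
            out[i] = prefix[hi + 1] - prefix[lo];
        }
        return out;
    }

    public static void main(String[] args) {
        long[] vals = {1, 2, 3, 4, 5};
        // ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
        System.out.println(Arrays.toString(sum(vals, 1, 1)));
        // Unbounded on both sides: every row sees the whole partition.
        System.out.println(Arrays.toString(sum(vals, Integer.MAX_VALUE, Integer.MAX_VALUE)));
    }
}
```

For `{1, 2, 3, 4, 5}` the 1-preceding/1-following frame yields `[3, 6, 9, 12, 9]`; the first and last rows see a clipped frame.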



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [jira] [Created] (DRILL-3619) Add support for NTILE window function

2015-08-10 Thread Ted Dunning
Let me know if you want some suggestions for approximate ntile
implementations.

On Mon, Aug 10, 2015 at 10:42 AM, Deneche A. Hakim (JIRA) 
wrote:

> Deneche A. Hakim created DRILL-3619:
> ---
>
>  Summary: Add support for NTILE window function
>  Key: DRILL-3619
>  URL: https://issues.apache.org/jira/browse/DRILL-3619
>  Project: Apache Drill
>   Issue Type: Sub-task
> Reporter: Deneche A. Hakim
> Assignee: Deneche A. Hakim
>
>
>
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.3.4#6332)
>


Re: [jira] [Created] (DRILL-3619) Add support for NTILE window function

2015-08-10 Thread Abdel Hakim Deneche
I got a "basic" implementation running, but it's still in testing.
It seems to give correct results, but it's tied to Drill's Window
operator because it still needs to compute all window functions if necessary.

All suggestions are welcome; it's always interesting to learn about
alternative ways to implement the same feature.
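For reference, exact NTILE(n) needs the total number of rows in the partition before it can assign any bucket. The Java sketch below shows the usual SQL bucket assignment, where the first (N mod n) buckets receive one extra row. `NtileSketch` is a hypothetical name; this is neither Drill's implementation nor an approximate algorithm.

```java
import java.util.Arrays;

// Hypothetical illustration class, not Drill's NTILE implementation.
class NtileSketch {

    /**
     * 1-based NTILE(n) bucket for the row at 0-based position rowIndex
     * (0 <= rowIndex < partitionSize) in a partition of partitionSize rows.
     * The first (partitionSize % n) buckets get one extra row.
     */
    public static int ntile(long rowIndex, long partitionSize, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        long base = partitionSize / n;       // rows in each smaller bucket
        long extra = partitionSize % n;      // number of buckets with base + 1 rows
        long bigRows = extra * (base + 1);   // rows covered by the larger buckets
        if (rowIndex < bigRows) {
            return (int) (rowIndex / (base + 1)) + 1;
        }
        // Only reached when base > 0, because bigRows == partitionSize when base == 0.
        return (int) (extra + (rowIndex - bigRows) / base) + 1;
    }

    public static void main(String[] args) {
        int[] buckets = new int[10];
        for (int i = 0; i < buckets.length; i++) {
            buckets[i] = ntile(i, buckets.length, 4);
        }
        System.out.println(Arrays.toString(buckets)); // [1, 1, 1, 2, 2, 2, 3, 3, 4, 4]
    }
}
```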

Thanks

On Mon, Aug 10, 2015 at 11:15 AM, Ted Dunning  wrote:

> Let me know if you want some suggestions for approximate ntile
> implementations.
>
> On Mon, Aug 10, 2015 at 10:42 AM, Deneche A. Hakim (JIRA)
> wrote:
>
> > Deneche A. Hakim created DRILL-3619:
> > ---
> >
> >  Summary: Add support for NTILE window function
> >  Key: DRILL-3619
> >  URL: https://issues.apache.org/jira/browse/DRILL-3619
> >  Project: Apache Drill
> >   Issue Type: Sub-task
> > Reporter: Deneche A. Hakim
> > Assignee: Deneche A. Hakim
> >
> >
> >
> >
> >
> >
> > --
> > This message was sent by Atlassian JIRA
> > (v6.3.4#6332)
> >
>



-- 

Abdelhakim Deneche

Software Engineer

  <http://www.mapr.com/>


Now Available - Free Hadoop On-Demand Training
<http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>


[jira] [Created] (DRILL-3786) Query with window function fails with IllegalFormatConversionException

2015-09-15 Thread Abhishek Girish (JIRA)
Abhishek Girish created DRILL-3786:
--

 Summary: Query with window function fails with 
IllegalFormatConversionException
 Key: DRILL-3786
 URL: https://issues.apache.org/jira/browse/DRILL-3786
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.2.0
 Environment: 10 Performance Nodes
DRILL_MAX_DIRECT_MEMORY=100g
DRILL_INIT_HEAP="8g"
DRILL_MAX_HEAP="8g"
planner.memory.query_max_memory_per_node bumped up to 20 GB
TPC-DS SF 1000 dataset (Parquet)
Reporter: Abhishek Girish
Assignee: Jinfeng Ni


Query fails with Runtime exception:

{code:sql}
SELECT sum(s.ss_quantity) OVER (PARTITION BY s.ss_store_sk, s.ss_customer_sk 
ORDER BY s.ss_store_sk) FROM store_sales s LIMIT 20;
java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: 
IllegalFormatConversionException: d != java.lang.Character

Fragment 1:0

[Error Id: 12b51c0c-4992-4ceb-89c4-c99307529c7e on ucs-node8.perf.lab:31010]
at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
at 
sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
at sqlline.SqlLine.print(SqlLine.java:1583)
at sqlline.Commands.execute(Commands.java:852)
at sqlline.Commands.sql(Commands.java:751)
at sqlline.SqlLine.dispatch(SqlLine.java:738)
at sqlline.SqlLine.begin(SqlLine.java:612)
at sqlline.SqlLine.start(SqlLine.java:366)
at sqlline.SqlLine.main(SqlLine.java:259)
{code}

Query logs and profile attached. 
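The message in this trace is the standard text of `java.util.IllegalFormatConversionException`: a `%d` conversion was applied to a `java.lang.Character` argument. The Java sketch below reproduces that message in isolation. The report does not say where Drill formatted a character with `%d`, and `FormatConversionDemo` is a hypothetical name.

```java
import java.util.IllegalFormatConversionException;

// Hypothetical illustration class; shows what "d != java.lang.Character" means.
class FormatConversionDemo {

    /** Formats value with %d, or returns the exception message if the conversion is illegal. */
    public static String tryFormatAsInteger(Object value) {
        try {
            return String.format("%d", value);
        } catch (IllegalFormatConversionException e) {
            // The message names the conversion character and the argument's class.
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryFormatAsInteger(42));   // 42
        System.out.println(tryFormatAsInteger('x'));  // d != java.lang.Character
    }
}
```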



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4061) Incorrect results returned by window function query.

2015-11-10 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-4061:
-

 Summary: Incorrect results returned by window function query.
 Key: DRILL-4061
 URL: https://issues.apache.org/jira/browse/DRILL-4061
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.3.0
 Environment: 4 node cluster CentOS
Reporter: Khurram Faraaz


Window function query that uses lag function returns incorrect results.
sys.version => 3a73f098
Drill 1.3
Test parquet file is attached here.

{code}
0: jdbc:drill:schema=dfs.tmp> CREATE TABLE testrepro AS SELECT CAST(columns[0] AS INT) col0, CAST(columns[1] AS INT) col1 FROM `testRepro.csv`;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 11                         |
+-----------+----------------------------+
1 row selected (0.542 seconds)
0: jdbc:drill:schema=dfs.tmp> select col1, 1 / (col1 - lag(col1) OVER (ORDER BY col0)) from testrepro;
+-------+---------+
| col1  | EXPR$1  |
+-------+---------+
| 11    | null    |
| 9     | 0       |
| 0     | 0       |
| 10    | 0       |
| 19    | 0       |
| 13    | 0       |
| 17    | 0       |
| -1    | 0       |
| 1     | 0       |
| 20    | 0       |
| 100   | 0       |
+-------+---------+
11 rows selected (0.451 seconds)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-4061) Incorrect results returned by window function query.

2015-11-11 Thread Khurram Faraaz (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Khurram Faraaz resolved DRILL-4061.
---
Resolution: Not A Problem

> Incorrect results returned by window function query.
> 
>
> Key: DRILL-4061
> URL: https://issues.apache.org/jira/browse/DRILL-4061
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.3.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>     Attachments: 0_0_0.parquet
>
>
> Window function query that uses lag function returns incorrect results.
> sys.version => 3a73f098
> Drill 1.3
> Test parquet file is attached here.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> CREATE TABLE testrepro AS SELECT CAST(columns[0] AS INT) col0, CAST(columns[1] AS INT) col1 FROM `testRepro.csv`;
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 0_0       | 11                         |
> +-----------+----------------------------+
> 1 row selected (0.542 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select col1, 1 / (col1 - lag(col1) OVER (ORDER BY col0)) from testrepro;
> +-------+---------+
> | col1  | EXPR$1  |
> +-------+---------+
> | 11    | null    |
> | 9     | 0       |
> | 0     | 0       |
> | 10    | 0       |
> | 19    | 0       |
> | 13    | 0       |
> | 17    | 0       |
> | -1    | 0       |
> | 1     | 0       |
> | 20    | 0       |
> | 100   | 0       |
> +-------+---------+
> 11 rows selected (0.451 seconds)
> {code}
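The issue was resolved as Not A Problem. One explanation consistent with the output is integer division: col1 is cast to INT, so 1 / (col1 - lag(col1)) truncates to 0 whenever the difference is anything other than 1 or -1, and no adjacent pair of col1 values here differs by exactly 1. The Java sketch below recomputes the expression with Java's integer semantics, assuming the rows are in col0 order as displayed. `LagIntegerDivision` is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical illustration class: recomputes 1 / (col1 - lag(col1)) with INT arithmetic.
class LagIntegerDivision {

    /** Assumes col1 is already in ORDER BY col0 order; a zero difference would throw ArithmeticException. */
    public static Integer[] reciprocalOfDelta(int[] col1) {
        Integer[] out = new Integer[col1.length];
        for (int i = 0; i < col1.length; i++) {
            if (i == 0) {
                out[i] = null;               // lag() has no previous row, so the result is NULL
            } else {
                int delta = col1[i] - col1[i - 1];
                out[i] = 1 / delta;          // integer division truncates toward zero
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] col1 = {11, 9, 0, 10, 19, 13, 17, -1, 1, 20, 100};
        // [null, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
        System.out.println(Arrays.toString(reciprocalOfDelta(col1)));
    }
}
```

Using a fractional numerator in the Java version (for example `1.0 / delta`) returns non-zero fractions instead.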



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-3786) Query with window function fails with IllegalFormatConversionException

2015-11-13 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim resolved DRILL-3786.
-
Resolution: Fixed

Fixed in a639c51c7b893e16bd714ef659395a4207a4c5be

> Query with window function fails with IllegalFormatConversionException
> --
>
> Key: DRILL-3786
> URL: https://issues.apache.org/jira/browse/DRILL-3786
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.2.0
> Environment: 10 Performance Nodes
> DRILL_MAX_DIRECT_MEMORY=100g
> DRILL_INIT_HEAP="8g"
> DRILL_MAX_HEAP="8g"
> planner.memory.query_max_memory_per_node bumped up to 20 GB
> TPC-DS SF 1000 dataset (Parquet)
>Reporter: Abhishek Girish
>Assignee: Deneche A. Hakim
>Priority: Critical
> Fix For: 1.4.0
>
> Attachments: drillbit.log.txt, query_profile.json
>
>
> Query fails with Runtime exception:
> {code:sql}
> SELECT sum(s.ss_quantity) OVER (PARTITION BY s.ss_store_sk, s.ss_customer_sk 
> ORDER BY s.ss_store_sk) FROM store_sales s LIMIT 20;
> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: 
> IllegalFormatConversionException: d != java.lang.Character
> Fragment 1:0
> [Error Id: 12b51c0c-4992-4ceb-89c4-c99307529c7e on ucs-node8.perf.lab:31010]
>   at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
>   at 
> sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
>   at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
>   at sqlline.SqlLine.print(SqlLine.java:1583)
>   at sqlline.Commands.execute(Commands.java:852)
>   at sqlline.Commands.sql(Commands.java:751)
>   at sqlline.SqlLine.dispatch(SqlLine.java:738)
>   at sqlline.SqlLine.begin(SqlLine.java:612)
>   at sqlline.SqlLine.start(SqlLine.java:366)
>   at sqlline.SqlLine.main(SqlLine.java:259)
> {code}
> Query logs and profile attached. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (DRILL-3786) Query with window function fails with IllegalFormatConversionException

2016-01-09 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim resolved DRILL-3786.
-
Resolution: Fixed

Fixed by DRILL-4174

> Query with window function fails with IllegalFormatConversionException
> --
>
> Key: DRILL-3786
> URL: https://issues.apache.org/jira/browse/DRILL-3786
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.2.0
> Environment: 10 Performance Nodes
> DRILL_MAX_DIRECT_MEMORY=100g
> DRILL_INIT_HEAP="8g"
> DRILL_MAX_HEAP="8g"
> planner.memory.query_max_memory_per_node bumped up to 20 GB
> TPC-DS SF 1000 dataset (Parquet)
>Reporter: Abhishek Girish
>Assignee: Deneche A. Hakim
>Priority: Critical
> Fix For: 1.4.0
>
> Attachments: drillbit.log.txt, query_profile.json
>
>
> Query fails with Runtime exception:
> {code:sql}
> SELECT sum(s.ss_quantity) OVER (PARTITION BY s.ss_store_sk, s.ss_customer_sk 
> ORDER BY s.ss_store_sk) FROM store_sales s LIMIT 20;
> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: 
> IllegalFormatConversionException: d != java.lang.Character
> Fragment 1:0
> [Error Id: 12b51c0c-4992-4ceb-89c4-c99307529c7e on ucs-node8.perf.lab:31010]
>   at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
>   at 
> sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
>   at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
>   at sqlline.SqlLine.print(SqlLine.java:1583)
>   at sqlline.Commands.execute(Commands.java:852)
>   at sqlline.Commands.sql(Commands.java:751)
>   at sqlline.SqlLine.dispatch(SqlLine.java:738)
>   at sqlline.SqlLine.begin(SqlLine.java:612)
>   at sqlline.SqlLine.start(SqlLine.java:366)
>   at sqlline.SqlLine.main(SqlLine.java:259)
> {code}
> Query logs and profile attached. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4267) Multiple window function operators instead of one

2016-01-12 Thread Deneche A. Hakim (JIRA)
Deneche A. Hakim created DRILL-4267:
---

 Summary: Multiple window function operators instead of one
 Key: DRILL-4267
 URL: https://issues.apache.org/jira/browse/DRILL-4267
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.5.0
Reporter: Deneche A. Hakim
Priority: Minor


Changing the order of window functions in a query changes the number of window 
function operators in the plan.

The following query generates a plan with a single window function operator:
{noformat}
0: jdbc:drill:zk=local> EXPLAIN PLAN FOR SELECT ROW_NUMBER() OVER w, COUNT(*) OVER w FROM cp.`employee.json` WINDOW w AS (PARTITION BY position_id ORDER BY salary);
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(EXPR$0=[$0], EXPR$1=[$1])
00-02        Project(EXPR$0=[$0], EXPR$1=[$1])
00-03          Project($0=[$2], $1=[$3])
00-04            Window(window#0=[window(partition {0} order by [1] rows between UNBOUNDED PRECEDING and CURRENT ROW aggs [ROW_NUMBER(), COUNT()])])
00-05              SelectionVectorRemover
00-06                Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
00-07                  Scan(groupscan=[EasyGroupScan [selectionRoot=classpath:/employee.json, numFiles=1, columns=[`position_id`, `salary`], files=[classpath:/employee.json]]])
{noformat}

But when we swap the order of the window functions in the query, the plan contains 
two window function operators:
{noformat}
0: jdbc:drill:zk=local> EXPLAIN PLAN FOR SELECT COUNT(*) OVER w, ROW_NUMBER() OVER w FROM cp.`employee.json` WINDOW w AS (PARTITION BY position_id ORDER BY salary);
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(EXPR$0=[$0], EXPR$1=[$1])
00-02        Project(EXPR$0=[$0], EXPR$1=[$1])
00-03          Project($0=[$2], $1=[$3])
00-04            Window(window#0=[window(partition {0} order by [1] rows between UNBOUNDED PRECEDING and CURRENT ROW aggs [ROW_NUMBER()])])
00-05              Window(window#0=[window(partition {0} order by [1] range between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT()])])
00-06                SelectionVectorRemover
00-07                  Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
00-08                    Scan(groupscan=[EasyGroupScan [selectionRoot=classpath:/employee.json, numFiles=1, columns=[`position_id`, `salary`], files=[classpath:/employee.json]]])
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (DRILL-4808) CTE query with window function results in AssertionError

2016-07-26 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-4808:
-

 Summary: CTE query with window function results in AssertionError
 Key: DRILL-4808
 URL: https://issues.apache.org/jira/browse/DRILL-4808
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.8.0
Reporter: Khurram Faraaz


The query below, which uses a CTE and window functions, fails with an AssertionError.
The same query over the same data works on Postgres.
MapR Drill 1.8.0 commit ID: 34ca63ba

{noformat}
0: jdbc:drill:schema=dfs.tmp> WITH v1 ( a, b, c, d ) AS
. . . . . . . . . . . . . . > (
. . . . . . . . . . . . . . > SELECT col0, col8, MAX(MIN(col8)) over (partition by col7 order by col8) as max_col8, col7 from `allTypsUniq.parquet` GROUP BY col0,col7,col8
. . . . . . . . . . . . . . > )
. . . . . . . . . . . . . . > select * from ( select a, b, c, d from v1 where c > 'IN' GROUP BY a,b,c,d order by a,b,c,d);
Error: SYSTEM ERROR: AssertionError: Internal error: Type 'RecordType(ANY col0, 
ANY col8, ANY max_col8, ANY col7)' has no field 'a'


[Error Id: 5c058176-741a-42cd-8433-0cd81115776b on centos-01.qa.lab:31010] 
(state=,code=0)
{noformat}

Stack trace from drillbit.log for above failing query

{noformat}
2016-07-26 16:57:04,627 [2868699e-ae56-66f4-9439-8db2132ef265:foreman] INFO  
o.a.drill.exec.work.foreman.Foreman - Query text for query id 
2868699e-ae56-66f4-9439-8db2132ef265: WITH v1 ( a, b, c, d ) AS
(
SELECT col0, col8, MAX(MIN(col8)) over (partition by col7 order by col8) as 
max_col8, col7 from `allTypsUniq.parquet` GROUP BY col0,col7,col8
)
select * from ( select a, b, c, d from v1 where c > 'IN' GROUP BY a,b,c,d order 
by a,b,c,d)
2016-07-26 16:57:04,666 [2868699e-ae56-66f4-9439-8db2132ef265:foreman] ERROR 
o.a.drill.exec.work.foreman.Foreman - SYSTEM ERROR: AssertionError: Internal 
error: Type 'RecordType(ANY col0, ANY col8, ANY max_col8, ANY col7)' has no 
field 'a'


[Error Id: 5c058176-741a-42cd-8433-0cd81115776b on centos-01.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: AssertionError: 
Internal error: Type 'RecordType(ANY col0, ANY col8, ANY max_col8, ANY col7)' 
has no field 'a'


[Error Id: 5c058176-741a-42cd-8433-0cd81115776b on centos-01.qa.lab:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:543)
 ~[drill-common-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at 
org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:791)
 [drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at 
org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:901) 
[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:271) 
[drill-java-exec-1.8.0-SNAPSHOT.jar:1.8.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_101]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_101]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_101]
Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected 
exception during fragment initialization: Internal error: Type 'RecordType(ANY 
col0, ANY col8, ANY max_col8, ANY col7)' has no field 'a'
... 4 common frames omitted
Caused by: java.lang.AssertionError: Internal error: Type 'RecordType(ANY col0, 
ANY col8, ANY max_col8, ANY col7)' has no field 'a'
at org.apache.calcite.util.Util.newInternal(Util.java:777) 
~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
at 
org.apache.calcite.rex.RexBuilder.makeFieldAccess(RexBuilder.java:167) 
~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:3225)
 ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.access$1500(SqlToRelConverter.java:185)
 ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4181)
 ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:3603)
 ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
at org.apache.calcite.sql.SqlIdentifier.accept(SqlIdentifier.java:274) 
~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:4062)
 ~[calcite-core-1.4.0-drill-r14.jar:1.4.0-drill-r14]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectList(SqlToRelCon

Re: Window function query takes too long to complete and return results

2015-06-09 Thread Abdel Hakim Deneche
Please open a JIRA issue, and please provide the test file (compressed) or a
script to generate similar data.

Thanks!

On Tue, Jun 9, 2015 at 6:55 PM, Khurram Faraaz  wrote:

> A query that uses window functions takes too long to complete and return
> results. It returns close to a million records and took 533.8 seconds
> (~9 minutes).
> The input CSV file has two columns, one integer and one varchar.
> Please let me know if this needs to be investigated; I can open a JIRA
> to track it if required.
>
> Size of the input CSV file:
>
> [root@centos-01 ~]# hadoop fs -ls /tmp/manyDuplicates.csv
> -rwxr-xr-x   3 root root   27889455 2015-06-10 01:26 /tmp/manyDuplicates.csv
>
> {code}
> select count(*) over(partition by cast(columns[1] as varchar(25)) order by
> cast(columns[0] as bigint)) from `manyDuplicates.csv`;
> ...
> 1,000,007 rows selected (533.857 seconds)
> {code}
>
> There are five distinct values in columns[1] in the CSV file, i.e. five
> partitions.
>
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select distinct columns[1] from
> `manyDuplicates.csv`;
> +---------+
> | EXPR$0  |
> +---------+
> |         |
> |         |
> |         |
> |         |
> |         |
> +---------+
> 5 rows selected (1.906 seconds)
> {code}
>
> Here is the count for each of those values in columns[1]:
>
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from
> `manyDuplicates.csv` where columns[1] = '';
> +---------+
> | EXPR$0  |
> +---------+
> | 200484  |
> +---------+
> 1 row selected (0.961 seconds)
> {code}
>
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from
> `manyDuplicates.csv` where columns[1] = '';
> +---------+
> | EXPR$0  |
> +---------+
> | 199353  |
> +---------+
> 1 row selected (0.86 seconds)
> {code}
>
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from
> `manyDuplicates.csv` where columns[1] = '';
> +---------+
> | EXPR$0  |
> +---------+
> | 200702  |
> +---------+
> 1 row selected (0.826 seconds)
> {code}
>
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from
> `manyDuplicates.csv` where columns[1] = '';
> +---------+
> | EXPR$0  |
> +---------+
> | 199916  |
> +---------+
> 1 row selected (0.851 seconds)
> {code}
>
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from
> `manyDuplicates.csv` where columns[1] = '';
> +---------+
> | EXPR$0  |
> +---------+
> | 199552  |
> +---------+
> 1 row selected (0.827 seconds)
> {code}
>
> Thanks,
> Khurram
>



-- 

Abdelhakim Deneche

Software Engineer

  





Re: Window function query takes too long to complete and return results

2015-06-09 Thread Steven Phillips
In cases like this, where you are printing millions of records in SQLLINE,
you should pipe the output to /dev/null or to a file, and measure the
performance that way. I'm guessing that most of the time in this case is
spent printing the output to the console, which is really unrelated to
Drill performance. If piping the data to a file or /dev/null makes the
query run much faster, then it probably isn't a real issue.

Also, any time you are investigating a performance-related issue, you should
always check the profile. In this case, I suspect you might see that most
of the time is spent in the WAIT time of the SCREEN operator. That would
indicate that client-side processing is slowing the query down.

On Tue, Jun 9, 2015 at 7:09 PM, Abdel Hakim Deneche 
wrote:

> please open a JIRA issue. please provide the test file (compressed) or a
> script to generate similar data.
>
> Thanks!
>
> On Tue, Jun 9, 2015 at 6:55 PM, Khurram Faraaz 
> wrote:
>
> > A query that uses window functions takes too long to complete and return
> > results. It returns close to a million records and took 533.8 seconds
> > (~9 minutes).
> > The input CSV file has two columns, one integer and one varchar.
> > Please let me know if this needs to be investigated; I can open a JIRA
> > to track it if required.
> >
> > Size of the input CSV file:
> >
> > [root@centos-01 ~]# hadoop fs -ls /tmp/manyDuplicates.csv
> > -rwxr-xr-x   3 root root   27889455 2015-06-10 01:26 /tmp/manyDuplicates.csv
> >
> > {code}
> > select count(*) over(partition by cast(columns[1] as varchar(25)) order by
> > cast(columns[0] as bigint)) from `manyDuplicates.csv`;
> > ...
> > 1,000,007 rows selected (533.857 seconds)
> > {code}
> >
> > There are five distinct values in columns[1] in the CSV file, i.e. five
> > partitions.
> >
> > {code}
> > 0: jdbc:drill:schema=dfs.tmp> select distinct columns[1] from
> > `manyDuplicates.csv`;
> > +---------+
> > | EXPR$0  |
> > +---------+
> > |         |
> > |         |
> > |         |
> > |         |
> > |         |
> > +---------+
> > 5 rows selected (1.906 seconds)
> > {code}
> >
> > Here is the count for each of those values in columns[1]:
> >
> > {code}
> > 0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from
> > `manyDuplicates.csv` where columns[1] = '';
> > +---------+
> > | EXPR$0  |
> > +---------+
> > | 200484  |
> > +---------+
> > 1 row selected (0.961 seconds)
> > {code}
> >
> > {code}
> > 0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from
> > `manyDuplicates.csv` where columns[1] = '';
> > +---------+
> > | EXPR$0  |
> > +---------+
> > | 199353  |
> > +---------+
> > 1 row selected (0.86 seconds)
> > {code}
> >
> > {code}
> > 0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from
> > `manyDuplicates.csv` where columns[1] = '';
> > +---------+
> > | EXPR$0  |
> > +---------+
> > | 200702  |
> > +---------+
> > 1 row selected (0.826 seconds)
> > {code}
> >
> > {code}
> > 0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from
> > `manyDuplicates.csv` where columns[1] = '';
> > +---------+
> > | EXPR$0  |
> > +---------+
> > | 199916  |
> > +---------+
> > 1 row selected (0.851 seconds)
> > {code}
> >
> > {code}
> > 0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from
> > `manyDuplicates.csv` where columns[1] = '';
> > +---------+
> > | EXPR$0  |
> > +---------+
> > | 199552  |
> > +---------+
> > 1 row selected (0.827 seconds)
> > {code}
> >
> > Thanks,
> > Khurram
> >
>
>
>
> --
>
> Abdelhakim Deneche
>
> Software Engineer
>
>   
>
>
>



-- 
 Steven Phillips
 Software Engineer

 mapr.com


Re: Window function query takes too long to complete and return results

2015-06-09 Thread Khurram Faraaz
I opened JIRA DRILL-3269 to track this behavior.
I also iterated over the ResultSet from a JDBC program, only advancing
while there were records, without processing or printing any of the
returned values. It still took close to nine minutes to complete
execution.

Here is a snippet of what I did from JDBC:

String query = "select count(*) over(partition by cast(columns[1] as varchar(25)) "
    + "order by cast(columns[0] as bigint)) from `manyDuplicates.csv`";

ResultSet rs = stmt.executeQuery(query);

// Only advance the cursor; no column values are read.
while (rs.next()) {
    System.out.println("1");
}

On Tue, Jun 9, 2015 at 9:56 PM, Steven Phillips 
wrote:

> In cases like this, where you are printing millions of records in SQLLINE,
> you should pipe the output to /dev/null or to a file and measure the
> performance that way. I'm guessing that most of the time in this case is
> spent printing the output to the console, which is unrelated to
> Drill performance. If piping the data to a file or /dev/null makes the
> query run much faster, then it probably isn't a real issue.
>
> Also, any time you are investigating a performance-related issue, you should
> always check the query profile. In this case, I suspect you might see that most
> of the time is spent in the WAIT time of the SCREEN operator. That would
> indicate that client-side processing is slowing the query down.
>
> On Tue, Jun 9, 2015 at 7:09 PM, Abdel Hakim Deneche
> wrote:
>
> > Please open a JIRA issue, and please provide the test file (compressed) or a
> > script to generate similar data.
> >
> > Thanks!
> >
> > On Tue, Jun 9, 2015 at 6:55 PM, Khurram Faraaz 
> > wrote:
> >
> > > A query that uses window functions takes too long to complete and return
> > > results. It returns close to a million records and took 533.8 seconds
> > > (about 9 minutes).
> > > The input CSV file has two columns: one integer and one varchar.
> > > Please let me know if this needs to be investigated; I can open a JIRA
> > > to track it if required.
> > >
> > > Size of the input CSV file:
> > >
> > > [root@centos-01 ~]# hadoop fs -ls /tmp/manyDuplicates.csv
> > > -rwxr-xr-x   3 root root   27889455 2015-06-10 01:26 /tmp/manyDuplicates.csv
> > >
> > > {code}
> > >
> > > select count(*) over(partition by cast(columns[1] as varchar(25))
> > > order by cast(columns[0] as bigint)) from `manyDuplicates.csv`;
> > >
> > > ...
> > >
> > > 1,000,007 rows selected (533.857 seconds)
> > > {code}
> > >
> > > There are five distinct values in columns[1] in the CSV file, i.e. five
> > > partitions.
> > >
> > > {code}
> > >
> > > 0: jdbc:drill:schema=dfs.tmp> select distinct columns[1] from
> > > `manyDuplicates.csv`;
> > >
> > > +---------+
> > > | EXPR$0  |
> > > +---------+
> > > |         |
> > > |         |
> > > |         |
> > > |         |
> > > |         |
> > > +---------+
> > >
> > > 5 rows selected (1.906 seconds)
> > > {code}
> > >
> > > Here is the count for each of those values in columns[1]
> > >
> > > {code}
> > >
> > > 0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from
> > > `manyDuplicates.csv` where columns[1] = '';
> > >
> > > +---------+
> > > | EXPR$0  |
> > > +---------+
> > > | 200484  |
> > > +---------+
> > >
> > > 1 row selected (0.961 seconds)
> > >
> > > {code}
> > >
> > >
> > > {code}
> > >
> > > 0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from
> > > `manyDuplicates.csv` where columns[1] = '';
> > >
> > > +---------+
> > > | EXPR$0  |
> > > +---------+
> > > | 199353  |
> > > +---------+
> > >
> > > 1 row selected (0.86 seconds)
> > >
> > > {code}
> > >
> > >
> > > {code}
> > >
> > > 0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from
> > > `manyDuplicates.csv` where columns[1] = '';
> > >
> > > +---------+
> > > | EXPR$0  |
> > > +---------+
> > > | 200702  |
> > > +---------+
> > >
> > > 1 row selected (0.826 seconds)
> > >
> > > {code}
> > >
> > >
> > > {code}
> > >
> > > 0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from
> > > `manyDuplicates.csv` where columns[1] = '';
> > >
> > > +---------+
> > > | EXPR$0  |
> > > +---------+
> > > | 199916  |
> > > +---------+
> > >
> > > 1 row selected (0.851 seconds)
> > >
> > > {code}
> > >
> > >
> > > {code}
> > >
> > > 0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from
> > > `manyDuplicates.csv` where columns[1] = '';
> > >
> > > +---------+
> > > | EXPR$0  |
> > > +---------+
> > > | 199552  |
> > > +---------+
> > >
> > > 1 

Re: Window function query takes too long to complete and return results

2015-06-10 Thread Abdel Hakim Deneche
I tried the query using the new implementation (DRILL-3200), and it's much
faster: 14 seconds compared to 523 seconds with the current
implementation. I haven't verified the results, though.
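This message does not describe how DRILL-3200 evaluates the query. As a hedged illustration only (not Drill's code), the sketch below computes count(*) over(partition by p order by o) in a single pass, assuming the rows already arrive sorted by (partition key, order key). With an ORDER BY and no explicit frame, the SQL default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows that tie on the order key (peers) all receive the count through the last peer. The class, field and method names are hypothetical, and the casts in the original query are not modeled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Hypothetical sketch of count(*) OVER (PARTITION BY p ORDER BY o) in one pass. */
final class RunningCountSketch {

  /** One input row: the PARTITION BY key and the ORDER BY key. */
  static final class Row {
    final String partitionKey;
    final long orderKey;

    Row(String partitionKey, long orderKey) {
      this.partitionKey = partitionKey;
      this.orderKey = orderKey;
    }
  }

  private RunningCountSketch() {}

  /**
   * Expects rows sorted by (partitionKey, orderKey). Uses the default frame
   * RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so all peers (rows with
   * the same order key in the same partition) receive the same count.
   */
  static long[] runningCount(List<Row> sorted) {
    long[] out = new long[sorted.size()];
    String currentPartition = null;
    long count = 0;
    int i = 0;
    while (i < sorted.size()) {
      Row first = sorted.get(i);
      if (i == 0 || !Objects.equals(first.partitionKey, currentPartition)) {
        currentPartition = first.partitionKey; // new partition: restart the count
        count = 0;
      }
      // Advance j past every peer of `first` within the current partition.
      int j = i;
      while (j < sorted.size()
          && Objects.equals(sorted.get(j).partitionKey, currentPartition)
          && sorted.get(j).orderKey == first.orderKey) {
        j++;
      }
      count += j - i;                // the whole peer group enters the frame at once
      Arrays.fill(out, i, j, count); // every peer sees the same count
      i = j;
    }
    return out;
  }
}
```

Each row is visited a constant number of times, so the work is linear in the input once it is sorted.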

On Tue, Jun 9, 2015 at 11:30 PM, Khurram Faraaz 
wrote:

> JIRA 3269 is opened to track this behavior.
> I tried to iterate over the ResultSet from a JDBC program, I only iterated
> over the results until there were records, no results were
> processed/printed. It still took close to nine minutes to complete
> execution.

Re: Window function query takes too long to complete and return results

2015-06-10 Thread Khurram Faraaz
Great! I will re-run the query on the latest build and also verify the results
against Postgres.

On Wed, Jun 10, 2015 at 9:27 AM, Abdel Hakim Deneche 
wrote:

> I tried the query using the new implementation (DRILL-3200) and it's much
> more faster: 14 seconds compared to 523 seconds using the current
> implementation. I didn't check the results though.
>
> On Tue, Jun 9, 2015 at 11:30 PM, Khurram Faraaz 
> wrote:
>
> > JIRA 3269 is opened to track this behavior.
> > I tried to iterate over the ResultSet from a JDBC program, I only
> iterated
> > over the results until there were records, no results were
> > processed/printed. It still took close to nine minutes to complete
> > execution.
> >
> > Here is a snippet of what I did from JDBC.
> >
> > String query = "select count(*) over(partition by cast(columns[1] as
> > varchar(25)) order by cast(columns[0] as bigint)) from
> > `manyDuplicates.csv`"
> > ;
> >
> >
> >
> > ResultSet rs = stmt.executeQuery(query);
> >
> >
> > while (rs.next()) {
> >
> > System.out.println("1");
> >
> > }
> >
> > On Tue, Jun 9, 2015 at 9:56 PM, Steven Phillips 
> > wrote:
> >
> > > In cases like this where you are printing millions of record in
> SQLLINE,
> > > you should pipe the output to /dev/null or to a file, and measure the
> > > performance that way. I'm guessing that most of the time in this case
> is
> > > spent printing the output to the console, and thus really unrelated to
> > > Drill performance. If piping the data to a file or /dev/null causes the
> > > query to run much faster, than it probably isn't a real issue.
> > >
> > > also, anytime you are investigating a performance related issue, you
> > should
> > > always check the profile. In this case, I suspect you might see that
> most
> > > of the time is spent in the WAIT time of the SCREEN operator. That
> would
> > > indicate that client side processing is slowing the query down.
> > >
> > > On Tue, Jun 9, 2015 at 7:09 PM, Abdel Hakim Deneche <
> > adene...@maprtech.com
> > > >
> > > wrote:
> > >
> > > > please open a JIRA issue. please provide the test file (compressed)
> or
> > a
> > > > script to generate similar data.
> > > >
> > > > Thanks!
> > > >
> > > > On Tue, Jun 9, 2015 at 6:55 PM, Khurram Faraaz  >
> > > > wrote:
> > > >
> > > > > Query that uses window functions takes too long to complete and
> > return
> > > > > results. It returns close to a million records, for which it took
> > 533.8
> > > > > seconds ~8 minutes
> > > > > Input CSV file has two columns, one integer and another varchar
> type
> > > > > column. Please let me know if this needs to be investigated and I
> can
> > > > > report a JIRA to track this if required ?
> > > > >
> > > > > Size of the input CSV file
> > > > >
> > > > > root@centos-01 ~]# hadoop fs -ls /tmp/manyDuplicates.csv
> > > > >
> > > > > -rwxr-xr-x   3 root root   27889455 2015-06-10 01:26
> > > > > /tmp/manyDuplicates.csv
> > > > >
> > > > > {code}
> > > > >
> > > > > select count(*) over(partition by cast(columns[1] as varchar(25))
> > order
> > > > by
> > > > > cast(columns[0] as bigint)) from `manyDuplicates.csv`;
> > > > >
> > > > > ...
> > > > >
> > > > > 1,000,007 rows selected (533.857 seconds)
> > > > > {code}
> > > > >
> > > > > There are five distinct values in columns[1] in the CSV file. =
> [FIVE
> > > > > PARTITIONS]
> > > > >
> > > > > {code}
> > > > >
> > > > > 0: jdbc:drill:schema=dfs.tmp> select distinct columns[1] from
> > > > > `manyDuplicates.csv`;
> > > > >
> > > > > *+---+*
> > > > >
> > > > > *| **   EXPR$0** |*
> > > > >
> > > > > *+---+*
> > > > >
> > > > > *| * * |*
> > > > >
> > > > > *| * * |*
> > > > >
> > > > > *| * * |*
> > > > >
> > > > > *| * * |*
> > > > >
> > > > > *| * * |*
> > > > >
> > > > > *+---+*
> > > > >
> > > > > 5 rows selected (1.906 seconds)
> > > > > {code}
> > > > >
> > > > > Here is the count for each of those values in columns[1]
> > > > >
> > > > > {code}
> > > > >
> > > > > 0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from
> > > > > `manyDuplicates.csv` where columns[1] = '';
> > > > >
> > > > > *+-+*
> > > > >
> > > > > *| **EXPR$0 ** |*
> > > > >
> > > > > *+-+*
> > > > >
> > > > > *| *200484 * |*
> > > > >
> > > > > *+-+*
> > > > >
> > > > > 1 row selected (0.961 seconds)
> > > > >
> > > > > {code}
> > > > >
> > > > >
> > > > > {code}
> > > > >
> > > > > 0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from
> > > > > `manyDuplicates.csv` where columns[1] = '';
> > > > >
> > > > > *+-+*
> > > > >
> > > > > *| **EXPR$0 ** |*
> > > > >
> > > > > *+-+*
> > > > >
> > > > > *| *199353 * |*
> > > > >
> > > > > *+-+*
> > > > >
> > > > > 1 row selected (0.86 seconds)
> > > > >
> > 

[jira] [Created] (DRILL-3307) Query with window function runs out of memory

2015-06-17 Thread Abhishek Girish (JIRA)
Abhishek Girish created DRILL-3307:
--

 Summary: Query with window function runs out of memory
 Key: DRILL-3307
 URL: https://issues.apache.org/jira/browse/DRILL-3307
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.1.0
 Environment: Data set: TPC-DS SF 100 Parquet
Number of Nodes: 4
Reporter: Abhishek Girish
Assignee: Deneche A. Hakim
 Attachments: drillbit.log.txt

Query with window function runs out of memory:
{code:sql}
 SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk) AS 
TotalSpend FROM store_sales ss ORDER BY 1 LIMIT 20;
java.lang.RuntimeException: java.sql.SQLException: RESOURCE ERROR: One or more 
nodes ran out of memory while executing the query.

Fragment 3:0

[Error Id: 9af19064-9175-46a4-b557-714d1c77cd76 on abhi6.qa.lab:31010]
at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
at 
sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:85)
at sqlline.TableOutputFormat.print(TableOutputFormat.java:116)
at sqlline.SqlLine.print(SqlLine.java:1583)
at sqlline.Commands.execute(Commands.java:852)
at sqlline.Commands.sql(Commands.java:751)
at sqlline.SqlLine.dispatch(SqlLine.java:738)
at sqlline.SqlLine.begin(SqlLine.java:612)
at sqlline.SqlLine.start(SqlLine.java:366)
at sqlline.SqlLine.main(SqlLine.java:259)
{code}

Plan:
{code}
00-00Screen : rowType = RecordType(ANY TotalSpend): rowcount = 
2.87997024E8, cumulative cost = {4.3487550824E9 rows, 5.7539970079068695E10 
cpu, 0.0 io, 7.077814861824E12 network, 4.607952384E9 memory}, id = 142297
00-01  SelectionVectorRemover : rowType = RecordType(ANY TotalSpend): 
rowcount = 2.87997024E8, cumulative cost = {4.31995538E9 rows, 
5.751117037666869E10 cpu, 0.0 io, 7.077814861824E12 network, 4.607952384E9 
memory}, id = 142296
00-02Limit(fetch=[20]) : rowType = RecordType(ANY TotalSpend): rowcount 
= 2.87997024E8, cumulative cost = {4.031958356E9 rows, 5.722317335266869E10 
cpu, 0.0 io, 7.077814861824E12 network, 4.607952384E9 memory}, id = 142295
00-03  SingleMergeExchange(sort0=[0 ASC]) : rowType = RecordType(ANY 
TotalSpend): rowcount = 2.87997024E8, cumulative cost = {4.031958336E9 rows, 
5.722317327266869E10 cpu, 0.0 io, 7.077814861824E12 network, 4.607952384E9 
memory}, id = 142294
01-01SelectionVectorRemover : rowType = RecordType(ANY TotalSpend): 
rowcount = 2.87997024E8, cumulative cost = {3.743961312E9 rows, 
5.261522088866869E10 cpu, 0.0 io, 5.89817905152E12 network, 4.607952384E9 
memory}, id = 142293
01-02  TopN(limit=[20]) : rowType = RecordType(ANY TotalSpend): 
rowcount = 2.87997024E8, cumulative cost = {3.455964288E9 rows, 
5.232722386466869E10 cpu, 0.0 io, 5.89817905152E12 network, 4.607952384E9 
memory}, id = 142292
01-03Project(TotalSpend=[$0]) : rowType = RecordType(ANY 
TotalSpend): rowcount = 2.87997024E8, cumulative cost = {3.167967264E9 rows, 
4.734841414759049E10 cpu, 0.0 io, 5.89817905152E12 network, 4.607952384E9 
memory}, id = 142291
01-04  HashToRandomExchange(dist0=[[$0]]) : rowType = 
RecordType(ANY TotalSpend, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 
2.87997024E8, cumulative cost = {3.167967264E9 rows, 4.734841414759049E10 cpu, 
0.0 io, 5.89817905152E12 network, 4.607952384E9 memory}, id = 142290
02-01UnorderedMuxExchange : rowType = RecordType(ANY 
TotalSpend, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 2.87997024E8, cumulative 
cost = {2.87997024E9 rows, 4.274046176359049E10 cpu, 0.0 io, 3.538907430912E12 
network, 4.607952384E9 memory}, id = 142289
03-01  Project(TotalSpend=[$0], 
E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))]) : rowType = 
RecordType(ANY TotalSpend, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 
2.87997024E8, cumulative cost = {2.591973216E9 rows, 4.245246473959049E10 cpu, 
0.0 io, 3.538907430912E12 network, 4.607952384E9 memory}, id = 142288
03-02Project(TotalSpend=[CASE(>($2, 0), CAST($3):ANY, 
null)]) : rowType = RecordType(ANY TotalSpend): rowcount = 2.87997024E8, 
cumulative cost = {2.303976192E9 rows, 4.130047664359049E10 cpu, 0.0 io, 
3.538907430912E12 network, 4.607952384E9 memory}, id = 142287
03-03  Window(window#0=[window(partition {1} order by 
[] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [COUNT($0), 
$SUM0($0)])]) : rowType = RecordType(ANY ss_net_paid_inc_tax, ANY ss_store_sk, 
BIGINT w0$o0, ANY w0$o1): rowcount = 2.87997024E8, cumulative cost = 
{2.015979168E9 rows, 4.014848854759049E10 cpu, 0.0 io, 3.538907430912E12 
network, 4.607952384E9 memory}, id = 142286
03-04SelectionVectorRemover : rowType = 
RecordType(ANY ss_net_paid_

[jira] [Created] (DRILL-3337) Queries with Window Function DENSE_RANK fail with SchemaChangeException

2015-06-22 Thread Abhishek Girish (JIRA)
Abhishek Girish created DRILL-3337:
--

 Summary: Queries with Window Function DENSE_RANK fail with 
SchemaChangeException
 Key: DRILL-3337
 URL: https://issues.apache.org/jira/browse/DRILL-3337
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.1.0
Reporter: Abhishek Girish
Assignee: Aman Sinha


Example queries which result in exceptions:

DENSE_RANK window function with ORDER BY on one column, plus GROUP BY and ORDER BY in the main query:
{code:sql}
SELECT DENSE_RANK() OVER (ORDER BY ss.ss_store_sk) FROM store_sales ss GROUP BY 
ss.ss_store_sk, ss.ss_net_paid_inc_tax ORDER BY 1 LIMIT 20;
Error: SYSTEM ERROR: org.apache.drill.exec.exception.SchemaChangeException: 
Failure while materializing expression. 
Error in expression at index 0.  Error: Missing function implementation: 
[dense_rank(INT-REQUIRED)].  Full expression: null.
Fragment 4:10
[Error Id: 4b9187db-e770-4e7f-afe4-0d4dfc045088 on abhi6.qa.lab:31010] 
(state=,code=0)
{code}

DENSE_RANK window function with PARTITION BY on two columns and ORDER BY on two
columns, plus GROUP BY and ORDER BY in the main query:
{code:sql}
SELECT DENSE_RANK() OVER (PARTITION BY s.ss_store_sk, s.ss_customer_sk ORDER BY 
s.ss_store_sk, s.ss_customer_sk) FROM store_sales s GROUP BY s.ss_store_sk, 
s.ss_customer_sk, s.ss_quantity ORDER BY 1  LIMIT 20;
Error: SYSTEM ERROR: org.apache.drill.exec.exception.SchemaChangeException: 
Failure while materializing expression. 
Error in expression at index 0.  Error: Missing function implementation: 
[dense_rank(INT-REQUIRED)].  Full expression: null.
Fragment 5:22
[Error Id: 3ac6e4ce-5bb3-4058-b806-3d0becbbd0d1 on abhi6.qa.lab:31010] 
(state=,code=0)
{code}


*The queries execute fine on Postgres*
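DENSE_RANK() takes no arguments, yet both errors report a missing dense_rank(INT-REQUIRED) implementation. For reference, here is a minimal sketch of what the function should compute, next to RANK() for contrast, assuming the ORDER BY keys of one partition are already sorted ascending. The class and method names are hypothetical; this is not Drill's implementation.

```java
/** Hypothetical sketch of RANK() and DENSE_RANK() over one sorted partition. */
final class RankSketch {

  private RankSketch() {}

  /** DENSE_RANK(): ties share a rank, and the next distinct key gets rank + 1 (no gaps). */
  static long[] denseRank(long[] sortedKeys) {
    long[] ranks = new long[sortedKeys.length];
    long rank = 0;
    for (int i = 0; i < sortedKeys.length; i++) {
      if (i == 0 || sortedKeys[i] != sortedKeys[i - 1]) {
        rank++; // a new distinct key starts the next dense rank
      }
      ranks[i] = rank;
    }
    return ranks;
  }

  /** RANK(): ties share a rank, and the next distinct key gets its 1-based position (gaps). */
  static long[] rank(long[] sortedKeys) {
    long[] ranks = new long[sortedKeys.length];
    for (int i = 0; i < sortedKeys.length; i++) {
      ranks[i] = (i == 0 || sortedKeys[i] != sortedKeys[i - 1]) ? i + 1 : ranks[i - 1];
    }
    return ranks;
  }
}
```

For keys 10, 10, 20, 30 the two functions give 1, 1, 2, 3 and 1, 1, 3, 4 respectively.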



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Review Request 36278: DRILL-3189: Disable DISALLOW PARTIAL in window function grammar

2015-07-07 Thread Sean Hsuan-Yi Chu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36278/
---

Review request for drill, Aman Sinha and Jinfeng Ni.


Bugs: DRILL-3189
https://issues.apache.org/jira/browse/DRILL-3189


Repository: drill-git


Description
---

Disable DISALLOW PARTIAL in the OVER clause.


Diffs
-

  
  exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/UnsupportedOperatorsVisitor.java 9bbd537
  exec/java-exec/src/test/java/org/apache/drill/exec/TestWindowFunctions.java 8676c28

Diff: https://reviews.apache.org/r/36278/diff/


Testing
---

All requested


Thanks,

Sean Hsuan-Yi Chu



[jira] [Created] (DRILL-3647) Handle null as input to window function NTILE

2015-08-14 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-3647:
-

 Summary: Handle null as input to window function NTILE 
 Key: DRILL-3647
 URL: https://issues.apache.org/jira/browse/DRILL-3647
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.2.0
 Environment: private-branch 
https://github.com/adeneche/incubator-drill/tree/new-window-funcs
Reporter: Khurram Faraaz
Assignee: Chris Westin


Window functions need to handle NULL as input. The NTILE function must return
NULL when its input is NULL.

{code}
0: jdbc:drill:schema=dfs.tmp> select col7 , col0 , ntile(null) over(partition 
by col7 order by col0) lead_col0 from FEWRWSPQQ_101;
Error: PARSE ERROR: From line 1, column 22 to line 1, column 37: Argument to 
function 'NTILE' must not be NULL


[Error Id: e5e69582-8502-4a99-8ba1-dffdfb8ac028 on centos-04.qa.lab:31010] 
(state=,code=0)
{code}

{code}
0: jdbc:drill:schema=dfs.tmp> select col7 , col0 , lead(null) over(partition by 
col7 order by col0) lead_col0 from FEWRWSPQQ_101;
Error: PARSE ERROR: From line 1, column 27 to line 1, column 30: Illegal use of 
'NULL'


[Error Id: 6824ca01-e3f1-4338-b4c8-5535e7a42e13 on centos-04.qa.lab:31010] 
(state=,code=0)
{code}
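For context on the requested behavior, here is a minimal sketch of how NTILE(n) distributes the rows of one partition (bucket sizes differ by at most one, with the larger buckets first) and returns NULL when n is NULL, as this report asks. It is a hypothetical helper, not Drill code; as the errors above show, Drill currently rejects ntile(null) at parse time.

```java
/** Hypothetical sketch of NTILE(buckets) for one partition, with NULL in, NULL out. */
final class NtileSketch {

  private NtileSketch() {}

  /**
   * Returns the 1-based bucket for the row at 0-based position rowIndex in a
   * partition of partitionSize rows, or null when buckets is null.
   */
  static Long ntile(Long buckets, long rowIndex, long partitionSize) {
    if (buckets == null) {
      return null; // the behavior requested in DRILL-3647
    }
    if (buckets <= 0) {
      throw new IllegalArgumentException("NTILE argument must be positive");
    }
    if (rowIndex < 0 || rowIndex >= partitionSize) {
      throw new IllegalArgumentException("row index out of range");
    }
    long base = partitionSize / buckets;  // minimum rows per bucket
    long extra = partitionSize % buckets; // this many leading buckets get one extra row
    long bigRows = extra * (base + 1);    // rows covered by the larger leading buckets
    if (rowIndex < bigRows) {
      return rowIndex / (base + 1) + 1;
    }
    return extra + (rowIndex - bigRows) / base + 1;
  }
}
```

With 10 rows and NTILE(3), the buckets hold 4, 3 and 3 rows.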






[GitHub] drill pull request: DRILL-3786: Query with window function fails w...

2015-11-04 Thread adeneche
GitHub user adeneche opened a pull request:

https://github.com/apache/drill/pull/239

DRILL-3786: Query with window function fails with IllegalFormatConversionException

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/adeneche/incubator-drill DRILL-3786

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/239.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #239


commit c166a4eb977537fd377bfc5a0c81617df9bc739c
Author: adeneche 
Date:   2015-09-21T21:32:25Z

DRILL-3786: Query with window function fails with 
IllegalFormatConversionException




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request: DRILL-3786: Query with window function fails w...

2015-11-13 Thread parthchandra
Github user parthchandra commented on the pull request:

https://github.com/apache/drill/pull/239#issuecomment-156506328
  
+1. Looks good.




[GitHub] drill pull request: DRILL-3786: Query with window function fails w...

2015-11-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/drill/pull/239




[jira] [Resolved] (DRILL-4148) NullPointerException in planning when running window function query

2016-02-01 Thread Sean Hsuan-Yi Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Hsuan-Yi Chu resolved DRILL-4148.
--
Resolution: Cannot Reproduce

> NullPointerException in planning when running window function query
> ---
>
> Key: DRILL-4148
> URL: https://issues.apache.org/jira/browse/DRILL-4148
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.4.0
> Environment: 4 node cluster CentOS
>Reporter: Khurram Faraaz
>Assignee: Sean Hsuan-Yi Chu
>
> Failing test case : Functional/window_functions/lag_func/lag_Fn_4.q
> Tests were run on JDK8 with assertions enabled.
> {noformat}
> [root@centos-01 lag_func]# java -version
> openjdk version "1.8.0_65"
> OpenJDK Runtime Environment (build 1.8.0_65-b17)
> OpenJDK 64-Bit Server VM (build 25.65-b01, mixed mode)
> [root@centos-01 lag_func]# uname -a
> Linux centos-01 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 
> x86_64 x86_64 x86_64 GNU/Linux
> {noformat}
> Stack trace from drillbit.log
> {noformat}
> 2015-12-01 03:31:50,133 [29a2eb59-d5a3-f0f2-d2df-9ad4e5d17109:foreman] ERROR 
> o.a.drill.exec.work.foreman.Foreman - SYSTEM ERROR: NullPointerException
> [Error Id: ab046c45-4a2d-428d-8c72-592a02ea53e5 on centos-02.qa.lab:31010]
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> NullPointerException
> [Error Id: ab046c45-4a2d-428d-8c72-592a02ea53e5 on centos-02.qa.lab:31010]
> at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:534)
>  ~[drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:742)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:841)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:786)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:73) 
> [drill-common-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState(Foreman.java:788)
>  [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:894) 
> [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:255) 
> [drill-java-exec-1.4.0-SNAPSHOT.jar:1.4.0-SNAPSHOT]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_65]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_65]
> at java.lang.Thread.run(Thread.java:745) [na:1.8.0_65]
> Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected 
> exception during fragment initialization: null
> ... 4 common frames omitted
> Caused by: java.lang.NullPointerException: null
> at 
> com.google.common.base.Preconditions.checkNotNull(Preconditions.java:191) 
> ~[guava-14.0.1.jar:na]
> at 
> org.apache.calcite.rel.metadata.CachingRelMetadataProvider$CachingInvocationHandler.(CachingRelMetadataProvider.java:104)
>  ~[calcite-core-1.4.0-drill-r9.jar:1.4.0-drill-r9]
> at 
> org.apache.calcite.rel.metadata.CachingRelMetadataProvider$2.apply(CachingRelMetadataProvider.java:78)
>  ~[calcite-core-1.4.0-drill-r9.jar:1.4.0-drill-r9]
> at 
> org.apache.calcite.rel.metadata.CachingRelMetadataProvider$2.apply(CachingRelMetadataProvider.java:75)
>  ~[calcite-core-1.4.0-drill-r9.jar:1.4.0-drill-r9]
> at 
> org.apache.calcite.plan.hep.HepRelMetadataProvider$1.apply(HepRelMetadataProvider.java:45)
>  ~[calcite-core-1.4.0-drill-r9.jar:1.4.0-drill-r9]
> at 
> org.apache.calcite.plan.hep.HepRelMetadataProvider$1.apply(HepRelMetadataProvider.java:34)
>  ~[calcite-core-1.4.0-drill-r9.jar:1.4.0-drill-r9]
> at 
> org.apache.calcite.rel.metadata.CachingRelMetadataProvider$2.apply(CachingRelMetadataProvider.java:77)
>  ~[calcite-core-1.4.0-drill-r9.jar:1.4.0-drill-r9]
> at 
> org.apache.calcite.rel.metadata.CachingRelMetadataProvider$2.apply(CachingRelMetadataProvider.java:75)
>  ~[calcite-core-1.4.0-drill-r9.jar:1.4.0-drill-r9]
> 

[jira] [Created] (DRILL-3182) Window function with DISTINCT qualifier returns seemingly incorrect result

2015-05-26 Thread Victoria Markman (JIRA)
Victoria Markman created DRILL-3182:
---

 Summary: Window function with DISTINCT qualifier returns seemingly 
incorrect result
 Key: DRILL-3182
 URL: https://issues.apache.org/jira/browse/DRILL-3182
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill
Affects Versions: 1.0.0
Reporter: Victoria Markman
Assignee: Daniel Barclay (Drill)


Both count(distinct ) and count(all ) return the same result.
This does not look correct to me, and I'm not sure what the correct behavior
should be.

(1) The latest Postgres does not support DISTINCT with window functions:

postgres=# select a2, count(distinct b2) over(partition by a2) from t2;
ERROR:  DISTINCT is not implemented for window functions
LINE 1: select a2, count(distinct b2) over(partition by a2) from t2;
   ^
postgres=# select a2, avg(distinct a2) over(partition by a2) from t2;
ERROR:  DISTINCT is not implemented for window functions
LINE 1: select a2, avg(distinct a2) over(partition by a2) from t2;

(2) Calcite does not support this either:  
https://github.com/apache/incubator-calcite/blob/master/doc/reference.md

Do we support it? If not, I think we should throw an error.

{code}
0: jdbc:drill:schema=dfs> select * from t2;
+-----+--------+-------------+
| a2  |   b2   |     c2      |
+-----+--------+-------------+
| 0   | zzz    | 2014-12-31  |
| 1   | a      | 2015-01-01  |
| 2   | b      | 2015-01-02  |
| 2   | b      | 2015-01-02  |
| 2   | b      | 2015-01-02  |
| 3   | c      | 2015-01-03  |
| 4   | d      | 2015-01-04  |
| 5   | e      | 2015-01-05  |
| 6   | f      | 2015-01-06  |
| 7   | g      | 2015-01-07  |
| 7   | g      | 2015-01-07  |
| 8   | h      | 2015-01-08  |
| 9   | i      | 2015-01-09  |
+-----+--------+-------------+
13 rows selected (0.134 seconds)

0: jdbc:drill:schema=dfs> select a2, count(distinct b2) over(partition by a2) 
from t2;
+-----+---------+
| a2  | EXPR$1  |
+-----+---------+
| 0   | 1       |
| 1   | 1       |
| 2   | 3       |
| 2   | 3       |
| 2   | 3       |
| 3   | 1       |
| 4   | 1       |
| 5   | 1       |
| 6   | 1       |
| 7   | 2       |
| 7   | 2       |
| 8   | 1       |
| 9   | 1       |
+-----+---------+
13 rows selected (0.224 seconds)

0: jdbc:drill:schema=dfs> select a2, count(b2) over(partition by a2) from t2;
+-----+---------+
| a2  | EXPR$1  |
+-----+---------+
| 0   | 1       |
| 1   | 1       |
| 2   | 3       |
| 2   | 3       |
| 2   | 3       |
| 3   | 1       |
| 4   | 1       |
| 5   | 1       |
| 6   | 1       |
| 7   | 2       |
| 7   | 2       |
| 8   | 1       |
| 9   | 1       |
+-----+---------+
13 rows selected (0.219 seconds)
{code}
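Neither Postgres nor Calcite defines DISTINCT for window aggregates (as noted above), so the following is only an assumption about what a per-partition COUNT(DISTINCT b2) would mean: the ordinary COUNT(DISTINCT) aggregate, computed per partition and repeated on every row. Under that reading each a2 partition of t2 contains a single distinct b2 value, so the expected column would be all 1s, while the plain COUNT reproduces what Drill returned for both queries. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: COUNT vs COUNT(DISTINCT) per partition, repeated on every row. */
final class PartitionCountSketch {

  private PartitionCountSketch() {}

  /** COUNT(value) OVER (PARTITION BY key): non-null values in the row's partition. */
  static long[] countOver(int[] keys, String[] values) {
    Map<Integer, Long> counts = new HashMap<>();
    for (int i = 0; i < keys.length; i++) {
      if (values[i] != null) {
        counts.merge(keys[i], 1L, Long::sum);
      }
    }
    long[] out = new long[keys.length];
    for (int i = 0; i < keys.length; i++) {
      out[i] = counts.getOrDefault(keys[i], 0L);
    }
    return out;
  }

  /** COUNT(DISTINCT value) OVER (PARTITION BY key), assuming the usual aggregate meaning. */
  static long[] countDistinctOver(int[] keys, String[] values) {
    Map<Integer, Set<String>> distinct = new HashMap<>();
    for (int i = 0; i < keys.length; i++) {
      Set<String> seen = distinct.computeIfAbsent(keys[i], k -> new HashSet<>());
      if (values[i] != null) {
        seen.add(values[i]); // duplicates within a partition collapse here
      }
    }
    long[] out = new long[keys.length];
    for (int i = 0; i < keys.length; i++) {
      out[i] = distinct.get(keys[i]).size();
    }
    return out;
  }
}
```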





[jira] [Created] (DRILL-3189) Disable ALLOW PARTIAL/DISALLOW PARTIAL in window function grammar

2015-05-26 Thread Victoria Markman (JIRA)
Victoria Markman created DRILL-3189:
---

 Summary: Disable ALLOW PARTIAL/DISALLOW PARTIAL in window function 
grammar
 Key: DRILL-3189
 URL: https://issues.apache.org/jira/browse/DRILL-3189
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.0.0
Reporter: Victoria Markman
Assignee: Jinfeng Ni
Priority: Critical


This does not seem to be implemented on the Drill side; it looks like
Calcite-specific grammar, and I don't see it in the SQL standard.

The results look wrong:
{code}
0: jdbc:drill:schema=dfs> select a2, sum(a2) over(partition by a2 order by a2 
rows between 1 preceding and 1 following disallow partial) from t2 order by a2;
+-----+---------+
| a2  | EXPR$1  |
+-----+---------+
| 0   | null    |
| 1   | null    |
| 2   | 6       |
| 2   | 6       |
| 2   | 6       |
| 3   | null    |
| 4   | null    |
| 5   | null    |
| 6   | null    |
| 7   | 14      |
| 7   | 14      |
| 8   | null    |
| 9   | null    |
+-----+---------+
13 rows selected (0.213 seconds)
{code}

{code}
0: jdbc:drill:schema=dfs> select a2, sum(a2) over(partition by a2 order by a2 
rows between 1 preceding and 1 following allow partial) from t2 order by a2;
+-+-+
| a2  | EXPR$1  |
+-+-+
| 0   | 0   |
| 1   | 1   |
| 2   | 6   |
| 2   | 6   |
| 2   | 6   |
| 3   | 3   |
| 4   | 4   |
| 5   | 5   |
| 6   | 6   |
| 7   | 14  |
| 7   | 14  |
| 8   | 8   |
| 9   | 9   |
+-+-+
13 rows selected (0.208 seconds)
{code}

{code}
0: jdbc:drill:schema=dfs> select a2, sum(a2) over(partition by a2 order by a2 
disallow partial) from t2 order by a2;
Error: PARSE ERROR: From line 1, column 53 to line 1, column 68: Cannot use 
DISALLOW PARTIAL with window based on RANGE
[Error Id: 984c4b81-9eb0-401d-b36a-9580640b4a78 on atsqa4-133.qa.lab:31010] 
(state=,code=0)
{code}
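For reference, here is a minimal Java sketch of what standard SQL ROWS-frame semantics (with no ALLOW/DISALLOW PARTIAL modifier) would produce for `sum(a2) over(partition by a2 order by a2 rows between 1 preceding and 1 following)` over the `a2` values of t2 listed above. The class and method names are hypothetical and this is not Drill's implementation; frames at partition edges are simply truncated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not Drill code: evaluates
//   sum(a2) over(partition by a2 order by a2 rows between 1 preceding and 1 following)
// with standard SQL ROWS-frame semantics (no ALLOW/DISALLOW PARTIAL modifier).
class RowsFrameSum {

    /** Sums each row's frame [i - preceding, i + following], clipped to the partition bounds. */
    static long[] frameSums(List<Integer> partition, int preceding, int following) {
        int n = partition.size();
        long[] out = new long[n];
        for (int i = 0; i < n; i++) {
            int lo = Math.max(0, i - preceding);
            int hi = Math.min(n - 1, i + following);
            long sum = 0;
            for (int j = lo; j <= hi; j++) {
                sum += partition.get(j);
            }
            out[i] = sum;
        }
        return out;
    }

    /** Partitions the a2 column by value (PARTITION BY a2) and applies a 1-preceding/1-following frame. */
    static Map<Integer, long[]> byPartition(int[] a2) {
        Map<Integer, List<Integer>> partitions = new LinkedHashMap<>();
        for (int v : a2) {
            partitions.computeIfAbsent(v, k -> new ArrayList<>()).add(v);
        }
        Map<Integer, long[]> result = new LinkedHashMap<>();
        partitions.forEach((key, rows) -> result.put(key, frameSums(rows, 1, 1)));
        return result;
    }

    public static void main(String[] args) {
        // a2 values of t2 as listed in the outputs above
        int[] a2 = {0, 1, 2, 2, 2, 3, 4, 5, 6, 7, 7, 8, 9};
        byPartition(a2).forEach((key, sums) ->
                System.out.println("a2=" + key + " -> " + Arrays.toString(sums)));
    }
}
```

Running `main` prints `a2=2 -> [4, 6, 4]` and `a2=7 -> [14, 14]`, and every single-row partition returns the row's own value. Because each partition holds identical values, the ORDER BY does not change which values fall into a frame.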





[jira] [Created] (DRILL-3219) Filter is not pushed into subquery with window function

2015-05-29 Thread Victoria Markman (JIRA)
Victoria Markman created DRILL-3219:
---

 Summary: Filter is not pushed into subquery with window function
 Key: DRILL-3219
 URL: https://issues.apache.org/jira/browse/DRILL-3219
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.0.0
Reporter: Victoria Markman
Assignee: Jinfeng Ni


{code}
0: jdbc:drill:schema=dfs> explain plan for select * from ( select a1, b1, c1, 
sum(a1) over(partition by b1) from t1 ) where c1 is not null;
+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  Project(a1=[$0], b1=[$1], c1=[$2], EXPR$3=[$3])
00-02Project(a1=[$0], b1=[$1], c1=[$2], EXPR$3=[CASE(>($3, 0), 
CAST($4):ANY, null)])
00-03  SelectionVectorRemover
00-04Filter(condition=[IS NOT NULL($2)])
00-05  Window(window#0=[window(partition {1} order by [] range 
between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [COUNT($0), 
$SUM0($0)])])
00-06SelectionVectorRemover
00-07  Sort(sort0=[$1], dir0=[ASC])
00-08Project(a1=[$2], b1=[$1], c1=[$0])
00-09  Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/t1]], 
selectionRoot=/drill/testdata/subqueries/t1, numFiles=1, columns=[`a1`, `b1`, 
`c1`]]])
{code}
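A hedged illustration of why this plan question is subtle: the filter `c1 is not null` references a column that is not the partition key (`b1`), so applying it before the Window operator changes which rows feed `sum(a1)`. The Java sketch below uses a hypothetical class and toy data (not Drill code, Java 16+ for records) and computes both orders. Pushing the filter below the window is guaranteed to preserve results when the predicate references only partition-by columns; otherwise the results can differ, as the toy rows show.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical toy model (Java 16+ for records), not Drill code: compares
// applying a filter after SUM(a1) OVER (PARTITION BY b1) with applying it before.
class FilterPastWindow {

    record Row(int a1, String b1, String c1) {}

    /** SUM(a1) OVER (PARTITION BY b1), returned for every input row in input order. */
    static List<Long> windowSums(List<Row> rows) {
        Map<String, Long> totals = new HashMap<>();
        for (Row r : rows) {
            totals.merge(r.b1(), (long) r.a1(), Long::sum);
        }
        List<Long> out = new ArrayList<>();
        for (Row r : rows) {
            out.add(totals.get(r.b1()));
        }
        return out;
    }

    /** Window first, then filter: the semantics of the query as written. */
    static List<Long> filterAbove(List<Row> rows, Predicate<Row> p) {
        List<Long> sums = windowSums(rows);
        List<Long> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i++) {
            if (p.test(rows.get(i))) {
                out.add(sums.get(i));
            }
        }
        return out;
    }

    /** Filter first, then window: what pushing the filter below the Window would compute. */
    static List<Long> filterBelow(List<Row> rows, Predicate<Row> p) {
        return windowSums(rows.stream().filter(p).toList());
    }

    public static void main(String[] args) {
        List<Row> t1 = List.of(new Row(1, "x", "p"), new Row(2, "x", null), new Row(3, "y", "q"));
        Predicate<Row> c1NotNull = r -> r.c1() != null;  // not a partition key
        Predicate<Row> b1IsX = r -> "x".equals(r.b1());  // partition key only
        System.out.println(filterAbove(t1, c1NotNull) + " vs " + filterBelow(t1, c1NotNull));
        System.out.println(filterAbove(t1, b1IsX) + " vs " + filterBelow(t1, b1IsX));
    }
}
```

With these rows the first line prints `[3, 3] vs [1, 3]` and the second prints `[3, 3] vs [3, 3]`.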





Review Request 35960: DRILL-3307: Query with window function runs out of memory

2015-06-26 Thread abdelhakim deneche

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35960/
---

Review request for drill and Steven Phillips.


Bugs: DRILL-3307
https://issues.apache.org/jira/browse/DRILL-3307


Repository: drill-git


Description
---

Fixed sort to only use copier allocator when spilling to disk


Diffs
-

  
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
 02a1c08 

Diff: https://reviews.apache.org/r/35960/diff/


Testing
---

ongoing...


Thanks,

abdelhakim deneche



Re: Review Request 36278: DRILL-3189: Disable DISALLOW PARTIAL in window function grammar

2015-07-09 Thread Sean Hsuan-Yi Chu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36278/
---

(Updated July 9, 2015, 5:33 p.m.)


Review request for drill, Aman Sinha and Jinfeng Ni.


Changes
---

Addressed the comment


Bugs: DRILL-3189
https://issues.apache.org/jira/browse/DRILL-3189


Repository: drill-git


Description
---

Disable disallow partial in Over-Clause


Diffs (updated)
-

  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/UnsupportedOperatorsVisitor.java
 9bbd537 
  exec/java-exec/src/test/java/org/apache/drill/exec/TestWindowFunctions.java 
7071bea 

Diff: https://reviews.apache.org/r/36278/diff/


Testing
---

All requested


Thanks,

Sean Hsuan-Yi Chu



Re: Review Request 36278: DRILL-3189: Disable DISALLOW PARTIAL in window function grammar

2015-07-10 Thread Sean Hsuan-Yi Chu

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36278/
---

(Updated July 10, 2015, 11:33 p.m.)


Review request for drill, Aman Sinha and Jinfeng Ni.


Changes
---

new patch


Bugs: DRILL-3189
https://issues.apache.org/jira/browse/DRILL-3189


Repository: drill-git


Description
---

Disable disallow partial in Over-Clause


Diffs (updated)
-

  
exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/UnsupportedOperatorsVisitor.java
 9bbd537 
  exec/java-exec/src/test/java/org/apache/drill/exec/TestWindowFunctions.java 
7071bea 

Diff: https://reviews.apache.org/r/36278/diff/


Testing
---

All requested


Thanks,

Sean Hsuan-Yi Chu



Re: Review Request 36278: DRILL-3189: Disable DISALLOW PARTIAL in window function grammar

2015-07-10 Thread Aman Sinha

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36278/#review91375
---

Ship it!


Ship It!

- Aman Sinha


On July 10, 2015, 11:33 p.m., Sean Hsuan-Yi Chu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36278/
> ---
> 
> (Updated July 10, 2015, 11:33 p.m.)
> 
> 
> Review request for drill, Aman Sinha and Jinfeng Ni.
> 
> 
> Bugs: DRILL-3189
> https://issues.apache.org/jira/browse/DRILL-3189
> 
> 
> Repository: drill-git
> 
> 
> Description
> ---
> 
> Disable disallow partial in Over-Clause
> 
> 
> Diffs
> -
> 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/planner/sql/parser/UnsupportedOperatorsVisitor.java
>  9bbd537 
>   exec/java-exec/src/test/java/org/apache/drill/exec/TestWindowFunctions.java 
> 7071bea 
> 
> Diff: https://reviews.apache.org/r/36278/diff/
> 
> 
> Testing
> ---
> 
> All requested
> 
> 
> Thanks,
> 
> Sean Hsuan-Yi Chu
> 
>



[jira] [Created] (DRILL-5916) Drill document window function example on LAST_VALUE is incorrect

2017-10-31 Thread Raymond Wong (JIRA)
Raymond Wong created DRILL-5916:
---

 Summary: Drill document window function example on LAST_VALUE is 
incorrect
 Key: DRILL-5916
 URL: https://issues.apache.org/jira/browse/DRILL-5916
 Project: Apache Drill
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.11.0
Reporter: Raymond Wong
Priority: Minor


The top and bottom review count example query result is showing incorrect 
values for the LAST_VALUE column. 
([https://drill.apache.org/docs/analyzing-data-using-window-functions/] )

The LAST_VALUE column should have the same value as the review count of each 
row because the default Window Frame is RANGE BETWEEN UNBOUNDED PRECEDING AND 
*CURRENT ROW*.
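A minimal Java sketch of the two frames for a single partition whose `review_count` values are already sorted DESC. The class and method names are hypothetical, not Drill code. A RANGE frame ending at CURRENT ROW extends to the current row's last peer, and peers share the same `review_count`, so LAST_VALUE under the default frame returns the current row's own value; only the UNBOUNDED FOLLOWING frame returns the partition's smallest count.

```java
import java.util.Arrays;

// Hypothetical helper, not Drill code: one partition whose review_count values are
// already sorted DESC, evaluated under the two frames used in the queries below.
class LastValueFrames {

    /**
     * LAST_VALUE with the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
     * A RANGE frame ends at the current row's last peer (same ORDER BY value).
     */
    static int[] lastValueDefaultFrame(int[] sortedDesc) {
        int n = sortedDesc.length;
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            int end = i;
            while (end + 1 < n && sortedDesc[end + 1] == sortedDesc[i]) {
                end++; // extend the frame over peers of the current row
            }
            out[i] = sortedDesc[end];
        }
        return out;
    }

    /** LAST_VALUE with RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING. */
    static int[] lastValueFullFrame(int[] sortedDesc) {
        int[] out = new int[sortedDesc.length];
        if (sortedDesc.length > 0) {
            Arrays.fill(out, sortedDesc[sortedDesc.length - 1]);
        }
        return out;
    }

    /** FIRST_VALUE: identical under both frames, since both start at UNBOUNDED PRECEDING. */
    static int[] firstValue(int[] sortedDesc) {
        int[] out = new int[sortedDesc.length];
        if (sortedDesc.length > 0) {
            Arrays.fill(out, sortedDesc[0]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Ahwatukee review counts visible in the first result below
        int[] ahwatukee = {236, 158, 129, 62, 30, 26, 18, 18, 10, 7, 6, 5, 4, 4, 4};
        System.out.println(Arrays.toString(firstValue(ahwatukee)));
        System.out.println(Arrays.toString(lastValueDefaultFrame(ahwatukee)));
        System.out.println(Arrays.toString(lastValueFullFrame(ahwatukee)));
    }
}
```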

Query result using the 2017 Yelp data set:

{quote}
SELECT name, city, review_count,
  FIRST_VALUE(review_count)
OVER(PARTITION BY city ORDER BY review_count DESC) AS top_review_count,
  LAST_VALUE(review_count)
OVER(PARTITION BY city ORDER BY review_count DESC) AS bottom_review_count
FROM dfs.yelp.`yelp_academic_dataset_business.json`
LIMIT 30

||name||city||review_count||top_review_count||bottom_review_count||
|Lululemon Athletica| |5|5|5|
|Aberdour Castle|Aberdour|4|4|4|
|Cupz N' Crepes|Ahwatukee|236|236|236|
|My Wine Cellar|Ahwatukee|158|236|158|
|Florencia Pizza Bistro|Ahwatukee|129|236|129|
|Barro's Pizza|Ahwatukee|62|236|62|
|Kathy's Alterations|Ahwatukee|30|236|30|
|Hertz Rent A Car|Ahwatukee|26|236|26|
|Active Kids Pediatrics|Ahwatukee|18|236|18|
|Dental by Design|Ahwatukee|18|236|18|
|Desert Dog Pet Care|Ahwatukee|10|236|10|
|McDonald's|Ahwatukee|7|236|7|
|U-Haul|Ahwatukee|6|236|6|
|Sprinkler Detective|Ahwatukee|5|236|5|
|Hi-Health|Ahwatukee|4|236|4|
|Healthy and Clean Living Environments|Ahwatukee|4|236|4|
|Designs By Christa|Ahwatukee|4|236|4|
{quote}

Changing LAST_VALUE's window frame to RANGE BETWEEN UNBOUNDED PRECEDING AND 
*UNBOUNDED FOLLOWING*:

{quote}
SELECT name, city, review_count,
  FIRST_VALUE(review_count)
OVER(PARTITION BY city ORDER BY review_count DESC) AS top_review_count,
  LAST_VALUE(review_count)
OVER(PARTITION BY city ORDER BY review_count DESC RANGE BETWEEN UNBOUNDED 
PRECEDING AND UNBOUNDED FOLLOWING) AS bottom_review_count
FROM dfs.yelp.`yelp_academic_dataset_business.json`
LIMIT 30
;

||name||city||review_count||top_review_count||bottom_review_count||
|Lululemon Athletica| |5|5|5|
|Aberdour Castle|Aberdour|4|4|4|
|Cupz N' Crepes|Ahwatukee|236|236|4|
|My Wine Cellar|Ahwatukee|158|236|4|
|Florencia Pizza Bistro|Ahwatukee|129|236|4|
|Barro's Pizza|Ahwatukee|62|236|4|
|Kathy's Alterations|Ahwatukee|30|236|4|
|Hertz Rent A Car  |Ahwatukee  

[jira] [Created] (DRILL-4453) Difference in results over char data, window function query

2016-02-29 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-4453:
-

 Summary: Difference in results over char data, window function 
query
 Key: DRILL-4453
 URL: https://issues.apache.org/jira/browse/DRILL-4453
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.6.0
 Environment: 4 node cluster
Reporter: Khurram Faraaz


A window function query with a frame clause returns results that differ from 
those returned by the same query on Postgres 9.3 over the same data.

The difference is that Drill returns two extra nulls, whereas Postgres does not. 
Note that the two tables have the same number of nulls in Drill and Postgres.

{noformat}
postgres=# \d t_alltype
 Table "public.t_alltype"
 Column |Type | Modifiers
+-+---
 c1 | integer |
 c2 | integer |
 c3 | bigint  |
 c4 | character(256)  |
 c5 | character varying(256)  |
 c6 | timestamp without time zone |
 c7 | date|
 c8 | boolean |
 c9 | double precision|
postgres=# select c4 from t_alltype where c4 is null;
 c4




(3 rows)

{noformat}

{noformat}
postgres=# SELECT MIN(c4) OVER(PARTITION BY c8 ORDER BY c1 ROWS BETWEEN 
UNBOUNDED PRECEDING AND CURRENT ROW) FROM t_alltype;

   min
--
 gwfrW
 ZAFOcferhjkcl
 ZAFOcferhjkcl
 ZAFOcferhjkcl
 ZAFOcferhjkcl
 ...
 ...
 
 ApKK
 ApKK















(145 rows)
{noformat}

Parquet schema details

{noformat}
[root@centos-01 parquet-tools]# ./parquet-schema 
./Datasources/window_functions/t_alltype.parquet
message root {
  optional int32 c1;
  optional int32 c2;
  optional int64 c3;
  optional binary c4 (UTF8);
  optional binary c5 (UTF8);
  optional int64 c6 (TIMESTAMP_MILLIS);
  optional int32 c7 (DATE);
  optional boolean c8;
  optional double c9;
}
{noformat}

On Drill 1.6.0 

{noformat}
0: jdbc:drill:schema=dfs.tmp> SELECT MIN(c4) OVER(PARTITION BY c8 ORDER BY c1 
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM dfs.tmp.`t_alltype`;
++
| EXPR$0 |
++
| gwfrW  |
| ZAFOcferhjkcl  |
| ZAFOcferhjkcl  |
| ZAFOcferhjkcl  |
| ZAFOcferhjkcl  |
...
...
| ApKK |
| ApKK |
|  |
|  |
|  |
|  |
|  |
|  |
|  |
|  |
|  |
|  |
| null |
| null |
|  |
|  |
|  |
+--+
145 rows selected (0.409 seconds)
{noformat}
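Once a non-null value has entered a ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW frame, MIN ignores later nulls and cannot return null again within that partition. A minimal Java sketch of the running MIN for a single partition already in ORDER BY c1 order; the class name is hypothetical, and `String.compareTo` (binary UTF-16 order) is an assumption that may differ from a database collation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not Drill code: MIN(c4) OVER (... ROWS BETWEEN UNBOUNDED
// PRECEDING AND CURRENT ROW) for a single partition already in ORDER BY c1 order.
class RunningMin {

    static List<String> runningMin(List<String> partition) {
        List<String> out = new ArrayList<>();
        String min = null; // stays null only while every value seen so far is null
        for (String v : partition) {
            // Aggregates ignore nulls. compareTo is binary UTF-16 ordering, an assumption:
            // a database collation may order some strings differently.
            if (v != null && (min == null || v.compareTo(min) < 0)) {
                min = v;
            }
            out.add(min);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(runningMin(Arrays.asList("gwfrW", "ZAFOcferhjkcl", null, "ApKK", null)));
    }
}
```

Here `main` prints `[gwfrW, ZAFOcferhjkcl, ZAFOcferhjkcl, ApKK, ApKK]`: the null inputs repeat the previous minimum instead of producing null.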





[jira] [Created] (DRILL-3280) Missing OVER clause in window function query results in AssertionError

2015-06-11 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-3280:
-

 Summary: Missing OVER clause in window function query results in 
AssertionError
 Key: DRILL-3280
 URL: https://issues.apache.org/jira/browse/DRILL-3280
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.0.0
Reporter: Khurram Faraaz
Assignee: Chris Westin


Missing OVER clause results in AssertionError.
Instead, we need an error message that says "window function call requires 
an OVER clause".

{code}
0: jdbc:drill:schema=dfs.tmp> select rank(), cume_dist() over w from 
`allDataInPrq/0_0_0.parquet` window w as (partition by col_chr order by 
col_dbl);
Error: SYSTEM ERROR: org.apache.drill.exec.work.foreman.ForemanException: 
Unexpected exception during fragment initialization: null


[Error Id: f8675256-eea9-4ca6-859c-4c0b714f27a0 on centos-02.qa.lab:31010] 
(state=,code=0)
{code}

Stack trace from drillbit.log

{code}
2015-06-11 20:50:42,054 [2a860b5d-dd87-087f-3730-bf47a10f5d97:foreman] ERROR 
o.a.d.c.exceptions.UserException - SYSTEM ERROR: 
org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception 
during fragment initialization: null


[Error Id: f8675256-eea9-4ca6-859c-4c0b714f27a0 on centos-02.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception 
during fragment initialization: null


[Error Id: f8675256-eea9-4ca6-859c-4c0b714f27a0 on centos-02.qa.lab:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:522)
 ~[drill-common-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.work.foreman.Foreman$ForemanResult.close(Foreman.java:738)
 [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:840)
 [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.work.foreman.Foreman$StateSwitch.processEvent(Foreman.java:782)
 [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.common.EventProcessor.sendEvent(EventProcessor.java:73) 
[drill-common-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.work.foreman.Foreman$StateSwitch.moveToState(Foreman.java:784)
 [drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
org.apache.drill.exec.work.foreman.Foreman.moveToState(Foreman.java:893) 
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:253) 
[drill-java-exec-1.1.0-SNAPSHOT-rebuffed.jar:1.1.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
[na:1.7.0_45]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_45]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]
Caused by: org.apache.drill.exec.work.foreman.ForemanException: Unexpected 
exception during fragment initialization: null
... 4 common frames omitted
Caused by: java.lang.AssertionError: null
at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.getRootField(SqlToRelConverter.java:3810)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.adjustInputRef(SqlToRelConverter.java:3139)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:3114)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.access$1400(SqlToRelConverter.java:180)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4061)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:3489)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at org.apache.calcite.sql.SqlIdentifier.accept(SqlIdentifier.java:274) 
~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:3944)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertSortExpression(SqlToRelConverter.java:3962)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertOver(SqlToRelConverter.java:1756)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.access$1000(SqlToRelConverter.java:180)
 ~[calcite-core-1.1.0

[jira] [Created] (DRILL-3294) False schema change exception in CTAS with AVG window function

2015-06-15 Thread Victoria Markman (JIRA)
Victoria Markman created DRILL-3294:
---

 Summary: False schema change exception in CTAS with AVG window 
function
 Key: DRILL-3294
 URL: https://issues.apache.org/jira/browse/DRILL-3294
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.0.0
Reporter: Victoria Markman
Assignee: Chris Westin


This bug could be related to DRILL-3293, but since it's a different function 
and different symptom, I'm filing a new one.

{code}
0: jdbc:drill:schema=dfs> create table wf_t1(a1) as select avg(a1) 
over(partition by a1) from t1;
Error: SYSTEM ERROR: org.apache.drill.exec.exception.SchemaChangeException: 
Failure while trying to materialize incoming schema.  Errors:
 
Error in expression at index -1.  Error: Missing function implementation: 
[castTINYINT(NULL-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..

Fragment 0:0

[Error Id: 1ca5af3a-0ea7-4b75-b493-74f6404d4894 on atsqa4-133.qa.lab:31010] 
(state=,code=0)
{code}

Query works correctly by itself:
{code}
0: jdbc:drill:schema=dfs> select avg(a1) over(partition by a1) from t1;
+-+
| EXPR$0  |
+-+
| 1   |
| 2   |
| 3   |
| 4   |
| 5   |
| 6   |
| 7   |
| 9   |
| 10  |
| null|
+-+
10 rows selected (0.181 seconds)
{code}





[jira] [Resolved] (DRILL-3294) False schema change exception in CTAS with AVG window function

2015-06-22 Thread Jinfeng Ni (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jinfeng Ni resolved DRILL-3294.
---
Resolution: Fixed

Fixed in commit ffae1691c0cd526ed1095fbabbc0855d016790d7.



> False schema change exception in CTAS with AVG window function
> --
>
> Key: DRILL-3294
> URL: https://issues.apache.org/jira/browse/DRILL-3294
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.0.0
>Reporter: Victoria Markman
>Assignee: Jinfeng Ni
>  Labels: window_function
> Fix For: 1.1.0
>
> Attachments: t1_parquet
>
>
> This bug could be related to DRILL-3293, but since it's a different function 
> and different symptom, I'm filing a new one.
> {code}
> 0: jdbc:drill:schema=dfs> create table wf_t1(a1) as select avg(a1) 
> over(partition by a1) from t1;
> Error: SYSTEM ERROR: org.apache.drill.exec.exception.SchemaChangeException: 
> Failure while trying to materialize incoming schema.  Errors:
>  
> Error in expression at index -1.  Error: Missing function implementation: 
> [castTINYINT(NULL-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: 1ca5af3a-0ea7-4b75-b493-74f6404d4894 on atsqa4-133.qa.lab:31010] 
> (state=,code=0)
> {code}
> Query works correctly by itself:
> {code}
> 0: jdbc:drill:schema=dfs> select avg(a1) over(partition by a1) from t1;
> +-+
> | EXPR$0  |
> +-+
> | 1   |
> | 2   |
> | 3   |
> | 4   |
> | 5   |
> | 6   |
> | 7   |
> | 9   |
> | 10  |
> | null|
> +-+
> 10 rows selected (0.181 seconds)
> {code}





[jira] [Created] (DRILL-3352) Extra re-distribution when evaluating window function after GROUP BY

2015-06-23 Thread Aman Sinha (JIRA)
Aman Sinha created DRILL-3352:
-

 Summary: Extra re-distribution when evaluating window function 
after GROUP BY
 Key: DRILL-3352
 URL: https://issues.apache.org/jira/browse/DRILL-3352
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.0.0
Reporter: Aman Sinha
Assignee: Aman Sinha


Consider the following query and plan: 
{code}
explain plan for select min(l_partkey) over (partition by l_suppkey) from 
lineitem group by l_partkey, l_suppkey limit 1;

00-00Screen
00-01  Project(EXPR$0=[$0])
00-02SelectionVectorRemover
00-03  Limit(fetch=[1])
00-04UnionExchange
01-01  Project($0=[$3])
01-02Window(window#0=[window(partition {1} order by [] range 
between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [MIN($0)])])
01-03  SelectionVectorRemover
01-04Sort(sort0=[$1], dir0=[ASC])
01-05  Project(l_partkey=[$0], l_suppkey=[$1], $f2=[$2])
01-06HashToRandomExchange(dist0=[[$1]])
02-01  UnorderedMuxExchange
03-01Project(l_partkey=[$0], l_suppkey=[$1], 
$f2=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1))])
03-02  HashAgg(group=[{0, 1}], agg#0=[MIN($2)])
03-03Project(l_partkey=[$0], l_suppkey=[$1], 
$f2=[$2])
03-04  HashToRandomExchange(dist0=[[$0]], 
dist1=[[$1]])
04-01UnorderedMuxExchange
05-01  Project(l_partkey=[$0], 
l_suppkey=[$1], $f2=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1, 
hash64AsDouble($0)))])
05-02HashAgg(group=[{0, 1}], 
agg#0=[MIN($0)])
05-03  Project(l_partkey=[$1], 
l_suppkey=[$0])
05-04
Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath 
[path=file:/Users/asinha/data/tpch-sf1/lineitem]], 
selectionRoot=/Users/asinha/data/tpch-sf1/lineitem, numFiles=1, 
columns=[`l_partkey`, `l_suppkey`]]])
{code}

Here, we distribute for the HashAgg on 2 columns: {l_partkey, l_suppkey}. 
Subsequently, we re-distribute on {l_suppkey} only, since the window function 
partitions by l_suppkey. The second re-distribution could be avoided if the 
first distribution for the HashAgg were done on l_suppkey only. The reason we 
distribute on all grouping columns is to avoid skew problems. However, in many 
cases, especially when a window function is involved, it may make sense to 
distribute on only 1 column.
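The reasoning above rests on one property: hashing on l_suppkey alone sends every {l_partkey, l_suppkey} group to a single receiver, because the group key contains l_suppkey, and it also satisfies the window's PARTITION BY l_suppkey. A minimal Java model of that property follows. The class is hypothetical, and `Objects.hash` / `Integer.hashCode` stand in for Drill's hash64AsDouble-based distribution purely for illustration. Skew is not modeled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical model, not Drill code: rows are (l_partkey, l_suppkey) pairs sent to
// `nodes` receivers. Objects.hash / Integer.hashCode stand in for Drill's
// hash64AsDouble-based distribution, which is an assumption for illustration only.
class SubsetDistribution {

    interface Distributor {
        int node(int partkey, int suppkey, int nodes);
    }

    /** Distribute on {l_suppkey} only. */
    static int nodeBySupp(int partkey, int suppkey, int nodes) {
        return Math.floorMod(Integer.hashCode(suppkey), nodes);
    }

    /** Distribute on {l_partkey, l_suppkey}, as for the HashAgg in the plan above. */
    static int nodeByBoth(int partkey, int suppkey, int nodes) {
        return Math.floorMod(Objects.hash(partkey, suppkey), nodes);
    }

    /**
     * True if every row sharing the key lands on a single node. The key is the GROUP BY key
     * {l_partkey, l_suppkey} when groupKey is true, else the window key {l_suppkey}.
     */
    static boolean colocates(int[][] rows, int nodes, Distributor d, boolean groupKey) {
        Map<Long, Integer> nodeOfKey = new HashMap<>();
        for (int[] r : rows) {
            long key = groupKey ? (((long) r[0]) << 32) | (r[1] & 0xffffffffL) : r[1];
            int node = d.node(r[0], r[1], nodes);
            Integer previous = nodeOfKey.putIfAbsent(key, node);
            if (previous != null && previous != node) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[][] rows = {{1, 10}, {2, 10}, {3, 20}, {1, 20}};
        System.out.println("suppkey only: group ok=" + colocates(rows, 4, SubsetDistribution::nodeBySupp, true)
                + ", window ok=" + colocates(rows, 4, SubsetDistribution::nodeBySupp, false));
        System.out.println("both columns: group ok=" + colocates(rows, 4, SubsetDistribution::nodeByBoth, true)
                + ", window ok=" + colocates(rows, 4, SubsetDistribution::nodeByBoth, false));
    }
}
```

With these four rows, distributing on both columns splits l_suppkey = 10 across two receivers (hash values 1002 and 1033, modulo 4), so a second exchange is needed before the window. Distributing on l_suppkey alone serves both operators.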






[jira] [Created] (DRILL-3404) Filter on window function does not appear in query plan

2015-06-26 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-3404:
-

 Summary: Filter on window function does not appear in query plan
 Key: DRILL-3404
 URL: https://issues.apache.org/jira/browse/DRILL-3404
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.1.0
 Environment: 4 node cluster on CentOS
Reporter: Khurram Faraaz
Assignee: Jinfeng Ni
Priority: Critical


The Filter is missing from Drill's query plan for the query below, and hence 
wrong results are returned.

Results from Drill
{code}
0: jdbc:drill:schema=dfs.tmp> select c1, c2, w_sum from ( select c1, c2, sum ( 
c1 ) over ( partition by c2 order by c1 asc nulls first ) w_sum from 
`tblWnulls` ) sub_query where w_sum is not null;
+-+---+-+
| c1  |  c2   |w_sum|
+-+---+-+
| 0   | a | 0   |
| 1   | a | 1   |
| 5   | a | 6   |
| 10  | a | 16  |
| 11  | a | 27  |
| 14  | a | 41  |
| 1   | a | 11152   |
| 2   | b | 2   |
| 9   | b | 11  |
| 13  | b | 24  |
| 17  | b | 41  |
| null| c | null|
| 4   | c | 4   |
| 6   | c | 10  |
| 8   | c | 18  |
| 12  | c | 30  |
| 13  | c | 56  |
| 13  | c | 56  |
| null| d | null|
| null| d | null|
| 10  | d | 10  |
| 11  | d | 21  |
| 2147483647  | d | 4294967315  |
| 2147483647  | d | 4294967315  |
| -1  | e | -1  |
| 15  | e | 14  |
| null| null  | null|
| 19  | null  | 19  |
| 65536   | null  | 6   |
| 100 | null  | 106 |
+-+---+-+
30 rows selected (0.337 seconds)
{code}
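The Drill output above still contains rows whose `w_sum` is null (for example in partitions c and d). A minimal Java sketch of the intended semantics for one partition follows. It mirrors the `CASE(>(COUNT, 0), $SUM0, null)` rewrite that appears in the explain plan for this query, and it treats the default RANGE frame as ending at the current row's last peer. The class is hypothetical, not Drill code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical helper, not Drill code: SUM(c1) OVER (PARTITION BY c2 ORDER BY c1 ASC
// NULLS FIRST) for one partition, then the outer "w_sum IS NOT NULL" filter.
class RunningSumFilter {

    /** Input must already be sorted by c1 ascending, nulls first. */
    static List<Long> runningSum(List<Integer> sortedC1) {
        int n = sortedC1.size();
        List<Long> out = new ArrayList<>();
        long sum0 = 0;  // like $SUM0: 0 when no non-null value has been seen
        long count = 0; // like COUNT(c1): number of non-null values seen
        int i = 0;
        while (i < n) {
            // The default RANGE frame ends at the last peer, so consume the whole peer group.
            int j = i;
            while (j < n && Objects.equals(sortedC1.get(j), sortedC1.get(i))) {
                Integer v = sortedC1.get(j);
                if (v != null) {
                    sum0 += v;
                    count++;
                }
                j++;
            }
            Long w = count > 0 ? Long.valueOf(sum0) : null; // CASE(COUNT > 0, $SUM0, null)
            for (int k = i; k < j; k++) {
                out.add(w);
            }
            i = j;
        }
        return out;
    }

    /** The outer WHERE w_sum IS NOT NULL. */
    static List<Long> whereNotNull(List<Long> values) {
        List<Long> kept = new ArrayList<>();
        for (Long v : values) {
            if (v != null) {
                kept.add(v);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // c1 values of partition c2 = 'd' from the Drill output above
        List<Long> w = runningSum(Arrays.asList(null, null, 10, 11, 2147483647, 2147483647));
        System.out.println(w);               // before the filter
        System.out.println(whereNotNull(w)); // after the filter
    }
}
```

The first line reproduces Drill's `w_sum` column for partition d (`[null, null, 10, 21, 4294967315, 4294967315]`); the second shows the two null rows that the missing Filter should have removed.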

Explain plan for the above query from Drill
{code}
0: jdbc:drill:schema=dfs.tmp> explain plan for select c1, c2, w_sum from ( 
select c1, c2, sum ( c1 ) over ( partition by c2 order by c1 asc nulls first ) 
w_sum from `tblWnulls` ) sub_query where w_sum is not null;
+--+---+
| text | json |
+--+---+
| 00-00Screen
00-01  Project(c1=[$0], c2=[$1], w_sum=[$2])
00-02Project(c1=[$0], c2=[$1], w_sum=[CASE(>($2, 0), $3, null)])
00-03  Window(window#0=[window(partition {1} order by [0 
ASC-nulls-first] range between UNBOUNDED PRECEDING and CURRENT ROW aggs 
[COUNT($0), $SUM0($0)])])
00-04SelectionVectorRemover
00-05  Sort(sort0=[$1], sort1=[$0], dir0=[ASC], 
dir1=[ASC-nulls-first])
00-06Project(c1=[$1], c2=[$0])
00-07 

Re: Review Request 35960: DRILL-3307: Query with window function runs out of memory

2015-06-26 Thread Steven Phillips

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35960/#review89600
---

Ship it!


Ship It!

- Steven Phillips


On June 27, 2015, 1:12 a.m., abdelhakim deneche wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35960/
> ---
> 
> (Updated June 27, 2015, 1:12 a.m.)
> 
> 
> Review request for drill and Steven Phillips.
> 
> 
> Bugs: DRILL-3307
> https://issues.apache.org/jira/browse/DRILL-3307
> 
> 
> Repository: drill-git
> 
> 
> Description
> ---
> 
> Fixed sort to only use copier allocator when spilling to disk
> 
> 
> Diffs
> -
> 
>   
> exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java
>  02a1c08 
> 
> Diff: https://reviews.apache.org/r/35960/diff/
> 
> 
> Testing
> ---
> 
> ongoing...
> 
> 
> Thanks,
> 
> abdelhakim deneche
> 
>



[jira] [Created] (DRILL-3580) wrong plan for window function queries containing function(col1 + colb)

2015-07-30 Thread Deneche A. Hakim (JIRA)
Deneche A. Hakim created DRILL-3580:
---

 Summary: wrong plan for window function queries containing 
function(col1 + colb)
 Key: DRILL-3580
 URL: https://issues.apache.org/jira/browse/DRILL-3580
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.1.0
Reporter: Deneche A. Hakim
Assignee: Jinfeng Ni
Priority: Critical


The following query has a wrong plan:
{noformat}
explain plan for select position_id, salary, sum(salary) over (partition by 
position_id), sum(position_id + salary) over (partition by position_id) from 
cp.`employee.json` limit 20;
+--+--+
| text | json |
+--+--+
| 00-00Screen
00-01  ProjectAllowDup(position_id=[$0], salary=[$1], EXPR$2=[$2], 
EXPR$3=[$3])
00-02SelectionVectorRemover
00-03  Limit(fetch=[20])
00-04Project(position_id=[$0], salary=[$1], w0$o0=[$2], w0$o00=[$4])
00-05  Window(window#0=[window(partition {0} order by [] range 
between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($3)])])
00-06Project(position_id=[$1], salary=[$2], w0$o0=[$3], 
$3=[+($1, $2)])
00-07  Window(window#0=[window(partition {1} order by [] range 
between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($2)])])
00-08SelectionVectorRemover
00-09  Sort(sort0=[$1], dir0=[ASC])
00-10Project(T13¦¦*=[$0], position_id=[$1], salary=[$2])
00-11  Scan(groupscan=[EasyGroupScan 
[selectionRoot=classpath:/employee.json, numFiles=1, columns=[`*`], 
files=[classpath:/employee.json]]])
{noformat}

The plan contains 2 window operators which shouldn't be possible according to 
DRILL-3196. 

The results are also incorrect.

Depending on which aggregation or window function is used, we get either wrong 
results or an IndexOutOfBounds exception.





[jira] [Created] (DRILL-3595) Wrong results returned by query that uses LEAD window function

2015-08-03 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-3595:
-

 Summary: Wrong results returned by query that uses LEAD window 
function
 Key: DRILL-3595
 URL: https://issues.apache.org/jira/browse/DRILL-3595
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Codegen, Execution - Flow
Affects Versions: 1.2.0
 Environment: private-branch-new-window-funcs
Reporter: Khurram Faraaz
Assignee: Chris Westin


Query that uses LEAD window function returns wrong results on developer's 
private branch (new-window-funcs).

Results returned by Drill

{code}
0: jdbc:drill:schema=dfs.tmp> select lead(c1) over w from union_01 window w as 
(partition by c3 order by c1) limit 10;
+-+
| EXPR$0  |
+-+
| 878 |
| -150|
| 402 |
| 402 |
| 402 |
| 402 |
| 402 |
| 160 |
| 160 |
| 160 |
+-+
10 rows selected (0.349 seconds)
{code}

Results returned by Postgres

{code}
postgres=# select lead(c1) over w from union_01 window w as (partition by c3 
order by c1) limit 10;
 lead 
--
 
 
  402
  402
  402
  402
  402
 
  160
  160
(10 rows)
{code}
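In the Postgres output, LEAD returns null whenever the next row would fall outside the current partition. A minimal Java sketch of LEAD/LAG over one partition already in ORDER BY order, with an explicit offset and default as in Postgres' `lead(value, offset, default)`. The class is hypothetical, not Drill code; resetting at partition boundaries is left to the caller, which passes one partition at a time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not Drill code: LEAD/LAG over one partition that is already
// in ORDER BY order, with an explicit offset and default as in Postgres'
// lead(value, offset, default) / lag(value, offset, default).
class LeadLag {

    static <T> List<T> lead(List<T> partition, int offset, T defaultValue) {
        return shift(partition, offset, defaultValue);
    }

    static <T> List<T> lag(List<T> partition, int offset, T defaultValue) {
        return shift(partition, -offset, defaultValue);
    }

    /** Value `delta` rows away from each row, or defaultValue outside the partition. */
    private static <T> List<T> shift(List<T> partition, int delta, T defaultValue) {
        List<T> out = new ArrayList<>(partition.size());
        for (int i = 0; i < partition.size(); i++) {
            int j = i + delta;
            out.add(j >= 0 && j < partition.size() ? partition.get(j) : defaultValue);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> partition = Arrays.asList(10, 20, 30);
        System.out.println(lead(partition, 1, null)); // [20, 30, null]
        System.out.println(lag(partition, 2, -1));    // [-1, -1, 10]
    }
}
```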





[jira] [Created] (DRILL-4320) Difference in query plan on JDK8 for window function query

2016-01-27 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-4320:
-

 Summary: Difference in query plan on JDK8 for window function query
 Key: DRILL-4320
 URL: https://issues.apache.org/jira/browse/DRILL-4320
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.4.0
 Environment: 4 node cluster CentOS
Reporter: Khurram Faraaz


There is a difference in the query plan of a window function query on JDK8 in 
the test environment below: a Project is missing after the initial Scan, and 
the new plan looks more optimized. Should we update the expected query plan, or 
is further investigation required?

Java 8
MapR Drill 1.4.0 GA
JDK8
MapR FS 5.0.0 GA

Functional/window_functions/optimization/plan/pp_03.sql

{noformat}
Actual plan 

00-00Screen
00-01  Project(EXPR$0=[$0])
00-02Project($0=[$2])
00-03  Window(window#0=[window(partition {1} order by [] range between 
UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($0)])])
00-04SelectionVectorRemover
00-05  Sort(sort0=[$1], dir0=[ASC])
00-06Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/subqueries/t1]], 
selectionRoot=maprfs:/drill/testdata/subqueries/t1, numFiles=1, 
usedMetadataFile=false, columns=[`a1`, `c1`]]])

Expected plan 

 Screen
 .*Project.*
   .*Project.*
 .*Window.*range between UNBOUNDED PRECEDING and UNBOUNDED 
FOLLOWING aggs.*
   .*SelectionVectorRemover.*
 .*Sort.*
   .*Project.*
 .*Scan.*
{noformat}
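A hedged sketch of why the actual plan fails the expected pattern list: if line i of the plan must match pattern i, the missing Project between Sort and Scan leaves one pattern unmatched. The line-by-line rule below (equal counts, `Matcher.find` per line) is an assumption for illustration and is not the drill-test-framework's actual code; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical checker, not the drill-test-framework's code: plan line i must contain a
// match for expected pattern i (Matcher.find), and the line counts must agree.
class PlanPatternCheck {

    static boolean matches(List<String> planLines, List<String> expectedPatterns) {
        if (planLines.size() != expectedPatterns.size()) {
            return false; // e.g. one operator fewer than expected
        }
        for (int i = 0; i < planLines.size(); i++) {
            Pattern p = Pattern.compile(expectedPatterns.get(i).trim());
            if (!p.matcher(planLines.get(i)).find()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> expected = List.of("Screen", ".*Project.*", ".*Project.*",
                ".*Window.*range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs.*",
                ".*SelectionVectorRemover.*", ".*Sort.*", ".*Project.*", ".*Scan.*");
        // Operator lines from the actual JDK8 plan, abbreviated
        List<String> actual = new ArrayList<>(List.of("00-00Screen", "00-01  Project(EXPR$0=[$0])",
                "00-02Project($0=[$2])",
                "00-03  Window(window#0=[window(partition {1} order by [] range between "
                        + "UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($0)])])",
                "00-04SelectionVectorRemover", "00-05  Sort(sort0=[$1], dir0=[ASC])",
                "00-06Scan(groupscan=[ParquetGroupScan ...])"));
        System.out.println(matches(actual, expected)); // false: no Project between Sort and Scan
    }
}
```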





[jira] [Created] (DRILL-4457) Difference in results returned by window function over BIGINT data

2016-03-01 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-4457:
-

 Summary: Difference in results returned by window function over 
BIGINT data
 Key: DRILL-4457
 URL: https://issues.apache.org/jira/browse/DRILL-4457
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.6.0
 Environment: 4 node cluster
Reporter: Khurram Faraaz


A window function query returns different results on Drill than on Postgres over 
the same data.
Drill 1.6.0 commit ID 6d5f4983

{noformat}
Verification Failures:
/root/public_framework/drill-test-framework/framework/resources/Functional/window_functions/frameclause/RBCRACR/RBCRACR_bgint_6.q
Query:
SELECT FIRST_VALUE(c3) OVER(PARTITION BY c8 ORDER BY c1 RANGE BETWEEN CURRENT 
ROW AND CURRENT ROW) FROM `t_alltype.parquet`
 Expected number of rows: 145
Actual number of rows from Drill: 145
 Number of matching rows: 143
  Number of rows missing: 2
   Number of rows unexpected: 2

These rows are not expected (first 10):
36022570792
21011901540311080

These rows are missing (first 10):
null (2 time(s))
{noformat}

Here is the difference in results: Drill 1.6.0 returns 36022570792 whereas 
Postgres returns null, and Drill returns 21011901540311080 whereas Postgres 
returns null.

{noformat}
[root@centos-01 drill-output]# diff -cb 
RBCRACR_RBCRACR_bgint_6.output_Tue_Mar_01_10\:36\:42_UTC_2016 
../resources/Functional/window_functions/frameclause/RBCRACR/RBCRACR_bgint_6.e
*** RBCRACR_RBCRACR_bgint_6.output_Tue_Mar_01_10:36:42_UTC_2016 2016-03-01 
10:36:43.012382649 +
--- 
../resources/Functional/window_functions/frameclause/RBCRACR/RBCRACR_bgint_6.e  
2016-03-01 10:32:56.605677914 +
***
*** 55,61 
  5424751352
  3734160392
  36022570792
! 36022570792
  584831936
  37102817894137256
  61958708627376736
--- 55,61 
  5424751352
  3734160392
  36022570792
! null
  584831936
  37102817894137256
  61958708627376736
***
*** 64,70 
  29537626363643852
  52598911986023288
  21011901540311080
! 21011901540311080
  17990322900862228
  61608051272
  3136812789494
--- 64,70 
  29537626363643852
  52598911986023288
  21011901540311080
! null
  17990322900862228
  61608051272
  3136812789494
{noformat}
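With RANGE BETWEEN CURRENT ROW AND CURRENT ROW, the frame contains only the current row and its peers (rows with an equal c1 in the partition), so FIRST_VALUE cannot carry a value over from an earlier peer group; a null c3 at the start of a peer group stays null. A minimal Java sketch for one partition already sorted by c1, with a hypothetical class (Java 16+ for records) and toy c1 values, not Drill code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical helper (Java 16+ for records), not Drill code:
// FIRST_VALUE(c3) OVER (PARTITION BY c8 ORDER BY c1 RANGE BETWEEN CURRENT ROW AND CURRENT ROW)
// for one partition already sorted by c1.
class FirstValuePeers {

    record Row(Integer c1, Long c3) {}

    static List<Long> firstValueOfPeers(List<Row> sortedByC1) {
        List<Long> out = new ArrayList<>();
        int i = 0;
        while (i < sortedByC1.size()) {
            // The frame is exactly the current row's peer group: rows with an equal c1.
            int j = i;
            while (j < sortedByC1.size()
                    && Objects.equals(sortedByC1.get(j).c1(), sortedByC1.get(i).c1())) {
                j++;
            }
            // c3 of the first peer; the order among peers is not defined by the query.
            Long first = sortedByC1.get(i).c3();
            for (int k = i; k < j; k++) {
                out.add(first);
            }
            i = j;
        }
        return out;
    }

    public static void main(String[] args) {
        // Toy c1 values; the c3 values are borrowed from the diff above for readability
        List<Row> partition = List.of(new Row(1, 36022570792L), new Row(2, null), new Row(3, 584831936L));
        System.out.println(firstValueOfPeers(partition)); // [36022570792, null, 584831936]
    }
}
```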





[jira] [Resolved] (DRILL-4457) Difference in results returned by window function over BIGINT data

2016-03-07 Thread Deneche A. Hakim (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deneche A. Hakim resolved DRILL-4457.
-
Resolution: Fixed

Fixed in a2fec78695df979e240231cb9d32c7f18274a333

> Difference in results returned by window function over BIGINT data
> --
>
> Key: DRILL-4457
> URL: https://issues.apache.org/jira/browse/DRILL-4457
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Flow
>Affects Versions: 1.6.0
> Environment: 4 node cluster
>Reporter: Khurram Faraaz
>Assignee: Deneche A. Hakim
>  Labels: window_function
> Fix For: 1.6.0
>
>
> Difference in results returned by window function query over same data on 
> Drill vs on Postgres.
> Drill 1.6.0 commit ID 6d5f4983





[jira] [Created] (DRILL-7034) Window function over a malformed CSV file crashes the JVM

2019-02-08 Thread Boaz Ben-Zvi (JIRA)
Boaz Ben-Zvi created DRILL-7034:
---

 Summary: Window function over a malformed CSV file crashes the JVM 
 Key: DRILL-7034
 URL: https://issues.apache.org/jira/browse/DRILL-7034
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.15.0
Reporter: Boaz Ben-Zvi


The JVM crashes when executing window functions over an (ordered) CSV file with a 
small format issue: an empty line.

To reproduce, take the following simple `a.csvh` file:
{noformat}
amount
10
11
{noformat}

and execute a simple window function such as:
{code:sql}
select max(amount) over(order by amount) FROM dfs.`/data/a.csvh`;
{code}

Then add an empty line between the `10` and the `11`:
{noformat}
amount
10

11
{noformat}

and try again:
{noformat}
0: jdbc:drill:zk=local> select max(amount) over(order by amount) FROM 
dfs.`/data/a.csvh`;
+---------+
| EXPR$0  |
+---------+
| 10      |
| 11      |
+---------+
2 rows selected (3.554 seconds)
0: jdbc:drill:zk=local> select max(amount) over(order by amount) FROM 
dfs.`/data/a.csvh`;
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0001064aeae7, pid=23450, tid=0x6103
#
# JRE version: Java(TM) SE Runtime Environment (8.0_181-b13) (build 
1.8.0_181-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.181-b13 mixed mode bsd-amd64 
compressed oops)
# Problematic frame:
# J 6719% C2 
org.apache.drill.exec.expr.fn.impl.ByteFunctionHelpers.memcmp(JIIJII)I (188 
bytes) @ 0x0001064aeae7 [0x0001064ae920+0x1c7]
#
# Core dump written. Default location: /cores/core or core.23450
#
# An error report file with more information is saved as:
# /Users/boazben-zvi/IdeaProjects/drill/hs_err_pid23450.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
#
Abort trap: 6 (core dumped)
{noformat}
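To rerun the reproduction, a short script can produce both variants of `a.csvh`. It writes to a temporary directory; point it at `/data`, or wherever the `dfs` workspace resolves, to match the queries. The second file name is hypothetical, since the report edits the same file in place.

The script also uses Python's `csv` module to show what the empty line amounts to: a record with no fields at all. How Drill's text reader materializes that record is not described in the report, so this only illustrates the input.

```python
import csv
import io
import os
import tempfile

GOOD = "amount\n10\n11\n"
BAD = "amount\n10\n\n11\n"  # same data with an empty line between 10 and 11


def write_repro_files(directory: str) -> dict:
    """Write the well-formed and malformed inputs; returns name -> path.
    'a_with_empty_line.csvh' is a hypothetical name for the edited file."""
    paths = {}
    for name, text in (("a.csvh", GOOD), ("a_with_empty_line.csvh", BAD)):
        path = os.path.join(directory, name)
        with open(path, "w", newline="") as f:
            f.write(text)
        paths[name] = path
    return paths


def raw_records(text: str) -> list:
    """Split text into records the way Python's csv.reader does."""
    return list(csv.reader(io.StringIO(text)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(write_repro_files(d))
    print(raw_records(GOOD))  # [['amount'], ['10'], ['11']]
    print(raw_records(BAD))   # [['amount'], ['10'], [], ['11']]
```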






[jira] [Resolved] (DRILL-7034) Window function over a malformed CSV file crashes the JVM

2019-03-11 Thread Arina Ielchiieva (JIRA)


 [ 
https://issues.apache.org/jira/browse/DRILL-7034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arina Ielchiieva resolved DRILL-7034.
-
Resolution: Fixed

> Window function over a malformed CSV file crashes the JVM 
> --
>
> Key: DRILL-7034
> URL: https://issues.apache.org/jira/browse/DRILL-7034
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.15.0
>Reporter: Boaz Ben-Zvi
>Priority: Major
> Fix For: 1.16.0
>
> Attachments: hs_err_pid23450.log, janino8470007454663483217.java
>
>





[jira] [Created] (DRILL-4512) Revisit the changes for DRILL-3404 (using SUM0 for window function)

2016-03-15 Thread Aman Sinha (JIRA)
Aman Sinha created DRILL-4512:
-

 Summary: Revisit the changes for DRILL-3404 (using SUM0 for window 
function)
 Key: DRILL-4512
 URL: https://issues.apache.org/jira/browse/DRILL-4512
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.6.0
Reporter: Aman Sinha
Assignee: Aman Sinha


DRILL-3404 was an incorrect-results issue related to the SUM0 window function over a 
nullable column containing null values. The change made in Calcite for that 
issue should be reverted: on the latest master, after reverting the 
Calcite change, I still get the correct result. The EXPLAIN plan also shows 
that the new plan differs from the old one. It seems there may have been 
nullability-related fix(es) in Calcite.

New plan after reverting the change for DRILL-3404:

{noformat}
00-00Screen
00-01  Project(c1=[$0], c2=[$1], w_sum=[$2])
00-02Project(c1=[$0], c2=[$1], w_sum=[CASE(>($2, 0), $3, null)])
00-03  SelectionVectorRemover
00-04Filter(condition=[>($2, 0)])
00-05  Window(window#0=[window(partition {1} order by [0 
ASC-nulls-first] range between UNBOUNDED PRECEDING and CURRENT ROW aggs 
[COUNT($0), $SUM0($0)])])
00-06SelectionVectorRemover
00-07  Sort(sort0=[$1], sort1=[$0], dir0=[ASC], 
dir1=[ASC-nulls-first])
00-08Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath 
[path=file:/Users/asinha/incubator-drill/exec/java-exec/src/test/resources/window/table_with_nulls.parquet]],
 
selectionRoot=file:/Users/asinha/incubator-drill/exec/java-exec/src/test/resources/window/table_with_nulls.parquet,
 numFiles=1, usedMetadataFile=false, columns=[`c1`, `c2`]]])

{noformat}

For reference, here's the old plan copied from DRILL-3404:

{noformat}
| 00-00Screen
00-01  Project(c1=[$0], c2=[$1], w_sum=[$2])
00-02Project(c1=[$0], c2=[$1], w_sum=[CASE(>($2, 0), $3, null)])
00-03  Window(window#0=[window(partition {1} order by [0 
ASC-nulls-first] range between UNBOUNDED PRECEDING and CURRENT ROW aggs 
[COUNT($0), $SUM0($0)])])
00-04SelectionVectorRemover
00-05  Sort(sort0=[$1], sort1=[$0], dir0=[ASC], 
dir1=[ASC-nulls-first])
00-06Project(c1=[$1], c2=[$0])
00-07  Scan(groupscan=[ParquetGroupScan 
[entries=[ReadEntryWithPath [path=maprfs:///tmp/tblWnulls]], 
selectionRoot=/tmp/tblWnulls, numFiles=1, columns=[`c1`, `c2`]]])
{noformat}

Notice that the two plans differ because of the extra Filter condition present in 
the new plan.
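Both plans evaluate the windowed sum as `CASE(>($2, 0), $3, null)`. Here `$2` is `COUNT($0)` and `$3` is `$SUM0($0)` over the same frame. `$SUM0` returns 0 for a frame with no non-null values, so the CASE restores SQL's null result in that situation.

The following is a minimal sketch of that rewrite for the frame in the plans: partition by c2, order by c1 with nulls first, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The function name and `(c1, c2)` tuple layout are hypothetical, and the extra Filter in the new plan is not modeled.

```python
from typing import List, Optional, Sequence, Tuple

Row = Tuple[Optional[int], str]  # hypothetical layout: (c1 nullable value/order key, c2 partition key)


def windowed_sum(rows: Sequence[Row]) -> List[Optional[int]]:
    """SUM(c1) OVER (PARTITION BY c2 ORDER BY c1 NULLS FIRST
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), computed as
    CASE WHEN COUNT(c1) > 0 THEN $SUM0(c1) ELSE NULL END.
    Results are returned in input order (O(n^2), for illustration only)."""

    def order_key(v: Optional[int]) -> Tuple[bool, int]:
        # NULLS FIRST: (False, 0) sorts before every (True, v); nulls are peers.
        return (v is not None, v if v is not None else 0)

    result: List[Optional[int]] = []
    for c1, c2 in rows:
        # RANGE frame: all rows of this partition whose order key is <= ours,
        # i.e. every preceding row plus all peers of the current row.
        frame = [v for v, p in rows if p == c2 and order_key(v) <= order_key(c1)]
        count = sum(1 for v in frame if v is not None)  # COUNT($0) skips nulls
        sum0 = sum(v for v in frame if v is not None)   # $SUM0($0): 0 when nothing to add
        result.append(sum0 if count > 0 else None)      # CASE(>($2, 0), $3, null)
    return result


if __name__ == "__main__":
    print(windowed_sum([(None, "x"), (2, "x"), (3, "x"), (None, "y")]))
    # [None, 2, 5, None]
```

The CASE is what distinguishes a genuine sum of 0 from a frame containing only nulls.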





[jira] [Created] (DRILL-3211) Assert in a query with window function and group by clause

2015-05-28 Thread Victoria Markman (JIRA)
Victoria Markman created DRILL-3211:
---

 Summary: Assert in a query with window function and group by 
clause 
 Key: DRILL-3211
 URL: https://issues.apache.org/jira/browse/DRILL-3211
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.0.0
Reporter: Victoria Markman
Assignee: Jinfeng Ni


{code}
0: jdbc:drill:schema=dfs> select sum(a1) over (partition by b1)  from t1 group 
by b1;
Error: SYSTEM ERROR: java.lang.AssertionError: Internal error: while converting 
SUM(`t1`.`a1`)
[Error Id: 21872cfa-6f09-4e92-aee6-5dd8698cf9e7 on atsqa4-133.qa.lab:31010] 
(state=,code=0)
{code}

drillbit.log
{code}
Caused by: java.lang.AssertionError: Internal error: while converting 
SUM(`t1`.`a1`)
at org.apache.calcite.util.Util.newInternal(Util.java:790) 
~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.ReflectiveConvertletTable$2.convertCall(ReflectiveConvertletTable.java:152)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertCall(SqlNodeToRexConverterImpl.java:60)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertOver(SqlToRelConverter.java:1762)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.access$1000(SqlToRelConverter.java:180)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression(SqlToRelConverter.java:3937)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.createAggImpl(SqlToRelConverter.java:2521)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertAgg(SqlToRelConverter.java:2342)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:604)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:564)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:2741)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:522)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at org.apache.calcite.prepare.PlannerImpl.convert(PlannerImpl.java:198) 
~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRel(DefaultSqlHandler.java:246)
 ~[drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:182)
 ~[drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:177)
 ~[drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:902) 
[drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:240) 
[drill-java-exec-1.0.0-mapr-r1-rebuffed.jar:1.0.0-mapr-r1]
... 3 common frames omitted
Caused by: java.lang.reflect.InvocationTargetException: null
at sun.reflect.GeneratedMethodAccessor120.invoke(Unknown Source) 
~[na:na]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[na:1.7.0_71]
at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_71]
at 
org.apache.calcite.sql2rel.ReflectiveConvertletTable$2.convertCall(ReflectiveConvertletTable.java:142)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
... 19 common frames omitted
Caused by: java.lang.AssertionError: null
at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.getRootField(SqlToRelConverter.java:3810)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.adjustInputRef(SqlToRelConverter.java:3139)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.convertIdentifier(SqlToRelConverter.java:3114)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.SqlToRelConverter.access$1400(SqlToRelConverter.java:180)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
at 
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit(SqlToRelConverter.java:4061)
 ~[calcite-core-1.1.0-drill-r7.jar:1.1.0-drill-r7]
   

[jira] [Created] (DRILL-3269) Window function query takes too long to complete and return results

2015-06-09 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-3269:
-

 Summary: Window function query takes too long to complete and 
return results
 Key: DRILL-3269
 URL: https://issues.apache.org/jira/browse/DRILL-3269
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.0.0
 Environment: 1de6aed93efce8a524964371d96673b8ef192d89
Reporter: Khurram Faraaz
Assignee: Chris Westin
Priority: Minor


A query that uses window functions takes too long to complete and return results. 
It returns close to a million records, and it took 533.8 seconds (roughly 9 
minutes). The input CSV file has two columns: one integer and one varchar column. 
Please take a look.

Size of the input CSV file:
[root@centos-01 ~]# hadoop fs -ls /tmp/manyDuplicates.csv
-rwxr-xr-x   3 root root   27889455 2015-06-10 01:26 /tmp/manyDuplicates.csv

{code}
select count(*) over(partition by cast(columns[1] as varchar(25)) order by 
cast(columns[0] as bigint)) from `manyDuplicates.csv`;
...
1,000,007 rows selected (533.857 seconds)
{code}

There are five distinct values in columns[1] of the CSV file, i.e. five 
partitions.
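The report shows an ORDER BY with no explicit frame, so the window uses RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW; the plan in this report shows exactly that frame. COUNT(*) for a row is therefore the number of rows in its partition whose order key is less than or equal to its own.

According to the generator script, the order key takes only the 99 values 1 to 99. With roughly 200,000 rows per partition, peer groups hold on the order of 2,000 rows each.

The report does not identify the cause of the slowdown. The following sketch (hypothetical function name, Python rather than Drill's Java operators) only shows that, after the sort, the frame can be evaluated in a single pass: one count per peer group.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def running_count_range(rows: Sequence[Tuple[int, str]]) -> List[int]:
    """COUNT(*) OVER (PARTITION BY part ORDER BY key) with the default
    RANGE UNBOUNDED PRECEDING .. CURRENT ROW frame.
    rows are (key, part) pairs; results are returned in sorted order."""
    out: List[int] = []
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))  # partition, then order key
    for _, partition in groupby(ordered, key=lambda r: r[1]):
        seen = 0  # rows counted so far in this partition
        for _, peers in groupby(partition, key=lambda r: r[0]):
            n = sum(1 for _ in peers)
            seen += n               # all peers lie inside each other's frame...
            out.extend([seen] * n)  # ...so they share one count
    return out


if __name__ == "__main__":
    print(running_count_range([(1, "a"), (1, "a"), (2, "a"), (5, "b")]))
    # [2, 2, 3, 1]
```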

{code}
0: jdbc:drill:schema=dfs.tmp> select distinct columns[1] from 
`manyDuplicates.csv`;
+---+
|EXPR$0 |
+---+
|   |
|   |
|   |
|   |
|   |
+---+
5 rows selected (1.906 seconds)
{code}

Here is the count for each of those values in columns[1]:

{code}
0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from 
`manyDuplicates.csv` where columns[1] = '';
+---------+
| EXPR$0  |
+---------+
| 200484  |
+---------+
1 row selected (0.961 seconds)
{code}

{code}
0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from 
`manyDuplicates.csv` where columns[1] = '';
+---------+
| EXPR$0  |
+---------+
| 199353  |
+---------+
1 row selected (0.86 seconds)
{code}

{code}
0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from 
`manyDuplicates.csv` where columns[1] = '';
+---------+
| EXPR$0  |
+---------+
| 200702  |
+---------+
1 row selected (0.826 seconds)
{code}

{code}
0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from 
`manyDuplicates.csv` where columns[1] = '';
+---------+
| EXPR$0  |
+---------+
| 199916  |
+---------+
1 row selected (0.851 seconds)
{code}

{code}
0: jdbc:drill:schema=dfs.tmp> select count(columns[1]) from 
`manyDuplicates.csv` where columns[1] = '';
+---------+
| EXPR$0  |
+---------+
| 199552  |
+---------+
1 row selected (0.827 seconds)
{code}

Query plan for the long-running query:
{code}
| 00-00Screen
00-01  UnionExchange
01-01Project(EXPR$0=[$0])
01-02  Project($0=[$2])
01-03Window(window#0=[window(partition {1} order by [0] range 
between UNBOUNDED PRECEDING and CURRENT ROW aggs [COUNT()])])
01-04  SelectionVectorRemover
01-05Sort(sort0=[$1], sort1=[$0], dir0=[ASC], dir1=[ASC])
01-06  Project($0=[$0], $1=[$1])
01-07HashToRandomExchange(dist0=[[$1]])
02-01  UnorderedMuxExchange
03-01Project($0=[$0], $1=[$1], 
E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($1))])
03-02  Project($0=[CAST(ITEM($0, 0)):BIGINT], 
$1=[CAST(ITEM($0, 1)):VARCHAR(25) CHARACTER SET "ISO-8859-1" COLLATE 
"ISO-8859-1$en_US$primary"])
03-03Scan(groupscan=[EasyGroupScan 
[selectionRoot=/tmp/manyDuplicates.csv, numFiles=1, columns=[`columns`[0], 
`columns`[1]], files=[maprfs:///tmp/manyDuplicates.csv]]])
{code}

Python script used to generate the CSV data:
{code}
import random
f = open('/Users/kungfo/manyDuplicates.csv', 'a')
for i in range(1,00):
    f.write(str(random.choice(xrange(1,100)))+','+str(random.choice(['','','','','']))+'\n')
f.flush()


{code}





[jira] [Resolved] (DRILL-3211) Assert in a query with window function and group by clause

2015-06-27 Thread Sean Hsuan-Yi Chu (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Hsuan-Yi Chu resolved DRILL-3211.
--
Resolution: Duplicate

> Assert in a query with window function and group by clause 
> ---
>
> Key: DRILL-3211
> URL: https://issues.apache.org/jira/browse/DRILL-3211
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Query Planning & Optimization
>Affects Versions: 1.0.0
>Reporter: Victoria Markman
>Assignee: Sean Hsuan-Yi Chu
>  Labels: window_function
> Fix For: 1.1.0
>
>

[jira] [Created] (DRILL-3555) Changing defaults for planner.memory.max_query_memory_per_node causes queries with window function to fail

2015-07-24 Thread Abhishek Girish (JIRA)
Abhishek Girish created DRILL-3555:
--

 Summary: Changing defaults for 
planner.memory.max_query_memory_per_node causes queries with window function to 
fail
 Key: DRILL-3555
 URL: https://issues.apache.org/jira/browse/DRILL-3555
 Project: Apache Drill
  Issue Type: Bug
  Components: Query Planning & Optimization
Affects Versions: 1.1.0, 1.2.0
 Environment: 4 Nodes. Direct Memory= 48 GB each
Reporter: Abhishek Girish
Assignee: Jinfeng Ni
Priority: Critical


Changing the default value for planner.memory.max_query_memory_per_node from 2 
GB to anything higher causes queries with window functions to fail. 

Changed system options:
{code:sql}
> select * from sys.options where status like '%CHANGE%';
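+------------------------------------------+---------+--------+---------+------------+------------+----------+-----------+
| name                                     | kind    | type   | status  | num_val    | string_val | bool_val | float_val |
+------------------------------------------+---------+--------+---------+------------+------------+----------+-----------+
| planner.enable_decimal_data_type         | BOOLEAN | SYSTEM | CHANGED | null       | null       | true     | null      |
| planner.memory.max_query_memory_per_node | LONG    | SYSTEM | CHANGED | 8589934592 | null       | null     | null      |
+------------------------------------------+---------+--------+---------+------------+------------+----------+-----------+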
2 rows selected (0.249 seconds)
{code}

Query:
{code:sql}
> SELECT SUM(ss.ss_net_paid_inc_tax) OVER (PARTITION BY ss.ss_store_sk) FROM 
> store_sales ss LIMIT 20;

java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR: 
DrillRuntimeException: Adding this batch causes the total size to exceed max 
allowed size. Current runningBytes 1073638500, Incoming batchBytes 127875. 
maxBytes 1073741824
Fragment 1:0
[Error Id: 9c2ec9cf-21c6-4d5e-b0d6-7cd59e32c49d on abhi1:31010]
at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
at 
sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:87)
at sqlline.TableOutputFormat.print(TableOutputFormat.java:118)
at sqlline.SqlLine.print(SqlLine.java:1583)
at sqlline.Commands.execute(Commands.java:852)
at sqlline.Commands.sql(Commands.java:751)
at sqlline.SqlLine.dispatch(SqlLine.java:738)
at sqlline.SqlLine.begin(SqlLine.java:612)
at sqlline.SqlLine.start(SqlLine.java:366)
at sqlline.SqlLine.main(SqlLine.java:259)
{code}
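The numbers in the error can be checked directly. The running total plus the incoming batch overshoots maxBytes by 24,551 bytes. maxBytes is exactly 2^30 bytes (1 GiB), well below the 8589934592 bytes (8 GiB) configured for planner.memory.max_query_memory_per_node.

The function below is a hypothetical restatement of the check the message describes, not Drill's SortRecordBatchBuilder code. Whether Drill compares with `>` or `>=` is an assumption that makes no difference for these values.

```python
GIB = 2 ** 30


def exceeds_limit(running_bytes: int, incoming_bytes: int, max_bytes: int) -> bool:
    """Hypothetical check: would adding the incoming batch push the
    running total past the allowed maximum?"""
    return running_bytes + incoming_bytes > max_bytes


running, incoming, max_bytes = 1073638500, 127875, 1073741824
print(running + incoming)                           # 1073766375
print(running + incoming - max_bytes)               # 24551
print(max_bytes == GIB)                             # True
print(8589934592 == 8 * GIB)                        # True
print(exceeds_limit(running, incoming, max_bytes))  # True
```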


Log:
{code}
2015-07-23 18:16:52,292 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:2:2] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:2:2: 
State change requested RUNNING --> FINISHED

2015-07-23 18:16:52,292 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:2:2] INFO  
o.a.d.e.w.f.FragmentStatusReporter - 2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:2:2: 
State to report: FINISHED

2015-07-23 18:17:05,485 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:1:0] ERROR 
o.a.d.e.p.i.s.SortRecordBatchBuilder - Adding this batch causes the total size 
to exceed max allowed size. Current runningBytes 1073638500, Incoming 
batchBytes 127875. maxBytes 1073741824

2015-07-23 18:17:05,486 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:1:0] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:1:0: 
State change requested RUNNING --> FAILED

...

2015-07-23 18:17:05,990 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:1:0] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:1:0: 
State change requested FAILED --> FINISHED

2015-07-23 18:17:05,999 [2a4e6e2e-8cfa-ed8f-de56-e6c5517b5da6:frag:1:0] ERROR 
o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: DrillRuntimeException: 
Adding this batch causes the total size to exceed max allowed size. Current 
runningBytes 1073638500, Incoming batchBytes 127875. maxBytes 1073741824

Fragment 1:0

[Error Id: 9c2ec9cf-21c6-4d5e-b0d6-7cd59e32c49d on abhi1:31010]

org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
DrillRuntimeException: Adding this batch causes the total size to exceed max 
allowed size. Current runningBytes 1073638500, Incoming batchBytes 127875. 
maxBytes 1073741824

Fragment 1:0

[Error Id: 9c2ec9cf-21c6-4d5e-b0d6-7cd59e32c49d on abhi1:31010]

at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:523)
 ~[drill-common-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]

at 
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:323)
 [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]

at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:178)
 [drill-java-exec-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]

at 
org.apache.drill.exec.wo
