Hi Nitin,

As it turns out, I just had to fix a bug in the windowing operator. I'm not an 
expert on this operator, but perhaps I can offer a suggestion or two.

We have a few existing unit tests for window functions in TestWindowFrame. They 
are a bit hard to follow, however. Take a look at testFix3605(), which does:

select
  col2,
  lead(col2) over(partition by col2 order by col0) as lead_col2
from
  dfs.`window/fewRowsAllData.parquet`

When executed, we use the NoFrameSupportTemplate [1] class to do the work. 
Specifically, processPartition() contains code that handles the lead/lag by 1 
case.

I found it useful to enable saving of the generated code: [2]. When you step 
into the generated code, in Eclipse, you can set the source path to include 
/tmp/drill/codegen (on Linux/Mac). You can then see the contents of the 
generated copyPrev() and copyNext() functions.

If you step into these generated functions, you can see that the code simply 
takes two indexes: a to and a from, then copies the data from one to the other. 
As a result, I suspect that you do not need to change the generated code to 
achieve your goal.

Instead, you may want to change the processPartition() function. Instead of the 
simple +/-1 logic it currently has, use your lead/lag offset instead.

By the way, a handy way to share work is simply to push your work to your 
private GitHub repo, then link to that code.

Thanks,
- Paul


[1] 
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/NoFrameSupportTemplate.java#L139
[2] 
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java#L366
 

    On Wednesday, November 6, 2019, 10:04:12 PM PST, Nitin Pawar 
<nitinpawar...@gmail.com> wrote:  
 
 any help on this?


On Tue, Nov 5, 2019 at 7:09 PM Nitin Pawar <nitinpawar...@gmail.com> wrote:

> Ohh ok
> let me provide a google drive url
> Here
> <https://drive.google.com/file/d/1no-xq40Q2HEoWx5xG1t5dOx5BKTFH72F/view?usp=sharing>
> is the link. Can you check if can access it.
>
> Thanks,
> Nitin
>
> On Tue, Nov 5, 2019 at 7:02 PM Charles Givre <cgi...@gmail.com> wrote:
>
>> Hi Nitin,
>> It seems to have been filtered out.
>>
>>
>> > On Nov 5, 2019, at 8:29 AM, Nitin Pawar <nitinpawar...@gmail.com>
>> wrote:
>> >
>> > Hi Charles,
>> >
>> > I have attached git patch.
>> > I was currently doing for lag function only for testing purposes
>> >
>> > Thanks,
>> > Nitin
>> >
>> > On Tue, Nov 5, 2019 at 6:34 PM Charles Givre <cgi...@gmail.com <mailto:
>> cgi...@gmail.com>> wrote:
>> > Hi Nitin,
>> > Thanks for your question.  Could you/did you share your code?  If not,
>> could you please post a draft PR so that we can take a look and offer
>> suggestions?
>> > Thanks,
>> > -- C
>> >
>> >
>> > > On Nov 5, 2019, at 7:27 AM, Nitin Pawar <nitinpawar...@gmail.com
>> <mailto:nitinpawar...@gmail.com>> wrote:
>> > >
>> > > Hi Devs,
>> > >
>> > > I had sent request for this almost 2.5 years ago. Trying it again now.
>> > >
>> > > Currently Apache drill window functions LEAD and LAG support offset
>> as 1.
>> > > In another words in a given window these functions can return either
>> > > previous or next row only.
>> > >
>> > >
>> > > I am trying modify the behavior these function and allow offset >=1 in
>> > > query such as
>> > > select employee_id, department_id,salary, lag(salary,*4*)
>> over(partition by
>> > > department_id order by salary asc) from  cp.`employee.json`;
>> > >
>> > > I have managed to remove the limitation which fails the query can not
>> have
>> > > offset > 1 and able to pass the offset to actual function
>> implementation.
>> > >
>> > > Currently I am stuck where the record processor is crossing the window
>> > > boundary of department_id and gets row from next/previous window in
>> > > lead/lag function
>> > >
>> > > For eg: If you notice in row 2 for department_id=2, it is getting
>> previous
>> > > windows of department_id=1
>> > >
>> > > Here is sample output for below query
>> > > apache drill> select  employee_id, department_id,salary, lag(salary,4)
>> > > over(partition by department_id order by salary asc) from
>> > > cp.`employee.json` where department_id <=3;
>> > > +-------------+---------------+---------+----------+
>> > > | employee_id | department_id | salary  |  EXPR$3  |
>> > > +-------------+---------------+---------+----------+
>> > > | 20          | 1            | 30000.0 | null    |
>> > > | 5          | 1            | 35000.0 | null    |
>> > > | 22          | 1            | 35000.0 | null    |
>> > > | 21          | 1            | 35000.0 | null    |
>> > > | 2          | 1            | 40000.0 | 30000.0  |
>> > > | 4          | 1            | 40000.0 | 35000.0  |
>> > > | 1          | 1            | 80000.0 | 35000.0  |
>> > > | 37          | 2            | 6700.0  | null    |
>> > > | 38          | 2            | 8000.0  | 40000.0  |
>> > > | 39          | 2            | 10000.0 | 40000.0  |
>> > > | 40          | 2            | 10000.0 | 80000.0  |
>> > > | 6          | 2            | 25000.0 | 6700.0  |
>> > > | 42          | 3            | 5000.0  | null    |
>> > > | 41          | 3            | 8500.0  | 10000.0  |
>> > > | 7          | 3            | 15000.0 | 10000.0  |
>> > > | 36          | 3            | 45000.0 | 25000.0  |
>> > > +-------------+---------------+---------+----------+
>> > >
>> > >
>> > > Thanks,
>> > > Nitin Pawar
>> >
>> >
>> >
>> > --
>> > Nitin Pawar
>>
>>
>
> --
> Nitin Pawar
>


-- 
Nitin Pawar
  

Reply via email to