Thanks for the links and guidance Paul.
I will  use your advice and see if I can get past the limitation.

Thanks,
Nitin

On Thu, Nov 7, 2019 at 1:00 PM Paul Rogers <par0...@yahoo.com.invalid>
wrote:

> Hi Nitin,
>
> As it turns out, I just had to fix a bug in the windowing operator. I'm
> not an expert on this operator, but perhaps I can offer a suggestion or two.
>
> We have a few existing unit tests for window functions in TestWindowFrame.
> They are a bit hard to follow, however. Take a look at testFix3605(), which
> does:
>
> select
>   col2,
>   lead(col2) over(partition by col2 order by col0) as lead_col2
> from
>   dfs.`window/fewRowsAllData.parquet`
>
> When executed, we use the NoFrameSupportTemplate [1] class to do the work.
> Specifically, processPartition() contains code that handles the lead/lag by
> 1 case.
>
> I found it useful to enable saving of the generated code: [2]. When you
> step into the generated code, in Eclipse, you can set the source path to
> include /tmp/drill/codegen (on Linux/Mac). You can then see the contents of
> the generated copyPrev() and copyNext() functions.
>
> If you step into these generated functions, you can see that the code
> simply takes two indexes: a to and a from, then copies the data from one to
> the other. As a result, I suspect that you do not need to change the
> generated code to achieve your goal.
>
> Instead, you may want to change the processPartition() function. Instead
> of the simple +/-1 logic it currently has, use your lead/lag offset instead.
>
> By the way, a handy way to share work is simply to push your work to your
> private GitHub repo, then link to that code.
>
> Thanks,
> - Paul
>
>
> [1]
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/NoFrameSupportTemplate.java#L139
> [2]
> https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java#L366
>
>     On Wednesday, November 6, 2019, 10:04:12 PM PST, Nitin Pawar <
> nitinpawar...@gmail.com> wrote:
>
>  any help on this?
>
>
> On Tue, Nov 5, 2019 at 7:09 PM Nitin Pawar <nitinpawar...@gmail.com>
> wrote:
>
> > Ohh ok
> > let me provide a google drive url
> > Here
> > <
> https://drive.google.com/file/d/1no-xq40Q2HEoWx5xG1t5dOx5BKTFH72F/view?usp=sharing
> >
> > is the link. Can you check if can access it.
> >
> > Thanks,
> > Nitin
> >
> > On Tue, Nov 5, 2019 at 7:02 PM Charles Givre <cgi...@gmail.com> wrote:
> >
> >> Hi Nitin,
> >> It seems to have been filtered out.
> >>
> >>
> >> > On Nov 5, 2019, at 8:29 AM, Nitin Pawar <nitinpawar...@gmail.com>
> >> wrote:
> >> >
> >> > Hi Charles,
> >> >
> >> > I have attached git patch.
> >> > I was currently doing for lag function only for testing purposes
> >> >
> >> > Thanks,
> >> > Nitin
> >> >
> >> > On Tue, Nov 5, 2019 at 6:34 PM Charles Givre <cgi...@gmail.com
> <mailto:
> >> cgi...@gmail.com>> wrote:
> >> > Hi Nitin,
> >> > Thanks for your question.  Could you/did you share your code?  If not,
> >> could you please post a draft PR so that we can take a look and offer
> >> suggestions?
> >> > Thanks,
> >> > -- C
> >> >
> >> >
> >> > > On Nov 5, 2019, at 7:27 AM, Nitin Pawar <nitinpawar...@gmail.com
> >> <mailto:nitinpawar...@gmail.com>> wrote:
> >> > >
> >> > > Hi Devs,
> >> > >
> >> > > I had sent request for this almost 2.5 years ago. Trying it again
> now.
> >> > >
> >> > > Currently Apache drill window functions LEAD and LAG support offset
> >> as 1.
> >> > > In another words in a given window these functions can return either
> >> > > previous or next row only.
> >> > >
> >> > >
> >> > > I am trying modify the behavior these function and allow offset >=1
> in
> >> > > query such as
> >> > > select employee_id, department_id,salary, lag(salary,*4*)
> >> over(partition by
> >> > > department_id order by salary asc) from  cp.`employee.json`;
> >> > >
> >> > > I have managed to remove the limitation which fails the query can
> not
> >> have
> >> > > offset > 1 and able to pass the offset to actual function
> >> implementation.
> >> > >
> >> > > Currently I am stuck where the record processor is crossing the
> window
> >> > > boundary of department_id and gets row from next/previous window in
> >> > > lead/lag function
> >> > >
> >> > > For eg: If you notice in row 2 for department_id=2, it is getting
> >> previous
> >> > > windows of department_id=1
> >> > >
> >> > > Here is sample output for below query
> >> > > apache drill> select  employee_id, department_id,salary,
> lag(salary,4)
> >> > > over(partition by department_id order by salary asc) from
> >> > > cp.`employee.json` where department_id <=3;
> >> > > +-------------+---------------+---------+----------+
> >> > > | employee_id | department_id | salary  |  EXPR$3  |
> >> > > +-------------+---------------+---------+----------+
> >> > > | 20          | 1            | 30000.0 | null    |
> >> > > | 5          | 1            | 35000.0 | null    |
> >> > > | 22          | 1            | 35000.0 | null    |
> >> > > | 21          | 1            | 35000.0 | null    |
> >> > > | 2          | 1            | 40000.0 | 30000.0  |
> >> > > | 4          | 1            | 40000.0 | 35000.0  |
> >> > > | 1          | 1            | 80000.0 | 35000.0  |
> >> > > | 37          | 2            | 6700.0  | null    |
> >> > > | 38          | 2            | 8000.0  | 40000.0  |
> >> > > | 39          | 2            | 10000.0 | 40000.0  |
> >> > > | 40          | 2            | 10000.0 | 80000.0  |
> >> > > | 6          | 2            | 25000.0 | 6700.0  |
> >> > > | 42          | 3            | 5000.0  | null    |
> >> > > | 41          | 3            | 8500.0  | 10000.0  |
> >> > > | 7          | 3            | 15000.0 | 10000.0  |
> >> > > | 36          | 3            | 45000.0 | 25000.0  |
> >> > > +-------------+---------------+---------+----------+
> >> > >
> >> > >
> >> > > Thanks,
> >> > > Nitin Pawar
> >> >
> >> >
> >> >
> >> > --
> >> > Nitin Pawar
> >>
> >>
> >
> > --
> > Nitin Pawar
> >
>
>
> --
> Nitin Pawar
>



-- 
Nitin Pawar

Reply via email to