Thanks for the links and guidance Paul. I will use your advice and see if I can get past the limitation.
Thanks, Nitin On Thu, Nov 7, 2019 at 1:00 PM Paul Rogers <par0...@yahoo.com.invalid> wrote: > Hi Nitin, > > As it turns out, I just had to fix a bug in the windowing operator. I'm > not an expert on this operator, but perhaps I can offer a suggestion or two. > > We have a few existing unit tests for window functions in TestWindowFrame. > They are a bit hard to follow, however. Take a look at testFix3605(), which > does: > > select > col2, > lead(col2) over(partition by col2 order by col0) as lead_col2 > from > dfs.`window/fewRowsAllData.parquet` > > When executed, we use the NoFrameSupportTemplate [1] class to do the work. > Specifically, processPartition() contains code that handles the lead/lag by > 1 case. > > I found it useful to enable saving of the generated code: [2]. When you > step into the generated code, in Eclipse, you can set the source path to > include /tmp/drill/codegen (on Linux/Mac). You can then see the contents of > the generated copyPrev() and copyNext() functions. > > If you step into these generated functions, you can see that the code > simply takes two indexes: a to and a from, then copies the data from one to > the other. As a result, I suspect that you do not need to change the > generated code to achieve your goal. > > Instead, you may want to change the processPartition() function. Instead > of the simple +/-1 logic it currently has, use your lead/lag offset instead. > > By the way, a handy way to share work is simply to push your work to your > private GitHub repo, then link to that code. > > Thanks, > - Paul > > > [1] > https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/NoFrameSupportTemplate.java#L139 > [2] > https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/window/WindowFrameRecordBatch.java#L366 > > On Wednesday, November 6, 2019, 10:04:12 PM PST, Nitin Pawar < > nitinpawar...@gmail.com> wrote: > > any help on this? > > > On Tue, Nov 5, 2019 at 7:09 PM Nitin Pawar <nitinpawar...@gmail.com> > wrote: > > > Ohh ok > > let me provide a google drive url > > Here > > < > https://drive.google.com/file/d/1no-xq40Q2HEoWx5xG1t5dOx5BKTFH72F/view?usp=sharing > > > > is the link. Can you check if can access it. > > > > Thanks, > > Nitin > > > > On Tue, Nov 5, 2019 at 7:02 PM Charles Givre <cgi...@gmail.com> wrote: > > > >> Hi Nitin, > >> It seems to have been filtered out. > >> > >> > >> > On Nov 5, 2019, at 8:29 AM, Nitin Pawar <nitinpawar...@gmail.com> > >> wrote: > >> > > >> > Hi Charles, > >> > > >> > I have attached git patch. > >> > I was currently doing for lag function only for testing purposes > >> > > >> > Thanks, > >> > Nitin > >> > > >> > On Tue, Nov 5, 2019 at 6:34 PM Charles Givre <cgi...@gmail.com > <mailto: > >> cgi...@gmail.com>> wrote: > >> > Hi Nitin, > >> > Thanks for your question. Could you/did you share your code? If not, > >> could you please post a draft PR so that we can take a look and offer > >> suggestions? > >> > Thanks, > >> > -- C > >> > > >> > > >> > > On Nov 5, 2019, at 7:27 AM, Nitin Pawar <nitinpawar...@gmail.com > >> <mailto:nitinpawar...@gmail.com>> wrote: > >> > > > >> > > Hi Devs, > >> > > > >> > > I had sent request for this almost 2.5 years ago. Trying it again > now. > >> > > > >> > > Currently Apache drill window functions LEAD and LAG support offset > >> as 1. > >> > > In another words in a given window these functions can return either > >> > > previous or next row only. > >> > > > >> > > > >> > > I am trying modify the behavior these function and allow offset >=1 > in > >> > > query such as > >> > > select employee_id, department_id,salary, lag(salary,*4*) > >> over(partition by > >> > > department_id order by salary asc) from cp.`employee.json`; > >> > > > >> > > I have managed to remove the limitation which fails the query can > not > >> have > >> > > offset > 1 and able to pass the offset to actual function > >> implementation. > >> > > > >> > > Currently I am stuck where the record processor is crossing the > window > >> > > boundary of department_id and gets row from next/previous window in > >> > > lead/lag function > >> > > > >> > > For eg: If you notice in row 2 for department_id=2, it is getting > >> previous > >> > > windows of department_id=1 > >> > > > >> > > Here is sample output for below query > >> > > apache drill> select employee_id, department_id,salary, > lag(salary,4) > >> > > over(partition by department_id order by salary asc) from > >> > > cp.`employee.json` where department_id <=3; > >> > > +-------------+---------------+---------+----------+ > >> > > | employee_id | department_id | salary | EXPR$3 | > >> > > +-------------+---------------+---------+----------+ > >> > > | 20 | 1 | 30000.0 | null | > >> > > | 5 | 1 | 35000.0 | null | > >> > > | 22 | 1 | 35000.0 | null | > >> > > | 21 | 1 | 35000.0 | null | > >> > > | 2 | 1 | 40000.0 | 30000.0 | > >> > > | 4 | 1 | 40000.0 | 35000.0 | > >> > > | 1 | 1 | 80000.0 | 35000.0 | > >> > > | 37 | 2 | 6700.0 | null | > >> > > | 38 | 2 | 8000.0 | 40000.0 | > >> > > | 39 | 2 | 10000.0 | 40000.0 | > >> > > | 40 | 2 | 10000.0 | 80000.0 | > >> > > | 6 | 2 | 25000.0 | 6700.0 | > >> > > | 42 | 3 | 5000.0 | null | > >> > > | 41 | 3 | 8500.0 | 10000.0 | > >> > > | 7 | 3 | 15000.0 | 10000.0 | > >> > > | 36 | 3 | 45000.0 | 25000.0 | > >> > > +-------------+---------------+---------+----------+ > >> > > > >> > > > >> > > Thanks, > >> > > Nitin Pawar > >> > > >> > > >> > > >> > -- > >> > Nitin Pawar > >> > >> > > > > -- > > Nitin Pawar > > > > > -- > Nitin Pawar > -- Nitin Pawar