Hi Allan,

I think I found an answer to your problem:

1) Modify PhysicalPlanResetter by adding:

    @Override
    public void visitCounter(POCounter counter) throws VisitorException {
        counter.reset();
    }

2) Modify POCounter by adding:

    @Override
    public void reset() {
        localCount = 0L;
        taskID = "-1";
        incrementer = 1;
    }

I get this result on this file + script:

------------------------------------------
| a     | id:int    | value:chararray    |
------------------------------------------
|       | 6         | g                  |
------------------------------------------
-----------------------------------------------------------
| b     | rank_a:long    | id:int    | value:chararray    |
-----------------------------------------------------------
|       | 1              | 6         | g                  |
-----------------------------------------------------------

grunt> cat file.txt
1 a
2 b
3 c
3 d
4 e
6 f
6 g
8 h
grunt> a = load 'file.txt' as (id:int, value:chararray);
grunt> b = rank a;
grunt> illustrate b

Hope it helps.

Cheers,
--
Gianmarco


On Tue, Aug 7, 2012 at 4:34 PM, Allan <aaven...@gmail.com> wrote:

> Hi to everybody!
>
> I'm working on the implementation of the rank operator, which successfully
> passed all the e2e tests on a cluster.
> The rank operator is composed of two physical operators, POCounter and
> PORank, and it provides two functionalities:
>
> 1) The first functionality is similar to ROW NUMBER in SQL, which
> assigns a sequential number to each tuple.
> This is implemented by two map-only jobs (one for each physical
> operator).
>
> - POCounter adds to each tuple the identifier of the task that is
> processing it and a local counter. Furthermore, POCounter registers the
> total number of tuples processed by each task, through the use of global
> counters. After POCounter has finished, the cumulative sum is calculated,
> which is the sum of the total tuples processed by the previous tasks,
> i.e. for task0 the cumulative sum is 0 (there are no tuples before it),
> for task1 the cumulative sum is the number of tuples processed by task0
> (the only task before it is task0), and so on.
>
> - Finally, PORank reads the cumulative sum corresponding to the task
> id of each tuple and adds it to the local counter of the tuple.
>
> An input example for POCounter could be:
>
> (1,n,5)
> (8,a,0)
> (0,b,9)
>
> The result of POCounter, and the input to PORank:
>
> (0,1,1,n,5)
> (0,2,8,a,0)
> (0,3,0,b,9)
>
> and the result after PORank processing:
>
> (1,1,n,5)
> (2,8,a,0)
> (3,0,b,9)
>
>
> 2) The second functionality is RANK BY, which is based on a set of
> ordered columns, and it requires another methodology:
> First, the dataset is grouped by the desired columns. Then, this result is
> sorted by the columns specified. At the end, this result is processed
> by POCounter and PORank.
> As in the previous case, POCounter adds to each tuple the task identifier
> and the local counter. But here, the local counter is not incremented
> sequentially. Instead, the number of tuples in the bag (produced by the
> previous "group by") is added to it.
> Another change is that the global counter is also incremented by the
> size of the bag in each tuple.
>
> Finally, PORank does the same as in the previous implementation, without
> change. After that, the rank column is spread to each component of the bag
> with a foreach operator.
>
> An input example for POCounter (after sorting and grouping):
> In this case, I would like to rank by the first column.
>
> (0,{(0,b,9)})
> (1,{(1,n,5)})
> (8,{(8,a,0)})
>
> After being processed by POCounter, the input example for PORank is:
>
> (0,1,0,{(0,b,9)})
> (0,2,1,{(1,n,5)})
> (0,3,8,{(8,a,0)})
>
> Then, the result after PORank:
>
> (1,0,{(0,b,9)})
> (2,1,{(1,n,5)})
> (3,8,{(8,a,0)})
>
> Finally, the rank value is spread to each element of the bag through a
> foreach operator, resulting in:
>
> (1,0,b,9)
> (2,1,n,5)
> (3,8,a,0)
>
> After testing some options, I found a way to illustrate the rank operator,
> but I have some problems:
>
> 1.- I guess that, due to the illustrator algorithm, the tuples produced by
> POCounter have counter values two or three times higher than expected,
> for example:
> (0,38,1,n,5)
> (0,39,8,a,0)
> (0,40,0,b,9)
>
> 2.- Until now, I get only 1 example tuple from illustrate. How could I
> get at least three or four tuples as a result?
>
> Thanks in advance for your replies,
>
> --
>
> Allan Avendaño S.
> Computer Engineer
> Ex-SWY22 Participant
> Rome - Italy
> Gmail: aaven...@gmail.com
> --
>
>
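
For reference, the ROW NUMBER flow described in the quoted message (local counter per task, cumulative sum over the per-task totals, then offset + local counter) can be sketched in plain Java roughly as below. This is only an illustration under assumptions, not the actual POCounter/PORank code: the class RankSketch, its methods and the Counted holder are hypothetical names, tuples are modelled as plain strings, and the two task-1 tuples are made up to show a non-zero offset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the two-step rank described above; not Pig code.
final class RankSketch {

    // What the counter step emits: task id, local counter, original tuple.
    static final class Counted {
        final int taskId;
        final long localCount;
        final String payload;

        Counted(int taskId, long localCount, String payload) {
            this.taskId = taskId;
            this.localCount = localCount;
            this.payload = payload;
        }
    }

    // Counter step: tag each tuple with its task id and a 1-based local counter.
    static List<Counted> count(int taskId, List<String> tuples) {
        List<Counted> out = new ArrayList<>();
        long local = 0L;
        for (String t : tuples) {
            local += 1; // row-number mode: always increment by one
            out.add(new Counted(taskId, local, t));
        }
        return out;
    }

    // Cumulative sum: offsets[i] is the number of tuples seen by tasks 0..i-1.
    static long[] cumulativeOffsets(long[] totalsPerTask) {
        long[] offsets = new long[totalsPerTask.length];
        for (int i = 1; i < totalsPerTask.length; i++) {
            offsets[i] = offsets[i - 1] + totalsPerTask[i - 1];
        }
        return offsets;
    }

    // Rank step: offset of the tuple's task plus its local counter.
    static long rank(Counted c, long[] offsets) {
        return offsets[c.taskId] + c.localCount;
    }

    public static void main(String[] args) {
        // Task 0 gets the three tuples from the quoted example.
        List<Counted> task0 = count(0, Arrays.asList("(1,n,5)", "(8,a,0)", "(0,b,9)"));
        // Task 1 gets two made-up tuples (assumption, not from the thread).
        List<Counted> task1 = count(1, Arrays.asList("(7,x,1)", "(2,y,4)"));

        long[] offsets = cumulativeOffsets(new long[] {task0.size(), task1.size()});
        for (List<Counted> task : Arrays.asList(task0, task1)) {
            for (Counted c : task) {
                // Prepend the rank to the original tuple text.
                System.out.println("(" + rank(c, offsets) + "," + c.payload.substring(1));
            }
        }
    }
}
```

In this sketch the local counter restarts at zero on every call to count. If the same counter state were carried over between runs instead, the local values would keep growing, which is the kind of effect the reset() above is meant to avoid. For the RANK BY case, the quoted description suggests incrementing by the bag size instead of by one; that variant is not shown here.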