[ https://issues.apache.org/jira/browse/ARROW-14162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421912#comment-17421912 ]
Weston Pace commented on ARROW-14162:
-------------------------------------

The call to `head` is triggering an (immediate?) call to the legacy scanner head method. The resulting dataset is then returned, and the remaining dplyr execution is resolved against the in-memory data. ExecPlan is not used at all. So it is first fetching the first 4 rows and then sorting, instead of sorting and then fetching.

If this is truly a blocker for 6.0.0, then it might be a problem. The head can't be applied in R because that would require reading in all of the data (presumably you could abort the read partway through, but I think this would be overly complex). If we want to do a proper ordered head in C++, then my recommendation would be the batch index scheme proposed in the sequencing doc [here](https://docs.google.com/document/d/1MfVE9td9D4n5y-PTn66kk4-9xG7feXs1zSFf-qxQgPs/edit?usp=sharing), but I'm not sure we want to tackle that as part of 6.0.0.

As a short-term solution, we can modify the sorting sink node to accept a limit argument. That should be a reasonably quick solution and could maybe fit in 6.0.0, but I'm not sure how much time we want to invest in stop-gap measures.

> [R] Simple arrange %>% head does not respect ordering
> -----------------------------------------------------
>
>                 Key: ARROW-14162
>                 URL: https://issues.apache.org/jira/browse/ARROW-14162
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>            Reporter: Weston Pace
>            Priority: Blocker
>
> This was originally reported by [~jonkeane] in ARROW-13893 but that issue was
> covering a different topic so I am opening a new issue for this specific
> behavior.
> {code:r}
> > library(arrow)
> > library(dplyr)
> > 
> > tab <- Table$create(mtcars)
> > 
> > tab %>%
> +   arrange(mpg) %>%
> +   head(4) %>%
> +   collect()
>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
> > 
> > mtcars %>%
> +   arrange(mpg) %>%
> +   head(4) %>%
> +   collect()
>                      mpg cyl disp  hp drat    wt  qsec vs am gear carb
> Cadillac Fleetwood  10.4   8  472 205 2.93 5.250 17.98  0  0    3    4
> Lincoln Continental 10.4   8  460 215 3.00 5.424 17.82  0  0    3    4
> Camaro Z28          13.3   8  350 245 3.73 3.840 15.41  0  0    3    4
> Duster 360          14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
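The difference between "fetch the first rows, then sort" and "sort, then fetch the first rows" can be illustrated without arrow at all. The sketch below uses only base R and the built-in `mtcars` data; the variable names (`cars_df`, `first_four`, and so on) are illustrative. Sorting first yields the four lowest-mpg cars, matching the data.frame output in the issue, while taking four rows first can at best reorder the first four rows of the table. The four rows in the Arrow output of the issue are exactly the first four rows of `mtcars`.

```r
# mtcars ships with base R's datasets package.
cars_df <- datasets::mtcars

# Intended semantics of arrange(mpg) %>% head(4): sort everything, then take 4 rows.
sort_then_head <- head(cars_df[order(cars_df$mpg), ], 4)

# Order of operations described in the comment: take 4 rows, then sort only those.
first_four <- head(cars_df, 4)
head_then_sort <- first_four[order(first_four$mpg), ]

print(sort_then_head[, c("mpg", "cyl")])
print(head_then_sort[, c("mpg", "cyl")])
```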
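The short-term fix suggested in the comment, a sorting sink node that accepts a limit, amounts to a top-k selection over a stream of batches. Below is a minimal base-R sketch of that idea, assuming the input arrives as a list of data-frame batches; `topk_sink` is a hypothetical name and not part of the arrow package or Arrow's C++ ExecPlan API. Each incoming batch is merged with the rows kept so far and trimmed back to `limit` rows, so at most `limit` rows plus one batch are held at a time rather than the whole dataset.

```r
# Hypothetical top-k sink: keeps only the `limit` smallest rows by `key`
# across a sequence of batches (not an arrow API).
topk_sink <- function(batches, key, limit) {
  kept <- NULL
  for (batch in batches) {
    # Combine the rows retained so far with the newly arrived batch.
    candidates <- if (is.null(kept)) batch else rbind(kept, batch)
    # Sort the candidates by the key and retain only the first `limit` rows.
    candidates <- candidates[order(candidates[[key]]), , drop = FALSE]
    kept <- head(candidates, limit)
  }
  kept
}

# Split mtcars into batches of 8 rows to mimic a scan that yields record batches.
cars_df <- datasets::mtcars
batch_id <- ceiling(seq_len(nrow(cars_df)) / 8)
batches <- split(cars_df, batch_id)

result <- topk_sink(batches, key = "mpg", limit = 4)
print(result[, c("mpg", "cyl")])
```

Re-sorting the kept rows on every batch is the simplest form; a real sink would more likely use a heap or partial sort. For this data the result matches `head(mtcars[order(mtcars$mpg), ], 4)`.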