[GitHub] [arrow] DavZim commented on issue #14732: [R] Filter operations not shown when called before summarise

via GitHub Tue, 07 Feb 2023 23:31:19 -0800


DavZim commented on issue #14732:
URL: https://github.com/apache/arrow/issues/14732#issuecomment-1422150624


   I'll try to look into it later this week or next.
   While we're at it, I think it would be good to show the file in the printout
   as well.
   If we have two files with the same data structure but different data and
   different names, any caching keyed on the printout of the query (which is how
   I think it works) could collide between the two.
   For example, taking the example above, I would suggest something like this as
   the output:
   
   ```
   library(arrow)
   library(dplyr)
   ds_file <- file.path(tempdir(), "mtcars")
   
   write_dataset(mtcars |> select(mpg, cyl), ds_file)
   ds <- open_dataset(ds_file)
   
   ds |> filter(mpg > 25)
   #> FileSystemDataset (query) File: /tmp/RtmptUJio4/mtcars  #<===== added this
   #> mpg: double
   #> cyl: double
   #> 
   #> * Filter: (mpg > 25)
   #> See $.data for the source Arrow object
   ```
   Do you think this is worth it, and should it be included in this PR/issue as
   well, or should I open a new issue?
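
   To make the cache-overlap concern concrete, here is a minimal sketch. It
   assumes that any cache is keyed on the printed query, which I have not
   verified; the paths `mtcars_a` and `mtcars_b` and the `out_a`/`out_b` names
   are made up for illustration.

   ```r
   library(arrow)
   library(dplyr)

   # Hypothetical paths: two datasets with the same schema but different rows.
   path_a <- file.path(tempdir(), "mtcars_a")
   path_b <- file.path(tempdir(), "mtcars_b")
   write_dataset(mtcars |> select(mpg, cyl) |> head(16), path_a)
   write_dataset(mtcars |> select(mpg, cyl) |> tail(16), path_b)

   # Capture the query printout for the same filter on each dataset.
   out_a <- capture.output(print(open_dataset(path_a) |> filter(mpg > 25)))
   out_b <- capture.output(print(open_dataset(path_b) |> filter(mpg > 25)))

   # If the printout does not mention the source file, this comparison is TRUE,
   # and a cache keyed on the printout could not tell the two queries apart.
   identical(out_a, out_b)
   ```

   With the file path in the first line of the printout, the two captured
   outputs would differ as long as the datasets live in different locations.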


