DavZim commented on issue #14732: URL: https://github.com/apache/arrow/issues/14732#issuecomment-1422150624
I'll try to look into it later this week or next.

While at it, I think it would be good to show the file in the printout as well. That is, if we have two files with the same data structure but different data and different names, any caching of the query (because it works on the printout of the query, I think) might produce a cache collision.

For example, taking the example above, I would suggest something like this as output:

```
library(arrow)
library(dplyr)

ds_file <- file.path(tempdir(), "mtcars")
write_dataset(mtcars |> select(mpg, cyl), ds_file)
ds <- open_dataset(ds_file)

ds |> filter(mpg > 25)
#> FileSystemDataset (query)
#> File: /tmp/RtmptUJio4/mtcars   #<===== added this
#> mpg: double
#> cyl: double
#>
#> * Filter: (mpg > 25)
#> See $.data for the source Arrow object
```

Do you think this is worth it, and should it be included in this PR/issue as well, or should I open a new issue?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
