Thomas Mock created ARROW-16777:
-----------------------------------

Summary: printing data in Table/RecordBatch print method
Key: ARROW-16777
URL: https://issues.apache.org/jira/browse/ARROW-16777
Project: Apache Arrow
Issue Type: Improvement
Components: Python, R
Reporter: Thomas Mock
This is related to ARROW-16776; after a brief discussion, Neal Richardson asked me to split the improvement request into separate issues.

When working with Arrow datasets/tables, I often want to interactively print or "see" the results of a query, or the first few rows of the data, without having to collect everything into memory. It would be ideal if the Table/RecordBatch print methods lazily showed some of the data. Currently, however, the print methods show only the schema, without any data. For example:

``` r
library(dplyr)
library(arrow)

mtcars %>% arrow::write_parquet("mtcars.parquet")
car_ds <- arrow::open_dataset("mtcars.parquet")

car_ds
#> FileSystemDataset with 1 Parquet file
#> mpg: double
#> cyl: double
#> disp: double
#> hp: double
#> drat: double
#> wt: double
#> qsec: double
#> vs: double
#> am: double
#> gear: double
#> carb: double
#> 
#> See $metadata for additional Schema metadata

car_ds %>% compute()
#> Table
#> 32 rows x 11 columns
#> $mpg <double>
#> $cyl <double>
#> $disp <double>
#> $hp <double>
#> $drat <double>
#> $wt <double>
#> $qsec <double>
#> $vs <double>
#> $am <double>
#> $gear <double>
#> $carb <double>
#> 
#> See $metadata for additional Schema metadata
```
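Until the print methods show data, one possible stopgap is to limit the query with `head()` before calling `collect()`, so that only the first few rows come back into R as a tibble. This is a minimal sketch assuming the arrow package's dplyr support for `head()` on a Dataset; the helper name `preview_rows()` is hypothetical and not part of arrow, and it does not change what the existing print methods display.

```r
library(dplyr)
library(arrow)

# Hypothetical helper (not part of arrow): print the object with its
# existing print method (schema only), then collect and print just the
# first `n` rows as a tibble.
preview_rows <- function(x, n = 5) {
  print(x)
  rows <- x %>% head(n) %>% collect()
  print(rows)
  invisible(rows)
}

# Write mtcars to a temporary Parquet file and open it as a Dataset.
path <- tempfile(fileext = ".parquet")
mtcars %>% write_parquet(path)
car_ds <- open_dataset(path)

preview <- preview_rows(car_ds, n = 5)
```

A built-in version of this, printing a handful of rows beneath the schema, is essentially what this issue asks the Table/RecordBatch print methods to do.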