thisisnic commented on code in PR #14514:
URL: https://github.com/apache/arrow/pull/14514#discussion_r1012447084
##########
r/vignettes/data_objects.Rmd:
##########
@@ -0,0 +1,206 @@
+---
+title: "Data objects"
+description: >
+  Learn about Scalar, Array, Table, and Dataset objects in `arrow`
+  (among others), how they relate to each other, as well as their
+  relationships to familiar R objects like data frames and vectors
+output: rmarkdown::html_vignette
+---
+
+This article describes the various data object types supplied by `arrow`, and documents how these objects are structured.
+
+```{r include=FALSE}
+library(arrow, warn.conflicts = FALSE)
+```
+
+The `arrow` package supplies several object classes that are used to represent data. `RecordBatch`, `Table`, and `Dataset` objects are two-dimensional rectangular data structures used to store tabular data. For columnar, one-dimensional data, the `Array` and `ChunkedArray` classes are provided. Finally, `Scalar` objects represent individual values. The table below summarizes these objects and shows how you can create new instances using the [`R6`](https://r6.r-lib.org/) class object, as well as convenience functions that provide the same functionality in a more traditional R-like fashion:
+
+| Dim | Class          | How to create an instance         | Convenience function            |
+| --- | -------------- | --------------------------------- | ------------------------------- |
+| 0   | `Scalar`       | `Scalar$create(value, type)`      |                                 |
+| 1   | `Array`        | `Array$create(vector, type)`      |                                 |
+| 1   | `ChunkedArray` | `ChunkedArray$create(..., type)`  | `chunked_array(..., type)`      |
+| 2   | `RecordBatch`  | `RecordBatch$create(...)`         | `record_batch(...)`             |
+| 2   | `Table`        | `Table$create(...)`               | `arrow_table(...)`              |
+| 2   | `Dataset`      | `Dataset$create(sources, schema)` | `open_dataset(sources, schema)` |
+
+Later in the article we'll look at each of these in more detail.
+
+For now we note that each of these object classes corresponds to a class of the same name in the underlying Arrow C++ library. It is also worth mentioning that the `arrow` package also defines classes that do not exist in the C++ library including:
+
+* `ArrowDatum`: inherited by `Scalar`, `Array`, and `ChunkedArray`
+* `ArrowTabular`: inherited by `RecordBatch` and `Table`
+* `ArrowObject`: inherited by all Arrow objects

Review Comment:
   Sorry, I was reading it as new content because I hadn't looked at the getting started page in ages and ages! Honestly, I'd just err on the side of your own judgment in cases like this; I agree this section belongs in one of the dev vignettes.