paleolimbot commented on code in PR #596:
URL: https://github.com/apache/arrow-nanoarrow/pull/596#discussion_r1759427744
##########
src/nanoarrow/common/array.c:
##########
@@ -449,14 +462,33 @@ static ArrowErrorCode ArrowArrayFinalizeBuffers(struct
ArrowArray* array) {
NANOARROW_RETURN_NOT_OK(ArrowArrayFinalizeBuffers(array->dictionary));
}
+ if (private_data->storage_type == NANOARROW_TYPE_STRING_VIEW ||
+ private_data->storage_type == NANOARROW_TYPE_BINARY_VIEW) {
+ const int32_t nfixed_buf = 2;
+ const int32_t n_vbuf = private_data->n_variadic_buffers;
+ private_data->buffer_data = (const void**)realloc(
Review Comment:
```suggestion
private_data->buffer_data = (const void**)ArrowRealloc(
```
##########
python/src/nanoarrow/schema.py:
##########
@@ -76,6 +76,8 @@ class Type(enum.Enum):
LARGE_BINARY = int(_types.LARGE_BINARY)
LARGE_LIST = int(_types.LARGE_LIST)
INTERVAL_MONTH_DAY_NANO = int(_types.INTERVAL_MONTH_DAY_NANO)
+ BINARY_VIEW = int(_types.BINARY_VIEW)
Review Comment:
```suggestion
RUN_END_ENCODED = int(_types.RUN_END_ENCODED)
BINARY_VIEW = int(_types.BINARY_VIEW)
```
...maybe?
##########
src/nanoarrow/common/inline_array.h:
##########
@@ -810,6 +874,11 @@ static inline int64_t ArrowArrayViewListChildOffset(
}
}
+static inline struct ArrowBufferView ArrowArrayViewBufferView(
+ const struct ArrowArrayView* array_view, int64_t i) {
+ return array_view->buffer_views[i];
Review Comment:
Does this need to be `if(...) return array-view->buffer_views[i]; else
return array_view->variadic_buffer_views[i - 2];`?
##########
src/nanoarrow/common/array.c:
##########
@@ -148,12 +156,17 @@ ArrowErrorCode ArrowArrayInitFromType(struct ArrowArray*
array,
ArrowBitmapInit(&private_data->bitmap);
ArrowBufferInit(&private_data->buffers[0]);
ArrowBufferInit(&private_data->buffers[1]);
- private_data->buffer_data[0] = NULL;
- private_data->buffer_data[1] = NULL;
- private_data->buffer_data[2] = NULL;
+ private_data->buffer_data =
+ (const void**)ArrowMalloc(sizeof(void*) * NANOARROW_MAX_FIXED_BUFFERS);
+ for (int i = 0; i < NANOARROW_MAX_FIXED_BUFFERS; ++i) {
+ private_data->buffer_data[i] = NULL;
+ }
Review Comment:
I think it might be less disruptive to keep the fixed-size `buffer_data` and
just add a new `variadic_buffers` array (that is null/not allocated most of the
time).
##########
src/nanoarrow/common/inline_array.h:
##########
@@ -467,52 +468,113 @@ static inline ArrowErrorCode
ArrowArrayAppendDouble(struct ArrowArray* array,
return NANOARROW_OK;
}
+#define NANOARROW_BINARY_VIEW_INLINE_SIZE 12
+#define NANOARROW_BINARY_VIEW_PREVIEW_SIZE 4
+#define NANOARROW_BINARY_VIEW_BLOCK_SIZE 32 << 10 // 32KB
Review Comment:
For the purposes of this PR, I wonder if you can omit the chunking logic
(i.e., when building a string/binary view array, simply use the third fixed
buffer as your one variadic buffer and always append the data to that one).
For "build by buffer" we would need to support `ArrowArraySetBuffer()` on
the variadic buffers (which would let a user implement this logic themselves),
and perhaps a `union ArrowBinaryViewType ArrowBinaryViewInlined()`/`union
ArrowBinaryViewType ArrowBinaryViewReference()` helper (if I didn't miss one
you made already!)
##########
src/nanoarrow/common/array.c:
##########
@@ -449,14 +462,33 @@ static ArrowErrorCode ArrowArrayFinalizeBuffers(struct
ArrowArray* array) {
NANOARROW_RETURN_NOT_OK(ArrowArrayFinalizeBuffers(array->dictionary));
}
+ if (private_data->storage_type == NANOARROW_TYPE_STRING_VIEW ||
+ private_data->storage_type == NANOARROW_TYPE_BINARY_VIEW) {
Review Comment:
I think this should be in `ArrowArrayFlushInternalPointers()` (if I remember
correctly, the "finalize buffers" step is skipped for non CPU data)
##########
python/src/nanoarrow/schema.py:
##########
@@ -792,6 +812,24 @@ def large_binary(nullable: bool = True) -> Schema:
return Schema(Type.LARGE_BINARY, nullable=nullable)
+def binary_view(nullable: bool = True) -> Schema:
+ """Create an instance of a binary view type.
+
+ Parameters
+ ----------
+ nullable : bool, optional
+ Use ``False`` to mark this field as non-nullable.
+
+ Examples
+ --------
+
+ >>> import nanoarrow as na
+ >>> na.large_binary()
+ <Schema> large_binary
Review Comment:
```suggestion
>>> na.binary_view()
<Schema> binary_view
```
##########
python/src/nanoarrow/schema.py:
##########
@@ -756,6 +758,24 @@ def large_string(nullable: bool = True) -> Schema:
return Schema(Type.LARGE_STRING, nullable=nullable)
+def string_view(nullable: bool = True) -> Schema:
+ """Create an instance of a string view type.
+
+ Parameters
+ ----------
+ nullable : bool, optional
+ Use ``False`` to mark this field as non-nullable.
+
+ Examples
+ --------
+
+ >>> import nanoarrow as na
+ >>> na.large_binary()
+ <Schema> large_binary
Review Comment:
```suggestion
>>> na.string_view()
<Schema> string_view
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]