paleolimbot commented on code in PR #596:
URL: https://github.com/apache/arrow-nanoarrow/pull/596#discussion_r1759427744


##########
src/nanoarrow/common/array.c:
##########
@@ -449,14 +462,33 @@ static ArrowErrorCode ArrowArrayFinalizeBuffers(struct 
ArrowArray* array) {
     NANOARROW_RETURN_NOT_OK(ArrowArrayFinalizeBuffers(array->dictionary));
   }
 
+  if (private_data->storage_type == NANOARROW_TYPE_STRING_VIEW ||
+      private_data->storage_type == NANOARROW_TYPE_BINARY_VIEW) {
+    const int32_t nfixed_buf = 2;
+    const int32_t n_vbuf = private_data->n_variadic_buffers;
+    private_data->buffer_data = (const void**)realloc(

Review Comment:
   ```suggestion
       private_data->buffer_data = (const void**)ArrowRealloc(
   ```



##########
python/src/nanoarrow/schema.py:
##########
@@ -76,6 +76,8 @@ class Type(enum.Enum):
     LARGE_BINARY = int(_types.LARGE_BINARY)
     LARGE_LIST = int(_types.LARGE_LIST)
     INTERVAL_MONTH_DAY_NANO = int(_types.INTERVAL_MONTH_DAY_NANO)
+    BINARY_VIEW = int(_types.BINARY_VIEW)

Review Comment:
   ```suggestion
       RUN_END_ENCODED = int(_types.RUN_END_ENCODED)
       BINARY_VIEW = int(_types.BINARY_VIEW)
   ```
   
   ...maybe?



##########
src/nanoarrow/common/inline_array.h:
##########
@@ -810,6 +874,11 @@ static inline int64_t ArrowArrayViewListChildOffset(
   }
 }
 
+static inline struct ArrowBufferView ArrowArrayViewBufferView(
+    const struct ArrowArrayView* array_view, int64_t i) {
+  return array_view->buffer_views[i];

Review Comment:
   Does this need to be `if(...) return array-view->buffer_views[i]; else 
return array_view->variadic_buffer_views[i - 2];`?



##########
src/nanoarrow/common/array.c:
##########
@@ -148,12 +156,17 @@ ArrowErrorCode ArrowArrayInitFromType(struct ArrowArray* 
array,
   ArrowBitmapInit(&private_data->bitmap);
   ArrowBufferInit(&private_data->buffers[0]);
   ArrowBufferInit(&private_data->buffers[1]);
-  private_data->buffer_data[0] = NULL;
-  private_data->buffer_data[1] = NULL;
-  private_data->buffer_data[2] = NULL;
+  private_data->buffer_data =
+      (const void**)ArrowMalloc(sizeof(void*) * NANOARROW_MAX_FIXED_BUFFERS);
+  for (int i = 0; i < NANOARROW_MAX_FIXED_BUFFERS; ++i) {
+    private_data->buffer_data[i] = NULL;
+  }

Review Comment:
   I think it might be less disruptive to keep the fixed-size `buffer_data` and 
just add a new `variadic_buffers` array (that is null/not allocated most of the 
time).



##########
src/nanoarrow/common/inline_array.h:
##########
@@ -467,52 +468,113 @@ static inline ArrowErrorCode 
ArrowArrayAppendDouble(struct ArrowArray* array,
   return NANOARROW_OK;
 }
 
+#define NANOARROW_BINARY_VIEW_INLINE_SIZE 12
+#define NANOARROW_BINARY_VIEW_PREVIEW_SIZE 4
+#define NANOARROW_BINARY_VIEW_BLOCK_SIZE 32 << 10  // 32KB

Review Comment:
   For the purposes of this PR, I wonder if you can omit the chunking logic 
(i.e., when building a string/binary view array, simply use the third fixed 
buffer as your one variadic buffer and always append the data to that one).
   
   For "build by buffer" we would need to support `ArrowArraySetBuffer()` on 
the variadic buffers (which would let a user implement this logic themselves), 
and perhaps a `union ArrowBinaryViewType ArrowBinaryViewInlined()`/`union 
ArrowBinaryViewType ArrowBinaryViewReference()` helper (if I didn't miss one 
you made already!)



##########
src/nanoarrow/common/array.c:
##########
@@ -449,14 +462,33 @@ static ArrowErrorCode ArrowArrayFinalizeBuffers(struct 
ArrowArray* array) {
     NANOARROW_RETURN_NOT_OK(ArrowArrayFinalizeBuffers(array->dictionary));
   }
 
+  if (private_data->storage_type == NANOARROW_TYPE_STRING_VIEW ||
+      private_data->storage_type == NANOARROW_TYPE_BINARY_VIEW) {

Review Comment:
   I think this should be in `ArrowArrayFlushInternalPointers()` (if I remember 
correctly, the "finalize buffers" step is skipped for non CPU data)



##########
python/src/nanoarrow/schema.py:
##########
@@ -792,6 +812,24 @@ def large_binary(nullable: bool = True) -> Schema:
     return Schema(Type.LARGE_BINARY, nullable=nullable)
 
 
+def binary_view(nullable: bool = True) -> Schema:
+    """Create an instance of a binary view type.
+
+    Parameters
+    ----------
+    nullable : bool, optional
+        Use ``False`` to mark this field as non-nullable.
+
+    Examples
+    --------
+
+    >>> import nanoarrow as na
+    >>> na.large_binary()
+    <Schema> large_binary

Review Comment:
   ```suggestion
       >>> na.binary_view()
       <Schema> binary_view
   ```



##########
python/src/nanoarrow/schema.py:
##########
@@ -756,6 +758,24 @@ def large_string(nullable: bool = True) -> Schema:
     return Schema(Type.LARGE_STRING, nullable=nullable)
 
 
+def string_view(nullable: bool = True) -> Schema:
+    """Create an instance of a string view type.
+
+    Parameters
+    ----------
+    nullable : bool, optional
+        Use ``False`` to mark this field as non-nullable.
+
+    Examples
+    --------
+
+    >>> import nanoarrow as na
+    >>> na.large_binary()
+    <Schema> large_binary

Review Comment:
   ```suggestion
       >>> na.string_view()
       <Schema> string_view
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to