[GitHub] [arrow-nanoarrow] paleolimbot commented on a diff in pull request #16: Implement array appenders

2022-08-10 Thread GitBox


paleolimbot commented on code in PR #16:
URL: https://github.com/apache/arrow-nanoarrow/pull/16#discussion_r942431188


##
src/nanoarrow/typedefs_inline.h:
##
@@ -166,6 +166,24 @@ enum ArrowType {
   NANOARROW_TYPE_INTERVAL_MONTH_DAY_NANO
 };
 
+/// \brief Functional types of buffers as described in the Arrow Columnar 
Specification
+enum ArrowBufferType {
+  NANOARROW_BUFFER_TYPE_NONE,
+  NANOARROW_BUFFER_TYPE_VALIDITY,
+  NANOARROW_BUFFER_TYPE_TYPE_IDS,
+  NANOARROW_BUFFER_TYPE_OFFSET,
+  NANOARROW_BUFFER_TYPE_DATA
+};
+
+/// \brief A description of an arrangement of buffers
+struct ArrowLayout {

Review Comment:
   Hang tight...it helps with a generic "reserve" that doesn't have to special 
case every storage type.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-nanoarrow] paleolimbot commented on a diff in pull request #16: Implement array appenders

2022-08-10 Thread GitBox


paleolimbot commented on code in PR #16:
URL: https://github.com/apache/arrow-nanoarrow/pull/16#discussion_r942434862


##
src/nanoarrow/utils_inline.h:
##
@@ -26,6 +26,114 @@
 extern "C" {
 #endif
 
+static inline void ArrowLayoutInit(struct ArrowLayout* layout,
+   enum ArrowType storage_type) {
+  layout->buffer_type[0] = NANOARROW_BUFFER_TYPE_VALIDITY;
+  layout->buffer_type[1] = NANOARROW_BUFFER_TYPE_NONE;
+  layout->buffer_type[2] = NANOARROW_BUFFER_TYPE_NONE;
+
+  layout->element_size_bits[0] = 1;
+  layout->element_size_bits[1] = 0;
+  layout->element_size_bits[2] = 0;
+
+  switch (storage_type) {
+case NANOARROW_TYPE_NA:
+  layout->buffer_type[0] = NANOARROW_BUFFER_TYPE_NONE;
+  layout->element_size_bits[0] = 0;
+  break;
+
+case NANOARROW_TYPE_LIST:
+  layout->buffer_type[1] = NANOARROW_BUFFER_TYPE_OFFSET;
+  layout->element_size_bits[1] = 32;
+  break;
+
+case NANOARROW_TYPE_LARGE_LIST:
+  layout->buffer_type[1] = NANOARROW_BUFFER_TYPE_OFFSET;
+  layout->element_size_bits[1] = 64;
+  break;
+
+case NANOARROW_TYPE_BOOL:
+  layout->buffer_type[1] = NANOARROW_BUFFER_TYPE_DATA;
+  layout->element_size_bits[1] = 1;
+  break;
+
+case NANOARROW_TYPE_UINT8:
+case NANOARROW_TYPE_INT8:
+  layout->buffer_type[1] = NANOARROW_BUFFER_TYPE_DATA;
+  layout->element_size_bits[1] = 8;
+  break;
+
+case NANOARROW_TYPE_UINT16:
+case NANOARROW_TYPE_INT16:
+case NANOARROW_TYPE_HALF_FLOAT:
+  layout->buffer_type[1] = NANOARROW_BUFFER_TYPE_DATA;
+  layout->element_size_bits[1] = 16;
+  break;
+
+case NANOARROW_TYPE_UINT32:
+case NANOARROW_TYPE_INT32:
+case NANOARROW_TYPE_FLOAT:
+case NANOARROW_TYPE_INTERVAL_MONTHS:
+  layout->buffer_type[1] = NANOARROW_BUFFER_TYPE_DATA;
+  layout->element_size_bits[1] = 32;
+  break;
+
+case NANOARROW_TYPE_UINT64:
+case NANOARROW_TYPE_INT64:
+case NANOARROW_TYPE_DOUBLE:
+case NANOARROW_TYPE_INTERVAL_DAY_TIME:
+  layout->buffer_type[1] = NANOARROW_BUFFER_TYPE_DATA;
+  layout->element_size_bits[1] = 64;
+  break;
+
+case NANOARROW_TYPE_DECIMAL128:
+case NANOARROW_TYPE_INTERVAL_MONTH_DAY_NANO:
+  layout->buffer_type[1] = NANOARROW_BUFFER_TYPE_DATA;
+  layout->element_size_bits[1] = 128;
+  break;
+
+case NANOARROW_TYPE_DECIMAL256:
+  layout->buffer_type[1] = NANOARROW_BUFFER_TYPE_DATA;
+  layout->element_size_bits[1] = 256;
+  break;
+
+case NANOARROW_TYPE_FIXED_SIZE_BINARY:
+  layout->buffer_type[1] = NANOARROW_BUFFER_TYPE_DATA;

Review Comment:
   Hang tight...this will get parsed/set with `ArrowSchemaViewInit`, although 
you're right that `ArrowArrayInit()` might need a variant to support that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-nanoarrow] paleolimbot commented on a diff in pull request #16: Implement array appenders

2022-08-10 Thread GitBox


paleolimbot commented on code in PR #16:
URL: https://github.com/apache/arrow-nanoarrow/pull/16#discussion_r942437669


##
src/nanoarrow/nanoarrow.h:
##
@@ -508,9 +512,31 @@ void ArrowArraySetValidityBitmap(struct ArrowArray* array, 
struct ArrowBitmap* b
 ArrowErrorCode ArrowArraySetBuffer(struct ArrowArray* array, int64_t i,
struct ArrowBuffer* buffer);
 
+static inline struct ArrowBitmap* ArrowArrayValidityBitmap(struct ArrowArray* 
array);
+
+static inline struct ArrowBuffer* ArrowArrayBuffer(struct ArrowArray* array, 
int64_t i);
+
+static inline ArrowErrorCode ArrowArrayReserve(struct ArrowArray* array,
+   int64_t 
additional_size_elements);
+
+static inline ArrowErrorCode ArrowArrayResize(struct ArrowArray* array,
+  int64_t new_capacity_elements,
+  char shrink_to_fit);
+
+static inline ArrowErrorCode ArrowArrayAppendNull(struct ArrowArray* array, 
int64_t n);
+
+static inline ArrowErrorCode ArrowArrayAppendInt(struct ArrowArray* array, 
int64_t value);

Review Comment:
   At the point that one is appending by element, does the time required to 
cast to 64-bit width and back matter? The truly fast way to do it would be 
`ArrowBufferAppendUnsafe(ArrowArrayBuffer(array, 1), &value, sizeof(int16_t)))`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-nanoarrow] paleolimbot commented on a diff in pull request #16: Implement array appenders

2022-08-10 Thread GitBox


paleolimbot commented on code in PR #16:
URL: https://github.com/apache/arrow-nanoarrow/pull/16#discussion_r942441450


##
src/nanoarrow/nanoarrow.h:
##
@@ -508,9 +512,31 @@ void ArrowArraySetValidityBitmap(struct ArrowArray* array, 
struct ArrowBitmap* b
 ArrowErrorCode ArrowArraySetBuffer(struct ArrowArray* array, int64_t i,
struct ArrowBuffer* buffer);
 
+static inline struct ArrowBitmap* ArrowArrayValidityBitmap(struct ArrowArray* 
array);
+
+static inline struct ArrowBuffer* ArrowArrayBuffer(struct ArrowArray* array, 
int64_t i);
+
+static inline ArrowErrorCode ArrowArrayReserve(struct ArrowArray* array,
+   int64_t 
additional_size_elements);
+
+static inline ArrowErrorCode ArrowArrayResize(struct ArrowArray* array,
+  int64_t new_capacity_elements,
+  char shrink_to_fit);
+
+static inline ArrowErrorCode ArrowArrayAppendNull(struct ArrowArray* array, 
int64_t n);
+
+static inline ArrowErrorCode ArrowArrayAppendInt(struct ArrowArray* array, 
int64_t value);
+
+static inline ArrowErrorCode ArrowArrayAppendUInt(struct ArrowArray* array, 
uint64_t value);
+
+static inline ArrowErrorCode ArrowArrayAppendDouble(struct ArrowArray* array, 
double value);

Review Comment:
   Same as above...at the point that you're element-wise appending, does the 
cast matter?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-nanoarrow] paleolimbot commented on a diff in pull request #16: Implement array appenders

2022-08-10 Thread GitBox


paleolimbot commented on code in PR #16:
URL: https://github.com/apache/arrow-nanoarrow/pull/16#discussion_r942442675


##
src/nanoarrow/nanoarrow.h:
##
@@ -508,9 +512,31 @@ void ArrowArraySetValidityBitmap(struct ArrowArray* array, 
struct ArrowBitmap* b
 ArrowErrorCode ArrowArraySetBuffer(struct ArrowArray* array, int64_t i,
struct ArrowBuffer* buffer);
 
+static inline struct ArrowBitmap* ArrowArrayValidityBitmap(struct ArrowArray* 
array);
+
+static inline struct ArrowBuffer* ArrowArrayBuffer(struct ArrowArray* array, 
int64_t i);
+
+static inline ArrowErrorCode ArrowArrayReserve(struct ArrowArray* array,
+   int64_t 
additional_size_elements);
+
+static inline ArrowErrorCode ArrowArrayResize(struct ArrowArray* array,
+  int64_t new_capacity_elements,
+  char shrink_to_fit);
+
+static inline ArrowErrorCode ArrowArrayAppendNull(struct ArrowArray* array, 
int64_t n);
+
+static inline ArrowErrorCode ArrowArrayAppendInt(struct ArrowArray* array, 
int64_t value);

Review Comment:
   The intent is that these will append by value and fail if the value is not 
exactly representable by the current array type.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-nanoarrow] paleolimbot commented on a diff in pull request #16: Implement array appenders

2022-08-10 Thread GitBox


paleolimbot commented on code in PR #16:
URL: https://github.com/apache/arrow-nanoarrow/pull/16#discussion_r942446356


##
src/nanoarrow/array_inline.h:
##
@@ -0,0 +1,146 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef NANOARROW_ARRAY_INLINE_H_INCLUDED
+#define NANOARROW_ARRAY_INLINE_H_INCLUDED
+
+#include 
+#include 
+#include 
+
+#include "bitmap_inline.h"
+#include "buffer_inline.h"
+#include "typedefs_inline.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+static inline struct ArrowBitmap* ArrowArrayValidityBitmap(struct ArrowArray* 
array) {
+  struct ArrowArrayPrivateData* private_data =
+  (struct ArrowArrayPrivateData*)array->private_data;
+  return &private_data->bitmap;
+}
+
+static inline struct ArrowBuffer* ArrowArrayBuffer(struct ArrowArray* array, 
int64_t i) {
+  struct ArrowArrayPrivateData* private_data =
+  (struct ArrowArrayPrivateData*)array->private_data;
+  switch (i) {
+case 0:
+  return &private_data->bitmap.buffer;
+default:
+  return private_data->buffers + i - 1;
+  }
+}
+
+static inline ArrowErrorCode ArrowArrayReserve(struct ArrowArray* array,
+   int64_t 
additional_size_elements) {
+  struct ArrowArrayPrivateData* private_data =
+  (struct ArrowArrayPrivateData*)array->private_data;
+  return EINVAL;
+}
+
+static inline ArrowErrorCode ArrowArrayFinish(struct ArrowArray* array,
+  char shrink_to_fit) {
+  struct ArrowArrayPrivateData* private_data =
+  (struct ArrowArrayPrivateData*)array->private_data;
+
+  if (shrink_to_fit) {
+// TODO: implement
+return EINVAL;
+  }
+
+  // Make sure the value we get with array->buffers[i] is set to the actual
+  // pointer (which may have changed from the original due to reallocation)
+  for (int64_t i = 0; i < 3; i++) {
+private_data->buffer_data[i] = ArrowArrayBuffer(array, i)->data;
+  }
+
+  return NANOARROW_OK;
+}
+
+static inline ArrowErrorCode ArrowArrayAppendNull(struct ArrowArray* array, 
int64_t n) {
+  struct ArrowArrayPrivateData* private_data =
+  (struct ArrowArrayPrivateData*)array->private_data;
+
+  if (n == 0) {
+return NANOARROW_OK;
+  }
+
+  int result;
+
+  // Append n 0 bits to the validity bitmap. If we haven't allocated a bitmap 
yet, do it
+  // now
+  if (private_data->bitmap.buffer.data == NULL) {
+result = ArrowBitmapReserve(&private_data->bitmap, array->length + n);
+if (result != NANOARROW_OK) {
+  return result;
+}
+
+ArrowBitmapAppendUnsafe(&private_data->bitmap, 1, array->length);
+ArrowBitmapAppendUnsafe(&private_data->bitmap, 0, n);
+  } else {
+result = ArrowBitmapReserve(&private_data->bitmap, n);
+if (result != NANOARROW_OK) {
+  return result;
+}
+
+ArrowBitmapAppendUnsafe(&private_data->bitmap, 0, n);
+  }
+
+  // TODO Depending on the type, we need to append some elements to some 
buffers

Review Comment:
   Yes, plus preserving the "append by value" semantics 
(`ArrowArrayAppendInt(array, 0)` will append a literal non-null zero; 
`ArrowArrayAppendNull(array, 1)` will append a null and nobody has to think 
about what the fill value is or whether it's required.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-nanoarrow] paleolimbot commented on a diff in pull request #16: Implement array appenders

2022-08-10 Thread GitBox


paleolimbot commented on code in PR #16:
URL: https://github.com/apache/arrow-nanoarrow/pull/16#discussion_r942470780


##
src/nanoarrow/nanoarrow.h:
##
@@ -508,9 +512,31 @@ void ArrowArraySetValidityBitmap(struct ArrowArray* array, 
struct ArrowBitmap* b
 ArrowErrorCode ArrowArraySetBuffer(struct ArrowArray* array, int64_t i,
struct ArrowBuffer* buffer);
 
+static inline struct ArrowBitmap* ArrowArrayValidityBitmap(struct ArrowArray* 
array);
+
+static inline struct ArrowBuffer* ArrowArrayBuffer(struct ArrowArray* array, 
int64_t i);
+
+static inline ArrowErrorCode ArrowArrayReserve(struct ArrowArray* array,
+   int64_t 
additional_size_elements);
+
+static inline ArrowErrorCode ArrowArrayResize(struct ArrowArray* array,
+  int64_t new_capacity_elements,
+  char shrink_to_fit);
+
+static inline ArrowErrorCode ArrowArrayAppendNull(struct ArrowArray* array, 
int64_t n);
+
+static inline ArrowErrorCode ArrowArrayAppendInt(struct ArrowArray* array, 
int64_t value);

Review Comment:
   Give me a day or so to flush this out...sometimes when I start typing it 
becomes immediately obvious that it was a bad idea. The tests will provide a 
good starting place for benchmarks (I was planning to use Arrow's builder to 
build some things and this builder to build some things and compare).
   
   In the context of a database driver, you'd be doing a 
`switch(database_type)` or `switch(arrow_type)` for every row anyway...this 
would just move that branch inside nanoarrow (I think).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org