[jira] [Created] (ARROW-17300) [Java][Docs] Compare/contrast the Netty and Unsafe memory backends

2022-08-03 Thread David Li (Jira)
David Li created ARROW-17300:


 Summary: [Java][Docs] Compare/contrast the Netty and Unsafe memory 
backends
 Key: ARROW-17300
 URL: https://issues.apache.org/jira/browse/ARROW-17300
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation, Java
Reporter: David Li


We should document the trade-offs between the two backends and when you might want to use each.

Are there benchmarks in the Java benchmark suite that might also be useful? 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17299) Expose the Scanner kDefaultBatchReadahead and kDefaultFragmentReadahead parameters

2022-08-03 Thread Ziheng Wang (Jira)
Ziheng Wang created ARROW-17299:
---

 Summary: Expose the Scanner kDefaultBatchReadahead and 
kDefaultFragmentReadahead parameters
 Key: ARROW-17299
 URL: https://issues.apache.org/jira/browse/ARROW-17299
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++, Python
Reporter: Ziheng Wang
Assignee: Ziheng Wang


In the Scanner there are parameters kDefaultFragmentReadahead and 
kDefaultBatchReadahead that are currently set to fixed numbers that cannot be 
changed.

This is not great because tuning these numbers is the key to trading off RAM usage 
against network I/O utilization during reading. For example, on an i3.2xlarge 
instance on AWS you can get peak throughput only by quadrupling 
kDefaultFragmentReadahead from the default. 

The current settings are very conservative and assume a < 1 Gbps network. 
Exposing them allows people to tune the Scanner behavior to their own hardware. 





[jira] [Created] (ARROW-17298) [C++][Docs] Add Acero project example in Getting Started Section

2022-08-03 Thread Will Jones (Jira)
Will Jones created ARROW-17298:
--

 Summary: [C++][Docs] Add Acero project example in Getting Started 
Section
 Key: ARROW-17298
 URL: https://issues.apache.org/jira/browse/ARROW-17298
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation
Reporter: Will Jones
 Fix For: 10.0.0


From [~westonpace]:

{quote}
A request I've seen a few times (and just received now) has been...
Can you point me at a sample C++ starter project that links against Acero?  For 
example, I tend to use a CMakeLists.txt that looks something like...

{code}
cmake_minimum_required(VERSION 3.10)
set(CMAKE_MODULE_PATH "${CMAKE_SOURCE_DIR}/cmake;${CMAKE_MODULE_PATH}")
set(CMAKE_CXX_FLAGS "-Wall -Wextra")
# set(CMAKE_CXX_FLAGS_DEBUG "-g")
set(CMAKE_CXX_FLAGS_RELEASE "-O3")

# set the project name
project(Experiments VERSION 1.0)

# specify the C++ standard
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED True)

set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

if(NOT DEFINED CONDA_HOME)
  message(FATAL_ERROR "CONDA_HOME is a required variable")
endif()

include_directories(SYSTEM ${CONDA_HOME}/include)
link_directories(${CONDA_HOME}/lib64)
link_directories(${CONDA_HOME}/lib)

function(experiment TARGET)
  add_executable(
    ${TARGET}
    ${TARGET}.cc
  )
  target_link_libraries(
    ${TARGET}
    arrow
    arrow_dataset
    parquet
    aws-cpp-sdk-core
    aws-cpp-sdk-s3
    glog
    pthread
    re2
    utf8proc
    lz4
    snappy
    z
    zstd
    aws-cpp-sdk-identity-management
    thrift
  )
  if (MSVC)
    target_compile_options(${TARGET} PRIVATE /W4 /WX)
  else ()
    target_compile_options(${TARGET} PRIVATE -Wall -Wextra -Wpedantic -Werror)
  endif ()
endfunction()

experiment(arrow_16642)
{code}
{quote}





[GitHub] [arrow-nanoarrow] lidavidm commented on pull request #10: Implement bitmap setters, getters, and element-wise builder

2022-08-03 Thread GitBox


lidavidm commented on PR #10:
URL: https://github.com/apache/arrow-nanoarrow/pull/10#issuecomment-1204391711

   > I'm not sure which buffer you mean! The `struct ArrowBuffer`?
   
   Sorry, I mean the buffer of values inside the bitmap builder


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-nanoarrow] paleolimbot commented on pull request #10: Implement bitmap setters, getters, and element-wise builder

2022-08-03 Thread GitBox


paleolimbot commented on PR #10:
URL: https://github.com/apache/arrow-nanoarrow/pull/10#issuecomment-1204391097

   I'm not sure which buffer you mean! The `struct ArrowBuffer`?





[jira] [Created] (ARROW-17297) [Java] [Doc] C++ to Java via C Data Interface (ImportArray, ImportRecordBatch)

2022-08-03 Thread David Dali Susanibar Arce (Jira)
David Dali Susanibar Arce created ARROW-17297:
-

 Summary: [Java] [Doc] C++ to Java via C Data Interface 
(ImportArray, ImportRecordBatch)
 Key: ARROW-17297
 URL: https://issues.apache.org/jira/browse/ARROW-17297
 Project: Apache Arrow
  Issue Type: Task
  Components: Documentation, Java
Reporter: David Dali Susanibar Arce


Add detailed documentation on how to pass data from C++ to Java via the C Data 
Interface for:
 * ImportArray
 * ImportRecordBatch





[jira] [Created] (ARROW-17296) [Python] Doctest failure in pyarrow.parquet.read_metadata after 10.0.0 dev version update

2022-08-03 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-17296:


 Summary: [Python] Doctest failure in pyarrow.parquet.read_metadata 
after 10.0.0 dev version update
 Key: ARROW-17296
 URL: https://issues.apache.org/jira/browse/ARROW-17296
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Wes McKinney
 Fix For: 10.0.0


The version update caused the doctest in this function to fail.





[GitHub] [arrow-nanoarrow] lidavidm commented on a diff in pull request #10: Implement bitmap setters, getters, and element-wise builder

2022-08-03 Thread GitBox


lidavidm commented on code in PR #10:
URL: https://github.com/apache/arrow-nanoarrow/pull/10#discussion_r937046260


##
src/nanoarrow/typedefs_inline.h:
##
@@ -0,0 +1,172 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef NANOARROW_TYPEDEFS_INLINE_H_INCLUDED
+#define NANOARROW_TYPEDEFS_INLINE_H_INCLUDED
+
+#include 
+
+/// \defgroup nanoarrow-inline-typedef Type definitions used in inlined 
implementations

Review Comment:
   This still needs the `extern "C"`



##
src/nanoarrow/bitmap_inline.h:
##
@@ -0,0 +1,131 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef NANOARROW_BITMAP_INLINE_H_INCLUDED
+#define NANOARROW_BITMAP_INLINE_H_INCLUDED
+
+#include 
+#include 
+
+#include "buffer_inline.h"
+#include "typedefs_inline.h"
+
+static inline int8_t ArrowBitmapElement(const void* bitmap, int64_t i) {

Review Comment:
   It may be worth stealing Arrow's implementations of these utilities: 
https://github.com/apache/arrow/blob/3b987d92d14ce7b5f5ccd2afb7366273e20348d4/cpp/src/arrow/util/bit_util.h#L296



##
src/nanoarrow/bitmap_inline.h:
##
@@ -0,0 +1,131 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#ifndef NANOARROW_BITMAP_INLINE_H_INCLUDED
+#define NANOARROW_BITMAP_INLINE_H_INCLUDED
+
+#include 
+#include 
+
+#include "buffer_inline.h"
+#include "typedefs_inline.h"
+
+static inline int8_t ArrowBitmapElement(const void* bitmap, int64_t i) {
+  const int8_t* bitmap_char = (const int8_t*)bitmap;
+  return 0 != (bitmap_char[i / 8] & ((int8_t)0x01) << (i % 8));
+}
+
+static inline void ArrowBitmapSetElement(void* bitmap, int64_t i, int8_t 
value) {
+  int8_t* bitmap_char = (int8_t*)bitmap;
+  int8_t mask = 0x01 << (i % 8);
+  if (value) {
+bitmap_char[i / 8] |= mask;
+  } else {
+bitmap_char[i / 8] &= ~mask;
+  }
+}
+
+static inline int64_t ArrowBitmapCountTrue(const void* bitmap, int64_t i_from,
+   int64_t i_to) {
+  int64_t count = 0;
+  for (int64_t i = i_from; i < i_to; i++) {
+count += ArrowBitmapElement(bitmap, i);

Review Comment:
   I wonder whether this gets optimized to popcnt or not… but Arrow also has 
implementations to steal!
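
For illustration only, here is a minimal sketch of the byte-at-a-time counting idea raised in this comment. It is not Arrow's implementation and not the PR's code: the names `SketchPopcount8` and `SketchBitmapCountTrue` are hypothetical, the bitmap is assumed to be least-significant-bit first (as in the setter above), and indices are assumed to be non-negative. Whether a compiler turns the inner loop into a `popcnt` instruction is not something this sketch guarantees.

```c
#include <stdint.h>

/* Hypothetical helper: number of set bits in one byte (Kernighan's loop).
 * A compiler intrinsic could replace this; that is left as an assumption. */
static inline int64_t SketchPopcount8(uint8_t x) {
  int64_t n = 0;
  while (x) {
    x &= (uint8_t)(x - 1); /* clear the lowest set bit */
    n++;
  }
  return n;
}

/* Count set bits in the half-open range [i_from, i_to), LSB-first bit order.
 * Assumes 0 <= i_from and that the bitmap covers at least i_to bits. */
static inline int64_t SketchBitmapCountTrue(const uint8_t* bitmap, int64_t i_from,
                                            int64_t i_to) {
  int64_t count = 0;
  int64_t i = i_from;

  /* Leading bits, one at a time, until the next byte boundary. */
  while (i < i_to && (i % 8) != 0) {
    count += (bitmap[i / 8] >> (i % 8)) & 1;
    i++;
  }

  /* Whole bytes, eight bits per step. */
  while (i + 8 <= i_to) {
    count += SketchPopcount8(bitmap[i / 8]);
    i += 8;
  }

  /* Trailing bits, one at a time. */
  while (i < i_to) {
    count += (bitmap[i / 8] >> (i % 8)) & 1;
    i++;
  }

  return count;
}
```

The three-phase structure (leading bits, whole bytes, trailing bits) is the part that matters; the per-byte counter is interchangeable.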



##
src/nanoarrow/nanoarrow.c:
##
@@ -17,6 +17,7 @@
 
 #include "allocator.c"
 #include "buffer.c"
+#include "bitmap.c"

Review Comment:
   Does this file exist?



##
src/nanoarrow/nanoarrow.h:
##
@@ -483,82 +372,117 @@ ArrowErrorCode ArrowSchemaViewInit(struct 
ArrowSchemaView* schema_view,
 
 /// }@
 
-/// \defgroup nanoarrow-buffer-builder Growable buffer builders
-
-/// \brief An owning mutable view of a buffer
-struct ArrowBuffer {
-  /// \brief A pointer t

[GitHub] [arrow-nanoarrow] paleolimbot commented on pull request #10: Implement bitmap setters, getters, and element-wise builder

2022-08-03 Thread GitBox


paleolimbot commented on PR #10:
URL: https://github.com/apache/arrow-nanoarrow/pull/10#issuecomment-1204376962

   Ok, made the definitions inline, although perhaps there is a prettier way to 
do it.





[GitHub] [arrow-nanoarrow] paleolimbot opened a new pull request, #12: Add metadata builder functions

2022-08-03 Thread GitBox


paleolimbot opened a new pull request, #12:
URL: https://github.com/apache/arrow-nanoarrow/pull/12

   Fixes #6.
   
   This PR adds functions `ArrowMetadataBuilderAppend()` (for blindly but 
efficiently adding a key/value pair to the end of some metadata) and 
`ArrowMetadataBuilderSet()` (to less efficiently replace or remove the value for a 
key).
   
   The use case I had in mind is building extension type metadata from some 
input (i.e., make an output schema like the input except with a new extension 
type or with new serialized extension type metadata). It's rather difficult to 
replicate the "replace" or "remove" behaviour otherwise.
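
As a rough illustration of why a blind append is cheap, here is a sketch against the metadata layout defined by the Arrow C Data Interface (a native-endian int32 pair count, then for each pair an int32 key length, the key bytes, an int32 value length, and the value bytes). This is not the PR's implementation: `SketchMetadataInit` and `SketchMetadataAppend` are hypothetical names, and the caller-provided fixed-capacity buffer is an assumption made to keep the example short.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: start an empty metadata blob (pair count of zero).
 * Returns the blob size in bytes. */
static inline int64_t SketchMetadataInit(uint8_t* out) {
  int32_t n = 0;
  memcpy(out, &n, sizeof(n));
  return (int64_t)sizeof(n);
}

/* Hypothetical: append one key/value pair at the end of the blob without
 * checking for an existing key. Returns the new size, or -1 if the pair
 * does not fit in `capacity` bytes. */
static inline int64_t SketchMetadataAppend(uint8_t* out, int64_t size, int64_t capacity,
                                           const char* key, const char* value) {
  int32_t key_len = (int32_t)strlen(key);
  int32_t value_len = (int32_t)strlen(value);
  int64_t needed = size + 2 * (int64_t)sizeof(int32_t) + key_len + value_len;
  if (needed > capacity) return -1;

  /* Bump the pair count stored in the first four bytes. */
  int32_t n;
  memcpy(&n, out, sizeof(n));
  n++;
  memcpy(out, &n, sizeof(n));

  /* Write length-prefixed key and value at the current end. */
  uint8_t* cursor = out + size;
  memcpy(cursor, &key_len, sizeof(key_len));
  cursor += sizeof(key_len);
  memcpy(cursor, key, (size_t)key_len);
  cursor += key_len;
  memcpy(cursor, &value_len, sizeof(value_len));
  cursor += sizeof(value_len);
  memcpy(cursor, value, (size_t)value_len);

  return needed;
}
```

Replacing or removing a key, by contrast, requires walking every existing pair and rewriting the blob, which is why a "set" operation is inherently less efficient than an append.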





[jira] [Created] (ARROW-17295) [C++] Build separate bundled_dependencies.so

2022-08-03 Thread Will Jones (Jira)
Will Jones created ARROW-17295:
--

 Summary: [C++] Build separate bundled_dependencies.so
 Key: ARROW-17295
 URL: https://issues.apache.org/jira/browse/ARROW-17295
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 8.0.1, 8.0.0
Reporter: Will Jones


When building arrow _static_ libraries with bundled dependencies, we produce 
{{arrow_bundled_dependencies.a}}. But when building dynamic libraries, the 
bundled dependencies are statically linked directly into the arrow libraries 
(libarrow, libarrow_flight, etc.). This means that users can access the symbols 
of bundled dependencies in the static case, but not in the dynamic library case.

One use case of this is being able to pass in gRPC configuration to a Flight 
server, which requires access to gRPC symbols.

Could we change the dynamic library building to build an 
{{arrow_bundled_dependencies.so}} so that the symbols are accessible?





[GitHub] [arrow-nanoarrow] paleolimbot commented on pull request #10: Implement bitmap setters, getters, and element-wise builder

2022-08-03 Thread GitBox


paleolimbot commented on PR #10:
URL: https://github.com/apache/arrow-nanoarrow/pull/10#issuecomment-1204110572

   You convinced me! I didn't know you could split up an `inline` declaration 
and definition and that solves the conundrum I was having.





[GitHub] [arrow-nanoarrow] lidavidm commented on pull request #10: Implement bitmap setters, getters, and element-wise builder

2022-08-03 Thread GitBox


lidavidm commented on PR #10:
URL: https://github.com/apache/arrow-nanoarrow/pull/10#issuecomment-1204107639

   But if it's easier to do things in stages then we can just review the 
current PR?





[GitHub] [arrow-nanoarrow] lidavidm commented on pull request #10: Implement bitmap setters, getters, and element-wise builder

2022-08-03 Thread GitBox


lidavidm commented on PR #10:
URL: https://github.com/apache/arrow-nanoarrow/pull/10#issuecomment-1204106882

   I don't think we need to split up header files or rearrange things, though, 
and I don't think we need to inline the buffer functions - it could just be 
moving the definitions from `bitmap.c` to `nanoarrow.h` and adding `inline`





[GitHub] [arrow-adbc] lidavidm opened a new issue, #53: [Python] Driver distribution

2022-08-03 Thread GitBox


lidavidm opened a new issue, #53:
URL: https://github.com/apache/arrow-adbc/issues/53

   We don't want to/can't futz with shared libraries when distributing 
packages. Instead, since drivers have an entrypoint, just expose the entrypoint 
in Python?
   
   e.g. `adbc_driver_sqlite` will bundle the driver and expose 
`adbc_driver_sqlite._entrypoint(driver_address: int) -> int` ~= `AdbcStatusCode 
_entrypoint(struct AdbcDriver* driver)`. Then the driver manager can just 
import the package and call the entrypoint. (There'll need to be some support 
for this on the C++ side too)
   
   That way we can more easily distribute pip-installable packages





[GitHub] [arrow-adbc] lidavidm opened a new issue, #52: [Python] Implement DBAPI

2022-08-03 Thread GitBox


lidavidm opened a new issue, #52:
URL: https://github.com/apache/arrow-adbc/issues/52

   Will make implementing Ibis backends easier since we can reuse more of the 
code for SQLAlchemy-based backends





[GitHub] [arrow-nanoarrow] pitrou commented on issue #11: Inline performance-sensitive functions and their dependencies

2022-08-03 Thread GitBox


pitrou commented on issue #11:
URL: https://github.com/apache/arrow-nanoarrow/issues/11#issuecomment-1204071180

   Note that inline functions (or macros) require you to be very careful if you 
want to maintain some ABI compatibility. I don't know if that's a goal of 
nanoarrow, but worth remembering.





[GitHub] [arrow-nanoarrow] paleolimbot commented on pull request #10: Implement bitmap setters, getters, and element-wise builder

2022-08-03 Thread GitBox


paleolimbot commented on PR #10:
URL: https://github.com/apache/arrow-nanoarrow/pull/10#issuecomment-1204069688

   I made #11 to set up a PR for the inlining process...that's definitely 
important but requires a rather different set of changes (splitting up header 
files, rearranging stuff, defines, maybe some benchmarking) that transcend the 
scope of the bitmap (since these implementations use the buffer functions). If 
it works for all of you, this and perhaps the next few PRs are more about 
scope, syntax, and correctness (i.e., tests).





[GitHub] [arrow-nanoarrow] lidavidm commented on issue #11: Inline performance-sensitive functions and their dependencies

2022-08-03 Thread GitBox


lidavidm commented on issue #11:
URL: https://github.com/apache/arrow-nanoarrow/issues/11#issuecomment-1204065088

   I don't think you need to split up the headers, just 
   
   ```c
   /// \brief Docstring...
   inline ArrowErrorCode ArrowBufferAppend() {
     // Definition...
   }
   ```
   
   directly in the header and continue as usual.
   
   If having definitions inline is messy, you could still split the declaration 
and definition (and then shunt all the definitions to the bottom)
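
A minimal sketch of the "declarations at the top, definitions at the bottom" layout described here, written as it might appear in a single header. Everything in it is hypothetical (`SketchBuffer`, `SketchBufferAppend`, `SketchBufferReset`); it is not nanoarrow's API. It also uses `static inline` instead of plain `inline`, which is an assumption made so that the header works in plain C without a separate out-of-line definition.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* ---- Declarations: the documented API surface, kept near the top ---- */

/* Hypothetical growable byte buffer. Zero-initialize before first use. */
struct SketchBuffer {
  uint8_t* data;
  int64_t size_bytes;
  int64_t capacity_bytes;
};

/* Append size_bytes bytes from data. Returns 0 on success, -1 if out of memory. */
static inline int SketchBufferAppend(struct SketchBuffer* buffer, const void* data,
                                     int64_t size_bytes);

/* Release the buffer's memory and return it to the empty state. */
static inline void SketchBufferReset(struct SketchBuffer* buffer);

/* ---- Definitions: shunted to the bottom of the same header ---- */

static inline int SketchBufferAppend(struct SketchBuffer* buffer, const void* data,
                                     int64_t size_bytes) {
  if (size_bytes <= 0) return 0;

  int64_t needed = buffer->size_bytes + size_bytes;
  if (needed > buffer->capacity_bytes) {
    /* Grow geometrically, but never to less than what is needed. */
    int64_t new_capacity = buffer->capacity_bytes * 2;
    if (new_capacity < needed) new_capacity = needed;
    uint8_t* new_data = (uint8_t*)realloc(buffer->data, (size_t)new_capacity);
    if (new_data == NULL) return -1;
    buffer->data = new_data;
    buffer->capacity_bytes = new_capacity;
  }

  memcpy(buffer->data + buffer->size_bytes, data, (size_t)size_bytes);
  buffer->size_bytes = needed;
  return 0;
}

static inline void SketchBufferReset(struct SketchBuffer* buffer) {
  free(buffer->data);
  buffer->data = NULL;
  buffer->size_bytes = 0;
  buffer->capacity_bytes = 0;
}
```

With this layout a reader can skim the documented declarations first, while the compiler still sees every definition when the header is included.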





[GitHub] [arrow-nanoarrow] paleolimbot opened a new issue, #11: Inline performance-sensitive functions and their dependencies

2022-08-03 Thread GitBox


paleolimbot opened a new issue, #11:
URL: https://github.com/apache/arrow-nanoarrow/issues/11

   A number of functions should be defined as `inline` with their definition 
visible to the compiler when a client types `#include "nanoarrow.h"`. Buffer 
appenders, element-wise builder, and bitmap functions are in this category.
   
   My expertise is pretty limited with respect to strategies of how to do this 
while maintaining nice documentation...I imagine the headers would get split up 
(error.h, buffer.h, bitmap.h, builder.h) and nanoarrow.h would get reduced to a 
set of includes. I rather liked reading through adbc.h to get a sense of what 
ADBC is and what it does, so perhaps there's another strategy that can be used 
to maintain nanoarrow.h in something like its current form.





[GitHub] [arrow-adbc] lidavidm commented on pull request #51: [Python] Set up package with Poetry

2022-08-03 Thread GitBox


lidavidm commented on PR #51:
URL: https://github.com/apache/arrow-adbc/pull/51#issuecomment-1204017482

   I also grabbed https://pypi.org/project/adbc-driver-manager/





[GitHub] [arrow-adbc] lidavidm commented on a diff in pull request #51: [Python] Set up package with Poetry

2022-08-03 Thread GitBox


lidavidm commented on code in PR #51:
URL: https://github.com/apache/arrow-adbc/pull/51#discussion_r936686336


##
python/adbc_driver_manager/requirements-dev.txt:
##
@@ -1,19 +1,124 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-Cython
-pytest
+atomicwrites==1.4.1; python_version >= "3.7" and python_full_version < "3.0.0" 
and sys_platform == "win32" or sys_platform == "win32" and python_version >= 
"3.7" and python_full_version >= "3.4.0" \

Review Comment:
   Yup, it's via `poetry export`; I'll update CONTRIBUTING.md and figure out 
the CI changes here






[GitHub] [arrow-adbc] pitrou commented on a diff in pull request #51: [Python] Set up package with Poetry

2022-08-03 Thread GitBox


pitrou commented on code in PR #51:
URL: https://github.com/apache/arrow-adbc/pull/51#discussion_r936684861


##
python/adbc_driver_manager/requirements-dev.txt:
##
@@ -1,19 +1,124 @@
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-Cython
-pytest
+atomicwrites==1.4.1; python_version >= "3.7" and python_full_version < "3.0.0" 
and sys_platform == "win32" or sys_platform == "win32" and python_version >= 
"3.7" and python_full_version >= "3.4.0" \

Review Comment:
   Is this auto-generated somehow? Would be nice to add some comment explaining 
how.






[jira] [Created] (ARROW-17294) [Release] Update remove old artifacts release script

2022-08-03 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-17294:
---

 Summary: [Release] Update remove old artifacts release script
 Key: ARROW-17294
 URL: https://issues.apache.org/jira/browse/ARROW-17294
 Project: Apache Arrow
  Issue Type: Bug
  Components: Developer Tools
Reporter: Krisztian Szucs
 Fix For: 10.0.0


I just ran the remove-old-artifacts release script, which also removed the 
three previously created patch releases 6.0.2, 7.0.1, and 8.0.1. 

That's not desirable since those have only just been released, so I had to revert to 
an earlier revision.

cc [~kou] [~assignUser] [~raulcd]





[jira] [Created] (ARROW-17293) [Java][CI] Prune java nightly builds

2022-08-03 Thread Jacob Wujciak-Jens (Jira)
Jacob Wujciak-Jens created ARROW-17293:
--

 Summary: [Java][CI] Prune java nightly builds
 Key: ARROW-17293
 URL: https://issues.apache.org/jira/browse/ARROW-17293
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, Java
Reporter: Jacob Wujciak-Jens


Currently we are accumulating a huge number of nightly Java jars. We should 
prune them to keep at most 14 versions around (see r_nightly.yml).

It might also be nice to always rename/copy the most recent jars to a fixed 
name so there is no need to update your local config to always have the newest 
version (but it is up to the Java users whether this is necessary/worth it).

 

cc [~dsusanibara] [~ljw1001] 





[jira] [Created] (ARROW-17292) [C++] Segmentation fault on arrow-compute-hash-join-node-test on macos nightlies

2022-08-03 Thread Raúl Cumplido (Jira)
Raúl Cumplido created ARROW-17292:
-

 Summary: [C++] Segmentation fault on 
arrow-compute-hash-join-node-test on macos nightlies
 Key: ARROW-17292
 URL: https://issues.apache.org/jira/browse/ARROW-17292
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Raúl Cumplido
 Fix For: 10.0.0


Some of our nightly builds are failing due to a segmentation fault on hash-join 
tests:
{code:java}
 33/90 Test #35: arrow-compute-hash-join-node-test .***Failed    1.21 
sec
Running arrow-compute-hash-join-node-test, redirecting output into 
/var/folders/24/8k48jl6d249_n_qfxwsl6xvmgn/T/arrow-HEAD.X.W72iCJcj/cpp-build/build/test-logs/arrow-compute-hash-join-node-test.txt
 (attempt 1/1)
/Users/runner/work/crossbow/crossbow/arrow/cpp/build-support/run-test.sh: line 
88: 78018 Segmentation fault: 11  $TEST_EXECUTABLE "$@" > $LOGFILE.raw 2>&1
Running main() from 
/var/folders/24/8k48jl6d249_n_qfxwsl6xvmgn/T/arrow-HEAD.X.W72iCJcj/cpp-build/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest_main.cc
[==] Running 29 tests from 4 test suites.
[--] Global test environment set-up.
[--] 10 tests from HashJoin
[ RUN      ] HashJoin.Suffix
[       OK ] HashJoin.Suffix (4 ms)
[ RUN      ] HashJoin.Random
/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvmgn/T/arrow-HEAD.X.W72iCJcj/cpp-build/src/arrow/compute/exec
 {code}
The failures can be seen in the jobs below. Judging from the failed jobs, the 
problem seems to affect only macOS:
[verify-rc-source-cpp-macos-conda-amd64|https://github.com/ursacomputing/crossbow/runs/7631965199?check_suite_focus=true]
[verify-rc-source-integration-macos-conda-amd64|https://github.com/ursacomputing/crossbow/runs/7631969879?check_suite_focus=true]
[verify-rc-source-python-macos-amd64|https://github.com/ursacomputing/crossbow/runs/7631926429?check_suite_focus=true]





[GitHub] [arrow-nanoarrow] pitrou commented on issue #8: Implement `struct ArrowArrayBuilder`

2022-08-03 Thread GitBox


pitrou commented on issue #8:
URL: https://github.com/apache/arrow-nanoarrow/issues/8#issuecomment-1203706719

   > One way to do this is a bunch of callables that are something like the 
reverse of SQLite's result interface
   
   So this is for building arrays whose type is not strictly known at 
compile-time? Going through function pointers will make this slower than if you 
expose type-specific inline functions/macros, at least when appending one item 
at a time.
   
   We have to think about the use cases for nanoarrow. Mostly, it will be used 
for bridging/converting with non-Arrow systems, and in this scheme people will 
probably have runtime switch/case statements to dispatch based on datatype 
anyway. So I'm not sure an untyped convention is really useful.
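
To make the trade-off above concrete, here is a small sketch of the "runtime switch plus type-specific inline appenders" pattern. It is only an illustration under stated assumptions, not a proposed nanoarrow API: `SketchType`, `SketchColumn`, the `SketchAppend*` functions, and `SketchFillColumn` are hypothetical names, the source values are assumed to arrive as doubles from some non-Arrow system, and the output storage is assumed to be preallocated by the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type tag and fixed-width output column. */
enum SketchType { SKETCH_INT32, SKETCH_DOUBLE };

struct SketchColumn {
  enum SketchType type;
  uint8_t* data;  /* caller-provided storage, assumed large enough */
  int64_t length; /* number of elements written so far */
};

/* Type-specific appenders: small enough for the compiler to inline. */
static inline void SketchAppendInt32(struct SketchColumn* col, int32_t v) {
  memcpy(col->data + (size_t)col->length * sizeof(v), &v, sizeof(v));
  col->length++;
}

static inline void SketchAppendDouble(struct SketchColumn* col, double v) {
  memcpy(col->data + (size_t)col->length * sizeof(v), &v, sizeof(v));
  col->length++;
}

/* Dispatch on the type once per column, then run a tight typed loop,
 * instead of going through a function pointer for every element.
 * Returns 0 on success, -1 for an unknown type. */
static inline int SketchFillColumn(struct SketchColumn* col, const double* src,
                                   int64_t n) {
  switch (col->type) {
    case SKETCH_INT32:
      for (int64_t i = 0; i < n; i++) SketchAppendInt32(col, (int32_t)src[i]);
      return 0;
    case SKETCH_DOUBLE:
      for (int64_t i = 0; i < n; i++) SketchAppendDouble(col, src[i]);
      return 0;
  }
  return -1;
}
```

The switch runs once per column rather than once per value, so the per-element cost is just the inlined store.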
   

