pitrou commented on a change in pull request #11206:
URL: https://github.com/apache/arrow/pull/11206#discussion_r718264548
##########
File path: ci/docker/debian-10-go-cgo.dockerfile
##########
@@ -0,0 +1,34 @@
+# Licensed to the Apache Software Foundation (ASF) under one
Review comment:
Why do we need this in addition to the existing
`debian-10-go-cgo-python` dockerfile?
##########
File path: go/arrow/memory/cgo_allocator.go
##########
@@ -0,0 +1,108 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// +build cgo
+// +build ccalloc
+
+package memory
+
+import (
+ "runtime"
+
+ cga "github.com/apache/arrow/go/arrow/memory/internal/cgoalloc"
+)
+
+// CgoArrowAllocator is an allocator which exposes the C++ memory pool class
+// from the Arrow C++ Library as an allocator for memory buffers to use in Go.
+// The build tag 'ccalloc' must be used in order to include it as it requires
+// linking against the arrow library.
+//
+// The primary reason to use this would be as an allocator when dealing with
+// exporting data across the cdata interface in order to ensure that the memory
+// is allocated safely on the C side so it can be held on the CGO side beyond
+// the context of a single function call. If the memory in use isn't allocated
+// on the C side, then it is not safe for any pointers to data to be held
outside
+// of Go beyond the context of a single Cgo function call as it will be
invisible
+// to the Go garbage collector and could potentially get moved without being
updated.
Review comment:
I don't understand this comment. In #11220 your `releaseData` function
seems to take care of this correctly (by defining a global container of
exported arrays).
##########
File path: go/arrow/memory/internal/cgoalloc/allocator.cc
##########
@@ -0,0 +1,75 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// +build ccalloc
+
+#include "allocator.h"
+#include "arrow/memory_pool.h"
+#include "helpers.h"
+
+struct mem_holder {
+ std::unique_ptr<arrow::MemoryPool> pool;
+ arrow::MemoryPool* current_pool;
Review comment:
Why isn't this a `unique_ptr` as well?
##########
File path: go/arrow/memory/internal/cgoalloc/helpers.h
##########
@@ -0,0 +1,59 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include <cstdint>
+#include <memory>
+
+// helper functions to be included by C++ code for interacting with Cgo
+
+// create_ref will construct a shared_ptr on the heap and return a pointer
+// to it. the returned uintptr_t can then be used with retrieve_instance
+// to get back the shared_ptr and object it refers to. This ensures that
+// the object outlives the exported function so that Go can use it.
+template <typename T>
+uintptr_t create_ref(std::shared_ptr<T> t) {
+ std::shared_ptr<T>* retained_ptr = new std::shared_ptr<T>(t);
+ return reinterpret_cast<uintptr_t>(retained_ptr);
+}
+
+// specialization for shared_ptrs to const objects
Review comment:
I'm curious: `create_ref<const T>` doesn't work by itself?
##########
File path: go/arrow/memory/internal/cgoalloc/allocator.go
##########
@@ -0,0 +1,99 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// +build ccalloc
+
+package cgoalloc
+
+// #cgo !windows pkg-config: arrow
+// #cgo CXXFLAGS: -std=c++14
+// #cgo windows LDFLAGS: -larrow
+// #include "allocator.h"
+import "C"
+import (
+ "reflect"
+ "unsafe"
+)
+
+// CGOMemPool is an alias to the typedef'd uintptr from the allocator.h file
+type CGOMemPool = C.ArrowMemoryPool
+
+// CgoPoolAlloc allocates a block of memory of length 'size' using the memory
+// pool that is passed in.
+func CgoPoolAlloc(pool CGOMemPool, size int) []byte {
+ var out *C.uint8_t
+ C.arrow_pool_allocate(pool, C.int64_t(size),
(**C.uint8_t)(unsafe.Pointer(&out)))
+
+ var ret []byte
+ s := (*reflect.SliceHeader)(unsafe.Pointer(&ret))
+ s.Data = uintptr(unsafe.Pointer(out))
+ s.Len = size
+ s.Cap = size
+
+ return ret
+}
+
+// CgoPoolRealloc calls 'reallocate' on the block of memory passed in which
must
+// be a slice that was returned by CgoPoolAlloc or CgoPoolRealloc.
+func CgoPoolRealloc(pool CGOMemPool, size int, b []byte) []byte {
+ oldSize := C.int64_t(len(b))
+ data := (*C.uint8_t)(unsafe.Pointer(&b[0]))
+ C.arrow_pool_reallocate(pool, oldSize, C.int64_t(size), &data)
+
+ var ret []byte
+ s := (*reflect.SliceHeader)(unsafe.Pointer(&ret))
+ s.Data = uintptr(unsafe.Pointer(data))
+ s.Len = size
+ s.Cap = size
+
+ return ret
+}
+
+// CgoPoolFree uses the indicated memory pool to free a block of memory. The
+// slice passed in *must* be a slice which was returned by CgoPoolAlloc or
+// CgoPoolRealloc.
+func CgoPoolFree(pool CGOMemPool, b []byte) {
+ if len(b) == 0 {
Review comment:
I don't see a corresponding condition in `CgoPoolAlloc` and
`CgoPoolRealloc`, won't this leak 0-byte allocated areas?
##########
File path: go/arrow/memory/cgo_allocator.go
##########
@@ -0,0 +1,108 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// +build cgo
+// +build ccalloc
+
+package memory
+
+import (
+ "runtime"
+
+ cga "github.com/apache/arrow/go/arrow/memory/internal/cgoalloc"
+)
+
+// CgoArrowAllocator is an allocator which exposes the C++ memory pool class
+// from the Arrow C++ Library as an allocator for memory buffers to use in Go.
+// The build tag 'ccalloc' must be used in order to include it as it requires
+// linking against the arrow library.
+//
+// The primary reason to use this would be as an allocator when dealing with
+// exporting data across the cdata interface in order to ensure that the memory
+// is allocated safely on the C side so it can be held on the CGO side beyond
+// the context of a single function call. If the memory in use isn't allocated
+// on the C side, then it is not safe for any pointers to data to be held
outside
+// of Go beyond the context of a single Cgo function call as it will be
invisible
+// to the Go garbage collector and could potentially get moved without being
updated.
Review comment:
Ok, so what it seems to mean is that Go should always use its own
allocator when allocating array data, rather than the Go GC. Would there be a
problem in doing that?
##########
File path: ci/docker/debian-10-go-cgo.dockerfile
##########
@@ -0,0 +1,34 @@
+# Licensed to the Apache Software Foundation (ASF) under one
Review comment:
Ok, but the comment is misleading then ("so we can use pyarrow").
##########
File path: go/arrow/memory/internal/cgoalloc/allocator.cc
##########
@@ -0,0 +1,75 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// +build ccalloc
+
+#include "allocator.h"
+#include "arrow/memory_pool.h"
+#include "helpers.h"
+
+struct mem_holder {
+ std::unique_ptr<arrow::MemoryPool> pool;
+ arrow::MemoryPool* current_pool;
Review comment:
But you're not using `default_memory_pool`, you're using
`MemoryPool::CreateDefault`.
##########
File path: go/arrow/memory/cgo_allocator.go
##########
@@ -0,0 +1,108 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// +build cgo
+// +build ccalloc
+
+package memory
+
+import (
+ "runtime"
+
+ cga "github.com/apache/arrow/go/arrow/memory/internal/cgoalloc"
+)
+
+// CgoArrowAllocator is an allocator which exposes the C++ memory pool class
+// from the Arrow C++ Library as an allocator for memory buffers to use in Go.
+// The build tag 'ccalloc' must be used in order to include it as it requires
+// linking against the arrow library.
+//
+// The primary reason to use this would be as an allocator when dealing with
+// exporting data across the cdata interface in order to ensure that the memory
+// is allocated safely on the C side so it can be held on the CGO side beyond
+// the context of a single function call. If the memory in use isn't allocated
+// on the C side, then it is not safe for any pointers to data to be held
outside
+// of Go beyond the context of a single Cgo function call as it will be
invisible
+// to the Go garbage collector and could potentially get moved without being
updated.
Review comment:
I don't think the C++ libarrow allocator is necessary, just any C-based
allocator (or perhaps there's already a public Go package for this?)
##########
File path: go/arrow/memory/internal/cgoalloc/allocator.cc
##########
@@ -0,0 +1,75 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// +build ccalloc
+
+#include "allocator.h"
+#include "arrow/memory_pool.h"
+#include "helpers.h"
+
+struct mem_holder {
+ std::unique_ptr<arrow::MemoryPool> pool;
+ arrow::MemoryPool* current_pool;
Review comment:
Perhaps, but that doesn't change my concern. `CreateDefault` creates a
_new_ memory pool instance, but you're not destroying it if logging is disabled.
##########
File path: go/arrow/memory/internal/cgoalloc/allocator.cc
##########
@@ -0,0 +1,75 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// +build ccalloc
+
+#include "allocator.h"
+#include "arrow/memory_pool.h"
+#include "helpers.h"
+
+struct mem_holder {
+ std::unique_ptr<arrow::MemoryPool> pool;
+ arrow::MemoryPool* current_pool;
Review comment:
Hmm, yes, you're right.
Still, this seems a bit convoluted. Why not:
```c++
struct mem_holder {
std::unique_ptr<arrow::MemoryPool> wrapped_pool, current_pool;
};
ArrowMemoryPool arrow_create_memory_pool(bool enable_logging) {
auto holder = std::make_shared<mem_holder>();
if (enable_logging) {
holder->wrapped_pool.reset(arrow::MemoryPool::CreateDefault());
holder->current_pool.reset(new
arrow::LoggingMemoryPool(holder->wrapped_pool.get()));
} else {
holder->current_pool.reset(arrow::MemoryPool::CreateDefault());
}
return create_ref(holder);
}
void arrow_release_pool(ArrowMemoryPool pool) {
release_ref<mem_holder>(pool);
}
```
or even:
```c++
struct mem_holder {
std::unique_ptr<arrow::MemoryPool> owned_pool;
arrow::MemoryPool* pool;
};
ArrowMemoryPool arrow_create_memory_pool(bool enable_logging) {
auto holder = std::make_shared<mem_holder>();
if (enable_logging) {
holder->owned_pool.reset(new
arrow::LoggingMemoryPool(arrow::default_memory_pool()));
holder->pool = holder->owned_pool.get();
} else {
holder->pool = arrow::default_memory_pool();
}
return create_ref(holder);
}
void arrow_release_pool(ArrowMemoryPool pool) {
release_ref<mem_holder>(pool);
}
```
##########
File path: go/arrow/memory/cgo_allocator.go
##########
@@ -0,0 +1,108 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// +build cgo
+// +build ccalloc
+
+package memory
+
+import (
+ "runtime"
+
+ cga "github.com/apache/arrow/go/arrow/memory/internal/cgoalloc"
+)
+
+// CgoArrowAllocator is an allocator which exposes the C++ memory pool class
+// from the Arrow C++ Library as an allocator for memory buffers to use in Go.
+// The build tag 'ccalloc' must be used in order to include it as it requires
+// linking against the arrow library.
+//
+// The primary reason to use this would be as an allocator when dealing with
+// exporting data across the cdata interface in order to ensure that the memory
+// is allocated safely on the C side so it can be held on the CGO side beyond
+// the context of a single function call. If the memory in use isn't allocated
+// on the C side, then it is not safe for any pointers to data to be held
outside
+// of Go beyond the context of a single Cgo function call as it will be
invisible
+// to the Go garbage collector and could potentially get moved without being
updated.
Review comment:
Ok, cgo performance does seem horrible:
https://about.sourcegraph.com/go/gophercon-2018-adventures-in-cgo-performance/
##########
File path: go/arrow/memory/internal/cgoalloc/allocator.cc
##########
@@ -0,0 +1,75 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+// +build ccalloc
+
+#include "allocator.h"
+#include "arrow/memory_pool.h"
+#include "helpers.h"
+
+struct mem_holder {
+ std::unique_ptr<arrow::MemoryPool> pool;
+ arrow::MemoryPool* current_pool;
Review comment:
I can't think of a reason to use `CreateDefault`, unless for consistency
you really need a `unique_ptr`. The `default_memory_pool` is destroyed at
process exit.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]