[jira] [Created] (ARROW-9046) [C++][R] Put more things in type_fwds
Neal Richardson created ARROW-9046:

Summary: [C++][R] Put more things in type_fwds
Key: ARROW-9046
URL: https://issues.apache.org/jira/browse/ARROW-9046
Project: Apache Arrow
Issue Type: Improvement
Components: C++, R
Reporter: Neal Richardson
Assignee: Ben Kietzman
Fix For: 1.0.0

Hopefully to reduce compile time.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9045) [C++] Improve and expand Take/Filter benchmarks
Wes McKinney created ARROW-9045:

Summary: [C++] Improve and expand Take/Filter benchmarks
Key: ARROW-9045
URL: https://issues.apache.org/jira/browse/ARROW-9045
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Wes McKinney
Fix For: 1.0.0

I'm putting this up as a separate patch for review.
[jira] [Created] (ARROW-9044) [Go][Packaging] Revisit the license file attachment to the go packages
Krisztian Szucs created ARROW-9044:

Summary: [Go][Packaging] Revisit the license file attachment to the go packages
Key: ARROW-9044
URL: https://issues.apache.org/jira/browse/ARROW-9044
Project: Apache Arrow
Issue Type: Improvement
Components: Go, Packaging
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs
Fix For: 1.0.0

As per https://github.com/apache/arrow/pull/7355#issuecomment-639560475
Re: [DISCUSS] [C++] custom allocator for large objects
Le 05/06/2020 à 17:09, Rémi Dettai a écrit :
> I looked into the details of why the decoder could not estimate the target
> Arrow array size for my Parquet column. It's because I am decoding from
> Parquet-Dictionary to Arrow-Plain (which is the default when loading
> Parquet). In this case the size prediction is impossible :-(

But we can probably make up a heuristic, for example avg(dictionary value size) * number of non-null values. It would avoid a number of resizes, even though there may still be a couple of them at the end. It may oversize in some cases, but much less than your proposed strategy of reserving a huge chunk of virtual memory :-)

Regards
Antoine.
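The heuristic above can be sketched in a few lines; a minimal illustration (function name and shape are hypothetical, not Arrow code):

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical sketch of the proposed heuristic: estimate the decoded
// (plain-encoded) size of a dictionary-encoded column as the average
// dictionary value size times the number of non-null values.
std::size_t EstimateDecodedSize(const std::vector<std::size_t>& dict_value_sizes,
                                std::size_t non_null_count) {
  if (dict_value_sizes.empty()) return 0;
  std::size_t total = std::accumulate(dict_value_sizes.begin(),
                                      dict_value_sizes.end(), std::size_t{0});
  // Round the average up, so we err toward over-reserving (fewer resizes).
  std::size_t avg =
      (total + dict_value_sizes.size() - 1) / dict_value_sizes.size();
  return avg * non_null_count;
}
```

As the message says, this may oversize and may still require a couple of resizes at the end, but it avoids most intermediate reallocations.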
Re: [DISCUSS] [C++] custom allocator for large objects
Le 05/06/2020 à 16:25, Uwe L. Korn a écrit :
> On Fri, Jun 5, 2020, at 3:13 PM, Rémi Dettai wrote:
>> Hi Antoine !
>>> I would indeed have expected jemalloc to do that (remap the pages)
>> I have no idea about the performance gain this would provide (if any).
>> Could be interesting to explore.
>
> This would actually be the most interesting thing. In general, getting access
> to the pages mapped into RAM would help in a lot more situations, not just
> reallocation. For example, when you take a small slice of a large array and
> only pass this on, but don't keep an explicit reference to the array, you
> will still indirectly hold on to the larger memory. Having an allocator that
> understands the mapping between pages and memory blocks would allow us to
> free the pages that are not part of the view.
>
> Also, yes: for CSV and JSON, we don't have size estimates beforehand. There
> this would be a great performance improvement.

For CSV we actually know the size after parsing:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/csv/converter.cc#L177-L178

It would be a shame if this were possible for CSV but not for Parquet, a storage format dedicated to big columnar data.

Regards
Antoine.
Re: [DISCUSS] [C++] custom allocator for large objects
I looked into the details of why the decoder could not estimate the target Arrow array size for my Parquet column. It's because I am decoding from Parquet-Dictionary to Arrow-Plain (which is the default when loading Parquet). In this case the size prediction is impossible :-(

> This would actually be the most interesting thing. In general, getting access
> to the pages mapped into RAM would help in a lot more situations, not just
> reallocation. For example, when you take a small slice of a large array and
> only pass this on, but don't keep an explicit reference to the array, you
> will still indirectly hold on to the larger memory. Having an allocator that
> understands the mapping between pages and memory blocks would allow us to
> free the pages that are not part of the view.

Not sure I'm following you on this one. From my understanding the subject here is mremap, which allows you to keep your physical memory but change the virtual address range that points to it. According to this (https://stackoverflow.com/questions/11621606/faster-way-to-move-memory-page-than-mremap), it is mainly efficient for growing large allocations.
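The mremap behaviour discussed here can be demonstrated with a small Linux-only sketch (illustrative, not Arrow code): growing a mapping in place remaps the existing physical pages to a larger virtual range, so the bytes are preserved without copying them.

```cpp
#define _GNU_SOURCE_GUARD  // mremap is a GNU extension; g++ defines _GNU_SOURCE by default
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/mman.h>

// Grow a page-aligned anonymous mapping with mremap instead of
// malloc-and-copy. Returns 0 on success, -1 on failure.
int GrowWithMremap() {
  const std::size_t old_size = std::size_t{1} << 20;  // 1 MiB
  const std::size_t new_size = std::size_t{1} << 22;  // 4 MiB
  void* p = mmap(nullptr, old_size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED) return -1;
  std::memset(p, 0xAB, old_size);
  // MREMAP_MAYMOVE lets the kernel pick a new address if the old range
  // cannot be extended in place; either way the pages are not copied.
  void* q = mremap(p, old_size, new_size, MREMAP_MAYMOVE);
  if (q == MAP_FAILED) return -1;
  // The original contents are still there at the (possibly new) address.
  int ok = (static_cast<unsigned char*>(q)[old_size - 1] == 0xAB) ? 0 : -1;
  munmap(q, new_size);
  return ok;
}
```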
Re: [DISCUSS] [C++] custom allocator for large objects
On Fri, Jun 5, 2020, at 3:13 PM, Rémi Dettai wrote:
> Hi Antoine !
>> I would indeed have expected jemalloc to do that (remap the pages)
> I have no idea about the performance gain this would provide (if any).
> Could be interesting to explore.

This would actually be the most interesting thing. In general, getting access to the pages mapped into RAM would help in a lot more situations, not just reallocation. For example, when you take a small slice of a large array and only pass this on, but don't keep an explicit reference to the array, you will still indirectly hold on to the larger memory. Having an allocator that understands the mapping between pages and memory blocks would allow us to free the pages that are not part of the view.

Also, yes: for CSV and JSON, we don't have size estimates beforehand. There this would be a great performance improvement.

Best
Uwe
[jira] [Created] (ARROW-9043) [Go] Temporarily copy LICENSE.txt to go/
Wes McKinney created ARROW-9043:

Summary: [Go] Temporarily copy LICENSE.txt to go/
Key: ARROW-9043
URL: https://issues.apache.org/jira/browse/ARROW-9043
Project: Apache Arrow
Issue Type: Improvement
Components: Go
Reporter: Wes McKinney
Fix For: 1.0.0

{{go mod}} needs to find a license file in the root of the Go module. In the future "go mod" may be able to follow symlinks, in which case this copy can be replaced by a symlink.
Re: [DISCUSS] [C++] custom allocator for large objects
Hi Antoine !

> I would indeed have expected jemalloc to do that (remap the pages)

I have no idea about the performance gain this would provide (if any). Could be interesting to explore.

> do you know that Arrow also supports integration with another allocator, mimalloc

I only tried jemalloc and the system allocator because I assumed mimalloc was specific to Windows. I'll try to look into this as well ;-) I liked the idea of looking deeper into jemalloc, as it is also the default allocator for Rust.

Thanks for your insights!

Regards,
Remi
[jira] [Created] (ARROW-9042) [C++] Add Subtract and Multiply arithmetic kernels with wrap-around behavior
Krisztian Szucs created ARROW-9042:

Summary: [C++] Add Subtract and Multiply arithmetic kernels with wrap-around behavior
Key: ARROW-9042
URL: https://issues.apache.org/jira/browse/ARROW-9042
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Krisztian Szucs
Fix For: 1.0.0

Also avoid undefined behaviour caused by signed integer overflow.
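A hedged sketch of the wrap-around semantics this issue asks for (not Arrow's actual kernel code): signed overflow is undefined behaviour in C++, but unsigned arithmetic is defined to wrap modulo 2^N, so the operation can be done on the unsigned counterpart and converted back.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Wrap-around subtract: compute in the unsigned domain (defined modular
// arithmetic), then cast back to the signed type.
template <typename Int>
Int SubtractWrap(Int a, Int b) {
  using U = std::make_unsigned_t<Int>;
  return static_cast<Int>(static_cast<U>(a) - static_cast<U>(b));
}

// Wrap-around multiply, same trick.
template <typename Int>
Int MultiplyWrap(Int a, Int b) {
  using U = std::make_unsigned_t<Int>;
  return static_cast<Int>(static_cast<U>(a) * static_cast<U>(b));
}
```

Note that the final unsigned-to-signed cast is implementation-defined before C++20 (it is modular on the common compilers) and fully defined from C++20 on.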
Re: [DISCUSS] [C++] custom allocator for large objects
Le 05/06/2020 à 14:25, Rémi Dettai a écrit :
> Hi Uwe!
>
>> As your suggestions don't seem to be specific to Arrow, why not
>> contribute them directly to jemalloc? They are much better at reviewing
>> allocator code than we are.
>
> I mentioned this idea in the jemalloc gitter. The first response was that
> it should work, but workloads with realloc aren't very common and these huge
> allocations can have some impact during coredumps. It's true that jemalloc
> is supposed to be a general-purpose allocator, so it has to make compromises
> in that sense. Let's wait to see if other answers come in. You can follow
> the conversation here if you are interested:
> https://gitter.im/jemalloc/jemalloc. It seemed to me while investigating
> jemalloc internals that most of the design effort is concentrated on
> small/medium allocations. This makes sense as they probably represent 99%
> of workloads.

Thanks for the pointer. The reply is interesting:
"""
Another trick that’s possible but we don’t do is to just remap the pages
into some new space, rather than alloc-and-copy
"""
I would indeed have expected jemalloc to do that (remap the pages) when reallocating large allocations, so I'm a bit surprised they don't. I guess it's one more special case to worry about.

By the way, before going further with jemalloc, do you know that Arrow also supports integration with another allocator, mimalloc (https://github.com/microsoft/mimalloc/)? Perhaps you should try it out and see whether that improves performance or not.

(and also the system allocator, by the way)

Regards
Antoine.
Re: [DISCUSS] [C++] custom allocator for large objects
Hi Uwe!

> As your suggestions don't seem to be specific to Arrow, why not contribute
> them directly to jemalloc? They are much better at reviewing allocator code
> than we are.

I mentioned this idea in the jemalloc gitter. The first response was that it should work, but workloads with realloc aren't very common and these huge allocations can have some impact during coredumps. It's true that jemalloc is supposed to be a general-purpose allocator, so it has to make compromises in that sense. Let's wait to see if other answers come in. You can follow the conversation here if you are interested: https://gitter.im/jemalloc/jemalloc. It seemed to me while investigating jemalloc internals that most of the design effort is concentrated on small/medium allocations. This makes sense as they probably represent 99% of workloads. But Arrow is really about structuring larger chunks of data in memory. I agree with Antoine that in normal circumstances, if you start to blame the allocator, it means that you likely got things wrong elsewhere. But maybe the Arrow workload is specific enough to earn a seat in the realm of exceptions (right next to video game engines)! ;-)

> Still, when we read a column, we should be able to determine its final size
> from the Parquet metadata. Maybe we're not passing some information along
> there?

I'm going to take a look at this. But even if you can get a good estimate for Parquet, what about CSV or JSON? How can you estimate the size of a column of strings beforehand, especially with a compressed payload where you don't even know the number of lines!
Re: [DISCUSS] [C++] custom allocator for large objects
Hello Rémi,

under the hood jemalloc does quite similar things to what you describe. I'm not sure what the threshold is in the current version, but in earlier releases it used a different allocation strategy for objects above 4MB. For the initial large allocation, you will see quite some copies, as mmap returns a new base address and it isn't able to reuse an existing space. This could probably be circumvented by a single large allocation which is free'd again.

As your suggestions don't seem to be specific to Arrow, why not contribute them directly to jemalloc? They are much better at reviewing allocator code than we are.

Still, when we read a column, we should be able to determine its final size from the Parquet metadata. Maybe we're not passing some information along there?

Best,
Uwe

On Thu, Jun 4, 2020, at 5:48 PM, Rémi Dettai wrote:
> When creating large arrays, Arrow uses realloc quite intensively.
>
> I have an example where I read a gzipped Parquet column (strings) that
> expands from 8MB to 100+MB when loaded into Arrow. Of course jemalloc
> cannot anticipate this, and every reallocate call above 1MB (the most
> critical ones) ends up being a copy.
>
> Knowing that we like using realloc in Arrow, I think we could come up
> with an allocator for large objects that would behave a lot better than
> jemalloc. For smaller objects, this allocator could just let the memory
> request be handled by jemalloc. Not trying to outsmart the brilliant
> folks from Facebook and co ;-) But for larger objects, we could adopt a
> custom strategy:
> - if an allocation or a re-allocation larger than 1MB (or maybe even 512K)
> is made on our memory pool, call mmap with size X GB (X being slightly
> smaller than the total physical memory on the system). This is ok because
> mmap will not physically allocate this memory as long as it is not touched.
> - we keep track of all allocations that we created this way, by storing the
> pointer + the actual used size inside our X GB alloc in a map.
> - when growing an alloc mmapped this way, we will always have contiguous
> memory available (otherwise we would already have OOMed, because X is the
> physical memory size).
> - when reducing the alloc size, we can free with madvise (optional: if the
> alloc becomes small enough, we might copy it back into a jemalloc
> allocation).
>
> I am not an expert in these matters, and I just learned what an allocator
> really is, so my approach might be naive. In that case feel free to
> enlighten me!
>
> Please note that I'm not sure about the level of portability of this
> solution.
>
> Have a nice day!
>
> Remi
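Rémi's reserve-then-commit strategy above can be sketched roughly as follows (Linux-only, illustrative names, and a 1 GiB reservation rather than the near-physical-RAM size proposed in the thread): virtual address space is reserved up front, physical pages are only committed when touched, growing is in-place within the reservation, and shrinking hands pages back with madvise(MADV_DONTNEED).

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// A buffer backed by a large virtual-memory reservation.
struct ReservedBuffer {
  void* base = nullptr;
  std::size_t reserved = 0;  // size of the virtual reservation
  std::size_t used = 0;      // logical size of the allocation
};

// Reserve address space. MAP_NORESERVE + anonymous mapping: no physical
// pages are committed until the memory is actually written.
bool Reserve(ReservedBuffer* buf, std::size_t reserve_bytes) {
  buf->base = mmap(nullptr, reserve_bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
  if (buf->base == MAP_FAILED) return false;
  buf->reserved = reserve_bytes;
  return true;
}

// Growing is just a bounds check: the virtual range is already contiguous,
// so no copy (and no realloc) is ever needed.
bool Grow(ReservedBuffer* buf, std::size_t new_used) {
  if (new_used > buf->reserved) return false;
  buf->used = new_used;
  return true;
}

// Shrinking releases the physical pages past the new size but keeps the
// virtual reservation intact for later regrowth.
bool Shrink(ReservedBuffer* buf, std::size_t new_used, std::size_t page_size) {
  std::size_t keep = (new_used + page_size - 1) / page_size * page_size;
  if (keep < buf->reserved) {
    madvise(static_cast<char*>(buf->base) + keep, buf->reserved - keep,
            MADV_DONTNEED);
  }
  buf->used = new_used;
  return true;
}
```

This is only a sketch of the idea, not a drop-in MemoryPool: a real implementation would also need the pointer-to-size map Rémi mentions, a fallback to the small-object allocator, and attention to portability (mmap/madvise semantics differ across platforms).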
[NIGHTLY] Arrow Build Report for Job nightly-2020-06-05-0
Arrow Build Report for Job nightly-2020-06-05-0

All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0

Failed Tasks:
- conda-win-vs2015-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-azure-conda-win-vs2015-py38
- homebrew-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-travis-homebrew-cpp
- homebrew-r-autobrew:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-travis-homebrew-r-autobrew
- test-conda-cpp-valgrind:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-test-conda-cpp-valgrind
- test-conda-python-3.7-dask-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-spark-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-test-conda-python-3.7-spark-master
- test-conda-python-3.7-turbodbc-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-test-conda-python-3.7-turbodbc-master
- test-conda-python-3.8-dask-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-test-conda-python-3.8-dask-master
- test-conda-python-3.8-jpype:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-test-conda-python-3.8-jpype
- wheel-win-cp35m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-appveyor-wheel-win-cp35m
- wheel-win-cp36m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-appveyor-wheel-win-cp36m
- wheel-win-cp37m:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-appveyor-wheel-win-cp37m
- wheel-win-cp38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-appveyor-wheel-win-cp38

Succeeded Tasks:
- centos-6-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-centos-6-amd64
- centos-7-aarch64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-travis-centos-7-aarch64
- centos-7-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-centos-7-amd64
- centos-8-aarch64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-travis-centos-8-aarch64
- centos-8-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-centos-8-amd64
- conda-clean:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-azure-conda-clean
- conda-linux-gcc-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-azure-conda-osx-clang-py38
- debian-buster-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-debian-buster-amd64
- debian-buster-arm64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-travis-debian-buster-arm64
- debian-stretch-amd64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-debian-stretch-amd64
- debian-stretch-arm64:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-travis-debian-stretch-arm64
- gandiva-jar-osx:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-travis-gandiva-jar-osx
- gandiva-jar-xenial:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-travis-gandiva-jar-xenial
- nuget:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-nuget
Re: [jira] [Created] (ARROW-9037) [C++/C-ABI] unable to import array with null count == -1 (which could be exported)
Apologies. This was sent by mistake.

On 5 Jun 2020, at 15:17, Saurabh Kumar wrote:

Zhuo Peng created ARROW-9037:

Summary: [C++/C-ABI] unable to import array with null count == -1 (which could be exported)
Key: ARROW-9037
URL: https://issues.apache.org/jira/browse/ARROW-9037
Project: Apache Arrow
Issue Type: Bug
Components: C++
Affects Versions: 0.17.1
Reporter: Zhuo Peng

If an Array is created with null_count == -1 but without any nulls (and thus no null bitmap buffer), then ArrayData.null_count will remain -1 when exporting if null_count is never computed. The exported C struct then also has null_count == -1 [1]. But when importing, if null_count != 0, an error [2] will be raised.

[1] https://github.com/apache/arrow/blob/5389008df0267ba8d57edb7d6bb6ec0bfa10ff9a/cpp/src/arrow/c/bridge.cc#L560
[2] https://github.com/apache/arrow/blob/5389008df0267ba8d57edb7d6bb6ec0bfa10ff9a/cpp/src/arrow/c/bridge.cc#L1404
[jira] [Created] (ARROW-9037) [C++/C-ABI] unable to import array with null count == -1 (which could be exported)
Zhuo Peng created ARROW-9037:

Summary: [C++/C-ABI] unable to import array with null count == -1 (which could be exported)
Key: ARROW-9037
URL: https://issues.apache.org/jira/browse/ARROW-9037
Project: Apache Arrow
Issue Type: Bug
Components: C++
Affects Versions: 0.17.1
Reporter: Zhuo Peng

If an Array is created with null_count == -1 but without any nulls (and thus no null bitmap buffer), then ArrayData.null_count will remain -1 when exporting if null_count is never computed. The exported C struct then also has null_count == -1 [1]. But when importing, if null_count != 0, an error [2] will be raised.

[1] https://github.com/apache/arrow/blob/5389008df0267ba8d57edb7d6bb6ec0bfa10ff9a/cpp/src/arrow/c/bridge.cc#L560
[2] https://github.com/apache/arrow/blob/5389008df0267ba8d57edb7d6bb6ec0bfa10ff9a/cpp/src/arrow/c/bridge.cc#L1404
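A hypothetical sketch of the import-side check this issue describes (not the actual bridge.cc code): in the C data interface, null_count == -1 means "not computed yet", so an array without a validity bitmap should arguably be accepted when null_count is either 0 or -1, not only 0.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative validation rule for an imported array's null_count.
// -1 is the C-data-interface sentinel for "unknown"; with no validity
// bitmap there can be no nulls, so the importer could resolve -1 to 0
// instead of raising an error.
bool NullCountIsConsistent(std::int64_t null_count, bool has_validity_bitmap) {
  if (null_count < -1) return false;       // out of the valid domain
  if (has_validity_bitmap) return true;    // any count in [-1, len] is fine
  // No validity bitmap: only "no nulls" (0) or "unknown" (-1) make sense.
  return null_count == 0 || null_count == -1;
}
```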
[jira] [Created] (ARROW-9041) overloaded virtual function "arrow::io::Writable::Write" is only partially overridden in class
Karthikeyan Natarajan created ARROW-9041:

Summary: overloaded virtual function "arrow::io::Writable::Write" is only partially overridden in class
Key: ARROW-9041
URL: https://issues.apache.org/jira/browse/ARROW-9041
Project: Apache Arrow
Issue Type: Bug
Components: C++
Affects Versions: 0.15.0
Reporter: Karthikeyan Natarajan

The following warnings appear:

cpp/build/arrow/install/include/arrow/io/file.h(189): warning: overloaded virtual function "arrow::io::Writable::Write" is only partially overridden in class "arrow::io::MemoryMappedFile"
cpp/build/arrow/install/include/arrow/io/memory.h(98): warning: overloaded virtual function "arrow::io::Writable::Write" is only partially overridden in class "arrow::io::MockOutputStream"
cpp/build/arrow/install/include/arrow/io/memory.h(116): warning: overloaded virtual function "arrow::io::Writable::Write" is only partially overridden in class "arrow::io::FixedSizeBufferWriter"

The suggested solution is to add `using Writable::Write;` in the protected/private section. [https://isocpp.org/wiki/faq/strange-inheritance#hiding-rule]
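The hiding rule behind these warnings, and the suggested `using` fix, can be reproduced in a few lines (class names and signatures here are illustrative, not Arrow's actual io classes):

```cpp
#include <cassert>
#include <cstdint>

struct Writable {
  virtual ~Writable() = default;
  virtual int Write(const std::uint8_t* data, std::int64_t nbytes) = 0;
  // A second overload with a different first parameter type.
  virtual int Write(const void* data, std::int64_t nbytes) {
    return Write(static_cast<const std::uint8_t*>(data), nbytes);
  }
};

struct BufferWriter : Writable {
  // Without this using-declaration, overriding only one Write overload
  // hides the other inherited overload (the "hiding rule"), which is
  // exactly what the compiler warns about.
  using Writable::Write;
  int Write(const std::uint8_t* /*data*/, std::int64_t nbytes) override {
    return static_cast<int>(nbytes);  // pretend we wrote nbytes
  }
};
```

With the using-declaration in place, both `Write(const void*, ...)` and `Write(const std::uint8_t*, ...)` remain callable on `BufferWriter`, and the partial-override warning goes away.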