[jira] [Created] (ARROW-9046) [C++][R] Put more things in type_fwds

2020-06-05 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-9046:
--

 Summary: [C++][R] Put more things in type_fwds
 Key: ARROW-9046
 URL: https://issues.apache.org/jira/browse/ARROW-9046
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, R
Reporter: Neal Richardson
Assignee: Ben Kietzman
 Fix For: 1.0.0


Hopefully to reduce compile time.
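
For context, a rough sketch of the forward-declaration pattern this refers to
(the classes listed are only examples of existing Arrow types, not the exact
set this issue would add):

// type_fwd.h style header: forward declarations only. Headers that merely
// pass these types around by pointer or reference can include this instead
// of the full definitions, which is where the compile-time saving comes from.
namespace arrow {

class DataType;
class Array;
class ChunkedArray;
class RecordBatch;
class Schema;
class Table;

}  // namespace arrow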





[jira] [Created] (ARROW-9045) [C++] Improve and expand Take/Filter benchmarks

2020-06-05 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-9045:
---

 Summary: [C++] Improve and expand Take/Filter benchmarks
 Key: ARROW-9045
 URL: https://issues.apache.org/jira/browse/ARROW-9045
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


I'm putting this up as a separate patch for review.





[jira] [Created] (ARROW-9044) [Go][Packaging] Revisit the license file attachment to the go packages

2020-06-05 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-9044:
--

 Summary: [Go][Packaging] Revisit the license file attachment to 
the go packages
 Key: ARROW-9044
 URL: https://issues.apache.org/jira/browse/ARROW-9044
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go, Packaging
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs
 Fix For: 1.0.0


As per https://github.com/apache/arrow/pull/7355#issuecomment-639560475





Re: [DISCUSS] [C++] custom allocator for large objects

2020-06-05 Thread Antoine Pitrou


On 05/06/2020 at 17:09, Rémi Dettai wrote:
> I looked into the details of why the decoder could not estimate the target
> Arrow array size for my Parquet column. It's because I am decoding from
> Parquet-Dictionary to Arrow-Plain (which is the default when loading
> Parquet). In this case the size prediction is impossible :-(

But we can probably make up a heuristic, for example:

   estimated size = avg(dictionary value size) * number of non-null values

It would avoid a number of resizes, even though there may still be a
couple of them at the end.  It may oversize in some cases, but much less
than your proposed strategy of reserving a huge chunk of virtual memory :-)
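
For illustration, a minimal sketch of such a pre-sizing estimate (the function
and parameter names are hypothetical, not the actual Parquet decoder API):

#include <cstdint>

// Estimate the decoded plain-encoded size of a dictionary-encoded column:
// avg(dictionary value size) * number of non-null values.
int64_t EstimateDecodedSize(int64_t dictionary_total_bytes,
                            int64_t dictionary_num_values,
                            int64_t num_non_null_values) {
  if (dictionary_num_values == 0) {
    return 0;
  }
  const double avg_value_size =
      static_cast<double>(dictionary_total_bytes) / dictionary_num_values;
  return static_cast<int64_t>(avg_value_size * num_non_null_values);
}

The decoder would reserve this much up front; a couple of resizes may still
happen at the end if the estimate turns out to be low.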

Regards

Antoine.


Re: [DISCUSS] [C++] custom allocator for large objects

2020-06-05 Thread Antoine Pitrou


On 05/06/2020 at 16:25, Uwe L. Korn wrote:
> 
> On Fri, Jun 5, 2020, at 3:13 PM, Rémi Dettai wrote:
>> Hi Antoine !
>>> I would indeed have expected jemalloc to do that (remap the pages)
>> I have no idea about the performance gain this would provide (if any).
>> Could be interesting to explore.
> 
> This would actually be the most interesting thing. In general, getting access
> to the pages mapped into RAM would help in a lot more situations, not just
> reallocation. For example, when you take a small slice of a large array and
> only pass this on, but don't keep an explicit reference to the array, you will
> still indirectly hold on to the larger memory block. Having an allocator that
> understands the mapping between pages and memory blocks would allow us to
> free the pages that are not part of the view.
> 
> Also, yes: for CSV and JSON we don't have size estimates beforehand. There,
> this would be a great performance improvement.

For CSV we actually know the size after parsing:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/csv/converter.cc#L177-L178

It would be a shame if this were possible in CSV but not in Parquet, a
storage format dedicated to big columnar data.

Regards

Antoine.


Re: [DISCUSS] [C++] custom allocator for large objects

2020-06-05 Thread Rémi Dettai
I looked into the details of why the decoder could not estimate the target
Arrow array size for my Parquet column. It's because I am decoding from
Parquet-Dictionary to Arrow-Plain (which is the default when loading
Parquet). In this case the size prediction is impossible :-(

> This would actually be the most interesting thing. In general, getting
access to the pages mapped into RAM would help in a lot more situations,
not just reallocation. For example, when you take a small slice of a large
array and only pass this on, but don't keep an explicit reference to the
array, you will still indirectly hold on to the larger memory block. Having
an allocator that would understand the mapping between pages and memory
blocks would allow us to free the pages that are not part of the view.
Not sure I'm following you on this one. From my understanding, the subject
here is mremap, which allows you to keep your physical memory but change the
virtual address range that points to it. It seems, according to this (
https://stackoverflow.com/questions/11621606/faster-way-to-move-memory-page-than-mremap)
that it is mainly efficient for growing large allocations.
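
For illustration, a minimal Linux-specific sketch of that mremap behaviour
(the sizes are arbitrary examples):

#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <sys/mman.h>
#include <cstdio>

int main() {
  const size_t old_size = 64UL << 20;   // 64 MB
  const size_t new_size = 256UL << 20;  // 256 MB
  void* buf = mmap(nullptr, old_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (buf == MAP_FAILED) return 1;
  // MREMAP_MAYMOVE lets the kernel pick a new virtual range if needed; the
  // existing physical pages are remapped, so the data is not copied.
  void* grown = mremap(buf, old_size, new_size, MREMAP_MAYMOVE);
  if (grown == MAP_FAILED) return 1;
  std::printf("old=%p new=%p\n", buf, grown);
  munmap(grown, new_size);
  return 0;
}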

On Fri, Jun 5, 2020 at 16:25, Uwe L. Korn wrote:

>
> On Fri, Jun 5, 2020, at 3:13 PM, Rémi Dettai wrote:
> > Hi Antoine !
> > > I would indeed have expected jemalloc to do that (remap the pages)
> > I have no idea about the performance gain this would provide (if any).
> > Could be interesting to explore.
>
> This would actually be the most interesting thing. In general, getting
> access to the pages mapped into RAM would help in a lot more situations,
> not just reallocation. For example, when you take a small slice of a large
> array and only pass this on, but don't keep an explicit reference to the
> array, you will still indirectly hold on to the larger memory block. Having
> an allocator that understands the mapping between pages and memory blocks
> would allow us to free the pages that are not part of the view.
>
> Also, yes: for CSV and JSON we don't have size estimates beforehand.
> There, this would be a great performance improvement.
>
> Best
> Uwe
>


Re: [DISCUSS] [C++] custom allocator for large objects

2020-06-05 Thread Uwe L. Korn


On Fri, Jun 5, 2020, at 3:13 PM, Rémi Dettai wrote:
> Hi Antoine !
> > I would indeed have expected jemalloc to do that (remap the pages)
> I have no idea about the performance gain this would provide (if any).
> Could be interesting to explore.

This would actually be the most interesting thing. In general, getting access
to the pages mapped into RAM would help in a lot more situations, not just
reallocation. For example, when you take a small slice of a large array and
only pass this on, but don't keep an explicit reference to the array, you will
still indirectly hold on to the larger memory block. Having an allocator that
understands the mapping between pages and memory blocks would allow us to free
the pages that are not part of the view.
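
For illustration, a minimal Linux-specific sketch of such page-aware freeing
(the helper is hypothetical and assumes the allocator knows each block's base
address and length; this is not an existing Arrow API):

#include <sys/mman.h>
#include <unistd.h>
#include <cstdint>

// Release the physical pages of a large mmap'd block that fall outside a
// retained view. The virtual range stays valid; only the backing pages of
// the unused parts are returned to the OS.
void ReleasePagesOutsideView(uint8_t* block, size_t block_len,
                             uint8_t* view, size_t view_len) {
  const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  // Round the view out to page boundaries so no byte it uses is dropped.
  const uintptr_t view_begin = reinterpret_cast<uintptr_t>(view) & ~(page - 1);
  const uintptr_t view_end =
      (reinterpret_cast<uintptr_t>(view) + view_len + page - 1) & ~(page - 1);
  const uintptr_t block_begin = reinterpret_cast<uintptr_t>(block);
  const uintptr_t block_end = block_begin + block_len;
  if (view_begin > block_begin) {
    madvise(reinterpret_cast<void*>(block_begin), view_begin - block_begin,
            MADV_DONTNEED);  // pages before the view
  }
  if (block_end > view_end) {
    madvise(reinterpret_cast<void*>(view_end), block_end - view_end,
            MADV_DONTNEED);  // pages after the view
  }
}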

Also, yes: for CSV and JSON we don't have size estimates beforehand. There,
this would be a great performance improvement.

Best
Uwe


[jira] [Created] (ARROW-9043) [Go] Temporarily copy LICENSE.txt to go/

2020-06-05 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-9043:
---

 Summary: [Go] Temporarily copy LICENSE.txt to go/
 Key: ARROW-9043
 URL: https://issues.apache.org/jira/browse/ARROW-9043
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Wes McKinney
 Fix For: 1.0.0


{{go mod}} needs to find a license file in the root of the Go module. In the
future, {{go mod}} may be able to follow symlinks, in which case this copy can
be replaced by a symlink.





Re: [DISCUSS] [C++] custom allocator for large objects

2020-06-05 Thread Rémi Dettai
Hi Antoine !
> I would indeed have expected jemalloc to do that (remap the pages)
I have no idea about the performance gain this would provide (if any).
Could be interesting to explore.

> do you know that Arrow also supports integration with another allocator,
mimalloc
I only tried jemalloc and the system allocator because I assumed mimalloc
was specific to Windows. I'll try to look into this as well ;-) I liked the
idea of looking deeper into jemalloc, as it is also the default allocator
for Rust.

Thanks for your insights!

Regards,

Remi



On Fri, Jun 5, 2020 at 14:32, Antoine Pitrou wrote:

>
> On 05/06/2020 at 14:25, Rémi Dettai wrote:
> > Hi Uwe!
> >
> >> As your suggestions don't seem to be specific to Arrow, why not
> > contribute them directly to jemalloc? They are much better in reviewing
> > allocator code than we are.
> > I mentioned this idea in the jemalloc gitter. The first response was that
> > it should work, but workloads with realloc aren't very common and these
> > huge allocations can have some impact during core dumps. It's true that
> > jemalloc is supposed to be a general-purpose allocator, so it has to make
> > compromises in that sense. Let's wait and see if other answers come in.
> > You can follow the conversation here if you are interested:
> > https://gitter.im/jemalloc/jemalloc. It seemed to me while investigating
> > jemalloc's internals that most of the design effort is concentrated on
> > small/medium allocations. This makes sense as they probably represent 99%
> > of workloads.
>
> Thanks for the pointer.  The reply is interesting:
> """
> Another trick that’s possible but we don’t do is to just remap the pages
> into some new space, rather than alloc-and-copy
> """
> I would indeed have expected jemalloc to do that (remap the pages) when
> reallocating large allocations, so I'm a bit surprised they don't.  I
> guess it's one more special case to worry about.
>
> By the way, before going further with jemalloc, do you know that Arrow
> also supports integration with another allocator, mimalloc
> (https://github.com/microsoft/mimalloc/)?  Perhaps you should try it out
> and see whether that improves performance or not.
>
> (and also the system allocator, by the way)
>
> Regards
>
> Antoine.
>


[jira] [Created] (ARROW-9042) [C++] Add Subtract and Multiply arithmetic kernels with wrap-around behavior

2020-06-05 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-9042:
--

 Summary: [C++] Add Subtract and Multiply arithmetic kernels with 
wrap-around behavior
 Key: ARROW-9042
 URL: https://issues.apache.org/jira/browse/ARROW-9042
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Krisztian Szucs
 Fix For: 1.0.0


Also avoid undefined behaviour caused by signed integer overflow.
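
For illustration, a minimal sketch of one standard way to get wrap-around
semantics without undefined behaviour (not necessarily how the kernels will be
implemented): do the arithmetic on the corresponding unsigned type, where
overflow is well defined, then convert back.

#include <cstdint>

// Wrap-around (two's-complement) arithmetic for int32_t. Unsigned overflow is
// well defined; the conversion back to a signed type is implementation-defined
// before C++20 (and well defined from C++20 on), but never UB.
int32_t SubtractWrapAround(int32_t a, int32_t b) {
  return static_cast<int32_t>(static_cast<uint32_t>(a) -
                              static_cast<uint32_t>(b));
}

int32_t MultiplyWrapAround(int32_t a, int32_t b) {
  return static_cast<int32_t>(static_cast<uint32_t>(a) *
                              static_cast<uint32_t>(b));
}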





Re: [DISCUSS] [C++] custom allocator for large objects

2020-06-05 Thread Antoine Pitrou


On 05/06/2020 at 14:25, Rémi Dettai wrote:
> Hi Uwe!
> 
>> As your suggestions don't seem to be specific to Arrow, why not
> contribute them directly to jemalloc? They are much better in reviewing
> allocator code than we are.
> I mentioned this idea in the jemalloc gitter. The first response was that
> it should work, but workloads with realloc aren't very common and these huge
> allocations can have some impact during core dumps. It's true that jemalloc
> is supposed to be a general-purpose allocator, so it has to make compromises
> in that sense. Let's wait and see if other answers come in. You can follow
> the conversation here if you are interested:
> https://gitter.im/jemalloc/jemalloc. It seemed to me while investigating
> jemalloc's internals that most of the design effort is concentrated on
> small/medium allocations. This makes sense as they probably represent 99%
> of workloads.

Thanks for the pointer.  The reply is interesting:
"""
Another trick that’s possible but we don’t do is to just remap the pages
into some new space, rather than alloc-and-copy
"""
I would indeed have expected jemalloc to do that (remap the pages) when
reallocating large allocations, so I'm a bit surprised they don't.  I
guess it's one more special case to worry about.

By the way, before going further with jemalloc, do you know that Arrow
also supports integration with another allocator, mimalloc
(https://github.com/microsoft/mimalloc/)?  Perhaps you should try it out
and see whether that improves performance or not.

(and also the system allocator, by the way)

Regards

Antoine.


Re: [DISCUSS] [C++] custom allocator for large objects

2020-06-05 Thread Rémi Dettai
Hi Uwe!

> As your suggestions don't seem to be specific to Arrow, why not
contribute them directly to jemalloc? They are much better in reviewing
allocator code than we are.
I mentioned this idea in the jemalloc gitter. The first response was that
it should work, but workloads with realloc aren't very common and these huge
allocations can have some impact during core dumps. It's true that jemalloc
is supposed to be a general-purpose allocator, so it has to make compromises
in that sense. Let's wait and see if other answers come in. You can follow
the conversation here if you are interested:
https://gitter.im/jemalloc/jemalloc. It seemed to me while investigating
jemalloc's internals that most of the design effort is concentrated on
small/medium allocations. This makes sense as they probably represent 99%
of workloads. But Arrow is really about structuring larger chunks of data
in memory. I agree with Antoine that in normal circumstances, if you start
to blame the allocator, it means that you likely got things wrong
elsewhere. But maybe the Arrow workload is specific enough to get a seat in
the realm of exceptions (right next to video game engines)! ;-)

> Still, when we read a column, we should be able to determine its final
size from the Parquet metadata. Maybe we're not passing that information
along there?
I'm going to take a look at this. But even if for Parquet you can get a
good estimation, what about CSV or JSON? How can you estimate the size of
a column of strings beforehand, especially with a compressed payload where
you don't even know the number of lines!

On Fri, Jun 5, 2020 at 12:24, Uwe L. Korn wrote:

> Hello Rémi,
>
> under the hood jemalloc does quite similar things to what you describe.
> I'm not sure what the threshold is in the current version, but in earlier
> releases it used a different allocation strategy for objects above 4MB.
> For the initial large allocation, you will see quite some copies as mmap
> returns a new base address and isn't able to reuse an existing space.
> This could probably be circumvented by a single large allocation which is
> freed again.
>
> As your suggestions don't seem to be specific to Arrow, why not contribute
> them directly to jemalloc? They are much better in reviewing allocator code
> than we are.
>
> Still, when we read a column, we should be able to determine its final
> size from the Parquet metadata. Maybe we're not passing that information
> along there?
>
> Best,
> Uwe
>
> On Thu, Jun 4, 2020, at 5:48 PM, Rémi Dettai wrote:
> > When creating large arrays, Arrow uses realloc quite intensively.
> >
> > I have an example where I read a gzipped Parquet column (strings) that
> > expands from 8MB to 100+MB when loaded into Arrow. Of course jemalloc
> > cannot anticipate this, and every reallocate call above 1MB (the most
> > critical ones) ends up being a copy.
> >
> > Knowing that we like using realloc in Arrow, I think we could come up
> > with an allocator for large objects that would behave a lot better than
> > jemalloc. For smaller objects, this allocator could just let the memory
> > request be handled by jemalloc. Not trying to outsmart the brilliant
> > guys from Facebook and co ;-) But for larger objects, we could adopt a
> > custom strategy:
> > - if an allocation or a re-allocation larger than 1MB (or maybe even 512K)
> > is made on our memory pool, call mmap with size X GB (X being slightly
> > smaller than the total physical memory on the system). This is ok because
> > mmap will not physically allocate this memory as long as it is not touched.
> > - we keep track of all allocations that we created this way, by storing
> > the pointer + the actual used size inside our X GB alloc in a map.
> > - when growing an alloc mmapped this way, we will always have contiguous
> > memory available (otherwise we would already have OOMed because X is the
> > physical memory size).
> > - when reducing the alloc size we can free with madvise (optional: if the
> > alloc becomes small enough, we might copy it back into a jemalloc
> > allocation).
> >
> > I am not an expert in these matters, and I just learned what an allocator
> > really is, so my approach might be naive. In this case feel free to
> > enlighten me!
> >
> > Please note that I'm not sure about the level of portability of this
> > solution.
> >
> > Have a nice day!
> >
> > Remi
> >
>


Re: [DISCUSS] [C++] custom allocator for large objects

2020-06-05 Thread Uwe L. Korn
Hello Rémi,

under the hood jemalloc does quite similar things to what you describe. I'm not
sure what the threshold is in the current version, but in earlier releases it
used a different allocation strategy for objects above 4MB. For the initial
large allocation, you will see quite some copies as mmap returns a new base
address and isn't able to reuse an existing space. This could probably be
circumvented by a single large allocation which is freed again.

As your suggestions don't seem to be specific to Arrow, why not contribute them 
directly to jemalloc? They are much better in reviewing allocator code than we 
are.

Still, when we read a column, we should be able to determine its final size
from the Parquet metadata. Maybe we're not passing that information along there?

Best,
Uwe

On Thu, Jun 4, 2020, at 5:48 PM, Rémi Dettai wrote:
> When creating large arrays, Arrow uses realloc quite intensively.
> 
> I have an example where I read a gzipped Parquet column (strings) that
> expands from 8MB to 100+MB when loaded into Arrow. Of course jemalloc
> cannot anticipate this, and every reallocate call above 1MB (the most
> critical ones) ends up being a copy.
> 
> Knowing that we like using realloc in Arrow, I think we could come up
> with an allocator for large objects that would behave a lot better than
> jemalloc. For smaller objects, this allocator could just let the memory
> request be handled by jemalloc. Not trying to outsmart the brilliant
> guys from Facebook and co ;-) But for larger objects, we could adopt a
> custom strategy:
> - if an allocation or a re-allocation larger than 1MB (or maybe even 512K)
> is made on our memory pool, call mmap with size X GB (X being slightly
> smaller than the total physical memory on the system). This is ok because
> mmap will not physically allocate this memory as long as it is not touched.
> - we keep track of all allocations that we created this way, by storing
> the pointer + the actual used size inside our X GB alloc in a map.
> - when growing an alloc mmapped this way, we will always have contiguous
> memory available (otherwise we would already have OOMed because X is the
> physical memory size).
> - when reducing the alloc size we can free with madvise (optional: if the
> alloc becomes small enough, we might copy it back into a jemalloc
> allocation).
> 
> I am not an expert in these matters, and I just learned what an allocator
> really is, so my approach might be naive. In this case feel free to
> enlighten me!
> 
> Please note that I'm not sure about the level of portability of this
> solution.
> 
> Have a nice day!
> 
> Remi
>
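
For illustration, a minimal Linux-specific sketch of the large-object strategy
described in the quoted message above (the names, the 1MB threshold and the
reservation size are placeholders; this is not Arrow's MemoryPool API):

#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <map>

constexpr size_t kLargeThreshold = 1UL << 20;    // smaller requests go to jemalloc
constexpr size_t kReservationSize = 32UL << 30;  // "X GB" of virtual address space

// Used size of each large reservation, keyed by its base address.
std::map<void*, size_t> large_allocs;

void* AllocateLarge(size_t size) {
  // Reserve a huge virtual range; physical pages are only allocated on touch.
  void* base = mmap(nullptr, kReservationSize, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
  if (base == MAP_FAILED) return nullptr;
  large_allocs[base] = size;
  return base;
}

// Growing never moves or copies data (the virtual space is already reserved);
// shrinking hands the pages past new_size back to the OS.
bool ResizeLarge(void* base, size_t new_size) {
  auto it = large_allocs.find(base);
  if (it == large_allocs.end() || new_size > kReservationSize) return false;
  const size_t page = static_cast<size_t>(sysconf(_SC_PAGESIZE));
  const size_t keep = (new_size + page - 1) & ~(page - 1);  // page-align the cut
  if (keep < it->second) {
    madvise(static_cast<char*>(base) + keep, it->second - keep, MADV_DONTNEED);
  }
  it->second = new_size;
  return true;
}

void FreeLarge(void* base) {
  auto it = large_allocs.find(base);
  if (it == large_allocs.end()) return;
  munmap(base, kReservationSize);
  large_allocs.erase(it);
}

int main() {
  void* p = AllocateLarge(8UL << 20);  // starts at 8 MB used
  ResizeLarge(p, 128UL << 20);         // grows in place, no copy
  ResizeLarge(p, 4UL << 20);           // shrink: pages beyond ~4 MB released
  FreeLarge(p);
  return 0;
}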


[NIGHTLY] Arrow Build Report for Job nightly-2020-06-05-0

2020-06-05 Thread Crossbow


Arrow Build Report for Job nightly-2020-06-05-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0

Failed Tasks:
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-azure-conda-win-vs2015-py38
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-travis-homebrew-cpp
- homebrew-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-travis-homebrew-r-autobrew
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-test-conda-cpp-valgrind
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-test-conda-python-3.7-spark-master
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-test-conda-python-3.7-turbodbc-master
- test-conda-python-3.8-dask-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-test-conda-python-3.8-dask-master
- test-conda-python-3.8-jpype:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-test-conda-python-3.8-jpype
- wheel-win-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-appveyor-wheel-win-cp35m
- wheel-win-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-appveyor-wheel-win-cp36m
- wheel-win-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-appveyor-wheel-win-cp37m
- wheel-win-cp38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-appveyor-wheel-win-cp38

Succeeded Tasks:
- centos-6-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-centos-6-amd64
- centos-7-aarch64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-travis-centos-7-aarch64
- centos-7-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-centos-7-amd64
- centos-8-aarch64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-travis-centos-8-aarch64
- centos-8-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-centos-8-amd64
- conda-clean:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-azure-conda-clean
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-azure-conda-osx-clang-py38
- debian-buster-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-debian-buster-amd64
- debian-buster-arm64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-travis-debian-buster-arm64
- debian-stretch-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-debian-stretch-amd64
- debian-stretch-arm64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-travis-debian-stretch-arm64
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-travis-gandiva-jar-osx
- gandiva-jar-xenial:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-travis-gandiva-jar-xenial
- nuget:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-05-0-github-nuget

Re: [jira] [Created] (ARROW-9037) [C++/C-ABI] unable to import array with null count == -1 (which could be exported)

2020-06-05 Thread Saurabh Kumar

Apologies. This was sent by mistake.

On 5 Jun 2020, at 15:17, Saurabh Kumar wrote:


Zhuo Peng created ARROW-9037:


 Summary: [C++/C-ABI] unable to import array with null 
count == -1 (which could be exported)

 Key: ARROW-9037
 URL: https://issues.apache.org/jira/browse/ARROW-9037
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.17.1
Reporter: Zhuo Peng


If an Array is created with null_count == -1 but without any nulls (and 
thus no null bitmap buffer), then ArrayData.null_count will remain -1 
when exporting if null_count is never computed. The exported C struct 
also has null_count == -1 [1]. But when importing, if null_count != 0, 
an error [2] will be raised.


[1] 
https://github.com/apache/arrow/blob/5389008df0267ba8d57edb7d6bb6ec0bfa10ff9a/cpp/src/arrow/c/bridge.cc#L560

[2] 
https://github.com/apache/arrow/blob/5389008df0267ba8d57edb7d6bb6ec0bfa10ff9a/cpp/src/arrow/c/bridge.cc#L1404

 





[jira] [Created] (ARROW-9037) [C++/C-ABI] unable to import array with null count == -1 (which could be exported)

2020-06-05 Thread Saurabh Kumar

Zhuo Peng created ARROW-9037:


 Summary: [C++/C-ABI] unable to import array with null 
count == -1 (which could be exported)

 Key: ARROW-9037
 URL: https://issues.apache.org/jira/browse/ARROW-9037
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.17.1
Reporter: Zhuo Peng


If an Array is created with null_count == -1 but without any nulls (and 
thus no null bitmap buffer), then ArrayData.null_count will remain -1 
when exporting if null_count is never computed. The exported C struct 
also has null_count == -1 [1]. But when importing, if null_count != 0, 
an error [2] will be raised.


[1] 
https://github.com/apache/arrow/blob/5389008df0267ba8d57edb7d6bb6ec0bfa10ff9a/cpp/src/arrow/c/bridge.cc#L560

[2] 
https://github.com/apache/arrow/blob/5389008df0267ba8d57edb7d6bb6ec0bfa10ff9a/cpp/src/arrow/c/bridge.cc#L1404
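
For illustration, a minimal sketch of the distinction the report is drawing (a
hypothetical helper, not the actual bridge.cc logic): null_count == -1 means
"unknown, compute lazily" and should be importable, whereas a positive null
count without a validity bitmap really is inconsistent.

#include <cstdint>
#include <stdexcept>

// Hypothetical import-side check: accept null_count == -1 ("unknown") even
// when no validity bitmap is present, and only reject the truly inconsistent
// case of a positive null count with no bitmap.
void ValidateImportedNullCount(int64_t null_count, bool has_validity_bitmap) {
  if (null_count > 0 && !has_validity_bitmap) {
    throw std::invalid_argument(
        "positive null count but no validity bitmap present");
  }
  // null_count == 0 or null_count == -1 is fine with or without a bitmap.
}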

 






[jira] [Created] (ARROW-9041) overloaded virtual function "arrow::io::Writable::Write" is only partially overridden in class

2020-06-05 Thread Karthikeyan Natarajan (Jira)
Karthikeyan Natarajan created ARROW-9041:


 Summary: overloaded virtual function "arrow::io::Writable::Write" 
is only partially overridden in class 
 Key: ARROW-9041
 URL: https://issues.apache.org/jira/browse/ARROW-9041
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.15.0
Reporter: Karthikeyan Natarajan


The following warnings appear:

cpp/build/arrow/install/include/arrow/io/file.h(189): warning: overloaded 
virtual function "arrow::io::Writable::Write" is only partially overridden in 
class "arrow::io::MemoryMappedFile"

cpp/build/arrow/install/include/arrow/io/memory.h(98): warning: overloaded 
virtual function "arrow::io::Writable::Write" is only partially overridden in 
class "arrow::io::MockOutputStream"

cpp/build/arrow/install/include/arrow/io/memory.h(116): warning: overloaded 
virtual function "arrow::io::Writable::Write" is only partially overridden in 
class "arrow::io::FixedSizeBufferWriter"

The suggested solution is to add `using Writable::Write;` in the protected/private section.

[https://isocpp.org/wiki/faq/strange-inheritance#hiding-rule]
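
For illustration, a minimal standalone sketch of the hiding rule and the
suggested fix (the class and the signatures are simplified stand-ins, not the
actual arrow::io declarations):

#include <cstdint>
#include <cstring>

class Writable {
 public:
  virtual ~Writable() = default;
  virtual void Write(const void* data, int64_t nbytes) = 0;
  virtual void Write(const char* str) {
    Write(str, static_cast<int64_t>(std::strlen(str)));
  }
};

class FixedSizeBufferWriter : public Writable {
 public:
  // Overriding only one overload hides the inherited Write(const char*),
  // which is what the "only partially overridden" warning complains about.
  void Write(const void* data, int64_t nbytes) override {
    (void)data;
    (void)nbytes;  // a real implementation would copy into its buffer
  }

 protected:
  // The suggested fix: re-expose the hidden base-class overloads.
  using Writable::Write;
};

int main() {
  FixedSizeBufferWriter writer;
  Writable& w = writer;  // call through the base, where both overloads are public
  w.Write("hello");      // resolves to Writable::Write(const char*), which
                         // forwards to the derived Write(const void*, int64_t)
  return 0;
}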


