[jira] [Created] (ARROW-7250) Undefined symbols for StringToFloatConverter::Impl with clang 4.x
Uwe Korn created ARROW-7250:
---

Summary: Undefined symbols for StringToFloatConverter::Impl with clang 4.x
Key: ARROW-7250
URL: https://issues.apache.org/jira/browse/ARROW-7250
Project: Apache Arrow
Issue Type: Bug
Components: C++
Affects Versions: 0.15.1
Reporter: Uwe Korn
Assignee: Uwe Korn

{code:java}
Undefined symbols for architecture x86_64:
  "arrow::internal::StringToFloatConverter::Impl::main_junk_value_", referenced from:
      arrow::internal::StringToFloatConverter::StringToFloat(char const*, unsigned long, float*) in libarrow.a(parsing.cc.o)
      arrow::internal::StringToFloatConverter::StringToFloat(char const*, unsigned long, double*) in libarrow.a(parsing.cc.o)
  "arrow::internal::StringToFloatConverter::Impl::fallback_junk_value_", referenced from:
      arrow::internal::StringToFloatConverter::StringToFloat(char const*, unsigned long, float*) in libarrow.a(parsing.cc.o)
      arrow::internal::StringToFloatConverter::StringToFloat(char const*, unsigned long, double*) in libarrow.a(parsing.cc.o)
ld: symbol(s) not found for architecture x86_64
{code}

--
This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Dense unions: monotonic or strictly monotonic offsets?
Hi Wes,

Thanks for your clarification.
I agree with you that the problem should be considered at the implementation level.

Best,
Liya Fan

On Mon, Nov 25, 2019 at 10:34 AM Wes McKinney wrote:

> On Sun, Nov 24, 2019 at 8:07 PM Fan Liya wrote:
> >
> > Hi Wes,
> >
> > I agree with you that this is a data representation issue.
> >
> > My point is that, data representation and data operation are closely
> > related.
> > As far as this issue is concerned, if we allow several values in the union
> > vector to be mapped to the same value in the underlying vector, it is
> > possible that when we modify one value in the union vector, the other value
> > is also modified, which is unexpected.
>
> Right, but Arrow columnar data is immutable, so any mutation
> operations are application/implementation-level concerns and should
> not influence the specification documents. Implementations need to be
> aware of the implications of the specification, of course.
>
> > This is a problem with our current specification, because our
> > vectors/arrays provide set/write APIs.
> > So we may need a "coherency protocol" to define the behavior (e.g. copy on
> > write) when trying to modify a shared value, IMO.
>
> It's an application/implementation-level concern so I think it would
> need to be addressed separately from clarifying the specification.
>
> >
> > Best,
> > Liya Fan
> >
> > On Sat, Nov 23, 2019 at 3:31 AM Wes McKinney wrote:
> >
> > > hi Liya,
> > >
> > > I don't understand your point -- we are strictly discussing data
> > > representation here I believe. From a data representation perspective,
> > > there is no conflict with repeated or non-monotonic offset values.
> > >
> > > On Fri, Nov 22, 2019 at 1:49 AM Fan Liya wrote:
> > > >
> > > > This is an interesting question.
> > > > IMO, to support repeated values, we also need to design a "coherency
> > > > protocol", to avoid the scenario where once a value is written, the change
> > > > is propagated to another slot unexpectedly.
> > > >
> > > > Best,
> > > > Liya Fan
> > > >
> > > > On Fri, Nov 22, 2019 at 1:34 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > > >
> > > > > Hmm, I also thought the intention was monotonically increasing. I can't
> > > > > think of a strong reason one way or another. If the argument about code to
> > > > > do random access is the same in all cases, is there any benefit to forcing
> > > > > any order at all? Memory prefetching?
> > > > >
> > > > > On Thu, Nov 21, 2019 at 11:48 AM Wes McKinney wrote:
> > > > >
> > > > > > hi Antoine,
> > > > > >
> > > > > > It's a good question.
> > > > > >
> > > > > > The intent when we wrote the specification was to be strictly
> > > > > > monotonic, but there seems nothing especially harmful about relaxing
> > > > > > the constraint to allow for repeated values or even non-monotonicity
> > > > > > (strict or otherwise). For example, if we had the union
> > > > > >
> > > > > > ['a', 'a', 'a', 0, 1, 'b', 'b']
> > > > > >
> > > > > > then this could be represented as
> > > > > >
> > > > > > type_ids: [0, 0, 0, 1, 1, 0, 0]
> > > > > > offsets: [0, 0, 0, 0, 1, 1, 1]
> > > > > > child[0]: ['a', 'b']
> > > > > > child[1]: [0, 1]
> > > > > >
> > > > > > or
> > > > > >
> > > > > > type_ids: [0, 0, 0, 1, 1, 0, 0]
> > > > > > offsets: [1, 1, 1, 0, 1, 0, 0]
> > > > > > child[0]: ['b', 'a']
> > > > > > child[1]: [0, 1]
> > > > > >
> > > > > > What do others think? Either way some clarification in the
> > > > > > specification would be useful. Because the code used to do random
> > > > > > access is the same in all cases, I feel weakly supportive of removing
> > > > > > constraints on the offsets.
> > > > > >
> > > > > > - Wes
> > > > > >
> > > > > > On Thu, Nov 21, 2019 at 9:04 AM Antoine Pitrou <anto...@python.org> wrote:
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > I'd like some clarification on the spec and intent for dense arrays.
> > > > > > >
> > > > > > > Currently, it is specified that offsets of a dense union are "in order /
> > > > > > > increasing" (*). However, it is not obvious whether repeated values are
> > > > > > > allowed or not.
> > > > > > >
> > > > > > > I suspect the intent is to avoid having people exploit unions as some
> > > > > > > kind of poor man's dictionaries. Also, perhaps some optimizations are
> > > > > > > possible if monotonic or strictly monotonic indices are assumed? But I
> > > > > > > don't know the history behind the union type.
> > > > > > >
> > > > > > > Regards
> > > > > > >
> > > > > > > Antoine.
> > > > > > >
> > > > > > > (*) https://arrow.apache.org/docs/format/Columnar.html#dense-union
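As an editorial illustration of the example quoted in this thread, the following minimal Python sketch reads both proposed layouts with the same random-access rule (look up the child by type id, then index it by offset) and checks that they decode to the same logical union. It is not Arrow library code: the function names `dense_union_get`, `per_child_offsets` and `classify` are hypothetical, plain lists stand in for buffers, and it assumes that each type id is directly the index of its child.

```python
def dense_union_get(type_ids, offsets, children, i):
    """Random access: pick the child by type id, then index it by offset."""
    return children[type_ids[i]][offsets[i]]


def dense_union_to_list(type_ids, offsets, children):
    """Decode every slot of the union with the same access rule."""
    return [dense_union_get(type_ids, offsets, children, i)
            for i in range(len(type_ids))]


def per_child_offsets(type_ids, offsets):
    """Group the offsets by child, in slot order."""
    groups = {}
    for t, o in zip(type_ids, offsets):
        groups.setdefault(t, []).append(o)
    return groups


def classify(seq):
    """Describe the ordering of one child's offsets."""
    pairs = list(zip(seq, seq[1:]))
    if all(a < b for a, b in pairs):
        return "strictly increasing"
    if all(a <= b for a, b in pairs):
        return "non-decreasing"
    return "unordered"


type_ids = [0, 0, 0, 1, 1, 0, 0]

# First layout from the thread: repeated but ordered offsets.
offsets_a = [0, 0, 0, 0, 1, 1, 1]
children_a = [["a", "b"], [0, 1]]

# Second layout from the thread: offsets for child 0 are not ordered.
offsets_b = [1, 1, 1, 0, 1, 0, 0]
children_b = [["b", "a"], [0, 1]]

layout_a = dense_union_to_list(type_ids, offsets_a, children_a)
layout_b = dense_union_to_list(type_ids, offsets_b, children_b)

print(layout_a == layout_b)
print(classify(per_child_offsets(type_ids, offsets_a)[0]))
print(classify(per_child_offsets(type_ids, offsets_b)[0]))
```

Both layouts decode to the same values even though the offsets of child 0 are merely non-decreasing in the first layout and unordered in the second, which is the observation behind the suggestion that the ordering constraint could be relaxed.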
Re: Dense unions: monotonic or strictly monotonic offsets?
On Sun, Nov 24, 2019 at 8:07 PM Fan Liya wrote:
>
> Hi Wes,
>
> I agree with you that this is a data representation issue.
>
> My point is that, data representation and data operation are closely
> related.
> As far as this issue is concerned, if we allow several values in the union
> vector to be mapped to the same value in the underlying vector, it is
> possible that when we modify one value in the union vector, the other value
> is also modified, which is unexpected.

Right, but Arrow columnar data is immutable, so any mutation
operations are application/implementation-level concerns and should
not influence the specification documents. Implementations need to be
aware of the implications of the specification, of course.

> This is a problem with our current specification, because our
> vectors/arrays provide set/write APIs.
> So we may need a "coherency protocol" to define the behavior (e.g. copy on
> write) when trying to modify a shared value, IMO.

It's an application/implementation-level concern so I think it would
need to be addressed separately from clarifying the specification.

>
> Best,
> Liya Fan
>
> On Sat, Nov 23, 2019 at 3:31 AM Wes McKinney wrote:
>
> > hi Liya,
> >
> > I don't understand your point -- we are strictly discussing data
> > representation here I believe. From a data representation perspective,
> > there is no conflict with repeated or non-monotonic offset values.
> >
> > On Fri, Nov 22, 2019 at 1:49 AM Fan Liya wrote:
> > >
> > > This is an interesting question.
> > > IMO, to support repeated values, we also need to design a "coherency
> > > protocol", to avoid the scenario where once a value is written, the change
> > > is propagated to another slot unexpectedly.
> > >
> > > Best,
> > > Liya Fan
> > >
> > > On Fri, Nov 22, 2019 at 1:34 PM Micah Kornfield wrote:
> > >
> > > > Hmm, I also thought the intention was monotonically increasing. I can't
> > > > think of a strong reason one way or another. If the argument about code to
> > > > do random access is the same in all cases, is there any benefit to forcing
> > > > any order at all? Memory prefetching?
> > > >
> > > > On Thu, Nov 21, 2019 at 11:48 AM Wes McKinney wrote:
> > > >
> > > > > hi Antoine,
> > > > >
> > > > > It's a good question.
> > > > >
> > > > > The intent when we wrote the specification was to be strictly
> > > > > monotonic, but there seems nothing especially harmful about relaxing
> > > > > the constraint to allow for repeated values or even non-monotonicity
> > > > > (strict or otherwise). For example, if we had the union
> > > > >
> > > > > ['a', 'a', 'a', 0, 1, 'b', 'b']
> > > > >
> > > > > then this could be represented as
> > > > >
> > > > > type_ids: [0, 0, 0, 1, 1, 0, 0]
> > > > > offsets: [0, 0, 0, 0, 1, 1, 1]
> > > > > child[0]: ['a', 'b']
> > > > > child[1]: [0, 1]
> > > > >
> > > > > or
> > > > >
> > > > > type_ids: [0, 0, 0, 1, 1, 0, 0]
> > > > > offsets: [1, 1, 1, 0, 1, 0, 0]
> > > > > child[0]: ['b', 'a']
> > > > > child[1]: [0, 1]
> > > > >
> > > > > What do others think? Either way some clarification in the
> > > > > specification would be useful. Because the code used to do random
> > > > > access is the same in all cases, I feel weakly supportive of removing
> > > > > constraints on the offsets.
> > > > >
> > > > > - Wes
> > > > >
> > > > > On Thu, Nov 21, 2019 at 9:04 AM Antoine Pitrou wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > I'd like some clarification on the spec and intent for dense arrays.
> > > > > >
> > > > > > Currently, it is specified that offsets of a dense union are "in order /
> > > > > > increasing" (*). However, it is not obvious whether repeated values are
> > > > > > allowed or not.
> > > > > >
> > > > > > I suspect the intent is to avoid having people exploit unions as some
> > > > > > kind of poor man's dictionaries. Also, perhaps some optimizations are
> > > > > > possible if monotonic or strictly monotonic indices are assumed? But I
> > > > > > don't know the history behind the union type.
> > > > > >
> > > > > > Regards
> > > > > >
> > > > > > Antoine.
> > > > > >
> > > > > > (*) https://arrow.apache.org/docs/format/Columnar.html#dense-union
Re: Dense unions: monotonic or strictly monotonic offsets?
Hi Wes,

I agree with you that this is a data representation issue.

My point is that, data representation and data operation are closely
related.
As far as this issue is concerned, if we allow several values in the union
vector to be mapped to the same value in the underlying vector, it is
possible that when we modify one value in the union vector, the other value
is also modified, which is unexpected.

This is a problem with our current specification, because our
vectors/arrays provide set/write APIs.
So we may need a "coherency protocol" to define the behavior (e.g. copy on
write) when trying to modify a shared value, IMO.

Best,
Liya Fan

On Sat, Nov 23, 2019 at 3:31 AM Wes McKinney wrote:

> hi Liya,
>
> I don't understand your point -- we are strictly discussing data
> representation here I believe. From a data representation perspective,
> there is no conflict with repeated or non-monotonic offset values.
>
> On Fri, Nov 22, 2019 at 1:49 AM Fan Liya wrote:
> >
> > This is an interesting question.
> > IMO, to support repeated values, we also need to design a "coherency
> > protocol", to avoid the scenario where once a value is written, the change
> > is propagated to another slot unexpectedly.
> >
> > Best,
> > Liya Fan
> >
> > On Fri, Nov 22, 2019 at 1:34 PM Micah Kornfield wrote:
> >
> > > Hmm, I also thought the intention was monotonically increasing. I can't
> > > think of a strong reason one way or another. If the argument about code to
> > > do random access is the same in all cases, is there any benefit to forcing
> > > any order at all? Memory prefetching?
> > >
> > > On Thu, Nov 21, 2019 at 11:48 AM Wes McKinney wrote:
> > >
> > > > hi Antoine,
> > > >
> > > > It's a good question.
> > > >
> > > > The intent when we wrote the specification was to be strictly
> > > > monotonic, but there seems nothing especially harmful about relaxing
> > > > the constraint to allow for repeated values or even non-monotonicity
> > > > (strict or otherwise). For example, if we had the union
> > > >
> > > > ['a', 'a', 'a', 0, 1, 'b', 'b']
> > > >
> > > > then this could be represented as
> > > >
> > > > type_ids: [0, 0, 0, 1, 1, 0, 0]
> > > > offsets: [0, 0, 0, 0, 1, 1, 1]
> > > > child[0]: ['a', 'b']
> > > > child[1]: [0, 1]
> > > >
> > > > or
> > > >
> > > > type_ids: [0, 0, 0, 1, 1, 0, 0]
> > > > offsets: [1, 1, 1, 0, 1, 0, 0]
> > > > child[0]: ['b', 'a']
> > > > child[1]: [0, 1]
> > > >
> > > > What do others think? Either way some clarification in the
> > > > specification would be useful. Because the code used to do random
> > > > access is the same in all cases, I feel weakly supportive of removing
> > > > constraints on the offsets.
> > > >
> > > > - Wes
> > > >
> > > > On Thu, Nov 21, 2019 at 9:04 AM Antoine Pitrou wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > I'd like some clarification on the spec and intent for dense arrays.
> > > > >
> > > > > Currently, it is specified that offsets of a dense union are "in order /
> > > > > increasing" (*). However, it is not obvious whether repeated values are
> > > > > allowed or not.
> > > > >
> > > > > I suspect the intent is to avoid having people exploit unions as some
> > > > > kind of poor man's dictionaries. Also, perhaps some optimizations are
> > > > > possible if monotonic or strictly monotonic indices are assumed? But I
> > > > > don't know the history behind the union type.
> > > > >
> > > > > Regards
> > > > >
> > > > > Antoine.
> > > > >
> > > > > (*) https://arrow.apache.org/docs/format/Columnar.html#dense-union
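As an editorial illustration of the aliasing concern raised in this message, the sketch below uses mutable Python lists as stand-ins for a dense union whose slots share child values. It shows that a naive in-place write through one slot changes every slot pointing at the same child entry, and then one possible copy-on-write style policy that avoids this. This is only a sketch under stated assumptions: `naive_set` and `cow_set` are hypothetical names, not part of any Arrow API, type ids are assumed to index children directly, and the thread itself notes that Arrow columnar data is immutable, so such a policy would be an implementation-level choice.

```python
import copy


def union_to_list(type_ids, offsets, children):
    """Decode the logical values of the union."""
    return [children[t][o] for t, o in zip(type_ids, offsets)]


def naive_set(type_ids, offsets, children, i, value):
    """Overwrite the child entry in place; every slot sharing it changes too."""
    children[type_ids[i]][offsets[i]] = value


def cow_set(type_ids, offsets, children, i, value):
    """Hypothetical copy-on-write policy: if the child entry is shared,
    append a fresh entry and repoint only slot i at it."""
    t, o = type_ids[i], offsets[i]
    sharers = sum(1 for tj, oj in zip(type_ids, offsets) if (tj, oj) == (t, o))
    if sharers > 1:
        children[t].append(value)
        offsets[i] = len(children[t]) - 1
    else:
        children[t][o] = value


type_ids = [0, 0, 0, 1, 1, 0, 0]
offsets = [0, 0, 0, 0, 1, 1, 1]
children = [["a", "b"], [0, 1]]

# Naive write to slot 0: slots 1 and 2 alias the same entry and change as well.
naive_offsets, naive_children = list(offsets), copy.deepcopy(children)
naive_set(type_ids, naive_offsets, naive_children, 0, "z")
naive_result = union_to_list(type_ids, naive_offsets, naive_children)

# Copy-on-write style write to slot 0: only slot 0 changes.
cow_offsets, cow_children = list(offsets), copy.deepcopy(children)
cow_set(type_ids, cow_offsets, cow_children, 0, "z")
cow_result = union_to_list(type_ids, cow_offsets, cow_children)

print(naive_result)
print(cow_result)
print(cow_offsets)
```

Note that the copy-on-write variant leaves the offsets of child 0 out of order after the write, so a policy like this one would itself depend on the specification allowing non-monotonic offsets.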
[jira] [Created] (ARROW-7249) [CI] Release test fails in master due to new arrow-flight Rust crate
Andy Grove created ARROW-7249:
---

Summary: [CI] Release test fails in master due to new arrow-flight Rust crate
Key: ARROW-7249
URL: https://issues.apache.org/jira/browse/ARROW-7249
Project: Apache Arrow
Issue Type: Bug
Components: CI
Reporter: Andy Grove
Assignee: Andy Grove
Fix For: 1.0.0

See https://github.com/apache/arrow/runs/318192961
[NIGHTLY] Arrow Build Report for Job nightly-2019-11-24-0
Arrow Build Report for Job nightly-2019-11-24-0

All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0

Failed Tasks:
- conda-osx-clang-py27:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-conda-osx-clang-py27
- conda-osx-clang-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-conda-osx-clang-py37
- homebrew-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-travis-homebrew-cpp
- test-conda-python-2.7-pandas-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-2.7-pandas-master
- test-conda-python-3.7-dask-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-dask-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-3.7-dask-master
- test-conda-python-3.7-hdfs-2.9.2:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-pandas-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-pandas-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-3.7-pandas-master
- test-conda-python-3.7-spark-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-3.7-spark-master
- test-conda-python-3.7-turbodbc-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-3.7-turbodbc-master
- test-conda-python-3.7:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-3.7
- test-ubuntu-14.04-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-ubuntu-14.04-cpp
- test-ubuntu-fuzzit:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-ubuntu-fuzzit

Succeeded Tasks:
- centos-6:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-centos-6
- centos-7:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-centos-7
- centos-8:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-centos-8
- conda-linux-gcc-py27:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-conda-linux-gcc-py27
- conda-linux-gcc-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-conda-linux-gcc-py37
- conda-win-vs2015-py36:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-conda-win-vs2015-py37
- debian-buster:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-debian-buster
- debian-stretch:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-debian-stretch
- gandiva-jar-osx:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-travis-gandiva-jar-osx
- gandiva-jar-trusty:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-travis-gandiva-jar-trusty
- macos-r-autobrew:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-travis-macos-r-autobrew
- test-conda-cpp:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-cpp
- test-conda-python-2.7-pandas-latest:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-2.7-pandas-latest
- test-conda-python-2.7:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-2.7
- test-conda-python-3.6:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-3.6
- test-conda-r-3.6:
  URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-r-3.6
[jira] [Created] (ARROW-7248) Automatically Regenerate IPC messages from Flatbuffers
Martin Grund created ARROW-7248:
---

Summary: Automatically Regenerate IPC messages from Flatbuffers
Key: ARROW-7248
URL: https://issues.apache.org/jira/browse/ARROW-7248
Project: Apache Arrow
Issue Type: Improvement
Components: Rust
Reporter: Martin Grund

It would be great if there were an automatic way to regenerate the code from the Flatbuffers input files. This would make it easier to follow mainline development.