[jira] [Created] (ARROW-7250) Undefined symbols for StringToFloatConverter::Impl with clang 4.x

2019-11-24 Thread Uwe Korn (Jira)
Uwe Korn created ARROW-7250:
---

 Summary: Undefined symbols for StringToFloatConverter::Impl with clang 4.x
 Key: ARROW-7250
 URL: https://issues.apache.org/jira/browse/ARROW-7250
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.15.1
Reporter: Uwe Korn
Assignee: Uwe Korn


{code:java}
Undefined symbols for architecture x86_64:
  "arrow::internal::StringToFloatConverter::Impl::main_junk_value_", referenced from:
      arrow::internal::StringToFloatConverter::StringToFloat(char const*, unsigned long, float*) in libarrow.a(parsing.cc.o)
      arrow::internal::StringToFloatConverter::StringToFloat(char const*, unsigned long, double*) in libarrow.a(parsing.cc.o)
  "arrow::internal::StringToFloatConverter::Impl::fallback_junk_value_", referenced from:
      arrow::internal::StringToFloatConverter::StringToFloat(char const*, unsigned long, float*) in libarrow.a(parsing.cc.o)
      arrow::internal::StringToFloatConverter::StringToFloat(char const*, unsigned long, double*) in libarrow.a(parsing.cc.o)
ld: symbol(s) not found for architecture x86_64
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Dense unions: monotonic or strictly monotonic offsets?

2019-11-24 Thread Fan Liya
Hi Wes,

Thanks for your clarification.
I agree with you that the problem should be considered at the
implementation level.

Best,
Liya Fan

On Mon, Nov 25, 2019 at 10:34 AM Wes McKinney  wrote:

> On Sun, Nov 24, 2019 at 8:07 PM Fan Liya  wrote:
> >
> > Hi Wes,
> >
> > I agree with you that this is a data representation issue.
> >
> > My point is that, data representation and data operation are closely
> > related.
> > As far as this issue is concerned, if we allow several values in the
> > union vector to be mapped to the same value in the underlying vector, it
> > is possible that when we modify one value in the union vector, the other
> > value is also modified, which is unexpected.
>
> Right, but Arrow columnar data is immutable, so any mutation
> operations are application/implementation-level concerns and should
> not influence the specification documents. Implementations need to be
> aware of the implications of the specification, of course.
>
> > This is a problem with our current specification, because our
> > vectors/arrays provide set/write APIs.
> > So we may need a "coherency protocol" to define the behavior (e.g. copy
> > on write) when trying to modify a shared value, IMO.
>
> It's an application/implementation-level concern so I think it would
> need to be addressed separately from clarifying the specification.
>
> >
> > Best,
> > Liya Fan
> >
> > > On Sat, Nov 23, 2019 at 3:31 AM Wes McKinney wrote:
> >
> > > hi Liya,
> > >
> > > I don't understand your point -- we are strictly discussing data
> > > representation here I believe. From a data representation perspective,
> > > there is no conflict with repeated or non-monotonic offset values.
> > >
> > > On Fri, Nov 22, 2019 at 1:49 AM Fan Liya  wrote:
> > > >
> > > > This is an interesting question.
> > > > > IMO, to support repeated values, we also need to design a "coherency
> > > > > protocol", to avoid the scenario where once a value is written, the
> > > > > change is propagated to another slot unexpectedly.
> > > >
> > > > Best,
> > > > Liya Fan
> > > >
> > > > > On Fri, Nov 22, 2019 at 1:34 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > > >
> > > > > > Hmm, I also thought the intention was monotonically increasing. I
> > > > > > can't think of a strong reason one way or another. If the argument
> > > > > > about code to do random access is the same in all cases, is there
> > > > > > any benefit to forcing any order at all?  Memory prefetching?
> > > > >
> > > > > > On Thu, Nov 21, 2019 at 11:48 AM Wes McKinney wrote:
> > > > >
> > > > > > hi Antoine,
> > > > > >
> > > > > > It's a good question.
> > > > > >
> > > > > > > The intent when we wrote the specification was to be strictly
> > > > > > > monotonic, but there seems nothing especially harmful about
> > > > > > > relaxing the constraint to allow for repeated values or even
> > > > > > > non-monotonicity (strict or otherwise). For example, if we had
> > > > > > > the union
> > > > > >
> > > > > > ['a', 'a', 'a', 0, 1, 'b', 'b']
> > > > > >
> > > > > > then this could be represented as
> > > > > >
> > > > > > type_ids: [0, 0, 0, 1, 1, 0, 0]
> > > > > > offsets: [0, 0, 0, 0, 1, 1, 1]
> > > > > > child[0]: ['a', 'b']
> > > > > > child[1]: [0, 1]
> > > > > >
> > > > > > or
> > > > > >
> > > > > > type_ids: [0, 0, 0, 1, 1, 0, 0]
> > > > > > offsets: [1, 1, 1, 0, 1, 0, 0]
> > > > > > child[0]: ['b', 'a']
> > > > > > child[1]: [0, 1]
> > > > > >
> > > > > > > What do others think? Either way some clarification in the
> > > > > > > specification would be useful. Because the code used to do random
> > > > > > > access is the same in all cases, I feel weakly supportive of
> > > > > > > removing constraints on the offsets.
> > > > > >
> > > > > > - Wes
> > > > > >
> > > > > > > On Thu, Nov 21, 2019 at 9:04 AM Antoine Pitrou <anto...@python.org> wrote:
> > > > > > >
> > > > > > >
> > > > > > > Hello,
> > > > > > >
> > > > > > > > I'd like some clarification on the spec and intent for dense
> > > > > > > > arrays.
> > > > > > > >
> > > > > > > > Currently, it is specified that offsets of a dense union are
> > > > > > > > "in order / increasing" (*).  However, it is not obvious whether
> > > > > > > > repeated values are allowed or not.
> > > > > > > >
> > > > > > > > I suspect the intent is to avoid having people exploit unions
> > > > > > > > as some kind of poor man's dictionaries.  Also, perhaps some
> > > > > > > > optimizations are possible if monotonic or strictly monotonic
> > > > > > > > indices are assumed?  But I don't know the history behind the
> > > > > > > > union type.
> > > > > > >
> > > > > > > Regards
> > > > > > >
> > > > > > > Antoine.
> > > > > > >
> > > > > > >
> > > > > > > (*)
> https://arrow.apache.org/docs/format/Columnar.html#dense-union
> > > > > >
> > > > >
> > >
>


Re: Dense unions: monotonic or strictly monotonic offsets?

2019-11-24 Thread Wes McKinney
On Sun, Nov 24, 2019 at 8:07 PM Fan Liya  wrote:
>
> Hi Wes,
>
> I agree with you that this is a data representation issue.
>
> My point is that, data representation and data operation are closely
> related.
> As far as this issue is concerned, if we allow several values in the union
> vector to be mapped to the same value in the underlying vector, it is
> possible that when we modify one value in the union vector, the other value
> is also modified, which is unexpected.

Right, but Arrow columnar data is immutable, so any mutation
operations are application/implementation-level concerns and should
not influence the specification documents. Implementations need to be
aware of the implications of the specification, of course.

> This is a problem with our current specification, because our
> vectors/arrays provide set/write APIs.
> So we may need a "coherency protocol" to define the behavior (e.g. copy on
> write) when trying to modify a shared value, IMO.

It's an application/implementation-level concern so I think it would
need to be addressed separately from clarifying the specification.

>
> Best,
> Liya Fan
>
> On Sat, Nov 23, 2019 at 3:31 AM Wes McKinney  wrote:
>
> > hi Liya,
> >
> > I don't understand your point -- we are strictly discussing data
> > representation here I believe. From a data representation perspective,
> > there is no conflict with repeated or non-monotonic offset values.
> >
> > On Fri, Nov 22, 2019 at 1:49 AM Fan Liya  wrote:
> > >
> > > This is an interesting question.
> > > IMO, to support repeated values, we also need to design a "coherency
> > > > protocol", to avoid the scenario where once a value is written, the change
> > > is propagated to another slot unexpectedly.
> > >
> > > Best,
> > > Liya Fan
> > >
> > > > On Fri, Nov 22, 2019 at 1:34 PM Micah Kornfield wrote:
> > >
> > > > > Hmm, I also thought the intention was monotonically increasing. I can't
> > > > > think of a strong reason one way or another. If the argument about
> > > > > code to do random access is the same in all cases, is there any benefit
> > > > > to forcing any order at all?  Memory prefetching?
> > > >
> > > > > On Thu, Nov 21, 2019 at 11:48 AM Wes McKinney wrote:
> > > >
> > > > > hi Antoine,
> > > > >
> > > > > It's a good question.
> > > > >
> > > > > The intent when we wrote the specification was to be strictly
> > > > > monotonic, but there seems nothing especially harmful about relaxing
> > > > > the constraint to allow for repeated values or even non-monotonicity
> > > > > (strict or otherwise). For example, if we had the union
> > > > >
> > > > > ['a', 'a', 'a', 0, 1, 'b', 'b']
> > > > >
> > > > > then this could be represented as
> > > > >
> > > > > type_ids: [0, 0, 0, 1, 1, 0, 0]
> > > > > offsets: [0, 0, 0, 0, 1, 1, 1]
> > > > > child[0]: ['a', 'b']
> > > > > child[1]: [0, 1]
> > > > >
> > > > > or
> > > > >
> > > > > type_ids: [0, 0, 0, 1, 1, 0, 0]
> > > > > offsets: [1, 1, 1, 0, 1, 0, 0]
> > > > > child[0]: ['b', 'a']
> > > > > child[1]: [0, 1]
> > > > >
> > > > > What do others think? Either way some clarification in the
> > > > > specification would be useful. Because the code used to do random
> > > > > access is the same in all cases, I feel weakly supportive of removing
> > > > > constraints on the offsets.
> > > > >
> > > > > - Wes
> > > > >
> > > > > > On Thu, Nov 21, 2019 at 9:04 AM Antoine Pitrou wrote:
> > > > > >
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > > I'd like some clarification on the spec and intent for dense
> > > > > > > arrays.
> > > > > > >
> > > > > > > Currently, it is specified that offsets of a dense union are "in
> > > > > > > order / increasing" (*).  However, it is not obvious whether
> > > > > > > repeated values are allowed or not.
> > > > > > >
> > > > > > > I suspect the intent is to avoid having people exploit unions as
> > > > > > > some kind of poor man's dictionaries.  Also, perhaps some
> > > > > > > optimizations are possible if monotonic or strictly monotonic
> > > > > > > indices are assumed?  But I don't know the history behind the
> > > > > > > union type.
> > > > > >
> > > > > > Regards
> > > > > >
> > > > > > Antoine.
> > > > > >
> > > > > >
> > > > > > (*) https://arrow.apache.org/docs/format/Columnar.html#dense-union
> > > > >
> > > >
> >


Re: Dense unions: monotonic or strictly monotonic offsets?

2019-11-24 Thread Fan Liya
Hi Wes,

I agree with you that this is a data representation issue.

My point is that, data representation and data operation are closely
related.
As far as this issue is concerned, if we allow several values in the union
vector to be mapped to the same value in the underlying vector, it is
possible that when we modify one value in the union vector, the other value
is also modified, which is unexpected.

This is a problem with our current specification, because our
vectors/arrays provide set/write APIs.
So we may need a "coherency protocol" to define the behavior (e.g. copy on
write) when trying to modify a shared value, IMO.

Best,
Liya Fan

On Sat, Nov 23, 2019 at 3:31 AM Wes McKinney  wrote:

> hi Liya,
>
> I don't understand your point -- we are strictly discussing data
> representation here I believe. From a data representation perspective,
> there is no conflict with repeated or non-monotonic offset values.
>
> On Fri, Nov 22, 2019 at 1:49 AM Fan Liya  wrote:
> >
> > This is an interesting question.
> > IMO, to support repeated values, we also need to design a "coherency
> > protocol", to avoid the scenario where once a value is written, the change
> > is propagated to another slot unexpectedly.
> >
> > Best,
> > Liya Fan
> >
> > On Fri, Nov 22, 2019 at 1:34 PM Micah Kornfield wrote:
> >
> > > Hmm, I also thought the intention was monotonically increasing. I can't
> > > think of a strong reason one way or another. If the argument about code
> > > to do random access is the same in all cases, is there any benefit to
> > > forcing any order at all?  Memory prefetching?
> > >
> > > On Thu, Nov 21, 2019 at 11:48 AM Wes McKinney wrote:
> > >
> > > > hi Antoine,
> > > >
> > > > It's a good question.
> > > >
> > > > The intent when we wrote the specification was to be strictly
> > > > monotonic, but there seems nothing especially harmful about relaxing
> > > > the constraint to allow for repeated values or even non-monotonicity
> > > > (strict or otherwise). For example, if we had the union
> > > >
> > > > ['a', 'a', 'a', 0, 1, 'b', 'b']
> > > >
> > > > then this could be represented as
> > > >
> > > > type_ids: [0, 0, 0, 1, 1, 0, 0]
> > > > offsets: [0, 0, 0, 0, 1, 1, 1]
> > > > child[0]: ['a', 'b']
> > > > child[1]: [0, 1]
> > > >
> > > > or
> > > >
> > > > type_ids: [0, 0, 0, 1, 1, 0, 0]
> > > > offsets: [1, 1, 1, 0, 1, 0, 0]
> > > > child[0]: ['b', 'a']
> > > > child[1]: [0, 1]
> > > >
> > > > What do others think? Either way some clarification in the
> > > > specification would be useful. Because the code used to do random
> > > > access is the same in all cases, I feel weakly supportive of removing
> > > > constraints on the offsets.
> > > >
> > > > - Wes
> > > >
> > > > On Thu, Nov 21, 2019 at 9:04 AM Antoine Pitrou wrote:
> > > > >
> > > > >
> > > > > Hello,
> > > > >
> > > > > I'd like some clarification on the spec and intent for dense arrays.
> > > > >
> > > > > Currently, it is specified that offsets of a dense union are "in order
> > > > > / increasing" (*).  However, it is not obvious whether repeated values
> > > > > are allowed or not.
> > > > >
> > > > > I suspect the intent is to avoid having people exploit unions as some
> > > > > kind of poor man's dictionaries.  Also, perhaps some optimizations are
> > > > > possible if monotonic or strictly monotonic indices are assumed?  But I
> > > > > don't know the history behind the union type.
> > > > >
> > > > > Regards
> > > > >
> > > > > Antoine.
> > > > >
> > > > >
> > > > > (*) https://arrow.apache.org/docs/format/Columnar.html#dense-union
> > > >
> > >
>
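
Editor's note: the aliasing concern raised in this message can be made concrete with a toy sketch. It uses plain Python lists rather than any Arrow vector API, and set_in_place and set_append_and_repoint are hypothetical helpers invented for illustration. The second helper is a simple "append and repoint" stand-in for copy on write, not a protocol proposed in the thread. As in the thread's example, the type id is assumed to be the child index.

```python
def make_union():
    """The first layout from the thread: three slots share child[0][0]."""
    return {
        "type_ids": [0, 0, 0, 1, 1, 0, 0],
        "offsets": [0, 0, 0, 0, 1, 1, 1],
        "children": [["a", "b"], [0, 1]],
    }


def get(union, i):
    return union["children"][union["type_ids"][i]][union["offsets"][i]]


def decode(union):
    return [get(union, i) for i in range(len(union["type_ids"]))]


def set_in_place(union, i, value):
    """Naive write: overwrite the child entry that slot i points at."""
    union["children"][union["type_ids"][i]][union["offsets"][i]] = value


def set_append_and_repoint(union, i, value):
    """Hypothetical isolation strategy: append a fresh child entry and
    repoint only slot i, leaving other slots that shared the old entry alone."""
    child = union["children"][union["type_ids"][i]]
    child.append(value)
    union["offsets"][i] = len(child) - 1


shared = make_union()
set_in_place(shared, 0, "z")
after_in_place = decode(shared)      # slots 1 and 2 change as well

isolated = make_union()
set_append_and_repoint(isolated, 0, "z")
after_repoint = decode(isolated)     # only slot 0 changes

print(after_in_place)
print(after_repoint)
print(isolated["offsets"])
```

Note that the isolating write leaves the offsets for child 0 non-monotonic ([2, 0, 0, ...]), so a mutable implementation that avoids aliasing this way would itself rely on relaxed offset ordering.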


[jira] [Created] (ARROW-7249) [CI] Release test fails in master due to new arrow-flight Rust crate

2019-11-24 Thread Andy Grove (Jira)
Andy Grove created ARROW-7249:
-

 Summary: [CI] Release test fails in master due to new arrow-flight Rust crate
 Key: ARROW-7249
 URL: https://issues.apache.org/jira/browse/ARROW-7249
 Project: Apache Arrow
  Issue Type: Bug
  Components: CI
Reporter: Andy Grove
Assignee: Andy Grove
 Fix For: 1.0.0


See https://github.com/apache/arrow/runs/318192961





[NIGHTLY] Arrow Build Report for Job nightly-2019-11-24-0

2019-11-24 Thread Crossbow


Arrow Build Report for Job nightly-2019-11-24-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0

Failed Tasks:
- conda-osx-clang-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-conda-osx-clang-py27
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-conda-osx-clang-py37
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-travis-homebrew-cpp
- test-conda-python-2.7-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-2.7-pandas-master
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-dask-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-3.7-dask-master
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-3.7-pandas-master
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-3.7-spark-master
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-3.7-turbodbc-master
- test-conda-python-3.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-3.7
- test-ubuntu-14.04-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-ubuntu-14.04-cpp
- test-ubuntu-fuzzit:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-ubuntu-fuzzit

Succeeded Tasks:
- centos-6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-centos-6
- centos-7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-centos-7
- centos-8:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-centos-8
- conda-linux-gcc-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-conda-linux-gcc-py27
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-conda-linux-gcc-py37
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-conda-win-vs2015-py37
- debian-buster:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-debian-buster
- debian-stretch:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-azure-debian-stretch
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-travis-gandiva-jar-osx
- gandiva-jar-trusty:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-travis-gandiva-jar-trusty
- macos-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-travis-macos-r-autobrew
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-cpp
- test-conda-python-2.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-2.7-pandas-latest
- test-conda-python-2.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-2.7
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-python-3.6
- test-conda-r-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-11-24-0-circle-test-conda-r-3.6
- 

[jira] [Created] (ARROW-7248) Automatically Regenerate IPC messages from Flatbuffers

2019-11-24 Thread Martin Grund (Jira)
Martin Grund created ARROW-7248:
---

 Summary: Automatically Regenerate IPC messages from Flatbuffers
 Key: ARROW-7248
 URL: https://issues.apache.org/jira/browse/ARROW-7248
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Martin Grund


It would be great if there were an automatic way to regenerate the code from the
Flatbuffers input files. This would make following mainline development easier.


