[jira] [Created] (ARROW-7394) [C++] Implement zero-copy optimizations when performing Filter on ChunkedArray

2019-12-13 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-7394:
---

 Summary: [C++] Implement zero-copy optimizations when performing 
Filter on ChunkedArray
 Key: ARROW-7394
 URL: https://issues.apache.org/jira/browse/ARROW-7394
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney


For high-selectivity filters (most elements included), it may be wasteful and 
slow to copy large contiguous ranges of array chunks into the resulting 
ChunkedArray. Instead, we can scan the filter boolean array and slice off 
chunks of the source data rather than copying. 

We will need to empirically determine how large a contiguous range needs to 
be for the slice-based approach to beat simple copy-based materialization. 
For example, in a filter array like

1 0 1 0 1 0 1 0 1

it would not make sense to slice 5 times because slicing carries some overhead. 
But if we had

1 ... 1 [100 1's] 0 1 ... 1 [100 1's] 0 1 ... 1 [100 1's] 0 1 ... 1 [100 1's] 

then performing 4 slices may be faster than doing a copy materialization. 
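The run-length heuristic described above can be sketched as follows. This is an illustrative Python sketch, not Arrow's C++ implementation; the function name and the threshold value are placeholders, and the real cutoff would have to be determined empirically as the issue says.

```python
def plan_filter(mask, threshold=8):
    """Walk a 0/1 filter mask and emit ("slice", start, length) for long
    contiguous runs of selected values (zero-copy slice candidates) and
    ("copy", start, length) for short runs (cheaper to materialize)."""
    plan = []
    start = None
    for i, bit in enumerate(list(mask) + [0]):  # sentinel closes the last run
        if bit and start is None:
            start = i                           # a run of 1's begins
        elif not bit and start is not None:
            length = i - start
            plan.append(("slice" if length >= threshold else "copy",
                         start, length))
            start = None
    return plan

# Alternating mask: every run is short, so copying wins everywhere.
print(plan_filter([1, 0, 1, 0, 1]))
# -> [('copy', 0, 1), ('copy', 2, 1), ('copy', 4, 1)]

# Long runs of 1's separated by single 0's: slicing wins.
print(plan_filter([1] * 100 + [0] + [1] * 100))
# -> [('slice', 0, 100), ('slice', 101, 100)]
```

In real code the short "copy" runs would be merged into a single materialized chunk rather than copied one run at a time.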



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Planned Support for ORC Dataset?

2019-12-13 Thread Wes McKinney
Hi Jacques,

I agree with you; it's worth distinguishing between two different forms of ORC:

* The raw binary files
* The "dataset format" that is maintained by the Hive libraries. I don't
think there is any practical way for us to handle this in C++.

It may be that, without the second bullet point, not many use cases are enabled.

On Fri, Dec 13, 2019 at 1:17 PM Jacques Nadeau  wrote:
>
> To clarify, I don't really question the value. That was the wrong word. I
> question the benefit/value tradeoff. You've got two options it seems:
>
> - Support Orc without acid (solves a much smaller set of usecases/users)
> - Support Orc with acid (a magnitude more implementation work)
>
> On Fri, Dec 13, 2019 at 11:15 AM Jacques Nadeau  wrote:
>
> > I question the value of adding the Orc format. The format is fragmented
> > with the main tool writing it (hive) writing a version of the format (acid
> > v2) that can't be consumed by systems that only use the Orc libraries
> > (since they don't support acid). If you want to consume that data, you have
> > to depend on internal Hive code (which is only written in java).
> >
> > On Thu, Dec 12, 2019 at 2:49 PM Wes McKinney  wrote:
> >
> >> FWIW, the incremental effort of adding new data formats to the C++
> >> Datasets API should be relatively low. I think we even should document
> >> in broad terms how users can define their own data sources or file
> >> formats
> >>
> >> On Wed, Dec 11, 2019 at 4:19 PM Neal Richardson
> >>  wrote:
> >> >
> >> > Hi William,
> >> > ORC is part of the C++ Datasets grand vision: see
> >> > https://docs.google.com/document/d/1bVhzifD38qDypnSjtf8exvpP3sSB5x_Kw9m-n66FB2c/edit#heading=h.22aikbvt54fv
> >> > That said, I don't think anyone in the Arrow community is currently
> >> > prioritizing work on ORC, and we'd welcome contributions in that area.
> >> >
> >> > For a view of what open issues we have for ORC (at least for C++), see
> >> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20in%20(%22C%2B%2B%22%2C%20%22C%2B%2B%20-%20Dataset%22)%20AND%20text%20~%20ORC%20ORDER%20BY%20updated%20DESC%2C%20priority%20DESC
> >> > though that's surely not an exhaustive list of ORC-related features one
> >> > could want.
> >> >
> >> > Neal
> >> >
> >> > On Wed, Dec 11, 2019 at 12:49 PM William Callaghan 
> >> > wrote:
> >> >
> >> > > Hi there,
> >> > >
> >> > > Not sure if this is the appropriate place, but I had done some
> >> > > searching and could not find anything with regards to supporting
> >> > > ORC datasets. I see that Parquet datasets are supported (where a
> >> > > dataset could contain multiple Parquet files), but I do not see
> >> > > this for ORC (only the ability to read a single ORC file, not
> >> > > multiple or nested ORCs -- i.e. a directory with subdirectories
> >> > > (indices) with corresponding ORC files underneath).
> >> > >
> >> > > I'm wondering, does Arrow currently have support for nested ORC
> >> > > structures? If not, is this planned?
> >> > >
> >> > > Thank you.
> >> > > Regards,
> >> > > William
> >> > >
> >>
> >



Re: [Discuss][FlightRPC] Extensions to Flight: "DoBidirectional"

2019-12-13 Thread Jacques Nadeau
I support moving forward with the current proposal.

On Thu, Dec 12, 2019 at 12:20 PM David Li  wrote:

> Just following up here again, any other thoughts?
>
> I think we do have justifications for potentially separate streams in
> a call, but that's more of an orthogonal question - it doesn't need to
> be addressed here. I do agree that it very much complicates things.
>
> Thanks,
> David
>
> On 11/29/19, Wes McKinney  wrote:
> > I would generally agree with this. Note that you have the possibility
> > to use unions-of-structs to send record batches with different schemas
> > in the same stream, though with some added complexity on each side
> >
> > On Thu, Nov 28, 2019 at 10:37 AM Jacques Nadeau 
> wrote:
> >>
> >> I'd vote for explicitly not supported. We should keep our primitives
> >> narrow.
> >>
> >> On Wed, Nov 27, 2019, 1:17 PM David Li  wrote:
> >>
> >> > Thanks for the feedback.
> >> >
> >> > I do think if we had explicitly embraced gRPC from the beginning,
> >> > there are a lot of places where things could be made more ergonomic,
> >> > including with the metadata fields. But it would also have locked
> >> > us out of potential future transports.
> >> >
> >> > On another note: I hesitate to put too much into this method, but we
> >> > are looking at use cases where potentially, a client may want to
> >> > upload multiple distinct datasets (with differing schemas). (This is a
> >> > little tentative, and I can get more details...) Right now, each
> >> > logical stream in Flight must have a single, consistent schema; would
> >> > it make sense to look at ways to relax this, or declare this
> >> > explicitly out of scope (and require multiple calls and coordination
> >> > with the deployment topology) in order to accomplish this?
> >> >
> >> > Best,
> >> > David
> >> >
> >> > On 11/27/19, Jacques Nadeau  wrote:
> >> > > Fair enough. I'm okay with the bytes approach and the proposal looks
> >> > > good
> >> > > to me.
> >> > >
> >> > > On Fri, Nov 8, 2019 at 11:37 AM David Li 
> >> > > wrote:
> >> > >
> >> > >> I've updated the proposal.
> >> > >>
> >> > >> On the subject of Protobuf Any vs bytes, and how to handle
> >> > >> errors/metadata, I still think using bytes is preferable:
> >> > >> - It doesn't require (conditionally) exposing or wrapping Protobuf
> >> > types,
> >> > >> - We wouldn't be able to practically expose the Protobuf field to
> >> > >> C++
> >> > >> users without causing build pains,
> >> > >> - We can't let Python users take advantage of the Protobuf field
> >> > >> without somehow being compatible with the Protobuf wheels (by
> >> > >> linking
> >> > >> to the same version, and doing magic to turn the C++ Protobufs into
> >> > >> the Python ones),
> >> > >> - All our other application-defined fields are already bytes.
> >> > >>
> >> > >> Applications that want structure can encode JSON or Protobuf Any
> >> > >> into
> >> > >> the bytes field themselves, much as you can already do for Ticket,
> >> > >> commands in FlightDescriptors, and application metadata in
> >> > >> DoGet/DoPut. I don't think this is (much) less efficient than using
> >> > >> Any directly, since Any itself is a bytes field with a tag, and must
> >> > >> invoke the Protobuf deserializer again to read the actual message.
> >> > >>
> >> > >> If we decide on using bytes, then I don't think it makes sense to
> >> > >> define a new message with a oneof either, since it would be
> >> > >> redundant.
> >> > >>
> >> > >> Thanks,
> >> > >> David
> >> > >>
> >> > >> On 11/7/19, David Li  wrote:
> >> > >> > I've been extremely backlogged, I will update the proposal when I
> >> > >> > get
> >> > >> > a chance and reply here when done.
> >> > >> >
> >> > >> > Best,
> >> > >> > David
> >> > >> >
> >> > >> > On 11/7/19, Wes McKinney  wrote:
> >> > >> >> Bumping this discussion since a couple of weeks have passed. It
> >> > >> >> seems
> >> > >> >> there are still some questions here, could we summarize what the
> >> > >> >> alternatives are, along with any public API implications, so we
> >> > >> >> can try to render a decision?
> >> > >> >>
> >> > >> >> On Sat, Oct 26, 2019 at 7:19 PM David Li  >
> >> > >> >> wrote:
> >> > >> >>>
> >> > >> >>> Hi Wes,
> >> > >> >>>
> >> > >> >>> Responses inline:
> >> > >> >>>
> >> > >> >>> On Sat, Oct 26, 2019, 13:46 Wes McKinney 
> >> > wrote:
> >> > >> >>>
> >> > >> >>> > On Mon, Oct 21, 2019 at 7:40 PM David Li
> >> > >> >>> > 
> >> > >> >>> > wrote:
> >> > >> >>> > >
> >> > >> >>> > > The question is whether to repurpose the existing FlightData
> >> > >> >>> > > structure, and allow for the metadata field to be filled in
> >> > >> >>> > > and data fields to be blank (as a control message), or to
> >> > >> >>> > > wrap the FlightData structure in another structure that
> >> > >> >>> > > explicitly distinguishes between control and data messages.
> >> > >> >>> >
> >> > >> >>> > I'm not super against 

[jira] [Created] (ARROW-7393) Fix plasma build for Java

2019-12-13 Thread Donatien Criaud (Jira)
Donatien Criaud created ARROW-7393:
--

 Summary: Fix plasma build for Java
 Key: ARROW-7393
 URL: https://issues.apache.org/jira/browse/ARROW-7393
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Donatien Criaud


The test.sh file used to build Plasma binaries for Java has a critical error 
in a file path.





[jira] [Created] (ARROW-7392) [Packaging] Add conda packaging tasks for python 3.8

2019-12-13 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-7392:
--

 Summary: [Packaging] Add conda packaging tasks for python 3.8
 Key: ARROW-7392
 URL: https://issues.apache.org/jira/browse/ARROW-7392
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Packaging
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs


Conda-forge now supports Python 3.8, so we should build the appropriate packages.





[jira] [Created] (ARROW-7391) [Python] Remove unnecessary classes from the binding layer

2019-12-13 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-7391:
---

 Summary: [Python] Remove unnecessary classes from the binding layer
 Key: ARROW-7391
 URL: https://issues.apache.org/jira/browse/ARROW-7391
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Ben Kietzman


Several Python classes introduced by https://github.com/apache/arrow/pull/5237 
are unnecessary and can be removed in favor of simple functions which produce 
opaque pointers, including the PartitionScheme and Expression classes. These 
should be removed to reduce the cognitive overhead of the Python datasets API 
and to loosen the coupling between Python and C++.





[jira] [Created] (ARROW-7390) [C++][Dataset] Concurrency race in Projector::Project

2019-12-13 Thread Francois Saint-Jacques (Jira)
Francois Saint-Jacques created ARROW-7390:
-

 Summary: [C++][Dataset] Concurrency race in Projector::Project 
 Key: ARROW-7390
 URL: https://issues.apache.org/jira/browse/ARROW-7390
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Francois Saint-Jacques


When Project is invoked by two scan tasks on the same DataFragment, there's a 
race to invoke SetInputSchema. Note that ResizeMissingColumns also suffers 
from this race. The ideal goal is to make Project a const method.





[jira] [Created] (ARROW-7389) [Python][Packaging] Remove pyarrow.s3fs import check from the recipe

2019-12-13 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-7389:
--

 Summary: [Python][Packaging] Remove pyarrow.s3fs import check from 
the recipe
 Key: ARROW-7389
 URL: https://issues.apache.org/jira/browse/ARROW-7389
 Project: Apache Arrow
  Issue Type: Bug
  Components: Packaging, Python
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs
 Fix For: 1.0.0








[jira] [Created] (ARROW-7388) [Python] Skip HDFS tests if libhdfs cannot be located

2019-12-13 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-7388:
--

 Summary: [Python] Skip HDFS tests if libhdfs cannot be located
 Key: ARROW-7388
 URL: https://issues.apache.org/jira/browse/ARROW-7388
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Krisztian Szucs


CI is failing because libhdfs is not installed. We should skip the HDFS tests 
in this case.





[NIGHTLY] Arrow Build Report for Job nightly-2019-12-13-0

2019-12-13 Thread Crossbow


Arrow Build Report for Job nightly-2019-12-13-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0

Failed Tasks:
- centos-7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-azure-centos-7
- conda-linux-gcc-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-azure-conda-linux-gcc-py27
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-azure-conda-linux-gcc-py37
- conda-osx-clang-py27:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-azure-conda-osx-clang-py27
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-azure-conda-osx-clang-py37
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-azure-conda-win-vs2015-py37
- debian-buster:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-azure-debian-buster
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-travis-gandiva-jar-osx
- gandiva-jar-trusty:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-travis-gandiva-jar-trusty
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-travis-homebrew-cpp
- macos-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-travis-macos-r-autobrew
- test-ubuntu-16.04-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-circle-test-ubuntu-16.04-cpp
- test-ubuntu-18.04-python-3:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-circle-test-ubuntu-18.04-python-3
- wheel-manylinux1-cp27m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-azure-wheel-manylinux1-cp27m
- wheel-manylinux1-cp27mu:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-azure-wheel-manylinux1-cp27mu
- wheel-manylinux1-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-azure-wheel-manylinux1-cp35m
- wheel-manylinux1-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-azure-wheel-manylinux1-cp36m
- wheel-manylinux1-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-azure-wheel-manylinux1-cp37m
- wheel-manylinux1-cp38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-azure-wheel-manylinux1-cp38
- wheel-manylinux2010-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-azure-wheel-manylinux2010-cp35m

Succeeded Tasks:
- centos-6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-azure-centos-6
- centos-8:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-azure-centos-8
- debian-stretch:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-azure-debian-stretch
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-circle-test-conda-cpp
- test-conda-python-2.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-circle-test-conda-python-2.7-pandas-latest
- test-conda-python-2.7:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-circle-test-conda-python-2.7
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-circle-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-circle-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-circle-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-pandas-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-circle-test-conda-python-3.7-pandas-latest
- test-conda-python-3.7-pandas-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2019-12-13-0-circle-test-conda-python-3.7-pandas-master
- test-conda-python-3.7-spark-master:
  URL: 

Re: [Gandiva] How to optimize per CPU feature

2019-12-13 Thread Ravindra Pindikura
On Fri, Dec 13, 2019 at 3:41 PM Yibo Cai  wrote:

> Hi,
>
> Thanks to Pravindra's patch [1], Gandiva loop vectorization is okay now.
>
> Does Gandiva detect CPU features at runtime? My test CPU supports SSE
> through AVX2, but I only see
> "target-features"="+fxsr,+mmx,+sse,+sse2,+x87" in the IR, and the final
> code doesn't use registers longer than 128 bits.
>

Can you please give some details about the hardware and OS version you are
running this on? Also, are you building the binaries and running them on
the same host?


> [1] https://github.com/apache/arrow/pull/6019
>


-- 
Thanks and regards,
Ravindra.


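As a rough way to inspect what the JIT was given, the quoted "target-features" attribute can be parsed. This helper is purely illustrative (it is not Gandiva or LLVM code); it just shows that avx2 is absent from the reported feature set, which would explain why the generated code uses no registers wider than 128 bits.

```python
def parse_target_features(attr):
    """Split an LLVM "target-features" attribute value such as
    "+fxsr,+mmx,+sse,+sse2,+x87" into enabled and disabled feature sets
    ("+" enables a feature, "-" disables it)."""
    enabled, disabled = set(), set()
    for tok in attr.split(","):
        tok = tok.strip()
        if tok.startswith("+"):
            enabled.add(tok[1:])
        elif tok.startswith("-"):
            disabled.add(tok[1:])
    return enabled, disabled

# The feature string reported in the IR above:
enabled, _ = parse_target_features("+fxsr,+mmx,+sse,+sse2,+x87")
print("avx2" in enabled)  # False: AVX2 was not enabled, so no 256-bit registers
print("sse2" in enabled)  # True
```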