date:20200511

Re: Problem with building C++ flight code

2020-05-11 Thread Fan Liya

Hi Neal,

Thanks a lot for your kind reply.
I have opened ARROW-8771 to track it.

Best,
Liya Fan


On Tue, May 12, 2020 at 4:03 AM Neal Richardson wrote:

> We maintain a slimmer version of boost that only includes the modules Arrow
> requires. See
> https://github.com/apache/arrow/blob/master/cpp/build-support/trim-boost.sh
> .
>
> I don't see process.hpp included, so it must not have been documented
> previously that we required it. We can add it to the list. Could you make a
> JIRA please?
>
> FWIW I see #include <boost/process.hpp> in only 3 places in `cpp/`, all of
> which are test helpers. So I am guessing that if you build with the C++
> tests turned off, then it should compile.
>
> Neal
>
> On Mon, May 11, 2020 at 2:25 AM Fan Liya  wrote:
>
> > Hi Antoine,
> >
> > I manually downloaded a boost package from https://www.boost.org/
> > <
> >
> https://www.boost.org/doc/libs/1_73_0/doc/html/process.html#boost_process.introduction
> > >,
> > and verified that you are right.
> > Thank you!!!
> >
> > It seems the one automatically downloaded when running make command is
> not
> > complete either, from the content
> > of boost_ep-prefix/src/boost_ep-stamp/boost_ep-urlinfo.txt file,
> > the package seems to be downloaded from:
> >
> > repository='external project URL'
> > module='
> >
> >
> https://dl.bintray.com/ursalabs/arrow-boost/boost_1_71_0.tar.gz;https://dl.bintray.com/boostorg/release/1.71.0/source/boost_1_71_0.tar.gz;https://github.com/boostorg/boost/archive/boost-1.71.0.tar.gz;https://github.com/ursa-labs/thirdparty/releases/download/latest/boost_1_71_0.tar.gz
> > '
> > tag=''
> >
> > Best,
> > Liya Fan
> >
> >
> > On Mon, May 11, 2020 at 3:45 PM Antoine Pitrou 
> wrote:
> >
> > >
> > > Le 11/05/2020 à 05:37, Fan Liya a écrit :
> > > > Hi Wes,
> > > >
> > > > Thanks a lot for your kind reply.
> > > >
> > > > It did not work for me.
> > > > Please note that we have two boost libraries in our system: one is
> > > > installed by running yum install ...; and the other is downloaded by
> > GNU
> > > > make when building the code.
> > > > Neither of the boost libraries has the boost/process.hpp file.
> > > >
> > > > It seems the  boost/process.hpp file no longer exists in high version
> > > boost
> > > > libraries.
> > >
> > > I doubt so. This is the doc for version 1.73.0:
> > >
> > >
> >
> https://www.boost.org/doc/libs/1_73_0/doc/html/process.html#boost_process.introduction
> > >
> > > Your boost install is probably incomplete.
> > > You probably want something like `yum install boost-process`.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> >
>

[jira] [Created] (ARROW-8771) [C++] Add boost/process library to build support

2020-05-11 Thread Liya Fan (Jira)

Liya Fan created ARROW-8771:
---

 Summary: [C++] Add boost/process library to build support
 Key: ARROW-8771
 URL: https://issues.apache.org/jira/browse/ARROW-8771
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Liya Fan


Some of our test source code requires the process.hpp file (and its dependent 
libraries). Our current build support does not include these files, causing 
build failures like:

fatal error: boost/process.hpp: No such file or directory
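
A minimal sketch for checking which Boost installation, if any, provides the missing header. The helper name `find_header` and the candidate directories are assumptions for illustration (typical system include paths); substitute the include roots your build actually uses, such as the bundled boost_ep source tree.

```python
from pathlib import Path


def find_header(include_roots, header="boost/process.hpp"):
    """Return the include roots (as strings) that contain `header`."""
    return [str(root) for root in include_roots
            if (Path(root) / header).is_file()]


if __name__ == "__main__":
    # Hypothetical locations; replace with the paths used by your build.
    candidates = ["/usr/include", "/usr/local/include"]
    hits = find_header(candidates)
    print(hits if hits else "boost/process.hpp not found in any candidate")
```

An empty result for every candidate would be consistent with the build failure shown above.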



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (ARROW-8770) [C++][CI] enable arrow-csv-test on s390x

2020-05-11 Thread Kazuaki Ishizaki (Jira)

Kazuaki Ishizaki created ARROW-8770:
---

 Summary: [C++][CI] enable arrow-csv-test on s390x
 Key: ARROW-8770
 URL: https://issues.apache.org/jira/browse/ARROW-8770
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Kazuaki Ishizaki


arrow-csv-test on s390x looks good with the latest master.

{code}
  Start 25: arrow-csv-test
24/75 Test #25: arrow-csv-test    Passed    0.31 sec
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (ARROW-8769) [C++] Add convenience methods to access fields by name in StructScalar

2020-05-11 Thread Wes McKinney (Jira)

Wes McKinney created ARROW-8769:
---

 Summary: [C++] Add convenience methods to access fields by name in 
StructScalar
 Key: ARROW-8769
 URL: https://issues.apache.org/jira/browse/ARROW-8769
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


This would improve usability of this type



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: [C++] Revamping approach to Arrow compute kernel development

2020-05-11 Thread Wes McKinney

I'm working actively on this but perhaps as expected it has ballooned
into a very large project -- it's unclear at the moment whether I'll
be able to break the work into smaller patches that are easier to
digest. I'm working as fast as I can to have an initial
feature-preserving PR up, but the changes to the src/arrow/compute
directory are extensive, so I would please ask that we do not merge
non-essential patches into cpp/src/arrow/compute until I get the PR up
(estimated later this week at present rate) so we can see where things
stand.

On Wed, Apr 22, 2020 at 8:06 AM Wes McKinney wrote:
>
> On Wed, Apr 22, 2020 at 12:41 AM Micah Kornfield  
> wrote:
> >
> > Hi Wes,
> > I haven't had time to read the doc, but wanted to ask some questions on
> > points raised on the thread.
> >
> > * For efficiency, kernels used for array-expr evaluation should write
> > > into preallocated memory as their default mode. This enables the
> > > interpreter to avoid temporary memory allocations and improve CPU
> > > cache utilization. Almost none of our kernels are implemented this way
> > > currently.
> >
> > Did something change, I was pretty sure I submitted a patch a while ago for
> > boolean kernels, that separated out memory allocation from computation.
> > Which should allow for writing to the same memory.  Is this a concern with
> > the public Function APIs for the Kernel APIs themselves, or a lower level
> > implementation concern?
>
> Yes, you did in the internal implementation [1]. The concern is the
> public API and the general approach to implementing new kernels.
>
> I'm working on this right now (it's a large project so it will take me
> a little while to produce something to be reviewed) so bear with me =)
>
> [1]: 
> https://github.com/apache/arrow/commit/4910fbf4fda05b864daaba820db08291e4afdcb6#diff-561ea05d36150eb15842f452e3f07c76
>
> > * Sorting is generally handled by different data processing nodes from
> > > Projections, Aggregations / Hash Aggregations, Filters, and Joins.
> > > Projections and Filters use expressions, they do not sort.
> >
> > Would sorting the list-column elements per row be an array-expr?
>
> Yes, as that's an element-wise function. When I said sorting I was
> referring to ORDER BY. The functions we have that do sorting do so in
> the context of a single array [2].
>
> A query engine must be able to sort a (potentially very large) stream
> of record batches. One approach is for the Sort operator to exhaust
> its child input, accumulating all of the record batches in memory
> (spilling to disk as needed) and then sorting and emitting record
> batches from the sorted records/tuples. See e.g. Impala's sorting code
> [3] [4]
>
> [2]: 
> https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/sort_to_indices.h#L34
> [3]: https://github.com/apache/impala/blob/master/be/src/runtime/sorter.h
> [4]: https://github.com/apache/impala/blob/master/be/src/exec/sort-node.h
>
> >
> > On Tue, Apr 21, 2020 at 5:35 AM Wes McKinney  wrote:
> >
> > > On Tue, Apr 21, 2020 at 7:32 AM Antoine Pitrou  wrote:
> > > >
> > > >
> > > > Le 21/04/2020 à 13:53, Wes McKinney a écrit :
> > > > >>
> > > > >> That said, in the SortToIndices case, this wouldn't be a problem,
> > > since
> > > > >> only the second pass writes to the output.
> > > > >
> > > > > This kernel is not valid for normal array-exprs (see the spreadsheet I
> > > > > linked), such as what you can write in SQL
> > > > >
> > > > > Kernels like SortToIndices are a different type of function (in other
> > > > > words, "not a SQL function") and so if we choose to allow such a
> > > > > "non-SQL-like" functions in the expression evaluator then different
> > > > > logic must be used.
> > > >
> > > > Hmm, I think that maybe I'm misunderstanding at which level we're
> > > > talking here.  SortToIndices() may not be a "SQL function", but it looks
> > > > like an important basic block for a query engine (since, after all,
> > > > sorting results is an often used feature in SQL and other languages).
> > > > So it should be usable *inside* the expression engine, even though it's
> > > > not part of the exposed vocabulary, no?
> > >
> > > No, not as part of "expressions" as they are defined in the context of
> > > SQL engines.
> > >
> > > Sorting is generally handled by different data processing nodes from
> > > Projections, Aggregations / Hash Aggregations, Filters, and Joins.
> > > Projections and Filters use expressions, they do not sort.
> > >
> > > > Regards
> > > >
> > > > Antoine.
> > >

[jira] [Created] (ARROW-8768) [R][CI] Fix nightly as-cran spurious failure

2020-05-11 Thread Neal Richardson (Jira)

Neal Richardson created ARROW-8768:
--

 Summary: [R][CI] Fix nightly as-cran spurious failure
 Key: ARROW-8768
 URL: https://issues.apache.org/jira/browse/ARROW-8768
 Project: Apache Arrow
  Issue Type: Bug
  Components: Continuous Integration, R
Reporter: Neal Richardson
Assignee: Neal Richardson
 Fix For: 1.0.0


An extra check we added to ensure that the package doesn't write anything to 
the user's home directory started failing on one of the 5 as-cran checks. It 
appears that a new feature of texlive2020, which is apparently invoked on 
checking that the pdf manual can be built, adds some caching junk to the home 
dir. It is unlikely that this is a real failure, probably just an artifact of 
the test environment. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: Problem with building C++ flight code

2020-05-11 Thread Neal Richardson

We maintain a slimmer version of boost that only includes the modules Arrow
requires. See
https://github.com/apache/arrow/blob/master/cpp/build-support/trim-boost.sh.

I don't see process.hpp included, so it must not have been documented
previously that we required it. We can add it to the list. Could you make a
JIRA please?

FWIW I see #include <boost/process.hpp> in only 3 places in `cpp/`, all of
which are test helpers. So I am guessing that if you build with the C++
tests turned off, then it should compile.

Neal

On Mon, May 11, 2020 at 2:25 AM Fan Liya wrote:

> Hi Antoine,
>
> I manually downloaded a boost package from https://www.boost.org/
> <
> https://www.boost.org/doc/libs/1_73_0/doc/html/process.html#boost_process.introduction
> >,
> and verified that you are right.
> Thank you!!!
>
> It seems the one automatically downloaded when running make command is not
> complete either, from the content
> of boost_ep-prefix/src/boost_ep-stamp/boost_ep-urlinfo.txt file,
> the package seems to be downloaded from:
>
> repository='external project URL'
> module='
>
> https://dl.bintray.com/ursalabs/arrow-boost/boost_1_71_0.tar.gz;https://dl.bintray.com/boostorg/release/1.71.0/source/boost_1_71_0.tar.gz;https://github.com/boostorg/boost/archive/boost-1.71.0.tar.gz;https://github.com/ursa-labs/thirdparty/releases/download/latest/boost_1_71_0.tar.gz
> '
> tag=''
>
> Best,
> Liya Fan
>
>
> On Mon, May 11, 2020 at 3:45 PM Antoine Pitrou  wrote:
>
> >
> > Le 11/05/2020 à 05:37, Fan Liya a écrit :
> > > Hi Wes,
> > >
> > > Thanks a lot for your kind reply.
> > >
> > > It did not work for me.
> > > Please note that we have two boost libraries in our system: one is
> > > installed by running yum install ...; and the other is downloaded by
> GNU
> > > make when building the code.
> > > Neither of the boost libraries has the boost/process.hpp file.
> > >
> > > It seems the  boost/process.hpp file no longer exists in high version
> > boost
> > > libraries.
> >
> > I doubt so. This is the doc for version 1.73.0:
> >
> >
> https://www.boost.org/doc/libs/1_73_0/doc/html/process.html#boost_process.introduction
> >
> > Your boost install is probably incomplete.
> > You probably want something like `yum install boost-process`.
> >
> > Regards
> >
> > Antoine.
> >
>

Re: [Python][Documentation] Add column limit recommendations Parquet page

2020-05-11 Thread Maarten Ballintijn

Hi Wes, others,

Thanks for your reply and the nice test case. Based on your experiment I decided 
to dig a little deeper to further my understanding. I built the master branch, 
with the tools. 
I used parquet-scan to do some tests. (My understanding is that this is an easy 
way to look at reading performance; please correct me if I’m wrong.)

> On May 9, 2020, at 5:49 PM, Wes McKinney  wrote:
> 
> hi Maarten,
> 
> I added d...@parquet.apache.org to this (if you are not subscribed to
> this list you may want to)
> 
> I made a quick notebook to help illustrate:
> 
> https://gist.github.com/wesm/cabf684db3ce8fdd6df27cf782f7226e
> 
> Summary:
> 
> * Files with 1000+ columns can see the metadata-to-data ratio exceed
> 10% (in the example I made it's 15-20%).
> * The time to deserialize whole files with many columns starts to
> balloon superlinearly with extremely wide files

The metadata (at the column level, as reported by parquet-dump-schema) is 
indeed significant for shorter columns.
Part of this might be the effect of the dictionary encoding? It seems to have 2 
pages per column (dictionary & data?)

Doing some experiments, I noticed 3 things:

1) Using --columns=0 to read just one column with parquet-scan, the time to read 
the file metadata scales linearly with the number of columns. This time is 
substantial (0.4s for 100k columns).

2) Specifying sets of columns: the reading time per column is constant. On my 
system it is about 0.12s per additional 5k columns (this is with 5k rows).

3) I can actually read/scan 20k columns out of 100k in 0.95s. This is really 
good. 

Doing this from Python, these numbers are very different. 
The one-column case goes to 3.5s, plus 0.22s per additional 5k columns.

Is it well understood that using Python is that much slower? Are my 
observations correct?

> On Sat, May 9, 2020 at 4:28 PM Maarten Ballintijn  wrote:
>> 
>> Wes,
>> 
>> "Users would be well advised to not write columns with large numbers (> 
>> 1000) of columns"
>> You've mentioned this before and as this is in my experience not an uncommon 
>> use-case can you maybe expand a bit on the following related questions. 
>> (use-cases include daily or minute data for a few 10's of thousands items 
>> like stocks or other financial instruments, IoT sensors, etc).
>> 
>> Parquet Standard - Is the issue intrinsic to the Parquet standard you think? 
>> The ability to read a sub-set of the columns and/or row-groups, compact 
>> storage through the use of RLE, categoricals etc, all seem to point to the 
>> format being well suited for these use-cases
> 
> Parquet files by design are pretty heavy on metadata -- which is fine
> when the number of columns is small. When files have many columns, the
> costs associated with dealing with the file metadata really add up
> because the ratio of metadata to data in the file becomes skewed.
> Also, the common FileMetaData must be entirely parsed even when you
> only want to read one column.

I see about 20MB for the 100k columns file. So 200 bytes per column seems a 
lot, but this only becomes an issue when you have many columns (and 100k is 
about a factor of two more than our use-case). How much of this is Arrow overhead? 

>> Parquet-C++ implementation - Is the issue with current Parquet-C++ 
>> implementation, or any of the dependencies? Is it something which could be 
>> fixed? Would a specialized implementation help? Is the problem related to 
>> going from Parquet -> Arrow -> Python/Pandas? E.g. would a Parquet -> numpy 
>> reader work better?
> 
> No, it's not an issue specific to the C++ implementation.

Yes, the parquet-scan tests indicate that as well.

>> Alternatives - What would you recommend as a superior solution? Store this 
>> data tall i.s.o wide? Use another storage format?
> 
> It really depends on your particular use case. You can try other
> solutions (e.g. Arrow IPC / Feather files, or row-oriented data
> formats) and see what works best

Oh no, not row-oriented data formats :-)  We do like (need?) the ability to 
read a subset of the columns.
If anyone else has other suggestions then I’d definitely love to hear about them.

I’m going to redo some of these experiments with real data to see if it 
translates.

Cheers,
Maarten.
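
The figures reported in this message can be combined into a rough linear cost model. The sketch below only restates the numbers quoted above; the model itself (a fixed metadata cost plus a constant cost per 5k columns) and the function name `estimate_scan_seconds` are assumptions made for illustration.

```python
COLUMNS = 100_000
METADATA_BYTES = 20 * 10**6   # "about 20MB" for the 100k-column file

FIXED_S_CPP = 0.4             # parquet-scan, one column out of 100k
PER_5K_S_CPP = 0.12           # parquet-scan, each additional 5k columns
FIXED_S_PY = 3.5              # same read from Python
PER_5K_S_PY = 0.22

bytes_per_column = METADATA_BYTES / COLUMNS


def estimate_scan_seconds(n_columns, fixed_s, per_5k_s):
    """Assumed model: fixed metadata cost plus a linear per-column cost."""
    return fixed_s + (n_columns / 5_000) * per_5k_s


if __name__ == "__main__":
    print(bytes_per_column)
    print(round(estimate_scan_seconds(20_000, FIXED_S_CPP, PER_5K_S_CPP), 2))
    print(round(estimate_scan_seconds(20_000, FIXED_S_PY, PER_5K_S_PY), 2))
```

Under this model, scanning 20k of the 100k columns with parquet-scan comes to roughly 0.88s, close to the 0.95s measured, and the metadata works out to 200 bytes per column.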

Re: [DISCUSS] Need for Arrow 0.17.1 patch release (binary only?)

2020-05-11 Thread Krisztián Szűcs

On Fri, May 8, 2020 at 9:58 PM Wes McKinney wrote:
>
> From the release milestone
> (https://issues.apache.org/jira/projects/ARROW/versions/12348202) it
> looks like there are a few remaining issues
>
> ARROW-8684 -- there is a PR but we don't understand why it works. If
> we can't figure it out but the patch fixes the broken wheel issue we
> may have to accept it for now
> ARROW-8741: in progress
> ARROW-8726: No patch yet available (does this need to block? since
> this is a bug in bleeding edge functionality)
>
> Who would be able to cut the patch release? Do you think it will be
> possible to reduce the steps involved with producing this release to
> ease the work required of the RM?
I'll do it.

The patch release affects all of the C++ based packages, so we cannot really skip
any steps of creating a release candidate, but we can spare a couple of
post-release steps so as not to ship unaffected implementations.

Additionally, we'll need to include multiple packaging-related patches to have
a passing build. I'm going through the changelog.

>
> On Thu, May 7, 2020 at 7:30 AM Antoine Pitrou  wrote:
> >
> >
> > There's also ARROW-8728 and ARROW-8704.
> >
> > Regards
> >
> > Antoine.
> >
> >
> > Le 07/05/2020 à 14:25, Francois Saint-Jacques a écrit :
> > > I'll add https://issues.apache.org/jira/browse/ARROW-8726 to the list.
> > >
> > > On Tue, May 5, 2020 at 6:52 PM Wes McKinney  wrote:
> > >>
> > >> Sorry I haven't had enough coffee today.
> > >>
> > >> The patches that still need to be resolved AFAICT are ARROW-8684 and
> > >> ARROW-8706 (AKA PARQUET-1857), so it will take a little while yet
> > >>
> > >> On Tue, May 5, 2020 at 5:18 PM Wes McKinney  wrote:
> > >>>
> > >>> I just added it to the milestone in JIRA. In general if you want to
> > >>> add a patch to a patch release, you can go into JIRA and add the fix
> > >>> version
> > >>>
> > >>> I think the most critical patches have all been merged now, is there a
> > >>> reason to delay much longer in cutting a release? We might use the
> > >>> opportunity to produce a slimmed down "Release Management for Patch
> > >>> Releases" guide with some steps omitted (hopefully)
> > >>>
> > >>> On Tue, May 5, 2020 at 2:55 PM Paul Taylor  
> > >>> wrote:
> > >>>>
> > >>>> Would it be possible to include the variant.hpp update
> > >>>> for nvcc in 0.17.1?
> > >>>>
> > >>>> Thanks,
> > >>>>
> > >>>> Paul
> > >>>>
> > >>>> On 5/4/20 4:17 PM, Wes McKinney wrote:
> > >>>>> hi folks,
> > >>>>>
> > >>>>> We have accumulated a few regressions
> > >>>>>
> > >>>>> ARROW-8657 https://github.com/apache/arrow/pull/7089
> > >>>>> ARROW-8694 https://github.com/apache/arrow/pull/7103
> > >>>>>
> > >>>>> there may be a few others.
> > >>>>>
> > >>>>> I think we should try to make a "streamlined" patch release (after
> > >>>>> surveying incoming bug reports for other serious regressions) if
> > >>>>> possible focused on providing patched binaries to the impacted users
> > >>>>> (in the above, this would be any user of the Parquet portion of the
> > >>>>> C++ library). The hope would be to be able to trim down the work
> > >>>>> required of a release manager in a normal major release in these
> > >>>>> scenarios where we need to get out bugfixes sooner.
> > >>>>>
> > >>>>> Thoughts?
> > >>>>>
> > >>>>> Thanks
> > >>>>> Wes

[jira] [Created] (ARROW-8767) [C++] Make ThreadPool task ordering configurable

2020-05-11 Thread Antoine Pitrou (Jira)

Antoine Pitrou created ARROW-8767:
-

 Summary: [C++] Make ThreadPool task ordering configurable
 Key: ARROW-8767
 URL: https://issues.apache.org/jira/browse/ARROW-8767
 Project: Apache Arrow
  Issue Type: Wish
  Components: C++
Reporter: Antoine Pitrou


We may want to choose a task ordering strategy when constructing a ThreadPool.

To make the ordering strategy configurable, we may want to externalize it in a 
separate JobQueue class.
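
One way to picture the idea of externalizing the ordering strategy is a pool that takes its job queue as a constructor argument. This is a toy sketch in Python using the standard `queue` module, not Arrow's C++ ThreadPool; `SketchThreadPool` and `run_order` are hypothetical names.

```python
import queue
import threading


class SketchThreadPool:
    """Toy pool whose task ordering is decided by the injected job queue."""

    def __init__(self, num_workers, job_queue):
        self._jobs = job_queue          # e.g. queue.Queue or queue.LifoQueue
        self._num_workers = num_workers

    def submit(self, fn, *args):
        self._jobs.put((fn, args))

    def start(self):
        for _ in range(self._num_workers):
            threading.Thread(target=self._run, daemon=True).start()

    def wait(self):
        self._jobs.join()               # blocks until every job is done

    def _run(self):
        while True:
            fn, args = self._jobs.get()
            try:
                fn(*args)
            finally:
                self._jobs.task_done()


def run_order(job_queue):
    """Queue three tasks, then run them on one worker and record the order."""
    seen = []
    pool = SketchThreadPool(1, job_queue)
    for i in range(3):
        pool.submit(seen.append, i)
    pool.start()
    pool.wait()
    return seen
```

With a single worker and all tasks queued before `start()`, a FIFO queue runs the tasks in submission order and a LIFO queue runs them in reverse; the pool code itself is unchanged in both cases.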




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (ARROW-8766) [Python] A FileSystem implementation based on Python callbacks

2020-05-11 Thread Joris Van den Bossche (Jira)

Joris Van den Bossche created ARROW-8766:


 Summary: [Python] A FileSystem implementation based on Python 
callbacks
 Key: ARROW-8766
 URL: https://issues.apache.org/jira/browse/ARROW-8766
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Joris Van den Bossche


The new {{pyarrow.fs}} filesystems are now actual C++ objects, and no longer 
"just" a python interface. So they can't easily be expanded from the Python 
side, and the existing integration with {{fsspec}} filesystems is therefore 
also not working anymore. 

One possible solution is to have a C++ filesystem that calls back into a 
Python object for each of its methods (possibly similar to how you can 
implement a Flight server in Python, I suppose). 

Such a FileSystem implementation would make it possible to write a 
{{pyarrow.fs}} wrapper for {{fsspec}} filesystems, and thus allow such 
filesystems to be used in pyarrow wherever the new filesystems are expected.
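
The callback design can be sketched in pure Python. Every name below is hypothetical: `CallbackFileSystem` stands in for the C++ filesystem, and `DictHandler` stands in for a Python object (for example an fsspec wrapper) that implements the actual operations. Nothing here is the pyarrow API.

```python
import io


class DictHandler:
    """Hypothetical handler: an in-memory store playing the fsspec role."""

    def __init__(self):
        self._files = {}

    def write_bytes(self, path, data):
        self._files[path] = bytes(data)

    def open_input_stream(self, path):
        return io.BytesIO(self._files[path])

    def get_file_size(self, path):
        return len(self._files[path])


class CallbackFileSystem:
    """Stand-in for the C++ side: each method forwards to the handler."""

    def __init__(self, handler):
        self._handler = handler

    def write_bytes(self, path, data):
        return self._handler.write_bytes(path, data)

    def open_input_stream(self, path):
        return self._handler.open_input_stream(path)

    def get_file_size(self, path):
        return self._handler.get_file_size(path)


if __name__ == "__main__":
    fs = CallbackFileSystem(DictHandler())
    fs.write_bytes("bucket/a.bin", b"arrow")
    print(fs.get_file_size("bucket/a.bin"))
    print(fs.open_input_stream("bucket/a.bin").read())
```

Code that expects the filesystem interface only sees `CallbackFileSystem`, so any handler with the same methods could be plugged in without changing the callers.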



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (ARROW-8765) [C++] Design Scheduler API

2020-05-11 Thread Antoine Pitrou (Jira)

Antoine Pitrou created ARROW-8765:
-

 Summary: [C++] Design Scheduler API
 Key: ARROW-8765
 URL: https://issues.apache.org/jira/browse/ARROW-8765
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (ARROW-8764) [C++] Make ThreadPool configurable in ReadRangeCache

2020-05-11 Thread Antoine Pitrou (Jira)

Antoine Pitrou created ARROW-8764:
-

 Summary: [C++] Make ThreadPool configurable in ReadRangeCache
 Key: ARROW-8764
 URL: https://issues.apache.org/jira/browse/ARROW-8764
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (ARROW-8763) [C++] Create RandomAccessFile::WillNeed-like API

2020-05-11 Thread Antoine Pitrou (Jira)

Antoine Pitrou created ARROW-8763:
-

 Summary: [C++] Create RandomAccessFile::WillNeed-like API
 Key: ARROW-8763
 URL: https://issues.apache.org/jira/browse/ARROW-8763
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Antoine Pitrou


We need to inform RandomAccessFile that we will need a given range or number of 
ranges.
Also call that method from MemoryMappedFile::Read and friends.

Also perhaps write specialized ReadAsync implementations?
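
A rough sketch of what a WillNeed-like hint could look like from the caller's side. The class `PrefetchingFile` and its methods are hypothetical, and the sketch prefetches eagerly into a cache; a real implementation might instead pass an advisory hint to the operating system or start asynchronous reads.

```python
import io


class PrefetchingFile:
    """Toy random-access file that accepts hints about upcoming reads."""

    def __init__(self, raw):
        self._raw = raw
        self._cache = {}        # (offset, length) -> bytes
        self.raw_reads = 0      # number of reads that reached the raw file

    def will_need(self, ranges):
        """Hint that each (offset, length) range will be read soon."""
        for offset, length in ranges:
            if (offset, length) not in self._cache:
                self._cache[(offset, length)] = self._read_raw(offset, length)

    def read_at(self, offset, length):
        # Serve from a hinted range when it fully covers the request.
        for (start, size), data in self._cache.items():
            if start <= offset and offset + length <= start + size:
                rel = offset - start
                return data[rel:rel + length]
        return self._read_raw(offset, length)

    def _read_raw(self, offset, length):
        self.raw_reads += 1
        self._raw.seek(offset)
        return self._raw.read(length)
```

Reads that fall inside a hinted range are served without touching the underlying file again, while unhinted reads still work and simply go to the raw file.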



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (ARROW-8762) [C++][Gandiva] Replace Gandiva's BitmapAnd with common implementation

2020-05-11 Thread Wes McKinney (Jira)

Wes McKinney created ARROW-8762:
---

 Summary: [C++][Gandiva] Replace Gandiva's BitmapAnd with common 
implementation
 Key: ARROW-8762
 URL: https://issues.apache.org/jira/browse/ARROW-8762
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, C++ - Gandiva
Reporter: Wes McKinney
 Fix For: 1.0.0


Now that the arrow/util/bit_util.h implementation has been optimized, we should 
just use that one



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (ARROW-8761) [C++] Improve the performance of minmax kernel

2020-05-11 Thread Liya Fan (Jira)

Liya Fan created ARROW-8761:
---

 Summary: [C++] Improve the performance of minmax kernel
 Key: ARROW-8761
 URL: https://issues.apache.org/jira/browse/ARROW-8761
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Liya Fan
Assignee: Liya Fan


We improve the performance of the minmax kernel with a simple idea: if the 
current value is smaller than the current min value, then there is no need to 
compare it against the current max value, because it must also be smaller than 
the current max value. 

This simple trick saves one comparison each time a new minimum is found, so the 
count falls below the naive 2n by an amount that depends on the input order; 
the roughly 1.5n figure applies to the variant that processes values in pairs. 
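
A minimal counting sketch, assuming a non-empty input with no nulls; it is not the Arrow kernel. The early-exit variant saves one comparison whenever a new minimum is found, so its total depends on input order, while a bound of roughly 1.5n comparisons for any input is what the classic pairwise method achieves. Both function names are hypothetical.

```python
def minmax_early_exit(values):
    """Return (min, max, comparisons); skip the max test on a new minimum."""
    it = iter(values)
    lo = hi = next(it)
    comparisons = 0
    for x in it:
        comparisons += 1
        if x < lo:
            lo = x
            continue                 # x < lo <= hi, so x cannot be a new max
        comparisons += 1
        if x > hi:
            hi = x
    return lo, hi, comparisons


def minmax_pairwise(values):
    """Return (min, max, comparisons) using 3 comparisons per pair of values."""
    values = list(values)
    n = len(values)
    comparisons = 0
    if n % 2:                        # odd length: seed with the first value
        lo = hi = values[0]
        start = 1
    else:                            # even length: seed with the first pair
        comparisons += 1
        lo, hi = sorted(values[:2])
        start = 2
    for i in range(start, n, 2):
        a, b = values[i], values[i + 1]
        comparisons += 1
        if a > b:
            a, b = b, a              # now a <= b
        comparisons += 1
        if a < lo:
            lo = a
        comparisons += 1
        if b > hi:
            hi = b
    return lo, hi, comparisons
```

On an ascending input of 10 values the early-exit variant still performs 18 comparisons, on a descending input it performs 9, and the pairwise variant performs 13 regardless of order.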



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[NIGHTLY] Arrow Build Report for Job nightly-2020-05-11-0

2020-05-11 Thread Crossbow



Arrow Build Report for Job nightly-2020-05-11-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0

Failed Tasks:
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-github-test-conda-python-3.7-spark-master
- test-r-linux-as-cran:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-github-test-r-linux-as-cran
- wheel-win-cp35m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-appveyor-wheel-win-cp35m
- wheel-win-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-appveyor-wheel-win-cp36m
- wheel-win-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-appveyor-wheel-win-cp37m
- wheel-win-cp38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-appveyor-wheel-win-cp38

Succeeded Tasks:
- centos-6-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-github-centos-6-amd64
- centos-7-aarch64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-travis-centos-7-aarch64
- centos-7-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-github-centos-7-amd64
- centos-8-aarch64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-travis-centos-8-aarch64
- centos-8-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-github-centos-8-amd64
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-azure-conda-win-vs2015-py38
- debian-buster-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-github-debian-buster-amd64
- debian-buster-arm64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-travis-debian-buster-arm64
- debian-stretch-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-github-debian-stretch-amd64
- debian-stretch-arm64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-travis-debian-stretch-arm64
- gandiva-jar-osx:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-travis-gandiva-jar-osx
- gandiva-jar-xenial:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-travis-gandiva-jar-xenial
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-travis-homebrew-cpp
- homebrew-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-travis-homebrew-r-autobrew
- nuget:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-github-nuget
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-github-test-conda-cpp-valgrind
- test-conda-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-github-test-conda-cpp
- test-conda-python-3.6-pandas-0.23:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-github-test-conda-python-3.6-pandas-0.23
- test-conda-python-3.6:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-github-test-conda-python-3.6
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-github-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-hdfs-2.9.2:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-05-11-0-github-test-conda-python-3.7-hdfs-2.9.2
- test-conda-python-3.7-kartothek-latest:
  URL:

Re: Problem with building C++ flight code

2020-05-11 Thread Fan Liya

Hi Antoine,

I manually downloaded a boost package from https://www.boost.org/
<https://www.boost.org/doc/libs/1_73_0/doc/html/process.html#boost_process.introduction>,
and verified that you are right.
Thank you!!!

It seems the one automatically downloaded when running make command is not
complete either, from the content
of boost_ep-prefix/src/boost_ep-stamp/boost_ep-urlinfo.txt file,
the package seems to be downloaded from:

repository='external project URL'
module='
https://dl.bintray.com/ursalabs/arrow-boost/boost_1_71_0.tar.gz;https://dl.bintray.com/boostorg/release/1.71.0/source/boost_1_71_0.tar.gz;https://github.com/boostorg/boost/archive/boost-1.71.0.tar.gz;https://github.com/ursa-labs/thirdparty/releases/download/latest/boost_1_71_0.tar.gz
'
tag=''

Best,
Liya Fan


On Mon, May 11, 2020 at 3:45 PM Antoine Pitrou wrote:

>
> Le 11/05/2020 à 05:37, Fan Liya a écrit :
> > Hi Wes,
> >
> > Thanks a lot for your kind reply.
> >
> > It did not work for me.
> > Please note that we have two boost libraries in our system: one is
> > installed by running yum install ...; and the other is downloaded by GNU
> > make when building the code.
> > Neither of the boost libraries has the boost/process.hpp file.
> >
> > It seems the  boost/process.hpp file no longer exists in high version
> boost
> > libraries.
>
> I doubt so. This is the doc for version 1.73.0:
>
> https://www.boost.org/doc/libs/1_73_0/doc/html/process.html#boost_process.introduction
>
> Your boost install is probably incomplete.
> You probably want something like `yum install boost-process`.
>
> Regards
>
> Antoine.
>

Re: Problem with building C++ flight code

2020-05-11 Thread Antoine Pitrou



Le 11/05/2020 à 05:37, Fan Liya a écrit :
> Hi Wes,
> 
> Thanks a lot for your kind reply.
> 
> It did not work for me.
> Please note that we have two boost libraries in our system: one is
> installed by running yum install ...; and the other is downloaded by GNU
> make when building the code.
> Neither of the boost libraries has the boost/process.hpp file.
> 
> It seems the  boost/process.hpp file no longer exists in high version boost
> libraries.

I doubt so. This is the doc for version 1.73.0:
https://www.boost.org/doc/libs/1_73_0/doc/html/process.html#boost_process.introduction

Your boost install is probably incomplete.
You probably want something like `yum install boost-process`.

Regards

Antoine.
