[jira] [Created] (ARROW-9099) Add TRIM function for string

2020-06-10 Thread Sagnik Chakraborty (Jira)
Sagnik Chakraborty created ARROW-9099:
-

 Summary: Add TRIM function for string
 Key: ARROW-9099
 URL: https://issues.apache.org/jira/browse/ARROW-9099
 Project: Apache Arrow
  Issue Type: Task
Reporter: Sagnik Chakraborty






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Move JIRA notifications to separate mailing list?

2020-06-10 Thread Wes McKinney
I just requested jira@arrow.a.o be created

On Wed, Jun 10, 2020, 11:06 PM Neal Richardson 
wrote:

> I like it. That has some symmetry with how we handle github notifications:
> github@ for everything, commits@ for just commits.
>
> I looked into creating new mailing lists, and it appears that it's not
> available to all PMC members, only the PMC chair:
> https://selfserve.apache.org/mail.html. But once that is done, I'm happy
> to handle working with INFRA on updating what goes where.
>
> Neal
>
>
> On Wed, Jun 10, 2020 at 8:16 PM Wes McKinney  wrote:
>
> > Here's my proposal:
> >
> > * Create new mailing list j...@arrow.apache.org and move all JIRA
> > activity to that list (what currently goes to issues@)
> > * Send new issues notifications to issues@arrow.a.o. Stop sending
> > these e-mails to dev@
> > * Encourage dev@ subscribers to subscribe to issues@arrow.a.o
> >
> > Absent dissent I would suggest going ahead and asking INFRA to do
> > this. Note that any PMC member can create the new jira@ mailing list
> > (do this first, don't ask INFRA to do it)
> >
> > On Mon, Jun 8, 2020 at 2:33 PM Wes McKinney  wrote:
> > >
> > > I'm openly not very sympathetic toward people who don't take time to
> > > set up e-mail filters but I support having two e-mail lists:
> > >
> > > * One having new issues only. I think that active developers need to
> > > see new issues to create awareness of what others are doing in the
> > > project, so I think we should really encourage people to subscribe to
> > > this list (and set up an e-mail filter if they don't want the e-mails
> > > coming into their inbox). While I think having less "noise" on dev@ is
> > > a good thing (even though it's only "noise" if you don't set up e-mail
> > > filters) I'm concerned that this action will decrease developer
> > > engagement in the project. There are of course other ways [1] to
> > > subscribe to the JIRA activity feed if getting notifications in Slack
> > > or Zulip is your thing.
> > > * One having all JIRA traffic (i.e. what is currently at
> > > https://lists.apache.org/list.html?iss...@arrow.apache.org)
> > >
> > > [1]: https://github.com/ursa-labs/jira-zulip-bridge
> > >
> > > On Mon, Jun 8, 2020 at 1:57 PM Antoine Pitrou 
> > wrote:
> > > >
> > > >
> > > > I would welcome a separate list, but only with notifications of new
> > > > JIRA issues.  I am not interested in generic JIRA traffic.
> > > >
> > > > Regards
> > > >
> > > > Antoine.
> > > >
> > > >
> > > > Le 08/06/2020 à 20:46, Neal Richardson a écrit :
> > > > > And if you're like me, and this message got filtered out of your
> > > > > inbox because it is from dev@ and contains "JIRA" in the subject,
> > > > > well, maybe that demonstrates the problem ;)
> > > > >
> > > > > On Mon, Jun 8, 2020 at 11:43 AM Neal Richardson <
> > neal.p.richard...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Hi all,
> > > > >> I've noticed that some other Apache projects have a separate
> > > > >> mailing list for JIRA notifications (Spark, for example, has
> > > > >> iss...@spark.apache.org). The result is that the dev@ mailing
> > > > >> list is focused on actual discussion threads (like this!), votes,
> > > > >> and other official business. Would we be interested in doing the
> > > > >> same?
> > > > >>
> > > > >> In my opinion, the status quo is not great. The dev@ archives (
> > > > >> https://lists.apache.org/list.html?dev@arrow.apache.org) aren't
> > > > >> that readable/browseable to me, and if I want to see what's going
> > > > >> on in JIRA, I go to JIRA. In fact, the first thing I/we recommend
> > > > >> to people signing up for the mailing list is to set up email
> > > > >> filters to exclude the JIRA noise. Having a separate mailing list
> > > > >> will make it easier for people to manage their own information
> > > > >> streams.
> > > > >>
> > > > >> The counterargument is that moving JIRA traffic to a separate
> > > > >> mailing list, requiring an additional subscribe action, might
> > > > >> mean that developers miss out on things like new issues being
> > > > >> created. I'm not personally worried about this because I suspect
> > > > >> that many of us already aren't using the mailing list to stay on
> > > > >> top of JIRA issues, and that those who want the JIRA stream in
> > > > >> their email can easily opt in (subscribe). But I'm interested in
> > > > >> the community's opinions on this.
> > > > >>
> > > > >> Thoughts?
> > > > >>
> > > > >> Neal
> > > > >>
> > > > >
> >
>


RE: [C++][Discuss] Approaches for SIMD optimizations

2020-06-10 Thread Du, Frank
Thanks Jed.

I collected some data on my setup: gcc 7.5.0, Ubuntu 18.04.4 LTS, SSE
build (-msse4.2).

[Unroll baseline]
for (int64_t i = 0; i < length_rounded; i += kRoundFactor) {
  for (int64_t k = 0; k < kRoundFactor; k++) {
    sum_rounded[k] += values[i + k];
  }
}
SumKernelFloat/32768/0    2.91 us  2.90 us  239992  bytes_per_second=10.5063G/s null_percent=0 size=32.768k
SumKernelDouble/32768/0   1.89 us  1.89 us  374470  bytes_per_second=16.1847G/s null_percent=0 size=32.768k
SumKernelInt8/32768/0     11.6 us  11.6 us   60329  bytes_per_second=2.63274G/s null_percent=0 size=32.768k
SumKernelInt16/32768/0    6.98 us  6.98 us  100293  bytes_per_second=4.3737G/s null_percent=0 size=32.768k
SumKernelInt32/32768/0    3.89 us  3.88 us  180423  bytes_per_second=7.85862G/s null_percent=0 size=32.768k
SumKernelInt64/32768/0    1.86 us  1.85 us  380477  bytes_per_second=16.4536G/s null_percent=0 size=32.768k

[#pragma omp simd reduction(+:sum)]
#pragma omp simd reduction(+:sum)
for (int64_t i = 0; i < n; i++)
  sum += values[i];
SumKernelFloat/32768/0    2.97 us  2.96 us  235686  bytes_per_second=10.294G/s null_percent=0 size=32.768k
SumKernelDouble/32768/0   2.97 us  2.97 us  236456  bytes_per_second=10.2875G/s null_percent=0 size=32.768k
SumKernelInt8/32768/0     11.7 us  11.7 us   60006  bytes_per_second=2.61643G/s null_percent=0 size=32.768k
SumKernelInt16/32768/0    5.47 us  5.47 us  127999  bytes_per_second=5.58002G/s null_percent=0 size=32.768k
SumKernelInt32/32768/0    2.42 us  2.41 us  290635  bytes_per_second=12.6485G/s null_percent=0 size=32.768k
SumKernelInt64/32768/0    1.82 us  1.82 us  386749  bytes_per_second=16.7733G/s null_percent=0 size=32.768k

[SSE intrinsic]
SumKernelFloat/32768/0    2.24 us  2.24 us  310914  bytes_per_second=13.6335G/s null_percent=0 size=32.768k
SumKernelDouble/32768/0   1.43 us  1.43 us  486642  bytes_per_second=21.3266G/s null_percent=0 size=32.768k
SumKernelInt8/32768/0     6.93 us  6.92 us  100720  bytes_per_second=4.41046G/s null_percent=0 size=32.768k
SumKernelInt16/32768/0    3.14 us  3.14 us  222803  bytes_per_second=9.72931G/s null_percent=0 size=32.768k
SumKernelInt32/32768/0    2.11 us  2.11 us  331388  bytes_per_second=14.4907G/s null_percent=0 size=32.768k
SumKernelInt64/32768/0    1.32 us  1.32 us  532964  bytes_per_second=23.0728G/s null_percent=0 size=32.768k
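(The intrinsic kernel itself is not included in this message; as a rough illustration only, an SSE2 sum kernel for doubles might look like the sketch below. The function name is made up, and the real Arrow kernel also has to handle null bitmaps.)

```cpp
#include <emmintrin.h>  // SSE2 intrinsics

#include <cstdint>

// Illustrative sketch of an SSE2 double-sum kernel; not the actual
// benchmarked Arrow code.
double SumDoubleSse(const double* values, int64_t n) {
  __m128d acc = _mm_setzero_pd();
  int64_t i = 0;
  // Process two doubles per iteration in one 128-bit register.
  for (; i + 2 <= n; i += 2) {
    acc = _mm_add_pd(acc, _mm_loadu_pd(values + i));
  }
  // Horizontal reduction of the two lanes, plus a scalar tail loop.
  double lanes[2];
  _mm_storeu_pd(lanes, acc);
  double sum = lanes[0] + lanes[1];
  for (; i < n; ++i) sum += values[i];
  return sum;
}
```

Note that such a kernel fixes the reduction order by hand, which is exactly the reassociation the compiler is not allowed to do for floating point without unsafe-math-style flags.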

I tried tweaking kRoundFactor, using some unroll-based omp simd variants, and
building with clang-8, but unfortunately I never managed to get results
matching the intrinsics. The generated assembly all uses SIMD instructions,
with only small differences such as instruction sequences or which xmm
registers are used. What the compiler does under the hood is still something
of a mystery to me.

Thanks,
Frank

-Original Message-
From: Jed Brown  
Sent: Thursday, June 11, 2020 1:58 AM
To: Du, Frank ; dev@arrow.apache.org
Subject: RE: [C++][Discuss] Approaches for SIMD optimizations

"Du, Frank"  writes:

> The PR I committed provides basic support for runtime dispatching. I
> agree that the compiler should generate well-vectorized code for the
> non-null data part, but in fact it didn't. Jed pointed out that one can
> force the compiler to use SIMD with some additional pragmas, something
> like "#pragma omp simd reduction(+:sum)". I will try this pragma later,
> but I need to figure out whether it requires linking against OpenMP.

It does not require linking OpenMP.  You just compile with -fopenmp-simd
(gcc/clang) or -qopenmp-simd (icc) so that it interprets the "omp simd"
pragmas.  (These can be captured in macros using _Pragma.)

Note that you get automatic vectorization for this sort of thing without any 
OpenMP if you add -funsafe-math-optimizations (included in -ffast-math).

  https://gcc.godbolt.org/z/8thgru

Many projects don't want -funsafe-math-optimizations because there are places 
where it can hurt numerical stability.  ICC includes unsafe math in normal 
optimization levels while GCC and Clang are more conservative.
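(As a concrete sketch of the above, with illustrative names rather than Arrow's actual macros: the pragma can be wrapped in `_Pragma` so the kernel source stays clean, and it degrades to a no-op when the OpenMP-SIMD flags are absent.)

```cpp
#include <cstdint>

// Wrap the pragma in _Pragma so it can live behind a macro. Compile with
// -fopenmp-simd (gcc/clang) or -qopenmp-simd (icc) to activate it; without
// those flags the pragma is simply ignored, and no OpenMP runtime is linked
// in either case. ARROW_PRAGMA_SIMD_SUM is an illustrative name.
#if defined(__GNUC__) || defined(__clang__)
#define ARROW_PRAGMA_SIMD_SUM _Pragma("omp simd reduction(+ : sum)")
#else
#define ARROW_PRAGMA_SIMD_SUM
#endif

double SumDouble(const double* values, int64_t n) {
  double sum = 0.0;
  ARROW_PRAGMA_SIMD_SUM
  for (int64_t i = 0; i < n; ++i) {
    sum += values[i];
  }
  return sum;
}
```

The same source then builds both ways: scalar by default, vectorized when the flag is on.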


[jira] [Created] (ARROW-9098) RecordBatch::ToStructArray cannot handle record batches with 0 column

2020-06-10 Thread Zhuo Peng (Jira)
Zhuo Peng created ARROW-9098:


 Summary: RecordBatch::ToStructArray cannot handle record batches 
with 0 column
 Key: ARROW-9098
 URL: https://issues.apache.org/jira/browse/ARROW-9098
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 0.17.1
Reporter: Zhuo Peng


If RecordBatch::ToStructArray is called against a record batch with 0 column, 
the following error will be raised:

Invalid: Can't infer struct array length with 0 child arrays





[jira] [Created] (ARROW-9097) [Rust] Customizable schema inference for CSV

2020-06-10 Thread Sergey Todyshev (Jira)
Sergey Todyshev created ARROW-9097:
--

 Summary: [Rust] Customizable schema inference for CSV
 Key: ARROW-9097
 URL: https://issues.apache.org/jira/browse/ARROW-9097
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Sergey Todyshev


Please consider extracting the infer_csv_schema function into a separate
module, allowing customization of how field DataTypes are inferred. Currently
the missing piece is inference of datetime fields.





[jira] [Created] (ARROW-9096) data type "integer" not understood: pandas roundtrip

2020-06-10 Thread Richard Wu (Jira)
Richard Wu created ARROW-9096:
-

 Summary: data type "integer" not understood: pandas roundtrip
 Key: ARROW-9096
 URL: https://issues.apache.org/jira/browse/ARROW-9096
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 0.17.1
Reporter: Richard Wu


The following will fail the roundtrip since the column indexes' pandas_type is 
converted from int64 to integer when an additional column is introduced and 
subsequently moved to the index:

 
{code:python}
import numpy as np
import pandas as pd
import pyarrow

df = pd.DataFrame(np.ones((3, 1)), index=[[1, 2, 3]])
df['foo'] = np.arange(3)
df = df.set_index('foo', append=True)
table = pyarrow.Table.from_pandas(df)
table.to_pandas()  # Errors{code}





Updating arrow website for 1.0 release

2020-06-10 Thread Neal Richardson
Hi all,
To accompany our 1.0 release, we should update our website to reflect where
the Arrow project currently stands and to orient it toward a broader audience.
I've started a PR at https://github.com/apache/arrow-site/pull/63, and I'd
appreciate feedback and contributions. Feel free to push directly to my
branch, or just comment/make suggestions on the PR, whichever you prefer.

See the PR description for details of what I've been working on, as well as
a link to a preview version of the site. Among the things I could use input
on right now are the "use cases" and "getting started" guides--I'm pretty
familiar with some parts of the project but much less so with others, so if
there are examples and resources from your areas that should be made
visible, please add them.

As will surely be clear when you look at it, I am not a web designer
(though based on what I see on the site currently, none of you are either
;). I will try to get actual web design help before we go live, so please
don't be distracted (yet) by purely design choices that you don't like.

Thanks,
Neal


[jira] [Created] (ARROW-9095) [Rust] Fix NullArray to comply with spec

2020-06-10 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-9095:
-

 Summary: [Rust] Fix NullArray to comply with spec
 Key: ARROW-9095
 URL: https://issues.apache.org/jira/browse/ARROW-9095
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Affects Versions: 0.17.0
Reporter: Neville Dipale


When I implemented the NullArray, I didn't comply with the spec under the 
premise that I'd handle reading and writing IPC in a spec-compliant way as that 
looked like the easier approach.

After some integration testing, I realised that I wasn't doing it correctly, so 
it's better to comply with the spec by not allocating any buffers for the array.





[jira] [Created] (ARROW-9094) [Python] Bump versions of compiled dependencies in manylinux wheels

2020-06-10 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-9094:
-

 Summary: [Python] Bump versions of compiled dependencies in 
manylinux wheels
 Key: ARROW-9094
 URL: https://issues.apache.org/jira/browse/ARROW-9094
 Project: Apache Arrow
  Issue Type: Task
  Components: Packaging, Python
Reporter: Antoine Pitrou
 Fix For: 1.0.0








Re: Arrow sync call June 10 at 12:00 US/Eastern, 16:00 UTC

2020-06-10 Thread Neal Richardson
Attendees:
Projjal Chanda
Jörn Horstmann
Kazuaki Ishizaki
Ben Kietzman
Micah Kornfield
Nishit Kumar
David Li
Wes McKinney
Rok Mihevc
Antoine Pitrou
Neal Richardson

* timeline for 1.0: aim for rc week of July 6
  * Discussion of forward-compatibility format changes to add
  * Integration testing status
  * New website (ML message forthcoming)
* SIMD discussion, following up from the mailing list

On Wed, Jun 10, 2020 at 8:53 AM Neal Richardson 
wrote:

> Hi all,
> Last minute reminder that our biweekly call is coming up at the top of the
> hour at https://meet.google.com/vtm-teks-phx. All are welcome to join.
> Notes will be sent out to the mailing list afterward.
>
> Neal
>
>


python plasma client get_buffers behavior

2020-06-10 Thread saurabh pratap singh
Hi,

We are using the Python plasma client to do a get_buffers for Arrow tables
created by Java in Plasma.

The Python plasma client basically polls a queue and calls get_buffers on
the object IDs returned from the queue. What I have observed in the Plasma
object table entries for those object IDs is that get_buffers will first
increment the ref count by 1, and then there is an implicit release call
which decreases the ref count again.

But when there are no more entries in the queue, I see that a few object
IDs still have a lingering reference count in Plasma with respect to
get_buffers, and there was no "implicit" release for that get call like
the previous ones.

Is this expected? Is there any way I can handle this and make an explicit
release for such object IDs as well?

Thanks


[jira] [Created] (ARROW-9093) [FlightRPC][C++][Python] Allow setting gRPC client options

2020-06-10 Thread David Li (Jira)
David Li created ARROW-9093:
---

 Summary: [FlightRPC][C++][Python] Allow setting gRPC client options
 Key: ARROW-9093
 URL: https://issues.apache.org/jira/browse/ARROW-9093
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, FlightRPC, Python
Reporter: David Li
Assignee: David Li


There's no way to set generic gRPC options which are useful for tuning behavior 
(e.g. round-robin load balancing). Rather than bind all of these one by one, 
gRPC allows setting arguments as generic string-string or string-integer pairs; 
we could expose this (and leave the interpretation implementation-dependent).





[jira] [Created] (ARROW-9092) [C++] gandiva-decimal-test hangs with LLVM 9

2020-06-10 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-9092:
---

 Summary: [C++] gandiva-decimal-test hangs with LLVM 9
 Key: ARROW-9092
 URL: https://issues.apache.org/jira/browse/ARROW-9092
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Wes McKinney


I built Gandiva C++ unittests with LLVM 9 on Ubuntu 18.04 and 
gandiva-decimal-test hangs forever





[jira] [Created] (ARROW-9091) [C++] Utilize function's default options when passing no options to CallFunction to a function that requires them

2020-06-10 Thread Wes McKinney (Jira)
Wes McKinney created ARROW-9091:
---

 Summary: [C++] Utilize function's default options when passing no 
options to CallFunction to a function that requires them
 Key: ARROW-9091
 URL: https://issues.apache.org/jira/browse/ARROW-9091
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Wes McKinney
 Fix For: 1.0.0


Otherwise benign usage of {{CallFunction}} can cause an unintuitive segfault in 
some cases





[jira] [Created] (ARROW-9090) [C++] Bump versions of bundled libraries

2020-06-10 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-9090:
-

 Summary: [C++] Bump versions of bundled libraries
 Key: ARROW-9090
 URL: https://issues.apache.org/jira/browse/ARROW-9090
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Reporter: Antoine Pitrou
 Fix For: 1.0.0


We should bump the versions of bundled dependencies, wherever possible, to 
ensure that users get bugfixes and improvements made in those third-party 
libraries.





[NIGHTLY] Arrow Build Report for Job nightly-2020-06-10-0

2020-06-10 Thread Crossbow


Arrow Build Report for Job nightly-2020-06-10-0

All tasks: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0

Failed Tasks:
- homebrew-cpp:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-travis-homebrew-cpp
- homebrew-r-autobrew:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-travis-homebrew-r-autobrew
- test-conda-cpp-valgrind:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-github-test-conda-cpp-valgrind
- test-conda-python-3.7-dask-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-github-test-conda-python-3.7-dask-latest
- test-conda-python-3.7-spark-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-github-test-conda-python-3.7-spark-master
- test-conda-python-3.7-turbodbc-latest:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-github-test-conda-python-3.7-turbodbc-latest
- test-conda-python-3.7-turbodbc-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-github-test-conda-python-3.7-turbodbc-master
- test-conda-python-3.8-dask-master:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-github-test-conda-python-3.8-dask-master
- test-conda-python-3.8-jpype:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-github-test-conda-python-3.8-jpype
- ubuntu-bionic-arm64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-travis-ubuntu-bionic-arm64
- ubuntu-focal-arm64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-travis-ubuntu-focal-arm64

Pending Tasks:
- wheel-manylinux1-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-azure-wheel-manylinux1-cp36m
- wheel-manylinux1-cp37m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-azure-wheel-manylinux1-cp37m
- wheel-manylinux2010-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-azure-wheel-manylinux2010-cp36m
- wheel-manylinux2010-cp38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-azure-wheel-manylinux2010-cp38
- wheel-manylinux2014-cp36m:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-azure-wheel-manylinux2014-cp36m

Succeeded Tasks:
- centos-6-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-github-centos-6-amd64
- centos-7-aarch64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-travis-centos-7-aarch64
- centos-7-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-github-centos-7-amd64
- centos-8-aarch64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-travis-centos-8-aarch64
- centos-8-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-github-centos-8-amd64
- conda-clean:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-azure-conda-clean
- conda-linux-gcc-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-azure-conda-linux-gcc-py36
- conda-linux-gcc-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-azure-conda-linux-gcc-py37
- conda-linux-gcc-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-azure-conda-linux-gcc-py38
- conda-osx-clang-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-azure-conda-osx-clang-py36
- conda-osx-clang-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-azure-conda-osx-clang-py37
- conda-osx-clang-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-azure-conda-osx-clang-py38
- conda-win-vs2015-py36:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-azure-conda-win-vs2015-py36
- conda-win-vs2015-py37:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-azure-conda-win-vs2015-py37
- conda-win-vs2015-py38:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-azure-conda-win-vs2015-py38
- debian-buster-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-github-debian-buster-amd64
- debian-buster-arm64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-travis-debian-buster-arm64
- debian-stretch-amd64:
  URL: 
https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-06-10-0-github-debian-stretch-amd64
- 

RE: [C++][Discuss] Approaches for SIMD optimizations

2020-06-10 Thread Kazuaki Ishizaki
> My initial thought is that in the short term we should focus on
> the dynamic dispatch question (continue to build our own vs adopt an
> existing library) and lean on the compiler for most vectorization. Using
> intrinsics should be limited to complex numerical functions and places
> where the compiler fails to vectorize/translate well (e.g. bit
> manipulations).

This looks like a good direction for reducing the number of binaries on one 
CPU architecture (e.g. x86).

Kazuaki Ishizaki



From:   Micah Kornfield 
To: dev 
Date:   2020/06/10 13:38
Subject:[EXTERNAL] Re: [C++][Discuss] Approaches for SIMD 
optimizations



A few thoughts on this as a high level:
1.  Most of the libraries don't support runtime dispatch (libsimdpp seems
to be the exception here), so we should decide if we want to roll our own
dynamic dispatch mechanism.
2.  It isn't clear to me from the linked PR what the performance delta is
between the hand-written SIMD code and what the compiler would generate.  For
simple aggregates of non-null data I would expect pretty good auto-vectorization.
Compiler auto-vectorization seems to get better over time.  For instance
the scalar example linked in the paper seems to get vectorized somewhat
under Clang 10 (https://godbolt.org/z/oPopQL).
3.  It appears there are some efforts to make a standardized C++ library
[1] which might be based on Vc.

My initial thought is that in the short term we should focus on
the dynamic dispatch question (continue to build our own vs adopt an
existing library) and lean on the compiler for most vectorization. Using
intrinsics should be limited to complex numerical functions and places
where the compiler fails to vectorize/translate well (e.g. bit
manipulations).
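To make the "roll our own dynamic dispatch" option concrete, here is a minimal
sketch of what such a mechanism could look like. The names (`HasAvx2`,
`SumScalar`, `SumAvx2`, `Sum`) are hypothetical, and the feature probe is
stubbed out so the sketch stays portable; real code might use
`__builtin_cpu_supports("avx2")` on GCC/Clang or CPUID directly.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical feature probe: real code would query the CPU (e.g. via
// __builtin_cpu_supports("avx2") on GCC/Clang). Hard-wired to false here
// so the sketch compiles and runs anywhere.
inline bool HasAvx2() { return false; }

using SumFn = int64_t (*)(const int64_t*, std::size_t);

// Baseline implementation, always available.
int64_t SumScalar(const int64_t* data, std::size_t n) {
  int64_t sum = 0;
  for (std::size_t i = 0; i < n; ++i) sum += data[i];
  return sum;
}

// Stand-in for an AVX2-specialized version; a real build would compile this
// translation unit with -mavx2 and use intrinsics in the body.
int64_t SumAvx2(const int64_t* data, std::size_t n) {
  return SumScalar(data, n);
}

// Resolve the best implementation for the running CPU.
SumFn ResolveSum() { return HasAvx2() ? SumAvx2 : SumScalar; }

// Public entry point: dispatch through a function pointer resolved once.
int64_t Sum(const int64_t* data, std::size_t n) {
  static const SumFn fn = ResolveSum();
  return fn(data, n);
}
```

The point of the sketch is only the shape: one resolver, one function-pointer
indirection, per-ISA implementations compiled with different target flags.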

If we do find the need for a dedicated library I would lean towards
something that will converge to a standard to reduce additional
dependencies in the long run. That being said most of these libraries seem
to be header only so the dependency is fairly light-weight, so we can
vendor them if need-be.

[1] 
https://urldefense.proofpoint.com/v2/url?u=https-3A__en.cppreference.com_w_cpp_experimental_simd=DwIFaQ=jf_iaSHvJObTbx-siA1ZOg=b70dG_9wpCdZSkBJahHYQ4IwKMdp2hQM29f-ZCGj9Pg=ht-eat3JsBM5dJhLw7VRHoPqIHBsAQqE88_UwgsYfws=xQJVFw4POmC1ssJxbzJVkt4_3WVjgMyKfyZ8SWXHYuc=
 






On Tue, Jun 9, 2020 at 3:32 AM Antoine Pitrou  wrote:

>
> Thank you.  xsimd used to require C++14, but apparently they have
> demoted it to C++11.  Good!
>
> Regards
>
> Antoine.
>
>
> Le 09/06/2020 à 12:04, Maarten Breddels a écrit :
> > Hi Antoine,
> >
> > Adding xsimd to the list of options:
> >  * 
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_xtensor-2Dstack_xsimd=DwIFaQ=jf_iaSHvJObTbx-siA1ZOg=b70dG_9wpCdZSkBJahHYQ4IwKMdp2hQM29f-ZCGj9Pg=ht-eat3JsBM5dJhLw7VRHoPqIHBsAQqE88_UwgsYfws=gJQs85klv9e8HffFnBYp5Fewxwg2TavyBShGb9frWi8=
 

> > Not sure how it compares to the rest though.
> >
> > cheers,
> >
> > Maarten
> >
>





[jira] [Created] (ARROW-9089) [Python] A PyFileSystem handler for fsspec-based filesystems

2020-06-10 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-9089:


 Summary: [Python] A PyFileSystem handler for fsspec-based 
filesystems
 Key: ARROW-9089
 URL: https://issues.apache.org/jira/browse/ARROW-9089
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Joris Van den Bossche


Follow-up on ARROW-8766 to use this machinery to add an FSSpecHandler



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-9088) [Rust] Recent version of arrow crate does not compile into wasm target

2020-06-10 Thread Sergey Todyshev (Jira)
Sergey Todyshev created ARROW-9088:
--

 Summary: [Rust] Recent version of arrow crate does not compile 
into wasm target
 Key: ARROW-9088
 URL: https://issues.apache.org/jira/browse/ARROW-9088
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Sergey Todyshev


Arrow 0.16 compiles successfully for wasm32-unknown-unknown, but a recent git 
version does not. It would be nice to fix that.

compiler errors:

 
{noformat}
error[E0433]: failed to resolve: could not find `unix` in `os`
  --> /home/regl/.cargo/registry/src/github.com-1ecc6299db9ec823/dirs-1.0.5/src/lin.rs:41:18
   |
41 |     use std::os::unix::ffi::OsStringExt;
   |              ^^^^ could not find `unix` in `os`

error[E0432]: unresolved import `unix`
 --> /home/regl/.cargo/registry/src/github.com-1ecc6299db9ec823/dirs-1.0.5/src/lin.rs:6:5
  |
6 | use unix;
  |     ^^^^ no `unix` in the root
{noformat}
The problem is that the prettytable-rs dependency depends on dirs, which 
causes this error.


--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [C++][Discuss] Approaches for SIMD optimizations

2020-06-10 Thread Micah Kornfield
>
> I agree that the compiler should generate good vectorized code for the
> non-null data part, but in fact it didn't; jedbrown pointed out that we can
> force the compiler to use SIMD with some additional pragmas, something like
> "#pragma omp simd reduction(+:sum)"

It is an interesting question why.  We won't always be able to rely on the
compiler but if it does something unexpected, I'm not sure the best thing
is to jump to intrinsics.

In this case I think most of the gain could be had by adjusting
"constexpr int64_t kRoundFactor = 8;" [1]

to be: constexpr int64_t kRoundFactor = SIMD_REGISTER_SIZE / sizeof(Type);

[1]
https://github.com/apache/arrow/blob/efb707a5438380dcef78418668b57c3f60592a23/cpp/src/arrow/compute/kernels/aggregate_basic.cc#L143
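To illustrate the suggestion, here is a rough sketch of a sum loop whose unroll
factor is derived from the register width rather than hard-coded. This is not
the actual Arrow kernel; `kSimdRegisterBytes` and `RoundedSum` are hypothetical
names, and a 128-bit (SSE) register is assumed.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed SIMD register width in bytes (128-bit SSE); a real build would pick
// this per target, e.g. 32 for AVX2.
constexpr std::size_t kSimdRegisterBytes = 16;

// Unroll by however many T values fit in one register, so the compiler's
// auto-vectorizer can fill a whole register per accumulator lane.
template <typename T>
T RoundedSum(const T* values, std::size_t n) {
  constexpr std::size_t kRoundFactor = kSimdRegisterBytes / sizeof(T);
  const std::size_t rounded = n - n % kRoundFactor;
  T acc[kRoundFactor] = {};  // independent accumulator lanes
  for (std::size_t i = 0; i < rounded; i += kRoundFactor) {
    for (std::size_t j = 0; j < kRoundFactor; ++j) acc[j] += values[i + j];
  }
  T sum = 0;
  for (std::size_t j = 0; j < kRoundFactor; ++j) sum += acc[j];
  for (std::size_t i = rounded; i < n; ++i) sum += values[i];  // scalar tail
  return sum;
}
```

With this shape, `kRoundFactor` is 4 for int32, 2 for double, and so on,
instead of a fixed 8 regardless of element width.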

On Tue, Jun 9, 2020 at 11:04 PM Du, Frank  wrote:

> The PR I committed provides basic support for runtime dispatching. I
> agree that the compiler should generate good vectorized code for the
> non-null data part, but in fact it didn't; jedbrown pointed out that we
> can force the compiler to use SIMD with some additional pragmas, something
> like "#pragma omp simd reduction(+:sum)". I will try this pragma later but
> need to figure out whether it requires linking against OpenMP. As I said
> in the PR, the next step is to provide acceleration for the nullable data
> path, which is more typical in the real world and hard for the compiler to
> vectorize. The nullable path with manual intrinsics is very easy for
> AVX512 thanks to its native support for masks[1]. I did some initial
> experiments on the SSE path locally and concluded that not much gain can
> be achieved, but I would expect it to be totally different for AVX2, given
> the extra computation bandwidth AVX2 provides. Considering that most
> recent x86 hardware already supports AVX2, I can remove the SSE intrinsic
> path to reduce one burden.
>
> For the SIMD wrappers, it seems popular compute libraries (NumPy,
> OpenBLAS, etc.) also use intrinsics directly. I heard NumPy is trying to
> unify a single interface but is still struggling for many reasons: the
> hardware provides similar interfaces, but with too many differences in
> detail.
>
> [1] https://en.wikipedia.org/wiki/AVX-512#Opmask_registers
>
> Thanks,
> Frank
>
> -Original Message-
> From: Micah Kornfield 
> Sent: Wednesday, June 10, 2020 12:38 PM
> To: dev 
> Subject: Re: [C++][Discuss] Approaches for SIMD optimizations
>
> A few thoughts on this as a high level:
> 1.  Most of the libraries don't support runtime dispatch (libsimdpp seems
> to be the exception here), so we should decide if we want to roll our own
> dynamic dispatch mechanism.
> 2.  It isn't clear to me from the linked PR what the performance delta is
> between the hand-written SIMD code and what the compiler would generate.
> For simple aggregates of non-null data I would expect pretty good
> auto-vectorization.
> Compiler auto-vectorization seems to get better over time.  For instance
> the scalar example linked in the paper seems to get vectorized somewhat
> under Clang 10 (https://godbolt.org/z/oPopQL).
> 3.  It appears there are some efforts to make a standardized C++ library
> [1] which might be based on Vc.
>
> My initial thought is that in the short term we should focus on
> the dynamic dispatch question (continue to build our own vs adopt an
> existing library) and lean on the compiler for most vectorization. Using
> intrinsics should be limited to complex numerical functions and places
> where the compiler fails to vectorize/translate well (e.g. bit
> manipulations).
>
> If we do find the need for a dedicated library I would lean towards
> something that will converge to a standard to reduce additional
> dependencies in the long run. That being said most of these libraries seem
> to be header only so the dependency is fairly light-weight, so we can
> vendor them if need-be.
>
> [1] https://en.cppreference.com/w/cpp/experimental/simd
>
>
>
>
>
> On Tue, Jun 9, 2020 at 3:32 AM Antoine Pitrou  wrote:
>
> >
> > Thank you.  xsimd used to require C++14, but apparently they have
> > demoted it to C++11.  Good!
> >
> > Regards
> >
> > Antoine.
> >
> >
> > Le 09/06/2020 à 12:04, Maarten Breddels a écrit :
> > > Hi Antoine,
> > >
> > > Adding xsimd to the list of options:
> > >  * https://github.com/xtensor-stack/xsimd
> > > Not sure how it compares to the rest though.
> > >
> > > cheers,
> > >
> > > Maarten
> > >
> >
>


Re: [C++][Discuss] Approaches for SIMD optimizations

2020-06-10 Thread Yibo Cai

I did a quick investigation of libsimdpp, Google Highway[1], and 
Vc (std-simd)[2].

I tried rewriting the SIMD utf8 validation code by unifying the SSE and NEON 
intrinsics with libsimdpp wrappers. You can compare the simdpp/sse/neon code[3].
Utf8 validation is non-trivial, but the porting was straightforward, easier than 
I thought. The SIMD wrappers also replace cryptic names like `_mm_shuffle_epi8` 
and `vqtbl1q_u8` with friendlier ones like `simdpp::permute_bytes16` or 
`hwy::TableLookupBytes`.
But I failed in the end due to some gaps between NEON and SSE4. NEON `tbl` 
supports looking up multiple tables, and returns 0 if an index is out of 
bounds[4]. SSE4 `pshufb` looks up a single table, and out-of-bounds indices are 
handled in a much more complex way[5]. In this specific code, the NEON behaviour 
is convenient. To unify the code, I would have to abandon the NEON feature and 
sacrifice performance on Arm. Of course, it's also possible to improve 
libsimdpp/highway.
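The semantic gap described above can be sketched with scalar models of the two
single-byte lookups. These are simplified reference models for illustration
only (they cover one byte, not the full 16-lane operation, and ignore the
multi-table NEON variants); the function names are made up.

```cpp
#include <cstdint>

// Scalar model of one lane of SSE pshufb (_mm_shuffle_epi8): if the high bit
// of the index byte is set the result is zero; otherwise only the low 4 bits
// of the index are used, so "large" indices wrap around instead of yielding 0.
uint8_t PshufbByte(const uint8_t table[16], uint8_t idx) {
  if (idx & 0x80) return 0;
  return table[idx & 0x0F];
}

// Scalar model of one lane of NEON tbl (vqtbl1q_u8): any index outside the
// 16-byte table yields zero, which is the behaviour the utf8 code relies on.
uint8_t NeonTblByte(const uint8_t table[16], uint8_t idx) {
  if (idx >= 16) return 0;
  return table[idx];
}
```

For example, index 0x12 returns 0 under the NEON model (out of range) but
`table[2]` under the pshufb model (high bit clear, low 4 bits used), which is
exactly the kind of mismatch a portable wrapper has to paper over.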

I think this is common to all SIMD wrappers. They can unify most code if the 
vector length is the same. But there are always cases where arch-dependent code 
is necessary, such as the example above, or advanced features like AVX512 
masks, which are not well supported by SIMD wrappers.

About performance: as described in Google Highway's design philosophy[6], it 
achieves portability, maintainability and readability by sacrificing 10-20% 
performance. Sounds fair.

libsimdpp and highway look like mature products; they claim to support 
gcc/clang/msvc, C++11, and x86/arm/ppc. std-simd only supports gcc-9+, and its 
arm/ppc support is poor right now.

That said, I don't think leveraging a SIMD wrapper alone will fix our problem, 
especially the code size, both source and binary. They are just shallow 
wrappers over intrinsics with a friendlier API.

Personally, I prefer applying SIMD only to subroutines (e.g. the sum loop), not 
the whole kernel. It's simpler, though of course we need to prevent an 
explosion of ifdefs. A SIMD kernel shares a lot of common code with the base 
kernel, and moving that code to xxx_internal.h makes the base kernel harder to 
read.

Besides, as Wes commented[7], it's better to put the SIMD code in a standalone 
shared lib. The binary may explode quickly: we would need 
#{i8,i16,i32,i64,float,double} * #{sse,avx,avx512 | neon,sve} SIMD code 
instances for a simple sum operation, though a SIMD wrapper may help reduce the 
source code size through its carefully designed templates.
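The combinatorics above can be made concrete with a toy templated kernel; the
ISA tag types and `Sum` template below are illustrative only, not Arrow code.
Each (ISA, element type) pair the library exposes becomes a distinct
instantiation, so six element types times three ISAs is already eighteen copies
of this one kernel in the binary.

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative ISA tags; real code would gate each one behind target flags.
struct Sse {};
struct Avx2 {};
struct Avx512 {};

// One logical kernel, but every <Isa, T> combination used anywhere in the
// program is a separate function in the compiled binary. A real implementation
// would specialize the body per Isa; this sketch shares one scalar body.
template <typename Isa, typename T>
T Sum(const T* values, std::size_t n) {
  T sum = 0;
  for (std::size_t i = 0; i < n; ++i) sum += values[i];
  return sum;
}
```

The carefully designed templates of a wrapper library reduce the *source* for
these combinations to one body, but the instantiations still all land in the
binary unless they are split into per-ISA shared libs as suggested.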

[1] https://github.com/google/highway
[2] https://github.com/VcDevel/std-simd
[3] simdpp: https://github.com/cyb70289/utf8/blob/simdpp/range-simdpp.cc
sse: https://github.com/cyb70289/utf8/blob/simdpp/range-sse.c
neon: https://github.com/cyb70289/utf8/blob/simdpp/range-neon.c
[4] 
https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics?search=vqtbl2q_u8
[5] 
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_shuffle_epi8=5153
[6] https://github.com/google/highway#design-philosophy
[7] https://github.com/apache/arrow/pull/7314#issuecomment-638972317

Yibo

On 6/9/20 5:34 PM, Antoine Pitrou wrote:


Hello,

As part of https://github.com/apache/arrow/pull/7314, a discussion
started about our strategy for adding SIMD optimizations to various
routines and kernels.

Currently, we have no defined strategy and we have been adding
hand-written SIMD-optimized functions for particular primitives and
instruction sets, thanks to the submissions of contributors.  For
example, the above PR adds ~500 lines of code for the purpose of
accelerating the SUM kernel, when the input has no nulls, on the SSE
instruction set.

However, it seems that this ad hoc approach may not scale very well.
There are several widely-used SIMD instruction sets out there (the most
common being SSE[2], AVX[2], AVX512, Neon... I suppose ARM SVE will come
into play at some point), and there will be many potential functions to
optimize once we start writing a comprehensive library of computation
kernels.  Adding hand-written implementations, using intrinsic
functions, for each {routine, instruction set} pair threatens to create
a large maintenance burden.

In that PR, I suggested that we instead take a look at the SIMD wrapper
libraries available in C++.  There are several available:
* MIPP (https://github.com/aff3ct/MIPP)
* Vc (https://github.com/VcDevel/Vc)
* libsimdpp (https://github.com/p12tic/libsimdpp)
* (others yet)

In the course of the discussion, an interesting paper was mentioned:
https://dl.acm.org/doi/pdf/10.1145/3178433.3178435
together with an implementation comparison of a simple function:
https://gitlab.inria.fr/acassagn/mandelbrot

The SIMD wrappers met skepticism from Frank, the PR submitter, on the
basis that performance may not be optimal and that not all desired
features may be provided (such as runtime dispatching).

However, we also have to account that, without a wrapper library, we
will probably only integrate and maintain a small fraction of the
optimized routines that would be otherwise possible with a more
abstracted approach.  So, while the 

RE: [C++][Discuss] Approaches for SIMD optimizations

2020-06-10 Thread Du, Frank
The PR I committed provides basic support for runtime dispatching. I agree that 
the compiler should generate good vectorized code for the non-null data part, 
but in fact it didn't; jedbrown pointed out that we can force the compiler to 
use SIMD with some additional pragmas, something like "#pragma omp simd 
reduction(+:sum)". I will try this pragma later, but I need to figure out 
whether it requires linking against OpenMP. As I said in the PR, the next step 
is to provide acceleration for the nullable data path, which is more typical in 
the real world and hard for the compiler to vectorize. The nullable path with 
manual intrinsics is very easy for AVX512 thanks to its native support for 
masks[1]. I did some initial experiments on the SSE path locally and concluded 
that not much gain can be achieved, but I would expect it to be totally 
different for AVX2, given the extra computation bandwidth AVX2 provides. 
Considering that most recent x86 hardware already supports AVX2, I can remove 
the SSE intrinsic path to reduce one burden.
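For reference, the pragma-based approach mentioned above might look like the
following sketch (the function name is made up). Note that with GCC/Clang,
`-fopenmp-simd` honors just the `simd` pragmas without linking the OpenMP
runtime, and without any flag the pragma is simply ignored, so the loop
computes the same result either way.

```cpp
#include <cstddef>
#include <cstdint>

// Ask the compiler to vectorize the reduction explicitly. The pragma declares
// `sum` as a (+)-reduction variable so the lanes can be accumulated
// independently and combined at the end.
int64_t SumWithSimdPragma(const int64_t* values, std::size_t n) {
  int64_t sum = 0;
#pragma omp simd reduction(+ : sum)
  for (std::size_t i = 0; i < n; ++i) {
    sum += values[i];
  }
  return sum;
}
```

Whether this beats plain auto-vectorization is exactly the open question in
this thread; the sketch only shows that trying it does not require the full
OpenMP runtime.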

For the SIMD wrappers, it seems popular compute libraries (NumPy, OpenBLAS, 
etc.) also use intrinsics directly. I heard NumPy is trying to unify a single 
interface but is still struggling for many reasons: the hardware provides 
similar interfaces, but with too many differences in detail.

[1] https://en.wikipedia.org/wiki/AVX-512#Opmask_registers

Thanks,
Frank

-Original Message-
From: Micah Kornfield  
Sent: Wednesday, June 10, 2020 12:38 PM
To: dev 
Subject: Re: [C++][Discuss] Approaches for SIMD optimizations

A few thoughts on this as a high level:
1.  Most of the libraries don't support runtime dispatch (libsimdpp seems to be 
the exception here), so we should decide if we want to roll our own dynamic 
dispatch mechanism.
2.  It isn't clear to me from the linked PR what the performance delta is 
between the hand-written SIMD code and what the compiler would generate.  For 
simple aggregates of non-null data I would expect pretty good auto-vectorization.
Compiler auto-vectorization seems to get better over time.  For instance the 
scalar example linked in the paper seems to get vectorized somewhat under Clang 
10 (https://godbolt.org/z/oPopQL).
3.  It appears there are some efforts to make a standardized C++ library [1] 
which might be based on Vc.

My initial thought is that in the short term we should focus on the dynamic 
dispatch question (continue to build our own vs adopt an existing library) and 
lean on the compiler for most vectorization. Using intrinsics should 
be limited to complex numerical functions and places where the compiler fails 
to vectorize/translate well (e.g. bit manipulations).

If we do find the need for a dedicated library I would lean towards something 
that will converge to a standard to reduce additional dependencies in the long 
run. That being said most of these libraries seem to be header only so the 
dependency is fairly light-weight, so we can vendor them if need-be.

[1] https://en.cppreference.com/w/cpp/experimental/simd





On Tue, Jun 9, 2020 at 3:32 AM Antoine Pitrou  wrote:

>
> Thank you.  xsimd used to require C++14, but apparently they have 
> demoted it to C++11.  Good!
>
> Regards
>
> Antoine.
>
>
> Le 09/06/2020 à 12:04, Maarten Breddels a écrit :
> > Hi Antoine,
> >
> > Adding xsimd to the list of options:
> >  * https://github.com/xtensor-stack/xsimd
> > Not sure how it compares to the rest though.
> >
> > cheers,
> >
> > Maarten
> >
>