[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-10-07 Thread Kouhei Sutou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17613939#comment-17613939
 ] 

Kouhei Sutou commented on ARROW-15678:
--

We don't have this problem with {{-DCMAKE_BUILD_TYPE=Release}}, so it may work 
in most cases.

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Assignee: Kouhei Sutou
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-10-07 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17613930#comment-17613930
 ] 

Antoine Pitrou commented on ARROW-15678:


Well, I don't think we can force the compiler to inline _everything_.



[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-10-07 Thread Kouhei Sutou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17613912#comment-17613912
 ] 

Kouhei Sutou commented on ARROW-15678:
--

I propose one more approach for proposed solution 1:

How about always enabling inline optimization for SIMD-optimized compile units 
({{level_conversion_bmi2.cc}}) even when a user specifies 
{{-DCMAKE_BUILD_TYPE=MinSizeRel}}?

It may increase binary size, but it may be better for SIMD-related code to 
prioritize performance over binary size.

With this approach, we don't need to write manual 
{{template}}/{{__attribute__((always_inline))}} annotations.



[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-10-07 Thread Kouhei Sutou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17613909#comment-17613909
 ] 

Kouhei Sutou commented on ARROW-15678:
--

Summary of this problem:

Problem:

* The Parquet module crashes with {{-DCMAKE_BUILD_TYPE=MinSizeRel}}

Why the problem happens:

* We compile the same code ({{level_conversion_inc.h}}) multiple times with 
different instruction-set flags such as {{-msse4.2}} and {{-mavx2}}
* The code calls the same function 
({{arrow::internal::FirstTimeBitmapWriter::AppendWord()}}), which is defined in 
a header file
* The called function isn't inlined with {{-DCMAKE_BUILD_TYPE=MinSizeRel}}
* This generates multiple definitions of the called (non-inlined) function 
({{arrow::internal::FirstTimeBitmapWriter::AppendWord()}}), one per set of 
flags; the linker keeps only one of them, which may contain instructions that 
the running CPU does not support

Proposed solutions so far:

# Force inlining of functions that are called from code compiled with 
SIMD-related optimization flags
# Restrict the SIMD-related optimization to only the target function
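
As a rough illustration of solution 1, a header-defined helper could be marked for forced inlining so that no out-of-line copy is emitted for the linker to merge. This is only a sketch under assumptions: {{SKETCH_FORCE_INLINE}} and {{SketchBitmapWriter}} are hypothetical names, not Arrow's actual macro or class, and the attribute covers only the functions it is attached to, not everything they call.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical macro name, used only for this sketch.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_FORCE_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define SKETCH_FORCE_INLINE __forceinline
#else
#define SKETCH_FORCE_INLINE inline
#endif

// Simplified stand-in for a header-defined helper such as
// FirstTimeBitmapWriter::AppendWord(); this is not Arrow's implementation.
class SketchBitmapWriter {
 public:
  explicit SketchBitmapWriter(uint8_t* bitmap) : bitmap_(bitmap) {
    assert(bitmap != nullptr);
  }

  // With forced inlining, each caller gets its own copy of the body, so no
  // shared out-of-line symbol exists that could be picked from a translation
  // unit built with different -m flags.
  SKETCH_FORCE_INLINE void AppendByte(uint8_t byte) {
    bitmap_[position_++] = byte;
  }

  int64_t position() const { return position_; }

 private:
  uint8_t* bitmap_;
  int64_t position_ = 0;
};
```

A caller would construct the writer over its own buffer and call {{AppendByte()}} in its hot loop; whether this is enough in practice depends on every helper reachable from SIMD code being annotated.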

For 1., we have two approaches for it:
* Use template:
** 

[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-10-06 Thread Kouhei Sutou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17613789#comment-17613789
 ] 

Kouhei Sutou commented on ARROW-15678:
--

Yes. I will fix this before the 10.0.0 release. Sorry for not working on this 
yet.




[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-10-06 Thread Jonathan Keane (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17613507#comment-17613507
 ] 

Jonathan Keane commented on ARROW-15678:


I thought that [~kou] was going to take a look at this (or at least the 
underlying multiple SIMD instruction ordering issue that causes the failures...)

The only update I have is that I continue to run into the segfault in CI for 
downstream projects I'm working on, so it continues to be an issue for 
pre-built libarrow on machines like GitHub's macOS runners. 

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-10-06 Thread Jira


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17613392#comment-17613392
 ] 

Raúl Cumplido commented on ARROW-15678:
---

This was a blocker for the last release and is still a blocker for the 10.0.0 
release. [~jonkeane] do you know if there has been any progress?

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-07-25 Thread Jonathan Keane (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571125#comment-17571125
 ] 

Jonathan Keane commented on ARROW-15678:


I have no updates beyond what's discussed above: there are a few approaches, 
none of them ideal. We need someone to champion this (or risk the Homebrew 
maintainers turning off optimizations on us).

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 10.0.0
>
>  Time Spent: 13.5h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-07-25 Thread Krisztian Szucs (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571097#comment-17571097
 ] 

Krisztian Szucs commented on ARROW-15678:
-

Postponing to 10.0 for now.

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-07-25 Thread Krisztian Szucs (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17570777#comment-17570777
 ] 

Krisztian Szucs commented on ARROW-15678:
-

[~jonkeane] can you give an update on this issue?

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-07-22 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17570164#comment-17570164
 ] 

Antoine Pitrou commented on ARROW-15678:


Note that I suggested a perhaps more acceptable workaround above.

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-07-22 Thread Jonathan Keane (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17570163#comment-17570163
 ] 

Jonathan Keane commented on ARROW-15678:


Homebrew only accepted that as a temporary workaround and has threatened to 
turn off optimizations if we don't resolve this. They haven't followed 
through yet, though.

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-07-22 Thread Jacob Wujciak-Jens (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17570126#comment-17570126
 ] 

Jacob Wujciak-Jens commented on ARROW-15678:


That was my impression: 
[issue|https://github.com/Homebrew/homebrew-core/issues/94724] and 
[PR|https://github.com/Homebrew/homebrew-core/pull/94958] in homebrew-core. 
Maybe [~jonkeane] can confirm?


> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-07-22 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17569996#comment-17569996
 ] 

Antoine Pitrou commented on ARROW-15678:


If that was actually accepted by Homebrew then fine.

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-07-22 Thread Jacob Wujciak-Jens (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17569991#comment-17569991
 ] 

Jacob Wujciak-Jens commented on ARROW-15678:


Looking at [ARROW-15664] and this 
[PR|https://github.com/apache/arrow/pull/12364/files#diff-ca50d864d033146f9135f2fc25ae337322982dd340c6fa25b1efe9f0c02db870],
 it seems like a workaround has been implemented for Homebrew, IIUC. So this is 
still an issue, but as the real fix won't happen for 9.0.0, it shouldn't be a 
blocker anymore?

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-07-22 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17569968#comment-17569968
 ] 

Antoine Pitrou commented on ARROW-15678:


Ideally it would... But there's little chance for it to be fixed in time for 
9.0.0.

As I said above, the workaround should be to disable runtime SIMD optimizations 
on the affected builds. Someone has to validate that suggestion, though (i.e. 
someone who's able to reproduce this issue).

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-07-22 Thread Jira


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17569933#comment-17569933
 ] 

Raúl Cumplido commented on ARROW-15678:
---

Is this still a blocker?

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-07-08 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17564152#comment-17564152
 ] 

Antoine Pitrou commented on ARROW-15678:


Disabling all optimizations for Arrow is brutal. It would be *much* better to 
simply disable runtime SIMD optimizations (by passing 
{{-DARROW_RUNTIME_SIMD_LEVEL=NONE}} to CMake, AFAIR).

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-07-07 Thread Jonathan Keane (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17564018#comment-17564018
 ] 

Jonathan Keane commented on ARROW-15678:


Last I checked, the Homebrew maintainers have said that they will disable all 
optimization for Arrow if we don't get this sorted on our own. So this is not 
required if we're OK with that (though we should engage with them on this).

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-07-07 Thread Ian Cook (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17563824#comment-17563824
 ] 

Ian Cook commented on ARROW-15678:
--

[~jonkeane] this issue is marked as a blocker for 9.0.0. Should this block the 
release?

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-05-25 Thread Ben Kietzman (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542244#comment-17542244
 ] 

Ben Kietzman commented on ARROW-15678:
--

On further investigation, we can include immintrin.h with or without -mavx2 and 
clang at least will not complain unless the intrinsics are referenced, so

{code}
#include <immintrin.h>

// Only this function is compiled with AVX2 enabled.
[[gnu::target("avx2")]]
void use_simd() {
  __m256i arg = _mm256_setzero_si256();
  _mm256_abs_epi16(arg);
}

int main() { use_simd(); }
{code}

compiles and runs happily without any special compilation flags. Using an 
attribute like this seems viable, provided we can be certain that the modified 
target isn't transitively applied to functions which might be invoked for the 
first time inside a SIMD-enabled function.
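
To make the per-function attribute idea concrete, here is a minimal sketch that pairs it with a runtime CPU check, so the AVX2 body is only entered on machines that support it. It assumes GCC or Clang on x86-64 (it falls back to scalar code elsewhere); {{AbsInt16}}, {{AbsScalar}} and {{AbsAvx2}} are hypothetical names, not Arrow functions. It does not address the caveat above about helpers called from inside the SIMD function.

```cpp
#include <cassert>
#include <cstdint>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define SKETCH_HAVE_AVX2_TARGET 1
#endif

// Portable fallback, compiled with the translation unit's default flags.
static void AbsScalar(const int16_t* in, int16_t* out, int n) {
  for (int i = 0; i < n; ++i) {
    out[i] = static_cast<int16_t>(in[i] < 0 ? -in[i] : in[i]);
  }
}

#ifdef SKETCH_HAVE_AVX2_TARGET
// Only this function is compiled for AVX2; no -mavx2 flag is needed on the
// command line, so the rest of the file keeps the baseline instruction set.
__attribute__((target("avx2")))
static void AbsAvx2(const int16_t* in, int16_t* out, int n) {
  int i = 0;
  for (; i + 16 <= n; i += 16) {
    __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(in + i));
    _mm256_storeu_si256(reinterpret_cast<__m256i*>(out + i),
                        _mm256_abs_epi16(v));
  }
  // Handle the remaining elements that do not fill a 256-bit vector.
  for (; i < n; ++i) {
    out[i] = static_cast<int16_t>(in[i] < 0 ? -in[i] : in[i]);
  }
}
#endif

// Dispatches at runtime: AVX2 when the CPU reports it, scalar otherwise.
void AbsInt16(const int16_t* in, int16_t* out, int n) {
  assert(n >= 0);
#ifdef SKETCH_HAVE_AVX2_TARGET
  if (__builtin_cpu_supports("avx2")) {
    AbsAvx2(in, out, n);
    return;
  }
#endif
  AbsScalar(in, out, n);
}
```

Callers only ever see {{AbsInt16()}}; the choice of implementation stays inside one translation unit built with ordinary flags.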

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-05-25 Thread Ben Kietzman (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542235#comment-17542235
 ] 

Ben Kietzman commented on ARROW-15678:
--

IIUC, we'll still need to pass {{-mavx2}} so that we can include immintrin.h, so 
the attribute described in the {{ARROW_SPECIALIZED_SIMD_TARGET}} approach would 
need to be attached to the *non*-SIMD functions to ensure that they're compiled 
with no special instructions.

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-05-25 Thread Ben Kietzman (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542232#comment-17542232
 ] 

Ben Kietzman commented on ARROW-15678:
--

[~apitrou] The most robust solution I can think of is to avoid linking between 
objects with differing instruction sets altogether. We'd have something like
{code}
$ nm libarrow_compute_avx2.so | grep DefLevelsBitmapSimd
00404ff0 t DefLevelsBitmapSimd
{code}

That library would be acquired with {{dlopen(path, 
RTLD_LOCAL)/LoadLibrary(path)}} which would guarantee that any functions like 
{{FirstTimeBitmapWriter::*}} which might have been recompiled with illegal 
instructions are not available outside {{libarrow_compute_avx2.so}}.

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-05-19 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539368#comment-17539368
 ] 

Antoine Pitrou commented on ARROW-15678:


[~kou] We can do that for the specific symptoms here. However, a more general 
solution will have to be found since other files have the same problem: 
compiling SIMD-specific code which calls into other routines.

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-05-18 Thread Kouhei Sutou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539142#comment-17539142
 ] 

Kouhei Sutou commented on ARROW-15678:
--

How about using a template to distinguish the implementation for each architecture?

{noformat}
diff --git a/cpp/src/arrow/compute/kernels/codegen_internal.h 
b/cpp/src/arrow/compute/kernels/codegen_internal.h
index fa50427bc3..a4bd0eb586 100644
--- a/cpp/src/arrow/compute/kernels/codegen_internal.h
+++ b/cpp/src/arrow/compute/kernels/codegen_internal.h
@@ -710,8 +710,8 @@ struct ScalarUnaryNotNullStateful {
Datum* out) {
   Status st = Status::OK();
   ArrayData* out_arr = out->mutable_array();
-  FirstTimeBitmapWriter out_writer(out_arr->buffers[1]->mutable_data(),
-   out_arr->offset, out_arr->length);
+  FirstTimeBitmapWriter<> out_writer(out_arr->buffers[1]->mutable_data(),
+ out_arr->offset, out_arr->length);
   VisitArrayValuesInline(
   arg0,
   [&](Arg0Value v) {
diff --git a/cpp/src/arrow/compute/kernels/row_encoder.cc 
b/cpp/src/arrow/compute/kernels/row_encoder.cc
index 10a1f4cda5..26316ec315 100644
--- a/cpp/src/arrow/compute/kernels/row_encoder.cc
+++ b/cpp/src/arrow/compute/kernels/row_encoder.cc
@@ -42,7 +42,7 @@ Status KeyEncoder::DecodeNulls(MemoryPool* pool, int32_t 
length, uint8_t** encod
 ARROW_ASSIGN_OR_RAISE(*null_bitmap, AllocateBitmap(length, pool));
 uint8_t* validity = (*null_bitmap)->mutable_data();
 
-FirstTimeBitmapWriter writer(validity, 0, length);
+FirstTimeBitmapWriter<> writer(validity, 0, length);
 for (int32_t i = 0; i < length; ++i) {
   if (encoded_bytes[i][0] == kValidByte) {
 writer.Set();
diff --git a/cpp/src/arrow/compute/kernels/scalar_set_lookup.cc 
b/cpp/src/arrow/compute/kernels/scalar_set_lookup.cc
index 7d8d2edc4b..433df0f1b7 100644
--- a/cpp/src/arrow/compute/kernels/scalar_set_lookup.cc
+++ b/cpp/src/arrow/compute/kernels/scalar_set_lookup.cc
@@ -353,8 +353,8 @@ struct IsInVisitor {
 const auto& state = checked_cast&>(*ctx->state());
 ArrayData* output = out->mutable_array();
 
-FirstTimeBitmapWriter writer(output->buffers[1]->mutable_data(), 
output->offset,
- output->length);
+FirstTimeBitmapWriter<> writer(output->buffers[1]->mutable_data(), 
output->offset,
+   output->length);
 
 VisitArrayDataInline(
 this->data,
diff --git a/cpp/src/arrow/compute/kernels/scalar_string_ascii.cc 
b/cpp/src/arrow/compute/kernels/scalar_string_ascii.cc
index 611601cab8..da7de1c277 100644
--- a/cpp/src/arrow/compute/kernels/scalar_string_ascii.cc
+++ b/cpp/src/arrow/compute/kernels/scalar_string_ascii.cc
@@ -1456,7 +1456,7 @@ struct MatchSubstringImpl {
 [](const void* raw_offsets, const uint8_t* data, int64_t length,
    int64_t output_offset, uint8_t* output) {
   const offset_type* offsets = reinterpret_cast<const offset_type*>(raw_offsets);
-  FirstTimeBitmapWriter bitmap_writer(output, output_offset, length);
+  FirstTimeBitmapWriter<> bitmap_writer(output, output_offset, length);
   for (int64_t i = 0; i < length; ++i) {
     const char* current_data = reinterpret_cast<const char*>(data + offsets[i]);
     int64_t current_length = offsets[i + 1] - offsets[i];
diff --git a/cpp/src/arrow/util/bit_util_benchmark.cc 
b/cpp/src/arrow/util/bit_util_benchmark.cc
index 258fd27785..66a81b4e04 100644
--- a/cpp/src/arrow/util/bit_util_benchmark.cc
+++ b/cpp/src/arrow/util/bit_util_benchmark.cc
@@ -386,7 +386,7 @@ static void BitmapWriter(benchmark::State& state) {
 }
 
 static void FirstTimeBitmapWriter(benchmark::State& state) {
-  BenchmarkBitmapWriter<internal::FirstTimeBitmapWriter>(state, state.range(0));
+  BenchmarkBitmapWriter<internal::FirstTimeBitmapWriter<>>(state, state.range(0));
 }
 
 struct GenerateBitsFunctor {
diff --git a/cpp/src/arrow/util/bit_util_test.cc 
b/cpp/src/arrow/util/bit_util_test.cc
index 6c2aff4fbe..9b9f19feb1 100644
--- a/cpp/src/arrow/util/bit_util_test.cc
+++ b/cpp/src/arrow/util/bit_util_test.cc
@@ -832,14 +832,14 @@ TEST(FirstTimeBitmapWriter, NormalOperation) {
 const uint8_t fill_byte = static_cast<uint8_t>(fill_byte_int);
 {
   uint8_t bitmap[] = {fill_byte, fill_byte, fill_byte, fill_byte};
-  auto writer = internal::FirstTimeBitmapWriter(bitmap, 0, 12);
+  auto writer = internal::FirstTimeBitmapWriter<>(bitmap, 0, 12);
   WriteVectorToWriter(writer, {0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1});
   //  {0b00110110, 0b1010, 0, 0}
   ASSERT_BYTES_EQ(bitmap, {0x36, 0x0a});
 }
 {
   uint8_t bitmap[] = {fill_byte, fill_byte, fill_byte, fill_byte};
-  auto writer = internal::FirstTimeBitmapWriter(bitmap, 4, 12);
+  auto writer = internal::FirstTimeBitmapWriter<>(bitmap, 4, 12);
   WriteVectorToWriter(writer, {0, 1, 1, 0, 1, 

[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-05-18 Thread Jonathan Keane (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539107#comment-17539107
 ] 

Jonathan Keane commented on ARROW-15678:


@kou Do you think you might be able to take a look at this?

The comment at 
https://github.com/apache/arrow/pull/12928#issuecomment-1105955726 has a good 
explanation of what's going on, and following that there are a few possible 
fixes (though none of them were fully implemented or decided on).

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 9.0.0
>
>  Time Spent: 13h
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-04-21 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17526058#comment-17526058
 ] 

Weston Pace commented on ARROW-15678:
-

That seems like a good solution to me.  I had no idea it was possible.

If we want the other headers to be included then we already have a bit of a 
solution demonstrated in {{level_conversion_inc.h}}.  In the common header file 
you require some kind of `target` namespace to be defined.

{noformat}
namespace parquet {
namespace internal {
#ifndef PARQUET_IMPL_NAMESPACE
#error "PARQUET_IMPL_NAMESPACE must be defined"
#endif
namespace PARQUET_IMPL_NAMESPACE {
...
}  // namespace PARQUET_IMPL_NAMESPACE
}  // namespace internal
}  // namespace parquet
{noformat}

However, anything that includes one of these "common headers" must define that 
namespace...

{noformat}
#define PARQUET_IMPL_NAMESPACE standard
#include "parquet/level_conversion_inc.h"
#undef PARQUET_IMPL_NAMESPACE
{noformat}
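
The effect of the namespace trick can be sketched in a single file. Arrow re-includes a real header with {{PARQUET_IMPL_NAMESPACE}} defined; since one file cannot do that, this sketch uses a hypothetical macro, {{SKETCH_DEFINE_IMPL}}, to stamp the same body into whichever namespace it is given. The {{sketch}} namespace and {{LowBitsMask}} are likewise made up for illustration.

```cpp
#include <cstdint>

// Stand-in for a shared "*_inc.h" header: the same function body is emitted
// into the namespace named by IMPL_NAMESPACE, so each expansion produces a
// distinct symbol that the linker will not merge with the others.
#define SKETCH_DEFINE_IMPL(IMPL_NAMESPACE)                      \
  namespace sketch {                                            \
  namespace IMPL_NAMESPACE {                                    \
  inline uint64_t LowBitsMask(int num_bits) {                   \
    return num_bits >= 64 ? ~uint64_t{0}                        \
                          : (uint64_t{1} << num_bits) - 1;      \
  }                                                             \
  } /* namespace IMPL_NAMESPACE */                              \
  } /* namespace sketch */

// One expansion per target, mirroring one #include per compile unit.
SKETCH_DEFINE_IMPL(standard)
SKETCH_DEFINE_IMPL(bmi2)
```

Here {{sketch::standard::LowBitsMask}} and {{sketch::bmi2::LowBitsMask}} are separate functions. The separation only reaches code defined inside the stamped region, though: a helper pulled in from an ordinary header keeps a single shared name, which is the situation this issue describes.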

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Assignee: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-04-21 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17526055#comment-17526055
 ] 

Antoine Pitrou commented on ARROW-15678:


In any case, this is probably too involved a change for 8.0.0, so the 8.0.0 fix 
would simply be to disable SIMD optimizations for Homebrew?

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Assignee: Jonathan Keane
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 8.0.0
>
>  Time Spent: 11h 50m
>  Remaining Estimate: 0h
>






[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-04-21 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17526053#comment-17526053
 ] 

Antoine Pitrou commented on ARROW-15678:


So, currently we are doing something like:
{code}
clang -c something_avx2.cc -mavx2 
{code}

An alternative would be not to pass the optimization flag on the command line 
but enable it selectively inside the source code, e.g.:
{code}
clang -c something_avx2.cc -DARROW_SPECIALIZED_SIMD_TARGET=avx2
{code}

{code:c++}
namespace parquet {
namespace internal {
namespace PARQUET_IMPL_NAMESPACE {

#ifdef ARROW_SPECIALIZED_SIMD_TARGET

#define STRINGIFY_EXPANDED(a) ARROW_STRINGIFY(a)
#pragma clang attribute push (__attribute__((target(STRINGIFY_EXPANDED(ARROW_SPECIALIZED_SIMD_TARGET)))), apply_to=function)

#endif

...

#ifdef ARROW_SPECIALIZED_SIMD_TARGET
#pragma clang attribute pop
#endif

}  // namespace PARQUET_IMPL_NAMESPACE
}  // namespace internal
}  // namespace parquet
{code}
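
A minimal sketch of the same idea at the level of a single function, assuming GCC or Clang on x86-64. {{Sum}}, {{SumBaseline}}, {{SumAvx2}} and {{SKETCH_HAVE_TARGET_ATTR}} are hypothetical names for illustration; {{__attribute__((target("avx2")))}} and {{__builtin_cpu_supports}} are the compiler features being relied on. Only the annotated function may use AVX2 instructions, while everything else in the file keeps the baseline flags.

```cpp
#include <cassert>
#include <cstdint>

// The target attribute and CPU detection builtin are GCC/Clang x86 features;
// on other platforms the sketch falls back to the baseline version only.
#if defined(__x86_64__) && (defined(__clang__) || defined(__GNUC__))
#define SKETCH_HAVE_TARGET_ATTR 1
#else
#define SKETCH_HAVE_TARGET_ATTR 0
#endif

// Baseline version, compiled with whatever flags the translation unit uses.
std::int64_t SumBaseline(const std::int32_t* values, int n) {
  assert(n >= 0);
  std::int64_t sum = 0;
  for (int i = 0; i < n; ++i) sum += values[i];
  return sum;
}

#if SKETCH_HAVE_TARGET_ATTR
// Same loop, but the compiler may use AVX2 instructions for this function only.
__attribute__((target("avx2"))) std::int64_t SumAvx2(const std::int32_t* values,
                                                     int n) {
  std::int64_t sum = 0;
  for (int i = 0; i < n; ++i) sum += values[i];
  return sum;
}
#endif

// Picks the AVX2 version only when the running CPU reports AVX2 support.
std::int64_t Sum(const std::int32_t* values, int n) {
#if SKETCH_HAVE_TARGET_ATTR
  if (__builtin_cpu_supports("avx2")) return SumAvx2(values, n);
#endif
  return SumBaseline(values, n);
}
```

With this layout, inline helpers pulled in from headers outside the annotated region would be compiled with the baseline flags in every translation unit, which is the property the proposal above is after.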




[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-04-21 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17525996#comment-17525996
 ] 

Antoine Pitrou commented on ARROW-15678:


[~bkietz] You may have some idea about how to fix this cleanly and reliably.



[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-04-21 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17525995#comment-17525995
 ] 

Antoine Pitrou commented on ARROW-15678:


Wow, thanks for the diagnosis [~westonpace].
So, it turns out that our method for compiling multiple versions of code 
violates the One Definition Rule for any inline function or method.



[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-04-19 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17524443#comment-17524443
 ] 

Weston Pace commented on ARROW-15678:
-

We can perhaps add a static check that reports symbols outside the appropriate 
namespace.  We might need some configurable suppression.  For example, 
{{level_conversion_bmi2.cc.o}} would report:

{noformat}
 W arrow::util::ArrowLogBase& arrow::util::ArrowLogBase::operator<< (char const (&) [51])
 W std::__cxx11::basic_string, std::allocator > arrow::util::StringBuilder(char const (&) [33])
 W void arrow::util::StringBuilderRecursive(std::ostream&, char const (&) [33])
 W arrow::util::detail::StringStreamWrapper::stream()
 W arrow::util::Voidify::operator&(arrow::util::ArrowLogBase&)
 W arrow::util::Voidify::Voidify()
 W arrow::bit_util::BytesForBits(long)
 W arrow::internal::FirstTimeBitmapWriter::AppendWord(unsigned long, long)
 W arrow::internal::FirstTimeBitmapWriter::Finish()
 W arrow::internal::FirstTimeBitmapWriter::FirstTimeBitmapWriter(unsigned char*, long, long)
 W parquet::ParquetException::ParquetException(char const (&) [33])
 W parquet::ParquetException::~ParquetException()
 W parquet::ParquetException::~ParquetException()
 W arrow::internal::FirstTimeBitmapWriter::position() const
 W parquet::ParquetException::what() const
 W std::exception::exception()
 W char const (::forward(std::remove_reference::type&)) [33]
{noformat}

{{level_comparison_avx.cc.o}} looks to be in better shape:

{noformat}
 W short const& std::max(short const&, short const&)
 W short const& std::min(short const&, short const&)
{noformat}

But yes, if we have a better solution for this problem it might be safer.
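
A minimal sketch of what such a check could look like, assuming its input is the text output of {{nm -C}} on one object file, with lines of the form "<optional address> W <demangled name>". {{FindForeignWeakSymbols}} and its parameters are hypothetical names, and plain substring matching is a simplification.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the weak ("W") symbols whose demangled name does not mention
// `allowed_namespace` and does not match any entry in `suppressions`.
// Each input line is assumed to look like "<optional address> W <name>".
std::vector<std::string> FindForeignWeakSymbols(
    const std::vector<std::string>& nm_lines,
    const std::string& allowed_namespace,
    const std::vector<std::string>& suppressions) {
  // An empty namespace would match every symbol and hide all problems.
  assert(!allowed_namespace.empty());
  const std::string marker = " W ";
  std::vector<std::string> foreign;
  for (const std::string& line : nm_lines) {
    const std::string::size_type pos = line.find(marker);
    if (pos == std::string::npos) continue;  // not a weak symbol
    const std::string name = line.substr(pos + marker.size());
    if (name.find(allowed_namespace) != std::string::npos) continue;
    bool suppressed = false;
    for (const std::string& pattern : suppressions) {
      if (name.find(pattern) != std::string::npos) {
        suppressed = true;
        break;
      }
    }
    if (!suppressed) foreign.push_back(name);
  }
  return foreign;
}
```

Called with the allowed namespace {{parquet::internal::bmi2::}} and a suppression for {{std::}}, a line such as " W arrow::bit_util::BytesForBits(long)" would be reported, while weak symbols inside the per-target namespace would not.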

> [C++][CI] a crossbow job with MinRelSize enabled
> 
>
> Key: ARROW-15678
> URL: https://issues.apache.org/jira/browse/ARROW-15678
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Jonathan Keane
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 8h 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-04-19 Thread David Li (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17524280#comment-17524280
 ] 

David Li commented on ARROW-15678:
--

Good catch. This is exactly the same problem we ran into before with kernels: 
[https://github.com/apache/arrow/blob/6c10a389bbc35b67187930dc0db2a88671e76c2d/cpp/src/arrow/compute/kernels/aggregate_internal.h#L135-L138]
 (ARROW-13382). I wonder if we should reconsider the plan of vectorizing 
kernels by rebuilding the same source multiple times given this potential 
pitfall.



[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-04-17 Thread Weston Pace (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17523302#comment-17523302
 ] 

Weston Pace commented on ARROW-15678:
-

Ok, so I was finally able to track this down.  Fortunately (unfortunately?) it 
is not really a compiler bug (or maybe it is, I'm not sure).  At the very 
least, I think we can avoid it.

{{level_comparison.cc}} is compiled with {{-msse4.2}}.

{{level_comparison_avx2.cc}} is compiled with {{-mavx2}}.

This is expected and the functions they generate are housed in separate 
namespaces so they don't get confused.  However, both functions rely on the 
function {{arrow::internal::FirstTimeBitmapWriter::AppendWord}}.  The function 
is not templated but it is defined in the header file (and is not marked 
inline).  I'm not really sure how we aren't getting a duplicate symbol error 
but some reading suggests that a member function defined inside the class body 
is implicitly inline, so duplicate definitions are merged at link time.

In the static library ({{libparquet.a}}), there are two identical symbols named 
{{__ZN5arrow8internal21FirstTimeBitmapWriter10AppendWordEyx}}.  One of them uses 
{{SHLX}} and the other uses {{SHL}}.  The disassembly of the {{SHLX}} version 
matches exactly the disassembly in the stack trace that [~jonkeane] posted in 
the PR.  The two calling functions are 
{{parquet::internal::standard::DefLevelsBatchToBitmap}} and 
{{parquet::internal::bmi2::DefLevelsBatchToBitmap}}.

So I think the -O3 version is inlining the functions.  The -Os version is not 
(-Os seems to discourage inlining in general).  The linker is then faced with 
two identical symbols and just picks one (again, trying to optimize for size).  
It just so happens the version it picked was the one with {{SHLX}}, which is a 
BMI2 instruction, so the baseline code path ends up executing it even on CPUs 
without BMI2.

So, as a test, we can try splitting the implementation part of 
{{bitmap_writer.h}} into {{bitmap_writer.cc}} (at least for 
{{FirstTimeBitmapWriter}}).  The .cc file should then only be compiled once 
(with sse4.2).  However, it's very possible we are just hitting the tip of the 
iceberg here, as any header file included by these avx2-compiled versions 
could be a ticking time bomb.
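
A minimal sketch of the proposed split, shown in one file for brevity. {{BitmapWriterSketch}} is a hypothetical, heavily simplified class and not the real {{FirstTimeBitmapWriter}}; only the shape (declarations in the header, definitions in a single .cc file) is the point.

```cpp
#include <cassert>
#include <cstdint>

// --- bitmap_writer_sketch.h (hypothetical, simplified) ---
// Member functions are only declared here. Translation units built with
// -mavx2 or -mbmi2 never see the bodies, so they cannot emit their own copies.
class BitmapWriterSketch {
 public:
  explicit BitmapWriterSketch(std::uint64_t* out);
  void AppendWord(std::uint64_t word, std::int64_t number_of_bits);
  std::int64_t position() const;

 private:
  std::uint64_t* out_;
  std::int64_t position_;
};

// --- bitmap_writer_sketch.cc (compiled once, with baseline flags) ---
BitmapWriterSketch::BitmapWriterSketch(std::uint64_t* out)
    : out_(out), position_(0) {}

void BitmapWriterSketch::AppendWord(std::uint64_t word,
                                    std::int64_t number_of_bits) {
  // This simplified writer only fills a single 64-bit word.
  assert(number_of_bits >= 0 && position_ + number_of_bits <= 64);
  if (number_of_bits == 0) return;
  // Keep only the low `number_of_bits` bits, then place them at the cursor.
  const std::uint64_t mask = number_of_bits == 64
                                 ? ~std::uint64_t{0}
                                 : (std::uint64_t{1} << number_of_bits) - 1;
  *out_ |= (word & mask) << position_;
  position_ += number_of_bits;
}

std::int64_t BitmapWriterSketch::position() const { return position_; }
```

The trade-off is that callers in other translation units can no longer inline these functions without link-time optimization, which may matter for hot loops.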





--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (ARROW-15678) [C++][CI] a crossbow job with MinRelSize enabled

2022-03-02 Thread Jonathan Keane (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-15678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17500305#comment-17500305
 ] 

Jonathan Keane commented on ARROW-15678:


The linked pull request has the start of this, but there's still an 
unidentified segfault in one of the tests.
