xumanbu opened a new issue, #11390: URL: https://github.com/apache/incubator-gluten/issues/11390
### Backend

VL (Velox)

### Bug description

### Environment

- **Architecture**: aarch64 (AWS Graviton2, Neoverse N1)
- **OS**: CentOS 9
- **Gluten Version**: 1.5.x
- **Build Type**: Default build (without explicit CPU_TARGET override)

### Problem Description

When building Gluten on aarch64 with default settings, the C++ compiler flags set by Gluten's build script are inconsistent with those set by Velox's build script. This inconsistency causes a runtime error during xsimd initialization because of a CPU feature detection mismatch.

```
Current thread (0x0000ffffac02d800):  JavaThread "main" [_thread_in_native, id=52, stack(0x0000ffffb0c38000,0x0000ffffb0e38000)]

Stack: [0x0000ffffb0c38000,0x0000ffffb0e38000],  sp=0x0000ffffb0e31df0,  free space=2023k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libvelox.so+0x2e86470]  facebook::velox::simd::initializeSimdUtil()+0x50
C  [ld-linux-aarch64.so.1+0x5be4]  call_init+0xd4
C  [ld-linux-aarch64.so.1+0x5cec]  _dl_init+0x7c
C  [ld-linux-aarch64.so.1+0x2110]  _dl_catch_exception+0xe0
C  [ld-linux-aarch64.so.1+0xc020]  dl_open_worker+0xe0
C  [ld-linux-aarch64.so.1+0x2094]  _dl_catch_exception+0x64
C  [ld-linux-aarch64.so.1+0xc45c]  _dl_open+0x98
C  [libc.so.6+0x7d194]  dlopen_doit+0x64
C  [ld-linux-aarch64.so.1+0x2094]  _dl_catch_exception+0x64
C  [ld-linux-aarch64.so.1+0x21dc]  _dl_catch_error+0x2c
C  [libc.so.6+0x7cc18]  _dlerror_run+0x88
C  [libc.so.6+0x7d270]  dlopen+0x90
V  [libjvm.so+0xa9c898]  os::Linux::dlopen_helper(char const*, char*, int)+0x28
V  [libjvm.so+0xa9cbc4]  os::dll_load(char const*, char*, int)+0x74
V  [libjvm.so+0x80adec]  JVM_LoadLibrary+0x9c
C  [libjava.so+0xfa60]  Java_java_lang_ClassLoader_00024NativeLibrary_load0+0x15c
j  java.lang.ClassLoader$NativeLibrary.load0(Ljava/lang/String;ZZ)Z+0 [email protected]
```

### Root Cause Analysis

This issue may occur after the Velox change "feat(build): Allow to build arm with common flags" (https://github.com/facebookincubator/velox/pull/14366) when building with default settings on an ARM CPU without setting `ARM_BUILD_TARGET`.

#### 1. Gluten's C++ Flags (`dev/build_helper_functions.sh:76-77`)

```bash
"aarch64")
  echo -n "-mcpu=neoverse-n1 -std=c++17 $ADDITIONAL_FLAGS"
  ;;
```

- **Default behavior**: Always uses `-mcpu=neoverse-n1`
- **Target**: Specific Neoverse N1 CPU
- **Optimization**: Optimized for the Neoverse N1 instruction set and pipeline

#### 2. Velox's C++ Flags (`ep/build-velox/build/velox_ep/scripts/setup-helper-functions.sh:142-180`)

```bash
"aarch64")
  # Detect ARM CPU via MIDR_EL1 register
  if [ -f "$ARM_CPU_FILE" ] && [ "$ARM_BUILD_TARGET" = "local" ]; then
    # Detects specific CPU: neoverse-n1, neoverse-n2, neoverse-v1, neoverse-v2, etc.
    # ...
  else
    echo -n "-march=armv8-a+crc+crypto " # Fallback for unknown CPUs
  fi
  ;;
```

- **Default behavior**: Uses `-march=armv8-a+crc+crypto` when the CPU detection file is not available or `ARM_BUILD_TARGET` is not set to `local`
- **Target**: Generic ARMv8-A architecture with CRC and crypto extensions
- **Compatibility**: Works on any ARMv8-A CPU that implements the CRC and crypto extensions

### Proposed Solution

**Align Gluten's `dev/build_helper_functions.sh` with Velox's approach:**

```bash
"aarch64")
  # Detect ARM CPU via MIDR_EL1 register
  ARM_CPU_FILE="/sys/devices/system/cpu/cpu0/regs/identification/midr_el1"
  if [ -f "$ARM_CPU_FILE" ]; then
    hex_ARM_CPU_DETECT=$(cat "$ARM_CPU_FILE")
    # The three hex digits before the final (revision) digit are the part number
    ARM_CPU_PRODUCT=${hex_ARM_CPU_DETECT: -4:3}
    case "$ARM_CPU_PRODUCT" in
      "d0c") echo -n "-mcpu=neoverse-n1 -std=c++17 $ADDITIONAL_FLAGS" ;;
      "d49") echo -n "-mcpu=neoverse-n2 -std=c++17 $ADDITIONAL_FLAGS" ;;
      "d40") echo -n "-mcpu=neoverse-v1 -std=c++17 $ADDITIONAL_FLAGS" ;;
      "d4f") echo -n "-mcpu=neoverse-v2 -std=c++17 $ADDITIONAL_FLAGS" ;;
      *) echo -n "-march=armv8-a+crc+crypto -std=c++17 $ADDITIONAL_FLAGS" ;;
    esac
  else
    # Fallback to generic ARMv8-A for compatibility
    echo -n "-march=armv8-a+crc+crypto -std=c++17 $ADDITIONAL_FLAGS"
  fi
  ;;
```

### Gluten version

Gluten-1.5

### Spark version

None

### Spark configurations

_No response_

### System information

_No response_

### Relevant logs

```bash
```
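For illustration, the MIDR_EL1 part-number lookup in the proposed solution can be tried in isolation without an ARM machine. The sketch below is not taken from Gluten or Velox: the function names `arm_part_from_midr` and `arm_flags_for_part` are hypothetical, and the sample register value `0x00000000413fd0c1` is an assumed example of what a Neoverse N1 might report. It requires bash, since it relies on the same substring expansion as the proposal.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration only; not part of Gluten or Velox.

# Extract the part number from a MIDR_EL1 hex string.
# The last hex digit is the revision; the three digits before it are the
# part number, which is what ${var: -4:3} selects.
arm_part_from_midr() {
  local midr="$1"
  echo -n "${midr: -4:3}"
}

# Map a part number to a CPU flag, mirroring the case table in the proposal.
# Unknown parts fall back to the generic ARMv8-A flags.
arm_flags_for_part() {
  case "$1" in
    "d0c") echo -n "-mcpu=neoverse-n1" ;;
    "d49") echo -n "-mcpu=neoverse-n2" ;;
    "d40") echo -n "-mcpu=neoverse-v1" ;;
    "d4f") echo -n "-mcpu=neoverse-v2" ;;
    *) echo -n "-march=armv8-a+crc+crypto" ;;
  esac
}

# Assumed sample value; on real hardware this would be read from
# /sys/devices/system/cpu/cpu0/regs/identification/midr_el1
sample_midr="0x00000000413fd0c1"
part="$(arm_part_from_midr "$sample_midr")"
echo "part=$part flags=$(arm_flags_for_part "$part")"
```

Separating the extraction from the flag mapping makes it possible to check the table against recorded register values from different machines before changing the build script.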
