xumanbu opened a new issue, #11390:
URL: https://github.com/apache/incubator-gluten/issues/11390

   ### Backend
   
   VL (Velox)
   
   ### Bug description
   
   ### Environment
   - **Architecture**: aarch64 (AWS Graviton2, Neoverse N1)
   - **OS**: CentOS 9
   - **Gluten Version**: 1.5.x
   - **Build Type**: Default build (without explicit CPU_TARGET override)
   
   
   ### Problem Description
   When building Gluten on aarch64 with default settings, the C++ compiler 
flags set by Gluten's build script are inconsistent with those set by Velox's 
build script. This inconsistency causes a JVM crash when `libvelox.so` is 
loaded: the failure occurs in `facebook::velox::simd::initializeSimdUtil()` 
during xsimd initialization, due to a CPU feature detection mismatch.
   ```
   Current thread (0x0000ffffac02d800):  JavaThread "main" [_thread_in_native, 
id=52, stack(0x0000ffffb0c38000,0x0000ffffb0e38000)]
   
   Stack: [0x0000ffffb0c38000,0x0000ffffb0e38000],  sp=0x0000ffffb0e31df0,  
free space=2023k
   Native frames: (J=compiled Java code, A=aot compiled Java code, 
j=interpreted, Vv=VM code, C=native code)
   C  [libvelox.so+0x2e86470]  facebook::velox::simd::initializeSimdUtil()+0x50
   C  [ld-linux-aarch64.so.1+0x5be4]  call_init+0xd4
   C  [ld-linux-aarch64.so.1+0x5cec]  _dl_init+0x7c
   C  [ld-linux-aarch64.so.1+0x2110]  _dl_catch_exception+0xe0
   C  [ld-linux-aarch64.so.1+0xc020]  dl_open_worker+0xe0
   C  [ld-linux-aarch64.so.1+0x2094]  _dl_catch_exception+0x64
   C  [ld-linux-aarch64.so.1+0xc45c]  _dl_open+0x98
   C  [libc.so.6+0x7d194]  dlopen_doit+0x64
   C  [ld-linux-aarch64.so.1+0x2094]  _dl_catch_exception+0x64
   C  [ld-linux-aarch64.so.1+0x21dc]  _dl_catch_error+0x2c
   C  [libc.so.6+0x7cc18]  _dlerror_run+0x88
   C  [libc.so.6+0x7d270]  dlopen+0x90
   V  [libjvm.so+0xa9c898]  os::Linux::dlopen_helper(char const*, char*, 
int)+0x28
   V  [libjvm.so+0xa9cbc4]  os::dll_load(char const*, char*, int)+0x74
   V  [libjvm.so+0x80adec]  JVM_LoadLibrary+0x9c
   C  [libjava.so+0xfa60]  
Java_java_lang_ClassLoader_00024NativeLibrary_load0+0x15c
   j  java.lang.ClassLoader$NativeLibrary.load0(Ljava/lang/String;ZZ)Z+0 
[email protected]
   ```
   
   ### Root Cause Analysis
   Since the Velox PR "feat(build): Allow to build arm with common flags" 
(https://github.com/facebookincubator/velox/pull/14366), this issue may occur 
when building with default settings on an ARM CPU, that is, without setting 
`ARM_BUILD_TARGET`.
   
   #### 1. Gluten's C++ Flags (`dev/build_helper_functions.sh:76-77`)
   ```bash
   "aarch64")
     echo -n "-mcpu=neoverse-n1 -std=c++17 $ADDITIONAL_FLAGS"
   ;;
   ```
   - **Default behavior**: Always uses `-mcpu=neoverse-n1`
   - **Target**: Specific Neoverse N1 CPU
   - **Optimization**: Optimized for Neoverse N1 instruction set and pipeline
   
   #### 2. Velox's C++ Flags (`ep/build-velox/build/velox_ep/scripts/setup-helper-functions.sh:142-180`)
   ```bash
   "aarch64")
     # Detect ARM CPU via MIDR_EL1 register
     if [ -f "$ARM_CPU_FILE" ] && [ "$ARM_BUILD_TARGET" = "local" ]; then
       # Detects specific CPU: neoverse-n1, neoverse-n2, neoverse-v1, neoverse-v2, etc.
       # ...
     else
       echo -n "-march=armv8-a+crc+crypto "  # Fallback for unknown CPUs
     fi
   ;;
   ```
   - **Default behavior**: Uses `-march=armv8-a+crc+crypto` when the CPU 
detection file is not available or `ARM_BUILD_TARGET` is not `local`
   - **Target**: Generic ARMv8-A architecture with CRC and crypto extensions
   - **Compatibility**: Works on ARMv8-A CPUs that implement the CRC and 
crypto extensions
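
   To make the mismatch concrete, the two default code paths quoted above can 
be put side by side. This is a simplified sketch, not the actual scripts: 
`gluten_default_flags`, `velox_default_flags`, and `flags_match` are 
hypothetical names, `$ADDITIONAL_FLAGS` is omitted, and the Velox side models 
only the fallback branch (the per-CPU detection is replaced by a placeholder).

   ```bash
   #!/usr/bin/env bash
   # Hypothetical stand-ins for the two aarch64 branches quoted above.
   gluten_default_flags() {
     echo -n "-mcpu=neoverse-n1 -std=c++17"
   }

   velox_default_flags() {
     # Only the fallback path is modeled; real CPU detection is omitted.
     if [ "${ARM_BUILD_TARGET:-}" = "local" ]; then
       echo -n "(detected per-CPU flags)"
     else
       echo -n "-march=armv8-a+crc+crypto"
     fi
   }

   # Succeeds only if both sides emit the same first (CPU/arch) option.
   flags_match() {
     [ "$(gluten_default_flags | cut -d' ' -f1)" = \
       "$(velox_default_flags | cut -d' ' -f1)" ]
   }

   if flags_match; then
     echo "flags agree"
   else
     echo "mismatch: gluten='$(gluten_default_flags)' velox='$(velox_default_flags)'"
   fi
   ```

   With `ARM_BUILD_TARGET` unset, the sketch reports a mismatch, which is the 
default-build situation described in this issue.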
   
   ### Proposed Solution
   
   **Align Gluten's `dev/build_helper_functions.sh` with Velox's approach:**
   
   ```bash
   "aarch64")
     # Detect ARM CPU via MIDR_EL1 register
     ARM_CPU_FILE="/sys/devices/system/cpu/cpu0/regs/identification/midr_el1"
   
     if [ -f "$ARM_CPU_FILE" ]; then
       hex_ARM_CPU_DETECT=$(cat "$ARM_CPU_FILE")
       ARM_CPU_PRODUCT=${hex_ARM_CPU_DETECT: -4:3}
   
       case "$ARM_CPU_PRODUCT" in
         "d0c") echo -n "-mcpu=neoverse-n1 -std=c++17 $ADDITIONAL_FLAGS" ;;
         "d49") echo -n "-mcpu=neoverse-n2 -std=c++17 $ADDITIONAL_FLAGS" ;;
         "d40") echo -n "-mcpu=neoverse-v1 -std=c++17 $ADDITIONAL_FLAGS" ;;
         "d4f") echo -n "-mcpu=neoverse-v2 -std=c++17 $ADDITIONAL_FLAGS" ;;
         *)     echo -n "-march=armv8-a+crc+crypto -std=c++17 $ADDITIONAL_FLAGS" ;;
       esac
     else
       # Fallback to generic ARMv8-A for compatibility
       echo -n "-march=armv8-a+crc+crypto -std=c++17 $ADDITIONAL_FLAGS"
     fi
   ;;
   ```
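
   The part-number extraction above relies on bash substring expansion. Below 
is a minimal sketch of how it behaves, assuming a sample MIDR_EL1 reading of 
`0x00000000413fd0c1` (an illustrative Neoverse N1 value). `midr_part` is a 
hypothetical helper name, not part of Gluten or Velox.

   ```bash
   #!/usr/bin/env bash
   # Hypothetical helper: extract the 3-hex-digit part number from a MIDR_EL1
   # string. Bits [15:4] of MIDR_EL1 hold the part number, i.e. the three hex
   # digits just before the final (revision) digit.
   midr_part() {
     local midr="$1"
     # ": -4:3" takes 3 characters starting 4 from the end. The space before
     # the minus sign is required so bash does not parse ":-" as the
     # "use default value" operator.
     echo -n "${midr: -4:3}"
   }

   midr_part "0x00000000413fd0c1"   # sample value, assumed; prints: d0c
   ```

   The result `d0c` matches the `neoverse-n1` branch of the `case` statement 
in the proposal.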
   
   
   ### Gluten version
   
   Gluten-1.5
   
   ### Spark version
   
   None
   
   ### Spark configurations
   
   _No response_
   
   ### System information
   
   _No response_
   
   ### Relevant logs
   
   ```bash
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

