Issue 184302
Summary [WebAssembly][Fast-ISel] generates inefficient shift sequence for extending i8/i16 to i32
Labels new issue
Assignees
Reporter ParkHanbum
### Description
When FastISel (`-fast-isel`) is used for the WebAssembly target, loading an `i8` or `i16` value and sign-extending it to `i32` (`sext`) is lowered inefficiently. The extension is currently emitted as an unsigned load followed by a pair of shifts (`shl`, then `shr_s`).

On `wasm32`, WebAssembly provides native sign-extending loads for exactly this case: `i32.load8_s` and `i32.load16_s`. Folding the `sext` into the load would let FastISel emit a single instruction instead of a load followed by two constants and two shifts.

This change reduces code size and the number of instructions executed for each sign-extending load.

### Steps to Reproduce
Create an LLVM IR file (for example, `sext.ll`) with the following contents:

```llvm
target datalayout = "e-m:e-p:32:32-i64:64-n32:64-S128"
target triple = "wasm32-unknown-unknown"

define i32 @sext_i16_i32(ptr %p) {
  %v = load atomic i16, ptr %p seq_cst, align 2
  %e = sext i16 %v to i32
  ret i32 %e
}

define i32 @sext_i8_i32(ptr %p) {
  %v = load atomic i8, ptr %p seq_cst, align 1
  %e = sext i8 %v to i32
  ret i32 %e
}
```

Compile the file for the WebAssembly target with FastISel enabled (`-fast-isel`).

### Current Output

The sign extension is currently emitted as an unsigned load followed by a shift-left and an arithmetic shift-right:

```wasm
sext_i16_i32:                           # @sext_i16_i32
        .functype       sext_i16_i32 (i32) -> (i32)
        local.get       0
        i32.load16_u    0
        # -- inefficient shift sequence --
        i32.const       16
        i32.shl
        i32.const       16
        i32.shr_s
        # --------------------------------
        end_function

sext_i8_i32:                            # @sext_i8_i32
        .functype       sext_i8_i32 (i32) -> (i32)
        local.get       0
        i32.load8_u     0
        # -- inefficient shift sequence --
        i32.const       24
        i32.shl
        i32.const       24
        i32.shr_s
        # --------------------------------
        end_function
```
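
The shift pair in this output is the usual way to sign-extend a narrow value held in a 32-bit register, and it produces the same result as a sign-extending load. The following C++ sketch models both forms and checks that they agree for every `i8` and `i16` bit pattern. The function names are illustrative only and are not taken from LLVM.

```cpp
#include <cstdint>

// Models the current FastISel output: a zero-extending load followed by
// `i32.shl` and `i32.shr_s` with a shift amount of (32 - bits).
int32_t sext_via_shifts(uint32_t zero_extended, unsigned bits) {
    const unsigned shift = 32 - bits;
    // Shift left on the unsigned value to avoid signed-overflow issues.
    const uint32_t shifted = zero_extended << shift;
    // Right shift of a negative value is arithmetic in C++20 and on all
    // mainstream compilers before that (formally implementation-defined).
    return static_cast<int32_t>(shifted) >> shift;
}

// Models the value produced by i32.load8_s / i32.load16_s.
int32_t sext_i8(uint8_t v) { return static_cast<int8_t>(v); }
int32_t sext_i16(uint16_t v) { return static_cast<int16_t>(v); }

// Exhaustively compares both forms over all 8-bit and 16-bit patterns.
bool shifts_match_signed_loads() {
    for (uint32_t v = 0; v <= 0xFFFF; ++v) {
        if (v <= 0xFF &&
            sext_via_shifts(v, 8) != sext_i8(static_cast<uint8_t>(v)))
            return false;
        if (sext_via_shifts(v, 16) != sext_i16(static_cast<uint16_t>(v)))
            return false;
    }
    return true;
}
```

Since the two forms compute the same value, replacing the unsigned load plus shifts with a signed load should not change program behavior for these cases.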

### Expected Output

FastISel should instead emit the native sign-extending loads `i32.load8_s` and `i32.load16_s`, which fold the extension into the load and remove the shift sequence entirely:
```wasm
sext_i8_i32:
    local.get 0
    i32.load8_s 0   # <--- extension folded into the load
    end_function

sext_i16_i32:
    local.get 0
    i32.load16_s 0  # <--- extension folded into the load
    end_function
```

### Additional Context
- Target: WebAssembly (wasm32)
- Affected file: `WebAssemblyFastISel.cpp`
- Existing test case: `load-ext.ll`

The current implementation in `SelectSExt` falls back to the generic shift-based expansion. It should instead recognize a sign extension of a loaded `i8` or `i16` value and emit `i32.load8_s` or `i32.load16_s` directly.
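
As a rough illustration of the mapping such a fold would need, the sketch below picks a load mnemonic from the source width and the kind of extension. `pickExtendingLoad` is a hypothetical standalone helper, not an LLVM API, and it ignores the conditions a real fold would have to check (for example, that the load has no other users).

```cpp
#include <string>

// Hypothetical helper (not part of LLVM): chooses the WebAssembly i32 load
// mnemonic for a narrow load whose result is extended to i32.
// Returns an empty string when no extending load applies, in which case a
// caller would keep the existing lowering.
std::string pickExtendingLoad(unsigned srcBits, bool isSigned) {
    switch (srcBits) {
    case 8:
        return isSigned ? "i32.load8_s" : "i32.load8_u";
    case 16:
        return isSigned ? "i32.load16_s" : "i32.load16_u";
    default:
        return "";
    }
}
```

For the two functions in the reproducer, this mapping would select `i32.load16_s` and `i32.load8_s` respectively.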

