zeroshade commented on code in PR #37785:
URL: https://github.com/apache/arrow/pull/37785#discussion_r1333184383


##########
go/parquet/internal/bmi/bitmap_neon_arm64.s:
##########
@@ -27,8 +27,8 @@ LBB0_2:
     MOVD R0, res+16(FP)
     RET
 LBB0_4:
-    WORD $0xaa1f03e0 // mov    x0, xzr
     WORD $0xa8c17bfd // ldp    x29, x30, [sp], #16
+    MOVD ZR, res+16(FP)

Review Comment:
   @cyb70289 I'd be fine with removing the arm assembly implementation of the 
pext function. I haven't benchmarked the lookup table vs the ARM assembly 
version yet. but I agree the lookup table would likely be faster.
   
   The other issue i'm having is the bit packing side of things, though I think 
i may have figured out a solution to it... and i've learned more about ARM64 
assembly in the last few days than i ever intended on learning LOL as I've 
needed to do a lot of handcoding assembly to get this to work. Looks like the 
issue boils down to needing anything with a label to be explicit in the 
assembly rather than using the `WORD`/`BYTE` syntax to pass them to the 
processor stream so that the labels get translated to the correct PC locations.
   
   > There should be a well-defined way to generate the assembly, so that other 
people can later participate in maintenance.
   
   I agree! I was a little dissappointed that @guyuqi didn't add explicit steps 
to how he generated the arm64 assembly but wanted to benefit from it, so that 
was my mistake to accept that PR so long ago. Currently I'm trying to 
explicitly define (and automate as much as possible) with the generation based 
on when i figure out what works and solves this issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to