zclllyybb opened a new pull request, #63484:
URL: https://github.com/apache/doris/pull/63484

   Root cause: md5/md5sum evaluated every row through Md5Digest and OpenSSL, 
so the vectorized string function path was dominated by per-row scalar 
digest setup and hex materialization.
   
   Fix: add an AVX2 multi-buffer MD5 helper with scalar fallback, expose a 
batch hex API, and route single-argument md5/md5sum over 
ColumnString/ColumnVarbinary through the batch path while keeping 
multi-argument md5sum and sm3 on the existing digest implementation.
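   
   To illustrate the "batch hex" part of the fix, here is a minimal sketch 
   of hex materialization over a whole batch of digests. It is not the code 
   in this PR: `batch_digest_to_hex` and `hex_row` are hypothetical names, 
   and the layout (raw 16-byte digests stored back to back, 32 hex 
   characters per row in one output buffer) is an assumption made for the 
   example.
   
   ```cpp
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <string>
   #include <vector>
   
   // Hypothetical helper (not the PR's API): hex-encode raw 16-byte MD5
   // digests stored back to back in `digests` into one contiguous buffer.
   // Row i occupies characters [32 * i, 32 * i + 32) of the result.
   inline std::string batch_digest_to_hex(const std::vector<uint8_t>& digests) {
       static constexpr char kHex[] = "0123456789abcdef";
       constexpr size_t kDigestBytes = 16;
       // Precondition assumed by this sketch: only whole digests are passed.
       assert(digests.size() % kDigestBytes == 0);
   
       // One allocation for the whole batch instead of one per row.
       std::string out(digests.size() * 2, '\0');
       char* dst = &out[0];
       for (uint8_t b : digests) {
           *dst++ = kHex[b >> 4];    // high nibble
           *dst++ = kHex[b & 0x0F];  // low nibble
       }
       return out;
   }
   
   // Hypothetical accessor: copy out the 32-character hex string of one row.
   inline std::string hex_row(const std::string& hex, size_t row) {
       return hex.substr(row * 32, 32);
   }
   ```
   
   A caller would fill `digests` from whichever digest path ran (AVX2 
   multi-buffer or scalar fallback) and then encode all rows in one pass, 
   so the per-row work is reduced to two table lookups per byte.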
   
   Test with SQL:
   ```sql
   SET parallel_pipeline_task_num=1;
   SET enable_query_cache=false;
   SELECT SUM(ASCII(SUBSTRING(MD5(CAST(number AS STRING)), 1, 1)))
   FROM numbers("number" = "50000000");
   ```
   
   Result:
   
   | version | run times (s) | avg (s) | median (s) |
   |---|---:|---:|---:|
   | upstream/master baseline | 8.59, 10.21, 9.52, 9.93, 8.85 | 9.42 | 9.52 |
   | after AVX2 batch | 2.83, 2.84, 2.82, 2.79, 2.82 | 2.82 | 2.82 |


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
