zclllyybb opened a new pull request, #63484:
URL: https://github.com/apache/doris/pull/63484
Root cause: md5/md5sum evaluates each row separately through Md5Digest and
OpenSSL, so the vectorized string function path is dominated by per-row scalar
digest setup and hex materialization.
Fix: add an AVX2 multi-buffer MD5 helper with scalar fallback, expose a
batch hex API, and route single-argument md5/md5sum over
ColumnString/ColumnVarbinary through the batch path while keeping
multi-argument md5sum and sm3 on the existing digest implementation.
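
To make the batch shape concrete, here is a minimal sketch of two pieces the fix describes: splitting a column into SIMD groups plus a scalar tail, and writing all hex output into one buffer. Every name here (`Digest`, `RowSplit`, `split_rows`, `hex_batch`) is hypothetical and does not come from the PR. The lane count of 8 is an assumption based on AVX2 registers holding eight 32-bit words. The MD5 compression itself is omitted, and the sketch ignores that rows in one group can have different lengths.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical names for illustration; none of these come from the PR.
using Digest = std::array<std::uint8_t, 16>;  // an MD5 digest is 16 bytes

struct RowSplit {
    std::size_t simd_rows;    // rows covered by full groups of `lanes`
    std::size_t scalar_rows;  // leftover rows for the scalar fallback
};

// Assumption: a 256-bit register holds eight 32-bit MD5 state words,
// so a multi-buffer kernel would advance `lanes = 8` messages together.
inline RowSplit split_rows(std::size_t rows, std::size_t lanes = 8) {
    assert(lanes > 0);
    const std::size_t tail = rows % lanes;
    return {rows - tail, tail};
}

// Write all digests as lowercase hex into one pre-sized buffer,
// 32 characters per row, instead of building one string per row.
inline std::string hex_batch(const std::vector<Digest>& digests) {
    static constexpr char kHex[] = "0123456789abcdef";
    std::string out(digests.size() * 32, '\0');
    std::size_t pos = 0;
    for (const Digest& d : digests) {
        for (std::uint8_t b : d) {
            out[pos++] = kHex[b >> 4];    // high nibble
            out[pos++] = kHex[b & 0x0F];  // low nibble
        }
    }
    return out;
}
```

With 19 input rows and 8 lanes, `split_rows` assigns 16 rows to the SIMD groups and 3 rows to the scalar fallback. `hex_batch` performs a single allocation for the whole batch, which is the cost the per-row path pays once per row.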
Test with SQL:
```sql
SET parallel_pipeline_task_num=1;
SET enable_query_cache=false;
SELECT SUM(ASCII(SUBSTRING(MD5(CAST(number AS STRING)), 1, 1)))
FROM numbers("number" = "50000000");
```
Result:
| version | times (s) | avg (s) | median (s) |
|---|---:|---:|---:|
| upstream/master baseline | 8.59, 10.21, 9.52, 9.93, 8.85 | 9.42 | 9.52 |
| after AVX2 batch | 2.83, 2.84, 2.82, 2.79, 2.82 | 2.82 | 2.82 |
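
As a quick check of the table's arithmetic, the summary columns can be recomputed from the five listed timings. This is only a sketch: the helper names (`mean`, `median`, `speedup`) and the constants are illustrative, and the timings are copied from the table rather than measured again.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Timings in seconds, copied from the result table.
const std::vector<double> kBaseline = {8.59, 10.21, 9.52, 9.93, 8.85};
const std::vector<double> kAvx2Batch = {2.83, 2.84, 2.82, 2.79, 2.82};

inline double mean(const std::vector<double>& v) {
    assert(!v.empty());
    return std::accumulate(v.begin(), v.end(), 0.0) /
           static_cast<double>(v.size());
}

// Takes a copy so the caller's ordering is left untouched.
inline double median(std::vector<double> v) {
    assert(!v.empty());
    std::sort(v.begin(), v.end());
    const std::size_t mid = v.size() / 2;
    return v.size() % 2 ? v[mid] : (v[mid - 1] + v[mid]) / 2.0;
}

// Ratio of mean runtimes: how many times faster the batch path is.
inline double speedup() {
    return mean(kBaseline) / mean(kAvx2Batch);
}
```

The recomputed means match the table (9.42 s and 2.82 s), which puts the average speedup at roughly 3.3x for this query.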