[ https://issues.apache.org/jira/browse/ORC-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Taiyang Li updated ORC-1950:
----------------------------
Summary: [C++] Replace std::unordered_map with google dense_hash_map in
SortedStringDictionary and remove reorder to improve write performance of
dict-encoding columns (was: [C++] Replace std::unordered_map with google
dense_hash_map as SortedStringDictionary and remove reorder to improve write
performance of dict-encoding columns)
> [C++] Replace std::unordered_map with google dense_hash_map in
> SortedStringDictionary and remove reorder to improve write performance of
> dict-encoding columns
> ------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: ORC-1950
> URL: https://issues.apache.org/jira/browse/ORC-1950
> Project: ORC
> Issue Type: Bug
> Reporter: Taiyang Li
> Priority: Major
>
> Replace std::unordered_map with Google's dense_hash_map in SortedStringDictionary
> and remove the reorder step to improve the write performance of dictionary-encoded columns.
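The description leaves the baseline's "reorder" step implicit. The sketch below assumes the baseline hands out dictionary indexes in insertion order, then sorts the distinct strings and rewrites every row's index to its sorted position, and that the proposal keeps insertion order and skips that pass. InsertionOrderDictionary and reorderSorted are hypothetical names, not ORC's API, and std::unordered_map stands in for google::dense_hash_map (which, unlike std::unordered_map, requires set_empty_key() to be called before the first insert).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical, simplified string dictionary (not ORC's SortedStringDictionary).
// Indexes are assigned in insertion order; std::unordered_map stands in for
// google::dense_hash_map.
class InsertionOrderDictionary {
 public:
  // Returns the index of `key`, assigning the next free index if it is new.
  int64_t insert(std::string_view key) {
    const int64_t next = static_cast<int64_t>(entries_.size());
    auto result = keyToIndex_.insert(
        std::unordered_map<std::string, int64_t>::value_type(std::string(key), next));
    if (result.second) {
      entries_.emplace_back(key);  // first occurrence: remember the string
    }
    return result.first->second;
  }

  const std::vector<std::string>& entries() const { return entries_; }

 private:
  std::unordered_map<std::string, int64_t> keyToIndex_;
  std::vector<std::string> entries_;  // distinct strings in insertion order
};

// Assumed reorder pass (the step the issue proposes to remove): sort the
// distinct strings and rewrite every row index to the string's sorted position.
void reorderSorted(std::vector<std::string>& entries,
                   std::vector<int64_t>& rowIndexes) {
  std::vector<size_t> order(entries.size());
  std::iota(order.begin(), order.end(), size_t{0});
  std::sort(order.begin(), order.end(),
            [&](size_t a, size_t b) { return entries[a] < entries[b]; });

  // newIndexOf[old index] = position of that string after sorting.
  std::vector<int64_t> newIndexOf(entries.size());
  std::vector<std::string> sorted;
  sorted.reserve(entries.size());
  for (size_t pos = 0; pos < order.size(); ++pos) {
    newIndexOf[order[pos]] = static_cast<int64_t>(pos);
    sorted.push_back(std::move(entries[order[pos]]));
  }
  entries = std::move(sorted);

  // Second pass over every row: remap old indexes to sorted ones.
  for (int64_t& idx : rowIndexes) {
    idx = newIndexOf[static_cast<size_t>(idx)];
  }
}
```

For rows pear, apple, pear, fig, apple, the insertion-order dictionary is {pear, apple, fig} with row indexes 0, 1, 0, 2, 1; after reorderSorted it is {apple, fig, pear} with indexes 2, 0, 2, 1, 0. Both decode to the same rows, so in this sketch the reorder pass costs an O(d log d) sort over the d distinct strings plus one extra pass over every row index without changing the decoded values.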
>
> POC:
> [https://github.com/bigo-sg/ClickHouse/commit/b9fc51fd8ded21f84f31cfa169350906b9f14456]
>
>
> baseline:
> {code:java}
> 2025-07-08T16:22:54+08:00
> Running ./build_gcc/src/Common/benchmarks/orc_string_dictionary
> Run on (96 X 2900 MHz CPU s)
> CPU Caches:
> L1 Data 32 KiB (x48)
> L1 Instruction 32 KiB (x48)
> L2 Unified 1024 KiB (x48)
> L3 Unified 36608 KiB (x2)
> Load Average: 27.44, 62.03, 43.39
> Benchmark                                                               Time              CPU  Iterations
> BM_writeStringDictionary<NewSortedStringDictionary, 10>          49801815 ns      49800922 ns          11
> BM_writeStringDictionary<NewSortedStringDictionary, 100>         60295648 ns      60294001 ns          12
> BM_writeStringDictionary<NewSortedStringDictionary, 1000>        73385081 ns      73383192 ns          10
> BM_writeStringDictionary<NewSortedStringDictionary, 10000>      121725939 ns     121642493 ns           6
> BM_writeStringDictionary<NewSortedStringDictionary, 100000>     232034759 ns     232031059 ns           3
> {code}
> Opt1:
>
>
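The issue does not include the benchmark source. Judging from the output format (Google Benchmark's console reporter) and names such as BM_writeStringDictionary<NewSortedStringDictionary, 10>, the benchmark appears to be templated on a dictionary class and an integer. The sketch below is a rough standalone stand-in using std::chrono instead of Google Benchmark, assuming the integer is the number of distinct values; makeRows, EncodeResult and timeEncode are hypothetical helpers, and the row count is left to the caller.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical input generator: `rows` strings cycling through `cardinality`
// distinct values. The real benchmark's data shape is not shown in the issue.
std::vector<std::string> makeRows(size_t rows, size_t cardinality) {
  std::vector<std::string> out;
  if (cardinality == 0) return out;  // no distinct values to draw from
  out.reserve(rows);
  for (size_t i = 0; i < rows; ++i) {
    out.push_back("value_" + std::to_string(i % cardinality));
  }
  return out;
}

// Result of one timed run.
struct EncodeResult {
  int64_t elapsedNs;                // elapsed steady-clock time
  size_t distinct;                  // number of dictionary entries
  std::vector<int64_t> rowIndexes;  // dictionary index for each row
};

// Dictionary-encodes `rows` with a hash map that hands out indexes in
// insertion order (no sort or reorder pass) and measures elapsed time.
EncodeResult timeEncode(const std::vector<std::string>& rows) {
  const auto start = std::chrono::steady_clock::now();
  std::unordered_map<std::string, int64_t> keyToIndex;
  std::vector<int64_t> rowIndexes;
  rowIndexes.reserve(rows.size());
  for (const std::string& row : rows) {
    const int64_t next = static_cast<int64_t>(keyToIndex.size());
    auto result = keyToIndex.insert({row, next});  // no-op if row already present
    rowIndexes.push_back(result.first->second);
  }
  const auto end = std::chrono::steady_clock::now();
  const int64_t ns =
      std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
  return EncodeResult{ns, keyToIndex.size(), std::move(rowIndexes)};
}
```

Sweeping the cardinality over 10, 100, 1000, 10000 and 100000 with a fixed row count mirrors the parameter range in the table above; timings from this sketch are not comparable to the reported numbers.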
--
This message was sent by Atlassian Jira
(v8.20.10#820010)