taiyang-li commented on PR #2321:
URL: https://github.com/apache/orc/pull/2321#issuecomment-3072725680
I tested it in ClickHouse with a single thread, running each configuration three times. Here are the results:
``` sql
-- Query
select concat('gluten ', cast(rand() % 1000 as String))
from numbers(10000000)
into outfile 'dict.orc' truncate;
-- Dictionary encoding enabled, before the optimization
set max_threads = 1;
set output_format_orc_dictionary_key_size_threshold = 1;
10000000 rows in set. Elapsed: 0.760 sec. Processed 10.00 million rows,
80.00 MB (13.16 million rows/s., 105.31 MB/s.)
10000000 rows in set. Elapsed: 0.796 sec. Processed 10.00 million rows,
80.00 MB (12.56 million rows/s., 100.51 MB/s.)
10000000 rows in set. Elapsed: 0.803 sec. Processed 10.00 million rows,
80.00 MB (12.45 million rows/s., 99.63 MB/s.)
-- Dictionary encoding enabled, after the optimization
set max_threads = 1;
set output_format_orc_dictionary_key_size_threshold = 1;
10000000 rows in set. Elapsed: 0.645 sec. Processed 10.00 million rows,
80.00 MB (15.50 million rows/s., 124.00 MB/s.)
10000000 rows in set. Elapsed: 0.622 sec. Processed 10.00 million rows,
80.00 MB (16.08 million rows/s., 128.64 MB/s.)
10000000 rows in set. Elapsed: 0.631 sec. Processed 10.00 million rows,
80.00 MB (15.85 million rows/s., 126.83 MB/s.)
-- Dictionary encoding disabled
set max_threads = 1;
set output_format_orc_dictionary_key_size_threshold = 0;
10000000 rows in set. Elapsed: 0.707 sec. Processed 10.00 million rows,
80.00 MB (14.14 million rows/s., 113.15 MB/s.)
10000000 rows in set. Elapsed: 0.671 sec. Processed 10.00 million rows,
80.00 MB (14.90 million rows/s., 119.17 MB/s.)
10000000 rows in set. Elapsed: 0.727 sec. Processed 10.00 million rows,
80.00 MB (13.75 million rows/s., 109.98 MB/s.)
```
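To make the three configurations easier to compare, here is a small sketch that averages the elapsed times reported above and computes the relative time reduction. The timings are copied from the runs in this comment; the dictionary keys and the `time_reduction` helper are names made up for illustration, not part of ORC or ClickHouse.

```python
from statistics import mean

# Elapsed seconds copied from the three runs of each configuration above.
# The keys are illustrative labels, not ClickHouse or ORC setting names.
elapsed = {
    "dict_before": [0.760, 0.796, 0.803],    # dictionary on, before the optimization
    "dict_after": [0.645, 0.622, 0.631],     # dictionary on, after the optimization
    "dict_disabled": [0.707, 0.671, 0.727],  # dictionary off
}

# Mean elapsed time per configuration.
avg = {name: mean(runs) for name, runs in elapsed.items()}


def time_reduction(baseline: float, candidate: float) -> float:
    """Fraction of the baseline time saved by the candidate (hypothetical helper)."""
    return 1.0 - candidate / baseline


vs_before = time_reduction(avg["dict_before"], avg["dict_after"])
vs_disabled = time_reduction(avg["dict_disabled"], avg["dict_after"])

for name, value in avg.items():
    print(f"{name}: {value:.3f} s")
print(f"after vs before:   {vs_before:.1%} less time")
print(f"after vs disabled: {vs_disabled:.1%} less time")
```

On these numbers the optimized dictionary path averages about 0.633 s, against about 0.786 s before the optimization (roughly 19.5% less time) and about 0.702 s with dictionary encoding disabled (roughly 9.8% less time). With only three runs per configuration, treat these as indicative rather than precise.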