linfey90 commented on PR #5824:
URL: https://github.com/apache/iceberg/pull/5824#issuecomment-1366402542

   Hilbert has better data clustering. Here is a simple performance test.
   1: Prepare a Parquet table with one hundred million rows and 11 columns.
   Two of the columns, c1 and c2, hold values ranging from 0 to 1,000,000
   (matching the datagen config below).
   The Flink SQL looks like:
   CREATE TABLE default_catalog.default_database.dg (
       c1 INT,
       c2 BIGINT,
       c3 VARCHAR,
       c4 VARCHAR,
       c5 TINYINT,
       c6 SMALLINT,
       c7 FLOAT,
       c8 DOUBLE,
       c9 CHAR,
       c10 BOOLEAN,
       c11 AS localtimestamp
   ) WITH (
       'connector' = 'datagen',
       'fields.c3.length' = '10',
       'fields.c4.length' = '10',
       'fields.c1.min' = '0',
       'fields.c1.max' = '1000000',
       'fields.c2.min' = '0',
       'fields.c2.max' = '1000000',
       'rows-per-second' = '30000',
       'number-of-rows' = '100000000'
   );
   2: Create two tables, test_zorder and test_hilbert, and copy the above data into each.
   3: Rewrite each table, sorting on (c1, c2) with z-order and with Hilbert curve respectively.
   4: Write code to report the number of skipped files, and execute queries like `select count(*)` with the predicates below.
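For step 3, the rewrite can be issued through Iceberg's `rewrite_data_files` Spark procedure. A sketch, assuming a catalog named `local`: the `zorder(...)` sort order is existing Iceberg syntax, while the `hilbert(...)` form is what this PR would add, so treat that spelling as hypothetical.

```sql
-- existing z-order rewrite
CALL local.system.rewrite_data_files(
  table      => 'default.test_zorder',
  strategy   => 'sort',
  sort_order => 'zorder(c1, c2)'
);

-- hypothetical Hilbert rewrite (sort-order name assumed from this PR)
CALL local.system.rewrite_data_files(
  table      => 'default.test_hilbert',
  strategy   => 'sort',
  sort_order => 'hilbert(c1, c2)'
);
```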
   | query condition | table | files skipped | total files | file skip percentage | query time |
   | ------------- |:-------------:| -----:| -----:| -----:| -----:|
   | c1 < 500000 and c2 < 500000 | hilbert | 97 | 171 | 56.7% | 1.018s |
   | c1 < 500000 and c2 < 500000 | zorder | 82 | 180 | 45.56% | 1.353s |
   | c1 > 500000 and c2 > 500000 | hilbert | 28 | 171 | 16.37% | 3.337s |
   | c1 > 500000 and c2 > 500000 | zorder | 18 | 180 | 10% | 3.37s |
   
   Note: the query time depends on the cluster environment and is for reference only, but the file-skip counts are stable.
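Intuitively, Hilbert skips more files because consecutive positions on a Hilbert curve are always adjacent cells, while a Z-order (Morton) curve takes long jumps, so a Hilbert-sorted file covers a tighter region of (c1, c2) space and its column min/max stats prune better. A minimal sketch (not Iceberg's implementation; the standard curve-index algorithms on a small 8x8 grid) comparing the two curves' locality:

```python
def rot(n, x, y, rx, ry):
    """Rotate/flip a quadrant (helper for the Hilbert mapping)."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y

def hilbert_index(n, x, y):
    """Map (x, y) on an n x n grid (n a power of two) to its Hilbert curve position."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = rot(n, x, y, rx, ry)
        s //= 2
    return d

def zorder_index(bits, x, y):
    """Map (x, y) to its Z-order (Morton) position by interleaving bits."""
    d = 0
    for i in range(bits):
        d |= ((x >> i) & 1) << (2 * i)
        d |= ((y >> i) & 1) << (2 * i + 1)
    return d

def max_step(index_fn, n):
    """Largest Manhattan distance between consecutive points along the curve."""
    order = sorted((index_fn(x, y), x, y) for x in range(n) for y in range(n))
    return max(abs(x1 - x2) + abs(y1 - y2)
               for (_, x1, y1), (_, x2, y2) in zip(order, order[1:]))

n = 8
print("hilbert max step:", max_step(lambda x, y: hilbert_index(n, x, y), n))  # 1
print("zorder  max step:", max_step(lambda x, y: zorder_index(3, x, y), n))   # 8
```

The Hilbert curve never steps more than one cell at a time, whereas the Z-order curve's worst jump spans most of the grid; per-file min/max ranges inherit that difference.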
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

