[jira] [Commented] (IMPALA-11192) test_scanner_fuzz.py runs super slow on ORC format

Quanlong Huang (Jira) Wed, 16 Mar 2022 19:51:04 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-11192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17507957#comment-17507957
 ]


Quanlong Huang commented on IMPALA-11192:
-----------------------------------------

It seems we don't need to invoke {{path_aware_copy_files_to_hdfs()}} for 
full-ACID ORC tables, since they contain no "_impala_insert_staging" dirs.

I simply made a change like this:
{code:python}
diff --git a/tests/query_test/test_scanners_fuzz.py b/tests/query_test/test_scanners_fuzz.py
index 9c4b48a..0576132 100644
--- a/tests/query_test/test_scanners_fuzz.py
+++ b/tests/query_test/test_scanners_fuzz.py
@@ -191,7 +191,7 @@ class TestScannersFuzzing(ImpalaTestSuite):
       table_loc = self._get_table_location(fq_fuzz_table_name, vector)
       check_call(['hdfs', 'dfs', '-copyToLocal', table_loc + "/*", tmp_table_dir])
       partitions = self.walk_and_corrupt_table_data(tmp_table_dir, num_copies, rng)
-      self.path_aware_copy_files_to_hdfs(tmp_table_dir, table_loc)
+      self.filesystem_client.copy_from_local(tmp_table_dir, table_loc)
     else:
       self.execute_query("create table %s.%s like %s.%s" % (fuzz_db, fuzz_table,
           src_db, src_table))
{code}
Checking the query profiles, I can see that the queries that succeeded scan the 
expected number of rows, so the directory layout is OK.
The test runs much faster after this change (2155s -> 503s, roughly 4.3x).
Patch: https://gerrit.cloudera.org/c/18329/
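
One plausible explanation for the speedup, sketched below, is the number of HDFS CLI launches. This is only an illustration under stated assumptions: it assumes the path-aware helper issues one {{hdfs dfs -copyFromLocal}} per file, and it models the bulk copy as a single CLI call, which may not match how {{filesystem_client.copy_from_local()}} is actually implemented. The function names, the fake table layout and the "/test-warehouse/fuzz_tbl" path are hypothetical. The commands are only built, never executed.

```python
import os
import tempfile


def per_file_copy_commands(local_dir, hdfs_dir):
    """Build one copy command per file, recursing into subdirectories.

    Assumption: this mirrors what a path-aware copy helper might do.
    """
    commands = []
    for name in sorted(os.listdir(local_dir)):
        local_path = os.path.join(local_dir, name)
        hdfs_path = hdfs_dir + "/" + name
        if os.path.isdir(local_path):
            commands.extend(per_file_copy_commands(local_path, hdfs_path))
        else:
            commands.append(
                ["hdfs", "dfs", "-copyFromLocal", "-f", local_path, hdfs_path])
    return commands


def bulk_copy_commands(local_dir, hdfs_dir):
    """Build a single command that uploads all top-level entries at once."""
    entries = [os.path.join(local_dir, n) for n in sorted(os.listdir(local_dir))]
    return [["hdfs", "dfs", "-copyFromLocal", "-f"] + entries + [hdfs_dir]]


def make_fake_table(root, num_partitions, files_per_partition):
    """Create a small local directory tree standing in for a copied table."""
    for p in range(num_partitions):
        part_dir = os.path.join(root, "part=%d" % p)
        os.makedirs(part_dir)
        for f in range(files_per_partition):
            with open(os.path.join(part_dir, "file_%05d" % f), "w") as fh:
                fh.write("x")


def demo():
    """Return (number of per-file commands, number of bulk commands)."""
    with tempfile.TemporaryDirectory() as tmp_table_dir:
        make_fake_table(tmp_table_dir, num_partitions=3, files_per_partition=4)
        per_file = per_file_copy_commands(tmp_table_dir, "/test-warehouse/fuzz_tbl")
        bulk = bulk_copy_commands(tmp_table_dir, "/test-warehouse/fuzz_tbl")
    return len(per_file), len(bulk)


if __name__ == "__main__":
    print(demo())
```

With 3 partitions of 4 files each, the per-file approach needs 12 CLI launches while the bulk approach needs 1. Each {{hdfs dfs}} launch starts a new JVM, so the gap grows with the number of files.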

> test_scanner_fuzz.py runs super slow on ORC format
> --------------------------------------------------
>
>                 Key: IMPALA-11192
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11192
>             Project: IMPALA
>          Issue Type: Test
>          Components: Infrastructure
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>
> I recently needed to run test_scanners_fuzz.py multiple times and found that 
> a single run takes more than 0.5h (ORC only).
> {code:bash}
> $ time -p impala-py.test --skip_hbase --table_formats=orc/def/block tests/query_test/test_scanners_fuzz.py
> real 2155.47
> user 2779.64
> sys 193.76
> {code}
> A Jenkins job also shows that ORC tests are much slower than other formats:
> ||Test name||Duration||Status||
> |test_fuzz_alltypes[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: avro/snap/block]|31 sec|Passed|
> |test_fuzz_alltypes[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: orc/def/block]|{color:#FF0000}2 min 1 sec{color}|Passed|
> |test_fuzz_alltypes[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: parquet/none]|35 sec|Passed|
> |test_fuzz_alltypes[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: text/none]|48 sec|Passed|
> |test_fuzz_alltypes[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: avro/snap/block]|40 sec|Passed|
> |test_fuzz_alltypes[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: orc/def/block]|{color:#FF0000}2 min 55 sec{color}|Passed|
> |test_fuzz_alltypes[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: parquet/none]|22 sec|Passed|
> |test_fuzz_alltypes[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: text/none]|29 sec|Passed|
> |test_fuzz_alltypes[... 'debug_action': None, \\| table_format: avro/snap/block]|32 sec|Passed|
> |test_fuzz_alltypes[... 'debug_action': None, \\| table_format: orc/def/block]|{color:#FF0000}3 min 25 sec{color}|Passed|
> |test_fuzz_alltypes[... 'debug_action': None, \\| table_format: parquet/none]|29 sec|Passed|
> |test_fuzz_alltypes[... 'debug_action': None, \\| table_format: text/none]|20 sec|Passed|
> |test_fuzz_decimal_tbl[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: avro/snap/block]|20 sec|Passed|
> |test_fuzz_decimal_tbl[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: orc/def/block]|{color:#FF0000}1 min 35 sec{color}|Passed|
> |test_fuzz_decimal_tbl[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: parquet/none]|22 sec|Passed|
> |test_fuzz_decimal_tbl[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: text/none]|18 sec|Passed|
> |test_fuzz_decimal_tbl[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: avro/snap/block]|20 sec|Passed|
> |test_fuzz_decimal_tbl[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: orc/def/block]|{color:#FF0000}1 min 16 sec{color}|Passed|
> |test_fuzz_decimal_tbl[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: parquet/none]|17 sec|Passed|
> |test_fuzz_decimal_tbl[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: text/none]|16 sec|Passed|
> |test_fuzz_decimal_tbl[... 'debug_action': None \\| table_format: avro/snap/block]|19 sec|Passed|
> |test_fuzz_decimal_tbl[... 'debug_action': None \\| table_format: orc/def/block]|{color:#FF0000}1 min 4 sec{color}|Passed|
> |test_fuzz_decimal_tbl[... 'debug_action': None \\| table_format: parquet/none]|22 sec|Passed|
> |test_fuzz_decimal_tbl[... 'debug_action': None \\| table_format: text/none]|29 sec|Passed|
> |test_fuzz_nested_types[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: avro/snap/block]|4 sec|Skipped|
> |test_fuzz_nested_types[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: orc/def/block]|{color:#FF0000}1 min 35 sec{color}|Passed|
> |test_fuzz_nested_types[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: parquet/none]|18 sec|Passed|
> |test_fuzz_nested_types[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: text/none]|4.7 sec|Skipped|
> |test_fuzz_nested_types[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: avro/snap/block]|5.2 sec|Skipped|
> |test_fuzz_nested_types[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: orc/def/block]|{color:#FF0000}1 min 59 sec{color}|Passed|
> |test_fuzz_nested_types[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: parquet/none]|17 sec|Passed|
> |test_fuzz_nested_types[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: text/none]|4.3 sec|Skipped|
> |test_fuzz_nested_types[... 'debug_action': None \\| table_format: avro/snap/block]|4.1 sec|Skipped|
> |test_fuzz_nested_types[... 'debug_action': None \\| table_format: orc/def/block]|{color:#FF0000}2 min 30 sec{color}|Passed|
> |test_fuzz_nested_types[... 'debug_action': None \\| table_format: parquet/none]|18 sec|Passed|
> |test_fuzz_nested_types[... 'debug_action': None \\| table_format: text/none]|3.5 sec|Skipped|
> |test_fuzz_uncompressed_parquet_orc[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: avro/snap/block]|4 sec|Skipped|
> |test_fuzz_uncompressed_parquet_orc[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: orc/def/block]|{color:#FF0000}17 min{color}|Passed|
> |test_fuzz_uncompressed_parquet_orc[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: parquet/none]|20 sec|Passed|
> |test_fuzz_uncompressed_parquet_orc[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: text/none]|3.1 sec|Skipped|
> |test_fuzz_uncompressed_parquet_orc[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: avro/snap/block]|3.8 sec|Skipped|
> |test_fuzz_uncompressed_parquet_orc[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: orc/def/block]|{color:#FF0000}9 min 51 sec{color}|Passed|
> |test_fuzz_uncompressed_parquet_orc[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: parquet/none]|19 sec|Passed|
> |test_fuzz_uncompressed_parquet_orc[... 'debug_action': '-1:OPEN:[email protected]' \\| table_format: text/none]|4.4 sec|Skipped|
> |test_fuzz_uncompressed_parquet_orc[... 'debug_action': None \\| table_format: avro/snap/block]|3.9 sec|Skipped|
> |test_fuzz_uncompressed_parquet_orc[... 'debug_action': None \\| table_format: orc/def/block]|{color:#FF0000}9 min 27 sec{color}|Passed|
> |test_fuzz_uncompressed_parquet_orc[... 'debug_action': None \\| table_format: parquet/none]|19 sec|Passed|
> |test_fuzz_uncompressed_parquet_orc[... 'debug_action': None \\| table_format: text/none]|4.3 sec|Skipped|
> Each test on the other formats takes less than 1 min. ORC tests usually take 
> several minutes.
> CC [~boroknagyz]
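
To put a number on the ORC rows in the table above, the durations can be converted to seconds and summed. The following is a small sketch; the helper name {{duration_to_seconds}} is hypothetical, and the list simply copies the twelve ORC durations from the table.

```python
import re


def duration_to_seconds(text):
    """Convert strings like '2 min 1 sec', '17 min' or '4.7 sec' to seconds."""
    total = 0.0
    for value, unit in re.findall(r"(\d+(?:\.\d+)?)\s*(min|sec)", text):
        total += float(value) * (60 if unit == "min" else 1)
    return total


# Durations of the orc/def/block rows, in table order.
orc_durations = [
    "2 min 1 sec", "2 min 55 sec", "3 min 25 sec",   # test_fuzz_alltypes
    "1 min 35 sec", "1 min 16 sec", "1 min 4 sec",   # test_fuzz_decimal_tbl
    "1 min 35 sec", "1 min 59 sec", "2 min 30 sec",  # test_fuzz_nested_types
    "17 min", "9 min 51 sec", "9 min 27 sec",        # test_fuzz_uncompressed_parquet_orc
]

orc_total = sum(duration_to_seconds(d) for d in orc_durations)

if __name__ == "__main__":
    print("ORC total: %.0f s (%.1f min)" % (orc_total, orc_total / 60))
```

The ORC rows in that Jenkins job add up to 3278 s, about 55 minutes, and the three test_fuzz_uncompressed_parquet_orc rows alone account for 2178 s of that.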



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

