Re: performance issue on big table join

俊杰陈 Wed, 01 Nov 2017 18:22:35 -0700

Thanks Hongxu,

Here are the configurations on my cluster; most of them are default values.
Which item do you think might be affecting performance?


        ABORT_ON_DEFAULT_LIMIT_EXCEEDED: [0]
        ABORT_ON_ERROR: [0]
        ALLOW_UNSUPPORTED_FORMATS: [0]
        APPX_COUNT_DISTINCT: [0]
        BATCH_SIZE: [0]
        COMPRESSION_CODEC: [NONE]
        DEBUG_ACTION: []
        DEFAULT_ORDER_BY_LIMIT: [-1]
        DISABLE_CACHED_READS: [0]
        DISABLE_CODEGEN: [0]
        DISABLE_OUTERMOST_TOPN: [0]
        DISABLE_ROW_RUNTIME_FILTERING: [0]
        DISABLE_STREAMING_PREAGGREGATIONS: [0]
        DISABLE_UNSAFE_SPILLS: [0]
        ENABLE_EXPR_REWRITES: [1]
        EXEC_SINGLE_NODE_ROWS_THRESHOLD: [100]
        EXPLAIN_LEVEL: [1]
        HBASE_CACHE_BLOCKS: [0]
        HBASE_CACHING: [0]
        MAX_BLOCK_MGR_MEMORY: [0]
        MAX_ERRORS: [100]
        MAX_IO_BUFFERS: [0]
        MAX_NUM_RUNTIME_FILTERS: [10]
        MAX_SCAN_RANGE_LENGTH: [0]
        MEM_LIMIT: [0]
        MT_DOP: [0]
        NUM_NODES: [0]
        NUM_SCANNER_THREADS: [0]
        OPTIMIZE_PARTITION_KEY_SCANS: [0]
        PARQUET_ANNOTATE_STRINGS_UTF8: [0]
        PARQUET_FALLBACK_SCHEMA_RESOLUTION: [0]
        PARQUET_FILE_SIZE: [0]
        PREFETCH_MODE: [1]
        QUERY_TIMEOUT_S: [0]
        REPLICA_PREFERENCE: [0]
        REQUEST_POOL: []
        RESERVATION_REQUEST_TIMEOUT: [0]
        RM_INITIAL_MEM: [0]
        RUNTIME_BLOOM_FILTER_SIZE: [1048576]
        RUNTIME_FILTER_MAX_SIZE: [16777216]
        RUNTIME_FILTER_MIN_SIZE: [1048576]
        RUNTIME_FILTER_MODE: [2]
        RUNTIME_FILTER_WAIT_TIME_MS: [0]
        S3_SKIP_INSERT_STAGING: [1]
        SCAN_NODE_CODEGEN_THRESHOLD: [1800000]
        SCHEDULE_RANDOM_REPLICA: [0]
        SCRATCH_LIMIT: [-1]
        SEQ_COMPRESSION_MODE: [0]
        STRICT_MODE: [0]
        SUPPORT_START_OVER: [false]
        SYNC_DDL: [0]
        V_CPU_CORES: [0]
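Hongxu's suggestion in the quoted message below (check BATCH_SIZE, DISABLE_CODEGEN and RUNTIME_FILTER_MODE in the `set;` output) can be scripted. This is a minimal sketch, assuming the `set;` output has been saved as plain text in the `OPTION: [value]` form shown above. `parse_set_output` and `check_suggested_options` are hypothetical helper names, not part of impala-shell. The numeric RUNTIME_FILTER_MODE mapping (0 = OFF, 1 = LOCAL, 2 = GLOBAL) follows Impala's enum for that option.

```python
import re

# One line of impala-shell `set;` output, e.g. "        BATCH_SIZE: [0]".
# Brackets are optional so a line like "RUNTIME_FILTER_MODE: GLOBAL" also parses.
OPTION_LINE = re.compile(r"^\s*([A-Z0-9_]+):\s*\[?([^\]]*)\]?\s*$")

# Numeric codes for RUNTIME_FILTER_MODE: 0 = OFF, 1 = LOCAL, 2 = GLOBAL.
RUNTIME_FILTER_MODES = {"0": "OFF", "1": "LOCAL", "2": "GLOBAL"}


def parse_set_output(text: str) -> dict[str, str]:
    """Collect every OPTION: [value] line into a dict of option -> value string."""
    options = {}
    for line in text.splitlines():
        match = OPTION_LINE.match(line)
        if match:
            options[match.group(1)] = match.group(2).strip()
    return options


def check_suggested_options(options: dict[str, str]) -> list[str]:
    """Summarize the three options suggested for checking in this thread."""
    notes = []
    batch = options.get("BATCH_SIZE")
    if batch is None:
        notes.append("BATCH_SIZE not reported")
    elif batch == "0":
        notes.append("BATCH_SIZE=0 (0 means the built-in default is used)")
    else:
        notes.append(f"BATCH_SIZE={batch} (overridden)")

    # DISABLE_CODEGEN may be printed as 0/1 or false/true depending on version.
    disabled = options.get("DISABLE_CODEGEN", "0").lower() in ("1", "true")
    notes.append("codegen DISABLED" if disabled else "codegen enabled")

    # Translate numeric codes; pass through names such as "GLOBAL" unchanged.
    mode = options.get("RUNTIME_FILTER_MODE", "")
    notes.append("RUNTIME_FILTER_MODE="
                 + RUNTIME_FILTER_MODES.get(mode, mode or "not reported"))
    return notes
```

Fed the list above, the sketch reports BATCH_SIZE at its default, codegen enabled, and RUNTIME_FILTER_MODE=GLOBAL.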

2017-10-31 15:30 GMT+08:00 Hongxu Ma <[email protected]>:

> Hi JJ
> Considering it only takes 3 minutes on Spark SQL, maybe some query options
> are misconfigured.
> Try running "set;" in impala-shell and check all query options, e.g.:
>     BATCH_SIZE: [0]
>     DISABLE_CODEGEN: [0]
>     RUNTIME_FILTER_MODE: GLOBAL
>
> Just a guess, thanks.
>
> On 27/10/2017 10:25, 俊杰陈 wrote:
> The profile file is damaged. Here is a screenshot of the exec summary:
> [screenshot: exec summary]
>
>
> 2017-10-27 10:04 GMT+08:00 俊杰陈 <[email protected]>:
> Hi Devs
>
> I ran into a performance issue on a big table join. The query takes more
> than 3 hours on Impala but only 3 minutes on Spark SQL on the same 5-node
> cluster. While the query is running, the left scanner and the exchange node
> are very slow. Did I miss some key arguments?
>
> You can find the profile file in the attachment.
>
> --
> Thanks & Best Regards
>
>
>
> --
> Thanks & Best Regards
>
>
> --
> Regards,
> Hongxu.
>



-- 
Thanks & Best Regards
