Thanks Hongxu,

Here are the query options on my cluster; most of them are default values. Which option do you think might be affecting performance?
ABORT_ON_DEFAULT_LIMIT_EXCEEDED: [0]
ABORT_ON_ERROR: [0]
ALLOW_UNSUPPORTED_FORMATS: [0]
APPX_COUNT_DISTINCT: [0]
BATCH_SIZE: [0]
COMPRESSION_CODEC: [NONE]
DEBUG_ACTION: []
DEFAULT_ORDER_BY_LIMIT: [-1]
DISABLE_CACHED_READS: [0]
DISABLE_CODEGEN: [0]
DISABLE_OUTERMOST_TOPN: [0]
DISABLE_ROW_RUNTIME_FILTERING: [0]
DISABLE_STREAMING_PREAGGREGATIONS: [0]
DISABLE_UNSAFE_SPILLS: [0]
ENABLE_EXPR_REWRITES: [1]
EXEC_SINGLE_NODE_ROWS_THRESHOLD: [100]
EXPLAIN_LEVEL: [1]
HBASE_CACHE_BLOCKS: [0]
HBASE_CACHING: [0]
MAX_BLOCK_MGR_MEMORY: [0]
MAX_ERRORS: [100]
MAX_IO_BUFFERS: [0]
MAX_NUM_RUNTIME_FILTERS: [10]
MAX_SCAN_RANGE_LENGTH: [0]
MEM_LIMIT: [0]
MT_DOP: [0]
NUM_NODES: [0]
NUM_SCANNER_THREADS: [0]
OPTIMIZE_PARTITION_KEY_SCANS: [0]
PARQUET_ANNOTATE_STRINGS_UTF8: [0]
PARQUET_FALLBACK_SCHEMA_RESOLUTION: [0]
PARQUET_FILE_SIZE: [0]
PREFETCH_MODE: [1]
QUERY_TIMEOUT_S: [0]
REPLICA_PREFERENCE: [0]
REQUEST_POOL: []
RESERVATION_REQUEST_TIMEOUT: [0]
RM_INITIAL_MEM: [0]
RUNTIME_BLOOM_FILTER_SIZE: [1048576]
RUNTIME_FILTER_MAX_SIZE: [16777216]
RUNTIME_FILTER_MIN_SIZE: [1048576]
RUNTIME_FILTER_MODE: [2]
RUNTIME_FILTER_WAIT_TIME_MS: [0]
S3_SKIP_INSERT_STAGING: [1]
SCAN_NODE_CODEGEN_THRESHOLD: [1800000]
SCHEDULE_RANDOM_REPLICA: [0]
SCRATCH_LIMIT: [-1]
SEQ_COMPRESSION_MODE: [0]
STRICT_MODE: [0]
SUPPORT_START_OVER: [false]
SYNC_DDL: [0]
V_CPU_CORES: [0]

2017-10-31 15:30 GMT+08:00 Hongxu Ma <inte...@outlook.com>:

> Hi JJ
>
> Considering it only takes 3 minutes on SparkSQL, maybe there are some
> mistakes in the query options.
> Try running "set;" in impala-shell and check all the query options, e.g.:
> BATCH_SIZE: [0]
> DISABLE_CODEGEN: [0]
> RUNTIME_FILTER_MODE: GLOBAL
>
> Just a guess, thanks.
>
> On 27/10/2017 10:25, 俊杰陈 wrote:
> The profile file is damaged. Here is a screenshot of the exec summary:
> [inline image: exec summary screenshot]
>
> 2017-10-27 10:04 GMT+08:00 俊杰陈 <cjjnj...@gmail.com>:
> Hi Devs
>
> I ran into a performance issue with a big-table join. The query takes
> more than 3 hours on Impala but only 3 minutes on Spark SQL on the same
> 5-node cluster. While the query is running, the left scanner and
> exchange nodes are very slow. Did I miss some key arguments?
>
> You can see the profile file in the attachment.
>
> [inline image]
>
> --
> Thanks & Best Regards
>
> --
> Thanks & Best Regards
>
> --
> Regards,
> Hongxu.

--
Thanks & Best Regards
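
P.S. For anyone comparing option dumps like the one in this thread, here is a minimal Python sketch that parses "NAME: [value]" pairs as printed by "set;" in impala-shell and reports which options differ from a reference set. The helper names (parse_query_options, diff_options) and the SUGGESTED dict are hypothetical, made up for illustration. The reference values are only the three options mentioned in this thread, and treating "2" as the numeric form of GLOBAL for RUNTIME_FILTER_MODE is an assumption, not something confirmed here.

```python
import re

# Matches one "NAME: [value]" pair in the format shown in this thread.
# The value may be empty, e.g. "REQUEST_POOL: []".
OPTION_RE = re.compile(r"([A-Z0-9_]+): \[([^\]]*)\]")


def parse_query_options(text):
    """Return a dict mapping each option name to its raw string value."""
    return dict(OPTION_RE.findall(text))


def diff_options(actual, expected):
    """Return {name: (actual_value, expected_value)} for options that differ.

    An option missing from `actual` is reported with an actual value of None.
    """
    return {
        name: (actual.get(name), want)
        for name, want in expected.items()
        if actual.get(name) != want
    }


# Hypothetical reference set: only the options suggested in this thread.
# Assumption: "2" is the numeric form of RUNTIME_FILTER_MODE=GLOBAL.
SUGGESTED = {
    "BATCH_SIZE": "0",
    "DISABLE_CODEGEN": "0",
    "RUNTIME_FILTER_MODE": "2",
}

if __name__ == "__main__":
    dump = "BATCH_SIZE: [0] DISABLE_CODEGEN: [0] RUNTIME_FILTER_MODE: [2] REQUEST_POOL: []"
    options = parse_query_options(dump)
    for name, (got, want) in diff_options(options, SUGGESTED).items():
        print(f"{name}: got {got!r}, expected {want!r}")
```

The same two functions can compare dumps from two clusters: parse both and pass one as `expected`. The regex works whether the options are on one line or one per line.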