See my response on the other thread you started. The probe side of a join is executed in a single thread per host. Impala can run multiple join builds in parallel, but each build uses only a single thread. A single query might not be able to max out your CPU, but most realistic workloads run several queries concurrently.
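If you want to confirm where the time is going for a single run, here is a rough sketch from impala-shell (illustrative only; EXPLAIN_LEVEL and MT_DOP are the same options that appear in the "set;" dump below, and whether MT_DOP > 0 is accepted for join queries depends on your Impala release):

    -- illustrative sketch, run in impala-shell
    SET EXPLAIN_LEVEL=2;        -- more detailed plan output
    EXPLAIN <your join query>;  -- confirm which side is the build and which is the probe
    -- ... run the query itself, then:
    PROFILE;                    -- per-node timings; check how much time the hash join probe takes
    -- optional and version-dependent: intra-query parallelism (default 0)
    SET MT_DOP=4;               -- hypothetical value; some releases reject MT_DOP > 0 for joins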
On Thu, Nov 2, 2017 at 12:22 AM, Hongxu Ma <inte...@outlook.com> wrote:
> Thanks LL. Your query options look good.
>
> As Xu Cheng mentioned, I have also noticed that Impala does hash joins
> slowly in some big-data situations. I'm very curious about the root cause.
>
> On 02/11/2017 10:00, 俊杰陈 wrote:
>
> +user list
>
> 2017-11-02 9:57 GMT+08:00 俊杰陈 <cjjnj...@gmail.com>:
>
> Hi Mostafa
>
> Cheng already put the profile in the thread.
> Here is another profile for the Impala release version; you can also see
> the attachment.
>
> 2017-11-02 9:30 GMT+08:00 Mostafa Mokhtar <mmokh...@cloudera.com>:
>
> Attaching the query profile will be most helpful for investigating this
> issue. If you can capture the profile from the WebUI on the coordinator
> node, that would be great.
>
> On Wed, Nov 1, 2017 at 6:22 PM, 俊杰陈 <cjjnj...@gmail.com> wrote:
>
> Thanks Hongxu,
>
> Here are the configurations on my cluster; most of them are default values.
> Which item do you think may have an impact?
>
> ABORT_ON_DEFAULT_LIMIT_EXCEEDED: [0]
> ABORT_ON_ERROR: [0]
> ALLOW_UNSUPPORTED_FORMATS: [0]
> APPX_COUNT_DISTINCT: [0]
> BATCH_SIZE: [0]
> COMPRESSION_CODEC: [NONE]
> DEBUG_ACTION: []
> DEFAULT_ORDER_BY_LIMIT: [-1]
> DISABLE_CACHED_READS: [0]
> DISABLE_CODEGEN: [0]
> DISABLE_OUTERMOST_TOPN: [0]
> DISABLE_ROW_RUNTIME_FILTERING: [0]
> DISABLE_STREAMING_PREAGGREGATIONS: [0]
> DISABLE_UNSAFE_SPILLS: [0]
> ENABLE_EXPR_REWRITES: [1]
> EXEC_SINGLE_NODE_ROWS_THRESHOLD: [100]
> EXPLAIN_LEVEL: [1]
> HBASE_CACHE_BLOCKS: [0]
> HBASE_CACHING: [0]
> MAX_BLOCK_MGR_MEMORY: [0]
> MAX_ERRORS: [100]
> MAX_IO_BUFFERS: [0]
> MAX_NUM_RUNTIME_FILTERS: [10]
> MAX_SCAN_RANGE_LENGTH: [0]
> MEM_LIMIT: [0]
> MT_DOP: [0]
> NUM_NODES: [0]
> NUM_SCANNER_THREADS: [0]
> OPTIMIZE_PARTITION_KEY_SCANS: [0]
> PARQUET_ANNOTATE_STRINGS_UTF8: [0]
> PARQUET_FALLBACK_SCHEMA_RESOLUTION: [0]
> PARQUET_FILE_SIZE: [0]
> PREFETCH_MODE: [1]
> QUERY_TIMEOUT_S: [0]
> REPLICA_PREFERENCE: [0]
> REQUEST_POOL: []
> RESERVATION_REQUEST_TIMEOUT: [0]
> RM_INITIAL_MEM: [0]
> RUNTIME_BLOOM_FILTER_SIZE: [1048576]
> RUNTIME_FILTER_MAX_SIZE: [16777216]
> RUNTIME_FILTER_MIN_SIZE: [1048576]
> RUNTIME_FILTER_MODE: [2]
> RUNTIME_FILTER_WAIT_TIME_MS: [0]
> S3_SKIP_INSERT_STAGING: [1]
> SCAN_NODE_CODEGEN_THRESHOLD: [1800000]
> SCHEDULE_RANDOM_REPLICA: [0]
> SCRATCH_LIMIT: [-1]
> SEQ_COMPRESSION_MODE: [0]
> STRICT_MODE: [0]
> SUPPORT_START_OVER: [false]
> SYNC_DDL: [0]
> V_CPU_CORES: [0]
>
> 2017-10-31 15:30 GMT+08:00 Hongxu Ma <inte...@outlook.com>:
>
> Hi JJ
>
> Considering it only takes 3 minutes on Spark SQL, maybe there is a mistake
> in the query options.
> Try running "set;" in impala-shell and check all query options, e.g.:
> BATCH_SIZE: [0]
> DISABLE_CODEGEN: [0]
> RUNTIME_FILTER_MODE: GLOBAL
>
> Just a guess, thanks.
>
> On 27/10/2017 10:25, 俊杰陈 wrote:
>
> The profile file is damaged. Here is a screenshot of the exec summary
> [exec summary screenshot attached]
>
> 2017-10-27 10:04 GMT+08:00 俊杰陈 <cjjnj...@gmail.com>:
>
> Hi Devs
>
> I ran into a performance issue on a big-table join. The query takes more
> than 3 hours on Impala and only 3 minutes on Spark SQL on the same 5-node
> cluster. When running the query, the left scanner and exchange node are
> very slow. Did I miss some key arguments?
>
> You can see the profile file in the attachment.
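For anyone following Hongxu's suggestion above, a quick impala-shell sketch of checking and adjusting those options (the wait-time value is only an illustration, not a tuned recommendation):

    set;                                     -- dump all current query options
    SET RUNTIME_FILTER_MODE=GLOBAL;          -- "[2]" in the dump above corresponds to GLOBAL
    SET DISABLE_CODEGEN=0;                   -- keep codegen enabled (0 is already the default)
    SET RUNTIME_FILTER_WAIT_TIME_MS=10000;   -- hypothetical value: let filters arrive before scans start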