Thanks, Alex, for replying again. Is there a plan to support multi-threaded join/aggregation? Or is it intended to be single-threaded in order to maximize query throughput?

2017-11-03 0:32 GMT+08:00 Alexander Behm <alex.b...@cloudera.com>:

> See my response on the other thread you started. The probe side of joins
> is executed in a single thread per host. Impala can run multiple
> builds in parallel - but each build uses only a single thread.
> A single query might not be able to max out your CPU, but most realistic
> workloads run several queries concurrently.
>
> On Thu, Nov 2, 2017 at 12:22 AM, Hongxu Ma <inte...@outlook.com> wrote:
>
> > Thanks LL. Your query options look good.
> >
> > As Xu Cheng mentioned, I also noticed that Impala does hash joins slowly in
> > some big data situations.
> > Very curious about the root cause.
> >
> > On 02/11/2017 10:00, 俊杰陈 wrote:
> >
> > +user list
> >
> > 2017-11-02 9:57 GMT+08:00 俊杰陈 <cjjnj...@gmail.com>:
> >
> > Hi Mostafa
> >
> > Cheng already put the profile in the thread.
> >
> > Here is another profile for the Impala release version. You can also see the
> > attachment.
> >
> > 2017-11-02 9:30 GMT+08:00 Mostafa Mokhtar <mmokh...@cloudera.com>:
> >
> > Attaching the query profile will be most helpful to investigate this
> > issue.
> >
> > If you can capture the profile from the WebUI on the coordinator node it
> > would be great.
> >
> > On Wed, Nov 1, 2017 at 6:22 PM, 俊杰陈 <cjjnj...@gmail.com> wrote:
> >
> > Thanks Hongxu,
> >
> > Here are the configurations on my cluster; most of them are default values.
> > Which item do you think may have an impact?
> >
> > ABORT_ON_DEFAULT_LIMIT_EXCEEDED: [0]
> > ABORT_ON_ERROR: [0]
> > ALLOW_UNSUPPORTED_FORMATS: [0]
> > APPX_COUNT_DISTINCT: [0]
> > BATCH_SIZE: [0]
> > COMPRESSION_CODEC: [NONE]
> > DEBUG_ACTION: []
> > DEFAULT_ORDER_BY_LIMIT: [-1]
> > DISABLE_CACHED_READS: [0]
> > DISABLE_CODEGEN: [0]
> > DISABLE_OUTERMOST_TOPN: [0]
> > DISABLE_ROW_RUNTIME_FILTERING: [0]
> > DISABLE_STREAMING_PREAGGREGATIONS: [0]
> > DISABLE_UNSAFE_SPILLS: [0]
> > ENABLE_EXPR_REWRITES: [1]
> > EXEC_SINGLE_NODE_ROWS_THRESHOLD: [100]
> > EXPLAIN_LEVEL: [1]
> > HBASE_CACHE_BLOCKS: [0]
> > HBASE_CACHING: [0]
> > MAX_BLOCK_MGR_MEMORY: [0]
> > MAX_ERRORS: [100]
> > MAX_IO_BUFFERS: [0]
> > MAX_NUM_RUNTIME_FILTERS: [10]
> > MAX_SCAN_RANGE_LENGTH: [0]
> > MEM_LIMIT: [0]
> > MT_DOP: [0]
> > NUM_NODES: [0]
> > NUM_SCANNER_THREADS: [0]
> > OPTIMIZE_PARTITION_KEY_SCANS: [0]
> > PARQUET_ANNOTATE_STRINGS_UTF8: [0]
> > PARQUET_FALLBACK_SCHEMA_RESOLUTION: [0]
> > PARQUET_FILE_SIZE: [0]
> > PREFETCH_MODE: [1]
> > QUERY_TIMEOUT_S: [0]
> > REPLICA_PREFERENCE: [0]
> > REQUEST_POOL: []
> > RESERVATION_REQUEST_TIMEOUT: [0]
> > RM_INITIAL_MEM: [0]
> > RUNTIME_BLOOM_FILTER_SIZE: [1048576]
> > RUNTIME_FILTER_MAX_SIZE: [16777216]
> > RUNTIME_FILTER_MIN_SIZE: [1048576]
> > RUNTIME_FILTER_MODE: [2]
> > RUNTIME_FILTER_WAIT_TIME_MS: [0]
> > S3_SKIP_INSERT_STAGING: [1]
> > SCAN_NODE_CODEGEN_THRESHOLD: [1800000]
> > SCHEDULE_RANDOM_REPLICA: [0]
> > SCRATCH_LIMIT: [-1]
> > SEQ_COMPRESSION_MODE: [0]
> > STRICT_MODE: [0]
> > SUPPORT_START_OVER: [false]
> > SYNC_DDL: [0]
> > V_CPU_CORES: [0]
> >
> > 2017-10-31 15:30 GMT+08:00 Hongxu Ma <inte...@outlook.com>:
> >
> > Hi JJ
> > Considering it only takes 3 minutes on SparkSQL, maybe there are some
> > mistakes in the query options.
> > Try running "set;" in impala-shell and check all query options, e.g.:
> > BATCH_SIZE: [0]
> > DISABLE_CODEGEN: [0]
> > RUNTIME_FILTER_MODE: GLOBAL
> >
> > Just a guess, thanks.
> >
> > On 27/10/2017 10:25, 俊杰陈 wrote:
> > The profile file is damaged. Here is a screenshot of the exec summary:
> > [cid:ii_j999ymep1_15f5ba563aeabb91]
> >
> > 2017-10-27 10:04 GMT+08:00 俊杰陈 <cjjnj...@gmail.com>:
> > Hi Devs
> >
> > I met a performance issue with a big table join. The query takes more than 3
> > hours on Impala and only 3 minutes on Spark SQL on the same 5-node
> > cluster. When running the query, the left scanner and exchange node are very
> > slow. Did I miss some key arguments?
> >
> > You can see the profile file in the attachment.
> >
> > [cid:ii_j9998pph2_15f5b92f2cf47020]
> >
> > --
> > Thanks & Best Regards
> >
> > --
> > Regards,
> > Hongxu.

--
Thanks & Best Regards
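
P.S. Regarding the suggestion quoted above to run "set;" in impala-shell and check the query options: below is a minimal sketch of how such a listing could be checked automatically. It assumes the output is saved as text in the "NAME: [value]" form shown in this thread. The function names and the expected values are hypothetical and only illustrate the comparison; they are not recommended settings.

```python
import re

# Matches lines such as "BATCH_SIZE: [0]" or "DEBUG_ACTION: []".
OPTION_LINE = re.compile(r"^\s*([A-Z0-9_]+):\s*\[(.*)\]\s*$")


def parse_query_options(text):
    """Return {option_name: value} for each "NAME: [value]" line in text."""
    options = {}
    for line in text.splitlines():
        # Drop leading mail quote markers ("> > ") if the listing was
        # pasted from an email.
        line = re.sub(r"^(?:>\s*)+", "", line)
        match = OPTION_LINE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
    return options


def diff_options(actual, expected):
    """Return {name: (actual_value, expected_value)} for options that differ."""
    return {
        name: (actual.get(name), want)
        for name, want in expected.items()
        if actual.get(name) != want
    }


# A few lines copied from the listing in this thread.
sample = """\
> > BATCH_SIZE: [0]
> > DISABLE_CODEGEN: [0]
> > MT_DOP: [0]
> > RUNTIME_FILTER_MODE: [2]
"""

options = parse_query_options(sample)

# Hypothetical expectations, chosen only to show a mismatch being reported.
expected = {"DISABLE_CODEGEN": "0", "RUNTIME_FILTER_MODE": "2", "BATCH_SIZE": "1024"}
mismatches = diff_options(options, expected)
print(mismatches)
```

Values are compared as strings, so an expected value has to be written the way the listing prints it.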