Re: performance issue on big table join

俊杰陈 Thu, 02 Nov 2017 18:18:07 -0700

Thanks, Alex, for replying again.

Do we have a plan to support multi-threaded join/aggregation? Or is it
intended to be single-threaded to maximize query throughput?




2017-11-03 0:32 GMT+08:00 Alexander Behm <[email protected]>:

> See my response on the other thread you started. The probe side of a join
> is executed in a single thread per host. Impala can run multiple
> builds in parallel - but each build uses only a single thread.
> A single query might not be able to max out your CPU, but most realistic
> workloads run several queries concurrently.
>
> On Thu, Nov 2, 2017 at 12:22 AM, Hongxu Ma <[email protected]> wrote:
>
> > Thanks LL. Your query options look good.
> >
> > As Xu Cheng mentioned, I also noticed that Impala performs hash joins slowly in
> > some big-data situations.
> > I am very curious about the root cause.
> >
> >
> > On 02/11/2017 10:00, 俊杰陈 wrote:
> >
> > +user list
> >
> > 2017-11-02 9:57 GMT+08:00 俊杰陈 <[email protected]>:
> >
> >
> > Hi Mostafa
> >
> > Cheng already put the profile in the thread.
> >
> > Here is another profile, for the Impala release version. You can also see the
> > attachment.
> >
> >
> > 2017-11-02 9:30 GMT+08:00 Mostafa Mokhtar <[email protected]>:
> >
> >
> > Attaching the query profile will be most helpful to investigate this
> > issue.
> >
> > If you can capture the profile from the WebUI on the coordinator node it
> > would be great.
> >
> > On Wed, Nov 1, 2017 at 6:22 PM, 俊杰陈 <[email protected]> wrote:
> >
> >
> > Thanks Hongxu,
> >
> > Here are the configurations on my cluster; most of them are default values.
> > Which item do you think might have an impact?
> >
> >         ABORT_ON_DEFAULT_LIMIT_EXCEEDED: [0]
> >         ABORT_ON_ERROR: [0]
> >         ALLOW_UNSUPPORTED_FORMATS: [0]
> >         APPX_COUNT_DISTINCT: [0]
> >         BATCH_SIZE: [0]
> >         COMPRESSION_CODEC: [NONE]
> >         DEBUG_ACTION: []
> >         DEFAULT_ORDER_BY_LIMIT: [-1]
> >         DISABLE_CACHED_READS: [0]
> >         DISABLE_CODEGEN: [0]
> >         DISABLE_OUTERMOST_TOPN: [0]
> >         DISABLE_ROW_RUNTIME_FILTERING: [0]
> >         DISABLE_STREAMING_PREAGGREGATIONS: [0]
> >         DISABLE_UNSAFE_SPILLS: [0]
> >         ENABLE_EXPR_REWRITES: [1]
> >         EXEC_SINGLE_NODE_ROWS_THRESHOLD: [100]
> >         EXPLAIN_LEVEL: [1]
> >         HBASE_CACHE_BLOCKS: [0]
> >         HBASE_CACHING: [0]
> >         MAX_BLOCK_MGR_MEMORY: [0]
> >         MAX_ERRORS: [100]
> >         MAX_IO_BUFFERS: [0]
> >         MAX_NUM_RUNTIME_FILTERS: [10]
> >         MAX_SCAN_RANGE_LENGTH: [0]
> >         MEM_LIMIT: [0]
> >         MT_DOP: [0]
> >         NUM_NODES: [0]
> >         NUM_SCANNER_THREADS: [0]
> >         OPTIMIZE_PARTITION_KEY_SCANS: [0]
> >         PARQUET_ANNOTATE_STRINGS_UTF8: [0]
> >         PARQUET_FALLBACK_SCHEMA_RESOLUTION: [0]
> >         PARQUET_FILE_SIZE: [0]
> >         PREFETCH_MODE: [1]
> >         QUERY_TIMEOUT_S: [0]
> >         REPLICA_PREFERENCE: [0]
> >         REQUEST_POOL: []
> >         RESERVATION_REQUEST_TIMEOUT: [0]
> >         RM_INITIAL_MEM: [0]
> >         RUNTIME_BLOOM_FILTER_SIZE: [1048576]
> >         RUNTIME_FILTER_MAX_SIZE: [16777216]
> >         RUNTIME_FILTER_MIN_SIZE: [1048576]
> >         RUNTIME_FILTER_MODE: [2]
> >         RUNTIME_FILTER_WAIT_TIME_MS: [0]
> >         S3_SKIP_INSERT_STAGING: [1]
> >         SCAN_NODE_CODEGEN_THRESHOLD: [1800000]
> >         SCHEDULE_RANDOM_REPLICA: [0]
> >         SCRATCH_LIMIT: [-1]
> >         SEQ_COMPRESSION_MODE: [0]
> >         STRICT_MODE: [0]
> >         SUPPORT_START_OVER: [false]
> >         SYNC_DDL: [0]
> >         V_CPU_CORES: [0]
> >
> > 2017-10-31 15:30 GMT+08:00 Hongxu Ma <[email protected]>:
> >
> >
> > Hi JJ
> > Considering it only takes 3 minutes on SparkSQL, maybe there are some
> > mistakes in the query options.
> > Try running "set;" in impala-shell and check all query options, e.g.:
> >     BATCH_SIZE: [0]
> >     DISABLE_CODEGEN: [0]
> >     RUNTIME_FILTER_MODE: GLOBAL
> >
> > Just a guess, thanks.
> >
> > On 27/10/2017 10:25, 俊杰陈 wrote:
> > The profile file is damaged. Here is a screenshot of the exec summary.
> > [cid:ii_j999ymep1_15f5ba563aeabb91]
> > 
> >
> > 2017-10-27 10:04 GMT+08:00 俊杰陈 <[email protected]>:
> > Hi Devs
> >
> > I met a performance issue on a big table join. The query takes more than 3
> > hours on Impala and only 3 minutes on Spark SQL on the same 5-node
> > cluster. When running the query, the left scanner and exchange node are very
> > slow. Did I miss some key arguments?
> >
> > You can see the profile file in the attachment.
> >
> > [cid:ii_j9998pph2_15f5b92f2cf47020]
> > 
> > --
> > Thanks & Best Regards
> >
> > --
> > Regards,
> > Hongxu.
>



-- 
Thanks & Best Regards
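
As a side note to the "set;" suggestion quoted above: a minimal sketch, in Python, of how one might compare an option dump in the "NAME: [value]" form shown in this thread against the values Hongxu suggested checking. The helper names parse_query_options and diff_options are hypothetical and not part of impala-shell, and treating 2 as the numeric form of GLOBAL for RUNTIME_FILTER_MODE is an assumption based on the dump quoted above.

```python
import re

# Matches lines such as "        BATCH_SIZE: [0]", optionally behind "> " quote marks.
OPTION_LINE = re.compile(r"^[>\s]*([A-Z0-9_]+):\s*\[(.*)\]\s*$")


def parse_query_options(text):
    """Return {option_name: value_string} for every 'NAME: [value]' line."""
    options = {}
    for line in text.splitlines():
        match = OPTION_LINE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
    return options


def diff_options(actual, expected):
    """Return {name: (actual_value, expected_value)} for options that differ."""
    return {
        name: (actual.get(name), want)
        for name, want in expected.items()
        if actual.get(name) != want
    }


# Values suggested in the thread; "2" is assumed to be the numeric form of GLOBAL.
SUGGESTED = {"BATCH_SIZE": "0", "DISABLE_CODEGEN": "0", "RUNTIME_FILTER_MODE": "2"}

# A few lines copied from the option dump quoted in this thread.
SAMPLE = """
> >         BATCH_SIZE: [0]
> >         DISABLE_CODEGEN: [0]
> >         REQUEST_POOL: []
> >         RUNTIME_FILTER_MODE: [2]
"""

if __name__ == "__main__":
    # Prints the options whose values differ from the suggested ones.
    print(diff_options(parse_query_options(SAMPLE), SUGGESTED))
```

An empty result means the dumped options already match the suggested values, which would be consistent with the later reply that the query options look good.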
