Re: performance issue on big table join

俊杰陈 Wed, 01 Nov 2017 19:00:48 -0700

+user list

2017-11-02 9:57 GMT+08:00 俊杰陈 <[email protected]>:


> Hi Mostafa
>
> Cheng already posted the profile in the thread.
>
> Here is another profile, this one from the Impala release version. You can
> also see it in the attachment.
>
>
> 2017-11-02 9:30 GMT+08:00 Mostafa Mokhtar <[email protected]>:
>
>> Attaching the query profile will be most helpful to investigate this
>> issue.
>>
>> If you can capture the profile from the WebUI on the coordinator node it
>> would be great.
>>
>> On Wed, Nov 1, 2017 at 6:22 PM, 俊杰陈 <[email protected]> wrote:
>>
>> > Thanks Hongxu,
>> >
>> > Here are the query options on my cluster; most of them are default values.
>> > Which item do you think might have an impact?
>> >
>> >         ABORT_ON_DEFAULT_LIMIT_EXCEEDED: [0]
>> >         ABORT_ON_ERROR: [0]
>> >         ALLOW_UNSUPPORTED_FORMATS: [0]
>> >         APPX_COUNT_DISTINCT: [0]
>> >         BATCH_SIZE: [0]
>> >         COMPRESSION_CODEC: [NONE]
>> >         DEBUG_ACTION: []
>> >         DEFAULT_ORDER_BY_LIMIT: [-1]
>> >         DISABLE_CACHED_READS: [0]
>> >         DISABLE_CODEGEN: [0]
>> >         DISABLE_OUTERMOST_TOPN: [0]
>> >         DISABLE_ROW_RUNTIME_FILTERING: [0]
>> >         DISABLE_STREAMING_PREAGGREGATIONS: [0]
>> >         DISABLE_UNSAFE_SPILLS: [0]
>> >         ENABLE_EXPR_REWRITES: [1]
>> >         EXEC_SINGLE_NODE_ROWS_THRESHOLD: [100]
>> >         EXPLAIN_LEVEL: [1]
>> >         HBASE_CACHE_BLOCKS: [0]
>> >         HBASE_CACHING: [0]
>> >         MAX_BLOCK_MGR_MEMORY: [0]
>> >         MAX_ERRORS: [100]
>> >         MAX_IO_BUFFERS: [0]
>> >         MAX_NUM_RUNTIME_FILTERS: [10]
>> >         MAX_SCAN_RANGE_LENGTH: [0]
>> >         MEM_LIMIT: [0]
>> >         MT_DOP: [0]
>> >         NUM_NODES: [0]
>> >         NUM_SCANNER_THREADS: [0]
>> >         OPTIMIZE_PARTITION_KEY_SCANS: [0]
>> >         PARQUET_ANNOTATE_STRINGS_UTF8: [0]
>> >         PARQUET_FALLBACK_SCHEMA_RESOLUTION: [0]
>> >         PARQUET_FILE_SIZE: [0]
>> >         PREFETCH_MODE: [1]
>> >         QUERY_TIMEOUT_S: [0]
>> >         REPLICA_PREFERENCE: [0]
>> >         REQUEST_POOL: []
>> >         RESERVATION_REQUEST_TIMEOUT: [0]
>> >         RM_INITIAL_MEM: [0]
>> >         RUNTIME_BLOOM_FILTER_SIZE: [1048576]
>> >         RUNTIME_FILTER_MAX_SIZE: [16777216]
>> >         RUNTIME_FILTER_MIN_SIZE: [1048576]
>> >         RUNTIME_FILTER_MODE: [2]
>> >         RUNTIME_FILTER_WAIT_TIME_MS: [0]
>> >         S3_SKIP_INSERT_STAGING: [1]
>> >         SCAN_NODE_CODEGEN_THRESHOLD: [1800000]
>> >         SCHEDULE_RANDOM_REPLICA: [0]
>> >         SCRATCH_LIMIT: [-1]
>> >         SEQ_COMPRESSION_MODE: [0]
>> >         STRICT_MODE: [0]
>> >         SUPPORT_START_OVER: [false]
>> >         SYNC_DDL: [0]
>> >         V_CPU_CORES: [0]
>> >
>> > 2017-10-31 15:30 GMT+08:00 Hongxu Ma <[email protected]>:
>> >
>> > > Hi JJ
>> > > Considering it only takes 3 minutes on SparkSQL, maybe there are some
>> > > mistakes in the query options.
>> > > Try running "set;" in impala-shell and check all query options, e.g.:
>> > >     BATCH_SIZE: [0]
>> > >     DISABLE_CODEGEN: [0]
>> > >     RUNTIME_FILTER_MODE: GLOBAL
>> > >
>> > > Just a guess, thanks.
>> > >
>> > > On 27/10/2017 10:25, 俊杰陈 wrote:
>> > > The profile file is damaged. Here is a screenshot of the exec summary
>> > > [cid:ii_j999ymep1_15f5ba563aeabb91]
>> > > 
>> > >
>> > > 2017-10-27 10:04 GMT+08:00 俊杰陈 <[email protected]>:
>> > > Hi Devs
>> > >
>> > > I hit a performance issue on a big table join. The query takes more than
>> > > 3 hours on Impala and only 3 minutes on Spark SQL on the same 5-node
>> > > cluster. When running the query, the left scanner and exchange node are
>> > > very slow. Did I miss some key arguments?
>> > >
>> > > You can see the profile file in the attachment.
>> > >
>> > > [cid:ii_j9998pph2_15f5b92f2cf47020]
>> > > 
>> > > --
>> > > Thanks & Best Regards
>> > >
>> > >
>> > >
>> > > --
>> > > Thanks & Best Regards
>> > >
>> > >
>> > > --
>> > > Regards,
>> > > Hongxu.
>> > >
>> >
>> >
>> >
>> > --
>> > Thanks & Best Regards
>> >
>>
>
>
>
> --
> Thanks & Best Regards
>



-- 
Thanks & Best Regards
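
For anyone comparing option listings like the one quoted above, here is a minimal sketch (not part of Impala or impala-shell) that parses pasted "NAME: [value]" lines into a dictionary and reports where they differ from a baseline you supply. The function names `parse_query_options` and `diff_options` and the sample baseline are hypothetical; the only assumption about the input is the "NAME: [value]" layout seen in this thread, optionally preceded by e-mail quote markers.

```python
import re

# Matches lines such as "        BATCH_SIZE: [0]", optionally preceded by
# e-mail quote markers like ">> > ". Lines without brackets are ignored.
OPTION_LINE = re.compile(r"^[>\s]*([A-Z][A-Z0-9_]*):\s*\[(.*)\]\s*$")


def parse_query_options(text):
    """Return {option_name: value_string} for every 'NAME: [value]' line."""
    options = {}
    for line in text.splitlines():
        match = OPTION_LINE.match(line)
        if match:
            options[match.group(1)] = match.group(2)
    return options


def diff_options(current, baseline):
    """Return {name: (baseline_value, current_value)} where the two differ."""
    names = sorted(set(current) | set(baseline))
    return {
        name: (baseline.get(name), current.get(name))
        for name in names
        if baseline.get(name) != current.get(name)
    }


# A few lines copied from the listing in this thread.
sample = """\
>> >         BATCH_SIZE: [0]
>> >         DEBUG_ACTION: []
>> >         RUNTIME_FILTER_MODE: [2]
"""
parsed = parse_query_options(sample)

# Hypothetical baseline for illustration only; these are NOT claimed to be
# Impala's defaults. Substitute a listing captured from a known-good cluster.
baseline = {"BATCH_SIZE": "0", "DEBUG_ACTION": "", "RUNTIME_FILTER_MODE": "0"}
changed = diff_options(parsed, baseline)
```

Capturing the output of "set;" on two clusters and passing both through `parse_query_options` gives a quick way to spot an option that differs, rather than reading the two listings side by side.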
