Re: hive.query.reexecution.stats.persist.scope

2023-05-25 Thread Sungwoo Park
Hi Ayush,

Thank you for letting me know about HIVE-26978. To me, this bug seems like
a small price to pay for the huge benefit from persisting runtime
statistics. Setting the config to 'hiveserver' also seems like a good
compromise.

Thanks,

--- Sungwoo

On Thu, May 25, 2023 at 1:49 AM Ayush Saxena wrote:

> Hi Sungwoo,
>
> I know of one issue: if that config is set to 'metastore', the entries are
> persisted in the *RUNTIME_STATS* table, and if you drop a table the entry
> still stays. So if you drop a table, recreate it, and run a query before
> *RuntimeStatsCleanerTask* can clean the stale entry, your plans for the
> newer table will be thrown off by the stale statistics until then.
>
> There is a ticket for that here [1]. Most probably we need to find a way
> to drop those stats on drop table, or make sure the newer tables can figure
> out that the stats are stale. The catch is that *RUNTIME_STATS* doesn't
> have a table name mapping, so doing a drop would take a good amount of
> effort. I haven't spent much time investigating, so there might be better
> ways as well.
>
> -Ayush
>
> [1] https://issues.apache.org/jira/browse/HIVE-26978
>
> On Wed, 24 May 2023 at 19:53, Sungwoo Park wrote:
>
>> Hi Hive users,
>>
>> Hive can persist runtime statistics by setting
>> hive.query.reexecution.stats.persist.scope to 'hiveserver' or 'metastore'
>> (instead of the default value 'query'). If you have experience using
>> this configuration key in production, could you share it here (for
>> example, the stability of query execution, speed improvement, etc.)?
>>
>> Thanks,
>>
>> --- Sungwoo
>>
>>


Re: hive.query.reexecution.stats.persist.scope

2023-05-24 Thread Ayush Saxena
Hi Sungwoo,

I know of one issue: if that config is set to 'metastore', the entries are
persisted in the *RUNTIME_STATS* table, and if you drop a table the entry
still stays. So if you drop a table, recreate it, and run a query before
*RuntimeStatsCleanerTask* can clean the stale entry, your plans for the
newer table will be thrown off by the stale statistics until then.

There is a ticket for that here [1]. Most probably we need to find a way to
drop those stats on drop table, or make sure the newer tables can figure out
that the stats are stale. The catch is that *RUNTIME_STATS* doesn't have a
table name mapping, so doing a drop would take a good amount of effort. I
haven't spent much time investigating, so there might be better ways as well.
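
To make the failure mode concrete, here is a toy model in Python. It is only a sketch under stated assumptions, not Hive's implementation: the class `ToyRuntimeStatsStore`, the function `signature_for`, and the idea that an entry is keyed by something derived from the query alone are all hypothetical stand-ins chosen for illustration. The only things taken from the discussion above are that persisted entries carry no table name mapping and that a cleaner task eventually removes old entries.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRuntimeStatsStore:
    """Hypothetical persisted stats store; NOT Hive's actual schema.

    Entries are keyed by a signature string and carry no table identity,
    which models the "no table name mapping" point.
    """
    max_age: float
    entries: dict = field(default_factory=dict)  # signature -> (row_count, created_at)

    def record(self, signature, row_count, now):
        self.entries[signature] = (row_count, now)

    def lookup(self, signature):
        hit = self.entries.get(signature)
        return None if hit is None else hit[0]

    def clean(self, now):
        # Age-based cleanup, loosely playing the role of a cleaner task.
        self.entries = {
            sig: (rows, created)
            for sig, (rows, created) in self.entries.items()
            if now - created < self.max_age
        }


def signature_for(query):
    # Assumption for this sketch: the key depends only on the query text,
    # so it cannot tell the old table from a recreated one with the same name.
    return " ".join(query.lower().split())


store = ToyRuntimeStatsStore(max_age=100)
tables = {"t": 1_000_000}          # table name -> actual row count
query = "SELECT * FROM t WHERE x > 0"

# A query against the original table records its observed row count.
store.record(signature_for(query), tables["t"], now=0)

# Drop and recreate the table. Nothing in the store can be matched to "t",
# so the drop cannot remove the old entry.
del tables["t"]
tables["t"] = 10

stale_estimate = store.lookup(signature_for(query))   # still the old row count
store.clean(now=200)                                   # the cleaner finally runs
estimate_after_clean = store.lookup(signature_for(query))

print(stale_estimate, tables["t"], estimate_after_clean)
```

In this model the planner would see the old row count for the recreated table until `clean` runs, which is the window described above. Either fix mentioned (dropping entries on drop table, or letting the new table detect staleness) would need some table identity stored with each entry.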

-Ayush

[1] https://issues.apache.org/jira/browse/HIVE-26978

On Wed, 24 May 2023 at 19:53, Sungwoo Park wrote:

> Hi Hive users,
>
> Hive can persist runtime statistics by setting
> hive.query.reexecution.stats.persist.scope to 'hiveserver' or 'metastore'
> (instead of the default value 'query'). If you have experience using
> this configuration key in production, could you share it here (for
> example, the stability of query execution, speed improvement, etc.)?
>
> Thanks,
>
> --- Sungwoo
>
>


hive.query.reexecution.stats.persist.scope

2023-05-24 Thread Sungwoo Park
Hi Hive users,

Hive can persist runtime statistics by setting
hive.query.reexecution.stats.persist.scope to 'hiveserver' or 'metastore'
(instead of the default value 'query'). If you have experience using
this configuration key in production, could you share it here (for example,
the stability of query execution, speed improvement, etc.)?
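
For reference, a small Python sketch that renders the corresponding property fragment for hive-site.xml. The helper name `scope_property` is hypothetical, and the fragment assumes the usual Hadoop-style `<property>` layout; the three accepted values are the ones named above.

```python
import xml.etree.ElementTree as ET

KEY = "hive.query.reexecution.stats.persist.scope"
# Values named in this message: 'query' is the default.
ALLOWED_SCOPES = ("query", "hiveserver", "metastore")


def scope_property(scope):
    """Return a <property> fragment for hive-site.xml (hypothetical helper)."""
    if scope not in ALLOWED_SCOPES:
        raise ValueError(f"scope must be one of {ALLOWED_SCOPES}, got {scope!r}")
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = KEY
    ET.SubElement(prop, "value").text = scope
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(prop, encoding="unicode")


fragment = scope_property("hiveserver")
print(fragment)
```

The printed fragment would go inside the `<configuration>` element of hive-site.xml.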

Thanks,

--- Sungwoo