[jira] [Updated] (HIVE-20529) Statistics update in S3 is taking time at target side during REPL Load

2022-10-21 Thread Stamatis Zampetakis (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-20529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stamatis Zampetakis updated HIVE-20529:
---
Fix Version/s: (was: 4.0.0)

I cleared the fixVersion field since this ticket is still open. Please review 
this ticket and, if the fix is already committed to a specific version, 
set the version accordingly and mark the ticket as RESOLVED.

According to the [JIRA 
guidelines|https://cwiki.apache.org/confluence/display/Hive/HowToContribute] 
the fixVersion should be set only when the issue is resolved/closed.

> Statistics update in S3 is taking time at target side during REPL Load
> --
>
> Key: HIVE-20529
> URL: https://issues.apache.org/jira/browse/HIVE-20529
> Project: Hive
> Issue Type: Sub-task
> Components: repl
> Affects Versions: 4.0.0
> Reporter: mahesh kumar behera
> Assignee: mahesh kumar behera
> Priority: Major
> Labels: pull-request-available
>
> The statistics operations access the file system to get the number of files 
> created by the operation. In S3 it causes 2-3 seconds of delay. The file list 
> can be obtained from the event info in the replication directory and can be 
> used to update the statistics.
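
The idea in the description can be illustrated with a minimal Java sketch. It 
is not the committed Hive change: ReplStatsSketch, EventFile, BasicStats and 
fromEventFiles are hypothetical names, and the assumption that each event 
entry carries a file size is made here only for illustration (the description 
itself mentions only the number of files). The sketch derives the statistics 
from a file list already in hand instead of listing the target directory in S3.

```java
import java.util.Arrays;
import java.util.List;

public class ReplStatsSketch {

    /** One file entry as it might appear in a replication event (hypothetical shape). */
    static final class EventFile {
        final String path;
        final long sizeBytes; // assumption: the event metadata records the size

        EventFile(String path, long sizeBytes) {
            this.path = path;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Basic statistics derived purely from the event's file list. */
    static final class BasicStats {
        final long numFiles;
        final long totalSize;

        BasicStats(long numFiles, long totalSize) {
            this.numFiles = numFiles;
            this.totalSize = totalSize;
        }
    }

    /** Computes file count and total size without any file system call. */
    static BasicStats fromEventFiles(List<EventFile> files) {
        long total = 0;
        for (EventFile f : files) {
            total += f.sizeBytes;
        }
        return new BasicStats(files.size(), total);
    }

    public static void main(String[] args) {
        // Illustrative paths only; no S3 access happens here.
        List<EventFile> files = Arrays.asList(
            new EventFile("s3a://bucket/warehouse/t1/000000_0", 1024L),
            new EventFile("s3a://bucket/warehouse/t1/000001_0", 2048L));
        BasicStats stats = fromEventFiles(files);
        System.out.println(stats.numFiles + " files, " + stats.totalSize + " bytes");
        // prints "2 files, 3072 bytes"
    }
}
```

Because the counts come from metadata that the load already reads, the 
per-operation directory listing against S3 (the 2-3 second delay described 
above) would not be needed in this sketch.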



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (HIVE-20529) Statistics update in S3 is taking time at target side during REPL Load

2018-09-10 Thread mahesh kumar behera (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-20529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-20529:
---
Description: The statistics operations access the file system to get the 
number of files created by the operation. In S3 it causes 2-3 seconds of delay. 
The file list can be obtained from the event info in the replication directory 
and can be used to update the statistics. (was: Operations like insert and add 
partition create a staging directory to generate the files and then move the 
files created to the actual location. In the replication flow, the files are 
first copied to the staging directory and then moved (renamed) to the actual 
table location. In S3, move is not an atomic operation: it internally does a 
copy and delete, so it cannot guarantee the required consistency. It is 
therefore better to copy the files directly to the actual location. This 
avoids the staging directory creation (which takes 1-2 seconds in S3) and the 
move (which takes time proportional to file size).)

> Statistics update in S3 is taking time at target side during REPL Load
> --
>
> Key: HIVE-20529
> URL: https://issues.apache.org/jira/browse/HIVE-20529
> Project: Hive
> Issue Type: Sub-task
> Components: repl
> Affects Versions: 4.0.0
> Reporter: mahesh kumar behera
> Assignee: mahesh kumar behera
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The statistics operations access the file system to get the number of files 
> created by the operation. In S3 it causes 2-3 seconds of delay. The file list 
> can be obtained from the event info in the replication directory and can be 
> used to update the statistics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)