[ 
https://issues.apache.org/jira/browse/HADOOP-12079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581058#comment-14581058
 ] 

Gil Vernik commented on HADOOP-12079:
-------------------------------------

Thanks for reviewing this.
Right, this is the case where we should have X-Newest. As you said, it is
needed when we write the same file name over an existing file and the new file
is so large that not all replicas are updated at the same time. If we then
perform an immediate GET, Swift may land on a replica that is not yet updated.
Another use for X-Newest is when we write the same file name again and one of
the replicas fails to update (for some reason). But these are not the majority
of use cases, at least for me: in my cases, new files that overwrite existing
ones are basically the same size, so all replicas are updated at about the same
time. I also don't have scripts that write and overwrite files, just normal
user operations, where users write a file and access it later.
The unit test you mention is exactly the one that needs X-Newest, since it
sends one request after another without any delay.

Another case is when the data already exists in Swift and we then use Hadoop to
access and analyze it. In this case X-Newest just forces Swift to access
all replicas, so one GET from Hadoop forces Swift to perform as many internal
requests as there are replicas, which greatly affects performance.
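
To make the trade-off concrete, here is a deliberately simplified model of the
two read paths. It is only a sketch: the class, field, and method names are
hypothetical, it is not Swift's proxy code, and it counts one backend request
per replica contacted, which is an assumption made for illustration.

```java
import java.util.List;

final class ReplicaReadSketch {
    /** One stored copy of an object (hypothetical model, not a Swift API). */
    static final class Replica {
        final String node;
        final long timestamp;
        Replica(String node, long timestamp) {
            this.node = node;
            this.timestamp = timestamp;
        }
    }

    /** Which replica answered, and how many backend requests the read cost. */
    static final class ReadResult {
        final Replica served;
        final int backendRequests;
        ReadResult(Replica served, int backendRequests) {
            this.served = served;
            this.backendRequests = backendRequests;
        }
    }

    /** Plain GET: one replica answers, so a stale copy may be returned. */
    static ReadResult plainGet(List<Replica> replicas, int pickedIndex) {
        return new ReadResult(replicas.get(pickedIndex), 1);
    }

    /** GET with X-Newest: every replica is checked and the newest one wins. */
    static ReadResult newestGet(List<Replica> replicas) {
        Replica newest = replicas.get(0);
        for (Replica r : replicas) {
            if (r.timestamp > newest.timestamp) {
                newest = r;
            }
        }
        return new ReadResult(newest, replicas.size());
    }
}
```

In this model a plain GET costs one backend request but can return an old
copy, while the X-Newest path always returns the latest copy at the cost of one
request per replica.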

I will implement your suggestions and submit another patch.
Can you please point me to the documentation that I should update?

> Make 'X-Newest' header a configurable
> -------------------------------------
>
>                 Key: HADOOP-12079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12079
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/swift
>    Affects Versions: 3.0.0, 2.6.0
>            Reporter: Gil Vernik
>            Assignee: Gil Vernik
>             Fix For: 3.0.0, 2.6.1
>
>         Attachments: x-newest-optional0001.patch, 
> x-newest-optional0002.patch, x-newest-optional0003.patch
>
>
> Current code always sends the X-Newest header to Swift. While it's true that 
> Swift is eventually consistent and X-Newest will always get the newest version 
> from Swift, in practice this header makes Swift responses very slow. 
> This header should be made optional, so that it is possible 
> to access Swift without this header and get much better performance. 
> This patch doesn't modify current behavior. Everything works as is, but there 
> is an option to provide fs.swift.service.useXNewest = false. 
> Some background on Swift and X-Newest: 
> When a GET or HEAD request is made to an object, the default behavior is to 
> get the data from one of the replicas (could be any of them). The downside to 
> this is that if there are older versions of the object (due to eventual 
> consistency) it is possible to get an older version of the object. The upside 
> is that for the majority of use cases, this isn't an issue. For the small 
> subset of use cases that need to make sure that they get the latest version 
> of the object, they can set the "X-Newest" header to "True". If this is set, 
> the proxy server will check all replicas of the object and only return the 
> newest object. The downside to this is that the request can take longer, 
> since it has to contact all the replicas. It is also more expensive for the 
> backend, so only recommended when it is absolutely needed.
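
The optional header described in the issue could be wired in roughly as
follows. This is a minimal sketch, not the code from the attached patches: the
property name fs.swift.service.useXNewest comes from the description above,
while the class name, the helper method, and the use of a plain Map in place of
Hadoop's configuration object are assumptions made to keep the example
self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class XNewestSketch {
    static final String HEADER_X_NEWEST = "X-Newest";
    // Property name taken from the issue description.
    static final String USE_X_NEWEST_KEY = "fs.swift.service.useXNewest";

    /**
     * Builds the request headers for a GET or HEAD (hypothetical helper).
     * The flag defaults to true so that existing behavior is unchanged
     * unless the user explicitly sets it to false.
     */
    static Map<String, String> buildReadHeaders(Map<String, String> conf) {
        boolean useXNewest =
            Boolean.parseBoolean(conf.getOrDefault(USE_X_NEWEST_KEY, "true"));
        Map<String, String> headers = new LinkedHashMap<>();
        if (useXNewest) {
            headers.put(HEADER_X_NEWEST, "true");
        }
        return headers;
    }
}
```

With an empty configuration the header is still sent; setting the property to
false omits it, so Swift serves the read from a single replica.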



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
