[ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16172208#comment-16172208
 ] 

Chris Douglas commented on HDFS-7878:
-------------------------------------

Any other feedback on the patch?

As an API for open-exact, one possible implementation could use the {{Options}} 
pattern used in FileContext and SequenceFile, i.e.,
{code:java}
PathHandle FileStatus::getPathHandle(Options... opts);
{code}
which would imply {{open(PathHandle)}} instead of {{open(FileStatus)}}. A few 
folks have raised the idea that ignoring some fields in the {{FileStatus}} 
instance could be confusing (if the file were renamed, or its permissions or 
modification time changed, etc.). Using {{PathHandle}} explicitly would make 
that clearer, and the Options API is one generic way to surface different FileSystem 
capabilities. However, it would mean that serializing a generic {{FileStatus}} 
object may not be sufficient to implement the contract, unless its serialized 
form supports all possible Options. This could be handled by a method on 
FileSystem i.e.,
{code:java}
PathHandle FileSystem::getPathHandle(FileStatus status, Options... opts);
{code}
This is cleaner in some respects, particularly if guaranteeing an option 
requires an RPC to get/set state in the FileSystem. It also implies that the 
{{PathHandle}} is sufficient to serialize across processes. It does nothing to 
prevent crossing {{PathHandle}} instances across FileSystems, unless the 
FileSystem serializes a guard into each instance.
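
To make the second shape concrete, here is a rough, self-contained sketch. It is a toy model, not the patch: {{ToyFileSystem}}, {{Status}}, {{Handle}}, the {{HandleOpt}} values and the string {{fsId}} guard are all hypothetical names invented for illustration, and the "no options means open-exact" default is just the assumption discussed in this comment.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Toy model only: none of these types are the real Hadoop classes.
final class PathHandleSketch {

  /** Hypothetical options: which changes a handle should tolerate. */
  enum HandleOpt { ALLOW_MOVED, ALLOW_CHANGED }

  /** Stand-in for FileStatus: path, inode ID, modification time. */
  static final class Status {
    final String path;
    final long inodeId;
    final long mtime;
    Status(String path, long inodeId, long mtime) {
      this.path = path;
      this.inodeId = inodeId;
      this.mtime = mtime;
    }
  }

  /** Opaque reference; fsId is the guard against cross-FileSystem use. */
  static final class Handle {
    final String fsId;
    final Status seen;
    final Set<HandleOpt> opts;
    Handle(String fsId, Status seen, Set<HandleOpt> opts) {
      this.fsId = fsId;
      this.seen = seen;
      this.opts = opts;
    }
  }

  static final class ToyFileSystem {
    private final String fsId;
    private final Map<Long, Status> byInode = new HashMap<>();

    ToyFileSystem(String fsId) { this.fsId = fsId; }

    Status create(String path, long inodeId, long mtime) {
      Status s = new Status(path, inodeId, mtime);
      byInode.put(inodeId, s);
      return s;
    }

    void rename(long inodeId, String newPath) {
      Status old = byInode.get(inodeId);
      byInode.put(inodeId, new Status(newPath, inodeId, old.mtime));
    }

    void touch(long inodeId, long newMtime) {
      Status old = byInode.get(inodeId);
      byInode.put(inodeId, new Status(old.path, inodeId, newMtime));
    }

    /** No options means open-exact: same path and same modification time. */
    Handle getPathHandle(Status status, HandleOpt... opts) {
      Set<HandleOpt> set = EnumSet.noneOf(HandleOpt.class);
      for (HandleOpt o : opts) {
        set.add(o);
      }
      return new Handle(fsId, status, set);
    }

    /** Resolves the handle, enforcing the guard and the requested options. */
    Status open(Handle h) {
      if (!fsId.equals(h.fsId)) {
        throw new IllegalArgumentException("handle from another FileSystem");
      }
      Status cur = byInode.get(h.seen.inodeId);
      if (cur == null) {
        throw new IllegalStateException("file no longer exists");
      }
      if (!h.opts.contains(HandleOpt.ALLOW_MOVED) && !cur.path.equals(h.seen.path)) {
        throw new IllegalStateException("file was moved");
      }
      if (!h.opts.contains(HandleOpt.ALLOW_CHANGED) && cur.mtime != h.seen.mtime) {
        throw new IllegalStateException("file was modified");
      }
      return cur;
    }
  }

  public static void main(String[] args) {
    ToyFileSystem fs = new ToyFileSystem("fs-a");
    Status st = fs.create("/data/part-0", 42L, 1000L);
    Handle exact = fs.getPathHandle(st);
    Handle byId = fs.getPathHandle(st, HandleOpt.ALLOW_MOVED, HandleOpt.ALLOW_CHANGED);
    fs.rename(42L, "/data/part-0.bak");
    System.out.println("by-id open resolves to " + fs.open(byId).path);
    try {
      fs.open(exact);
    } catch (IllegalStateException e) {
      System.out.println("exact open rejected: " + e.getMessage());
    }
  }
}
```

In this sketch the handle carries everything {{open}} needs, so a serialized {{Status}} never has to encode the options, and the {{fsId}} check is one possible form of the per-instance guard mentioned above.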

This is all shuffling around a few APIs; the functionality is similar. Setting 
a default of open-exact is probably what most users expect, and what most 
FileSystems (S3, WASB) will implement. [~sershe], could you be more explicit 
about the use in Hive? Do you need open-by-inodeID to resolve to any version of 
the file?

It's worth mentioning that this spec is incomplete. Even if the open includes 
guards, the stream is still subject to whatever the FileSystem supports. So a 
stream from a consistent open could still see stale or updated state.
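
A small illustration of that gap, as a toy model with hypothetical names ({{ToyFile}}, {{ToyStream}}, {{openExact}} and the version counter are all made up here and do not describe any particular FileSystem): the guard is checked once at open time, and later reads go through to whatever the store currently holds.

```java
import java.nio.charset.StandardCharsets;

// Toy model only: shows a guard checked at open time, not on each read.
final class StaleStreamSketch {

  /** Toy file: contents plus a version bumped on every overwrite. */
  static final class ToyFile {
    private byte[] data;
    private long version;

    ToyFile(String contents) {
      this.data = contents.getBytes(StandardCharsets.UTF_8);
      this.version = 1L;
    }

    synchronized void overwrite(String contents) {
      data = contents.getBytes(StandardCharsets.UTF_8);
      version++;
    }

    synchronized long version() { return version; }

    /** Returns the byte at pos, or -1 past the end of the current contents. */
    synchronized int byteAt(int pos) {
      return pos < data.length ? data[pos] : -1;
    }
  }

  /** Stream that reads through to the live file; it holds no snapshot. */
  static final class ToyStream {
    private final ToyFile file;
    private int pos;

    ToyStream(ToyFile file) { this.file = file; }

    int read() {
      int b = file.byteAt(pos);
      if (b >= 0) {
        pos++;
      }
      return b;
    }
  }

  /** Guarded open: fails if the file changed since expectedVersion was seen. */
  static ToyStream openExact(ToyFile file, long expectedVersion) {
    if (file.version() != expectedVersion) {
      throw new IllegalStateException("file changed since handle was created");
    }
    return new ToyStream(file);
  }

  public static void main(String[] args) {
    ToyFile f = new ToyFile("abc");
    ToyStream in = openExact(f, f.version());
    System.out.println((char) in.read()); // read before the overwrite
    f.overwrite("xyz");
    System.out.println((char) in.read()); // read after the overwrite
  }
}
```

Here the open succeeds against the expected version, yet the second read observes the overwritten contents. Whether a real stream behaves this way, pins a snapshot, or fails is up to the FileSystem.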

/cc [~anu], [~andrew.wang], [~ste...@apache.org]

> API - expose an unique file identifier
> --------------------------------------
>
>                 Key: HDFS-7878
>                 URL: https://issues.apache.org/jira/browse/HDFS-7878
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>              Labels: BB2015-05-TBR
>         Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, 
> HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, 
> HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, 
> HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, 
> HDFS-7878.12.patch, HDFS-7878.patch
>
>
> See HDFS-487.
> Even though that is resolved as duplicate, the ID is actually not exposed by 
> the JIRA it supposedly duplicates.
> INode ID for the file should be easy to expose; alternatively ID could be 
> derived from block IDs, to account for appends...
> This is useful e.g. for cache key by file, to make sure cache stays correct 
> when file is overwritten.



