[jira] [Commented] (HDFS-3370) HDFS hardlink

2013-05-14 Thread Michael Segel (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13657202#comment-13657202
 ] 

Michael Segel  commented on HDFS-3370:
--

Is this still active?  The last entry seems to be on Sept 12th... has there 
been any progress? 
While the underlying issue is HDFS, it seems that this is more about HBase than 
HDFS. 

One comment... with respect to hard links over multiple namespaces... why? 

I mean hardlinks should exist only within the same namespace, which would 
remove this roadblock.

If you were going between namespaces, then use a symbolic link. 

Thx...


 HDFS hardlink
 -

 Key: HDFS-3370
 URL: https://issues.apache.org/jira/browse/HDFS-3370
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Hairong Kuang
Assignee: Liyin Tang
 Attachments: HDFS-HardLink.pdf


 We'd like to add a new feature, hardlink, to HDFS that allows hardlinked 
 files to share data without copying. Currently we will support hardlinking 
 only closed files, but it could be extended to unclosed files as well.
 Among the many potential use cases of the feature, the following two are 
 primarily used at Facebook:
 1. This provides a lightweight way for applications like HBase to create a 
 snapshot;
 2. This also allows an application like Hive to move a table to a different 
 directory without breaking currently running Hive queries.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-09-12 Thread Jagane Sundar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454144#comment-13454144
 ] 

Jagane Sundar commented on HDFS-3370:
-

Thanks for the pointer to HBASE-6055, Jesse. I just skimmed it, but it is an 
excellent write-up you have there. Your rationale for using HBase timestamps 
rather than an actual point in time is well taken. My own experience is with 
writing software to back up a single running VM, so my previous comment did 
talk about an actual PIT.

I did not catch this in my skimming of HBASE-6055, so maybe you can clarify - 
when using HBase timestamps to create backups, can we guarantee that the next 
backup will include all PUTs that were made after the previous snapshot? No 
PUTs will fall through the cracks, right?



[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-09-12 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13454170#comment-13454170
 ] 

Jesse Yates commented on HDFS-3370:
---

@Jagane - in short, yes. With the PIT split, any writes up to that point will 
go into the snapshot. Obviously, we can't ensure that future writes beyond the 
taking of the snapshot end up in the snapshot. Some writes can get dropped 
between snapshots though if you don't have your TTLs set correctly, since a 
compaction can age-off the writes before the snapshot can be taken. This is 
part of an overall backup solution, and not really the concern of the mechanism 
for taking snapshots - that's up to you :) Feel free to DM me if you want to 
chat more.



[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-09-11 Thread Jagane Sundar (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453183#comment-13453183
 ] 

Jagane Sundar commented on HDFS-3370:
-

Pardon my naive question - but are hard links adequate for the purposes of 
HBase backup? The first line in this JIRA says "This provides a lightweight 
way for applications like HBase to create a snapshot."

Perhaps HBase experts can answer this question: Are single file hard links 
adequate for HBase backup? Don't you want a Point In Time snapshot of the 
entire filesystem, or at least all the files under the HBase data directory?

Don't you really want a sequence of events such as:
1. Flush all HBase MemStores
2. Quiesce HBase, i.e. get it to stop writing to HDFS
3. Call the underlying HDFS to create a point-in-time (PIT) read-only snapshot 
with COW semantics
4. Tell HBase to end the quiesce, i.e. it can start writing to HDFS again
5. The backup program now reads from the RO snapshot and writes to the backup 
device, while HBase continues to write to the real directory tree
6. When the backup program is done, it deletes the RO snapshot
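The sequence above can be sketched as a small orchestration function. All of the method names below are hypothetical stand-ins for the six steps, not real HBase or HDFS APIs:

```python
# Sketch of the proposed backup sequence using hypothetical client objects.
# None of these method names are real HBase/HDFS APIs; they only mirror the
# six steps above.

def backup_with_pit_snapshot(hbase, hdfs, backup_dev):
    hbase.flush_all_memstores()           # 1. flush MemStores to HFiles
    hbase.quiesce()                       # 2. stop writes to HDFS
    try:
        snap = hdfs.create_ro_snapshot()  # 3. PIT read-only COW snapshot
    finally:
        hbase.unquiesce()                 # 4. writes may resume immediately
    backup_dev.copy_from(snap)            # 5. read snapshot, write backup
    hdfs.delete_snapshot(snap)            # 6. drop the snapshot when done
```

The key property is that HBase is only quiesced for steps 2-3; the slow copy in step 5 runs against the immutable snapshot while writes continue.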




[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-09-11 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453667#comment-13453667
 ] 

Jesse Yates commented on HDFS-3370:
---

@Jagane - with HBASE-6055 (currently in review) you get a flush (more or less 
coordinated between regionservers - see the jira for more info) of the memstore 
to HFiles, which we would then _love_ to hardlink into the snapshot directory. 
HFiles live under the region directory - which lives under the column family 
and table directories - where the HFile is being served. When a compaction 
occurs, the file is moved to the .archive directory. Currently, we are getting 
around the hardlink issue by referencing the HFiles by name and then using a 
FileLink (also in review) to deal with the file getting archived out from 
under us when we restore the table. 

The current implementation of snapshots in HBase is pretty close to what you 
are proposing (and almost identical for 'globally consistent' - cross-server-
consistent - snapshots, but those quiesce for far too long to ensure 
consistency), but spends minimal time blocking. 

In short, hardlinks make snapshotting easier, but we still need both parts to 
get 'clean' restores. Otherwise, we need to do a WAL replay from the COW 
version of the WAL to get back in-memory state.

Does that make sense/answer your question?



[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-07-13 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13413519#comment-13413519
 ] 

Sanjay Radia commented on HDFS-3370:


bq. Is there any reason to allow cross-namespace hardlinks? Why not just 
return EXDEV or equivalent?...

I agree. Unix does not allow hardlinks across volumes.
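The restriction being agreed on here can be sketched as a guard on the link call, mirroring how link(2) fails with EXDEV across filesystems. The NamespaceServer class is purely hypothetical:

```python
import errno

# Hypothetical sketch of a namenode rejecting cross-namespace hardlinks,
# the same way link(2) fails with EXDEV across filesystems.

class NamespaceServer:
    def __init__(self, namespace_id):
        self.namespace_id = namespace_id
        self.links = {}  # link path -> target path

    def create_hardlink(self, target_namespace, target, link):
        if target_namespace != self.namespace_id:
            # same contract as Unix: no hardlinks across volumes
            raise OSError(errno.EXDEV, "cross-namespace hardlink not allowed")
        self.links[link] = target
```

Callers wanting a cross-namespace reference would get a clear, immediate failure and could fall back to a symlink instead.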





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-07-04 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406322#comment-13406322
 ] 

Konstantin Shvachko commented on HDFS-3370:
---

bq. to leverage ZooKeeper

Correct. With ZK you get all the necessary coordination of the distributed 
updates. Plus you can store ref counts in ZNodes - no need for special inodes.
In the end, hardlinks are not the goal in themselves, but a tool for doing 
e.g. HBase snapshots.





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-07-03 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13405654#comment-13405654
 ] 

Jesse Yates commented on HDFS-3370:
---

Sorry for the slow reply, been a bit busy of late...
@Daryn
bq. Retaining ref-counted paths after deletion in the origin namespace requires 
an inode id. A new api to reference paths based on the id is required. We 
aren't so soft anymore...

That's why I'd argue for doing it in file metadata with periodic rewrites, so 
we can just do appends. We will still need to maintain references if we do 
hardlinks, so this is just a single method call to do the update - arguably a 
pretty simple code path that doesn't need to be highly optimized for multiple 
writers, since we can argue that hardlinks are rare. 

bq. The inode id needs to be secured since it bypasses all parent dir 
permissions, 

Yeah, that's a bit of a pain... Maybe a bit more metadata to store with the 
file...?

@Konstantin
bq. Do I understand correctly that your hidden inodes can be regular HDFS 
files, and that then the whole implementation can be done on top of existing 
HDFS, as a stand alone library supporting calls

Yeah, I guess that's a possibility. But you would probably need to have some 
sort of namespace managers to deal with handling hardlinks across different 
namespaces, which fits comfortably with the distributed namenode design. 

bq. ref-counted links, creating hidden only accessible to the namenode 
inodes, leases on arbitrated NN ownership, retention of deleted files with 
non-zero ref count, etc. Those aren't client-side operations.

Since you keep the data along with the file (including the current file owner), 
you could do it all from a library. However, since the lease needs to be 
periodically regained, you will see temporary unavailability of the hardlinked 
files in the managed namespace. If you couple the hardlink management with the 
namenode managing the space, you can then do a forced reassignment of the 
hardlinks to the backup namenode and still get the same availability, in terms 
of creating new hardlinks, as for other files in that namespace (reads would 
still work, since all the important data can be replicated across the 
different namespaces).

@Andy: I don't know if I've seen a compelling reason that we _need_ to have 
cross-namespace hardlinks, particularly since they are _hard_, to say the 
least. 





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-07-03 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13405655#comment-13405655
 ] 

Jesse Yates commented on HDFS-3370:
---

Another, simpler way to do hardlinks with cross-server coordination (which in 
reality needs something like Paxos, or means suffering some more unavailability 
to ensure consistency) would be to leverage ZooKeeper. Yes, -1 for adding 
another piece of infrastructure, but it does provide all the cross-namespace 
transactionality we need and makes reference counting and security management 
significantly easier. Not quite client-library easy, but pretty darn close :)





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-07-03 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13406011#comment-13406011
 ] 

Andy Isaacson commented on HDFS-3370:
-

{quote}
you keep the data along with the file (including the current file owner), you 
could do it all from a library.
{quote}
The fundamental problem with 'do it in a client library' is that there are 
always clients who do not or cannot use the library. Then the symlinks don't 
work for those clients. I think it's pretty clear that 'open(2) transparently 
follows symlinks unless you ask it not to' is a superior developer experience 
to 'open the file; if it is a .LNK, follow the link; else return' - even if 
there's a helper implementing the latter.
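Andy's point can be illustrated with a toy resolver: under a client-side scheme, every reader has to run something like the helper below, and any client that opens the pointer file directly gets the link contents instead of the data. This is pure illustration, not a real HDFS or Windows mechanism:

```python
# Illustration of a client-side '.lnk'-style scheme: readers must resolve
# the pointer themselves; clients that skip this helper read the pointer
# file's contents instead of the real data. Not a real HDFS mechanism.

def read_resolving_links(path, max_hops=8):
    for _ in range(max_hops):
        if not path.endswith(".lnk"):
            with open(path) as f:
                return f.read()
        with open(path) as f:
            path = f.read().strip()   # pointer file holds the target path
    raise RuntimeError("too many levels of links")
```

A kernel-side (or here, namenode-side) implementation makes this helper unnecessary, which is exactly why transparent resolution is the better experience.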





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-07-02 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13405142#comment-13405142
 ] 

Konstantin Shvachko commented on HDFS-3370:
---

Jesse, thanks for the detailed proposal. It thoroughly addresses the 
complexity of the issues related to a hard link implementation in a 
distributed environment.
Do I understand correctly that your hidden inodes can be regular HDFS files, 
and that the whole implementation can then be done on top of existing HDFS, as 
a stand-alone library supporting calls like createHardLink() and 
deleteHardLink()? Applications would then use these methods if they want the 
functionality.
Just trying to answer Sanjay's questions using your design as an example.





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-07-02 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13405195#comment-13405195
 ] 

Daryn Sharp commented on HDFS-3370:
---

Jesse describes NNs proxying requests to each other to create and manage the 
ref-counted links, creating hidden inodes accessible only to the namenode, 
leases on arbitrated NN ownership, retention of deleted files with non-zero 
ref count, etc. Those aren't client-side operations.

Hardlinks cannot be implemented with a client library.  The best you can hope 
for on the client-side is managed symlinks that are advisory in nature.  
Clients not using the library will ruin the scheme.





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-07-02 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13405228#comment-13405228
 ] 

Andy Isaacson commented on HDFS-3370:
-

The Windows .lnk file scheme is a pretty awful disaster, I hope that we don't 
implement a similar scheme in HDFS.  I don't know of an example of a 
client-side shortcut scheme that worked out well (though I'd be interested to 
hear of any examples).

Is there any reason to allow cross-namespace hardlinks?  Why not just return 
EXDEV or equivalent? As an even more restrictive example, AFS only permits 
hardlinks within a single directory (not even between subdirectories).

So long as failures are clearly communicated, it seems to me that it's OK to 
have a pretty restrictive implementation.





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-27 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402620#comment-13402620
 ] 

Jesse Yates commented on HDFS-3370:
---

I'd like to propose an alternative to 'real' hardlinks: reference-counted 
soft-links, or all the hardness you really need in a distributed FS.

In this implementation of hard links, I would propose that the namespace where 
the file is created is considered the owner of that file. Initially, when 
created, the file has a reference count of (1) in the local namespace. If you 
want another hardlink to the file in the same namespace, you then talk to the 
NN and request another handle to that file, which implicitly updates the 
references to the file. The reference to that file could be stored in memory 
(and journaled) or written as part of the file metadata (more on that later, 
but let's ignore that for the moment). 

Suppose instead that you are in a separate namespace and want a hardlink to the 
file in the original namespace. Then you would make a request to your NN (NNa) 
for a hardlink. Since NNa doesn't own the file you want to reference, it makes 
a hardlink request to the NN which originally created the file, the file 
'owner' (NNb). NNb then says 'Cool, I've got your request' and increments the 
ref-count for the file. NNa can then grant your request and give you a link to 
that file. The failure cases here are:
1) NNb goes down, in which case you can just keep around the reference requests 
and batch them when NNb comes back up.
2) NNa goes down mid-request - if NNb doesn't receive an ACK back for the 
granted request, it can then disregard that request and re-decrement the count 
for that hardlink. 

Deleting the hardlink then follows a similar process. You issue a request to 
the owner NN, either directly from the client if you are deleting a link in the 
current namespace or through a proxy NN to the original namenode. It then 
decrements the reference count on the file and allows the deletion of the link. 
If the reference count ever hits 0, then the NN also deletes the file since 
there are no valid references to that file. 
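The create/link/delete protocol described above can be sketched as a toy simulation of the owner NN's bookkeeping. Class and method names are invented for illustration; a real namenode would journal these updates:

```python
# Toy simulation of the owner-NN ref-count protocol described above.
# Names are invented for illustration; a real NN would journal updates.

class OwnerNameNode:
    def __init__(self):
        self.files = {}  # file id -> reference count

    def create_file(self, file_id):
        self.files[file_id] = 1           # creation is the first reference

    def add_hardlink(self, file_id):
        # request may arrive directly or be proxied from another NN (NNa)
        self.files[file_id] += 1

    def delete_hardlink(self, file_id):
        self.files[file_id] -= 1
        if self.files[file_id] == 0:
            del self.files[file_id]       # no references left: delete file

    def exists(self, file_id):
        return file_id in self.files
```

The point of the sketch is that deletion of the underlying data is driven entirely by the count reaching zero at the owner, regardless of which namespace issued the delete.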

This implies, though, that the file will not be visible in the namespace that 
created it if all the hardlinks to it there are removed. This means it 
essentially becomes a 'hidden' inode. We could, in the future, also work out a 
mechanism to transfer the hidden inode to a NN that has valid references to it 
(maybe via a gossip-style protocol), but that would be out of the current 
scope.

There are some implications of this model. If the owner NN manages the 
ref-count in memory and that NN goes down, its whole namespace becomes 
inaccessible, including _creating new hardlinks_ to any of the files (inodes) 
that it owns. However, the owner NN going down doesn't preclude the other NNs 
from serving the file from their own 'soft' inodes. 

Alternatively, the NN could hold a lock on a hardlinked file, with the 
ref-counts and ownership info in the file metadata. This might introduce some 
overhead when creating new hardlinks (you need to reopen and modify the block, 
or periodically write a new block with the new information - this latter 
actually opens a route to do ref-count management via appends to a file-ref 
file), but has the added advantage that if the owner NN crashed, an 
alternative NN could come and claim ownership of that file. This is similar to 
doing Paxos-style leader election for a given hardlinked file, combined with 
leader leases. However, this is very unlikely to see lots of fluctuation, as 
the leader can just reclaim the leader token via appends to the file-owner 
file, with periodic rewrites to minimize file size. 

The on-disk representation of the extreme version I'm proposing is then this: 
the full file is actually composed of three pieces: (1) the actual data, plus 
two metadata files, or extents (to add a new word/definition): (2) an 
external-reference extent - each time a reference is made to the file, a new 
count is appended, and it can periodically be recompacted to a single value - 
and (3) an owner extent with the current NN owner and the lease time on the 
file, dictating who controls overall deletion of the file (since ref counts 
are done via the external-ref extent). This means (2) and (3) are hidden 
inodes, only accessible to the namenode. We can minimize overhead on these 
file extents by ensuring a single writer via messaging to the owner NN (as 
specified by the owner extent), though this is not strictly necessary.

Further, (1) could become a hidden inode if all the local namespace references 
are removed, but it could eventually be transferred over to another NN shard 
(namespace) to keep overhead at a minimum, though (again), this is not a strict 
necessity.

The design retains the NN view of files as directory entries, just entries with 
a little bit of metadata. The metadata 

[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-27 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13402735#comment-13402735
 ] 

Daryn Sharp commented on HDFS-3370:
---

Nice idea, but I think it gets much more complicated.  Retaining ref-counted 
paths after deletion in the origin namespace requires an inode id, and a new 
API to reference paths based on that id.  We aren't so soft anymore...

The inode id needs to be secured, since it bypasses all parent directory 
permissions; yet the id should be identical for all links so that copy 
utilities can recognize identical inodes.

Now comes the worst part: the client.  Will the NNs proxy filesystem stream 
operations to each other with a secure API for referencing inode ids?  Or will 
they redirect the client to the origin NN?  If they redirect, how do we protect 
against the client guessing ids, or saving them for later replay even when the 
directory permissions prevent access?

 HDFS hardlink
 -

 Key: HDFS-3370
 URL: https://issues.apache.org/jira/browse/HDFS-3370
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: Hairong Kuang
Assignee: Liyin Tang
 Attachments: HDFS-HardLink.pdf


 We'd like to add a new feature hardlink to HDFS that allows hardlinked files 
 to share data without copying. Currently we will support hardlinking only 
 closed files, but it could be extended to unclosed files as well.
 Among many potential use cases of the feature, the following two are 
 primarily used in facebook:
 1. This provides a lightweight way for applications like hbase to create a 
 snapshot;
 2. This also allows an application like Hive to move a table to a different 
 directory without breaking current running hive queries.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-25 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13400658#comment-13400658
 ] 

Sanjay Radia commented on HDFS-3370:


Konstantine
* How can one implement hard links in a library?  If you have an alternate 
library implementation in mind, please explain.
* I am fine with restricting hard links and renames to volumes; this should 
then give you the freedom to implement a distributed NN.





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-20 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13397478#comment-13397478
 ] 

Daryn Sharp commented on HDFS-3370:
---

@Sanjay:  Good point about simply recording the length.  It would preclude 
random-write (not proposing it, only mentioning it since it was cited earlier), 
but a feature like that would require significant other changes, so its 
integration with hardlinks could be deferred until if/when it is implemented.  
If snapshots are implemented using COW hardlinks, then we should consider 
duplicating the inode to preserve all metadata, i.e. not just the length at the 
time of the snapshot.
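Sanjay's "record the length" idea can be sketched as a toy model. It assumes 
only that HDFS files are append-only; AppendOnlyFile and LengthSnapshot are 
invented names, not HDFS classes:

```python
class AppendOnlyFile:
    """Stand-in for an HDFS file: writes may only append."""
    def __init__(self):
        self.data = bytearray()

    def append(self, more):
        self.data.extend(more)


class LengthSnapshot:
    """Hypothetical COW-hardlink snapshot: same data, fixed visible length."""
    def __init__(self, f):
        self.f = f
        self.length = len(f.data)       # the only state the snapshot must record

    def read(self):
        return bytes(self.f.data[:self.length])


f = AppendOnlyFile()
f.append(b"hello ")
snap = LengthSnapshot(f)
f.append(b"world")                      # later appends are invisible to the snapshot
assert snap.read() == b"hello "
assert bytes(f.data) == b"hello world"
```

Duplicating the full inode, as suggested above, would extend this from pinning 
just the length to pinning permissions, times, and other metadata as well.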





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-20 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13397700#comment-13397700
 ] 

Konstantin Shvachko commented on HDFS-3370:
---

Sanjay, you are taking a quote out of context. It has been explained above what 
"hard" means. Please scan through. One more example:
It is well understood why traditional hard links are not allowed across 
volumes. A distributed namespace is like dynamically changing volumes. You can 
restrict a link to a single volume, but the names can flow to different volumes 
later on.

I am not proposing to remove the existing complexity from the system; I propose 
not to introduce more of it. In the distributed case, consistent hard links 
need Paxos-like algorithms. They are not the elementary operations of which the 
API should solely be composed.
Hard links can be implemented as a library using ZK, which will hold up in the 
distributed case.

A couple of quotes from Sanjay's (and my) favorite author:
- When in doubt, leave it out. If there is a fundamental theorem of API design, 
this is it. You can always add things later, but you can't take them away.
- APIs must coexist peacefully with the platform, so do what is customary. It 
is almost always wrong to transliterate an API from one platform to another.
- Consider the performance consequences of API design decisions ...





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-19 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13396592#comment-13396592
 ] 

Lars Hofhansl commented on HDFS-3370:
-

Hardlinks would be used for temporary snapshotting (not to hold the backup 
itself).

Anyway... Since there's strong opposition to this, at Salesforce we'll either 
come up with something else, maintain local HDFS patches, or use a different 
file system.






[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-19 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13396769#comment-13396769
 ] 

Daryn Sharp commented on HDFS-3370:
---

I understand hardlinks likely aren't meant to be.  However, I'd like to point 
out:
* Hardlinks cannot be implemented at a library level.  The n-many directory 
entries must be able to reference the same inode, which, unlike a symlink, is 
not bound by the permissions used to access any of the other paths to the 
hardlink.  Filesystem support is required.
* Hardlinks shouldn't rule out the possibility of random-write (not suggesting 
it; it was brought up earlier).  There may need to be some changes to the lease 
manager to apply the lease to the underlying inode instead of the path.
* Hardlinks for backup aren't sufficient except by convention.  That's where 
snapshots using hardlinks + COW blocks are interesting.  COW blocks also open 
the door to zero-write copies.





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-19 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13396961#comment-13396961
 ] 

Jesse Yates commented on HDFS-3370:
---

Maybe I'm missing something here...

bq. Backup itself only becomes safe if HDFS (not HBase) promises to never 
modify a file once it is closed. Otherwise, a process that accidentally writes 
into the hard-linked file will corrupt both copies

At least for the HBase case, if we set the file permissions to 744, only an 
HBase process could mess up the file (which it won't do once we close the 
file), and an errant process can only slow down other reader processes. That 
would make it sufficient at least for HBase backups, but clearly not for 
general HDFS backups.







[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-19 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13397040#comment-13397040
 ] 

Sanjay Radia commented on HDFS-3370:


 ... hard links ... are very hard to support when the namespace is distributed
There are many things that are hard in a distributed namenode. For example, 
rename is also hard; I recall discussing the challenges of renames in a 
distributed NN with Konstantine. Do we remove such things from HDFS?





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-19 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13397049#comment-13397049
 ] 

Sanjay Radia commented on HDFS-3370:


We should consider two kinds of hard-links: normal and COW. COW-HardLinks are 
easy since HDFS only allows append and hence one needs to simply record the 
length.





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-18 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13396306#comment-13396306
 ] 

Konstantin Shvachko commented on HDFS-3370:
---

 the key question: What services should a file system provide?

Exactly so. I would clarify it as: What functions should be a part of the file 
system API and what should be a library function.

 The same argument could be made for symbolic links. The application could 
 implement those (in fact it's quite simple).

Simple is the key point here. Simple functions should be fs APIs. Hard 
functions should go into libraries.

Daryn, you are right that there is a lot of overlap, and yes, hardlinks 
simplify building snapshots, but you are just pushing the complexity onto the 
HDFS layer. This does not change the difficulty of the problem.

We relaxed POSIX semantics in many aspects in HDFS for simplicity and 
performance. Imagine how much easier life would be with random writes or 
multiple writers. You are not asking for it, right?

Hardlinks are of a similar nature. They are hard to support if the namespace is 
distributed. They should not be part of the HDFS API, but they could be a 
library function.





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-18 Thread Jesse Yates (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13396311#comment-13396311
 ] 

Jesse Yates commented on HDFS-3370:
---

bq. Hardlinks are of similar nature. They are hard to support if the namespace 
is distributed. 

FWIW, Ceph also punts on distributed hardlinks and just puts them onto a single 
node, because they are not commonly used and not likely to be hot or large 
(paraphrasing). Conceptually, you could do it with 2PC across nodes, which 
should be fine as long as the namespace isn't sharded too widely (1000s+ of 
nodes hosting hardlink information; again, there are not too many hardlinks).

From an HBase perspective, the hardlink count _could_ become large (roughly 
equal to the number of hfiles), but that isn't going to be near the total 
number of files currently in HDFS. Maybe punt on the issue until it becomes a 
problem, keeping it flexible behind an interface?
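The 2PC idea mentioned above can be sketched as a toy coordinator. Everything 
here (Shard, create_hardlink, the op tuples) is invented for illustration; it 
is not an HDFS or Ceph API:

```python
class Shard:
    """One namespace shard; can prepare/commit/abort link transactions."""
    def __init__(self):
        self.prepared = set()
        self.entries = {}               # directory-entry name -> inode id

    def prepare(self, txid, op):
        # Vote yes only if the operation can be applied (target name is free).
        if op[0] == "link" and op[1] in self.entries:
            return False
        self.prepared.add(txid)
        return True

    def commit(self, txid, op):
        if op[0] == "link":
            self.entries[op[1]] = op[2]
        self.prepared.discard(txid)

    def abort(self, txid):
        self.prepared.discard(txid)


def create_hardlink(src_shard, dst_shard, txid, name, inode_id):
    """Bump the refcount on the source shard and add the entry on the target,
    atomically via two-phase commit; abort both if either shard votes no."""
    ops = [(src_shard, ("refcount+1", inode_id)),
           (dst_shard, ("link", name, inode_id))]
    if all(shard.prepare(txid, op) for shard, op in ops):
        for shard, op in ops:
            shard.commit(txid, op)
        return True
    for shard, _ in ops:
        shard.abort(txid)
    return False


a, b = Shard(), Shard()
assert create_hardlink(a, b, txid=1, name="t1/file", inode_id=42) is True
assert b.entries["t1/file"] == 42
# A second link to an occupied name aborts cleanly on both shards.
assert create_hardlink(a, b, txid=2, name="t1/file", inode_id=42) is False
```

The cost Jesse alludes to is visible even in the toy: every cross-shard link 
takes two round trips per shard, which is tolerable only if hardlink creation 
is rare relative to ordinary namespace operations.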





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-18 Thread M. C. Srivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13396466#comment-13396466
 ] 

M. C. Srivas commented on HDFS-3370:


The fact that HBase wants to use hard-links for backup does not make the backup 
itself safe. Backup only becomes safe if HDFS (not HBase) promises to never 
modify a file once it is closed. Otherwise, a process that accidentally writes 
into the hard-linked file will corrupt both copies. Simply having HBase say 
"oh, but we never modify this file via HBase" is not strong enough.  The backup 
has to be absolutely immutable.

So the use case here requires a commitment from HDFS to never append to or 
write into an existing file, which means no chance of random-write or NFS 
support.





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-15 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13295805#comment-13295805
 ] 

Daryn Sharp commented on HDFS-3370:
---

I see a lot of overlap between hard links and snapshots.  Conceptually, a 
snapshot is composed of hardlinks with COW semantics for a file's metadata and 
last partial block.  Hardlinks would also be a very easy way to implement 
zero-write copies.  Streaming the bytes down and back up via the client isn't 
very efficient.





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-14 Thread M. C. Srivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294861#comment-13294861
 ] 

M. C. Srivas commented on HDFS-3370:


@Karthik:  using hard-links for backup accomplishes exactly the opposite. The 
expectation with a correctly-implemented hardlink is that when the original is 
modified, the change is reflected in the file, no matter which path-name was 
used to access it. Isn't that exactly the opposite effect of what a 
backup/snapshot is supposed to do?  Unless of course you are committing to 
never ever being able to modify a file once written (although that would be 
viewed by most as a major step backwards in the evolution of Hadoop).

Another major problem is that the scalability of the NN gets reduced by a 
factor of 10 (i.e., your cluster can now hold only 10 million files instead of 
the 100 million it used to be able to hold). Imagine someone doing a backup 
every 6 hours, with the backups retained as follows: 4 for the past 24 hrs, 1 
daily for a week, and 1 per week for 1 month. Total: 4 + 7 + 4 = 15 backups, 
i.e., 15 hard-links to each file, one from each backup. So each file is pointed 
to by 15 names; in other words, the NN now holds 15 names instead of 1 for each 
file. I think that would, practically speaking, reduce the number of files the 
cluster can hold by a factor of 10, no?
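The retention arithmetic above can be checked directly; the resulting capacity 
figure is only an order-of-magnitude estimate, not an exact NN sizing model:

```python
# 6-hourly backups kept for 24 h, then daily for a week, then weekly for a month:
last_day, daily, weekly = 4, 7, 4
links_per_file = last_day + daily + weekly
assert links_per_file == 15

# With ~15 directory entries per file, a NN sized for 100M entries holds
# roughly 100M / 15 distinct files -- the order-of-magnitude drop described.
assert 100_000_000 // links_per_file == 6_666_666
```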

Thirdly, hard-links don't work with directories. What is the scheme to back up 
directories?  (If this scheme is usable only for HBase backups and nothing 
else, then I agree with Konstantin that it belongs in the HBase layer and not 
here.)







[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294934#comment-13294934
 ] 

Lars Hofhansl commented on HDFS-3370:
-

This is a good discussion. 

Couple of points:
bq. Or provide use cases which cannot be solved without it.
This seems to be the key question: What services should a file system provide?
The same argument could be made for symbolic links. The application could 
implement those (in fact it's quite simple).

bq. but they are very hard to support when the namespace is distributed
But isn't that an implementation detail, which should not inform the feature 
set? 
Hardlinks could be supported only within a distinct namespace (a namespace in 
federated HDFS, or a volume in MapR, I think). This is not unlike Unix, where 
hardlinks are per distinct filesystem (i.e. not across mount points).

@M.C. Srivas:
If you create 15 backups without hardlinks you get 15 times the metadata *and* 
15 times the data... Unless you assume some other feature such as snapshots 
with copy-on-write or backup-on-write semantics. (Maybe I did not get the 
argument)

Immutable files are a very common and useful design pattern (not just for 
HBase), and while not strictly needed, hardlinks are very useful together with 
immutable files.

Just my $0.02.






[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-13 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294229#comment-13294229
 ] 

Konstantin Shvachko commented on HDFS-3370:
---

 I would recommend finding a different approach to implementing snapshots than 
 adding this feature.

I agree with Srivas: hard links seem easy in a single-NameNode architecture, 
but they are very hard to support when the namespace is distributed, because if 
links to a file belong to different nodes you cannot just lock the entire 
namespace and do atomic cross-node linking/unlinking.
I also agree with Srivas that hard links in traditional file systems cause more 
problems than they add value.
Looking at the design document, I see that you create a sort of internal 
symlink called INodeHardLinkFile pointing to a HardLinkFileInfo, which 
represents the actual file. This can be modeled by symlinks at the application 
(HBase) level without making any changes in HDFS.

I strongly discourage bringing this feature inside HDFS. 
Or provide use cases which *cannot* be solved without it.
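The application-level modeling Konstantin describes can be sketched as a toy 
indirection layer. InfoRecord and AppLevelLinks are invented names standing in 
for HardLinkFileInfo-style records; this is not real HBase or HDFS code:

```python
class InfoRecord:
    """Stands in for HardLinkFileInfo: names the real file, holds the refcount."""
    def __init__(self, real_path):
        self.real_path = real_path
        self.refcount = 0


class AppLevelLinks:
    """Each 'hardlink' is a pointer entry naming a shared InfoRecord."""
    def __init__(self):
        self.infos = {}                 # info id -> InfoRecord
        self.links = {}                 # link path -> info id (the "symlink")

    def link(self, link_path, info_id, real_path=None):
        info = self.infos.setdefault(info_id, InfoRecord(real_path))
        info.refcount += 1
        self.links[link_path] = info_id

    def resolve(self, link_path):
        return self.infos[self.links[link_path]].real_path

    def unlink(self, link_path):
        info = self.infos[self.links.pop(link_path)]
        info.refcount -= 1
        return info.refcount == 0       # caller deletes the real file at zero


fs = AppLevelLinks()
fs.link("/hbase/t1/f", "info-1", real_path="/data/f")
fs.link("/backup/f", "info-1")          # second name for the same file
assert fs.resolve("/backup/f") == "/data/f"
assert fs.unlink("/hbase/t1/f") is False
assert fs.unlink("/backup/f") is True   # last name gone; safe to delete /data/f
```

The trade-off raised later in the thread applies here too: the indirection and 
refcount live in application state, so only cooperating applications see the 
links, and the filesystem itself enforces nothing.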





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-13 Thread Karthik Ranganathan (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13294597#comment-13294597
 ] 

Karthik Ranganathan commented on HDFS-3370:
---

@Konstantin: "This can be modeled by symlinks on the application (HBase) level 
without making any changes in HDFS."
Modeling this on top of HBase would essentially mean implementing the hardlink 
feature at the HBase level for all its files, and every application that needs 
a similar feature would likewise have to use symbolic links to implement 
hardlinks. We have already implemented this at the underlying filesystem level 
for HBase backups, except that on disk/node failure the re-replication would 
increase the total size of data in the cluster, which was becoming hard to 
provision. Hence the natural progression toward putting it in HDFS.






[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-12 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293480#comment-13293480
 ] 

Lars Hofhansl commented on HDFS-3370:
-

Thanks Liyin. Sounds good.

One thought that occurred to me since: we need to think about copy semantics. 
For example, how will distcp handle this? It shouldn't create a new copy of a 
file for each hardlink that points to it, but rather copy it at most once and 
create hardlinks for each following reference. But then what about multiple 
distcp commands that happen to cover hardlinks to the same file? I suppose in 
that case we cannot be expected to avoid multiple copies of the same file (but 
at most one copy per invocation of distcp, and only if that distcp happens to 
cover a different hardlink).
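The copy-at-most-once-per-invocation idea above could be sketched roughly as
follows. This is a hypothetical helper, not distcp's actual code: it assumes
each source path can be resolved to a stable file id, copies the first path
seen for a given id, and turns every later path with the same id into a link.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of hardlink-aware copy planning for a single
 *  invocation: the first path seen for a file id is copied; every later
 *  path with the same id becomes a link to that first copy. */
public class HardLinkAwareCopy {

    /** Maps each source path to "COPY" or "LINK:&lt;path of the first copy&gt;". */
    public static Map<String, String> plan(Map<String, Long> pathToFileId) {
        Map<Long, String> firstPathForId = new LinkedHashMap<>();
        Map<String, String> actions = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : pathToFileId.entrySet()) {
            // remember the first path that resolved to this file id
            String prior = firstPathForId.putIfAbsent(e.getValue(), e.getKey());
            actions.put(e.getKey(), prior == null ? "COPY" : "LINK:" + prior);
        }
        return actions;
    }

    /** Tiny worked example: two paths share the (made-up) file id 1001. */
    public static Map<String, String> demo() {
        Map<String, Long> ids = new LinkedHashMap<>();
        ids.put("/path1/dirA/file", 1001L);   // first reference: copied
        ids.put("/path2/dirB/file", 1001L);   // same file id: linked
        ids.put("/path1/dirA/other", 1002L);  // distinct file: copied
        return plan(ids);
    }
}
```

Note that the dedup table lives only for one invocation, matching the point
above: separate distcp runs covering different hardlinks to the same file
would still produce separate copies.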






[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-12 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293782#comment-13293782
 ] 

Liyin Tang commented on HDFS-3370:
--

Good point, Lars.
When users run cp in the Linux file system against hardlinked files, it will 
copy the bytes, right?
I think we should keep the same semantics here as well. 

In terms of optimization, the upper-level application should know when to use a 
hardlink in the remote/destination DFS instead of copying the bytes between the 
two clusters.





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-12 Thread Andy Isaacson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13293965#comment-13293965
 ] 

Andy Isaacson commented on HDFS-3370:
-

{quote}
When users run cp in the linux file system against hard linked files, it will 
copy the bytes, right?
{quote}

{{cp -a}} preserves hard links; {{cp -r}} breaks them (duplicates the bytes).

{quote}
I think we shall keep the same semantics here as well. 
{quote}

I don't think it's a good idea to pretend that we can or should preserve 
*every* corner case of the semantics of POSIX hard links.  The Unix hard link 
was originally a historical accident of the inode/dentry structure of the 
filesystem, preserved because it's useful and has been heavily relied upon by 
users of the Unix API.  The implementation in something like ZFS or btrfs is 
pretty far from the original simplicity.

Since we don't have API compatibility with Unix and our underlying structure is 
deeply different, it's a good idea to borrow the good ideas but take a 
practical eye to where it makes sense to diverge.





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-08 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291614#comment-13291614
 ] 

Lars Hofhansl commented on HDFS-3370:
-

Do you have a preliminary patch to look at?






[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-08 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291791#comment-13291791
 ] 

Liyin Tang commented on HDFS-3370:
--

I planned to break this feature into several parts:
1) Implement the new FileSystem API, hardLink, based on the INodeHardLinkFile 
and HardLinkFileInfo classes. Also handle deletion properly.
2) Handle the DU operation and quota updates properly.
3) Update the FSImage format and FSEditLog.

I have finished part 1 but am still working on part 2.





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-07 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13290922#comment-13290922
 ] 

Lars Hofhansl commented on HDFS-3370:
-

Is anybody working on a patch for this?
If not, I would not mind picking this up (although I can't promise getting to 
this before the end of the month).






[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-07 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291149#comment-13291149
 ] 

Liyin Tang commented on HDFS-3370:
--

Hi Lars, we are still working on this feature. It may take a while to take care 
of all the cases, especially the quota updates and the fsimage format change. 





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-05 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13289689#comment-13289689
 ] 

Lars Hofhansl commented on HDFS-3370:
-

Reading through the design doc, it seems that 
FileSystem.{setPermission|setOwner} would be awkward. We'd have to find each 
INodeHardLinkFile pointing to the same file and then change all of their 
permissions/owners.

HardLinkFileInfo could also maintain permissions and owners (since, following 
POSIX, they are the same for each hard link). That way, changing the owner or 
permissions would immediately affect all hard links.
When the fsimage is saved, each INodeHardLinkFile would still write its own 
permission and owner (for simplicity, but that could be optimized, as long as 
at least one INode writes the permissions/owner).
Upon read, each INode representing a hardlink must have the same 
permission/owner as all other INodes linking to the same file; if not, the 
image is inconsistent.

In that case HardLinkFileInfo would not need to maintain a list of pointers 
back to all INodeHardLinkFiles, and the owner/permissions would only be stored 
once in memory.
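The sharing scheme described above can be sketched in a few lines. The class
and field names below are assumptions loosely modeled on the design doc's
INodeHardLinkFile/HardLinkFileInfo, not the actual implementation: owner and
permission live once in the shared info object, so a change made through any
link is immediately visible through every other link.

```java
/** Minimal sketch (hypothetical fields/methods): shared attributes for all
 *  hard links to one file. */
public class SharedHardLinkAttrs {

    static class HardLinkFileInfo {
        String owner;
        short permission;   // e.g. 0644
        int linkCount;
        HardLinkFileInfo(String owner, short permission) {
            this.owner = owner;
            this.permission = permission;
        }
    }

    static class INodeHardLinkFile {
        final HardLinkFileInfo info;  // shared by every link to the file
        INodeHardLinkFile(HardLinkFileInfo info) {
            this.info = info;
            info.linkCount++;
        }
        String getOwner()       { return info.owner; }
        void setOwner(String o) { info.owner = o; }  // affects all links at once
    }

    /** Change the owner through one link, read it through the other. */
    static String demo() {
        HardLinkFileInfo info = new HardLinkFileInfo("hairong", (short) 0644);
        INodeHardLinkFile linkA = new INodeHardLinkFile(info);
        INodeHardLinkFile linkB = new INodeHardLinkFile(info);
        linkA.setOwner("liyin");
        return linkB.getOwner();
    }
}
```

With this layout no back-pointers from HardLinkFileInfo to the links are
needed, matching the observation above that owner/permissions are stored once
in memory.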






[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-06-02 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13287931#comment-13287931
 ] 

Lars Hofhansl commented on HDFS-3370:
-

@M.C. Srivas: Isn't that the same for any file?
A rename of a file renames one of its references. I don't understand how the 
fact that the file has more references has any impact on that.

Hardlinks are incredibly useful for applications like HBase, where an immutable 
HFile could just be mapped to another directory (not just for backup purposes).






[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-05-15 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13276415#comment-13276415
 ] 

Sanjay Radia commented on HDFS-3370:


* I see the additional complexity in quotas because HDFS quotas are directory 
based (as in several file systems). I think this is addressable if we double 
count the quotas along both paths. 
* Permissions are not a problem since the file retains its permissions and 
both paths to the file offer their own permissions.
* I don't understand Srivas's rename example.
Srivas, in MapR I suspect that renames are only allowed within a volume and 
such hard links would be supported only within a volume. Can you explain the 
problem in some more detail?
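The double-counting idea in the first bullet could be sketched as follows.
This is a hypothetical helper, far simpler than HDFS's real quota machinery:
it charges the file's bytes to every ancestor directory along each link's
path, so each path's quota subtree sees the full size.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of double-counting a hardlinked file's disk usage
 *  against every ancestor directory of every link path. */
public class DoubleCountedQuota {

    /** Returns directory path -> bytes charged. */
    public static Map<String, Long> charge(List<String> linkPaths, long fileSize) {
        Map<String, Long> usage = new HashMap<>();
        for (String path : linkPaths) {
            // charge every ancestor directory along this link's path
            for (String dir = parent(path); dir != null; dir = parent(dir)) {
                usage.merge(dir, fileSize, Long::sum);
            }
        }
        return usage;
    }

    private static String parent(String path) {
        int i = path.lastIndexOf('/');
        if (i < 0 || path.equals("/")) return null;  // root has no parent
        return i == 0 ? "/" : path.substring(0, i);
    }
}
```

For two links /path1/dirA/file and /path2/dirB/file, each ancestor on either
path is charged once, and shared ancestors (here only the root) are charged
once per path.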





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-05-15 Thread M. C. Srivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13276461#comment-13276461
 ] 

M. C. Srivas commented on HDFS-3370:


Sanjay, POSIX says that a user cannot open a file unless they have permission 
to traverse the entire path from / to the file.  The problem is that if a file 
has two paths (as in a hard link), permissions become very hard to enforce, 
since a file does not know which dir is its parent. Imagine a rename of a file 
with many hard links across to a new dir. This problem is harder in a 
distributed file system if you wish to spread the metadata.  Note that the 
enforcement happens automatically with symbolic links. As you point out, with 
MapR we _could_ implement hard links within a volume, but chose not to and 
instead implemented only symlinks.  (I personally find symlinks to be more 
flexible.) 






[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-05-12 Thread Hari Mankude (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13274035#comment-13274035
 ] 

Hari Mankude commented on HDFS-3370:


Can the hard linked files be reopened for append?






[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-05-11 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13273788#comment-13273788
 ] 

Sanjay Radia commented on HDFS-3370:


Is the proposal to allow hard links to only files or to files and directories?





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-05-11 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13273824#comment-13273824
 ] 

Liyin Tang commented on HDFS-3370:
--

It only allows hard links to closed files.





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-05-09 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13271141#comment-13271141
 ] 

Liyin Tang commented on HDFS-3370:
--

bq. Another consideration is ds quota is based on a multiple of replication 
factor, so who is allowed to change the replication factor since increasing it 
may impact a different user's quota?

Generally, when a user creates a hardlink in Linux, it requires EXECUTE 
permission on the source directory and WRITE and EXECUTE permission on the 
destination directory. It is a well-known issue that hard links on Linux can 
create local DoS vulnerabilities and security problems, especially when a 
malicious user keeps creating hard links to other users' files and makes them 
run out of quota. One solution to this problem is to set the permissions of 
the directory correctly.

HDFS hardlinks should follow the same permission requirements as a general 
Linux FS and only allow trusted users or groups with the right permissions to 
create hardlinks. The same security principle shall apply to the 
setReplication operation, which can be treated as a normal write operation in 
a general Linux FS.

Thanks so much, Daryn Sharp, for the above discussion. 
It really helps us revisit several design issues and improve the solutions. 
I will update the design doc later. 
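The permission rules discussed in this thread could be condensed into a small
predicate. This is a sketch under the assumptions above (Linux-style directory
checks plus the owner/superuser restriction), not HDFS's actual permission
checker, and all parameter names are made up for illustration.

```java
/** Hypothetical permission predicate for creating a hardlink. */
public class HardLinkPermission {

    public static boolean canCreate(String caller, boolean isSuperuser,
                                    String fileOwner,
                                    boolean srcDirExecute,
                                    boolean dstDirWrite, boolean dstDirExecute) {
        // EXECUTE on the source directory; WRITE and EXECUTE on the destination
        if (!srcDirExecute || !dstDirWrite || !dstDirExecute) {
            return false;
        }
        // restrict creation to the file owner and the superuser, to avoid
        // the quota-exhaustion DoS against other users described above
        return isSuperuser || caller.equals(fileOwner);
    }
}
```

A caller who is neither the owner nor the superuser is rejected even with full
directory permissions, which is what blocks the malicious-link scenario.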





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-05-09 Thread M. C. Srivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13271450#comment-13271450
 ] 

M. C. Srivas commented on HDFS-3370:


Creating hard-links in a distributed file system will cause all kinds of 
future problems with scalability. Hard-links are rarely used in the real world 
because of all the associated bizarre problems.  E.g., consider a hardlink 
setup as follows:

link1:   /path1/dirA/file
link2:   /path2/dirB/file

1. Changing the permissions along the path /path1/dirA  to make file 
inaccessible works, but now with hard-links /path2/dirB is wide open.

2. Renaming /path2/dirB to /path3/dirC will require taking locks on 
/path1/dirA ... but the file does not have parent pointers to figure out 
which path(s) to lock.

I would recommend finding a different approach to implementing snapshots than 
adding this feature.





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-05-09 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13271453#comment-13271453
 ] 

Daryn Sharp commented on HDFS-3370:
---

I fully agree that posix and/or linux conventions should ideally be followed.

I did a little testing and hadn't realized that a hard link retains the same 
attrs (owner, group, perms, xattrs, etc.) as the original file.  Changing one 
implicitly changes the others, which negates some issues such as the differing 
replication factor concerns.  Perhaps hard link creation can be restricted to 
only the file owner and the superuser.

The quota concerns are still a bit more complex.  Unixy systems like Linux and 
BSD only have fs-level quotas for users, so quota handling there is trivial 
compared to directory-level quotas in HDFS.  Since all hard links implicitly 
have the same owner, quotas are as simple as incrementing the user's ds quota 
at file creation and decrementing it when all links are removed.  This is why 
a DoS is possible against a user.  
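That user-level scheme amounts to a reference count per file. The sketch below
is a hypothetical helper (the hard part, HDFS's directory-level quotas, is
deliberately not modeled): the owner is charged once when the first link is
created and released only when the last link goes away.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of user-level ds-quota accounting for hardlinks:
 *  the owner is charged once per file, not once per link. */
public class UserDsQuota {

    private final Map<String, Long> usedByOwner = new HashMap<>();
    private final Map<Long, Integer> linkCount = new HashMap<>();

    public void addLink(String owner, long fileId, long bytes) {
        if (linkCount.merge(fileId, 1, Integer::sum) == 1) {
            usedByOwner.merge(owner, bytes, Long::sum);   // first link: charge
        }
    }

    public void removeLink(String owner, long fileId, long bytes) {
        if (linkCount.merge(fileId, -1, Integer::sum) == 0) {
            linkCount.remove(fileId);
            usedByOwner.merge(owner, -bytes, Long::sum);  // last link: release
        }
    }

    public long used(String owner) {
        return usedByOwner.getOrDefault(owner, 0L);
    }
}
```

Because every link implicitly has the same owner, no per-link ownership
tracking is needed; only the refcount decides when to charge and release.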

I'm sorry if I'm missing a detail, but I remain unclear on how you are 
proposing to handle the directory level quotas.  I don't fully grok how finding 
a common ancestor with a quota is sufficient because quotas can be added or 
removed at any time.  Maybe part of the issue too is I have nested directories 
with individual quotas in mind, whereas maybe you are assuming one and only one 
quota from the root?

I look forward to your thoughts.





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-05-09 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13271684#comment-13271684
 ] 

Liyin Tang commented on HDFS-3370:
--

@M.C.Srivas, I am afraid that I didn't quite understand your concerns.
bq. 
1. Changing the permissions along the path /path1/dirA to make file 
inaccessible works, but now with hard-links /path2/dirB is wide open.
2. Rename /path2/dirB to /path3/dirC will require taking locks on 
/path1/dirA ... but the file does not have parent ptrs to figure out 
which path(s) to lock.

1) Hardlinked files are supposed to have the same permissions as the source 
file.
2) Each INodeFile does have a parent pointer in HDFS. Also, which lock are you 
talking about exactly (from the implementation's perspective)?


@Daryn Sharp:
bq. I did a little testing and didn't realize that a hard link retains the same 
attrs (owner, group, perms, xattrs, etc) as the original file. Changing one 
implicit changes the others, so that negates some issues such as differing 
replication factor concerns. Perhaps hard link creation can be restricted to 
only the file owner and superuser.

Totally agreed:)

bq. The quota concerns are still a bit more complex. Unixy systems like linux 
and bsd only have fs level quotas for users, so quota handling is trivial 
compared to directory level quotas in hdfs. Since all hard links implicitly 
have the same owner, quotas are as simple as incrementing the user's ds quota 
is at file creation, and decrement when all links are removed. This why a DOS 
is possible against a user.

From the security perspective, users should be responsible for setting the 
correct permissions to protect themselves. In this case, users should ONLY 
grant the EXECUTE permission to trusted users for hardlinking.

bq. I'm sorry if I'm missing a detail, but I remain unclear on how you are 
proposing to handle the directory level quotas. I don't fully grok how finding 
a common ancestor with a quota is sufficient because quotas can be added or 
removed at any time. Maybe part of the issue too is I have nested directories 
with individual quotas in mind, whereas maybe you are assuming one and only one 
quota from the root?

Would you mind giving me an exact example of your concerns about quotas? I 
would be very happy to explain it in detail :)










[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-05-08 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13270589#comment-13270589
 ] 

Daryn Sharp commented on HDFS-3370:
---

While I really like the idea of hardlinks, I believe there are more non-trivial 
considerations with this proposed implementation.  I'm by no means an SME, but 
I experimented with a very different approach a while ago.  Here are some of 
the issues I encountered:

I think the quota considerations may be a bit trickier.  The original creator 
of the file takes the nsquota & dsquota hit.  The links take just the nsquota 
hit.  However, when the original creator of the file is removed, one of the 
other links must absorb the dsquota.  If there are multiple remaining links, 
which one takes the hit?

What if none of the remaining links have available quota?  If the dsquota can 
always be exceeded, I can bypass my quota by creating the file in one dir, 
hardlinking from my out-of-dsquota dir, then removing the original.  If the 
dsquota cannot be exceeded, I can (maliciously?) hardlink from my 
out-of-dsquota dir to deny the original creator the ability to delete the file 
-- perhaps causing them to be unable to reduce their quota usage.

Block management will also be impacted.  The manager currently operates on an 
inode mapping (changing to an interface though), but which of the hardlink 
inodes will it be?  The original?  When that link is removed, how will the 
block manager be updated with another hardlink inode?

When a file is open for writing, the inode converts to under-construction, so 
there would need to be a hardlink-under-construction.  You will have to think 
about how other hardlinks are affected/handled.  The same applies to hardlinks 
during file creation and appending.

There may also be an impact on file leases.  I believe they are path-based, so 
leases will now need to be enforced across multiple paths.

What if one hardlink changes the replication factor?  The maximum replication 
factor for all hardlinks should probably be obeyed, but now the setrep command 
will never succeed since it waits for the replication value to actually change.





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-05-08 Thread John George (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13270660#comment-13270660
 ] 

John George commented on HDFS-3370:
---

Thanks for uploading the design document. 
Do you plan to support hardlink using FileContext? In the design document, I 
see FileSystem and FsShell mentioned as the client interfaces - hence the 
question. 





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-05-08 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13270715#comment-13270715
 ] 

Liyin Tang commented on HDFS-3370:
--

@Daryn Sharp: very good comments :)
1) Quota is the trickiest part of hard links. 

For nsquota, the usage will be incremented when creating hardlinks and 
decremented when removing hardlinks. 

For dsquota, the usage will only be increased or decreased for directories 
that are not common ancestors of any linked files. 
For example, ln /root/dir1/file1 /root/dir1/file2: there is no need to 
increase the dsquota usage when creating the link file, file2. 
Likewise, for rm /root/dir1/file1: there is no need to decrease the dsquota 
usage when removing the original source file, file1. 

The bottom line is that there is no case where we need to increase any dsquota 
during a file removal operation: if the directory is a common ancestor, no 
dsquota needs to be updated; otherwise, the dsquota was already updated when 
the hard link was created.


2) You are right that each blockInfo of the linked files needs to be updated 
when the original file is deleted. I shall update the design doc to explain 
this part explicitly and in detail.

3) Currently, at least for V1, we shall support hardlinking only for closed 
files and won't support the append operation against linked files, but this 
could be extended in the future. 

4) Very good point that hardlinked files shall respect the maximum replication 
factor. From my understanding, setReplication is just an in-memory update, and 
the name node will adjust the actual replication in the background.







[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-05-08 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13270780#comment-13270780
 ] 

Daryn Sharp commented on HDFS-3370:
---

I'm glad you find my questions helpful!

bq. For example, ln /root/dir1/file1 /root/dir1/file2 : there is no need to 
increase the ds quota usage when creating the link file: file2.  Also rm 
/root/dir1/file1 : there is no need to decrease the ds quota usage when 
removing the original source file: file1.

I agree that ds quota doesn't need to be changed when there are links in the 
same directory.  I'm referring to the case of hardlinks across directories, 
i.e. /dir/dir2/file and /dir/dir3/hardlink.  If dir2 and dir3 have separate ds 
quotas, then dir3 has to absorb the ds quota when the original file is removed 
from dir2.  What if there is a /dir/dir4/hardlink2?  Does dir3 or dir4 absorb 
the ds quota?  What if neither has the necessary quota available?

bq.  Currently, at least for V1, we shall support the hardlinking only for the 
closed files and won't support to append operation against linked files, but it 
could be extended in the future.

A reasonable approach, but it may lead to user confusion.  It almost begs for 
an immutable flag (i.e. chattr +i/-i) to prevent inadvertent hard linking to 
files intended to be mutable.

Nonetheless, I'd suggest exploring the difficulties reconciling the current 
design of the namesystem/block management with your design.  It may help avoid 
boxing ourselves into a corner with limited hard link support.

bq.  From my understanding, the setReplication is just a memory footprint 
update and the name node will increase actual replication in the background.

Yes, but the FsShell setrep command actively monitors the files and does not 
exit until the replication factor is what the user requested -- as determined 
by the number of hosts per block.  Another consideration is that ds quota is 
charged as a multiple of the replication factor, so who is allowed to change 
the replication factor, given that increasing it may impact a different user's 
quota?





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-05-08 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13270805#comment-13270805
 ] 

Liyin Tang commented on HDFS-3370:
--

bq. I agree that ds quota doesn't need to be changed when there are links in 
the same directory. I'm referring to the case of hardlinks across directories. 
Ie. /dir/dir2/file and /dir/dir3/hardlink. If dir2 and dir3 have separate ds 
quotas, then dir3 has to absorb the ds quota when the original file is removed 
from dir2. What if there is a /dir/dir4/hardlink2? Does dir3 or dir4 absorb the 
ds quota? What if neither has the necessary quota available?

Based on the same example you gave: when linking /dir/dir2/file and 
/dir/dir3/hardlink, it will increase the dsquota for dir3 but not for /dir, 
because dir3 is NOT a common ancestor while dir is. If dir3 doesn't have 
enough dsquota, a quota exception shall be thrown. Likewise, if there is a 
/dir/dir4/hardlink2, the dsquota is absorbed by dir4 as well. So the point is 
that dsquota is absorbed only at link creation time and decreased only at link 
deletion time.


From my understanding, the basic semantics of HardLink are to allow users to 
create multiple logical files referencing the same set of blocks/bytes on 
disk. So a user could set different file-level attributes for each linked 
file, such as owner, permissions, and modification time. 
Since these linked files share the same set of blocks, block-level settings 
shall be shared. 
It may be a little confusing to distinguish whether the replication factor in 
HDFS is a file-level attribute or a block-level attribute. 
If we agree that the replication factor is a block-level attribute, then we 
shall pay the overhead (wait time) when increasing it, just as when increasing 
the replication factor of a regular file, and the setReplication operation is 
supposed to fail if it would break the dsquota.






[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-05-04 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13268564#comment-13268564
 ] 

Namit Jain commented on HDFS-3370:
--

Another use case in Hive is to copy one table/partition to another 
table/partition.
Ideally, we would like the following in Hive:

Copy Table T1 to T2.

The files under the table location for T2 (say, /user/hive/warehouse/T2/0) can 
be links to the corresponding files in table T1
(say, /user/hive/warehouse/T1/0). 

Having said that, one of the requirements is that the data should be 
modifiable independently. So, if new data is loaded into T1 (or T2),
those changes should not be visible to T2 (or T1, respectively).





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-05-04 Thread Sanjay Radia (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13268671#comment-13268671
 ] 

Sanjay Radia commented on HDFS-3370:


The main advantage over symbolic links is that when the original link is 
deleted, the second one keeps the actual data from being deleted. Correct?
Does the hard link stay on the NN, or does it propagate to the actual blocks 
on the DNs?
I believe it is not necessary to propagate the link to the DNs based on the 
use cases you have described.





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-05-04 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13268685#comment-13268685
 ] 

Liyin Tang commented on HDFS-3370:
--

bq. The main advantage over symbolic links being that when the original link 
is deleted the 2nd one keeps the actual data from being deleted. Correct?
Do you mean hard links instead of symbolic links? If the original link is 
deleted, the symbolic link will be broken. But if one of the hard-linked files 
is deleted, the other linked files won't be affected.

You are right that the hard link stays on the NN only :)
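For reference, the same deletion semantics can be demonstrated with ordinary POSIX links on a local filesystem via Python's os.link/os.symlink (HDFS itself has no hardlink call today; this is only an analogy to the proposed behavior):

```python
import os
import tempfile

d = tempfile.mkdtemp()
orig = os.path.join(d, "orig")
hard = os.path.join(d, "hard")
sym = os.path.join(d, "sym")

with open(orig, "w") as f:
    f.write("data")

os.link(orig, hard)     # hard link: a second name for the same inode
os.symlink(orig, sym)   # symlink: a pointer to the *path* "orig"

os.remove(orig)         # delete the original name

print(os.path.exists(hard))  # True  -- the data survives via the hard link
print(os.path.exists(sym))   # False -- the symlink now dangles
```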






[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-05-04 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13268687#comment-13268687
 ] 

Liyin Tang commented on HDFS-3370:
--

@Sanjay, sorry that I misunderstood "the advantage over". 
It is correct that keeping the other linked files alive after a deletion is 
the main advantage over symbolic links :)





[jira] [Commented] (HDFS-3370) HDFS hardlink

2012-05-04 Thread Hairong Kuang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13268735#comment-13268735
 ] 

Hairong Kuang commented on HDFS-3370:
-

Sanjay, you are right. HDFS hardlink is only a metadata operation and no 
datanode is involved. In all our use cases, the source file may be deleted 
over time, but its content can still be accessed through the hardlinks.
