[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14066507#comment-14066507 ]

Doug Cutting commented on HDFS-6382:
------------------------------------

bq. But the trash emptier runs inside NN as a daemon thread, instead of a separate daemon process.

The trash emptier was embedded in the NN mostly just to avoid making folks manage another daemon process. However, embedding the emptier has many of the hazards that Chris and Colin described above for embedding TTL. So, if we add a separate daemon process for TTL, then we might also have that process empty the trash and remove the embedded emptier.

> HDFS File/Directory TTL
> -----------------------
>
>                 Key: HDFS-6382
>                 URL: https://issues.apache.org/jira/browse/HDFS-6382
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client, namenode
>    Affects Versions: 2.4.0
>            Reporter: Zesheng Wu
>            Assignee: Zesheng Wu
>         Attachments: HDFS-TTL-Design-2.pdf, HDFS-TTL-Design-3.pdf, HDFS-TTL-Design.pdf
>
> In production environments, we often have a scenario like this: we want to keep backup files on HDFS for some period and then delete them automatically. For example, we keep only 1 day's logs on local disk due to limited disk space, but we need about 1 month's logs in order to debug program bugs, so we keep all the logs on HDFS and delete logs older than 1 month. This is a typical scenario for HDFS TTL. So we propose that HDFS support TTL. Some details of this proposal:
> 1. HDFS can support TTL on a specified file or directory.
> 2. If a TTL is set on a file, the file will be deleted automatically after the TTL expires.
> 3. If a TTL is set on a directory, its child files and directories will be deleted automatically after the TTL expires.
> 4. A child file/directory's TTL configuration should override its parent directory's.
> 5. A global configuration is needed to control whether deleted files/directories go to the trash.
> 6. A global configuration is needed to control whether a directory with a TTL is itself deleted once the TTL mechanism has emptied it.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
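The expiry rule implied by points 2 and 3 of the proposal can be sketched as a single predicate. Basing expiry on modification time is an assumption of this sketch (the attached design docs, not quoted in this thread, would pin that down):

```python
import time

# Hypothetical expiry predicate for points 2-3 of the proposal: a path
# is eligible for deletion once (modification time + TTL) has passed.
# Basing expiry on modification time is an assumption of this sketch.
def is_expired(mtime_ms, ttl_ms, now_ms=None):
    now = int(time.time() * 1000) if now_ms is None else now_ms
    return now >= mtime_ms + ttl_ms

# The log-retention example from the description: a 30-day TTL.
MONTH_MS = 30 * 24 * 3600 * 1000
assert not is_expired(0, MONTH_MS, now_ms=MONTH_MS - 1)  # still retained
assert is_expired(0, MONTH_MS, now_ms=MONTH_MS)          # eligible now
```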
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14067254#comment-14067254 ]

Zesheng Wu commented on HDFS-6382:
-----------------------------------

Thanks [~cutting]. I will move the trash emptier into the TTL daemon after HDFS-6525 and HDFS-6526 are resolved.
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040683#comment-14040683 ]

Zesheng Wu commented on HDFS-6382:
-----------------------------------

Hi guys, I've uploaded initial implementations on HDFS-6525 and HDFS-6526 separately. I hope you can take a look; any comments will be appreciated. Thanks in advance.
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032964#comment-14032964 ]

Colin Patrick McCabe commented on HDFS-6382:
--------------------------------------------

bq. Plus, in the places that need this the most, one has to deal with getting what essentially becomes a critical part of uptime getting scheduled, competing with all of the other things running and, to remind you, to just delete files. It's sort of ridiculous to require YARN running for what is fundamentally a file system problem. It simply doesn't work in the real world.

In the examples you give, you're already using YARN for Hive and Pig, so it's already a critical part of the infrastructure. Anyway, you should be able to put the cleanup job in a different queue. It's not like YARN is strictly FIFO.

bq. One eventually gets to the point that the auto cleaner job is now running hourly just so /tmp doesn't overrun the rest of HDFS. Because these run outside of HDFS, they are slow and tedious and generally fall in the lap of teams that don't do Java so end up doing all sorts of squirrely things to make these jobs work. This also sucks.

Well, presumably the implementation in this JIRA won't be done by a team that doesn't do Java, so we should skip that problem, right? The comments about /tmp are, I think, another example of how this needs to be highly configurable. Rather than modifying Hive or Pig to set TTLs on things, we probably want to be able to configure the scanner to look at everything under /tmp. Perhaps the scanner should attach a TTL to things in /tmp that don't already have one.

Running this under YARN has an intuitive appeal to the upstream developers, since YARN is a scheduler. If we write our own scheduler for this inside HDFS, we're kind of duplicating some of that work, including the monitoring, logging, etc. features. I think Steve's comments (and a lot of the earlier comments) reflect that. Of course, to users not already using YARN, a standalone daemon might seem more appealing.

The proposal to put this in the balancer seems like a reasonable compromise. We can reuse some of the balancer code, and that way we're not adding another daemon to manage. I wonder if we could have YARN run the balancer periodically? That might be interesting.
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033079#comment-14033079 ]

Steve Loughran commented on HDFS-6382:
--------------------------------------

{quote}
It's sort of ridiculous to require YARN running for what is fundamentally a file system problem. It simply doesn't work in the real world.
{quote}

Allen, that's like saying it's ridiculous to require bash scripts to perform what is fundamentally a unix filesystem problem. One is the data, the other is the mechanism to run code near the data. I don't try and hide any local /tmp cleanup init.d scripts inside an ext3 plugin, after all.

YARN:
# handles security by having you include kerberos tickets in the launch.
# stops you having to choose a specific server to run this thing (hence a point of failure).
# lets you scale up when needed.
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032082#comment-14032082 ]

Zesheng Wu commented on HDFS-6382:
-----------------------------------

[~ste...@apache.org] Thanks for your feedback. We discussed whether to use an MR job or a standalone daemon, and most people upstream came to an agreement that a standalone daemon is reasonable and acceptable. You can go through the earlier discussion.

[~aw] Thanks for your feedback. Your suggestion is really valuable and strengthens our confidence to implement this as a standalone daemon.
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031762#comment-14031762 ]

Allen Wittenauer commented on HDFS-6382:
----------------------------------------

Let's take a step back and give a more concrete example: /tmp.

/tmp is the plague of Hadoop operations teams everywhere. Between Hive and Pig leaving bits hanging around because the client can't properly handle signals, to crunch that by default just leaves (seemingly) as much crap laying around as it possibly can because someone somewhere might want to debug it someday, /tmp is the cesspool of HDFS. Every competent admin ends up writing some sort of auto-/tmp-cleaner because of these issues. At scale, /tmp can accumulate hundreds of TB and millions of objects in less than 24 hours. It sucks.

One eventually gets to the point that the auto cleaner job is running hourly just so /tmp doesn't overrun the rest of HDFS. Because these run outside of HDFS, they are slow and tedious and generally fall in the lap of teams that don't do Java, so they end up doing all sorts of squirrely things to make these jobs work. This also sucks.

Now, I can see why using an MR job is appealing (easy!), but it isn't very effective. For one, we've already been here once and the result was distch. Hell, there was a big fight just to get distch written, and that--years and years later!--still isn't documented because of how slowly it works. Throw in directories like /tmp that simply have WAY too much churn and one can see that depending upon MR to work here just isn't viable. Plus, in the places that need this the most, one has to deal with what essentially becomes a critical part of uptime getting scheduled, competing with all of the other things running, and, to remind you, just to delete files. It's sort of ridiculous to require YARN running for what is fundamentally a file system problem. It simply doesn't work in the real world.

While at Hadoop Summit, a bunch of us sat around a table and talked about this issue with regard specifically to /tmp. (We didn't know about this JIRA, BTW.) The solution we came up with was basically a service that would bootstrap by reading the fsimage and then read the edits stream by sending the audit information to Kafka. One of the big advantages of this is that we can get near real-time updates of the parts of the file system we need to operate on. Since we only care about a subsection of the file system, the memory requirements are significantly lower, and it might be possible to coalesce deletes in a smart way to cut back on RPCs. I suspect it wouldn't be hard to generalize this type of solution to handle multiple use cases.

But for me, this is critical admin functionality that HDFS needs desperately, and throwing the problem to MR just isn't workable.
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030366#comment-14030366 ]

Zesheng Wu commented on HDFS-6382:
-----------------------------------

I filed two sub-tasks to track the development of this feature.
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14031018#comment-14031018 ]

Steve Loughran commented on HDFS-6382:
--------------------------------------

My comments:
# This can be done as an MR job.
# If you are worried about excessive load, start exactly one mapper, and consider throttling requests. As some object stores throttle heavy load and reject a very high DELETE rate, throttling is going to be needed for anything that works against them.
# You can then use Oozie as the scheduler.
# MR restart handles failures: you just re-enumerate the directories, and deleted files don't show up.
# If you really, really can't do it as MR, write it as a one-node YARN app, for which I'd recommend Apache Twill as the starting point. In fact, this project would make for a nice example.

Don't rush to write a new service here for an intermittent job; that just adds a new cost: a service to install and monitor. Especially when you consider that this new service will need
# a launcher entry point
# tests
# commitment from the HDFS team to maintain it

{quote}
We can implement TTL within a MapReduce job that is similar with DistCp. We could run this MapReduce job over and over again or nightly or weekly to delete the expired files and directories.
{quote}

Yes, and schedule it with Oozie.

{quote}
(1) Advantages: The major advantage of the MapReduce framework is concurrency control, if we want to run multiple tasks concurrently, choose a MapReduce approach will ease of concurrency control.
{quote}

There are other advantages:
# The MR job will be simple to write and can be submitted remotely.
# It's trivial to test and therefore maintain.
# There's no need to wait for a new version of Hadoop. You can evolve it locally.
# Different users, submitting jobs with different kerberos tickets, can work on their own files securely.
# There's no need to install and maintain a new service.

{quote}
(2) Disadvantages: For implementing the TTL functionality, one task is enough, multiple tasks will give too much race and load to the NameNode.
{quote}

# Demonstrate this by writing an MR job and assessing its load when you have a throttled executor.

{quote}
On another hand, use a MapReduce job will introduce additional dependencies and have additional overheads.
{quote}

# Additional dependencies? In a cluster with MapReduce installed? The only additional dependency is the JAR with the mapper and the reducer.
# What additional overheads? Are they really any less than running another service in your cluster, with its own classpath, failure modes, and security needs?

My recommendation, before writing a single line of a new service, is to write it as an MR job. You will find it easy to write and maintain; server load is handled by making the sleep time a configurable parameter. If you can then actually demonstrate that this is inadequate on a large cluster, then consider a service. But start with MapReduce first. If you haven't written an MR job before, don't worry -- it doesn't take that long to learn, and having done it you'll understand your users' workflow better.
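Steve's "exactly one mapper, throttled requests" suggestion amounts to a serial delete loop with a configurable sleep between RPCs. A minimal sketch, where fs_delete stands in for a real FileSystem.delete call and the 50 ms default is an assumption:

```python
import time

# Sketch of the "one mapper, throttled" approach: a serial delete loop
# that sleeps between RPCs. fs_delete is a stand-in for Hadoop's
# FileSystem.delete; the sleep_ms default is an assumption.
def delete_expired(fs_delete, expired_paths, sleep_ms=50):
    deleted = []
    for path in expired_paths:
        if fs_delete(path):  # one RPC per path, as with HDFS delete
            deleted.append(path)
        time.sleep(sleep_ms / 1000.0)  # the configurable throttle knob
    return deleted

# Usage with a stub filesystem call:
assert delete_expired(lambda p: True, ["/tmp/a", "/tmp/b"], sleep_ms=0) == ["/tmp/a", "/tmp/b"]
```

Making sleep_ms a job parameter is exactly the knob Steve later refers to as "making the sleep time a configurable parameter."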
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028682#comment-14028682 ]

Tsz Wo Nicholas Sze commented on HDFS-6382:
-------------------------------------------

Checked the design doc. It looks good. Some comments:
- "Standalone Daemon Approach ... To Implement a completely new standalone daemon can rarely reuse existing code, will need lots of work to do." I don't agree. We may refactor the Balancer or other tools if necessary.
- Using xattrs for TTL is a good idea. Do we really need ttl in milliseconds? Do you think that the daemon could guarantee such accuracy? We don't want to waste namenode memory storing trailing zeros/digits for each ttl. How about supporting symbolic ttl notation, e.g. 10h, 5d?
- The name Supervisor sounds too general. How about calling it TtlManager for the moment? If more new features are added to the tool, we may change the name later.
- For setting ttl on a directory foo, write permission on the parent directory of foo is not enough. The namenode also checks rwx on all subdirectories of foo for a recursive delete. BTW, permissions can change from time to time. A user may be able to delete a file/dir at the time of setting the TTL, but the same user may not have permission to delete the same file/dir when the ttl expires. I suggest not checking additional permission requirements on setting the ttl, but instead running as the particular user when deleting the file. Then we need to add the username to the ttl xattr.
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028729#comment-14028729 ]

Tsz Wo Nicholas Sze commented on HDFS-6382:
-------------------------------------------

bq. How about supporting symbolic ttl notation, e.g. 10h, 5d?

The actual value in the xattr could be encoded as two bytes -- 3 bits for the unit (year, month, week, day, hour, minute, second, millisecond) and 13 bits for the value. If we store it in milliseconds, it probably needs eight bytes.
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028768#comment-14028768 ]

Zesheng Wu commented on HDFS-6382:
-----------------------------------

[~szetszwo], thanks for your valuable suggestions.

bq. Using xattrs for TTL is a good idea. Do we really need ttl in milliseconds? Do you think that the daemon could guarantee such accuracy? We don't want to waste namenode memory space to store trailing zeros/digits for each ttl. How about supporting symbolic ttl notation, e.g. 10h, 5d?

Yes, I agree that the daemon can't guarantee millisecond accuracy, and in fact there's no need to guarantee such accuracy. As you suggested, we can use encoded bytes to save NN memory.

bq. The name Supervisor sounds too general. How about calling it TtlManager for the moment? If there are more new features added to the tool, we may change the name later.

OK, TtlManager is more suitable for the moment.

bq. For setting ttl on a directory foo, write permission on the parent directory of foo is not enough. Namenode also checks rwx for all subdirectories of foo for recursive delete.

Nice catch. If we want to conform to the delete semantics mentioned by Colin, we should check the subdirectories recursively.

bq. BTW, permission could be changed from time to time. A user may be able to delete a file/dir at the time of setting TTL but the same user may not have permission to delete the same file/dir when the ttl expires.

The deleting work will be done by a superuser (which the TtlManager runs as), so this seems not to be a problem?

bq. I suggest not to check additional permission requirement on setting ttl but run as the particular user when deleting the file. Then we need to add username to the ttl xattr.

Good point, but adding the username to the ttl xattr requires more NN memory; we should weigh whether it's worth doing.
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026772#comment-14026772 ] Colin Patrick McCabe commented on HDFS-6382: bq. You mean that we scan the whole namespace at first and then split it into 5 pieces according to hash of the path, why do we just complete the work during the first scanning process? If I misunderstand your meaning, please point out. You need to make one RPC for each file or directory you delete. In contrast, when listing a directory you make only one RPC for every {{dfs.ls.limit}} elements (by default 1000). So if you have 5 workers all listing all directories, but only calling delete on some of the files, you still might come out ahead in terms of number of RPCs, provided you had a high ratio of files to directories. There are other ways to partition the namespace which are smarter, but rely on some knowledge of what is in it, which you'd have to keep track of. A single node design will work for now, though. Considering that you probably want rate-limiting anyway. bq. For the simplicity purpose, in the initial version, we will use logs to record which file/directory is deleted by TTL, and errors during the deleting process. Even if it's not implemented at first, we should think about the configuration required here. I think we want the ability to email the admins when things go wrong. Possibly the notifier could be pluggable or have several policies. There was nothing in the doc about configuration in general, which I think we need to fix. For example, how is rate limiting configurable? How do we notify admins that the rate is too slow to finish in the time given? bq. It doesn't need to be an administrator command, user only can setTtl on file/directory that they have write permission, and can getTtl on file/directory that they have read permission. You can't delete a file in HDFS unless you have write permission on the containing directory. 
Whether you have write permission on the file itself is not relevant. So I would expect the same semantics here (probably enforced by setfacl itself). HDFS File/Directory TTL --- Key: HDFS-6382 URL: https://issues.apache.org/jira/browse/HDFS-6382 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-TTL-Design.pdf In production environment, we always have scenario like this, we want to backup files on hdfs for some time and then hope to delete these files automatically. For example, we keep only 1 day's logs on local disk due to limited disk space, but we need to keep about 1 month's logs in order to debug program bugs, so we keep all the logs on hdfs and delete logs which are older than 1 month. This is a typical scenario of HDFS TTL. So here we propose that hdfs can support TTL. Following are some details of this proposal: 1. HDFS can support TTL on a specified file or directory 2. If a TTL is set on a file, the file will be deleted automatically after the TTL is expired 3. If a TTL is set on a directory, the child files and directories will be deleted automatically after the TTL is expired 4. The child file/directory's TTL configuration should override its parent directory's 5. A global configuration is needed to configure that whether the deleted files/directories should go to the trash or not 6. A global configuration is needed to configure that whether a directory with TTL should be deleted when it is emptied by TTL mechanism or not. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14027325#comment-14027325 ] Zesheng Wu commented on HDFS-6382: -- bq. Even if it's not implemented at first, we should think about the configuration required here. I think we want the ability to email the admins when things go wrong. Possibly the notifier could be pluggable or have several policies. There was nothing in the doc about configuration in general, which I think we need to fix. For example, how is rate limiting configurable? How do we notify admins that the rate is too slow to finish in the time given? OK, I will update the document and post a new version soon. bq. You can't delete a file in HDFS unless you have write permission on the containing directory. Whether you have write permission on the file itself is not relevant. So I would expect the same semantics here (probably enforced by setfacl itself). That's reasonable, I'll figure it out clearly in the document. HDFS File/Directory TTL --- Key: HDFS-6382 URL: https://issues.apache.org/jira/browse/HDFS-6382 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-TTL-Design.pdf In production environment, we always have scenario like this, we want to backup files on hdfs for some time and then hope to delete these files automatically. For example, we keep only 1 day's logs on local disk due to limited disk space, but we need to keep about 1 month's logs in order to debug program bugs, so we keep all the logs on hdfs and delete logs which are older than 1 month. This is a typical scenario of HDFS TTL. So here we propose that hdfs can support TTL. Following are some details of this proposal: 1. HDFS can support TTL on a specified file or directory 2. If a TTL is set on a file, the file will be deleted automatically after the TTL is expired 3. 
If a TTL is set on a directory, the child files and directories will be deleted automatically after the TTL is expired 4. The child file/directory's TTL configuration should override its parent directory's 5. A global configuration is needed to configure that whether the deleted files/directories should go to the trash or not 6. A global configuration is needed to configure that whether a directory with TTL should be deleted when it is emptied by TTL mechanism or not. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025585#comment-14025585 ] Colin Patrick McCabe commented on HDFS-6382: For the MR strategy, it seems like this could be parallelized fairly easily. For example, if you have 5 MR tasks, you can calculate the hash of each path, and then task 1 can do all the paths that are 0 mod 5, task 2 can do all the paths that are 1 mod 5, and so forth. MR also doesn't introduce extra dependencies since HDFS and MR are packaged together. I don't understand what you mean by the mapreduce strategy will have additional overheads. What overheads are you forseeing? It is true that you need to avoid overloading the NameNode. But this is a concern with any approach, not just the MR one. It would be good to see a section on this. I think the simplest way to do it is to rate-limit RPCs to the NameNode to a configurable rate. bq. \[for the standalone daemon\] The major advantage of this approach is that we don’t need any extra work to finish the TTL work, all will be done in the daemon automatically. I don't understand what you mean by this. What will be done automatically? How are you going to implement HA for the standalone daemon? I suppose if all the state is kept in HDFS, you can simply restart it when it fails. However, it seems like you need to checkpoint how far along in the FS you are, so that if you die and later get restarted, you don't have to redo the whole FS scan. This implies reading directories in alphabetical order, or similar. You also need to somehow record when the last scan was, perhaps in a file in HDFS. I don't see a lot of discussion of logging and monitoring in general. How is the user going to become aware that a file was deleted because of a TTL? Or if there is an error during the delete, how will the user know? Logging is one choice here. Creating a file in HDFS is another. The setTtl command seems reasonable. 
Does this need to be an administrator command? HDFS File/Directory TTL --- Key: HDFS-6382 URL: https://issues.apache.org/jira/browse/HDFS-6382 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-TTL-Design.pdf In production environment, we always have scenario like this, we want to backup files on hdfs for some time and then hope to delete these files automatically. For example, we keep only 1 day's logs on local disk due to limited disk space, but we need to keep about 1 month's logs in order to debug program bugs, so we keep all the logs on hdfs and delete logs which are older than 1 month. This is a typical scenario of HDFS TTL. So here we propose that hdfs can support TTL. Following are some details of this proposal: 1. HDFS can support TTL on a specified file or directory 2. If a TTL is set on a file, the file will be deleted automatically after the TTL is expired 3. If a TTL is set on a directory, the child files and directories will be deleted automatically after the TTL is expired 4. The child file/directory's TTL configuration should override its parent directory's 5. A global configuration is needed to configure that whether the deleted files/directories should go to the trash or not 6. A global configuration is needed to configure that whether a directory with TTL should be deleted when it is emptied by TTL mechanism or not. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026047#comment-14026047 ] Zesheng Wu commented on HDFS-6382: -- Thanks [~cmccabe] for your feedback. bq. For the MR strategy, it seems like this could be parallelized fairly easily. For example, if you have 5 MR tasks, you can calculate the hash of each path, and then task 1 can do all the paths that are 0 mod 5, task 2 can do all the paths that are 1 mod 5, and so forth. MR also doesn't introduce extra dependencies since HDFS and MR are packaged together. You mean that we scan the whole namespace at first and then split it into 5 pieces according to hash of the path, why do we just complete the work during the first scanning process? If I misunderstand your meaning, please point out. bq. I don't understand what you mean by the mapreduce strategy will have additional overheads. What overheads are you foreseeing? Possible overheads: Starting a mapreduce job needs to split the input, start an AppMaster, collect result from random machines (Perhaps 'overheads' is not a proper word here) bq. I don't understand what you mean by this. What will be done automatically? Here automatically means we do not have to rely on external tools, the daemon itself can manage the work well. bq. How are you going to implement HA for the standalone daemon? Good point. As you suggested, one approach is save the state in HDFS and simply restart it when it fails. But managing the state is a complex work, I am considering how to simplify this. One possible simpler approach is that we can consider that the daemon is stateless and simply restart it when if fails. We needn't do checkpoint and just scan from the beginning when it restarts. Because we can require that the work the daemon does is idempotent, starting from the beginning will be harmless. 
Possible drawbacks of the later approach are that it may waste some time and may delay the work, but they are acceptable. bq. I don't see a lot of discussion of logging and monitoring in general. How is the user going to become aware that a file was deleted because of a TTL? Or if there is an error during the delete, how will the user know? For the simplicity purpose, in the initial version, we will use logs to record which file/directory is deleted by TTL, and errors during the deleting process. bq. Does this need to be an administrator command? It doesn't need to be an administrator command, user only can setTtl on file/directory that they have write permission, and can getTtl on file/directory that they have read permission. HDFS File/Directory TTL --- Key: HDFS-6382 URL: https://issues.apache.org/jira/browse/HDFS-6382 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu Attachments: HDFS-TTL-Design.pdf In production environment, we always have scenario like this, we want to backup files on hdfs for some time and then hope to delete these files automatically. For example, we keep only 1 day's logs on local disk due to limited disk space, but we need to keep about 1 month's logs in order to debug program bugs, so we keep all the logs on hdfs and delete logs which are older than 1 month. This is a typical scenario of HDFS TTL. So here we propose that hdfs can support TTL. Following are some details of this proposal: 1. HDFS can support TTL on a specified file or directory 2. If a TTL is set on a file, the file will be deleted automatically after the TTL is expired 3. If a TTL is set on a directory, the child files and directories will be deleted automatically after the TTL is expired 4. The child file/directory's TTL configuration should override its parent directory's 5. 
A global configuration is needed to configure that whether the deleted files/directories should go to the trash or not 6. A global configuration is needed to configure that whether a directory with TTL should be deleted when it is emptied by TTL mechanism or not. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018506#comment-14018506 ] Hangjun Ye commented on HDFS-6382: -- Thanks Colin. We would start to draft a design doc and ask you guys' help to review. Yes, the xattrs has saved the big burden for saving the policy, the major question left is where to run the logic. Besides these 3 options, another related stuff might be the trash. Currently trash is implemented as a client-side capability, the trash cleanup logic (trash emptier) depends on FileSystem to operate namespace and basically is a client-side function. But the trash emptier runs *inside* NN as a daemon thread, instead of a separate daemon process. I guess it interacts with NN via RPC even it runs inside NN. We could observe some similarities of trash, balancer, and the proposed TTL: mainly need data from NN; could be implemented as client-side capability (via RPC); need to be run periodically. So if possible we unify all these stuff in one framework/daemon? It also echos Haohui's points earlier. And if it's implemented clearly enough, the user could optionally run it inside NN as a daemon thread to have less jobs to maintain, as long as the user would like to take the risk of running additional logic inside NN (w/o changing NN's logic for this, as it still interacts with NN like a client). That's just a premature idea, we might still want to have the TTL as a separate daemon firstly as it's most straight forward. Let's discuss more after we have the design doc. HDFS File/Directory TTL --- Key: HDFS-6382 URL: https://issues.apache.org/jira/browse/HDFS-6382 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu In production environment, we always have scenario like this, we want to backup files on hdfs for some time and then hope to delete these files automatically. 
For example, we keep only 1 day's logs on local disk due to limited disk space, but we need to keep about 1 month's logs in order to debug program bugs, so we keep all the logs on hdfs and delete logs which are older than 1 month. This is a typical scenario of HDFS TTL. So here we propose that hdfs can support TTL. Following are some details of this proposal: 1. HDFS can support TTL on a specified file or directory 2. If a TTL is set on a file, the file will be deleted automatically after the TTL is expired 3. If a TTL is set on a directory, the child files and directories will be deleted automatically after the TTL is expired 4. The child file/directory's TTL configuration should override its parent directory's 5. A global configuration is needed to configure that whether the deleted files/directories should go to the trash or not 6. A global configuration is needed to configure that whether a directory with TTL should be deleted when it is emptied by TTL mechanism or not. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017427#comment-14017427 ] Hangjun Ye commented on HDFS-6382: -- Thanks Colin! That's exactly what we want. Seems it's on the way to be merged to branch-2, we will wait for it. HDFS File/Directory TTL --- Key: HDFS-6382 URL: https://issues.apache.org/jira/browse/HDFS-6382 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu In production environment, we always have scenario like this, we want to backup files on hdfs for some time and then hope to delete these files automatically. For example, we keep only 1 day's logs on local disk due to limited disk space, but we need to keep about 1 month's logs in order to debug program bugs, so we keep all the logs on hdfs and delete logs which are older than 1 month. This is a typical scenario of HDFS TTL. So here we propose that hdfs can support TTL. Following are some details of this proposal: 1. HDFS can support TTL on a specified file or directory 2. If a TTL is set on a file, the file will be deleted automatically after the TTL is expired 3. If a TTL is set on a directory, the child files and directories will be deleted automatically after the TTL is expired 4. The child file/directory's TTL configuration should override its parent directory's 5. A global configuration is needed to configure that whether the deleted files/directories should go to the trash or not 6. A global configuration is needed to configure that whether a directory with TTL should be deleted when it is emptied by TTL mechanism or not. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016335#comment-14016335 ] Hangjun Ye commented on HDFS-6382: -- Thanks Haohui and Colin. The balancer or a balancer-like standalone daemon sounds a feasible approach to us. A special requirement of the TTL cleanup is that we need a persistent storage to contain all TTL policies set by users, while balancer and DistCp don't require. It might be nice if the namenode could store such information then we don't have to find somewhere else. So just wondering if possible we add an opaque feature in INode to store arbitrary bytes? NN just stores it, doesn't interpret it. As an analogy, HBase supports tags to store arbitrary metadata at a cell: https://issues.apache.org/jira/browse/HBASE-8496 Then we could have external tools/daemon to let end-users set their TTL policies, and do the cleanup logic. The only change to NN is to add a new feature and also expose APIs to set/get the feature, complicated and volatile logic (metadata encoding, interpretation, cleanup) are done outside NN. And the change might have a much broader usage other than TTL. Any thoughts? HDFS File/Directory TTL --- Key: HDFS-6382 URL: https://issues.apache.org/jira/browse/HDFS-6382 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu In production environment, we always have scenario like this, we want to backup files on hdfs for some time and then hope to delete these files automatically. For example, we keep only 1 day's logs on local disk due to limited disk space, but we need to keep about 1 month's logs in order to debug program bugs, so we keep all the logs on hdfs and delete logs which are older than 1 month. This is a typical scenario of HDFS TTL. So here we propose that hdfs can support TTL. Following are some details of this proposal: 1. 
HDFS can support TTL on a specified file or directory 2. If a TTL is set on a file, the file will be deleted automatically after the TTL is expired 3. If a TTL is set on a directory, the child files and directories will be deleted automatically after the TTL is expired 4. The child file/directory's TTL configuration should override its parent directory's 5. A global configuration is needed to configure that whether the deleted files/directories should go to the trash or not 6. A global configuration is needed to configure that whether a directory with TTL should be deleted when it is emptied by TTL mechanism or not. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14016805#comment-14016805 ] Colin Patrick McCabe commented on HDFS-6382: bq. So just wondering if possible we add an opaque feature in INode to store arbitrary bytes? NN just stores it, doesn't interpret it. As an analogy, HBase supports tags to store arbitrary metadata at a cell: https://issues.apache.org/jira/browse/HBASE-8496 It sounds like extended attributes (xattrs) might work here. They were recently implemented in HDFS-2006 and subtasks. They basically let you associate some arbitrary key/value pairs with each inode. Check out https://issues.apache.org/jira/secure/attachment/12644341/HDFS-XAttrs-Design-3.pdf HDFS File/Directory TTL --- Key: HDFS-6382 URL: https://issues.apache.org/jira/browse/HDFS-6382 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu In production environment, we always have scenario like this, we want to backup files on hdfs for some time and then hope to delete these files automatically. For example, we keep only 1 day's logs on local disk due to limited disk space, but we need to keep about 1 month's logs in order to debug program bugs, so we keep all the logs on hdfs and delete logs which are older than 1 month. This is a typical scenario of HDFS TTL. So here we propose that hdfs can support TTL. Following are some details of this proposal: 1. HDFS can support TTL on a specified file or directory 2. If a TTL is set on a file, the file will be deleted automatically after the TTL is expired 3. If a TTL is set on a directory, the child files and directories will be deleted automatically after the TTL is expired 4. The child file/directory's TTL configuration should override its parent directory's 5. 
A global configuration is needed to configure that whether the deleted files/directories should go to the trash or not 6. A global configuration is needed to configure that whether a directory with TTL should be deleted when it is emptied by TTL mechanism or not. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015678#comment-14015678 ] Haohui Mai commented on HDFS-6382: -- I think the comments against implementing it in NN are legit. Popping up one level, I'm wondering what is the best approach to meet the following requirements: # Fine tune the behavior of HDFS, which requires the information from the internal data structure in HDFS. # Performing the above task without MapReduce to simplify the operations of the cluster. To meet the above requirements, today it looks like to me that there is no way other than making massive changes in HDFS. What I'm wondering is that whether it is possible to architect the system to make things easier. For example, is it possible to generalize the architecture of the balancer we have today to accomplish these types of tasks? From a very high level it looks to me that most of the code can sit outside of the NN while meeting the above requirements. Since this is aiming for advanced usages, there are more freedoms on the design of the architecture. For instance, the architecture might choose to expose the details of the implementation and do not guarantee compatibility (like an Exokernel type of system). Thoughts? HDFS File/Directory TTL --- Key: HDFS-6382 URL: https://issues.apache.org/jira/browse/HDFS-6382 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu In production environment, we always have scenario like this, we want to backup files on hdfs for some time and then hope to delete these files automatically. For example, we keep only 1 day's logs on local disk due to limited disk space, but we need to keep about 1 month's logs in order to debug program bugs, so we keep all the logs on hdfs and delete logs which are older than 1 month. This is a typical scenario of HDFS TTL. 
So here we propose that hdfs can support TTL. Following are some details of this proposal: 1. HDFS can support TTL on a specified file or directory 2. If a TTL is set on a file, the file will be deleted automatically after the TTL is expired 3. If a TTL is set on a directory, the child files and directories will be deleted automatically after the TTL is expired 4. The child file/directory's TTL configuration should override its parent directory's 5. A global configuration is needed to configure that whether the deleted files/directories should go to the trash or not 6. A global configuration is needed to configure that whether a directory with TTL should be deleted when it is emptied by TTL mechanism or not. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015922#comment-14015922 ] Colin Patrick McCabe commented on HDFS-6382: I'm -1 on the idea of putting this in the NameNode. Let's see if we can work together to figure out where the best place for it is, though. Can you comment on why MR is not an option for you? I am concerned that there will be a lot of wheel reinvention if we don't use MR (authentication, resource management, scheduling, etc. etc.) Why not do as DistCp does? As Haohui said, another option is the balancer or a completely standalone daemon. HDFS File/Directory TTL --- Key: HDFS-6382 URL: https://issues.apache.org/jira/browse/HDFS-6382 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu In production environment, we always have scenario like this, we want to backup files on hdfs for some time and then hope to delete these files automatically. For example, we keep only 1 day's logs on local disk due to limited disk space, but we need to keep about 1 month's logs in order to debug program bugs, so we keep all the logs on hdfs and delete logs which are older than 1 month. This is a typical scenario of HDFS TTL. So here we propose that hdfs can support TTL. Following are some details of this proposal: 1. HDFS can support TTL on a specified file or directory 2. If a TTL is set on a file, the file will be deleted automatically after the TTL is expired 3. If a TTL is set on a directory, the child files and directories will be deleted automatically after the TTL is expired 4. The child file/directory's TTL configuration should override its parent directory's 5. A global configuration is needed to configure that whether the deleted files/directories should go to the trash or not 6. 
A global configuration is needed to configure that whether a directory with TTL should be deleted when it is emptied by TTL mechanism or not. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013411#comment-14013411 ] Hangjun Ye commented on HDFS-6382: -- I think we have two discussions here now: a TTL cleanup policy (implemented inside or outiside NN), and a general mechanism to help implement such a policy easily inside NN. I've been convinced that a specific TTL cleanup policy implementation does NOT sound feasible to fly in core code of NN directly, I'm more interested to pursuing a mechanism to enable such policy implementation. Considering HBase having co-processor (https://blogs.apache.org/hbase/entry/coprocessor_introduction), people could extend the functionality easily (w/o extending the base classes), such as counting rows, secondary index. We could argue that most of such usages are NOT necessarily implemented as server side, but having such a mechanism gives users an opportunity to choose what is most suitable for their requirements. If the NN has such an extensible mechanism (as Haohui suggested earlier), we could implement a TTL cleanup policy in NN in an elegant way (w/o touching the base classes). And NN has abstracted out the INode.Feature, we could implement a TTLFeature to hold the meta. The policy implementation doesn't have to go into community's codebase if it's too specific, we could keep it in our private branch. But basing on a general mechanism (w/o touching the base classes) makes it easy to be maintained (considering we would upgrade with new Hadoop releases regularly). If you guys think such a general mechanism deserves to be considered, we are happy to contribute some efforts. 
HDFS File/Directory TTL --- Key: HDFS-6382 URL: https://issues.apache.org/jira/browse/HDFS-6382 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs-client, namenode Affects Versions: 2.4.0 Reporter: Zesheng Wu Assignee: Zesheng Wu In production environment, we always have scenario like this, we want to backup files on hdfs for some time and then hope to delete these files automatically. For example, we keep only 1 day's logs on local disk due to limited disk space, but we need to keep about 1 month's logs in order to debug program bugs, so we keep all the logs on hdfs and delete logs which are older than 1 month. This is a typical scenario of HDFS TTL. So here we propose that hdfs can support TTL. Following are some details of this proposal: 1. HDFS can support TTL on a specified file or directory 2. If a TTL is set on a file, the file will be deleted automatically after the TTL is expired 3. If a TTL is set on a directory, the child files and directories will be deleted automatically after the TTL is expired 4. The child file/directory's TTL configuration should override its parent directory's 5. A global configuration is needed to configure that whether the deleted files/directories should go to the trash or not 6. A global configuration is needed to configure that whether a directory with TTL should be deleted when it is emptied by TTL mechanism or not. -- This message was sent by Atlassian JIRA (v6.2#6252)
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14013624#comment-14013624 ] Hangjun Ye commented on HDFS-6382: --
BTW: TTL is just one of the applications that could benefit from a general mechanism. Haohui gave several nice use cases that would benefit as well.
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14014030#comment-14014030 ] Colin Patrick McCabe commented on HDFS-6382:
Chris, Andrew, and I have brought up a lot of reasons why this probably doesn't make sense in the NameNode. Just to summarize:
* security / correctness concerns: it's easy to make a mistake that could bring down the NameNode or the entire FS
* non-generality to systems using S3 or another FS in addition to HDFS
* issues with federation (which NN does the cleanup? How do you decide?)
* complexities surrounding our client-side Trash implementation and our server-side snapshots
* configuration burden on sysadmins
* inability to change the cleanup code without restarting the NameNode
* HA concerns (need to avoid split-brain or lost updates)
* error handling (where do users find out about errors?)
* semantics: disappearing or time-limited files are an unfamiliar API, unlike the traditional FS APIs we usually implement
Making this pluggable doesn't fix any of those problems, and it adds some more:
* API stability issues (the INode and Feature classes have changed a lot, and we make no guarantees there)
* CLASSPATH issues (if I want to send an email about a cleanup job with the FooEmailer library, how do I get that onto the NameNode's CLASSPATH? How do I avoid jar conflicts?)
The only points I've seen raised in favor of doing this in the NameNode are:
* the NameNode already has an authorization system which this could use
* HBase has coprocessors, which also allow loading arbitrary code
To the first point, there are lots of other ways to deal with authorization, like using YARN (which also has authorization), or configuring the cleanup using files in HDFS. To the second point, HBase doesn't use coprocessors for cleanup jobs... it uses them for things like secondary indices, a much better-defined problem.
The functionality you want is not something that should be implemented as a coprocessor, even if we had those.
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012442#comment-14012442 ] Chris Nauroth commented on HDFS-6382: -
bq. The implemented mechanism inside the NameNode would (maybe periodically) execute all policies specified by users, and it would do it as a superuser safely, as authentication/authorization have been done when user set their policies to the NameNode.
This logic is subject to time-of-check/time-of-use race conditions, possibly resulting in incorrect deletion of data. For example, imagine the following sequence:
# A user calls the setttl command on /file1. Authentication is successful, and the authenticated user is the file owner, so the NN decides the user is authorized to set a TTL.
# An admin changes the owner of /file1 in order to revoke the user's access.
# Now the NN's background expiration thread/job starts running. It finds a TTL on /file1 and deletes the file. Since this runs as the HDFS super-user, nothing blocks the delete, even though the user who set the TTL no longer has permission to delete.
With an external process, authentication and authorization are enforced at the time of the delete for the specific user, so there is no time-of-check/time-of-use race condition and no chance of an incorrect delete. Running some code as a privileged user might look expedient in some ways, but it also compromises the file system permissions model somewhat.
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012480#comment-14012480 ] Hangjun Ye commented on HDFS-6382: --
bq. This logic is subject to time of check/time of use race conditions, possibly resulting in incorrect deletion of data. For example, imagine the following sequence: ...
It doesn't sound like a race condition to me. We could consider TTL an independent attribute of a file, just like the file owner or the replication factor. In the above scenario, it seems to work as expected: the admin only changes the owner of /file1 but leaves the TTL attribute as is, so the TTL should still be in effect. If the admin doesn't want that to happen, he/she should unset the TTL attribute (i.e. set it to infinity) first, before changing the owner of /file1, and the new owner of /file1 can set a new TTL attribute later if needed.
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012551#comment-14012551 ] Colin Patrick McCabe commented on HDFS-6382:
bq. One approach, as you suggested, is that we implement a separate cleanup platform and users submit their policy to this platform, and we do the real cleanup action on HDFS on behalf of users (as a superuser or other powerful user). But the separate platform has to implement an authentication/authorization mechanism to make sure users are who they claim to be and have the needed permissions (authentication is a must; authorization might be optional, but it's better to have it). It duplicates work the NameNode already does with Kerberos/ACLs. If it's implemented inside the NameNode, we could leverage the NameNode's authentication/authorization mechanism.
YARN / MR / etc. already have authentication frameworks that you can use. For example, you can set up a YARN queue with certain permissions so that only certain users or groups can submit to it. Another idea is to have an HDFS directory where each group (or user) puts files containing the cleanup policies they want, and let HDFS take care of permissions.
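Colin's last suggestion - policy files stored in an HDFS directory, with ordinary HDFS permissions guarding who may write them - only needs a trivial file format. A hypothetical sketch; the one-line "path, whitespace, TTL in days" format and the storage layout in the comment are invented for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of parsing per-group cleanup policy files, e.g. stored under a
// directory like /cleanup/policies/<group>/ in HDFS so that normal HDFS
// permissions control who can create or modify them. The
// "path<whitespace>ttlDays" line format is hypothetical.
public class PolicyFileParser {
    static Map<String, Integer> parse(String fileContents) {
        Map<String, Integer> policies = new LinkedHashMap<>();
        for (String line : fileContents.split("\n")) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) continue; // skip comments/blanks
            String[] parts = line.split("\\s+");
            policies.put(parts[0], Integer.parseInt(parts[1]));   // path -> TTL in days
        }
        return policies;
    }

    public static void main(String[] args) {
        String file = "# team-infra cleanup policy\n/logs/raw 30\n/logs/archive 365\n";
        System.out.println(parse(file)); // {/logs/raw=30, /logs/archive=365}
    }
}
```

The security argument then reduces to plain file permissions: a team can only edit policy files in a directory it owns, so it cannot set TTLs on another team's data.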
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012591#comment-14012591 ] Andrew Wang commented on HDFS-6382: ---
Even if implementing security were a major hurdle (and I really don't think it is as hard as you think, considering we have quite a few examples of Hadoop auth besides the NN), the rest of Chris's points still stand. I also think that, semantically, a TTL is not an expected type of file attribute for those of us with a Unix background, which leads to TOCTOU issues like the one Chris pointed out, if only because of user expectations. So, at this point, I think there are strong technical reasons not to implement this in the NN, and strong reasons to do this type of data lifecycle management externally.
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010841#comment-14010841 ] Hangjun Ye commented on HDFS-6382: --
Thanks Haohui for your reply. Let me confirm I got your point: your suggestion is that we'd better have a general mechanism/framework to run a job (maybe periodically) over the namespace inside the NN, and the TTL policy is just a specific job that might be implemented by a user? That's an interesting direction; we will think about it. We are heavy users of Hadoop and also make some in-house improvements per our business requirements. We definitely want to contribute the improvements back to the community, as long as they're helpful for the community.
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010879#comment-14010879 ] Haohui Mai commented on HDFS-6382: --
bq. Your suggestion is that we'd better have a general mechanism/framework to run a job (maybe periodically) over the namespace inside the NN, and the TTL policy is just a specific job that might be implemented by user?
This is correct. There are a couple of additional use cases that might be useful to keep in mind:
# Archiving data. TTL is one of the use cases here.
# Backing up or syncing data between clusters. It's nice to back up / sync data between clusters for disaster recovery without running an MR job.
# Balancing data between data nodes.
A mechanism that can support the above use cases can be quite powerful and improve the state of the art. I'm happy to collaborate if this is the direction you guys want to pursue.
bq. We are heavy users of Hadoop and also do some in-house improvements per our business requirement. We definitely want to contribute the improvements back to community.
This is great to hear. Patches are welcome.
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010891#comment-14010891 ] Hangjun Ye commented on HDFS-6382: --
Thanks Haohui, that's clear to us now. That's interesting and we'd like to pursue the more general approach. We will take time to work out a rough design and ask you guys to review.
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011305#comment-14011305 ] Chris Nauroth commented on HDFS-6382: -
bq. ...run a job (maybe periodically) over the namespace inside the NN...
Please correct me if I misunderstood, but this sounds like execution of arbitrary code inside the NN process. If so, this opens the risk of resource exhaustion at the NN by buggy or malicious code. Even if there is a fork for process isolation, it's still sharing machine resources with the NN process. If the code runs as the HDFS super-user, then it has access to sensitive resources like the fsimage file. If multiple such in-process jobs are submitted concurrently, they would cause resource contention with the main work of the NN. Multiple concurrent jobs also get into the realm of scheduling. There are lots of tough problems here that would increase the complexity of the NN.
Even putting that aside, I see multiple advantages in implementing this externally instead of embedded inside the NN. Here is a list of several problems that an embedded design would need to solve, and which I believe are already easily addressed by an external design. This includes/expands on issues brought up by others in earlier comments too.
* Trash: The description mentions trash capability as a requirement. Trash functionality is currently implemented as a client-side capability.
** Embedded: We'd need to reimplement trash inside the NN, or heavily refactor for code sharing.
** External: The client already has the trash capability, so this problem is already solved.
* Integration: Many Hadoop deployments use an alternative file system like S3 or Azure storage. In these deployments, there is no NameNode.
** Embedded: The feature is only usable for HDFS-based deployments. Users of alternative file systems can't use the feature.
** External: The client already has the capability to target any Hadoop file system implementation, so this problem is already solved.
* HA: In the event of a failover, we must guarantee that the former active NN does not drive any expiration activity.
** Embedded: Any background thread or in-process jobs running inside the NN must coordinate shutdown during a failover.
** External: Thanks to our client-side retry policies, an external process automatically transitions to the new active NN after a failover, and there is no risk of a split-brain scenario, so this problem is already solved.
* Authentication/Authorization: Who exactly is the effective user running the delete, and how do we manage their login and file permission enforcement?
** Embedded: You mention there is an advantage to running embedded, but I didn't quite understand. Are you suggesting running the deletes inside a {{UserGroupInformation#doAs}} for the specific user?
** External: The client already knows how to authenticate RPC, and the NN already knows how to enforce authorization on files for that authenticated user, so this problem is already solved.
* Error Handling: How do users find out when the deletes don't work?
** Embedded: There is no mechanism for asynchronous user notification inside the NN. As others have mentioned, there is a lot of complexity in this area. If it's email, then you need to solve the problem of reliable email delivery (i.e. retries if SMTP gateways are down). If it's monitoring/alerting, then you need to expose new monitoring endpoints to publish sufficient information.
** External: The client's exception messages are sufficient to identify file paths that failed during synchronous calls, and the NN audit log is another source of troubleshooting information, so this problem is already solved.
* Federation: With federation, the HDFS namespace is split across multiple NameNodes.
** Embedded: The design needs to coordinate putting the right expiration work on the right NN hosting that part of the namespace.
** External: The client has the capability to configure a client-side mount table that joins together multiple federated namespaces, and {{ViewFileSystem}} then routes RPC to the correct NN depending on the target file path, so this problem is already solved.
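The external design Chris describes boils down to a scan-and-select-and-delete loop driven entirely through client APIs. A toy model of that loop follows; the Map of path to mtime stands in for FileStatus results, and only the selection step is shown so the sketch stays self-contained. A real tool would list via the Hadoop FileSystem client and then delete (or move to trash) as a specific authenticated user, so the NN enforces permissions on every delete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model of an external TTL sweeper. The Map of path -> modification time
// stands in for directory-listing results from the HDFS client. Because the
// sweeper is an ordinary client, it authenticates as a specific user and the
// NN checks permissions at delete time (avoiding the super-user TOCTOU
// problem), and under federation a client-side mount table routes each path
// to the right NN.
public class TtlSweeper {
    /** Select the paths whose age exceeds the TTL; a real tool would then delete them. */
    static List<String> expired(Map<String, Long> mtimes, long ttlMillis, long nowMillis) {
        List<String> toDelete = new ArrayList<>();
        for (Map.Entry<String, Long> e : mtimes.entrySet()) {
            if (nowMillis - e.getValue() > ttlMillis) {
                toDelete.add(e.getKey()); // candidate for delete-or-trash
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        Map<String, Long> mtimes = Map.of("/logs/old", 0L, "/logs/new", 900L);
        System.out.println(expired(mtimes, 500, 1000)); // only /logs/old is expired
    }
}
```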
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011357#comment-14011357 ] Colin Patrick McCabe commented on HDFS-6382:
I agree with Chris's comments here. There are just so many advantages to running outside the NameNode that I think that's the design we should start with. If we later find something that would work better with NN support, we can think about it then.
Hangjun Ye wrote:
bq. Another benefit to having it inside NN is we don't have to handle the authentication/authorization problem in a separate system. For example we have a shared HDFS cluster for many internal users, we don't want someone to set TTL policy to other one's files. NN could handle it easily by its own authentication/authorization mechanism.
The client handles authentication/authorization very well, actually. You can choose to run your cleanup job as the superuser (can do anything) or some other, less powerful user who is limited (safer). But when you run inside the NameNode, there are no safeguards... everything is effectively superuser. And you can destroy or corrupt the entire filesystem very easily that way, especially if your cleanup code is buggy.
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012008#comment-14012008 ] Hangjun Ye commented on HDFS-6382: -- Thanks Chris and Colin for your valuable comments, I'd like to address your concern about the security problem. Firstly our scenario is as following: We have a Hadoop cluster shared by multiple teams for their storage and computation requirement and we are the dev/supporting team to ensure the functionality and availability of the cluster. The cluster is security enabled to ensure every team could only access the files that they should. So every team is a common user of the cluster and we own the superuser. Currently several teams have the requirement to clean up files based on TTL policy. Obviously they could have cron job to do that by themselves but it would have many repeated jobs, so we'd better have a mechanism to let them to specify/implement their policy easily. One approach, as you suggested, is we that implement a separate cleanup platform and users submit their policy to this platform, and we do the real cleanup action to the HDFS on behalf of users (as a superuser or other powerful user). But the separate platform has to implement an authentication/authorization mechanism to make sure the user is who they claim to be and have the permission (authentication is a must, authorization might be optional but it'd better have). It's a repeated job as the NameNode has done with Kerberos/acl. If it's implemented inside the NameNode, we could leverage NameNode's authentication/authorization mechanism. For example we provide a ./bin/hdfs dfs -setttl path/file command (just like -setrep). Users could specify their policy by it and the NameNode should persist it somewhere, maybe as an attribute of file like replication number. 
The mechanism inside the NameNode would (perhaps periodically) execute all the policies specified by users, and it could do so safely as the superuser, because authentication/authorization was already done when users set their policies through the NameNode. To address the detailed concerns you raised: * Buggy or malicious code: the proposed concept (actually Haohui proposed it) is quite similar to HBase's coprocessors (http://hbase.apache.org/book.html#cp); it's a plug-in or extension of the NameNode, most likely enabled at deployment time. A regular user can't submit one; only the cluster owner can. So the code is not arbitrary, and its quality/safety can be vetted. * Who exactly is the effective user running the delete, and how do we manage their login and file permission enforcement: the extension runs as the superuser/system, and a specific extension implementation can do any permission enforcement it needs. For the TTL-based cleanup policy executor, no permission enforcement is needed at this stage, because authentication/authorization happened when the user set the policy. I think the idea Haohui proposed is to have an extensible mechanism in the NameNode for running jobs that depend heavily on namespace data, while keeping the job-specific code as decoupled from the NameNode's core code as possible. It's certainly not easy (Chris pointed out several problems, like HA and concurrency), but it might be worth thinking about.
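Hangjun's -setttl idea (persist a TTL as a per-file attribute when the user sets it, then let a periodic executor delete whatever has expired) can be sketched roughly as follows. This is only a hedged, in-memory Python illustration: the names (`TtlRegistry`, `set_ttl`, `sweep`) are hypothetical, and nothing like this exists in the NameNode.

```python
import time

class TtlRegistry:
    """Hypothetical stand-in for TTL state persisted per file."""

    def __init__(self):
        self._ttls = {}  # path -> (time the TTL was set, ttl in seconds)

    def set_ttl(self, path, ttl_seconds, now=None):
        # What a `hdfs dfs -setttl <path>` command might record; permission
        # checks would happen here, at set time, via the NN's own mechanism.
        self._ttls[path] = (time.time() if now is None else now, ttl_seconds)

    def expired(self, now=None):
        # A path expires once more than `ttl` seconds have passed since set time.
        now = time.time() if now is None else now
        return [p for p, (t0, ttl) in self._ttls.items() if now - t0 > ttl]

    def sweep(self, delete_fn, now=None):
        # Periodic executor: runs as the system user, which is safe because
        # authorization was already done in set_ttl.
        for path in self.expired(now):
            delete_fn(path)
            del self._ttls[path]
```

The point of the sketch is that expiry is computed from the time the policy was set plus the TTL, and the sweep needs no per-delete permission check because permissions were enforced when the policy was set.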
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010053#comment-14010053 ] Colin Patrick McCabe commented on HDFS-6382: bq. But if there's no internal cleanup mechanism of HDFS, all users(across companies) need to write their own cleanup tools respectively, lots of repeated work. Like I said, we should write such a tool and add it to the base Hadoop distribution. This is similar to what we did with {{DistCp}}. Then users would not need to write their own versions of this stuff. It's important to distinguish between creating a tool to handle deleting old files (which we all agree we should do) and putting this into the NameNode (which seems questionable).
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010520#comment-14010520 ] Zesheng Wu commented on HDFS-6382: -- bq. Like I said, we should write such a tool and add it to the base Hadoop distribution. This is similar to what we did with DistCp. Then users would not need to write their own versions of this stuff. Sure, this is another good option. bq. It's important to distinguish between creating a tool to handle deleting old files (which we all agree we should do), and putting this into the NameNode (which seems questionable). Why do you think that putting the cleanup mechanism into the NameNode is questionable? Can you point out some details?
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010632#comment-14010632 ] Colin Patrick McCabe commented on HDFS-6382: bq. Why do you think that putting the cleanup mechanism into the NameNode seems questionable, can you point out some details? Andrew and Chris commented about this earlier. See: https://issues.apache.org/jira/browse/HDFS-6382?focusedCommentId=13998933&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13998933 I would add to that: * Every user of this is going to want a slightly different deletion policy. It's just way too much configuration for the NameNode to reasonably handle; much easier to do it in a user process. For example, maybe you want to keep at least 100 GB of logs, 100 GB of foo data, and 1000 GB of bar data. It's easy to handle this complexity in a user process, and incredibly complex and frustrating to handle it in the NameNode. * Your nightly MR job (or whatever) also needs to be able to do things like email sysadmins when the disks are filling up, which the NameNode can't reasonably be expected to do. * I don't see a big advantage to doing this in the NameNode, and I see a lot of disadvantages (more complexity to maintain, difficult configuration, a need to restart to update the config). Maybe I could be convinced otherwise, but so far the only argument I've seen for doing it in the NN is that it would be re-usable. And that could just as easily apply to an implementation outside the NN; for example, as I pointed out earlier, DistCp is reusable without being in the NameNode.
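Colin's example policy ("keep at least 100 GB of logs") shows why arbitrary policies fit more naturally in a user process than in the NameNode. Below is a minimal sketch of such a size-budget policy, using a local-filesystem walk as a stand-in for an HDFS listing; the function name `files_over_budget` is made up for illustration.

```python
import os

def files_over_budget(root, budget_bytes):
    """Return the paths to delete so that the newest files under `root`
    fit within `budget_bytes`. Oldest files are sacrificed first."""
    entries = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            entries.append((os.path.getmtime(path), os.path.getsize(path), path))
    entries.sort(reverse=True)  # newest first
    kept = 0
    doomed = []
    for _mtime, size, path in entries:
        if kept + size <= budget_bytes:
            kept += size  # this file still fits in the budget
        else:
            doomed.append(path)  # budget exhausted: mark for deletion
    return doomed
```

A user process can combine several such budgets (logs, foo data, bar data) and alert sysadmins as needed, which is exactly the flexibility that would be hard to express as NameNode configuration.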
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010657#comment-14010657 ] Jian Wang commented on HDFS-6382: - I think it is better for you to provide a (backup cleanup) platform for your users; you can implement many cleanup strategies for your users within your company. This can eliminate a lot of repeated work.
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010734#comment-14010734 ] Hangjun Ye commented on HDFS-6382: -- Implementing it outside the NN is definitely another option, and I agree with Colin that it's not feasible to implement a complex cleanup policy (such as one based on storage space) inside the NN. TTL, however, is a very simple (but general) policy, and we might even consider it an attribute of the file, like the number of replicas. It seems it wouldn't introduce much complexity to handle it in the NN. Another benefit of having it inside the NN is that we don't have to handle the authentication/authorization problem in a separate system. For example, we have a shared HDFS cluster for many internal users, and we don't want someone to set a TTL policy on someone else's files; the NN could handle this easily with its own authentication/authorization mechanism. So far a TTL-based cleanup policy is good enough for our scenario (Zesheng and I are from the same company, and we support our company's internal usage of Hadoop), and it would be nice to have a simple and workable solution in HDFS.
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010820#comment-14010820 ] Haohui Mai commented on HDFS-6382: -- bq. TTL is a very simple (but general) policy and we might even consider it as an attribute of file, like the number of replicas. Seems it wouldn't introduce much complexity to handle it in the NN. bq. Another benefit to having it inside NN is we don't have to handle the authentication/authorization problem in a separate system. For example we have a shared HDFS cluster for many internal users, we don't want someone to set TTL policy to other one's files. NN could handle it easily by its own authentication/authorization mechanism. I agree that running jobs over the namespace without MR should be the direction to go. However, I think the main obstacle here is that the design mixes the mechanism (running jobs over the namespace without MR) and the policy (TTL) together. As [~cmccabe] pointed out earlier, every user has his/her own policy. Given that HDFS has a wide range of users, this type of design/implementation is unlikely to fly in the ecosystem. Currently HDFS does not have the above mechanism; you're more than welcome to contribute a patch.
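The mechanism-versus-policy separation Haohui argues for could look something like the following hypothetical sketch: a generic sweep mechanism that accepts pluggable per-user policies, with TTL expressed as just one policy among many. All names here are illustrative, not a real NameNode interface.

```python
from abc import ABC, abstractmethod

class RetentionPolicy(ABC):
    """Policy: each user plugs in their own deletion rule."""

    @abstractmethod
    def should_delete(self, path, mtime, now):
        """Decide whether `path` (last modified at `mtime`) should be removed."""

class TtlPolicy(RetentionPolicy):
    """The simple TTL policy proposed in this issue, as one plugin."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds

    def should_delete(self, path, mtime, now):
        return now - mtime > self.ttl_seconds

def sweep(listing, policy, now):
    """Mechanism: apply any policy to a (path, mtime) listing and return
    the paths selected for deletion. The mechanism knows nothing about TTL."""
    return [path for path, mtime in listing
            if policy.should_delete(path, mtime, now)]
```

The mechanism (`sweep`) stays generic; only the policy objects vary per user, which addresses the objection that every user wants a slightly different deletion policy.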
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003783#comment-14003783 ] Colin Patrick McCabe commented on HDFS-6382: I don't think a nightly (or weekly) cleanup job that lives outside HDFS is that difficult or complex to write. If it were done as a MapReduce job, it could easily work on the whole cluster. This is something we could consider putting upstream. Another issue to consider here is snapshots: deleting files is not going to free space if they exist in a snapshot.
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14004242#comment-14004242 ] Zesheng Wu commented on HDFS-6382: -- Thanks [~cmccabe]. It's true that an external cleanup tool is not too difficult or complex to implement, and there are many ways to satisfy the requirements. But if HDFS has no internal cleanup mechanism, all users (across companies) need to write their own cleanup tools, which is a lot of repeated work. If HDFS supported an internal cleanup mechanism, surely that would be more convenient; do you agree? About snapshots: I think a snapshotted file deleted by the TTL mechanism behaves just the same as a snapshotted file deleted manually by a user.
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998933#comment-13998933 ] Chris Nauroth commented on HDFS-6382: - I agree with Andrew's opinion that this is better implemented outside the file system. An automatic delete based on a TTL introduces a high risk of concurrency bugs for applications. For example, imagine a MapReduce job gets submitted, we derive input splits from a file, and then the file expires after input split calculation but before the map tasks start running and reading the blocks. Overall, I think it's preferable to put delete into the hands of the calling application for explicit control.
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999746#comment-13999746 ] Zesheng Wu commented on HDFS-6382: -- Thanks [~cnauroth]. I agree with your MapReduce example and the risk, but this risk can't be avoided even if we use external tools. For example, with a nightly cron job as Andrew mentioned: imagine a MapReduce job is submitted, we derive input splits from a file, and then the file is deleted by the cron job after input split calculation but before the map tasks start running and reading the blocks; the risk is the same. What I want to say is that TTL is just a convenient way to accomplish the tasks I described in the proposal; users should learn how to use it correctly, rather than use a more complicated approach with no obvious advantage.
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997794#comment-13997794 ] Andrew Wang commented on HDFS-6382: --- This is just my opinion, but isn't this something better done in userspace? A nightly cron job could do this for you, and log files are typically already timestamped for easy parsing and removal.
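The nightly cron job Andrew mentions can be sketched as below. This is a hedged, local-filesystem stand-in written in Python; a real job would walk HDFS instead, for example by invoking `hdfs dfs -rm -r` (or the Java FileSystem API) on the expired paths, and the 30-day window matches the "keep about 1 month's logs" example from the issue description.

```python
#!/usr/bin/env python3
# Sketch of a nightly cron cleanup job. Local filesystem only; the
# retention logic is the same as what an HDFS-facing job would run.
import os
import time

RETENTION_DAYS = 30  # keep roughly one month of logs

def expired_paths(root, retention_days=RETENTION_DAYS, now=None):
    """Yield files under `root` whose mtime is older than the retention window."""
    now = time.time() if now is None else now
    cutoff = now - retention_days * 86400
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                yield path

def cleanup(root):
    for path in list(expired_paths(root)):
        os.remove(path)  # a real tool might move files to trash instead
```

Scheduled as `0 2 * * * cleanup-logs /backup/logs` (a hypothetical crontab entry), this is the userspace alternative the thread keeps returning to.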
[jira] [Commented] (HDFS-6382) HDFS File/Directory TTL
[ https://issues.apache.org/jira/browse/HDFS-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13998374#comment-13998374 ] Zesheng Wu commented on HDFS-6382: -- Thanks [~andrew.wang]. Of course a nightly cron job could do this for us, but our users have various kinds of data backup requirements; log backup is just one of them. We just want to provide a more convenient way for our users to satisfy their requirements. Imagine that we have many backup requirements, each with a different TTL configuration. One way to achieve this is for each user to maintain his own cron job; the other is for the cluster administrator to maintain all the cron jobs for all users. Neither way is very convenient, and both require a lot of manual operational work. If HDFS supported TTL as I proposed above, these requirements would be satisfied very easily.