[jira] [Commented] (HDFS-14038) Expose HdfsDataOutputStreamBuilder to include Spark in LimitedPrivate
[ https://issues.apache.org/jira/browse/HDFS-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669087#comment-16669087 ] Marcelo Vanzin commented on HDFS-14038: ---

No, the problem here is not the use of reflection. That is needed because Spark still has to build against Hadoop 2, which doesn't have that API. The issue raised in that comment is that the method Spark uses is in a LimitedPrivate / Unstable API, which means it can break at any time.

A better approach, for example, would be to have a method in {{FSDataOutputStreamBuilder}}, which is marked as public. In fact, there's already {{replication()}}, to set the replication factor, but it doesn't seem related to the {{replicate()}} method in HdfsDataOutputStreamBuilder. Maybe they should be merged.

> Expose HdfsDataOutputStreamBuilder to include Spark in LimitedPrivate
> -
>
> Key: HDFS-14038
> URL: https://issues.apache.org/jira/browse/HDFS-14038
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Xiao Chen
> Priority: Major
>
> In SPARK-25855 / https://github.com/apache/spark/pull/22881#issuecomment-434359237, Spark prefers to create Spark event log files with replication (instead of EC). To do this currently, it has to be done by some casting / reflection, to get a DistributedFileSystem object (or use the {{HdfsDataOutputStreamBuilder}} subclass of it).
> We should officially expose this for Spark's usage.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
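To make the API point concrete, here is a minimal sketch of the pattern being discussed. It is an illustration only, not Spark's actual code; the reflective probe for {{replicate()}} is an assumption about how a client that must also build against Hadoop 2 (and against non-HDFS filesystems) would avoid a hard dependency on the LimitedPrivate builder subclass.

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FSDataOutputStreamBuilder;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class ReplicatedCreateSketch {
  // Create a file, asking for contiguous replication (instead of EC) when the
  // HDFS-specific builder is available; otherwise keep the filesystem defaults.
  static FSDataOutputStream create(FileSystem fs, Path path) throws IOException {
    FSDataOutputStreamBuilder<?, ?> builder = fs.createFile(path);
    try {
      // replicate() only exists on HdfsDataOutputStreamBuilder (LimitedPrivate/Unstable),
      // so probe for it reflectively instead of referencing the class directly.
      builder.getClass().getMethod("replicate").invoke(builder);
    } catch (ReflectiveOperationException e) {
      // Not HDFS, or an older client library: nothing to do.
    }
    return builder.build();
  }
}
{code}

A public, stable method on {{FSDataOutputStreamBuilder}} itself would let callers drop the reflection entirely.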
[jira] [Commented] (HDFS-12534) Provide logical BlockLocations for EC files for better split calculation
[ https://issues.apache.org/jira/browse/HDFS-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16177425#comment-16177425 ] Marcelo Vanzin commented on HDFS-12534: ---

bq. Are you sure we can split within a single S3 file?

Location != split. You can have x splits all with the same location. I'm pretty sure reading from a single s3 file using FileInputFormat generates multiple tasks (one per "split"). You may want to look at how it does that; it might all be done client-side, based on configuration.

> Provide logical BlockLocations for EC files for better split calculation
>
> Key: HDFS-12534
> URL: https://issues.apache.org/jira/browse/HDFS-12534
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: erasure-coding
> Affects Versions: 3.0.0-beta1
> Reporter: Andrew Wang
> Labels: hdfs-ec-3.0-must-do
>
> I talked to [~vanzin] and [~alex.behm] some more about split calculation with EC. It turns out HDFS-1 was resolved prematurely. Applications depend on HDFS BlockLocation to understand where the split points are. The current scheme of returning one BlockLocation per block group loses this information.
> We should change this to provide logical blocks. Divide the file length by the block size and provide suitable BlockLocations to match, with virtual offsets and lengths too.
> I'm not marking this as incompatible, since changing it this way would in fact make it more compatible from the perspective of applications that are scheduling against replicated files. Thus, it'd be good for beta1 if possible, but okay for later too.

--
This message was sent by Atlassian JIRA (v6.4.14#64029)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
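For reference, split calculation in MapReduce's FileInputFormat is indeed a client-side computation. The sketch below is a simplified illustration (not the actual Hadoop code, and it ignores details such as the slack factor and host affinity) of why one large file, on S3 or anywhere else, can turn into many splits and therefore many tasks.

{code}
import java.util.ArrayList;
import java.util.List;

class SplitSketch {
  // Returns {offset, length} pairs; splitSize is derived from configuration
  // (min/max split size) and the file's block size.
  static List<long[]> splits(long fileLen, long blockSize, long minSize, long maxSize) {
    long splitSize = Math.max(minSize, Math.min(maxSize, blockSize));
    List<long[]> result = new ArrayList<>();
    for (long offset = 0; offset < fileLen; offset += splitSize) {
      result.add(new long[] { offset, Math.min(splitSize, fileLen - offset) });
    }
    return result;
  }
}
{code}

This is also why the BlockLocations an EC file reports matter: they feed the block size and locality hints that this calculation (and schedulers downstream of it) consume.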
[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages
[ https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985707#comment-13985707 ] Marcelo Vanzin commented on HDFS-6293: -- [~ajisakaa] my modified parser does not have an upper-bound on the memory; because it still needs to load information about all inodes, it's still O(n) for the number of inodes in the image. Issues with OIV processing PB-based fsimages Key: HDFS-6293 URL: https://issues.apache.org/jira/browse/HDFS-6293 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker Attachments: Heap Histogram.html There are issues with OIV when processing fsimages in protobuf. Due to the internal layout changes introduced by the protobuf-based fsimage, OIV consumes excessive amount of memory. We have tested with a fsimage with about 140M files/directories. The peak heap usage when processing this image in pre-protobuf (i.e. pre-2.4.0) format was about 350MB. After converting the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of heap (max new size was 1GB). It should be possible to process any image with the default heap size of 1.5GB. Another issue is the complete change of format/content in OIV's XML output. I also noticed that the secret manager section has no tokens while there were unexpired tokens in the original image (pre-2.4.0). I did not check whether they were also missing in the new pb fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages
[ https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13983288#comment-13983288 ] Marcelo Vanzin commented on HDFS-6293: -- Hi Kihwal, We have developed some code internally that mitigates (but does not eliminate) some of these problems. For an image with 140M entries it would need in the ballpark of 7-8GB of heap space, from my pencil-and-napkin calculations. Also, it does not generate entries in order like LsrPBImage does, and it's tailored for the use case of listing the contents of the file system (so it completely ignores things like snapshots). (The reason it still requires a lot of memory is, as you note, that it needs to load information about all inodes in memory; our code is just a little smarter about what information it loads. I don't think it's possible to make it much better without changing the data in the fsimage itself.) If people are ok with those limitations, we could clean up our code and post it as a patch. Issues with OIV processing PB-based fsimages Key: HDFS-6293 URL: https://issues.apache.org/jira/browse/HDFS-6293 Project: Hadoop HDFS Issue Type: Bug Affects Versions: 2.4.0 Reporter: Kihwal Lee Priority: Blocker Attachments: Heap Histogram.html There are issues with OIV when processing fsimages in protobuf. Due to the internal layout changes introduced by the protobuf-based fsimage, OIV consumes excessive amount of memory. We have tested with a fsimage with about 140M files/directories. The peak heap usage when processing this image in pre-protobuf (i.e. pre-2.4.0) format was about 350MB. After converting the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of heap (max new size was 1GB). It should be possible to process any image with the default heap size of 1.5GB. Another issue is the complete change of format/content in OIV's XML output. I also noticed that the secret manager section has no tokens while there were unexpired tokens in the original image (pre-2.4.0). I did not check whether they were also missing in the new pb fsimage. -- This message was sent by Atlassian JIRA (v6.2#6252)
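As a rough sanity check on those numbers: 7-8 GB over roughly 140M inodes works out to about 50-60 bytes per inode. The sketch below only illustrates the general idea of loading less per inode; it is not the internal tool mentioned above (which presumably uses more compact, primitive-friendly structures): keep just enough per-inode data to reconstruct paths, and drop everything else from the fsimage records.

{code}
import java.util.HashMap;
import java.util.Map;

class InodeIndex {
  private final Map<Long, Long> parent = new HashMap<>();  // inode id -> parent id
  private final Map<Long, String> name = new HashMap<>();  // inode id -> name

  void add(long id, long parentId, String inodeName) {
    parent.put(id, parentId);
    name.put(id, inodeName);
  }

  // Rebuild the full path by walking up the parent pointers.
  String path(long id) {
    StringBuilder sb = new StringBuilder();
    Long cur = id;
    while (cur != null && name.containsKey(cur)) {
      sb.insert(0, "/" + name.get(cur));
      Long up = parent.get(cur);
      if (up == null || up.equals(cur)) {
        break;  // reached the root or an unknown parent
      }
      cur = up;
    }
    return sb.length() == 0 ? "/" : sb.toString();
  }
}
{code}

Because every inode still gets an entry, memory stays O(n) in the number of inodes, exactly as described above.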
[jira] [Created] (HDFS-4601) DistCp's ThrottledInputStream leaks.
Marcelo Vanzin created HDFS-4601: Summary: DistCp's ThrottledInputStream leaks. Key: HDFS-4601 URL: https://issues.apache.org/jira/browse/HDFS-4601 Project: Hadoop HDFS Issue Type: Bug Components: tools Reporter: Marcelo Vanzin DistCp's ThrottledInputStream.java leaks the wrapped input stream because it does not override InputStream.close(). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
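A minimal sketch of the kind of fix being described (illustration only, not the actual DistCp class): a wrapping stream has to forward close(), because the base InputStream.close() is a no-op and would leave the underlying stream, and its file handle, open.

{code}
import java.io.IOException;
import java.io.InputStream;

class ThrottledStream extends InputStream {
  private final InputStream rawStream;

  ThrottledStream(InputStream rawStream) {
    this.rawStream = rawStream;
  }

  @Override
  public int read() throws IOException {
    // throttling logic elided
    return rawStream.read();
  }

  @Override
  public void close() throws IOException {
    // Without this override, the wrapped stream is never closed and leaks.
    rawStream.close();
  }
}
{code}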
[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HDFS-3680: - Attachment: hdfs-3680-v10.patch rebased + corrected typo. Allows customized audit logging in HDFS FSNamesystem Key: HDFS-3680 URL: https://issues.apache.org/jira/browse/HDFS-3680 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 2.0.0-alpha Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Priority: Minor Attachments: accesslogger-v1.patch, accesslogger-v2.patch, hdfs-3680-v10.patch, hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch, hdfs-3680-v6.patch, hdfs-3680-v7.patch, hdfs-3680-v8.patch, hdfs-3680-v9.patch Currently, FSNamesystem writes audit logs to a logger; that makes it easy to get audit logs in some log file. But it makes it kinda tricky to store audit logs in any other way (let's say a database), because it would require the code to implement a log appender (and thus know what logging system is actually being used underneath the façade), and parse the textual log message generated by FSNamesystem. I'm attaching a patch that introduces a cleaner interface for this use case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HDFS-3680: - Attachment: hdfs-3680-v9.patch Re-based and re-tested (a.k.a. ping). Allows customized audit logging in HDFS FSNamesystem Key: HDFS-3680 URL: https://issues.apache.org/jira/browse/HDFS-3680 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0-alpha Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Priority: Minor Attachments: accesslogger-v1.patch, accesslogger-v2.patch, hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch, hdfs-3680-v6.patch, hdfs-3680-v7.patch, hdfs-3680-v8.patch, hdfs-3680-v9.patch Currently, FSNamesystem writes audit logs to a logger; that makes it easy to get audit logs in some log file. But it makes it kinda tricky to store audit logs in any other way (let's say a database), because it would require the code to implement a log appender (and thus know what logging system is actually being used underneath the façade), and parse the textual log message generated by FSNamesystem. I'm attaching a patch that introduces a cleaner interface for this use case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470508#comment-13470508 ] Marcelo Vanzin commented on HDFS-3680: --

bq. Why is the following code part of this patch?

See my previous response: I'm creating a FileStatus object based on an HdfsFileStatus, which is a private audience class and thus cannot be used in the public audience AuditLogger.

bq. Given loggers are going to some kind of io, to database, or some server etc. IOException should be expected and seems like a logical thing to throw and not RunTimeException.

You're making assumptions about the implementation of the logger. Why would it throw IOException and not SQLException? What if my logger doesn't do any I/O in the thread doing the logging at all? Declaring {{throws IOException}} would just make implementors wrap whatever real exception is being thrown in an IOException instead of a RuntimeException, to no benefit I can see. If FSNamesystem should handle errors from custom loggers, it should handle all errors, not just specific ones.

bq. I do not think it is outside the scope of this patch. Current logger could fail, on system failures. However here it may fail because poorly written code

We can't prevent badly written code from doing bad things (also see previous comments from ATM that this is not an interface that people will be implementing willy nilly - people who'll touch it are expected to know what they are doing). The reason I say it's out of the scope of this patch is because it's a change in the current behavior that's unrelated to whether you have a custom audit logger or not; if audit log failures should cause the name node to shut down, that's a change that needs to be made today, right now, independent of this patch going in or not.

bq. hdfs-default.xml document does not cover my previous comment

It covers details related to configuration. Details about what the implementation is expected to do should be (and are) documented in the interface itself, which is the interface that the person writing the implementation will be looking at. If you're talking about the "what about when things don't work correctly" part, I'll wait for closure on the other comments.

Allows customized audit logging in HDFS FSNamesystem Key: HDFS-3680 URL: https://issues.apache.org/jira/browse/HDFS-3680 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0-alpha Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Priority: Minor Attachments: accesslogger-v1.patch, accesslogger-v2.patch, hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch, hdfs-3680-v6.patch, hdfs-3680-v7.patch, hdfs-3680-v8.patch Currently, FSNamesystem writes audit logs to a logger; that makes it easy to get audit logs in some log file. But it makes it kinda tricky to store audit logs in any other way (let's say a database), because it would require the code to implement a log appender (and thus know what logging system is actually being used underneath the façade), and parse the textual log message generated by FSNamesystem. I'm attaching a patch that introduces a cleaner interface for this use case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
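For readers without the patch handy, the interface being debated has roughly this shape. This is a sketch reconstructed from the comments, not the patch itself, so names and parameters are approximate: only public-audience types are used, and no checked exception is declared on logAuditEvent.

{code}
import java.net.InetAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;

public interface AuditLogger {
  // Called once so the logger can read its own settings.
  void initialize(Configuration conf);

  // No checked exception declared: implementations are expected to handle
  // their own failures rather than propagate them into FSNamesystem.
  void logAuditEvent(boolean succeeded, String userName, InetAddress addr,
      String cmd, String src, String dst, FileStatus stat);
}
{code}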
[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470589#comment-13470589 ] Marcelo Vanzin commented on HDFS-3680: --

bq. That said if you do not want to throw IOException, which I recommend, is fine. However you need to make sure you throw a checked exception from this interface, to design it correctly, to force the caller to handle the error condition. RTE which typically indicates programming error is not the choice.

My main point in resisting declaring any checked exception here is that not doing that makes it clear that loggers are not expected to throw exceptions, and thus should avoid doing so whenever possible. From FSNamesystem's point of view, there's no point in handling checked and unchecked exceptions differently; an audit log failure should be handled the same way regardless of the nature of the failure, unless you want to do crazy things like have a RetriableFailureException to indicate that FSNamesystem can call the method again to see if it works.

bq. I disagree. You are allowing audit logger to be pluggable in this patch. What is the impact of making logging pluggable and how namenode must deal with is the relevant questions for this patch.

Not sure you looked at this comment from Todd, which I agree with: I'm making it easier, but as I've mentioned before, it's already possible to use custom loggers by doing it at the log system layer (e.g. a custom log4j appender; just use http://wiki.apache.org/logging-log4j/JDBCAppenderConfiguration, for example, and you have many more failure cases than just writing to a local file). That's why I say that if you're worried about that case, that particular change should be made in the current code regardless of this patch. But if you feel strongly about it, I can add a try...catch that just shuts down the namenode. (BTW, unless log4j itself shuts down the process by calling System.exit(), which my experience says is not the case, I don't see where this "shutdown on audit log error" code is.)

bq. Other thing I have not completely thought through but bother me is, when these failures happen, a failure response is sent to the client, though it is successful on Namenode and is recorded on editlog. I would like others opinion on this.

That's independent of this patch (ignoring the "broken audit logger" argument). The problem is that the audit log is written after the operation succeeds; to fix that, the permission check and implementation of each operation should be separated, so that you could write the audit log before executing the operation.

bq. The side effects of this configuration change should be documented for an administrator so he understands the impact of this.

What side effects are you talking about? The only one would be that, if the audit logger fails, the namenode will shut down (if that behavior is implemented). Otherwise, any side-effect is implementation-specific and it's impossible to say anything other than "there might be side effects".
Allows customized audit logging in HDFS FSNamesystem Key: HDFS-3680 URL: https://issues.apache.org/jira/browse/HDFS-3680 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0-alpha Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Priority: Minor Attachments: accesslogger-v1.patch, accesslogger-v2.patch, hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch, hdfs-3680-v6.patch, hdfs-3680-v7.patch, hdfs-3680-v8.patch Currently, FSNamesystem writes audit logs to a logger; that makes it easy to get audit logs in some log file. But it makes it kinda tricky to store audit logs in any other way (let's say a database), because it would require the code to implement a log appender (and thus know what logging system is actually being used underneath the façade), and parse the textual log message generated by FSNamesystem. I'm attaching a patch that introduces a cleaner interface for this use case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13470788#comment-13470788 ] Marcelo Vanzin commented on HDFS-3680: --

Just to end the "what does log4j do" discussion, here's what happens when log4j appenders throw exceptions/errors:

2012-10-05 16:32:05,748 WARN org.apache.hadoop.ipc.Server: IPC Server handler 7 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 172.29.110.222:48751: error: java.lang.RuntimeException: testing exception handling.
java.lang.RuntimeException: testing exception handling.
 at test.AppenderBase.append(AppenderBase.java:72)
 at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
 at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
 at org.apache.log4j.Category.callAppenders(Category.java:206)
 at org.apache.log4j.Category.forcedLog(Category.java:391)
 at org.apache.log4j.Category.log(Category.java:856)
 at org.apache.commons.logging.impl.Log4JLogger.info(Log4JLogger.java:199)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logAuditEvent(FSNamesystem.java:258)

2012-10-05 16:32:08,820 WARN org.apache.hadoop.ipc.Server: IPC Server handler 11 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 172.29.110.222:48754: error: java.lang.Error: testing error handling.
java.lang.Error: testing error handling.
 at test.AppenderBase.append(AppenderBase.java:74)
 at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
 at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
 at org.apache.log4j.Category.callAppenders(Category.java:206)
 at org.apache.log4j.Category.forcedLog(Category.java:391)
 at org.apache.log4j.Category.log(Category.java:856)
 at org.apache.commons.logging.impl.Log4JLogger.info(Log4JLogger.java:199)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logAuditEvent(FSNamesystem.java:258)

log4j appenders in general use log4j's ErrorHandler when errors occur, instead of throwing exceptions. The default one (http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/helpers/OnlyOnceErrorHandler.html) just drops errors after printing the first one to stderr. So the NameNode is not shut down; it just keeps running, and audit logs are silently dropped.

Which is another reason why I'll maintain that the "what to do when logging an audit fails" issue is not particular to my patch.

Allows customized audit logging in HDFS FSNamesystem Key: HDFS-3680 URL: https://issues.apache.org/jira/browse/HDFS-3680 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0-alpha Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Priority: Minor Attachments: accesslogger-v1.patch, accesslogger-v2.patch, hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch, hdfs-3680-v6.patch, hdfs-3680-v7.patch, hdfs-3680-v8.patch Currently, FSNamesystem writes audit logs to a logger; that makes it easy to get audit logs in some log file. But it makes it kinda tricky to store audit logs in any other way (let's say a database), because it would require the code to implement a log appender (and thus know what logging system is actually being used underneath the façade), and parse the textual log message generated by FSNamesystem. I'm attaching a patch that introduces a cleaner interface for this use case. -- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
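The traces above come from a throwaway appender along these lines (reconstructed for illustration; the actual test class is not attached to the JIRA). A well-behaved appender would instead report problems through log4j's ErrorHandler, which is why appender errors are normally swallowed rather than propagated into the NameNode.

{code}
package test;

import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.spi.LoggingEvent;

public class AppenderBase extends AppenderSkeleton {
  @Override
  protected void append(LoggingEvent event) {
    // Deliberately misbehave to see how the NameNode reacts.
    throw new RuntimeException("testing exception handling.");
  }

  @Override
  public void close() {
    // nothing to release
  }

  @Override
  public boolean requiresLayout() {
    return false;
  }
}
{code}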
[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13469591#comment-13469591 ] Marcelo Vanzin commented on HDFS-3680: --

Hi Suresh, I think I addressed all your concerns (will upload new patch after testing).

bq. what is the reason symlink is being done in logAuditEvent? Why is it a part of this jira?

What do you mean by "symlink is being done"? I'm not creating any symlinks. I'm creating a FileStatus object based on an HdfsFileStatus, which is a private audience class and thus cannot be used in the public audience AuditLogger.

bq. How does one add DefaultAuditLogger with a custom audit loggers? How does isAuditEnabled() method work if you add an ability to setup DefaultAuditLogger?

I hope I addressed this in the new documentation. For your second question, you should take a look at isDefaultAuditLogger in FSNamesystem.java.

bq. FSNamesystem#auditLog should be moved to DefaultAuditLogger.

Since that field is public and used in other places, I'd rather not touch that.

bq. Should AuditLogger#logAuditEvent consider throwing IOException to indicate error?

IOException would be one of many exceptions custom audit loggers could throw. So I don't see why it should be special-cased here. My opinion is that audit loggers in general shouldn't throw exceptions; if they really want to, having to throw a RuntimeException indicates that it's not really an expected case.

bq. Sorry I have not caught up all the comments - what is the final decision on how to handle logger errors? Currently the client gets an exception when logAuditEvent fails. That does not seem to be correct.

I don't think there was an ultimate decision; my current patch follows the "things work just like before" approach: currently, if the logging system throws some kind of exception, it fails the request, just like the new code does. If that's not desired, then it can be changed, but I think that's outside the scope of this patch.

Allows customized audit logging in HDFS FSNamesystem Key: HDFS-3680 URL: https://issues.apache.org/jira/browse/HDFS-3680 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0-alpha Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Priority: Minor Attachments: accesslogger-v1.patch, accesslogger-v2.patch, hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch, hdfs-3680-v6.patch, hdfs-3680-v7.patch Currently, FSNamesystem writes audit logs to a logger; that makes it easy to get audit logs in some log file. But it makes it kinda tricky to store audit logs in any other way (let's say a database), because it would require the code to implement a log appender (and thus know what logging system is actually being used underneath the façade), and parse the textual log message generated by FSNamesystem. I'm attaching a patch that introduces a cleaner interface for this use case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
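The HdfsFileStatus-to-FileStatus conversion referred to above looks roughly like the sketch below. Getter names are approximate, since HdfsFileStatus is a private-audience class; the symlink only appears because the public FileStatus constructor takes it as a parameter, which is likely what prompted the question.

{code}
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.protocol.HdfsFileStatus;

class AuditStatusSketch {
  static FileStatus toFileStatus(HdfsFileStatus stat, String src, String dst) {
    // Translate the private-audience status into the public-audience type
    // before handing it to audit loggers.
    Path symlink = stat.isSymlink() ? new Path(stat.getSymlink()) : null;
    Path path = dst != null ? new Path(dst) : new Path(src);
    return new FileStatus(stat.getLen(), stat.isDir(), stat.getReplication(),
        stat.getBlockSize(), stat.getModificationTime(), stat.getAccessTime(),
        stat.getPermission(), stat.getOwner(), stat.getGroup(), symlink, path);
  }
}
{code}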
[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HDFS-3680: - Attachment: hdfs-3680-v8.patch Allows customized audit logging in HDFS FSNamesystem Key: HDFS-3680 URL: https://issues.apache.org/jira/browse/HDFS-3680 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0-alpha Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Priority: Minor Attachments: accesslogger-v1.patch, accesslogger-v2.patch, hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch, hdfs-3680-v6.patch, hdfs-3680-v7.patch, hdfs-3680-v8.patch Currently, FSNamesystem writes audit logs to a logger; that makes it easy to get audit logs in some log file. But it makes it kinda tricky to store audit logs in any other way (let's say a database), because it would require the code to implement a log appender (and thus know what logging system is actually being used underneath the façade), and parse the textual log message generated by FSNamesystem. I'm attaching a patch that introduces a cleaner interface for this use case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3889) distcp overwrites files even when there are missing checksums
[ https://issues.apache.org/jira/browse/HDFS-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450006#comment-13450006 ] Marcelo Vanzin commented on HDFS-3889: --

A couple of comments on this bug, for context. Checksums are checked in two spots in the code:

* When deciding whether to perform a copy if the -update option is used
* When checking that a copy succeeded (that's the code changed in HDFS-3054)

I think error checking should behave differently for each of those.

In the first case, we're interested in whether a copy should be made or not; if we fail to read the checksum, I think the best would be to treat that as an indication that the file should be copied to the destination (which is what the code does today).

In the second case, if we fail to read the checksum, it means we can't verify that the copy is correct. That case, I think, should result in an exception.

As a plus, I think the -skipcrccheck option should not apply to the first case above (deciding whether to update a remote file). CRCs should always be checked in that case; otherwise the equality check is simply based on file size and block size, which I don't think is enough to say the files are the same.

distcp overwrites files even when there are missing checksums - Key: HDFS-3889 URL: https://issues.apache.org/jira/browse/HDFS-3889 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 2.2.0-alpha Reporter: Colin Patrick McCabe Priority: Minor If distcp can't read the checksum files for the source and destination files-- for any reason-- it ignores the checksums and overwrites the destination file. It does produce a log message, but I think the correct behavior would be to throw an error and stop the distcp. If the user really wants to ignore checksums, he or she can use {{-skipcrccheck}} to do so. The relevant code is in DistCpUtils#checksumsAreEquals:
{code}
try {
  sourceChecksum = sourceFS.getFileChecksum(source);
  targetChecksum = targetFS.getFileChecksum(target);
} catch (IOException e) {
  LOG.error("Unable to retrieve checksum for " + source + " or " + target, e);
}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
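A sketch of the two behaviors being proposed, as an illustration only (this is not the actual DistCpUtils code): at the -update decision a checksum-read failure degrades to "copy it", while at post-copy verification it fails the copy.

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class CopyCheckSketch {
  // 1) -update decision: if we can't compare checksums, err on the side of copying.
  static boolean needsCopy(FileSystem srcFS, Path src, FileSystem dstFS, Path dst) {
    try {
      FileChecksum s = srcFS.getFileChecksum(src);
      FileChecksum d = dstFS.getFileChecksum(dst);
      return s == null || d == null || !s.equals(d);
    } catch (IOException e) {
      return true;  // unknown: assume the files differ
    }
  }

  // 2) Post-copy verification: if we can't verify, fail the copy loudly.
  static void verifyCopy(FileSystem srcFS, Path src, FileSystem dstFS, Path dst)
      throws IOException {
    FileChecksum s = srcFS.getFileChecksum(src);
    FileChecksum d = dstFS.getFileChecksum(dst);
    if (s == null || d == null) {
      throw new IOException("Could not retrieve checksums for " + src + " or " + dst);
    }
    if (!s.equals(d)) {
      throw new IOException("Checksum mismatch between " + src + " and " + dst);
    }
  }
}
{code}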
[jira] [Commented] (HDFS-3889) distcp overwrites files even when there are missing checksums
[ https://issues.apache.org/jira/browse/HDFS-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450031#comment-13450031 ] Marcelo Vanzin commented on HDFS-3889: --

bq. What if the source and destination clusters have different checksum types, or one of the checksums is missing?

That means that you can't reasonably detect whether both files are equal, so the code should fall back to the safe path, which is to assume they are not equal and that a copy should be performed. Since manually computing the checksums (by reading both source and destination files) and just copying the file would be about the same performance-wise, it should be fine.

-update is an optimization to avoid copying redundant data. Nothing will break if you just overwrite the target data with the source; it will just be slower than if the checksum checks were possible.

distcp overwrites files even when there are missing checksums - Key: HDFS-3889 URL: https://issues.apache.org/jira/browse/HDFS-3889 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 2.2.0-alpha Reporter: Colin Patrick McCabe Priority: Minor If distcp can't read the checksum files for the source and destination files-- for any reason-- it ignores the checksums and overwrites the destination file. It does produce a log message, but I think the correct behavior would be to throw an error and stop the distcp. If the user really wants to ignore checksums, he or she can use {{-skipcrccheck}} to do so. The relevant code is in DistCpUtils#checksumsAreEquals:
{code}
try {
  sourceChecksum = sourceFS.getFileChecksum(source);
  targetChecksum = targetFS.getFileChecksum(target);
} catch (IOException e) {
  LOG.error("Unable to retrieve checksum for " + source + " or " + target, e);
}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3889) distcp overwrites files even when there are missing checksums
[ https://issues.apache.org/jira/browse/HDFS-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450052#comment-13450052 ] Marcelo Vanzin commented on HDFS-3889: --

bq. In the absence of CRCs, it should also be based on modtime and other file metadata, not just size.

If the goal is to just provide the same functionality as rsync, then sure. Although I consider those less reliable than (or just as bad as) file size alone. They require the metadata to be kept in sync between source and destination, something that I don't think is very common for mod time or access time, for example.

distcp overwrites files even when there are missing checksums - Key: HDFS-3889 URL: https://issues.apache.org/jira/browse/HDFS-3889 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 2.2.0-alpha Reporter: Colin Patrick McCabe Priority: Minor If distcp can't read the checksum files for the source and destination files-- for any reason-- it ignores the checksums and overwrites the destination file. It does produce a log message, but I think the correct behavior would be to throw an error and stop the distcp. If the user really wants to ignore checksums, he or she can use {{-skipcrccheck}} to do so. The relevant code is in DistCpUtils#checksumsAreEquals:
{code}
try {
  sourceChecksum = sourceFS.getFileChecksum(source);
  targetChecksum = targetFS.getFileChecksum(target);
} catch (IOException e) {
  LOG.error("Unable to retrieve checksum for " + source + " or " + target, e);
}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3889) distcp overwrites files even when there are missing checksums
[ https://issues.apache.org/jira/browse/HDFS-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13450075#comment-13450075 ] Marcelo Vanzin commented on HDFS-3889: --

bq. I believe that the modification time is set based on the NN, not the clients. So nothing needs to be kept in sync.

You have two NNs. The metadata on the target NN needs to be in sync with the source NN for the metadata-based check to do the right thing.

In the end, my opinion is just that metadata-based checks are a very poor substitute for checksums, and can much more easily generate false positives (i.e. say that files are equal when they're not). But if it's a feature that people find useful, why not. The false negative case is not such a big problem, since it would just waste bandwidth by forcing the copy.

distcp overwrites files even when there are missing checksums - Key: HDFS-3889 URL: https://issues.apache.org/jira/browse/HDFS-3889 Project: Hadoop HDFS Issue Type: Bug Components: tools Affects Versions: 2.2.0-alpha Reporter: Colin Patrick McCabe Priority: Minor If distcp can't read the checksum files for the source and destination files-- for any reason-- it ignores the checksums and overwrites the destination file. It does produce a log message, but I think the correct behavior would be to throw an error and stop the distcp. If the user really wants to ignore checksums, he or she can use {{-skipcrccheck}} to do so. The relevant code is in DistCpUtils#checksumsAreEquals:
{code}
try {
  sourceChecksum = sourceFS.getFileChecksum(source);
  targetChecksum = targetFS.getFileChecksum(target);
} catch (IOException e) {
  LOG.error("Unable to retrieve checksum for " + source + " or " + target, e);
}
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3865) TestDistCp is @ignored
[ https://issues.apache.org/jira/browse/HDFS-3865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1337#comment-1337 ] Marcelo Vanzin commented on HDFS-3865: --

I haven't looked at the code in detail, but is this a case of TestDistCp being replaced by the other tests? e.g. TestDistCp.testUniforSizeDistCp() vs. TestUniformSizeInputFormat.

TestDistCp is @ignored -- Key: HDFS-3865 URL: https://issues.apache.org/jira/browse/HDFS-3865 Project: Hadoop HDFS Issue Type: Test Components: tools Affects Versions: 2.2.0-alpha Reporter: Colin Patrick McCabe Priority: Minor We should fix TestDistCp so that it actually runs, rather than being ignored.
{code}
@Ignore
public class TestDistCp {
  private static final Log LOG = LogFactory.getLog(TestDistCp.class);
  private static List<Path> pathList = new ArrayList<Path>();
  ...
{code}
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HDFS-3680: - Attachment: hdfs-3680-v7.patch Made AuditLogger interface public, avoiding usage of UserGroupInformation and HdfsFileStatus classes. Only concern with this change is that exposing just the user name misses a lot of information on the principal held by UGI; don't know how interesting that information is for audit logs, though, since the service configuration should provide a lot of the details (like auth type, realm, etc). Allows customized audit logging in HDFS FSNamesystem Key: HDFS-3680 URL: https://issues.apache.org/jira/browse/HDFS-3680 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0-alpha Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Priority: Minor Attachments: accesslogger-v1.patch, accesslogger-v2.patch, hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch, hdfs-3680-v6.patch, hdfs-3680-v7.patch Currently, FSNamesystem writes audit logs to a logger; that makes it easy to get audit logs in some log file. But it makes it kinda tricky to store audit logs in any other way (let's say a database), because it would require the code to implement a log appender (and thus know what logging system is actually being used underneath the façade), and parse the textual log message generated by FSNamesystem. I'm attaching a patch that introduces a cleaner interface for this use case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HDFS-3680: - Status: Patch Available (was: Open) Allows customized audit logging in HDFS FSNamesystem Key: HDFS-3680 URL: https://issues.apache.org/jira/browse/HDFS-3680 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0-alpha Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Priority: Minor Attachments: accesslogger-v1.patch, accesslogger-v2.patch, hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch, hdfs-3680-v6.patch Currently, FSNamesystem writes audit logs to a logger; that makes it easy to get audit logs in some log file. But it makes it kinda tricky to store audit logs in any other way (let's say a database), because it would require the code to implement a log appender (and thus know what logging system is actually being used underneath the façade), and parse the textual log message generated by FSNamesystem. I'm attaching a patch that introduces a cleaner interface for this use case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HDFS-3680: - Attachment: hdfs-3680-v6.patch Change code to let exceptions from audit loggers propagate to the caller methods. Allows customized audit logging in HDFS FSNamesystem Key: HDFS-3680 URL: https://issues.apache.org/jira/browse/HDFS-3680 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0-alpha Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Priority: Minor Attachments: accesslogger-v1.patch, accesslogger-v2.patch, hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch, hdfs-3680-v6.patch Currently, FSNamesystem writes audit logs to a logger; that makes it easy to get audit logs in some log file. But it makes it kinda tricky to store audit logs in any other way (let's say a database), because it would require the code to implement a log appender (and thus know what logging system is actually being used underneath the façade), and parse the textual log message generated by FSNamesystem. I'm attaching a patch that introduces a cleaner interface for this use case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427470#comment-13427470 ] Marcelo Vanzin commented on HDFS-3680: --

Hi Daryn,

bq. The NN has an option for an external logger daemon

So, this implies some sort of IPC, either to another local process or possibly even a remote one. Which raises different questions, some of which have been asked before:

* Are audit logs written synchronously?
* If yes, is the extra latency for the RPC call acceptable?
* If no, how do we reliably know that audit logs have been written?
* What's an acceptable timeout for the RPC call?

And it doesn't really answer the question of what to do on failure, although the consensus there seems to be to abort. I'm not against this solution per se; I just don't think that, in the end, it's much different than allowing 3rd party code into the NN. I'll refer back to ATM's comment that we expect people who are writing audit loggers to know what they're doing.

On a different note, I played with the "let exceptions flow" approach, and it seems that the NN catches the exception at some point and morphs it into a RemoteException. I actually like that better than shutting down the NN; it fails the particular operation, without shutting down the NN itself, which allows the logger to recover if it can. Of course a really broken logger will just keep throwing exceptions, but then IMO the user is asking for it by installing a broken custom logger.

Allows customized audit logging in HDFS FSNamesystem Key: HDFS-3680 URL: https://issues.apache.org/jira/browse/HDFS-3680 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0-alpha Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Priority: Minor Attachments: accesslogger-v1.patch, accesslogger-v2.patch, hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch Currently, FSNamesystem writes audit logs to a logger; that makes it easy to get audit logs in some log file. But it makes it kinda tricky to store audit logs in any other way (let's say a database), because it would require the code to implement a log appender (and thus know what logging system is actually being used underneath the façade), and parse the textual log message generated by FSNamesystem. I'm attaching a patch that introduces a cleaner interface for this use case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427550#comment-13427550 ] Marcelo Vanzin commented on HDFS-3680: -- bq. Sure it is, as noted before, it buffers the NN against 3rd party code calling exit, segfaulting in JNI, memory and fd leaks, OOM, etc. If the daemon misbehaves or blows up then it shouldn't compromise the integrity of the namespace. The consensus seems to be that if you can't log an audit event the NN should blow up (either fail the request or just shut itself down entirely). So even though the daemon is in a different address space, in that view it would still affect the functionality of the NN. Allows customized audit logging in HDFS FSNamesystem Key: HDFS-3680 URL: https://issues.apache.org/jira/browse/HDFS-3680 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0-alpha Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Priority: Minor Attachments: accesslogger-v1.patch, accesslogger-v2.patch, hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch Currently, FSNamesystem writes audit logs to a logger; that makes it easy to get audit logs in some log file. But it makes it kinda tricky to store audit logs in any other way (let's say a database), because it would require the code to implement a log appender (and thus know what logging system is actually being used underneath the façade), and parse the textual log message generated by FSNamesystem. I'm attaching a patch that introduces a cleaner interface for this use case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HDFS-3680: - Status: Open (was: Patch Available) Allows customized audit logging in HDFS FSNamesystem Key: HDFS-3680 URL: https://issues.apache.org/jira/browse/HDFS-3680 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0-alpha Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Priority: Minor Attachments: accesslogger-v1.patch, accesslogger-v2.patch, hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch Currently, FSNamesystem writes audit logs to a logger; that makes it easy to get audit logs in some log file. But it makes it kinda tricky to store audit logs in any other way (let's say a database), because it would require the code to implement a log appender (and thus know what logging system is actually being used underneath the façade), and parse the textual log message generated by FSNamesystem. I'm attaching a patch that introduces a cleaner interface for this use case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426859#comment-13426859 ] Marcelo Vanzin commented on HDFS-3680: -- Any idea why the Hadoop QA bot hasn't picked up the new patch? Allows customized audit logging in HDFS FSNamesystem Key: HDFS-3680 URL: https://issues.apache.org/jira/browse/HDFS-3680 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0-alpha Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Priority: Minor Attachments: accesslogger-v1.patch, accesslogger-v2.patch, hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch Currently, FSNamesystem writes audit logs to a logger; that makes it easy to get audit logs in some log file. But it makes it kinda tricky to store audit logs in any other way (let's say a database), because it would require the code to implement a log appender (and thus know what logging system is actually being used underneath the façade), and parse the textual log message generated by FSNamesystem. I'm attaching a patch that introduces a cleaner interface for this use case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426944#comment-13426944 ] Marcelo Vanzin commented on HDFS-3680: -- Answers inline. bq. With this approach, should namenode run if for some reason we have removed all the audit loggers from the list? My answer would be no, given the importance of audit log. bq. Does the system need a mechanism to add/remove audit loggers? When a failed logger is fixed, do we need a way to refresh the audit logger so it is picked up by the Namenode again? Being of the opinion that making it a list wasn't really needed to start with (I can't really see a scenario where you'd have more than one custom logger, which is what my original patch did), I don't think the NameNode should be stopped. Removing the audit logger from the list means the audit logger is buggy, which means that we'd be stopping the NameNode because code outside the control of the NameNode did something bad, which goes back to previous discussions about how this can end up being blamed on the HDFS code while it's not HDFS's fault. I could, though, always fall back to having the default logger in the list if it ever becomes empty. That would still not generate any audit logs if the logging system is not configured for it. bq. When you have multiple audit loggers, is there a need to keep them in sync or out of sync audit loggers okay? Not sure what you mean by in sync, but my answer here is the same regardless: there is no need to do anything other than what's already being done. Custom loggers, once called, do what they want with the data, and it's out of the NameNode's control at that point. bq. Alternatively, should we consider a separate daemon that runs off of the audit log written to disk and updates other syncs instead of doing it inline in the namenode code? See my comment on 20/Jul/12 . I think that's overkill and creates more problems than it solves. Allows customized audit logging in HDFS FSNamesystem Key: HDFS-3680 URL: https://issues.apache.org/jira/browse/HDFS-3680 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0-alpha Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Priority: Minor Attachments: accesslogger-v1.patch, accesslogger-v2.patch, hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch Currently, FSNamesystem writes audit logs to a logger; that makes it easy to get audit logs in some log file. But it makes it kinda tricky to store audit logs in any other way (let's say a database), because it would require the code to implement a log appender (and thus know what logging system is actually being used underneath the façade), and parse the textual log message generated by FSNamesystem. I'm attaching a patch that introduces a cleaner interface for this use case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427000#comment-13427000 ] Marcelo Vanzin commented on HDFS-3680: -- bq. Given that on the first exception we throw away a logger, a logger could miss whole bunch of audit logs. You're talking about letting audit loggers recover in some way. I don't think you can have it both ways: either misbehaving loggers are removed, or they're always called, and we just eat the exceptions and let them misbehave for as long as they want to, and potentially recover by themselves. Anything more complicated is, well, more complicated, and I don't think it's worth the extra complexity, even if there is something that could possibly be done. This covers the add / remove audit loggers question too. I don't think that's needed. It's a configuration option; this is really not something that should be changing during runtime. bq. So a mechanism to bring the out of sync audit logger in sync will be needed. Again you're assuming that we should try to fix broken audit loggers, something that we can't do. I'm treating broken audit loggers as just that: broken. If they want to not lose audit messages, they should make sure they don't throw exceptions, and handle any unexpected situations internally. bq. I am not sure I understand why it creates more problems. Well, let me rephrase. Talking about a separate daemon to serve audit logs is moot because, well, it wouldn't be HDFS code anymore. Anyone can do that without requiring a single line change in HDFS. My point with submitting this patch is that I think it's worth it to have an interface closer to the core for audit logs. It provides loggers with richer information about the context of the logs, information that may not exist in the text messages written to a log file. It also provides an interface that is more stable than a string in a log file (as in, if you change it, the compiler will yell at you). Just to reiterate my previous point, it is possible today to have a custom audit logger inside the NameNode without my patch; you just have to implement a log appender for the log system in use, and configure the logging system accordingly. With that, you get all the issues with custom loggers (they run synchronously in a critical section of the NameNode, they can bring down the whole process, etc) without any of the benefits of a proper interface (they have to parse a string to get any context data about the log, potentially losing info). Which is what led me to propose a cleaner way of doing it. Allows customized audit logging in HDFS FSNamesystem Key: HDFS-3680 URL: https://issues.apache.org/jira/browse/HDFS-3680 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0-alpha Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Priority: Minor Attachments: accesslogger-v1.patch, accesslogger-v2.patch, hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch Currently, FSNamesystem writes audit logs to a logger; that makes it easy to get audit logs in some log file. But it makes it kinda tricky to store audit logs in any other way (let's say a database), because it would require the code to implement a log appender (and thus know what logging system is actually being used underneath the façade), and parse the textual log message generated by FSNamesystem. I'm attaching a patch that introduces a cleaner interface for this use case. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
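Using the interface sketched earlier in this thread, the dispatch loop with the "drop a misbehaving logger on first failure" policy would look roughly like this (names approximate, not the actual FSNamesystem code); the alternative of swallowing the exception and keeping the logger in the list is the one-line variation noted in the comments.

{code}
import java.net.InetAddress;
import java.util.Iterator;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.fs.FileStatus;

class AuditLoggerDispatcher {
  private static final Log LOG = LogFactory.getLog(AuditLoggerDispatcher.class);
  private final List<AuditLogger> loggers;

  AuditLoggerDispatcher(List<AuditLogger> loggers) {
    this.loggers = loggers;
  }

  void logAuditEvent(boolean succeeded, String user, InetAddress addr,
      String cmd, String src, String dst, FileStatus stat) {
    for (Iterator<AuditLogger> it = loggers.iterator(); it.hasNext(); ) {
      AuditLogger logger = it.next();
      try {
        logger.logAuditEvent(succeeded, user, addr, cmd, src, dst, stat);
      } catch (RuntimeException e) {
        // Option 1: stop calling the broken logger.
        // Option 2 would be to only log the error and keep calling it.
        LOG.error("Audit logger " + logger.getClass() + " failed, removing it", e);
        it.remove();
      }
    }
  }
}
{code}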
[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427029#comment-13427029 ] Marcelo Vanzin commented on HDFS-3680: -- Just to be clear, I don't have any strong opinion over whether the NN should fail or continue if an access logger fails. I just implemented the stop logging on failure thing because there were comments about third party code bringing the NN down and that being seen as an HDFS bug. The config route is an option, but I'm always loth to add more config options unless strictly necessary. The only thing I'm against is building a complicated system where we try to, in some way, fix the audit logger (by, let's say, re-instantiating it). I think that sort of logic belongs in the audit logger implementation itself. Allows customized audit logging in HDFS FSNamesystem Key: HDFS-3680 URL: https://issues.apache.org/jira/browse/HDFS-3680 Project: Hadoop HDFS Issue Type: Improvement Components: name-node Affects Versions: 2.0.0-alpha Reporter: Marcelo Vanzin Assignee: Marcelo Vanzin Priority: Minor Attachments: accesslogger-v1.patch, accesslogger-v2.patch, hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch Currently, FSNamesystem writes audit logs to a logger; that makes it easy to get audit logs in some log file. But it makes it kinda tricky to store audit logs in any other way (let's say a database), because it would require the code to implement a log appender (and thus know what logging system is actually being used underneath the façade), and parse the textual log message generated by FSNamesystem. I'm attaching a patch that introduces a cleaner interface for this use case. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427045#comment-13427045 ] Marcelo Vanzin commented on HDFS-3680: --
bq. I think you are missing my point. Misbehaving loggers should be removed. But given the importance of audit logging, if the error condition associated with that logger is fixed, we need a mechanism to reinstate that logger without having to restart the namenode.
How can you know that the error condition was fixed, if you have no idea what it was in the first place? How can you know that a logger that threw a NullPointerException will now stop throwing it? That is why I took the simpler approach, which is not to make assumptions. The only thing that needs to be settled, in my view, is what to do when a logger throws an exception: either ignore it (and log it), hoping the logger recovers by itself in subsequent calls, or loudly log that the logger failed and stop using it. I understand the importance of audit logging, but I think anybody implementing an audit logger should understand that too and make their logger resilient to errors. From what I can see, there's very little the FSNamesystem code can do other than the two options above.
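A rough sketch of the second option ("log loudly and stop using that logger"). The class and method names here are hypothetical and only illustrate the dispatch loop, not the attached patch:
{code:java}
import java.net.InetAddress;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.fs.FileStatus;

/** Hypothetical sketch of dropping a misbehaving logger on its first failure. */
class AuditDispatchSketch {
  private static final Log LOG = LogFactory.getLog(AuditDispatchSketch.class);

  // Configured AuditLogger instances (see the interface sketch earlier).
  private final List<AuditLogger> auditLoggers =
      new CopyOnWriteArrayList<AuditLogger>();

  void dispatchAuditEvent(boolean succeeded, String user, InetAddress addr,
      String cmd, String src, String dst, FileStatus status) {
    for (AuditLogger logger : auditLoggers) {
      try {
        logger.logAuditEvent(succeeded, user, addr, cmd, src, dst, status);
      } catch (RuntimeException e) {
        // Log loudly and stop using the misbehaving logger; no attempt is
        // made to guess whether it might recover later.
        LOG.error("Disabling audit logger " + logger.getClass().getName()
            + " after unexpected exception", e);
        auditLoggers.remove(logger);
      }
    }
  }
}
{code}
A CopyOnWriteArrayList is used so that removing the failed logger while iterating is safe; the iteration runs over a snapshot of the list.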
[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13425162#comment-13425162 ] Marcelo Vanzin commented on HDFS-3680: --
Hi Suresh, thanks for the comments. Replies below.
bq. Existing LOG could handle different levels, such as trace, debug, info, warn etc. I understand we probably use info level logs now. Should we consider adding such levels to FSAccessLogger?
IMO trace, debug et al. are properties of the logger implementation, not of the audit event. The audit event already carries the relevant information, e.g. whether access was allowed or denied. The logger can choose to map that information to something that makes sense in the target, for example logging denied events at warning level. But such a level wouldn't make much sense in a different implementation (say, one writing to a database).
bq. Why call it FSAccessLogger and not AuditLogger? AuditLogger seems to be a more generic name.
Fair enough, will change.
bq. You cannot make this InterfaceAudience.Public given HdfsFileStatus and UserGroupInformation are not Public.
That poses an issue, though. Would there be resistance to making those two classes public? The problem with them not being public is that the information would then have to be exposed in some other way: either a new class that just provides the same data (= code duplication, plus the overhead of creating the copy), or a string (difficult to parse, plus the overhead of creating the string).
bq. Do not catch blanket Exception. Instead catch the specific exception you want to handle.
Are you OK with catching RuntimeException? I'm really being paranoid here, because I don't want a buggy AccessLogger to suddenly bring the NameNode down. Alternatively, the access logger could be disabled when it throws an exception (similar to how HBase disables coprocessors when they throw unexpected exceptions).
bq. Please consider adding a separate test instead of adding this to TestFSNamesystem.java
Any particular reason why? The test exercises FSNamesystem functionality (instantiating and using custom AccessLoggers), so it makes sense to me that it lives in FSNamesystem's test.
bq. Mock may be better than TestAccessLogger implementation. If you still want to use TestAccessLogger make it private class.
I can't mock because FSNamesystem instantiates the access logger using Class.forName(). I also believe I can't make it private for the same reason: FSNamesystem trying to call the (now private) constructor would cause an IllegalAccessException.
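For reference, a sketch of the Class.forName()-based instantiation that makes a private test logger impossible. The configuration key name and the helper class are assumptions for illustration, not the patch's actual code:
{code:java}
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import org.apache.hadoop.conf.Configuration;

class AuditLoggerFactorySketch {
  // Assumed configuration key listing audit logger class names; the actual
  // key used by the patch may differ.
  private static final String AUDIT_LOGGERS_KEY = "dfs.namenode.audit.loggers";

  static List<AuditLogger> initAuditLoggers(Configuration conf) {
    List<AuditLogger> loggers = new ArrayList<AuditLogger>();
    Collection<String> classNames =
        conf.getTrimmedStringCollection(AUDIT_LOGGERS_KEY);
    for (String className : classNames) {
      try {
        // The implementation must be a public class with a public no-arg
        // constructor; otherwise newInstance() fails (for example with
        // IllegalAccessException for a private class).
        AuditLogger logger =
            (AuditLogger) Class.forName(className).newInstance();
        logger.initialize(conf);
        loggers.add(logger);
      } catch (Exception e) {
        throw new RuntimeException("Could not create audit logger "
            + className, e);
      }
    }
    return loggers;
  }
}
{code}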
[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HDFS-3680: - Attachment: hdfs-3680-v5.patch Applying review feedback; I chose to remove misbehaving audit loggers on the first exception, instead of tracking how many exceptions are thrown or the rate at which they occur.
[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HDFS-3680: - Attachment: hdfs-3680-v4.patch Fixed default configuration (and TestAuditLogs in the process).
[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HDFS-3680: - Attachment: hdfs-3680-v3.patch Allow multiple loggers to be defined; still logs under the namesystem lock.
[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13419413#comment-13419413 ] Marcelo Vanzin commented on HDFS-3680: --
Thanks for the comments everyone. Good to know FSNamesystem is a singleton, so no need to worry about that issue.
As for queuing / blocking, I understand the concerns, but I don't see how they're any different from today. To do something like this today, you'd do one of the following: (i) process logs post facto, by tailing the HDFS log file or something along those lines; this is the completely out-of-process model, not affecting NN operation. (ii) Use a custom log appender that parses log messages inside the NN; this is almost the same as what my patch does, except it's tied to the log system implementation. Both cases suffer from turning a log message into something expected to be a stable interface; the second approach (which is doable today, just to make that clear) adds all the concerns you listed on top of that.
Does anyone know how the different log systems behave when using file appenders, which I'd guess is the vast majority of cases for this code? Do they queue, do they block waiting for the message to be written, what happens when they flush buffers, what if the log file is on NFS, etc.? Many of the concerns raised here boil down to those questions. I agree that implementations of this interface can do all sorts of bad things, but I don't see how that's any worse than today, unless you want to forgo using a log system for audit logging altogether and make writing to files the only option, with custom code that avoids as many of the issues discussed here as possible. The code could definitely force queuing on this code path; since not everybody may need that (the current log approach being the example), I'm wary of turning it into a requirement.
With those out of the way, a few comments about other things:
* audit logging under the namesystem lock: that can be hacked around. One ugly way would be to store the audit data in a thread local and flush it in the unlock() methods (see the sketch below).
* using the interface for the existing log: that can easily be done; my goal in not changing that part was to preserve the existing behavior. I could make the AUDITLOG access logger the default one, which would be very easy to do. A custom access logger would replace it (or we could make the config option a list, thus allowing both to be used again).
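A sketch of that thread-local workaround for getting the audit call out from under the namesystem lock. All names here are hypothetical and the event is simplified; this is not part of the attached patches:
{code:java}
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical sketch of deferring audit calls until the lock is released. */
class DeferredAuditSketch {

  private static final class PendingEvent {
    final boolean succeeded;
    final String user, cmd, src, dst;
    PendingEvent(boolean succeeded, String user, String cmd,
        String src, String dst) {
      this.succeeded = succeeded;
      this.user = user;
      this.cmd = cmd;
      this.src = src;
      this.dst = dst;
    }
  }

  private final ThreadLocal<PendingEvent> pending =
      new ThreadLocal<PendingEvent>();
  private final List<AuditLogger> auditLoggers =
      new CopyOnWriteArrayList<AuditLogger>();

  /** Called while the namesystem lock is held: only record the event. */
  void logAuditEvent(boolean succeeded, String user, String cmd,
      String src, String dst) {
    pending.set(new PendingEvent(succeeded, user, cmd, src, dst));
  }

  /** Called from the unlock path, after the lock itself has been released. */
  void flushPendingAuditEvent() {
    PendingEvent event = pending.get();
    if (event == null) {
      return;
    }
    pending.remove();
    for (AuditLogger logger : auditLoggers) {
      // The remote address and file status are omitted in this simplified sketch.
      logger.logAuditEvent(event.succeeded, event.user, null,
          event.cmd, event.src, event.dst, null);
    }
  }
}
{code}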
[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13418565#comment-13418565 ] Marcelo Vanzin commented on HDFS-3680: -- Tests are passing locally for me, so I guess they are flaky?
[jira] [Created] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
Marcelo Vanzin created HDFS-3680:
Summary: Allows customized audit logging in HDFS FSNamesystem
Key: HDFS-3680
URL: https://issues.apache.org/jira/browse/HDFS-3680
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Affects Versions: 3.0.0
Reporter: Marcelo Vanzin
Priority: Minor
Attachments: accesslogger-v1.patch

Currently, FSNamesystem writes audit logs to a logger; that makes it easy to get audit logs into some log file. But it makes it rather tricky to store audit logs in any other way (say, a database), because the code would have to implement a log appender (and thus know what logging system is actually being used underneath the façade) and parse the textual log message generated by FSNamesystem. I'm attaching a patch that introduces a cleaner interface for this use case.
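For contrast, the log-appender workaround the summary describes might look roughly like the following with log4j 1.x. The audit line format and the parsing are made-up illustrations of why this approach is fragile, not code from any patch:
{code:java}
import org.apache.log4j.AppenderSkeleton;
import org.apache.log4j.spi.LoggingEvent;

/** Parses NameNode audit log lines and ships them somewhere else. */
public class ParsingAuditAppender extends AppenderSkeleton {

  @Override
  protected void append(LoggingEvent event) {
    // The audit "interface" here is just a formatted string, for example
    // "allowed=true ugi=bob ip=/10.0.0.1 cmd=create src=/foo dst=null ...".
    // Any change to that format silently breaks this parsing.
    String message = event.getRenderedMessage();
    for (String field : message.split("\\s+")) {
      int eq = field.indexOf('=');
      if (eq > 0) {
        String key = field.substring(0, eq);
        String value = field.substring(eq + 1);
        // store(key, value) in a database, send to a collector, etc.
      }
    }
  }

  @Override
  public void close() {
    // Nothing to release in this sketch.
  }

  @Override
  public boolean requiresLayout() {
    return false;
  }
}
{code}
Note that such an appender still runs synchronously inside the NameNode's logging call, which is exactly the downside the comments above describe.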
[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HDFS-3680: - Status: Patch Available (was: Open)
[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HDFS-3680: - Attachment: accesslogger-v1.patch
[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem
[ https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated HDFS-3680: - Attachment: accesslogger-v2.patch Fix javadoc (copy-paste mistake).