[jira] [Commented] (HDFS-14038) Expose HdfsDataOutputStreamBuilder to include Spark in LimitedPrivate

2018-10-30 Thread Marcelo Vanzin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16669087#comment-16669087
 ] 

Marcelo Vanzin commented on HDFS-14038:
---

No, the problem here is not the use of reflection. That is needed because Spark 
still has to build against Hadoop 2, which doesn't have that API.

The issue raised in that comment is that the method Spark uses is part of a 
LimitedPrivate / Unstable API, which means it can break at any time.

A better approach, for example, would be to have a method in 
{{FSDataOutputStreamBuilder}} that is marked as public. In fact, there's 
already {{replication()}}, which sets the replication factor, but it doesn't 
seem related to the {{replicate()}} method in HdfsDataOutputStreamBuilder. 
Maybe they should be merged.
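For context, the reflection dance Spark has to do today can be sketched as below. BuilderV1 and BuilderV2 are stand-in classes (not real Hadoop types) modeling the Hadoop 2 builder (no {{replicate()}}) and the Hadoop 3 HdfsDataOutputStreamBuilder:

```java
import java.lang.reflect.Method;

public class OptionalMethodDemo {
  // Stand-in for the Hadoop 2 builder: no replicate() method at all.
  public static class BuilderV1 {}

  // Stand-in for the Hadoop 3 HdfsDataOutputStreamBuilder, which has it.
  public static class BuilderV2 {
    public boolean replicated = false;
    public void replicate() { replicated = true; }
  }

  // Calls replicate() via reflection if the builder has one; returns
  // whether it did. This is the pattern forced by building against
  // Hadoop 2 while wanting the Hadoop 3 API at runtime.
  public static boolean tryReplicate(Object builder) {
    try {
      Method m = builder.getClass().getMethod("replicate");
      m.invoke(builder);
      return true;
    } catch (ReflectiveOperationException e) {
      return false;  // older builder: no EC to opt out of
    }
  }

  public static void main(String[] args) {
    System.out.println(tryReplicate(new BuilderV2()));  // prints true
    System.out.println(tryReplicate(new BuilderV1()));  // prints false
  }
}
```

A public, stable method on {{FSDataOutputStreamBuilder}} would make the reflection (though not the Hadoop 2 fallback) unnecessary.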

> Expose HdfsDataOutputStreamBuilder to include Spark in LimitedPrivate
> -
>
> Key: HDFS-14038
> URL: https://issues.apache.org/jira/browse/HDFS-14038
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Xiao Chen
>Priority: Major
>
> In SPARK-25855 / 
> https://github.com/apache/spark/pull/22881#issuecomment-434359237, Spark 
> prefers to create Spark event log files with replication (instead of EC). To 
> do this currently, it has to be done by some casting / reflection, to get a 
> DistributedFileSystem object (or use the {{HdfsDataOutputStreamBuilder}} 
> subclass of it).
> We should officially expose this for Spark's usage.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-12534) Provide logical BlockLocations for EC files for better split calculation

2017-09-22 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-12534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16177425#comment-16177425
 ] 

Marcelo Vanzin commented on HDFS-12534:
---

bq. Are you sure we can split within a single S3 file?

Location != split. You can have x splits all with the same location. I'm pretty 
sure reading from a single S3 file using FileInputFormat generates multiple 
tasks (one per "split"). You may want to look at how it does that; it might be 
done entirely client-side, based on configuration.
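To illustrate the location-vs-split distinction, here is a minimal sketch of FileInputFormat-style split calculation (illustrative, not the actual Hadoop code): splits are a purely client-side division of the byte range, so one location can back many splits.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitDemo {
  // Returns {offset, length} pairs, one per split: a client-side division
  // of the file's byte range, independent of where the bytes live.
  public static List<long[]> computeSplits(long fileLen, long splitSize) {
    List<long[]> splits = new ArrayList<>();
    for (long off = 0; off < fileLen; off += splitSize) {
      splits.add(new long[] { off, Math.min(splitSize, fileLen - off) });
    }
    return splits;
  }

  public static void main(String[] args) {
    // One 1 GiB S3 object with a 128 MiB split size yields 8 tasks,
    // all sharing the same single "location".
    System.out.println(computeSplits(1L << 30, 128L << 20).size());  // 8
  }
}
```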

> Provide logical BlockLocations for EC files for better split calculation
> 
>
> Key: HDFS-12534
> URL: https://issues.apache.org/jira/browse/HDFS-12534
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: erasure-coding
>Affects Versions: 3.0.0-beta1
>Reporter: Andrew Wang
>  Labels: hdfs-ec-3.0-must-do
>
> I talked to [~vanzin] and [~alex.behm] some more about split calculation with 
> EC. It turns out HDFS-1 was resolved prematurely. Applications depend on 
> HDFS BlockLocation to understand where the split points are. The current 
> scheme of returning one BlockLocation per block group loses this information.
> We should change this to provide logical blocks. Divide the file length by 
> the block size and provide suitable BlockLocations to match, with virtual 
> offsets and lengths too.
> I'm not marking this as incompatible, since changing it this way would in 
> fact make it more compatible from the perspective of applications that are 
> scheduling against replicated files. Thus, it'd be good for beta1 if 
> possible, but okay for later too.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)




[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-04-30 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985707#comment-13985707
 ] 

Marcelo Vanzin commented on HDFS-6293:
--

[~ajisakaa] my modified parser does not have an upper bound on memory; 
because it still needs to load information about all inodes, it's still O(n) 
for the number of inodes in the image.

 Issues with OIV processing PB-based fsimages
 

 Key: HDFS-6293
 URL: https://issues.apache.org/jira/browse/HDFS-6293
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Priority: Blocker
 Attachments: Heap Histogram.html


 There are issues with OIV when processing fsimages in protobuf. 
 Due to the internal layout changes introduced by the protobuf-based fsimage, 
 OIV consumes excessive amount of memory.  We have tested with a fsimage with 
 about 140M files/directories. The peak heap usage when processing this image 
 in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
 the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
 heap (max new size was 1GB).  It should be possible to process any image with 
 the default heap size of 1.5GB.
 Another issue is the complete change of format/content in OIV's XML output.  
 I also noticed that the secret manager section has no tokens while there were 
 unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
 they were also missing in the new pb fsimage.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HDFS-6293) Issues with OIV processing PB-based fsimages

2014-04-28 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13983288#comment-13983288
 ] 

Marcelo Vanzin commented on HDFS-6293:
--

Hi Kihwal,

We have developed some code internally that mitigates (but does not eliminate) 
some of these problems. For an image with 140M entries it would need in the 
ballpark of 7-8GB of heap space, from my pencil-and-napkin calculations. Also, 
it does not generate entries in order like LsrPBImage does, and it's tailored 
for the use case of listing the contents of the file system (so it completely 
ignores things like snapshots).

(The reason it still requires a lot of memory is, as you note, that it needs to 
load information about all inodes in memory; our code is just a little smarter 
about what information it loads. I don't think it's possible to make it much 
better without changing the data in the fsimage itself.)

If people are ok with those limitations, we could clean up our code and post it 
as a patch.

 Issues with OIV processing PB-based fsimages
 

 Key: HDFS-6293
 URL: https://issues.apache.org/jira/browse/HDFS-6293
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.4.0
Reporter: Kihwal Lee
Priority: Blocker
 Attachments: Heap Histogram.html


 There are issues with OIV when processing fsimages in protobuf. 
 Due to the internal layout changes introduced by the protobuf-based fsimage, 
 OIV consumes excessive amount of memory.  We have tested with a fsimage with 
 about 140M files/directories. The peak heap usage when processing this image 
 in pre-protobuf (i.e. pre-2.4.0) format was about 350MB.  After converting 
 the image to the protobuf format on 2.4.0, OIV would OOM even with 80GB of 
 heap (max new size was 1GB).  It should be possible to process any image with 
 the default heap size of 1.5GB.
 Another issue is the complete change of format/content in OIV's XML output.  
 I also noticed that the secret manager section has no tokens while there were 
 unexpired tokens in the original image (pre-2.4.0).  I did not check whether 
 they were also missing in the new pb fsimage.





[jira] [Created] (HDFS-4601) DistCp's ThrottledInputStream leaks.

2013-03-13 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created HDFS-4601:


 Summary: DistCp's ThrottledInputStream leaks.
 Key: HDFS-4601
 URL: https://issues.apache.org/jira/browse/HDFS-4601
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Reporter: Marcelo Vanzin


DistCp's ThrottledInputStream.java leaks the wrapped input stream because it 
does not override InputStream.close().
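A minimal sketch of the fix (a simplified stand-in for the real DistCp class, not its actual code): the wrapper must propagate close() to its delegate.

```java
import java.io.IOException;
import java.io.InputStream;

public class ThrottledInputStream extends InputStream {
  private final InputStream rawStream;

  public ThrottledInputStream(InputStream raw) {
    this.rawStream = raw;
  }

  @Override
  public int read() throws IOException {
    // (throttling logic elided in this sketch)
    return rawStream.read();
  }

  // The fix: without this override, close() on the wrapper is a no-op
  // inherited from InputStream, and the wrapped stream's resources leak.
  @Override
  public void close() throws IOException {
    rawStream.close();
  }
}
```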

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-11-29 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated HDFS-3680:
-

Attachment: hdfs-3680-v10.patch

rebased + corrected typo.

 Allows customized audit logging in HDFS FSNamesystem
 

 Key: HDFS-3680
 URL: https://issues.apache.org/jira/browse/HDFS-3680
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 2.0.0-alpha
Reporter: Marcelo Vanzin
Assignee: Marcelo Vanzin
Priority: Minor
 Attachments: accesslogger-v1.patch, accesslogger-v2.patch, 
 hdfs-3680-v10.patch, hdfs-3680-v3.patch, hdfs-3680-v4.patch, 
 hdfs-3680-v5.patch, hdfs-3680-v6.patch, hdfs-3680-v7.patch, 
 hdfs-3680-v8.patch, hdfs-3680-v9.patch


 Currently, FSNamesystem writes audit logs to a logger; that makes it easy to 
 get audit logs in some log file. But it makes it kinda tricky to store audit 
 logs in any other way (let's say a database), because it would require the 
 code to implement a log appender (and thus know what logging system is 
 actually being used underneath the façade), and parse the textual log message 
 generated by FSNamesystem.
 I'm attaching a patch that introduces a cleaner interface for this use case.



[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-11-13 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated HDFS-3680:
-

Attachment: hdfs-3680-v9.patch

Re-based and re-tested (a.k.a. ping).

 Allows customized audit logging in HDFS FSNamesystem
 

 Key: HDFS-3680
 URL: https://issues.apache.org/jira/browse/HDFS-3680
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Marcelo Vanzin
Assignee: Marcelo Vanzin
Priority: Minor
 Attachments: accesslogger-v1.patch, accesslogger-v2.patch, 
 hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch, 
 hdfs-3680-v6.patch, hdfs-3680-v7.patch, hdfs-3680-v8.patch, hdfs-3680-v9.patch


 Currently, FSNamesystem writes audit logs to a logger; that makes it easy to 
 get audit logs in some log file. But it makes it kinda tricky to store audit 
 logs in any other way (let's say a database), because it would require the 
 code to implement a log appender (and thus know what logging system is 
 actually being used underneath the façade), and parse the textual log message 
 generated by FSNamesystem.
 I'm attaching a patch that introduces a cleaner interface for this use case.



[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-10-05 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470508#comment-13470508
 ] 

Marcelo Vanzin commented on HDFS-3680:
--

bq. Why is the following code part of this patch?

See my previous response:

I'm creating a FileStatus object based on an HdfsFileStatus, which is a 
private audience class and thus cannot be used in the public audience 
AuditLogger.

bq. Given loggers are going to some kind of io, to database, or some server 
etc. IOException should be expected and seems like a logical thing to throw and 
not RunTimeException.

You're making assumptions about the implementation of the logger. Why would it 
throw IOException and not SQLException? What if my logger doesn't do any I/O in 
the thread doing the logging at all? Saying "throws IOException" would just 
make implementors wrap whatever the real exception is in an 
IOException instead of a RuntimeException, to no benefit I can see. If 
FSNamesystem should handle errors from custom loggers, it should handle all 
errors, not just specific ones.

bq. I do not think it is outside the scope of this patch. Current logger could 
fail on system failures. However here it may fail because of poorly written code.

We can't prevent badly written code from doing bad things (also see previous 
comments from ATM that this is not an interface that people will be 
implementing willy nilly - people who'll touch it are expected to know what 
they are doing). The reason I say it's out of the scope of this patch is 
because it's a change in the current behavior that's unrelated to whether you 
have a custom audit logger or not; if audit logs should cause the name node to 
shut down, that's a change that needs to be made today, right now, independent 
of this patch going in or not.

bq. hdfs-default.xml document does not cover my previous comment

It covers details related to configuration. Details about what the 
implementation is expected to do should be (and are) documented in the 
interface itself, which is the interface that the person writing the 
implementation will be looking at.

If you're talking about the "what about when things don't work correctly" part, 
I'll wait for closure on the other comments.

 Allows customized audit logging in HDFS FSNamesystem
 

 Key: HDFS-3680
 URL: https://issues.apache.org/jira/browse/HDFS-3680
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Marcelo Vanzin
Assignee: Marcelo Vanzin
Priority: Minor
 Attachments: accesslogger-v1.patch, accesslogger-v2.patch, 
 hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch, 
 hdfs-3680-v6.patch, hdfs-3680-v7.patch, hdfs-3680-v8.patch


 Currently, FSNamesystem writes audit logs to a logger; that makes it easy to 
 get audit logs in some log file. But it makes it kinda tricky to store audit 
 logs in any other way (let's say a database), because it would require the 
 code to implement a log appender (and thus know what logging system is 
 actually being used underneath the façade), and parse the textual log message 
 generated by FSNamesystem.
 I'm attaching a patch that introduces a cleaner interface for this use case.



[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-10-05 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470589#comment-13470589
 ] 

Marcelo Vanzin commented on HDFS-3680:
--

bq. That said if you do not want to throw IOException, which I recommend, is 
fine. However you need to make sure you throw a checked exception from this 
interface, to design it correctly, to force the caller to handle the error 
condition. RTE which typically indicates programming error is not the choice.

My main point in resisting declaring any checked exception here is that not 
doing that makes it clear that loggers are not expected to throw exceptions, 
and thus should avoid doing so whenever possible. From FSNamesystem's point of 
view, there's no point in handling checked and unchecked exceptions 
differently; and audit log failure should be handled the same way regardless of 
the nature of the failure, unless you want to do crazy things like have a 
RetriableFailureException to indicate that FSNamesystem can call the method 
again to see if it works.

bq. I disagree. You are allowing audit logger to be pluggable in this patch. 
What is the impact of making logging pluggable and how namenode must deal with 
is the relevant questions for this patch. Not sure you looked at this comment 
from Todd, that I agree with:

I'm making it easier, but as I've mentioned before, it's already possible to 
use custom loggers by doing it at the log system layer (e.g. a custom log4j 
appender; just use 
http://wiki.apache.org/logging-log4j/JDBCAppenderConfiguration, for example, 
and you have many more failure cases than just writing to a local file). 

That's why I say that if you're worried about that case, that particular change 
should be made in the current code regardless of this patch. But if you feel 
strongly about it, I can add a try...catch that just shuts down the namenode.

(BTW, unless log4j itself shuts down the process by calling System.exit(), 
which my experience says is not the case, I don't see where this "shut down on 
audit log error" code is.)

bq. Other thing I have not completely thought through but bother me is, when 
these failures happen, a failure response is sent to the client, though it is 
successful on Namenode and is recorded on editlog. I would like others opinion 
on this.

That's independent of this patch (ignoring the "broken audit logger" argument). 
The problem is that the audit log is written after the operation succeeds; to 
fix that, the permission check and implementation of each operation should be 
separated, so that you could write the audit log before executing the operation.

bq. The side effects of this configuration change should be documented for an 
administrator so he understands the impact of this.

What side effects are you talking about? The only one would be that, if the 
audit logger fails, the namenode will shut down, if that behavior is 
implemented. Otherwise, any side effect is implementation-specific, and it's 
impossible to say anything other than "there might be side effects".


 Allows customized audit logging in HDFS FSNamesystem
 

 Key: HDFS-3680
 URL: https://issues.apache.org/jira/browse/HDFS-3680
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Marcelo Vanzin
Assignee: Marcelo Vanzin
Priority: Minor
 Attachments: accesslogger-v1.patch, accesslogger-v2.patch, 
 hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch, 
 hdfs-3680-v6.patch, hdfs-3680-v7.patch, hdfs-3680-v8.patch


 Currently, FSNamesystem writes audit logs to a logger; that makes it easy to 
 get audit logs in some log file. But it makes it kinda tricky to store audit 
 logs in any other way (let's say a database), because it would require the 
 code to implement a log appender (and thus know what logging system is 
 actually being used underneath the façade), and parse the textual log message 
 generated by FSNamesystem.
 I'm attaching a patch that introduces a cleaner interface for this use case.



[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-10-05 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470788#comment-13470788
 ] 

Marcelo Vanzin commented on HDFS-3680:
--

Just to end the "what does log4j do" discussion, here's what happens when log4j 
appenders throw exceptions/errors:

2012-10-05 16:32:05,748 WARN org.apache.hadoop.ipc.Server: IPC Server handler 7 
on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo from 
172.29.110.222:48751: error: java.lang.RuntimeException: testing exception 
handling.
java.lang.RuntimeException: testing exception handling.
at test.AppenderBase.append(AppenderBase.java:72)
at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
at 
org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
at org.apache.log4j.Category.callAppenders(Category.java:206)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at 
org.apache.commons.logging.impl.Log4JLogger.info(Log4JLogger.java:199)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logAuditEvent(FSNamesystem.java:258)

2012-10-05 16:32:08,820 WARN org.apache.hadoop.ipc.Server: IPC Server handler 
11 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getFileInfo 
from 172.29.110.222:48754: error: java.lang.Error: testing error handling.
java.lang.Error: testing error handling.
at test.AppenderBase.append(AppenderBase.java:74)
at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
at 
org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
at org.apache.log4j.Category.callAppenders(Category.java:206)
at org.apache.log4j.Category.forcedLog(Category.java:391)
at org.apache.log4j.Category.log(Category.java:856)
at 
org.apache.commons.logging.impl.Log4JLogger.info(Log4JLogger.java:199)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.logAuditEvent(FSNamesystem.java:258)

log4j appenders in general use log4j's ErrorHandler when errors occur, instead 
of throwing exceptions. The default one 
(http://logging.apache.org/log4j/1.2/apidocs/org/apache/log4j/helpers/OnlyOnceErrorHandler.html)
 just drops errors after printing the first one to stderr. So the NameNode is 
not shut down; it just keeps running, and audit logs are silently dropped. 
Which is another reason why I'll maintain that the "what to do when logging an 
audit fails" issue is not particular to my patch.

 Allows customized audit logging in HDFS FSNamesystem
 

 Key: HDFS-3680
 URL: https://issues.apache.org/jira/browse/HDFS-3680
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Marcelo Vanzin
Assignee: Marcelo Vanzin
Priority: Minor
 Attachments: accesslogger-v1.patch, accesslogger-v2.patch, 
 hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch, 
 hdfs-3680-v6.patch, hdfs-3680-v7.patch, hdfs-3680-v8.patch


 Currently, FSNamesystem writes audit logs to a logger; that makes it easy to 
 get audit logs in some log file. But it makes it kinda tricky to store audit 
 logs in any other way (let's say a database), because it would require the 
 code to implement a log appender (and thus know what logging system is 
 actually being used underneath the façade), and parse the textual log message 
 generated by FSNamesystem.
 I'm attaching a patch that introduces a cleaner interface for this use case.



[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-10-04 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469591#comment-13469591
 ] 

Marcelo Vanzin commented on HDFS-3680:
--

Hi Suresh, I think I addressed all your concerns (will upload new patch after 
testing).

bq. what is the reason symlink is being done in logAuditEvent? Why is it a part 
of this jira?

What do you mean "symlink is being done"? I'm not creating any symlinks. I'm 
creating a FileStatus object based on an HdfsFileStatus, which is a private 
audience class and thus cannot be used in the public audience AuditLogger.

bq. How does one add DefaultAuditLogger with a custom audit loggers? How does 
isAuditEnabled() method work if you add an ability to setup DefaultAuditLogger?

I hope I addressed this in the new documentation. For your second question, you 
should take a look at the isDefaultAuditLogger in FSNamesystem.java.

bq. FSNamesystem#auditLog should be moved to DefaultAuditLogger.

Since that field is public and used in other places, I'd rather not touch that.

bq. Should AuditLogger#logAuditEvent consider throwing IOException to indicate 
error?

IOException would be one of many exceptions custom audit loggers could throw. 
So I don't see why it should be special-cased here. My opinion is that audit 
loggers in general shouldn't throw exceptions; if they really want to, having 
to throw a RuntimeException indicates that it's not really an expected case.

bq. Sorry I have not caught up all the comments - what is the final decision on 
how to handle logger errors? Currently the client gets an exception when 
logAuditEvent fails. That does not seem to be correct.

I don't think there was an ultimate decision; my current patch follows the 
"things work just like before" approach: currently, if the logging system 
throws some kind of exception, it fails the request, just like the new code 
does. If that's not desired, then it can be changed, but I think that's outside 
the scope of this patch.
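For reference, the general shape of the pluggable interface under discussion can be sketched as below; the signature is simplified and illustrative, not the exact interface in the patch:

```java
import java.util.ArrayList;
import java.util.List;

public class AuditDemo {
  // No checked exception is declared: implementations are not expected to
  // throw, which is the design argued for in this thread.
  public interface AuditLogger {
    void logAuditEvent(boolean succeeded, String user, String cmd, String src);
  }

  // Example implementation recording events in memory; a real one might
  // write to a database instead of parsing FSNamesystem's textual log.
  public static class ListAuditLogger implements AuditLogger {
    public final List<String> events = new ArrayList<>();

    @Override
    public void logAuditEvent(boolean succeeded, String user, String cmd,
        String src) {
      events.add(user + " " + cmd + " " + src + " ok=" + succeeded);
    }
  }

  public static void main(String[] args) {
    ListAuditLogger logger = new ListAuditLogger();
    logger.logAuditEvent(true, "alice", "getfileinfo", "/data");
    System.out.println(logger.events.get(0));
  }
}
```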


 Allows customized audit logging in HDFS FSNamesystem
 

 Key: HDFS-3680
 URL: https://issues.apache.org/jira/browse/HDFS-3680
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Marcelo Vanzin
Assignee: Marcelo Vanzin
Priority: Minor
 Attachments: accesslogger-v1.patch, accesslogger-v2.patch, 
 hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch, 
 hdfs-3680-v6.patch, hdfs-3680-v7.patch


 Currently, FSNamesystem writes audit logs to a logger; that makes it easy to 
 get audit logs in some log file. But it makes it kinda tricky to store audit 
 logs in any other way (let's say a database), because it would require the 
 code to implement a log appender (and thus know what logging system is 
 actually being used underneath the façade), and parse the textual log message 
 generated by FSNamesystem.
 I'm attaching a patch that introduces a cleaner interface for this use case.



[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-10-04 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated HDFS-3680:
-

Attachment: hdfs-3680-v8.patch

 Allows customized audit logging in HDFS FSNamesystem
 

 Key: HDFS-3680
 URL: https://issues.apache.org/jira/browse/HDFS-3680
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Marcelo Vanzin
Assignee: Marcelo Vanzin
Priority: Minor
 Attachments: accesslogger-v1.patch, accesslogger-v2.patch, 
 hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch, 
 hdfs-3680-v6.patch, hdfs-3680-v7.patch, hdfs-3680-v8.patch


 Currently, FSNamesystem writes audit logs to a logger; that makes it easy to 
 get audit logs in some log file. But it makes it kinda tricky to store audit 
 logs in any other way (let's say a database), because it would require the 
 code to implement a log appender (and thus know what logging system is 
 actually being used underneath the façade), and parse the textual log message 
 generated by FSNamesystem.
 I'm attaching a patch that introduces a cleaner interface for this use case.



[jira] [Commented] (HDFS-3889) distcp overwrites files even when there are missing checksums

2012-09-06 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450006#comment-13450006
 ] 

Marcelo Vanzin commented on HDFS-3889:
--

A couple of comments on this bug, for context:

Checksums are checked in two spots in the code:
* When deciding whether to perform a copy if the -update option is used
* When checking that a copy succeeded (that's the code changed in HDFS-3054).

I think error checking should behave differently for each of those.

In the first case, we're interested in whether a copy should be made or not; if 
we fail to read the checksum, I think the best would be to treat that as an 
indication that the file should be copied to the destination (which is what the 
code does today).

In the second case, if we fail to read the checksum, it means we can't verify 
that the copy is correct. That case, I think, should result in an exception.

As a plus, I think the {{-skipcrccheck}} option should not apply to the first 
case above (deciding whether to update a remote file). CRCs should always be 
checked in that case; otherwise the equality check is based only on file size 
and block size, which I don't think is enough to say the files are the same.
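The two policies described above can be sketched like this (method names are illustrative, not the actual DistCpUtils API):

```java
import java.io.IOException;

public class ChecksumPolicy {
  // Pre-copy decision (-update): if either checksum is unreadable
  // (modeled here as null), assume the files differ and copy.
  public static boolean shouldCopy(String srcSum, String dstSum) {
    if (srcSum == null || dstSum == null) {
      return true;  // can't compare: fall back to the safe path
    }
    return !srcSum.equals(dstSum);
  }

  // Post-copy verification: an unreadable checksum means the copy cannot
  // be verified, so fail loudly instead of assuming success.
  public static void verifyCopy(String srcSum, String dstSum)
      throws IOException {
    if (srcSum == null || dstSum == null) {
      throw new IOException("cannot verify copy: checksum unavailable");
    }
    if (!srcSum.equals(dstSum)) {
      throw new IOException("checksum mismatch after copy");
    }
  }
}
```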

 distcp overwrites files even when there are missing checksums
 -

 Key: HDFS-3889
 URL: https://issues.apache.org/jira/browse/HDFS-3889
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 2.2.0-alpha
Reporter: Colin Patrick McCabe
Priority: Minor

 If distcp can't read the checksum files for the source and destination 
 files-- for any reason-- it ignores the checksums and overwrites the 
 destination file.  It does produce a log message, but I think the correct 
 behavior would be to throw an error and stop the distcp.
 If the user really wants to ignore checksums, he or she can use 
 {{-skipcrccheck}} to do so.
 The relevant code is in DistCpUtils#checksumsAreEquals:
 {code}
 try {
   sourceChecksum = sourceFS.getFileChecksum(source);
   targetChecksum = targetFS.getFileChecksum(target);
 } catch (IOException e) {
 LOG.error("Unable to retrieve checksum for " + source + " or " + target, e);
 }
 {code}



[jira] [Commented] (HDFS-3889) distcp overwrites files even when there are missing checksums

2012-09-06 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450031#comment-13450031
 ] 

Marcelo Vanzin commented on HDFS-3889:
--

bq. What if the source and destination clusters have different checksum types, 
or one of the checksums is missing?

That means that you can't reasonably detect whether both files are equal, so 
the code should fall back to the safe path, which is to assume they are not 
equal and that a copy should be performed. Since manually computing the 
checksums (by reading both source and destination files) and just copying the 
file would be about the same performance-wise, it should be fine.

{{-update}} is an optimization to avoid copying redundant data. Nothing will 
break if you just overwrite the target data with the source; it will just be 
slower than if the checksum checks were possible.

 distcp overwrites files even when there are missing checksums
 -

 Key: HDFS-3889
 URL: https://issues.apache.org/jira/browse/HDFS-3889
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: tools
Affects Versions: 2.2.0-alpha
Reporter: Colin Patrick McCabe
Priority: Minor

 If distcp can't read the checksum files for the source and destination 
 files-- for any reason-- it ignores the checksums and overwrites the 
 destination file.  It does produce a log message, but I think the correct 
 behavior would be to throw an error and stop the distcp.
 If the user really wants to ignore checksums, he or she can use 
 {{-skipcrccheck}} to do so.
 The relevant code is in DistCpUtils#checksumsAreEquals:
 {code}
 try {
   sourceChecksum = sourceFS.getFileChecksum(source);
   targetChecksum = targetFS.getFileChecksum(target);
 } catch (IOException e) {
    LOG.error("Unable to retrieve checksum for " + source + " or " + target, e);
 }
 {code}



[jira] [Commented] (HDFS-3889) distcp overwrites files even when there are missing checksums

2012-09-06 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450052#comment-13450052
 ] 

Marcelo Vanzin commented on HDFS-3889:
--

bq. In the absence of CRCs, it should also be based on modtime and other file 
metadata, not just size.

If the goal is to just provide the same functionality as rsync, then sure. 
Although I consider those less reliable than (or just as bad as) file size alone. 
They require the metadata to be kept in sync between source and destination, 
something that I don't think is very common for mod time or access time, for 
example.



[jira] [Commented] (HDFS-3889) distcp overwrites files even when there are missing checksums

2012-09-06 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450075#comment-13450075
 ] 

Marcelo Vanzin commented on HDFS-3889:
--

bq. I believe that the modification time is set based on the NN, not the 
clients. So nothing needs to be kept in sync.

You have two NNs. The metadata on the target NN needs to be in sync with 
the source NN for the metadata-based check to do the right thing.

In the end, my opinion is just that metadata-based checks are a very poor 
substitute for checksums, and can much more easily generate false positives 
(i.e. say that files are equal when they're not). But if it's a feature that 
people find useful, why not. The false negative case is not such a big problem, 
since it would just waste bandwidth by forcing the copy.
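The false-positive risk can be made concrete with a minimal sketch (hypothetical names, plain Java stand-ins rather than Hadoop types): two files with identical size and modification time but different bytes pass an rsync-style metadata check while failing a content check.

```java
import java.util.Arrays;

/** Hypothetical illustration of why size+modtime checks can give false positives. */
public class MetadataVsChecksum {

    /** Stand-in for file metadata as a sync tool might see it. */
    static final class Meta {
        final long size;
        final long modTime;
        final byte[] content;
        Meta(long size, long modTime, byte[] content) {
            this.size = size; this.modTime = modTime; this.content = content;
        }
    }

    /** rsync-style quick check: equal size and modtime => assume equal. */
    static boolean metadataSaysEqual(Meta a, Meta b) {
        return a.size == b.size && a.modTime == b.modTime;
    }

    /** Ground truth: compare the actual bytes. */
    static boolean contentEqual(Meta a, Meta b) {
        return Arrays.equals(a.content, b.content);
    }

    public static void main(String[] args) {
        // Same size, same modtime, different content: the metadata check
        // wrongly reports "equal" (a false positive); the content check does not.
        Meta src = new Meta(3, 1000L, new byte[]{1, 2, 3});
        Meta dst = new Meta(3, 1000L, new byte[]{9, 9, 9});
        System.out.println(metadataSaysEqual(src, dst)); // true  (false positive)
        System.out.println(contentEqual(src, dst));      // false
    }
}
```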



[jira] [Commented] (HDFS-3865) TestDistCp is @ignored

2012-08-29 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1337#comment-1337
 ] 

Marcelo Vanzin commented on HDFS-3865:
--

I haven't looked at the code in detail, but is this a case of TestDistCp being 
replaced by the other tests?

e.g. TestDistCp.testUniformSizeDistCp() vs. TestUniformSizeInputFormat.

 TestDistCp is @ignored
 --

 Key: HDFS-3865
 URL: https://issues.apache.org/jira/browse/HDFS-3865
 Project: Hadoop HDFS
  Issue Type: Test
  Components: tools
Affects Versions: 2.2.0-alpha
Reporter: Colin Patrick McCabe
Priority: Minor

 We should fix TestDistCp so that it actually runs, rather than being ignored.
 {code}
 @Ignore
 public class TestDistCp {
   private static final Log LOG = LogFactory.getLog(TestDistCp.class);
   private static List<Path> pathList = new ArrayList<Path>();
   ...
 {code}



[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-08-21 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated HDFS-3680:
-

Attachment: hdfs-3680-v7.patch

Made AuditLogger interface public, avoiding usage of UserGroupInformation and 
HdfsFileStatus classes.

My only concern with this change is that exposing just the user name misses a lot 
of information about the principal held by the UGI; I don't know how interesting 
that information is for audit logs, though, since the service configuration 
should provide a lot of the details (auth type, realm, etc.).
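For reference, the shape of the interface under discussion is roughly the following. This is a simplified sketch, not the committed patch: the method name and parameters are illustrative, and plain Strings stand in for the UGI and HdfsFileStatus data the comment mentions.

```java
/**
 * Simplified sketch of a pluggable audit logger interface. Only plain types
 * appear in the signature, so implementors need no dependency on
 * UserGroupInformation or HdfsFileStatus (the point of the v7 change).
 * Names are illustrative, not the actual HDFS API.
 */
public class AuditSketch {

    interface AuditLogger {
        /** Called once per audited namesystem operation. */
        void logAuditEvent(boolean succeeded, String userName, String address,
                           String cmd, String src, String dst);
    }

    /** Trivial implementation that formats each event as a single line. */
    static final class StringAuditLogger implements AuditLogger {
        final StringBuilder out = new StringBuilder();
        public void logAuditEvent(boolean succeeded, String userName, String address,
                                  String cmd, String src, String dst) {
            out.append("allowed=").append(succeeded)
               .append(" ugi=").append(userName)
               .append(" ip=").append(address)
               .append(" cmd=").append(cmd)
               .append(" src=").append(src)
               .append(" dst=").append(dst)
               .append('\n');
        }
    }

    public static void main(String[] args) {
        StringAuditLogger logger = new StringAuditLogger();
        logger.logAuditEvent(true, "alice", "10.0.0.1", "mkdirs", "/tmp/a", null);
        System.out.print(logger.out);
    }
}
```

An implementation is free to map the structured fields to a database row, a log line, or anything else, without parsing text.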

 Allows customized audit logging in HDFS FSNamesystem
 

 Key: HDFS-3680
 URL: https://issues.apache.org/jira/browse/HDFS-3680
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 2.0.0-alpha
Reporter: Marcelo Vanzin
Assignee: Marcelo Vanzin
Priority: Minor
 Attachments: accesslogger-v1.patch, accesslogger-v2.patch, 
 hdfs-3680-v3.patch, hdfs-3680-v4.patch, hdfs-3680-v5.patch, 
 hdfs-3680-v6.patch, hdfs-3680-v7.patch


 Currently, FSNamesystem writes audit logs to a logger; that makes it easy to 
 get audit logs in some log file. But it makes it kinda tricky to store audit 
 logs in any other way (let's say a database), because it would require the 
 code to implement a log appender (and thus know what logging system is 
 actually being used underneath the façade), and parse the textual log message 
 generated by FSNamesystem.
 I'm attaching a patch that introduces a cleaner interface for this use case.





[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-08-03 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated HDFS-3680:
-

Status: Patch Available  (was: Open)





[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-08-03 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated HDFS-3680:
-

Attachment: hdfs-3680-v6.patch

Change code to let exceptions from audit loggers propagate to the caller 
methods.





[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-08-02 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427470#comment-13427470
 ] 

Marcelo Vanzin commented on HDFS-3680:
--

Hi Daryn,

bq. The NN has an option for an external logger daemon

So, this implies some sort of IPC, either to another local process or possibly 
even a remote one. Which raises different questions, some of which have been 
asked before:

* Are audit logs written synchronously?
* If yes, is the extra latency for the RPC call acceptable?
* If no, how do we reliably know that audit logs have been written?
* What's an acceptable timeout for the RPC call?

And it doesn't really answer the question of what to do on failure, although 
the consensus there seems to be to abort. I'm not against this solution per 
se, I just don't think that, in the end, it's much different than allowing 3rd 
party code into the NN. I'll refer back to ATM's comment that we expect people 
who are writing audit loggers to know what they're doing.


On a different note, I played with the "let exceptions flow" approach, and it 
seems that the NN catches the exception at some point and morphs it into a 
RemoteException. I actually like that better than shutting down the NN; it 
fails the particular operation, without shutting down the NN itself, which 
allows the logger to recover if it can. Of course a really broken logger will 
just keep throwing exceptions, but then IMO the user is asking for it by 
installing a broken custom logger.





[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-08-02 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427550#comment-13427550
 ] 

Marcelo Vanzin commented on HDFS-3680:
--

bq. Sure it is, as noted before, it buffers the NN against 3rd party code 
calling exit, segfaulting in JNI, memory and fd leaks, OOM, etc. If the daemon 
misbehaves or blows up then it shouldn't compromise the integrity of the 
namespace.

The consensus seems to be that if you can't log an audit event the NN should 
blow up (either fail the request or just shut itself down entirely). So even 
though the daemon is in a different address space, in that view it would still 
affect the functionality of the NN.





[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-08-01 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated HDFS-3680:
-

Status: Open  (was: Patch Available)





[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-08-01 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426859#comment-13426859
 ] 

Marcelo Vanzin commented on HDFS-3680:
--

Any idea why the Hadoop QA bot hasn't picked up the new patch?





[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-08-01 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426944#comment-13426944
 ] 

Marcelo Vanzin commented on HDFS-3680:
--

Answers inline.

bq. With this approach, should namenode run if for some reason we have removed 
all the audit loggers from the list? My answer would be no, given the 
importance of audit log.
bq. Does the system need a mechanism to add/remove audit loggers? When a failed 
logger is fixed, do we need a way to refresh the audit logger so it is picked 
up by the Namenode again?

Being of the opinion that making it a list wasn't really needed to start with 
(my original patch supported a single custom logger, and I can't really see a 
scenario where you'd have more than one), I don't think the NameNode should be 
stopped. Removing an audit logger from the list means that logger is buggy, so 
we'd be stopping the NameNode because code outside its control did something 
bad; that goes back to previous discussions about how this can end up being 
blamed on the HDFS code while it's not HDFS's fault.

I could, though, always fall back to having the default logger in the list if 
it ever becomes empty. That would still not generate any audit logs if the 
logging system is not configured for it.

bq. When you have multiple audit loggers, is there a need to keep them in sync 
or out of sync audit loggers okay?

Not sure what you mean by in sync, but my answer here is the same regardless: 
there is no need to do anything other than what's already being done. Custom 
loggers, once called, do what they want with the data, and it's out of the 
NameNode's control at that point.

bq. Alternatively, should we consider a separate daemon that runs off of the 
audit log written to disk and updates other syncs instead of doing it inline in 
the namenode code?

See my comment from 20/Jul/12. I think that's overkill and creates more problems 
than it solves.






[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-08-01 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427000#comment-13427000
 ] 

Marcelo Vanzin commented on HDFS-3680:
--

bq. Given that on the first exception we throw away a logger, a logger could 
miss whole bunch of audit logs.

You're talking about letting audit loggers recover in some way. I don't think 
you can have it both ways: either misbehaving loggers are removed, or they're 
always called, and we just eat the exceptions and let them misbehave for as 
long as they want to, and potentially recover by themselves. Anything more 
complicated is, well, more complicated, and I don't think it's worth the extra 
complexity, even if there is something that could possibly be done.

This covers the add / remove audit loggers question too. I don't think that's 
needed. It's a configuration option; this is really not something that should 
be changing during runtime.

bq. So a mechanism to bring the out of sync audit logger in sync will be 
needed. 

Again you're assuming that we should try to fix broken audit loggers, something 
that we can't do. I'm treating broken audit loggers as just that: broken. If 
they want to not lose audit messages, they should make sure they don't throw 
exceptions, and handle any unexpected situations internally.

bq. I am not sure I understand why it creates more problems.

Well, let me rephrase. Talking about a separate daemon to serve audit logs is 
moot because, well, it wouldn't be HDFS code anymore. Anyone can do that 
without requiring a single line change in HDFS.

My point with submitting this patch is that I think it's worth it to have an 
interface closer to the core for audit logs. It provides loggers with richer 
information about the context of the logs, information that may not exist in 
the text messages written to a log file. It also provides an interface that is 
more stable than a string in a log file (as in, if you change it, the compiler 
will yell at you).

Just to reiterate my previous point, it is possible today to have a custom 
audit logger inside the NameNode without my patch; you just have to implement a 
log appender for the log system in use, and configure the logging system 
accordingly. With that, you get all the issues with custom loggers (they run 
synchronously in a critical section of the NameNode, they can bring down the 
whole process, etc) without any of the benefits of a proper interface (they 
have to parse a string to get any context data about the log, potentially 
losing info). Which is what led me to propose a cleaner way of doing it.
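For contrast, the appender route looks roughly like this. The sketch uses java.util.logging purely to stay self-contained (Hadoop itself used commons-logging/log4j, so the class names differ): the handler only ever receives the formatted line and has to parse fields back out of it, which is exactly the fragility described above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

/**
 * Sketch of the "custom appender" alternative. The handler sees only the
 * formatted string, so it must parse "key=value" pairs back out of it --
 * an informal contract that silently breaks if the format ever changes.
 */
public class AppenderSketch {

    /** Handler that parses audit lines of the form "k1=v1 k2=v2 ...". */
    static final class ParsingHandler extends Handler {
        final Map<String, String> lastEvent = new HashMap<>();
        @Override public void publish(LogRecord record) {
            lastEvent.clear();
            for (String field : record.getMessage().split("\\s+")) {
                String[] kv = field.split("=", 2);
                if (kv.length == 2) {
                    lastEvent.put(kv[0], kv[1]);
                }
            }
        }
        @Override public void flush() {}
        @Override public void close() {}
    }

    public static void main(String[] args) {
        Logger audit = Logger.getLogger("audit");
        audit.setUseParentHandlers(false);
        ParsingHandler handler = new ParsingHandler();
        audit.addHandler(handler);
        // The audit-line format is an informal contract; any change to it
        // breaks every downstream parser without a compile error.
        audit.info("allowed=true ugi=alice cmd=mkdirs src=/tmp/a");
        System.out.println(handler.lastEvent.get("cmd")); // mkdirs
    }
}
```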





[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-08-01 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427029#comment-13427029
 ] 

Marcelo Vanzin commented on HDFS-3680:
--

Just to be clear, I don't have any strong opinion on whether the NN should 
fail or continue if an access logger fails. I just implemented the "stop 
logging on failure" behavior because there were comments about third-party code 
bringing the NN down and that being seen as an HDFS bug. The config route is an 
option, but I'm always loath to add more config options unless strictly 
necessary.

The only thing I'm against is building a complicated system where we try to, in 
some way, fix the audit logger (by, let's say, re-instantiating it). I think 
that sort of logic belongs in the audit logger implementation itself.





[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-08-01 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427045#comment-13427045
 ] 

Marcelo Vanzin commented on HDFS-3680:
--

bq. I think you are missing my point. Misbehaving loggers should be removed. 
But given the importance of audit logging, if the error condition associated 
with that logger is fixed, we need a mechanism to reinstate that logger back 
without having to restart the namenode.

How can you know that the error condition was fixed, if you have no idea of 
what it was in the first place? How can you know that a logger that threw a 
NullPointerException will now stop throwing it?

That's why I took the simpler approach: don't make assumptions. The only thing 
that needs to be settled, in my view, is what to do when the logger throws an 
exception: either ignore it (and log it), hoping the logger recovers by itself 
in subsequent calls, or log loudly that the logger failed and stop using that 
logger.

I understand the importance of audit logging, but I think that anybody who's 
implementing an audit logger should understand that too, and make their logger 
resilient to errors. There's very little the FSNamesystem code can do other 
than the two options above, from what I can see.
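The second option ("log loudly and stop using that logger") is only a few lines of dispatch code. The sketch below uses a stand-in interface and hypothetical names, not the actual patch:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Sketch of the "drop a logger on its first exception" policy discussed
 * above. AuditLogger is a stand-in interface; names are illustrative.
 */
public class LoggerDispatch {

    interface AuditLogger {
        void logAuditEvent(String event);
    }

    final List<AuditLogger> loggers = new ArrayList<>();

    void logAuditEvent(String event) {
        for (Iterator<AuditLogger> it = loggers.iterator(); it.hasNext();) {
            AuditLogger logger = it.next();
            try {
                logger.logAuditEvent(event);
            } catch (RuntimeException e) {
                // Log loudly and stop using the misbehaving logger; the
                // NameNode itself keeps running.
                System.err.println("Disabling broken audit logger: " + e);
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        LoggerDispatch dispatch = new LoggerDispatch();
        List<String> seen = new ArrayList<>();
        dispatch.loggers.add(seen::add);                             // well-behaved logger
        dispatch.loggers.add(e -> { throw new RuntimeException("buggy"); });
        dispatch.logAuditEvent("open /a");   // buggy logger removed here
        dispatch.logAuditEvent("open /b");   // only the good logger remains
        System.out.println(seen);            // [open /a, open /b]
        System.out.println(dispatch.loggers.size()); // 1
    }
}
```

The first option (always call, eat exceptions) is the same loop with the `it.remove()` line deleted.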





[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-07-30 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425162#comment-13425162
 ] 

Marcelo Vanzin commented on HDFS-3680:
--

Hi Suresh, thanks for the comments. Replies below.

bq. Existing LOG could handle different levels, such as trace, debug, info, 
warn etc. I understand we probably use info level logs now. Should we consider 
adding such levels to FSAccessLogger?

IMO trace, debug et al are related to the logger implementation, not the 
audit event. The audit event already has information about what it's about; 
e.g., access allowed / denied, etc. The logger can choose to map that 
information to something that makes sense in the target; for example, logging 
denied events with warning level. But such a level wouldn't make much sense in 
a different implementation (say, for example, writing to a database). 
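To make that concrete, here's a minimal sketch of a logger that maps the audit outcome to a level internally. The interface and class names below are illustrative, modeled on the discussion, not the exact signatures from the attached patch:

```java
import java.net.InetAddress;

// Illustrative audit interface: the event carries the outcome, not a level.
interface AuditLogger {
    void logAuditEvent(boolean allowed, String user, InetAddress addr,
                       String cmd, String src);
}

class LevelMappingAuditLogger implements AuditLogger {
    // The mapping from outcome to level lives in the implementation; a real
    // logger would forward to the underlying log system at this level.
    static String levelFor(boolean allowed) {
        return allowed ? "INFO" : "WARN"; // denied events become warnings
    }

    @Override
    public void logAuditEvent(boolean allowed, String user, InetAddress addr,
                              String cmd, String src) {
        System.out.println(levelFor(allowed) + " audit: user=" + user
                + " cmd=" + cmd + " src=" + src);
    }
}
```

A database-backed implementation would simply ignore levelFor() and store the boolean outcome directly.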

bq. Why call it FSAccessLogger and not AuditLogger? AuditLogger seems to be a 
more generic name.

Fair enough, will change.

bq. You cannot make this InterfaceAudience.Public given HdfsFileStatus and 
UserGroupInformation are not Public.

That poses an issue, though. Would there be resistance to making those two 
classes public? The problem with them not being public is that it would then 
require the information to be exposed in some other way: either a new class 
that just provides the same information (= code duplication, overhead to create 
the copy), or a string (difficult to parse, overhead to create the string).

bq. Do not catch blanket Exception. Instead catch the specific exception you 
want to handle.

Are you OK with catching RuntimeException? I'm really being paranoid here, 
because I don't want a buggy AccessLogger to suddenly cause the NameNode to go 
down. Alternatively, the access logger could be disabled when it throws an 
exception (similar to how HBase disables coprocessors when they throw 
unexpected exceptions).
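The "disable on exception" alternative could look roughly like the sketch below. Names here are illustrative, not from the patch; the point is that only RuntimeException is caught, so a buggy logger is evicted on its first failure while Errors still propagate:

```java
import java.util.ArrayList;
import java.util.List;

class AuditDispatcher {
    interface Logger {
        void log(String event);
    }

    private final List<Logger> loggers = new ArrayList<>();

    void add(Logger l) { loggers.add(l); }

    int activeCount() { return loggers.size(); }

    // Dispatch to every registered logger; a logger that throws a
    // RuntimeException is removed so it cannot keep disrupting the NameNode.
    void dispatch(String event) {
        loggers.removeIf(l -> {
            try {
                l.log(event);
                return false;
            } catch (RuntimeException e) {
                return true; // evict the misbehaving logger
            }
        });
    }
}
```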

bq. Please consider adding a separate test instead of adding this to 
TestFSNamesystem.java

Any particular reason why? The test is testing functionality of FSNamesystem 
(instantiating and using custom AccessLoggers), so it makes sense to me that 
it's part of FSNamesystem's test.

bq. Mock may be better than TestAccessLogger implementation. If you still want 
to use TestAccessLogger make it private class.

I can't mock because FSNamesystem instantiates the access logger using 
Class.forName(). I also believe I cannot make it private for the same reason: 
FSNamesystem trying to call the (now private) constructor would cause an 
IllegalAccessException.
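For context, the reflective load is roughly the pattern below (helper name is a placeholder), which is why the logger class and its no-arg constructor must be accessible:

```java
class ReflectiveLoad {
    // Mirrors how a configured class name gets instantiated reflectively.
    // newInstance() throws IllegalAccessException when the class or its
    // no-arg constructor is not visible to the caller.
    static Object instantiate(String className) throws Exception {
        return Class.forName(className).getDeclaredConstructor().newInstance();
    }
}
```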






[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-07-30 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated HDFS-3680:
-

Attachment: hdfs-3680-v5.patch

Applying review feedback; I chose to remove misbehaving audit loggers on the 
first exception, instead of keeping track of how many exceptions or the rate of 
exceptions being thrown.





[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-07-26 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated HDFS-3680:
-

Attachment: hdfs-3680-v4.patch

Fixed default configuration (and TestAuditLogs in the process).





[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-07-25 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated HDFS-3680:
-

Attachment: hdfs-3680-v3.patch

Allow multiple loggers to be defined; still logs under the namesystem lock.





[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-07-20 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13419413#comment-13419413
 ] 

Marcelo Vanzin commented on HDFS-3680:
--

Thanks for the comments everyone. Good to know FSNamesystem is a singleton, so 
no need to worry about that issue.

As for queuing / blocking, I understand the concerns, but I don't see how 
they're any different than today. To do something like this today, you'd do one 
of the following:

(i) Process logs post-facto, by tailing the HDFS log file or something along 
those lines.

This would be the completely off-process model, not affecting the NN's 
operation.

(ii) Use a custom log appender that parses log messages inside the NN.

This is almost the same as what my patch does; except it's tied to the log 
system implementation.

Both cases suffer from turning a log message into something expected to be a 
stable interface; the second approach (which is doable today, just to make 
that clear) adds on top of that all the concerns you guys listed.

Does anyone know how the different log systems behave when using file loggers, 
which I guess would be the vast majority of cases for this code? Do they do 
queuing, do they block waiting for the message to be written, what happens when 
they flush buffers, what if the log file is on NFS, etc? Lots of the concerns 
raised here are similar to those questions.

I agree that implementations of this interface can do all sorts of bad things, 
but I don't see how that's any worse than today, unless you guys want to forgo 
using a log system at all for audit logging and force writing to files as the 
only option, with your own custom code that avoids as many of the issues 
discussed here as possible.

The code could definitely force queuing on this code path; since not everybody 
may need that (the current log approach being the example), I'm wary of turning 
that into a requirement.

So, those out of the way, a few comments about other things:
* audit logging under the namesystem lock: that can be hacked around. One ugly 
way would be to store the audit data in a thread local, and flush it in the 
unlock() methods.

* using the interface for the existing log: that can be easily done; my goal 
in not changing that part was to preserve the existing behavior. I could 
use the AUDITLOG access logger as the default one; that would be very easy to 
do. A custom access logger would replace it (or we could make the config option 
a list, thus allowing the use of both again).
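The thread-local hack could take roughly the shape below. This is entirely illustrative and not part of the patch: entries are buffered cheaply while the lock is held, and the potentially slow logging happens in flush(), called from unlock():

```java
import java.util.ArrayList;
import java.util.List;

class DeferredAuditLog {
    // Per-thread buffer of audit entries recorded under the namesystem lock.
    private final ThreadLocal<List<String>> pending =
            ThreadLocal.withInitial(ArrayList::new);
    private final List<String> flushed = new ArrayList<>();

    // Called while holding the lock: just an in-memory append, no I/O.
    void record(String event) {
        pending.get().add(event);
    }

    // Called from unlock(): the slow part (writing to loggers, a database,
    // etc.) runs outside the lock.
    void flush() {
        List<String> buf = pending.get();
        flushed.addAll(buf);
        buf.clear();
    }

    int flushedCount() { return flushed.size(); }
}
```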





[jira] [Commented] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-07-19 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13418565#comment-13418565
 ] 

Marcelo Vanzin commented on HDFS-3680:
--

Tests are passing locally for me, so I guess they are flaky?





[jira] [Created] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-07-18 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created HDFS-3680:


 Summary: Allows customized audit logging in HDFS FSNamesystem
 Key: HDFS-3680
 URL: https://issues.apache.org/jira/browse/HDFS-3680
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: name-node
Affects Versions: 3.0.0
Reporter: Marcelo Vanzin
Priority: Minor
 Attachments: accesslogger-v1.patch

Currently, FSNamesystem writes audit logs to a logger; that makes it easy to 
get audit logs in some log file. But it makes it kinda tricky to store audit 
logs in any other way (let's say a database), because it would require the code 
to implement a log appender (and thus know what logging system is actually 
being used underneath the façade), and parse the textual log message generated 
by FSNamesystem.

I'm attaching a patch that introduces a cleaner interface for this use case.





[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-07-18 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated HDFS-3680:
-

Status: Patch Available  (was: Open)





[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-07-18 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated HDFS-3680:
-

Attachment: accesslogger-v1.patch





[jira] [Updated] (HDFS-3680) Allows customized audit logging in HDFS FSNamesystem

2012-07-18 Thread Marcelo Vanzin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcelo Vanzin updated HDFS-3680:
-

Attachment: accesslogger-v2.patch

Fix javadoc (copy & paste ftl).
