[jira] [Commented] (HDFS-6377) Unify xattr name and value limits into a single limit

2014-05-16 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995741#comment-13995741
 ] 

Andrew Wang commented on HDFS-6377:
---

Existing filesystems go both ways on this. Splitting makes sense if you want to 
stash the names in the inode and then store the values in blocks, but we don't 
do that.

I'll also add that the name is a String while the value is a byte[]. 
{{String#length}} returns the # of unicode code units, not the # of bytes. It 
might make sense to do some kind of conversion here to get a more accurate 
accounting of space usage.

One other cleanup that'd be good to do is using HashCodeBuilder and EqualsBuilder in XAttr 
to shorten those functions.
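
For reference, here is a minimal sketch of what that cleanup could look like, using the 
commons-lang builders on a simplified stand-in class rather than the real XAttr:

{code}
import org.apache.commons.lang.builder.EqualsBuilder;
import org.apache.commons.lang.builder.HashCodeBuilder;

// Simplified stand-in for XAttr, only to show the builder-based equals/hashCode.
class SimpleXAttr {
  private final String name;
  private final byte[] value;

  SimpleXAttr(String name, byte[] value) {
    this.name = name;
    this.value = value;
  }

  @Override
  public boolean equals(Object obj) {
    if (obj == null || obj.getClass() != getClass()) {
      return false;
    }
    SimpleXAttr other = (SimpleXAttr) obj;
    return new EqualsBuilder()
        .append(name, other.name)
        .append(value, other.value)
        .isEquals();
  }

  @Override
  public int hashCode() {
    return new HashCodeBuilder()
        .append(name)
        .append(value)
        .toHashCode();
  }
}
{code}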

> Unify xattr name and value limits into a single limit
> -
>
> Key: HDFS-6377
> URL: https://issues.apache.org/jira/browse/HDFS-6377
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS XAttrs (HDFS-2006)
>Reporter: Andrew Wang
>
> Instead of having separate limits and config options for the size of an 
> xattr's name and value, let's use a single limit.





[jira] [Commented] (HDFS-6377) Unify xattr name and value limits into a single limit

2014-05-15 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996616#comment-13996616
 ] 

Chris Nauroth commented on HDFS-6377:
-

Thanks for changing the config property names.

bq. Do you think we need a minimum configured limit? Let's say the user configures the 
size as 3; that is always an invalid size, since the namespace prefix alone occupies that 
much space. [I am not insisting, just raising the point for discussion.]

Alternatively, the other fs-limits configs have the semantics that setting them 
to 0 disables enforcement.  I suppose this might be helpful as an escape hatch 
if something causes really unexpectedly long data, but the admin still wants to 
keep the service running.  (Like Uma, I'm just discussing ideas, not insisting.)
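
For illustration only, here is a tiny sketch of that escape-hatch semantic; the class and 
field names are made up, not taken from the patch:

{code}
// Hypothetical illustration: a limit of 0 means "no enforcement",
// matching the semantics of the other fs-limits settings.
class XAttrLimit {
  private final int xattrMaxSize;

  XAttrLimit(int xattrMaxSize) {
    this.xattrMaxSize = xattrMaxSize;
  }

  void check(int totalSize) {
    if (xattrMaxSize > 0 && totalSize > xattrMaxSize) {
      throw new IllegalArgumentException(
          "XAttr is too big, maximum size = " + xattrMaxSize
          + ", but the size is = " + totalSize);
    }
  }
}
{code}

For reference, the check as it currently stands in the patch is: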

{code}
  private void checkXAttrSize(XAttr xAttr) throws UnsupportedEncodingException {
    int size = xAttr.getName().getBytes("UTF-8").length;
    if (xAttr.getValue() != null) {
      size += xAttr.getValue().length;
    }
    if (size > nnConf.xattrMaxSize) {
      throw new HadoopIllegalArgumentException(
          "XAttr is too big, maximum size = " + nnConf.xattrMaxSize
          + ", but the size is = " + xAttr.getName().length());
    }
  }
{code}

I believe the log message will be incorrect in the presence of multi-byte 
characters.  The limit is enforced on the number of bytes in UTF-8 encoding.  
The log message uses the string length, which can differ.  This could confuse 
users if we reject an xattr and then log a size that appears to be under the 
configured limit.  Here is a quick Scala REPL session demonstrating the problem:

{code}
scala> val s = "single-byte-chars"
s: java.lang.String = single-byte-chars

scala> s.getBytes("UTF-8").length
res2: Int = 17

scala> s.length
res3: Int = 17

scala> val s2 = "multi-byte-\u0641-chars"
s2: java.lang.String = multi-byte-?-chars

scala> s2.getBytes("UTF-8").length
res4: Int = 19

scala> s2.length
res5: Int = 18
{code}

Also, here is a minor code cleanup suggestion on the above.  Guava defines a 
constant {{Charsets#UTF_8}}.  We can pass this to {{String#getBytes(Charset)}} 
(not using the overload that takes a {{String}} parameter).  Then, that 
eliminates the need to deal with {{UnsupportedEncodingException}}.  I've always 
found that exception irritating.  Of course we have UTF-8!  :-)
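
Here is a minimal sketch of that cleanup; the variable names are just illustrative:

{code}
import com.google.common.base.Charsets;

public class XAttrByteCount {
  public static void main(String[] args) {
    String name = "user.multi-byte-\u0641";
    // String#getBytes(Charset) cannot throw UnsupportedEncodingException,
    // unlike the getBytes(String charsetName) overload used in the patch.
    int nameBytes = name.getBytes(Charsets.UTF_8).length;
    System.out.println("code units = " + name.length()
        + ", UTF-8 bytes = " + nameBytes);
  }
}
{code}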


For {{dfs.namenode.fs-limits.max-directory-items}}, we log an error message if 
we encounter an existing inode that violates the limit during startup/applying 
edits.  This can be a helpful message if an admin down-tunes the setting and 
then wants to identify and clean up existing data that's in violation.  Can we 
log a message for the xattr limit violations too?  If it's easier, feel free to 
punt this part to a separate jira.  (I realize you're close to +1 on this patch 
already.)
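
To sketch what I mean (this is hypothetical; the names and the applyingEdits flag are 
invented for illustration, not how FSNamesystem actually plumbs this):

{code}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Hypothetical: log (rather than reject) oversized xattrs found while loading
// existing metadata, so an admin who lowered the limit can find and clean them up.
class XAttrLimitEnforcer {
  private static final Log LOG = LogFactory.getLog(XAttrLimitEnforcer.class);
  private final int xattrMaxSize;

  XAttrLimitEnforcer(int xattrMaxSize) {
    this.xattrMaxSize = xattrMaxSize;
  }

  void enforce(String name, int totalSize, boolean applyingEdits) {
    if (totalSize <= xattrMaxSize) {
      return;  // within the limit
    }
    if (applyingEdits) {
      LOG.error("XAttr " + name + " exceeds the configured size limit ("
          + totalSize + " > " + xattrMaxSize + ")");
    } else {
      throw new IllegalArgumentException("XAttr " + name + " is too big: "
          + totalSize + " > " + xattrMaxSize);
    }
  }
}
{code}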

> Unify xattr name and value limits into a single limit
> -
>
> Key: HDFS-6377
> URL: https://issues.apache.org/jira/browse/HDFS-6377
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS XAttrs (HDFS-2006)
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-6377-1.patch
>
>
> Instead of having separate limits and config options for the size of an 
> xattr's name and value, let's use a single limit.





[jira] [Commented] (HDFS-6377) Unify xattr name and value limits into a single limit

2014-05-14 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996982#comment-13996982
 ] 

Chris Nauroth commented on HDFS-6377:
-

The patch looks good, Andrew.  Thanks!  A few potential cleanups:

{code}
+import java.io.UnsupportedEncodingException;
{code}

Is the import unused now?

{code}
-  cacheManager.stopMonitorThread();
-  cacheManager.clearDirectiveStats();
+  if (cacheManager != null) {
+    cacheManager.stopMonitorThread();
+    cacheManager.clearDirectiveStats();
+  }
{code}

Was this an unrelated change?

> Unify xattr name and value limits into a single limit
> -
>
> Key: HDFS-6377
> URL: https://issues.apache.org/jira/browse/HDFS-6377
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS XAttrs (HDFS-2006)
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Fix For: HDFS XAttrs (HDFS-2006)
>
> Attachments: hdfs-6377-1.patch, hdfs-6377-2.patch
>
>
> Instead of having separate limits and config options for the size of an 
> xattr's name and value, let's use a single limit.





[jira] [Commented] (HDFS-6377) Unify xattr name and value limits into a single limit

2014-05-14 Thread Andrew Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996988#comment-13996988
 ] 

Andrew Wang commented on HDFS-6377:
---

Hey Chris, good comments. Let's remove the import in HDFS-6395.

The cacheManager check was because of the new unit test I added for the Precondition 
checks. Basically, if the NN fails in its constructor, it calls stopActiveServices to 
clean up. Since FSDirectory can fail a Precondition check, we end up trying to clean up 
while cacheManager is still null. That meant I was getting a NullPointerException rather 
than the IllegalArgumentException I wanted to verify, and I couldn't get the 
LogVerificationAppender to show me the stack trace. Thus, the null check :)
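
To make the sequence concrete, here is a stripped-down, hypothetical sketch of that 
failure mode (class and field names invented; the real flow goes through 
NameNode/FSNamesystem startup):

{code}
import com.google.common.base.Preconditions;

// Hypothetical reproduction of the failure mode described above: the constructor
// fails a Preconditions check before cacheManager is assigned, then runs cleanup,
// so cleanup must tolerate a null cacheManager or the test sees a
// NullPointerException instead of the expected IllegalArgumentException.
class TinyNamesystem {
  private Object cacheManager;  // stand-in for CacheManager

  TinyNamesystem(int xattrMaxSize) {
    try {
      Preconditions.checkArgument(xattrMaxSize >= 0,
          "xattr size limit must be >= 0");
      cacheManager = new Object();
    } catch (RuntimeException e) {
      stopActiveServices();  // cleanup runs while cacheManager is still null
      throw e;
    }
  }

  void stopActiveServices() {
    if (cacheManager != null) {  // the guard added by the patch
      // cacheManager.stopMonitorThread();
      // cacheManager.clearDirectiveStats();
    }
  }
}
{code}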

> Unify xattr name and value limits into a single limit
> -
>
> Key: HDFS-6377
> URL: https://issues.apache.org/jira/browse/HDFS-6377
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS XAttrs (HDFS-2006)
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Fix For: HDFS XAttrs (HDFS-2006)
>
> Attachments: hdfs-6377-1.patch, hdfs-6377-2.patch
>
>
> Instead of having separate limits and config options for the size of an 
> xattr's name and value, let's use a single limit.





[jira] [Commented] (HDFS-6377) Unify xattr name and value limits into a single limit

2014-05-13 Thread Chris Nauroth (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997065#comment-13997065
 ] 

Chris Nauroth commented on HDFS-6377:
-

Ah, thanks for the explanation of the {{cacheManager}} bit.  :-)

> Unify xattr name and value limits into a single limit
> -
>
> Key: HDFS-6377
> URL: https://issues.apache.org/jira/browse/HDFS-6377
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS XAttrs (HDFS-2006)
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Fix For: HDFS XAttrs (HDFS-2006)
>
> Attachments: hdfs-6377-1.patch, hdfs-6377-2.patch
>
>
> Instead of having separate limits and config options for the size of an 
> xattr's name and value, let's use a single limit.





[jira] [Commented] (HDFS-6377) Unify xattr name and value limits into a single limit

2014-05-13 Thread Uma Maheswara Rao G (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995997#comment-13995997
 ] 

Uma Maheswara Rao G commented on HDFS-6377:
---


Thanks a lot, Andrew, for filing this JIRA and for the patch.
Adding to Yi Liu's and Chris's comments, I have the following comments:
- {code}
 "XAttr is too big, maximum size = " + nnConf.xattrMaxSize
+  + ", but the size is = " + xAttr.getName().length());
{code}

It seems the max size is now the combined size of the name and value, but here you only 
considered the name's length?

- {code}
 public static void init() throws Exception {
+conf = new Configuration();
{code}
Let's use HdfsConfiguration? [but no harm in using either :-)]

- {code}
 fs.setXAttr(path, name1, longValue);
+  fs.setXAttr(path, "user.a", longValue);
{code}
Could you use name1 for it?

- Do you think we need a minimum configured limit? Let's say the user configures the size 
as 3; that is always an invalid size, since the namespace prefix alone occupies that much 
space. [I am not insisting, just raising the point for discussion.]
At least we could warn on startup that no xattr can be added under the configured limit. 
[I think 7 is the minimum size for adding an xattr in the user namespace, where the name 
and value can each have at least one character; see the sketch after this list.]

- Thanks for the other cleanup.
Yes, I agree with Chris: having fs-limits in the property name, for consistency, makes 
sense to me.
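
For what it's worth, a small worked example of that minimum, assuming (as the arithmetic 
above implies) that the "user." namespace prefix counts toward the limit:

{code}
import java.nio.charset.StandardCharsets;

public class MinXAttrSize {
  public static void main(String[] args) {
    // Smallest user-namespace xattr: one-character name plus a one-byte value.
    String fullName = "user.a";  // 6 bytes in UTF-8
    int minValueBytes = 1;       // at least one byte of value
    int total = fullName.getBytes(StandardCharsets.UTF_8).length + minValueBytes;
    System.out.println("minimum xattr size = " + total);  // prints 7
  }
}
{code}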

> Unify xattr name and value limits into a single limit
> -
>
> Key: HDFS-6377
> URL: https://issues.apache.org/jira/browse/HDFS-6377
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS XAttrs (HDFS-2006)
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-6377-1.patch
>
>
> Instead of having separate limits and config options for the size of an 
> xattr's name and value, let's use a single limit.





[jira] [Commented] (HDFS-6377) Unify xattr name and value limits into a single limit

2014-05-13 Thread Charles Lamb (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996396#comment-13996396
 ] 

Charles Lamb commented on HDFS-6377:


In general, LGTM. One nit (piling on Uma's comment):

{code}
+  "XAttr is too big, maximum size = " + nnConf.xattrMaxSize
+  + ", but the size is = " + xAttr.getName().length());
{code}

Could we change the wording to:

"The XAttr is too big. The maximum size of the name + value is <max size>, but the total 
size is <actual size>."

> Unify xattr name and value limits into a single limit
> -
>
> Key: HDFS-6377
> URL: https://issues.apache.org/jira/browse/HDFS-6377
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS XAttrs (HDFS-2006)
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-6377-1.patch
>
>
> Instead of having separate limits and config options for the size of an 
> xattr's name and value, let's use a single limit.





[jira] [Commented] (HDFS-6377) Unify xattr name and value limits into a single limit

2014-05-13 Thread Yi Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-6377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995977#comment-13995977
 ] 

Yi Liu commented on HDFS-6377:
--

Thanks Andrew. 
I think unifying the xattr name and value limits into a single limit is fine with me.
In existing filesystems it depends on the implementation; for example, XFS has separate 
limits: http://en.wikipedia.org/wiki/Extended_file_attributes

Furthermore, can we modify the configuration name according to [~cnauroth]'s 
comments:
{quote}The defaults chosen in this patch look reasonable to me with 
consideration of the use case driving xattrs. However, I wonder if the property 
name should include "fs-limits" for consistency with the other metadata 
limitation properties: dfs.namenode.fs-limits.max-component-length and 
dfs.namenode.fs-limits.max-directory-items. What do you think?{quote}

+1 from me, and I'd like us to modify the configuration name.

> Unify xattr name and value limits into a single limit
> -
>
> Key: HDFS-6377
> URL: https://issues.apache.org/jira/browse/HDFS-6377
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: namenode
>Affects Versions: HDFS XAttrs (HDFS-2006)
>Reporter: Andrew Wang
>Assignee: Andrew Wang
> Attachments: hdfs-6377-1.patch
>
>
> Instead of having separate limits and config options for the size of an 
> xattr's name and value, let's use a single limit.


