Re: [VOTE] Release Apache Hadoop 2.0.6-alpha (RC1)

2013-08-20 Thread Arun C Murthy
+1 (binding)

Verified bits and ran examples on a 10-node cluster. Looks good.

Arun

On Aug 15, 2013, at 10:29 PM, Konstantin Boudnik  wrote:

> All,
> 
> I have created a release candidate (rc1) for hadoop-2.0.6-alpha that I would
> like to release.
> 
> This is a stabilization release that includes fixes for a couple of issues
> as outlined on the security list.
> 
> The RC is available at: http://people.apache.org/~cos/hadoop-2.0.6-alpha-rc1/
> The RC tag in svn is here: 
> http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.0.6-alpha-rc1
> 
> The maven artifacts are available via repository.apache.org.
> 
> The only difference between rc0 and rc1 is ASL added to releasenotes.html and
> updated release dates in CHANGES.txt files.
> 
> Please try the release bits and vote; the vote will run for the usual 7 days.
> 
> Thanks for your voting
>  Cos
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/





Re: [VOTE] Release Apache Hadoop 2.1.0-beta

2013-08-20 Thread Arun C Murthy
Thanks for the heads up, Aaron. I've changed the fix-version of HDFS-4763 to 
2.1.1-beta for now.

Committers - please be careful setting fix-versions; this is an anti-pattern 
worth avoiding… though I'm willing to bet a lot of dough that this 
isn't the first Hadoop release with this issue… *smile*

Arun


On Aug 20, 2013, at 6:09 PM, Aaron T. Myers  wrote:

> I was evaluating the release bits when I noticed that the change done in
> HDFS-4763 to add support for starting the HDFS NFSv3 gateway, which is
> marked with a "fix version" of 2.1.0-beta and included in the release notes
> of RC2, is not in fact included in the RC2 release bits. It looks to me
> like the change is included in branch-2.1-beta, but not branch-2.1.0-beta.
> 
> Particularly since the release notes in RC2 are incorrect in claiming that
> this change is in this release, it seems like a pretty serious
> issue. Ordinarily I'd say that this issue should result in a new RC, and I
> would vote -1 on RC2. But, given the previous discussion that folks are
> interested in releasing 2.1.0-beta with several fairly substantial bugs
> that we already know about, I'll withhold my vote. If RC2 ends up getting
> released as-is, we should be sure to change the fix version field on that
> JIRA to be correct.
> 
> --
> Aaron T. Myers
> Software Engineer, Cloudera
> 
> 
> On Thu, Aug 15, 2013 at 2:15 PM, Arun C Murthy  wrote:
> 
>> Folks,
>> 
>> I've created a release candidate (rc2) for hadoop-2.1.0-beta that I would
>> like to get released - this fixes the bugs we saw since the last go-around
>> (rc1).
>> 
>> The RC is available at:
>> http://people.apache.org/~acmurthy/hadoop-2.1.0-beta-rc2/
>> The RC tag in svn is here:
>> http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.1.0-beta-rc2
>> 
>> The maven artifacts are available via repository.apache.org.
>> 
>> Please try the release and vote; the vote will run for the usual 7 days.
>> 
>> thanks,
>> Arun
>> 
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>> 
>> 
>> 
>> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/





Re: hadoop-2.1.1-beta & hadoop-2.2.0 (GA)

2013-08-20 Thread Arun C Murthy
FYI - here is the end-point I'm using to track blockers on 2.1.1-beta:
http://s.apache.org/hadoop-2.1.1-beta-blockers

Essentially, these are *Blocker* bugs with *Target Version* set to 2.1.1-beta.

thanks,
Arun

On Aug 16, 2013, at 7:39 PM, Arun C Murthy  wrote:

> Gang,
> 
>  I spent time looking through changes slated for hadoop-2.1.1-beta and things 
> look fairly contained (~10 or so changes for each of Common, HDFS, YARN & 
> MapReduce).
> 
>  Can I, henceforth, request committers to exercise large dollops of caution 
> when committing to branch-2.1-beta? This way I hope we can quickly turn 
> around to make a hadoop-2.1.1-beta release in the next couple of weeks. This 
> can be followed by a bit more testing so that we are in a position to release 
> hadoop-2.2.0 (GA/stable). 
> 
>  In other words, I'm hoping hadoop-2.1.1-beta can be, um, the 'golden master' 
> for the final hadoop-2 GA release. I don't mean to jinx it by saying it out 
> loud, but I feel we could look at pushing out hadoop-2 GA by mid-September… 
> there I said it! *smile*
> 
>  Thoughts?
> 
> thanks,
> Arun
> 
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
> 
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/





[jira] [Created] (HDFS-5118) Extend LoadGenerator for testing NameNode retry cache

2013-08-20 Thread Jing Zhao (JIRA)
Jing Zhao created HDFS-5118:
---

 Summary: Extend LoadGenerator for testing NameNode retry cache
 Key: HDFS-5118
 URL: https://issues.apache.org/jira/browse/HDFS-5118
 Project: Hadoop HDFS
  Issue Type: New Feature
Affects Versions: 3.0.0
Reporter: Jing Zhao
Assignee: Jing Zhao


We plan to extend the current LoadGenerator so that the client is able to 
intentionally drop responses of NameNode RPC calls according to configuration 
settings. In this way we can do better system testing of the NameNode retry 
cache, especially when an NN failover happens.
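
As a rough illustration of the idea (not the actual patch -- the class name and 
config key below are hypothetical), the client-side hook could look like:

{code}
// Illustrative sketch only; class name and config key are hypothetical.
import java.util.Random;
import org.apache.hadoop.conf.Configuration;

public class DropResponsePolicy {
  // Hypothetical config key controlling how often a NameNode response is "dropped".
  public static final String DROP_PROBABILITY_KEY =
      "test.loadgenerator.drop.response.probability";

  private final float dropProbability;
  private final Random random = new Random();

  public DropResponsePolicy(Configuration conf) {
    this.dropProbability = conf.getFloat(DROP_PROBABILITY_KEY, 0.0f);
  }

  // True if the client should pretend the NameNode response was lost and
  // retry the call, which exercises the retry cache.
  public boolean shouldDropResponse() {
    return random.nextFloat() < dropProbability;
  }
}
{code}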

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [VOTE] Release Apache Hadoop 2.1.0-beta

2013-08-20 Thread Aaron T. Myers
I was evaluating the release bits when I noticed that the change done in
HDFS-4763 to add support for starting the HDFS NFSv3 gateway, which is
marked with a "fix version" of 2.1.0-beta and included in the release notes
of RC2, is not in fact included in the RC2 release bits. It looks to me
like the change is included in branch-2.1-beta, but not branch-2.1.0-beta.

Particularly since the release notes in RC2 are incorrect in claiming that
this change is in this release, it seems like a pretty serious
issue. Ordinarily I'd say that this issue should result in a new RC, and I
would vote -1 on RC2. But, given the previous discussion that folks are
interested in releasing 2.1.0-beta with several fairly substantial bugs
that we already know about, I'll withhold my vote. If RC2 ends up getting
released as-is, we should be sure to change the fix version field on that
JIRA to be correct.

--
Aaron T. Myers
Software Engineer, Cloudera


On Thu, Aug 15, 2013 at 2:15 PM, Arun C Murthy  wrote:

> Folks,
>
> I've created a release candidate (rc2) for hadoop-2.1.0-beta that I would
> like to get released - this fixes the bugs we saw since the last go-around
> (rc1).
>
> The RC is available at:
> http://people.apache.org/~acmurthy/hadoop-2.1.0-beta-rc2/
> The RC tag in svn is here:
> http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.1.0-beta-rc2
>
> The maven artifacts are available via repository.apache.org.
>
> Please try the release and vote; the vote will run for the usual 7 days.
>
> thanks,
> Arun
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>
>


Re: [VOTE] Release Apache Hadoop 2.1.0-beta

2013-08-20 Thread Stack
On Thu, Aug 15, 2013 at 2:15 PM, Arun C Murthy  wrote:

> Folks,
>
> I've created a release candidate (rc2) for hadoop-2.1.0-beta that I would
> like to get released - this fixes the bugs we saw since the last go-around
> (rc1).
>
> The RC is available at:
> http://people.apache.org/~acmurthy/hadoop-2.1.0-beta-rc2/
> The RC tag in svn is here:
> http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.1.0-beta-rc2
>
> The maven artifacts are available via repository.apache.org.
>
> Please try the release and vote; the vote will run for the usual 7 days.
>

It basically works (in insecure mode), +1.

+ Checked signature.
+ Ran on small cluster w/ small load made using mapreduce interfaces.
+ Got the HBase full unit test suite to pass on top of it.

I had the following issues getting it all to work. I don't know if they are
known issues, so I'll just list them here.

+ I could not find documentation on how to go from tarball to running
cluster (the bundled 'cluster' and 'standalone' docs are not about how to
get this tarball off the ground).
+ I had a bit of a struggle putting this release in place under hbase unit
tests.  The container would just exit w/ 127 errcode.  No logs in expected
place.  Tripped over where minimrcluster was actually writing.  Tried to
corral it so it played nicely w/o our general test setup but found that the
new mini clusters have 'target' hardcoded as output dirs.
+ Once I figured out where the logs were, I found that JAVA_HOME was not being
exported (this isn't needed in hadoop-2.0.5, for instance).  I added an
exported JAVA_HOME to my running shell, which doesn't seem right, but it took
care of it (I gave up pretty quickly on messing w/
yarn.nodemanager.env-whitelist and yarn.nodemanager.admin-env -- I wasn't
getting anywhere).
+ Setting hadoop.security.group.mapping to
org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback did not seem
to work for me.  It just did this:

Caused by: java.lang.UnsatisfiedLinkError:
org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative()V
at org.apache.hadoop.security.JniBasedUnixGroupsMapping.anchorNative(Native
Method)
at
org.apache.hadoop.security.JniBasedUnixGroupsMapping.(JniBasedUnixGroupsMapping.java:49)
at
org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback.(JniBasedUnixGroupsMappingWithFallback.java:38)

...so I replaced it
w/ org.apache.hadoop.security.ShellBasedUnixGroupsMapping on the hbase side
to get my cluster up and running (see the short sketch after this list).

+ Untarring the bin tarball, it unpacks as hadoop-X.Y.Z-beta.  Untarring the src
tarball, it unpacks as hadoop-X.Y.Z-beta-src.  I'd have thought they would unpack
into the one directory, overlaying each other.
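
For reference, the group mapping workaround above amounts to overriding a single
property; a minimal sketch, assuming it is acceptable to do it in code (in
practice it was simply set in the HBase-side config file):

import org.apache.hadoop.conf.Configuration;

public class GroupMappingWorkaround {
  public static Configuration withShellGroupMapping() {
    Configuration conf = new Configuration();
    // Replace the JNI-based mapping that failed with the UnsatisfiedLinkError above.
    conf.set("hadoop.security.group.mapping",
        "org.apache.hadoop.security.ShellBasedUnixGroupsMapping");
    return conf;
  }
}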

St.Ack


Re: Secure deletion of blocks

2013-08-20 Thread Matt Fellows
Thanks for the heads up, but I think I've managed to implement it crudely by 
overwriting sequentially with 1s, 0s and random bytes and tested it 
successfully on an ext4 partition. 


  


I tested it by dd-ing the entire partition to a file and confirming (with 
strings) that a particular string was not present, then uploading a large file 
with a chosen string repeated in it many times, dd-ing the partition to confirm 
the string was present, issuing a delete, and repeating the test to confirm it 
had been removed.


  


I'm sure some journal information may be leaked, but the entire block can't 
be reconstructed from the journal, else your disk would be halved in usable 
size, right?
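
For reference, the crude overwrite described above boils down to something like
the following (a simplified sketch, not the exact code; the pass order and
buffer size are illustrative):

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.security.SecureRandom;
import java.util.Arrays;

public class CrudeBlockWiper {
  // Overwrite the file in place with 1s, 0s and random bytes, syncing each pass.
  public static void wipe(File blockFile) throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(blockFile, "rw")) {
      long len = raf.length();
      byte[] buf = new byte[64 * 1024];
      SecureRandom rnd = new SecureRandom();
      for (int pass = 0; pass < 3; pass++) {
        raf.seek(0);
        long remaining = len;
        while (remaining > 0) {
          int n = (int) Math.min(buf.length, remaining);
          if (pass == 0) {
            Arrays.fill(buf, (byte) 0xFF);   // pass 1: all ones
          } else if (pass == 1) {
            Arrays.fill(buf, (byte) 0x00);   // pass 2: all zeros
          } else {
            rnd.nextBytes(buf);              // pass 3: random bytes
          }
          raf.write(buf, 0, n);
          remaining -= n;
        }
        raf.getFD().sync();                  // push each pass to the device
      }
    }
  }
}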

—
Sent from Mailbox for iPhone

On Tue, Aug 20, 2013 at 8:43 PM, Colin McCabe 
wrote:

>> If I've got the right idea about this at all?
> From the man page for wipe(1):
> "Journaling filesystems (such as Ext3 or ReiserFS) are now being used by
> default by most Linux distributions. No secure deletion program that does
> filesystem-level calls can sanitize files on such filesystems, because
> sensitive data and metadata can be written to the journal, which cannot be
> readily accessed. Per-file secure deletion is better implemented in the
> operating system."
> You might be able to work around this by turning off the journal on these
> filesystems.  But even then, you've got issues like the drive remapping bad
> sectors (and leaving around the old ones), flash firmware that is unable to
> erase less than an erase block, etc.
> The simplest solution is probably just to use full-disk encryption.  Then
> you don't need any code changes at all.
> Doing something like invoking shred on the block files could improve
> security somewhat, but it's not going to work all the time.
> Colin
> On Thu, Aug 15, 2013 at 5:31 AM, Matt Fellows <
> matt.fell...@bespokesoftware.com> wrote:
>> Hi,
>> I'm looking into writing a patch for HDFS which will provide a new method
>> within HDFS which can securely delete the contents of a block on all the
>> nodes upon which it exists. By securely delete I mean, overwrite with
>> 1's/0's/random data cyclically such that the data could not be recovered
>> forensically.
>>
>> I'm not currently aware of any existing code / methods which provide this,
>> so was going to implement this myself.
>>
>> I figured the DataNode.java was probably the place to start looking into
>> how this could be done, so I've read the source for this, but it's not
>> really enlightened me a massive amount.
>>
>> I'm assuming I need to tell the NameServer that all DataNodes with a
>> particular block id would be required to be deleted, then as each DataNode
>> calls home, the DataNode would be instructed to securely delete the
>> relevant block, and it would oblige.
>>
>> Unfortunately I have no idea where to begin and was looking for some
>> pointers?
>>
>> I guess specifically I'd like to know:
>>
>> 1. Where the hdfs CLI commands are implemented
>> 2. How a DataNode identifies a block / how a NameServer could inform a
>> DataNode to delete a block
>> 3. Where the existing "delete" is implemented so I can make sure my secure
>> delete makes use of it after successfully blanking the block contents
>> 4. If I've got the right idea about this at all?
>>
>> Kind regards,
>> Matt Fellows
>>
>> --
>> [image: cid:1CBF4038-3F0F-4FC2-A1FF-6DC81B8B6F94]
>>  First Option Software Ltd
>> Signal House
>> Jacklyns Lane
>> Alresford
>> SO24 9JJ
>> Tel: +44 (0)1962 738232
>> Mob: +44 (0)7710 160458
>> Fax: +44 (0)1962 600112
>> Web: www.bespokesoftware.com
>>
>> __**__
>>
>> This is confidential, non-binding and not company endorsed - see full
>> terms at www.fosolutions.co.uk/emailpolicy.html
>>
>> First Option Software Ltd Registered No. 06340261
>> Signal House, Jacklyns Lane, Alresford, Hampshire, SO24 9JJ, U.K.
>> __**__
>>
>>




Re: [VOTE] Release Apache Hadoop 2.0.6-alpha (RC1)

2013-08-20 Thread Konstantin Shvachko
+1

Did the same as with rc0.
Works for me.

Thanks,
--Konst


On Thu, Aug 15, 2013 at 10:29 PM, Konstantin Boudnik  wrote:

> All,
>
> I have created a release candidate (rc1) for hadoop-2.0.6-alpha that I
> would
> like to release.
>
> This is a stabilization release that includes fixes for a couple of
> issues
> as outlined on the security list.
>
> The RC is available at:
> http://people.apache.org/~cos/hadoop-2.0.6-alpha-rc1/
> The RC tag in svn is here:
> http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.0.6-alpha-rc1
>
> The maven artifacts are available via repository.apache.org.
>
> The only difference between rc0 and rc1 is ASL added to releasenotes.html
> and
> updated release dates in CHANGES.txt files.
>
> Please try the release bits and vote; the vote will run for the usual 7
> days.
>
> Thanks for your voting
>   Cos
>
>


[jira] [Created] (HDFS-5117) Allow the owner of an HDFS path to be a group

2013-08-20 Thread Ryan Hennig (JIRA)
Ryan Hennig created HDFS-5117:
-

 Summary: Allow the owner of an HDFS path to be a group
 Key: HDFS-5117
 URL: https://issues.apache.org/jira/browse/HDFS-5117
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: hdfs-client
Reporter: Ryan Hennig


At eBay, we have the need to associate some HDFS paths with a set of users with 
write access, a set of users with read-only access, and neither read nor write 
access for others.

The current model of POSIX-style permissions is nearly sufficient for this, 
except for the need to support multiple writers.

One easy fix would be to allow the owner of a path to be a group, and then 
grant owner permissions to all members of that group.  I have verified that HDP 
1.3 allows you to set the owner of a path to a group without error, but the 
owner permissions of that group are not given to members of the group.

I've created a relatively simple fix for this by modifying the "check" method 
in src/hdfs/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java and 
I am working on related changes to unit tests etc now.
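
A hedged sketch of the idea (not the actual FSPermissionChecker code; names are
placeholders): treat the owner check as satisfied when the owner field names one
of the caller's groups.

{code}
// Illustrative only -- helper shape and names are hypothetical, not the real patch.
import java.util.Collection;

class OwnerCheckSketch {
  static boolean isOwner(String pathOwner, String callerUser,
                         Collection<String> callerGroups) {
    // Classic check: the caller is the owner...
    // ...extended so that a group name in the owner field also grants owner rights.
    return pathOwner.equals(callerUser) || callerGroups.contains(pathOwner);
  }
}
{code}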

- Ryan

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Secure deletion of blocks

2013-08-20 Thread Colin McCabe
Just to clarify, ext4 has the option to turn off journalling.  ext3 does
not.  Not sure about reiser.

Colin


On Tue, Aug 20, 2013 at 12:42 PM, Colin McCabe wrote:

> > If I've got the right idea about this at all?
>
> From the man page for wipe(1):
>
> "Journaling filesystems (such as Ext3 or ReiserFS) are now being used by
> default by most Linux distributions. No secure deletion program that does
> filesystem-level calls can sanitize files on such filesystems, because
> sensitive data and metadata can be written to the journal, which cannot be
> readily accessed. Per-file secure deletion is better implemented in the
> operating system."
>
> You might be able to work around this by turning off the journal on these
> filesystems.  But even then, you've got issues like the drive remapping bad
> sectors (and leaving around the old ones), flash firmware that is unable to
> erase less than an erase block, etc.
>
> The simplest solution is probably just to use full-disk encryption.  Then
> you don't need any code changes at all.
>
> Doing something like invoking shred on the block files could improve
> security somewhat, but it's not going to work all the time.
>
> Colin
>
>
> On Thu, Aug 15, 2013 at 5:31 AM, Matt Fellows <
> matt.fell...@bespokesoftware.com> wrote:
>
>> Hi,
>> I'm looking into writing a patch for HDFS which will provide a new method
>> within HDFS which can securely delete the contents of a block on all the
>> nodes upon which it exists. By securely delete I mean, overwrite with
>> 1's/0's/random data cyclically such that the data could not be recovered
>> forensically.
>>
>> I'm not currently aware of any existing code / methods which provide
>> this, so was going to implement this myself.
>>
>> I figured the DataNode.java was probably the place to start looking into
>> how this could be done, so I've read the source for this, but it's not
>> really enlightened me a massive amount.
>>
>> I'm assuming I need to tell the NameServer that all DataNodes with a
>> particular block id would be required to be deleted, then as each DataNode
>> calls home, the DataNode would be instructed to securely delete the
>> relevant block, and it would oblige.
>>
>> Unfortunately I have no idea where to begin and was looking for some
>> pointers?
>>
>> I guess specifically I'd like to know:
>>
>> 1. Where the hdfs CLI commands are implemented
>> 2. How a DataNode identifies a block / how a NameServer could inform a
>> DataNode to delete a block
>> 3. Where the existing "delete" is implemented so I can make sure my
>> secure delete makes use of it after successfully blanking the block contents
>> 4. If I've got the right idea about this at all?
>>
>> Kind regards,
>> Matt Fellows
>>
>> --
>> [image: cid:1CBF4038-3F0F-4FC2-A1FF-6DC81B8B6F94]
>>  First Option Software Ltd
>> Signal House
>> Jacklyns Lane
>> Alresford
>> SO24 9JJ
>> Tel: +44 (0)1962 738232
>> Mob: +44 (0)7710 160458
>> Fax: +44 (0)1962 600112
>> Web: www.bespokesoftware.com
>>
>> __**__
>>
>> This is confidential, non-binding and not company endorsed - see full
>> terms at www.fosolutions.co.uk/emailpolicy.html
>>
>> First Option Software Ltd Registered No. 06340261
>> Signal House, Jacklyns Lane, Alresford, Hampshire, SO24 9JJ, U.K.
>> __**__
>>
>>
>


Re: Secure deletion of blocks

2013-08-20 Thread Colin McCabe
> If I've got the right idea about this at all?

From the man page for wipe(1):

"Journaling filesystems (such as Ext3 or ReiserFS) are now being used by
default by most Linux distributions. No secure deletion program that does
filesystem-level calls can sanitize files on such filesystems, because
sensitive data and metadata can be written to the journal, which cannot be
readily accessed. Per-file secure deletion is better implemented in the
operating system."

You might be able to work around this by turning off the journal on these
filesystems.  But even then, you've got issues like the drive remapping bad
sectors (and leaving around the old ones), flash firmware that is unable to
erase less than an erase block, etc.

The simplest solution is probably just to use full-disk encryption.  Then
you don't need any code changes at all.

Doing something like invoking shred on the block files could improve
security somewhat, but it's not going to work all the time.
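
If someone does go the shred route despite the caveats above, the DataNode-side
hook would be on the order of the following sketch (assumes GNU shred is on the
PATH and that the block file is only unlinked after the overwrite succeeds):

import java.io.File;
import java.io.IOException;

public class ShredInvoker {
  // Best-effort secure overwrite via GNU shred: 3 passes plus a final zero pass.
  public static void shred(File blockFile) throws IOException, InterruptedException {
    Process p = new ProcessBuilder(
            "shred", "-n", "3", "-z", blockFile.getAbsolutePath())
        .inheritIO()
        .start();
    if (p.waitFor() != 0) {
      throw new IOException("shred failed for " + blockFile);
    }
  }
}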

Colin


On Thu, Aug 15, 2013 at 5:31 AM, Matt Fellows <
matt.fell...@bespokesoftware.com> wrote:

> Hi,
> I'm looking into writing a patch for HDFS which will provide a new method
> within HDFS which can securely delete the contents of a block on all the
> nodes upon which it exists. By securely delete I mean, overwrite with
> 1's/0's/random data cyclically such that the data could not be recovered
> forensically.
>
> I'm not currently aware of any existing code / methods which provide this,
> so was going to implement this myself.
>
> I figured the DataNode.java was probably the place to start looking into
> how this could be done, so I've read the source for this, but it's not
> really enlightened me a massive amount.
>
> I'm assuming I need to tell the NameServer that all DataNodes with a
> particular block id would be required to be deleted, then as each DataNode
> calls home, the DataNode would be instructed to securely delete the
> relevant block, and it would oblige.
>
> Unfortunately I have no idea where to begin and was looking for some
> pointers?
>
> I guess specifically I'd like to know:
>
> 1. Where the hdfs CLI commands are implemented
> 2. How a DataNode identifies a block / how a NameServer could inform a
> DataNode to delete a block
> 3. Where the existing "delete" is implemented so I can make sure my secure
> delete makes use of it after successfully blanking the block contents
> 4. If I've got the right idea about this at all?
>
> Kind regards,
> Matt Fellows
>
> --
> [image: cid:1CBF4038-3F0F-4FC2-A1FF-6DC81B8B6F94]
>  First Option Software Ltd
> Signal House
> Jacklyns Lane
> Alresford
> SO24 9JJ
> Tel: +44 (0)1962 738232
> Mob: +44 (0)7710 160458
> Fax: +44 (0)1962 600112
> Web: www.bespokesoftware.com
>
> __**__
>
> This is confidential, non-binding and not company endorsed - see full
> terms at www.fosolutions.co.uk/emailpolicy.html
>
> First Option Software Ltd Registered No. 06340261
> Signal House, Jacklyns Lane, Alresford, Hampshire, SO24 9JJ, U.K.
> __**__
>
>


[jira] [Created] (HDFS-5116) TestHDFSCLI fails on OS X

2013-08-20 Thread Arpit Agarwal (JIRA)
Arpit Agarwal created HDFS-5116:
---

 Summary: TestHDFSCLI fails on OS X
 Key: HDFS-5116
 URL: https://issues.apache.org/jira/browse/HDFS-5116
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: OS X 10.8.4, JDK 1.6.0.51
Reporter: Arpit Agarwal


Exception details

{code}
Running org.apache.hadoop.cli.TestHDFSCLI
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 44.714 sec <<< 
FAILURE!
testAll(org.apache.hadoop.cli.TestHDFSCLI)  Time elapsed: 44334 sec  <<< 
FAILURE!
java.lang.AssertionError: One of the tests failed. See the Detailed results to 
identify the command that failed
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at 
org.apache.hadoop.cli.CLITestHelper.displayResults(CLITestHelper.java:264)
at org.apache.hadoop.cli.CLITestHelper.tearDown(CLITestHelper.java:126)
at org.apache.hadoop.cli.TestHDFSCLI.tearDown(TestHDFSCLI.java:85)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.Nati
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-5115) Make StorageID a UUID

2013-08-20 Thread Arpit Agarwal (JIRA)
Arpit Agarwal created HDFS-5115:
---

 Summary: Make StorageID a UUID
 Key: HDFS-5115
 URL: https://issues.apache.org/jira/browse/HDFS-5115
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: datanode
Affects Versions: 3.0.0
Reporter: Arpit Agarwal
Assignee: Arpit Agarwal


The Storage ID is currently generated from the DataNode's IP+Port+Random 
components. This scheme will not work when we have separate Storage IDs for 
each storage directory as there is a possibility of conflicts when an 
unreliable storage is intermittently offline.

Converting it to a UUID makes collisions very unlikely.
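
For illustration, generation could then be as simple as the following sketch
(the "DS-" prefix and method name are just examples, not necessarily the final
format):

{code}
// Sketch only -- prefix and method name are illustrative.
public static String newStorageID() {
  // java.util.UUID.randomUUID() gives 122 bits of randomness, so collisions
  // across storage directories become practically impossible.
  return "DS-" + java.util.UUID.randomUUID();
}
{code}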

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-5114) getMaxNodesPerRack() in BlockPlacementPolicyDefault does not take decommissioning nodes into account.

2013-08-20 Thread Kihwal Lee (JIRA)
Kihwal Lee created HDFS-5114:


 Summary: getMaxNodesPerRack() in BlockPlacementPolicyDefault does 
not take decommissioning nodes into account.
 Key: HDFS-5114
 URL: https://issues.apache.org/jira/browse/HDFS-5114
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Kihwal Lee


If a large proportion of data nodes are being decommissioned, one or more racks 
may not be writable. However this is not taken into account when the default 
block placement policy module invokes getMaxNodesPerRack(). Some blocks, 
especially the ones with a high replication factor, may not be fully 
replicated until those nodes are taken out of dfs.include.  It can actually 
block decommissioning itself.
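
To make the intent concrete, a hedged sketch of the adjustment (this is not the
actual getMaxNodesPerRack() code; the writable-rack count would have to be
derived from the cluster map, ignoring decommissioning and dead nodes):

{code}
// Illustrative sketch of the intended behaviour only, not the actual code.
static int maxNodesPerRack(int totalReplicas, int writableRacks) {
  if (writableRacks <= 1) {
    return totalReplicas;                           // no rack spreading is possible
  }
  return (totalReplicas - 1) / writableRacks + 2;   // same shape as the existing formula
}
{code}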

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HDFS-5113) Implement POST method to upload the checkpoint

2013-08-20 Thread Vinay (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-5113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinay resolved HDFS-5113.
-

Resolution: Duplicate

> Implement POST method to upload the checkpoint
> --
>
> Key: HDFS-5113
> URL: https://issues.apache.org/jira/browse/HDFS-5113
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.0.0, 2.1.0-beta
>Reporter: Vinay
>Assignee: Vinay
>
> We have seen many issues related to checkpoint upload.
> Currently the upload happens via two GET requests.
> One limitation here is that both the checkpointer and the target must have the
> HTTP server running.
> So, to improve this, uploading the checkpoint via a POST/PUT HTTP request would
> make the upload cleaner and a one-way communication.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HDFS-5113) Implement POST method to upload the checkpoint

2013-08-20 Thread Vinay (JIRA)
Vinay created HDFS-5113:
---

 Summary: Implement POST method to upload the checkpoint
 Key: HDFS-5113
 URL: https://issues.apache.org/jira/browse/HDFS-5113
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: namenode
Affects Versions: 3.0.0, 2.1.0-beta
Reporter: Vinay
Assignee: Vinay


We have seen many issues related to checkpoint upload. 
Currently the upload happens via two GET requests.
One limitation here is that both the checkpointer and the target must have the 
HTTP server running.

So, to improve this, uploading the checkpoint via a POST/PUT HTTP request would 
make the upload cleaner and a one-way communication.
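
As a rough illustration of the proposed one-way upload (the upload endpoint and
error handling below are placeholders, not an actual Hadoop API):

{code}
// Sketch only -- the upload endpoint and response handling are illustrative.
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class CheckpointUploader {
  public static void upload(URL nnImageServlet, InputStream fsimage) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) nnImageServlet.openConnection();
    conn.setRequestMethod("PUT");              // single request from checkpointer to NN
    conn.setDoOutput(true);
    conn.setChunkedStreamingMode(64 * 1024);   // stream the image without buffering it all
    try (OutputStream out = conn.getOutputStream()) {
      byte[] buf = new byte[64 * 1024];
      int n;
      while ((n = fsimage.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
    }
    if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
      throw new IOException("Checkpoint upload failed: HTTP " + conn.getResponseCode());
    }
  }
}
{code}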

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Which SVN location should i checkout for creating patch to branch-1 (HDFS-2933) ?

2013-08-20 Thread Suresh Srinivas
Vivek,

The current branch where features and bug fixes for the 1.x releases go is
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1

1.x releases are created from the above branch. A new branch is created and
the release is made from that branch. For example, you can see the branch
corresponding to release 1.2 at
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2/

BTW you can look at the branches by running the command:
svn ls https://svn.apache.org/repos/asf/hadoop/common/branches
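
And checking out the branch itself is just (the local directory name is up to you):

svn checkout https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1 branch-1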

When posting a branch-1 patch, porting from trunk to branch-1 is sometimes
straightforward: it just requires changing the paths of the files changed in
the patch. The path change is due to trunk having moved to Maven builds, while
branch-1 still uses Ant. Sometimes the port is not straightforward, due to
code differences between the branches, and might require substantial work.
That said, for HDFS-2933, porting the patch to branch-1 may not be necessary,
given it is not a big issue.


Regards,
Suresh


On Tue, Aug 20, 2013 at 7:16 AM, Vivek Ganesan wrote:

> Hi,
>
> I am a new contributor to Hadoop.  In HDFS-2933
> <https://issues.apache.org/jira/browse/HDFS-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13744679#comment-13744679>
> (Datanode index page on debug port not useful), which I am currently
> working on, it has been suggested to attach a patch for 'branch-1'.
>
> --"Does the same fix work in branch-1? If so you could consider
> attaching a patch for branch-1 although that certainly won't prevent
> getting this in trunk."
>
> Could anyone tell me which SVN repository location I should check out in
> order to work on 'branch-1'?  Do the instructions in
> http://wiki.apache.org/hadoop/HowToContribute hold good for that branch too?
>
> Thanks in advance.
>
> Regards,
> Vivek Ganesan
>



-- 
http://hortonworks.com/download/



[jira] [Created] (HDFS-5112) NetWorkTopology#countNumOfAvailableNodes() is returning wrong value if excluded nodes passed are not part of the cluster tree

2013-08-20 Thread Vinay (JIRA)
Vinay created HDFS-5112:
---

 Summary: NetWorkTopology#countNumOfAvailableNodes() is returning 
wrong value if excluded nodes passed are not part of the cluster tree
 Key: HDFS-5112
 URL: https://issues.apache.org/jira/browse/HDFS-5112
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.0.5-alpha, 3.0.0
Reporter: Vinay
Assignee: Vinay


I got "File /hdfs_COPYING_ could only be replicated to 0 nodes instead of 
minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are 
excluded in this operation." in the following case:

1. A cluster with 2 DNs.
2. One of the datanodes had not been responding for the last 10 minutes, but was 
about to be detected as dead at the NN.
3. Tried to write one file; for the block, the NN allocated both DNs.
4. The client, while creating the pipeline, took some time to detect the node failure.
5. Before the client detected the pipeline failure, the dead node was removed from 
the cluster map on the NN side.
6. Now the client abandoned the previous block and asked for a new block with the 
dead node in the excluded list, and got the above exception even though one more 
node was still alive.

When I dug into this more, I found that 
{{NetWorkTopology#countNumOfAvailableNodes()}} is not giving the correct count when 
the excludeNodes passed from the client are not part of the cluster map.
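
A sketch of the intended counting semantics (illustration only, not the actual 
patch):

{code}
// Only excluded nodes that are actually present in the cluster tree should
// reduce the count of available nodes.
import java.util.Collection;
import org.apache.hadoop.net.NetworkTopology;
import org.apache.hadoop.net.Node;

class CountSketch {
  static int countAvailableNodes(NetworkTopology topo, Collection<Node> excludedNodes) {
    int excludedInCluster = 0;
    for (Node n : excludedNodes) {
      if (topo.contains(n)) {          // ignore excludes that are not in the tree
        excludedInCluster++;
      }
    }
    return topo.getNumOfLeaves() - excludedInCluster;
  }
}
{code}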


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Which SVN location should i checkout for creating patch to branch-1 (HDFS-2933) ?

2013-08-20 Thread Vivek Ganesan

Hi,

I am a new contributor to Hadoop.  In HDFS-2933 
(Datanode index page on debug port not useful), which I am currently 
working on, it has been suggested to attach a patch for 'branch-1'.


--"Does the same fix work in branch-1? If so you could consider 
attaching a patch for branch-1 although that certainly won't prevent 
getting this in trunk."


Could anyone tell me which SVN repository location I should check out in 
order to work on 'branch-1'?  Do the instructions in 
http://wiki.apache.org/hadoop/HowToContribute hold good for that branch too?


Thanks in advance.

Regards,
Vivek Ganesan


Jenkins build is back to normal : Hadoop-Hdfs-trunk #1497

2013-08-20 Thread Apache Jenkins Server
See 



Build failed in Jenkins: Hadoop-Hdfs-0.23-Build #705

2013-08-20 Thread Apache Jenkins Server
See 

--
[...truncated 7673 lines...]
[ERROR] symbol  : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] 
:[270,37]
 cannot find symbol
[ERROR] symbol  : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] 
:[281,30]
 cannot find symbol
[ERROR] symbol  : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] 
:[10533,37]
 cannot find symbol
[ERROR] symbol  : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] 
:[10544,30]
 cannot find symbol
[ERROR] symbol  : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] 
:[8357,37]
 cannot find symbol
[ERROR] symbol  : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] 
:[8368,30]
 cannot find symbol
[ERROR] symbol  : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] 
:[12641,37]
 cannot find symbol
[ERROR] symbol  : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] 
:[12652,30]
 cannot find symbol
[ERROR] symbol  : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] 
:[9741,37]
 cannot find symbol
[ERROR] symbol  : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] 
:[9752,30]
 cannot find symbol
[ERROR] symbol  : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] 
:[1781,37]
 cannot find symbol
[ERROR] symbol  : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] 
:[1792,30]
 cannot find symbol
[ERROR] symbol  : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] 
:[5338,37]
 cannot find symbol
[ERROR] symbol  : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] 
:[5349,30]
 cannot find symbol
[ERROR] symbol  : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] 
:[6290,37]
 cannot find symbol
[ERROR] symbol  : class Parser
[ERROR] location: package com.google.protobuf
[ERROR] 
:[6301,30]
 cannot find sym

Hadoop-Hdfs-0.23-Build - Build # 705 - Still Failing

2013-08-20 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/705/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 7866 lines...]
[ERROR] location: class com.google.protobuf.InvalidProtocolBufferException
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[3313,27]
 cannot find symbol
[ERROR] symbol  : method 
setUnfinishedMessage(org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpWriteBlockProto)
[ERROR] location: class com.google.protobuf.InvalidProtocolBufferException
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[3319,8]
 cannot find symbol
[ERROR] symbol  : method makeExtensionsImmutable()
[ERROR] location: class 
org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpWriteBlockProto
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[3330,10]
 cannot find symbol
[ERROR] symbol  : method 
ensureFieldAccessorsInitialized(java.lang.Class,java.lang.Class)
[ERROR] location: class com.google.protobuf.GeneratedMessage.FieldAccessorTable
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[3335,31]
 cannot find symbol
[ERROR] symbol  : class AbstractParser
[ERROR] location: package com.google.protobuf
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[3344,4]
 method does not override or implement a method from a supertype
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[4098,12]
 cannot find symbol
[ERROR] symbol  : method 
ensureFieldAccessorsInitialized(java.lang.Class,java.lang.Class)
[ERROR] location: class com.google.protobuf.GeneratedMessage.FieldAccessorTable
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[4371,104]
 cannot find symbol
[ERROR] symbol  : method getUnfinishedMessage()
[ERROR] location: class com.google.protobuf.InvalidProtocolBufferException
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[5264,8]
 getUnknownFields() in 
org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpTransferBlockProto 
cannot override getUnknownFields() in com.google.protobuf.GeneratedMessage; 
overridden method is final
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[5284,19]
 cannot find symbol
[ERROR] symbol  : method 
parseUnknownField(com.google.protobuf.CodedInputStream,com.google.protobuf.UnknownFieldSet.Builder,com.google.protobuf.ExtensionRegistryLite,int)
[ERROR] location: class 
org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpTransferBlockProto
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[5314,15]
 cannot find symbol
[ERROR] symbol  : method 
setUnfinishedMessage(org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpTransferBlockProto)
[ERROR] location: class com.google.protobuf.InvalidProtocolBufferException
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[5317,27]
 cannot find symbol
[ERROR] symbol  : method 
setUnfinishedMessage(org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos.OpTransferBlockProto)
[ERROR] location: class com.google.protobuf.InvalidProtocolBufferException
[ERROR] 
/home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/hadoop-hdfs/target/generated-sources/java/org/apache/hadoop/hdfs/protocol/proto/DataTransferProtos.java:[5323,8]
 cannot find symbol
[ERROR] symbol  : method makeExtensionsImmutable()
[ERROR] location: class 
org.apache.hadoop.hdfs

Implement directory/table level access control in HDFS

2013-08-20 Thread Erik fang
Hi folks,


HDFS has a POSIX-like permission model, using R/W/X permissions and owner, group,
and other for access control. It works well most of the time, except for:

1. Data needs to be shared among users

A group can be used for access control, and the users have to be in the same
GROUP as the data. The GROUP here stands for the sharing relationship
between users and data. If many sharing relationships exist, there are
many groups, which is hard to manage.

2. Hive

Hive uses a table-based access control model: a user can have SELECT, UPDATE,
CREATE, and DROP privileges on a certain table, which translates to R/W
permissions in HDFS. However, Hive's table-based authorization doesn't match
HDFS's POSIX-like model. For Hive users accessing HDFS, group permissions can
be deployed, which introduces either many groups or big groups containing many
sharing relationships.

Inspired by the way an RDBMS manages data, directory-level access control
based on authorized user impersonation can be implemented as an extension to
the POSIX-like permission model.

It consists of:

1. ACLFileSystem

2. authorization manager: holds access control information and a shared
secret with the namenode

3. authenticator(embedded in namenode)

Take Hive as an example, where the owner of the data is user DW. The procedure is:

1. A user submits a Hive query or an HCatalog job to access DW's data; we can
get the read table/partition and write table/partition, and the
corresponding HDFS path. Then an RPC call to the authorization manager is
invoked, sending

{user, tablename, table_path, w/r}

2. The authorization manager does an authorization check to find whether the
access is allowed. If allowed, it replies with an encrypted tablepath:

{realuser, encrypted(tablepath+w/r)}

realuser here stands for the owner of the requested data

3. ACLFileSystem extends FileSystem, and when an open(path) call is invoked,
it replaces the path with encrypted(tablepath+w/r) and invokes the namenode RPC
call, such as

open(realuser, encrypted(tablepath+w/r), null)

If the user is requesting a partition path, the rpc call can be invoked as

open(realuser, encrypted(tablepath+w/r), path_suffix)

4. The Namenode picks up the RPC call and decrypts the encrypted(hdfspath+w/r)
with the shared secret to verify that it is not fake. If it is genuine, it checks
the w/r operation, joins the tablepath and path_suffix, and invokes the call as
the hdfspath owner, for example user DW.


A delegation token or something else can be used as the shared secret, and the
authorization manager can be integrated into the Hive metastore.

In general, I propose an HDFS user impersonation mechanism and an authorization
mechanism based on HDFS user impersonation.
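
To make the moving parts concrete, here is a rough sketch of the pieces involved
(all type and method names are placeholders to illustrate the proposed flow, not
working code):

// All names below are placeholders illustrating the proposal, not an existing API.
interface AuthorizationManager {
  // Steps 1-2: check (user, table, path, action) and, if allowed, return the
  // real owner plus an encrypted token covering the table path and the action.
  AccessGrant authorize(String user, String tableName, String tablePath, char action);
}

class AccessGrant {
  final String realUser;        // owner of the data, e.g. user DW
  final byte[] encryptedPath;   // encrypted(tablepath + w/r), verifiable by the NN
  AccessGrant(String realUser, byte[] encryptedPath) {
    this.realUser = realUser;
    this.encryptedPath = encryptedPath;
  }
}

// Step 3: ACLFileSystem swaps the plain path for the grant before calling the
// NameNode, e.g. open(grant.realUser, grant.encryptedPath, pathSuffix).
// Step 4: the NameNode decrypts the grant with the shared secret, verifies the
// w/r action, and executes the call as the real owner.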

If the community is interested, I will file a JIRA for HDFS user
impersonation and a JIRA for the authorization manager soon.


Thoughts?

Thanks a lot
Erik.fang