[jira] [Resolved] (HADOOP-9987) HDFS Compatible ViewFileSystem

2014-05-29 Thread Gera Shegalov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gera Shegalov resolved HADOOP-9987.
---

Resolution: Duplicate

 HDFS Compatible ViewFileSystem
 --

 Key: HADOOP-9987
 URL: https://issues.apache.org/jira/browse/HADOOP-9987
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Lohit Vijayarenu
 Fix For: 2.0.6-alpha


 There are multiple scripts and projects like pig, hive, and elephantbird that refer 
 to HDFS URIs as hdfs://namenodehostport/ or hdfs:/// . In a federated namespace 
 this causes problems because the supported scheme for federation is viewfs:// . We 
 would have to force all users to change their scripts/programs to be able to 
 access a federated cluster. 
 It would be great if there were a way to map the viewfs scheme to the hdfs scheme 
 without exposing it to users. Opening this JIRA to get input from people who 
 have thought about this in their clusters.
 In our clusters we ended up creating another class, 
 HDFSCompatibleViewFileSystem, which hijacks both fs.hdfs.impl and 
 fs.viewfs.impl and passes filesystem calls down to ViewFileSystem. Is there 
 any suggested approach other than this?
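The workaround described above amounts to binding the hdfs scheme to a ViewFileSystem subclass in core-site.xml. A minimal sketch, assuming a hypothetical org.example.HDFSCompatibleViewFileSystem class (fs.hdfs.impl / fs.viewfs.impl are Hadoop's standard per-scheme FileSystem binding keys):

```
<!-- Sketch only: HDFSCompatibleViewFileSystem is the hypothetical class from
     the description above, not a shipped Hadoop class. -->
<property>
  <name>fs.hdfs.impl</name>
  <value>org.example.HDFSCompatibleViewFileSystem</value>
</property>
<property>
  <name>fs.viewfs.impl</name>
  <value>org.example.HDFSCompatibleViewFileSystem</value>
</property>
```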



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10637) Add snapshot and several dfsadmin tests into TestCLI

2014-05-29 Thread Dasha Boudnik (JIRA)
Dasha Boudnik created HADOOP-10637:
--

 Summary: Add snapshot and several dfsadmin tests into TestCLI
 Key: HADOOP-10637
 URL: https://issues.apache.org/jira/browse/HADOOP-10637
 Project: Hadoop Common
  Issue Type: Improvement
  Components: test
Reporter: Dasha Boudnik


Add the following commands to TestCLI:
appendToFile
text
rmdir
rmdir with ignore-fail-on-non-empty
df
expunge
getmerge
allowSnapshot
disallowSnapshot
createSnapshot
renameSnapshot
deleteSnapshot
refreshUserToGroupsMappings
refreshSuperUserGroupsConfiguration
setQuota
clrQuota
setSpaceQuota
setBalancerBandwidth
finalizeUpgrade
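As an illustration of the shape these additions could take, here is a sketch of one snapshot case in TestCLI's testConf.xml format. The tag names follow the existing test configuration files as I understand them; the paths, snapshot name, and expected output are assumptions:

```
<test>  <!-- sketch of a createSnapshot case; values are illustrative -->
  <description>createSnapshot: snapshot of a snapshottable directory</description>
  <test-commands>
    <command>-fs NAMENODE -mkdir /snapdir</command>
    <dfs-admin-command>-fs NAMENODE -allowSnapshot /snapdir</dfs-admin-command>
    <command>-fs NAMENODE -createSnapshot /snapdir snap1</command>
  </test-commands>
  <cleanup-commands>
    <command>-fs NAMENODE -rm -r /snapdir</command>
  </cleanup-commands>
  <comparators>
    <comparator>
      <type>SubstringComparator</type>
      <expected-output>Created snapshot</expected-output>
    </comparator>
  </comparators>
</test>
```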





Re: Change proposal for FileInputFormat isSplitable

2014-05-29 Thread Steve Loughran
On 28 May 2014 20:50, Niels Basjes ni...@basjes.nl wrote:

 Hi,

 Last week I ran into this problem again
 https://issues.apache.org/jira/browse/MAPREDUCE-2094

 What happens here is that the default implementation of the isSplitable
 method in FileInputFormat is so unsafe that just about everyone who
 implements a new subclass is likely to get this wrong. The effect of
 getting this wrong is that all unit tests succeed and running it against
 'large' input files (64MiB) that are compressed using a non-splittable
 compression (often Gzip) will cause the input to be fed into the mappers
 multiple times (i.e. you get garbage results without ever seeing any
 errors).

 Last few days I was at Berlin buzzwords talking to someone about this bug


that was me, I recall.


 and this resulted in the following proposal which I would like your
 feedback on.

 1) This is a change that will break backwards compatibility (deliberate
 choice).
 2) The FileInputFormat will get 3 methods (the old isSplitable with the
 typo of one 't' in the name will disappear):
 (protected) isSplittableContainer -- true unless compressed with
 non-splittable compression.
 (protected) isSplittableContent -- abstract, MUST be implemented by
 the subclass
 (public)  isSplittable -- isSplittableContainer &&
 isSplittableContent

 The idea is that only the isSplittable is used by other classes to know if
 this is a splittable file.
 The effect I hope to get is that a developer writing their own
 fileinputformat (which I alone have done twice so far) is 'forced' and
 'helped' getting this right.


I can see that making the attributes more explicit would be good -- but stopping
everything that exists from working isn't going to fly.

what about some subclass, AbstractSplittableFileInputFormat, that implements
the container check properly, requires the content one -- and then calculates
isSplitable() from the results? Existing code: no change; new formats can
descend from this (and built-in ones retrofitted).
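Combining the thread's two ideas, a sketch of such a base class might look like the following. The types are simplified stand-ins (a String path instead of the real JobContext/Path parameters), the gzip suffix check is a placeholder for the real CompressionCodec lookup, and the combined method is public here for illustration:

```java
// Sketch of an AbstractSplittableFileInputFormat along the lines Steve
// suggests: the container check is implemented conservatively, the content
// check is forced onto subclasses, and the combined answer is derived from
// both. Stand-in types only; not the real Hadoop FileInputFormat API.
abstract class AbstractSplittableFileInputFormat {
    // Container check: false for known non-splittable compression
    // containers such as gzip (placeholder for a codec lookup).
    protected boolean isSplittableContainer(String path) {
        return !path.endsWith(".gz");
    }

    // Content check: a subclass MUST state whether its record format
    // can be split mid-file.
    protected abstract boolean isSplittableContent(String path);

    // The only method other classes should consult.
    public final boolean isSplittable(String path) {
        return isSplittableContainer(path) && isSplittableContent(path);
    }
}

class TextLikeInputFormat extends AbstractSplittableFileInputFormat {
    @Override
    protected boolean isSplittableContent(String path) {
        return true; // line-oriented records can be split
    }
}
```

Existing formats would be untouched; only formats descending from the new class get the forced content check.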



 The reason for me to propose this as an incompatible change is that this
 way I hope to eradicate some of the existing bugs in custom implementations
 'out there'.

 P.S. If you agree to this change then I'm willing to put my back into it
 and submit a patch.

 --
 Best regards,

 Niels Basjes


-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.


Re: Change proposal for FileInputFormat isSplitable

2014-05-29 Thread Matt Fellows
I could be missing something, but couldn't you just deprecate isSplitable
(spelled incorrectly) and create a new isSplittable as described?
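The deprecation route could be sketched like this. The class and the String path parameter are illustrative stand-ins, not the real FileInputFormat signature, and the safe default is an assumption about the desired behavior:

```java
// Sketch of the deprecate-and-replace approach: keep the misspelled method
// as a @Deprecated shim so existing subclasses still compile (with a
// warning), while new code overrides the correctly spelled replacement.
class FileInputFormatSketch {
    /** @deprecated Misspelled; override {@link #isSplittable} instead. */
    @Deprecated
    public boolean isSplitable(String path) {
        return isSplittable(path); // legacy callers funnel into the new method
    }

    // Correctly spelled replacement with a safe default: assume files are
    // not splittable unless a subclass knows better.
    public boolean isSplittable(String path) {
        return false;
    }
}
```

The catch, discussed below in the thread, is that subclasses which already override the misspelled method keep their (possibly buggy) behavior silently, which is exactly what the incompatible proposal tries to avoid.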






-- 
First Option Software Ltd
Signal House
Jacklyns Lane
Alresford
SO24 9JJ
Tel: +44 (0)1962 738232
Mob: +44 (0)7710 160458
Fax: +44 (0)1962 600112
Web: www.bespokesoftware.com

-- 


This is confidential, non-binding and not company endorsed - see full terms 
at www.fosolutions.co.uk/emailpolicy.html 
First Option Software Ltd Registered No. 06340261
Signal House, Jacklyns Lane, Alresford, Hampshire, SO24 9JJ, U.K.




Re: Change proposal for FileInputFormat isSplitable

2014-05-29 Thread Niels Basjes
My original proposal (from about 3 years ago) was to change the isSplitable
method to return a safe default ( you can see that in the patch that is
still attached to that Jira issue).
For reasons I still do not fully understand, this was rejected by Todd and
Doug.

So that is why my new proposal is to deprecate (remove!) the old method
with the typo in Hadoop 3.0 and replace it with something correct and less
error prone.
Given the fact that this would happen in a major version jump I thought
that would be the right time to do that.

Niels






-- 
Best regards / Met vriendelijke groeten,

Niels Basjes


Re: Change proposal for FileInputFormat isSplitable

2014-05-29 Thread Jay Vyas
I think breaking backwards compat is sensible here: the break is easily caught
by the compiler, while the alternative is a runtime error that can result in
terabytes of mucked-up output.

 On May 29, 2014, at 6:11 AM, Matt Fellows matt.fell...@bespokesoftware.com 
 wrote:
 
 As someone who doesn't really contribute, just lurks, I could well be 
 misinformed or under-informed, but I don't see why we can't deprecate a 
 method which could cause dangerous side effects?  
 People can still use the deprecated methods for backwards compatibility, but 
 are discouraged by compiler warnings, and any changes they write to their 
 code can start to use the new functionality?
 
 *Apologies if I'm stepping into a Hadoop holy war here
 
 

Re: Change proposal for FileInputFormat isSplitable

2014-05-29 Thread Niels Basjes
This is exactly why I'm proposing a change that will either 'fix silently'
(my original patch from 3 years ago) or 'break loudly' (my current
proposal) old implementations.
I'm convinced that there are at least 100 companies worldwide that have a
custom implementation with this bug and have no clue they have been basing
decisions upon silently corrupted data.



Re: Change proposal for FileInputFormat isSplitable

2014-05-29 Thread Niels Basjes
I forgot to ask a relevant question: What made the original proposed
solution incompatible?
To me it still seems to be a clean, backward-compatible solution that fixes
this issue in a simple way.

Perhaps Todd can explain why?

Niels

[jira] [Resolved] (HADOOP-10589) NativeS3FileSystem throw NullPointerException when the file is empty

2014-05-29 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-10589.
-

  Resolution: Duplicate
Release Note: Duplicate of HADOOP-10533, though the stack trace is more up 
to date on this one

 NativeS3FileSystem throw NullPointerException when the file is empty
 

 Key: HADOOP-10589
 URL: https://issues.apache.org/jira/browse/HADOOP-10589
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.2.0
Reporter: shuisheng wei

 An empty file in the s3 path.
 NativeS3FsInputStream does not check the InputStream.
 2014-05-06 20:29:26,961 INFO [main] 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 4 forwarded 0 rows
 2014-05-06 20:29:26,961 INFO [main] 
 org.apache.hadoop.hive.ql.exec.GroupByOperator: 3 Close done
 2014-05-06 20:29:26,961 INFO [main] 
 org.apache.hadoop.hive.ql.exec.SelectOperator: 2 Close done
 2014-05-06 20:29:26,961 INFO [main] 
 org.apache.hadoop.hive.ql.exec.FilterOperator: 1 Close done
 2014-05-06 20:29:26,961 INFO [main] 
 org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
 2014-05-06 20:29:26,961 INFO [main] 
 org.apache.hadoop.hive.ql.exec.MapOperator: 5 Close done
 2014-05-06 20:29:26,961 INFO [main] 
 org.apache.hadoop.hive.ql.exec.mr.ExecMapper: ExecMapper: processed 0 rows: 
 used memory = 602221488
 2014-05-06 20:29:26,964 WARN [main] org.apache.hadoop.mapred.YarnChild: 
 Exception running child : java.lang.NullPointerException
   at 
 org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.close(NativeS3FileSystem.java:147)
   at java.io.BufferedInputStream.close(BufferedInputStream.java:472)
   at java.io.FilterInputStream.close(FilterInputStream.java:181)
   at org.apache.hadoop.util.LineReader.close(LineReader.java:150)
   at 
 org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:244)
   at 
 org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doClose(CombineHiveRecordReader.java:72)
   at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.close(HiveContextAwareRecordReader.java:96)
   at 
 org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.close(HadoopShimsSecure.java:248)
   at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.close(MapTask.java:209)
   at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1950)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:445)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 2014-05-06 20:29:26,970 INFO [main] org.apache.hadoop.mapred.Task: Runnning 
 cleanup for the task
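The shape of the bug can be sketched as follows: a stream wrapper whose close() dereferences its inner stream without a null check throws NPE when the inner stream was never opened, as can happen for a zero-byte object. GuardedS3InputStream and its field are illustrative names, not the actual NativeS3FsInputStream code; the fix shown is the obvious null guard:

```java
import java.io.IOException;
import java.io.InputStream;

// Illustrative stand-in for a NativeS3FsInputStream-style wrapper.
class GuardedS3InputStream extends InputStream {
    private InputStream in; // may be null when the object is empty

    GuardedS3InputStream(InputStream in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        return (in == null) ? -1 : in.read(); // empty object: immediate EOF
    }

    @Override
    public void close() throws IOException {
        if (in != null) { // the missing guard that produces the NPE
            in.close();
            in = null;
        }
    }
}
```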





[jira] [Reopened] (HADOOP-10589) NativeS3FileSystem throw NullPointerException when the file is empty

2014-05-29 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reopened HADOOP-10589:
-

  Assignee: Steve Loughran

not a duplicate. Same stack trace, but root cause is different

 NativeS3FileSystem throw NullPointerException when the file is empty
 

 Key: HADOOP-10589
 URL: https://issues.apache.org/jira/browse/HADOOP-10589
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.2.0
Reporter: shuisheng wei
Assignee: Steve Loughran

 An empty file in the s3 path.
 NativeS3FsInputStream dose not check the InputStream .
 2014-05-06 20:29:26,961 INFO [main] 
 org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 4 forwarded 0 rows
 2014-05-06 20:29:26,961 INFO [main] 
 org.apache.hadoop.hive.ql.exec.GroupByOperator: 3 Close done
 2014-05-06 20:29:26,961 INFO [main] 
 org.apache.hadoop.hive.ql.exec.SelectOperator: 2 Close done
 2014-05-06 20:29:26,961 INFO [main] 
 org.apache.hadoop.hive.ql.exec.FilterOperator: 1 Close done
 2014-05-06 20:29:26,961 INFO [main] 
 org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
 2014-05-06 20:29:26,961 INFO [main] 
 org.apache.hadoop.hive.ql.exec.MapOperator: 5 Close done
 2014-05-06 20:29:26,961 INFO [main] 
 org.apache.hadoop.hive.ql.exec.mr.ExecMapper: ExecMapper: processed 0 rows: 
 used memory = 602221488
 2014-05-06 20:29:26,964 WARN [main] org.apache.hadoop.mapred.YarnChild: 
 Exception running child : java.lang.NullPointerException
   at 
 org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.close(NativeS3FileSystem.java:147)
   at java.io.BufferedInputStream.close(BufferedInputStream.java:472)
   at java.io.FilterInputStream.close(FilterInputStream.java:181)
   at org.apache.hadoop.util.LineReader.close(LineReader.java:150)
   at 
 org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:244)
   at 
 org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doClose(CombineHiveRecordReader.java:72)
   at 
 org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.close(HiveContextAwareRecordReader.java:96)
   at 
 org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.close(HadoopShimsSecure.java:248)
   at 
 org.apache.hadoop.mapred.MapTask$TrackedRecordReader.close(MapTask.java:209)
   at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1950)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:445)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
 2014-05-06 20:29:26,970 INFO [main] org.apache.hadoop.mapred.Task: Runnning 
 cleanup for the task
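The trace shows the NPE firing inside NativeS3FsInputStream.close(), which suggests the wrapped stream can be null when the S3 object is empty. A minimal sketch of a defensive close, assuming a nullable delegate (the class and field names here are illustrative, not the actual Hadoop fix):

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper: guards close() and read() against a null delegate,
// one plausible way to avoid the NPE reported here. Not the real
// NativeS3FileSystem code.
class GuardedS3InputStream extends InputStream {
    private InputStream in; // may be null for an empty S3 object

    GuardedS3InputStream(InputStream in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        // An empty object behaves as an already-exhausted stream.
        return (in == null) ? -1 : in.read();
    }

    @Override
    public void close() throws IOException {
        if (in != null) {   // avoid NullPointerException on empty files
            in.close();
            in = null;      // make close() idempotent
        }
    }
}

public class Main {
    public static void main(String[] args) throws IOException {
        GuardedS3InputStream s = new GuardedS3InputStream(null);
        System.out.println(s.read());  // -1: nothing to read
        s.close();                     // no NPE
        System.out.println("closed ok");
    }
}
```

With this shape, BufferedInputStream.close() can call through to the wrapper safely even when the underlying S3 stream was never opened.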



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10639) FileBasedKeyStoresFactory initialization is not using default for SSL_REQUIRE_CLIENT_CERT_KEY

2014-05-29 Thread Alejandro Abdelnur (JIRA)
Alejandro Abdelnur created HADOOP-10639:
---

 Summary: FileBasedKeyStoresFactory initialization is not using 
default for SSL_REQUIRE_CLIENT_CERT_KEY
 Key: HADOOP-10639
 URL: https://issues.apache.org/jira/browse/HADOOP-10639
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 2.4.0
Reporter: Alejandro Abdelnur
Assignee: Alejandro Abdelnur


The FileBasedKeyStoresFactory initialization is defaulting 
SSL_REQUIRE_CLIENT_CERT_KEY to true instead of the default 
DEFAULT_SSL_REQUIRE_CLIENT_CERT (false).
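The difference is easy to see with a stand-in for the Hadoop Configuration lookup; the helper below is a sketch (only the two constant names come from the issue, the rest is illustrative):

```java
import java.util.HashMap;
import java.util.Map;

public class Main {
    static final String SSL_REQUIRE_CLIENT_CERT_KEY = "ssl.require.client.cert";
    static final boolean DEFAULT_SSL_REQUIRE_CLIENT_CERT = false;

    // Stand-in for Configuration.getBoolean(key, defaultValue).
    static boolean getBoolean(Map<String, String> conf,
                              String key, boolean defaultValue) {
        String v = conf.get(key);
        return (v == null) ? defaultValue : Boolean.parseBoolean(v);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>(); // key not set by the user

        // Bug described: hard-coding `true` as the fallback.
        boolean buggy = getBoolean(conf, SSL_REQUIRE_CLIENT_CERT_KEY, true);

        // Intended behavior: fall back to the named default constant.
        boolean fixed = getBoolean(conf, SSL_REQUIRE_CLIENT_CERT_KEY,
                                   DEFAULT_SSL_REQUIRE_CLIENT_CERT);

        System.out.println(buggy);  // true  - client certs wrongly required
        System.out.println(fixed);  // false - the documented default
    }
}
```

When the key is absent from ssl-server.xml/ssl-client.xml, the hard-coded `true` silently turns on mutual TLS, which is the regression this issue reports.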



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10640) Implement Namenode RPCs in HDFS native client

2014-05-29 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HADOOP-10640:
-

 Summary: Implement Namenode RPCs in HDFS native client
 Key: HADOOP-10640
 URL: https://issues.apache.org/jira/browse/HADOOP-10640
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: native
Affects Versions: HADOOP-10388
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe


Implement the parts of libhdfs that just involve making RPCs to the Namenode, 
such as mkdir, rename, etc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10641) Introduce Coordination Engine

2014-05-29 Thread Konstantin Shvachko (JIRA)
Konstantin Shvachko created HADOOP-10641:


 Summary: Introduce Coordination Engine
 Key: HADOOP-10641
 URL: https://issues.apache.org/jira/browse/HADOOP-10641
 Project: Hadoop Common
  Issue Type: New Feature
Affects Versions: 3.0.0
Reporter: Konstantin Shvachko


A Coordination Engine (CE) is a system that allows a set of distributed processes to agree on a sequence of events. To be reliable, the CE should itself be distributed.
A Coordination Engine can be based on different algorithms (Paxos, Raft, 2PC, ZAB) and have different implementations, depending on use cases and on reliability, availability, and performance requirements.
The CE should have a common API, so that it can serve as a pluggable component in different projects. The immediate beneficiaries are HDFS (HDFS-6469) and HBase (HBASE-10909).
The first implementation is proposed to be based on ZooKeeper.
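To make the "common API, pluggable implementation" idea concrete, here is a minimal sketch; the interface and method names are hypothetical, not from the HADOOP-10641 proposal:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical pluggable Coordination Engine API: callers submit events,
// and the engine delivers them to learners in the globally agreed order.
interface CoordinationEngine {
    void submit(String event);
    void registerLearner(Consumer<String> learner);
}

// Trivial single-process implementation for illustration only: "agreement"
// is just local ordering. A real engine would be backed by ZooKeeper,
// Paxos, Raft, etc., and agree across nodes.
class LocalEngine implements CoordinationEngine {
    private final List<Consumer<String>> learners = new ArrayList<>();

    public void submit(String event) {
        for (Consumer<String> l : learners) l.accept(event);
    }

    public void registerLearner(Consumer<String> learner) {
        learners.add(learner);
    }
}

public class Main {
    public static void main(String[] args) {
        CoordinationEngine ce = new LocalEngine();
        List<String> agreed = new ArrayList<>();
        ce.registerLearner(agreed::add);
        ce.submit("mkdir /a");
        ce.submit("rename /a /b");
        System.out.println(agreed);  // [mkdir /a, rename /a /b]
    }
}
```

The point of the common API is that HDFS or HBase would code against the interface, while the algorithm behind it (ZooKeeper first, others later) stays swappable.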



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Introducing ConsensusNode and a Coordination Engine

2014-05-29 Thread Konstantin Shvachko
Hello hadoop developers,

I just opened two jiras proposing to introduce ConsensusNode into HDFS and
a Coordination Engine into Hadoop Common. The latter should benefit HDFS
and  HBase as well as potentially other projects. See HDFS-6469 and
HADOOP-10641 for details.
The effort is based on the system we built at WANdisco with my colleagues,
who are glad to contribute it to Apache, as quite a few people in the
community expressed interest in these ideas and their potential applications.

We should probably keep technical discussions in the jiras. Here on the dev
list I wanted to touch base on any logistical issues / questions.
- First of all, any ideas and help are very much welcome.
- We would like to set up a meetup to discuss this if people are
interested. Hadoop Summit next week may be a potential time-place to meet.
Not sure in what form. If not, we can organize one in our San Ramon office
later on.
- The effort may take a few months depending on the contributors schedules.
Would it make sense to open a branch for the ConsensusNode work?
- The APIs and the implementation of the Coordination Engine should be fairly
independent, so it may be reasonable to add it directly to Hadoop Common
trunk.

Thanks,
--Konstantin


[jira] [Resolved] (HADOOP-10628) Javadoc and few code style improvement for Crypto input and output streams

2014-05-29 Thread Charles Lamb (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charles Lamb resolved HADOOP-10628.
---

Resolution: Fixed

Thanks Yi, I committed this to fs-encryption.


 Javadoc and few code style improvement for Crypto input and output streams
 --

 Key: HADOOP-10628
 URL: https://issues.apache.org/jira/browse/HADOOP-10628
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: security
Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134)
Reporter: Yi Liu
Assignee: Yi Liu
 Fix For: fs-encryption (HADOOP-10150 and HDFS-6134)

 Attachments: HADOOP-10628.patch


 There are some additional comments from [~clamb] related to javadoc and a few 
 code style issues on HADOOP-10603; let's fix them in this follow-on JIRA.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10642) Provide option to limit heap memory consumed by dynamic metrics2 metrics

2014-05-29 Thread Ted Yu (JIRA)
Ted Yu created HADOOP-10642:
---

 Summary: Provide option to limit heap memory consumed by dynamic 
metrics2 metrics
 Key: HADOOP-10642
 URL: https://issues.apache.org/jira/browse/HADOOP-10642
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Ted Yu


User sunweiei provided the following jmap output from an HBase 0.96 deployment:
{code}
 num #instances #bytes  class name
--
   1:  14917882 3396492464  [C
   2:   1996994 2118021808  [B
   3:  43341650 1733666000  java.util.LinkedHashMap$Entry
   4:  14453983 1156550896  [Ljava.util.HashMap$Entry;
   5:  14446577  924580928  
org.apache.hadoop.metrics2.lib.Interns$CacheWith2Keys$2
{code}
Heap consumption by Interns$CacheWith2Keys$2 (and indirectly by [C) could be 
due to calls to Interns.info() in DynamicMetricsRegistry, which was cloned from 
metrics2/lib/MetricsRegistry.java.
This scenario would arise when a large number of regions are tracked through 
metrics2 dynamically.
The Interns class doesn't provide an API to remove entries from its internal map.

One solution is to provide an option that allows skipping calls to 
Interns.info() in metrics2/lib/MetricsRegistry.java.
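Another direction, sketched below purely for illustration, would be to cap the interning cache so entries for long-gone regions can be evicted; a size-bounded LRU map built on LinkedHashMap shows the idea (this is not the Interns API, just the underlying mechanism):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded interning cache: once the cap is exceeded, the
// least-recently-used entry is evicted instead of accumulating forever.
class BoundedInternCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedInternCache(int maxEntries) {
        super(16, 0.75f, true);  // access-order iteration gives LRU behavior
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;  // evict when over the cap
    }
}

public class Main {
    public static void main(String[] args) {
        BoundedInternCache<String, String> cache = new BoundedInternCache<>(2);
        cache.put("region1", "info1");
        cache.put("region2", "info2");
        cache.put("region3", "info3");  // evicts "region1", the eldest
        System.out.println(cache.size());                 // 2
        System.out.println(cache.containsKey("region1")); // false
    }
}
```

A cap like this trades a little re-interning for a hard bound on heap, which is exactly the knob this issue asks for.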



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10643) Add NativeS3Fs that delegates calls from FileContext apis to native s3 fs implementation

2014-05-29 Thread Sumit Kumar (JIRA)
Sumit Kumar created HADOOP-10643:


 Summary: Add NativeS3Fs that delegates calls from FileContext apis 
to native s3 fs implementation
 Key: HADOOP-10643
 URL: https://issues.apache.org/jira/browse/HADOOP-10643
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs/s3
Affects Versions: 2.4.0
Reporter: Sumit Kumar


The new set of file-system-related APIs (FileContext/AbstractFileSystem) 
already supports the local file system, HDFS, and viewfs; however, it doesn't 
support s3n. This patch adds that support using a configuration like

fs.AbstractFileSystem.s3n.impl = org.apache.hadoop.fs.s3native.NativeS3Fs

This patch, however, doesn't provide a new implementation; instead it relies on 
the DelegateToFileSystem abstract class to delegate all calls from the 
FileContext APIs for s3n to the NativeS3FileSystem implementation.
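The delegation approach described above can be sketched in a self-contained way; every name below is a stand-in for illustration, not the actual Hadoop class hierarchy:

```java
// Toy interface standing in for the file system API surface.
interface FileSystemApi {
    boolean exists(String path);
}

// Stand-in for the existing NativeS3FileSystem implementation.
class NativeS3FileSystemStub implements FileSystemApi {
    public boolean exists(String path) {
        return path.startsWith("s3n://");  // toy behavior for the demo
    }
}

// Analogous to extending DelegateToFileSystem: the adapter implements the
// new API by forwarding every call to the existing implementation rather
// than reimplementing it.
class DelegateToFs implements FileSystemApi {
    private final FileSystemApi delegate;

    DelegateToFs(FileSystemApi delegate) {
        this.delegate = delegate;
    }

    public boolean exists(String path) {
        return delegate.exists(path);  // pure pass-through
    }
}

public class Main {
    public static void main(String[] args) {
        FileSystemApi fs = new DelegateToFs(new NativeS3FileSystemStub());
        System.out.println(fs.exists("s3n://bucket/key"));  // true
        System.out.println(fs.exists("hdfs://nn/path"));    // false
    }
}
```

This is why the patch can stay small: the adapter adds no behavior of its own, and the `fs.AbstractFileSystem.s3n.impl` setting just tells FileContext which adapter class to load.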



--
This message was sent by Atlassian JIRA
(v6.2#6252)