[jira] [Resolved] (HADOOP-9987) HDFS Compatible ViewFileSystem
[ https://issues.apache.org/jira/browse/HADOOP-9987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gera Shegalov resolved HADOOP-9987. --- Resolution: Duplicate HDFS Compatible ViewFileSystem -- Key: HADOOP-9987 URL: https://issues.apache.org/jira/browse/HADOOP-9987 Project: Hadoop Common Issue Type: Bug Reporter: Lohit Vijayarenu Fix For: 2.0.6-alpha There are multiple scripts and projects like pig, hive, elephantbird that refer to the HDFS URI as hdfs://namenodehostport/ or hdfs:/// . In a federated namespace this causes problems because the supported scheme for federation is viewfs:// . We will have to force all users to change their scripts/programs to be able to access a federated cluster. It would be great if there were a way to map the viewfs scheme to the hdfs scheme without exposing it to users. Opening this JIRA to get inputs from people who have thought about this in their clusters. In our clusters we ended up creating another class, HDFSCompatibleViewFileSystem, which hijacks both hdfs.fs.impl and viewfs.fs.impl and passes down filesystem calls to ViewFileSystem. Is there any suggested approach other than this? -- This message was sent by Atlassian JIRA (v6.2#6252)
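The workaround sketched in the description -- registering a ViewFileSystem subclass under the hdfs scheme -- would look roughly like the following. The class name is illustrative (not the actual HDFSCompatibleViewFileSystem mentioned in the report), and mapping the namenode authority onto a mount table would need more work than shown here.
{code}
import java.io.IOException;

import org.apache.hadoop.fs.viewfs.ViewFileSystem;

// Illustrative sketch only: answer to the hdfs:// scheme but route calls
// through the viewfs mount table, so existing hdfs:// scripts keep working.
public class HdfsCompatibleViewFileSystem extends ViewFileSystem {
  public HdfsCompatibleViewFileSystem() throws IOException {
    super();
  }

  @Override
  public String getScheme() {
    // Claim the hdfs scheme; the cluster's fs.<scheme>.impl configuration keys
    // (as described above) would point at this class in core-site.xml.
    return "hdfs";
  }
}
{code}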
[jira] [Created] (HADOOP-10637) Add snapshot and several dfsadmin tests into TestCLI
Dasha Boudnik created HADOOP-10637: -- Summary: Add snapshot and several dfsadmin tests into TestCLI Key: HADOOP-10637 URL: https://issues.apache.org/jira/browse/HADOOP-10637 Project: Hadoop Common Issue Type: Improvement Components: test Reporter: Dasha Boudnik Add the following commands to TestCLI:
- appendToFile
- text
- rmdir
- rmdir with ignore-fail-on-non-empty
- df
- expunge
- getmerge
- allowSnapshot
- disallowSnapshot
- createSnapshot
- renameSnapshot
- deleteSnapshot
- refreshUserToGroupsMappings
- refreshSuperUserGroupsConfiguration
- setQuota
- clrQuota
- setSpaceQuota
- setBalancerBandwidth
- finalizeUpgrade
-- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Change proposal for FileInputFormat isSplitable
On 28 May 2014 20:50, Niels Basjes ni...@basjes.nl wrote: Hi, Last week I ran into this problem again https://issues.apache.org/jira/browse/MAPREDUCE-2094 What happens here is that the default implementation of the isSplitable method in FileInputFormat is so unsafe that just about everyone who implements a new subclass is likely to get this wrong. The effect of getting this wrong is that all unit tests succeed, yet running against 'large' input files (64MiB) that are compressed using a non-splittable compression (often Gzip) will cause the input to be fed into the mappers multiple times (i.e. you get garbage results without ever seeing any errors). The last few days I was at Berlin Buzzwords talking to someone about this bug
that was me, I recall.
and this resulted in the following proposal, on which I would like your feedback:
1) This is a change that will break backwards compatibility (a deliberate choice).
2) The FileInputFormat will get 3 methods (the old isSplitable, with the typo of one 't' in the name, will disappear):
(protected) isSplittableContainer -- true unless compressed with a non-splittable compression.
(protected) isSplittableContent -- abstract, MUST be implemented by the subclass.
(public) isSplittable -- isSplittableContainer && isSplittableContent
The idea is that only isSplittable is used by other classes to know if this is a splittable file. The effect I hope to get is that a developer writing their own FileInputFormat (which I alone have done twice so far) is 'forced' and 'helped' into getting this right.
I could see making the attributes more explicit would be good -but stopping everything that exists from working isn't going to fly. What about some subclass, AbstractSplittableFileInputFormat, that implements the container properly, requires the content one -and then calculates isSplitable() from the results? Existing code: no change, new formats can descend from this (and built-in ones retrofitted).
The reason for me to propose this as an incompatible change is that this way I hope to eradicate some of the existing bugs in custom implementations 'out there'. P.S. If you agree to this change then I'm willing to put my back into it and submit a patch. -- Best regards, Niels Basjes
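A rough sketch of the API shape proposed above (not code from the thread; the codec check inside isSplittableContainer is an assumption about how it could be implemented):
{code}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.SplittableCompressionCodec;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

// Illustrative only: the three-method split proposed in the thread.
public abstract class SplitAwareFileInputFormat<K, V> extends FileInputFormat<K, V> {

  /** True unless the file is wrapped in a non-splittable compression container. */
  protected boolean isSplittableContainer(JobContext context, Path file) {
    CompressionCodec codec =
        new CompressionCodecFactory(context.getConfiguration()).getCodec(file);
    return codec == null || codec instanceof SplittableCompressionCodec;
  }

  /** Subclasses must state explicitly whether their record format can be split. */
  protected abstract boolean isSplittableContent(JobContext context, Path file);

  /** The only method other classes call: splittable container AND splittable content. */
  public boolean isSplittable(JobContext context, Path file) {
    return isSplittableContainer(context, file) && isSplittableContent(context, file);
  }
}
{code}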
Re: Change proposal for FileInputFormat isSplitable
I could be missing something, but couldn't you just deprecate isSplitable (spelled incorrectly) and create a new isSplittable as described? On Thu, May 29, 2014 at 10:34 AM, Steve Loughran ste...@hortonworks.com wrote: [...]
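The deprecate-and-rename route suggested here might look roughly like this; the class name, default behaviour and delegation direction are assumptions for illustration, not a patch.
{code}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;

// Illustrative only: keep the old misspelled hook for compatibility,
// flag it as deprecated, and steer new code to the correctly spelled one.
public abstract class DeprecationSketch {

  /** Old, misspelled hook kept so existing subclasses keep compiling and running. */
  @Deprecated
  protected boolean isSplitable(JobContext context, Path file) {
    return isSplittable(context, file);
  }

  /** Correctly spelled replacement; overriding this is the new, encouraged path. */
  protected boolean isSplittable(JobContext context, Path file) {
    // Same permissive default as today, so behaviour is unchanged for old code;
    // only the name and the deprecation warning change.
    return true;
  }
}
{code}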
Re: Change proposal for FileInputFormat isSplitable
My original proposal (from about 3 years ago) was to change the isSplitable method to return a safe default (you can see that in the patch that is still attached to that Jira issue). For arguments I still do not fully understand, this was rejected by Todd and Doug. So that is why my new proposal is to deprecate (remove!) the old method with the typo in Hadoop 3.0 and replace it with something correct and less error prone. Given the fact that this would happen in a major version jump, I thought that would be the right time to do it. Niels On Thu, May 29, 2014 at 11:34 AM, Steve Loughran ste...@hortonworks.com wrote: [...] -- Best regards / Met vriendelijke groeten, Niels Basjes
Re: Change proposal for FileInputFormat isSplitable
I think breaking backwards compat is sensible, since it's easily caught by the compiler, and in this case the alternative is a runtime error that can result in terabytes of mucked-up output. On May 29, 2014, at 6:11 AM, Matt Fellows matt.fell...@bespokesoftware.com wrote: As someone who doesn't really contribute, just lurks, I could well be misinformed or under-informed, but I don't see why we can't deprecate a method which could cause dangerous side effects? People can still use the deprecated methods for backwards compatibility, but are discouraged by compiler warnings, and any changes they write to their code can start to use the new functionality? *Apologies if I'm stepping into a Hadoop holy war here On Thu, May 29, 2014 at 10:47 AM, Niels Basjes ni...@basjes.nl wrote: [...]
Re: Change proposal for FileInputFormat isSplitable
This is exactly why I'm proposing a change that will either 'fix silently' (my original patch from 3 years ago) or 'break loudly' (my current proposal) old implementations. I'm convinced that there are at least 100 companies worldwide that have a custom implementation with this bug and have no clue they have been basing decisions upon silently corrupted data. On Thu, May 29, 2014 at 1:21 PM, Jay Vyas jayunit...@gmail.com wrote: [...]
Re: Change proposal for FileInputFormat isSplitable
I forgot to ask a relevant question: what made the original proposed solution incompatible? To me it still seems to be a clean, backward compatible solution that fixes this issue in a simple way. Perhaps Todd can explain why? Niels On May 29, 2014 2:17 PM, Niels Basjes ni...@basjes.nl wrote: [...]
[jira] [Resolved] (HADOOP-10589) NativeS3FileSystem throw NullPointerException when the file is empty
[ https://issues.apache.org/jira/browse/HADOOP-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-10589. - Resolution: Duplicate Release Note: Duplicate of HADOOP-10533, though the stack trace is more up to date on this one NativeS3FileSystem throw NullPointerException when the file is empty Key: HADOOP-10589 URL: https://issues.apache.org/jira/browse/HADOOP-10589 Project: Hadoop Common Issue Type: Bug Components: fs/s3 Affects Versions: 2.2.0 Reporter: shuisheng wei An empty file in the s3 path. NativeS3FsInputStream does not check the InputStream.
2014-05-06 20:29:26,961 INFO [main] org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 4 forwarded 0 rows
2014-05-06 20:29:26,961 INFO [main] org.apache.hadoop.hive.ql.exec.GroupByOperator: 3 Close done
2014-05-06 20:29:26,961 INFO [main] org.apache.hadoop.hive.ql.exec.SelectOperator: 2 Close done
2014-05-06 20:29:26,961 INFO [main] org.apache.hadoop.hive.ql.exec.FilterOperator: 1 Close done
2014-05-06 20:29:26,961 INFO [main] org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
2014-05-06 20:29:26,961 INFO [main] org.apache.hadoop.hive.ql.exec.MapOperator: 5 Close done
2014-05-06 20:29:26,961 INFO [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: ExecMapper: processed 0 rows: used memory = 602221488
2014-05-06 20:29:26,964 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.NullPointerException
at org.apache.hadoop.fs.s3native.NativeS3FileSystem$NativeS3FsInputStream.close(NativeS3FileSystem.java:147)
at java.io.BufferedInputStream.close(BufferedInputStream.java:472)
at java.io.FilterInputStream.close(FilterInputStream.java:181)
at org.apache.hadoop.util.LineReader.close(LineReader.java:150)
at org.apache.hadoop.mapred.LineRecordReader.close(LineRecordReader.java:244)
at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doClose(CombineHiveRecordReader.java:72)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.close(HiveContextAwareRecordReader.java:96)
at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.close(HadoopShimsSecure.java:248)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.close(MapTask.java:209)
at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1950)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:445)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
2014-05-06 20:29:26,970 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Reopened] (HADOOP-10589) NativeS3FileSystem throw NullPointerException when the file is empty
[ https://issues.apache.org/jira/browse/HADOOP-10589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran reopened HADOOP-10589: - Assignee: Steve Loughran Not a duplicate: same stack trace, but the root cause is different. NativeS3FileSystem throw NullPointerException when the file is empty Key: HADOOP-10589 URL: https://issues.apache.org/jira/browse/HADOOP-10589 Project: Hadoop Common Issue Type: Bug Components: fs/s3 Affects Versions: 2.2.0 Reporter: shuisheng wei Assignee: Steve Loughran An empty file in the s3 path. NativeS3FsInputStream does not check the InputStream. -- This message was sent by Atlassian JIRA (v6.2#6252)
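The failure mode described -- close() dereferencing a stream that was never opened for a zero-byte object -- points at a null guard along these lines. This is a hypothetical sketch, not the committed HADOOP-10589 fix; the wrapper class and field name are invented.
{code}
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration of the missing check on the inner stream.
class GuardedS3InputStream extends InputStream {
  private InputStream in; // may be null when the object is empty

  GuardedS3InputStream(InputStream wrapped) {
    this.in = wrapped;
  }

  @Override
  public int read() throws IOException {
    return in == null ? -1 : in.read();
  }

  @Override
  public void close() throws IOException {
    if (in != null) {   // guard: nothing to close for an empty object
      in.close();
      in = null;        // make repeated close() calls safe
    }
  }
}
{code}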
[jira] [Created] (HADOOP-10639) FileBasedKeyStoresFactory initialization is not using default for SSL_REQUIRE_CLIENT_CERT_KEY
Alejandro Abdelnur created HADOOP-10639: --- Summary: FileBasedKeyStoresFactory initialization is not using default for SSL_REQUIRE_CLIENT_CERT_KEY Key: HADOOP-10639 URL: https://issues.apache.org/jira/browse/HADOOP-10639 Project: Hadoop Common Issue Type: Bug Components: security Affects Versions: 2.4.0 Reporter: Alejandro Abdelnur Assignee: Alejandro Abdelnur The FileBasedKeyStoresFactory initialization is defaulting SSL_REQUIRE_CLIENT_CERT_KEY to true instead of the default DEFAULT_SSL_REQUIRE_CLIENT_CERT (false). -- This message was sent by Atlassian JIRA (v6.2#6252)
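In code terms the bug amounts to reading the property with a hard-coded true fallback. A sketch using the constant names from the description above (key string and values here are assumptions, not the actual FileBasedKeyStoresFactory source):
{code}
import org.apache.hadoop.conf.Configuration;

// Illustrative only: the defaulting behaviour described in the report.
class RequireClientCertExample {
  static final String SSL_REQUIRE_CLIENT_CERT_KEY = "ssl.require.client.cert"; // assumed key
  static final boolean DEFAULT_SSL_REQUIRE_CLIENT_CERT = false;

  static boolean buggy(Configuration conf) {
    // Hard-coded true fallback: client certs get demanded even when unconfigured.
    return conf.getBoolean(SSL_REQUIRE_CLIENT_CERT_KEY, true);
  }

  static boolean fixed(Configuration conf) {
    // Fall back to the intended default (false) instead.
    return conf.getBoolean(SSL_REQUIRE_CLIENT_CERT_KEY, DEFAULT_SSL_REQUIRE_CLIENT_CERT);
  }
}
{code}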
[jira] [Created] (HADOOP-10640) Implement Namenode RPCs in HDFS native client
Colin Patrick McCabe created HADOOP-10640: - Summary: Implement Namenode RPCs in HDFS native client Key: HADOOP-10640 URL: https://issues.apache.org/jira/browse/HADOOP-10640 Project: Hadoop Common Issue Type: Sub-task Components: native Affects Versions: HADOOP-10388 Reporter: Colin Patrick McCabe Assignee: Colin Patrick McCabe Implement the parts of libhdfs that just involve making RPCs to the Namenode, such as mkdir, rename, etc. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HADOOP-10641) Introduce Coordination Engine
Konstantin Shvachko created HADOOP-10641: Summary: Introduce Coordination Engine Key: HADOOP-10641 URL: https://issues.apache.org/jira/browse/HADOOP-10641 Project: Hadoop Common Issue Type: New Feature Affects Versions: 3.0.0 Reporter: Konstantin Shvachko A Coordination Engine (CE) is a system that allows participants to agree on a sequence of events in a distributed system. In order to be reliable, the CE should itself be distributed. A Coordination Engine can be based on different algorithms (Paxos, Raft, 2PC, ZAB) and have different implementations, depending on use cases and on reliability, availability, and performance requirements. The CE should have a common API so that it can serve as a pluggable component in different projects. The immediate beneficiaries are HDFS (HDFS-6469) and HBase (HBASE-10909). The first implementation is proposed to be based on ZooKeeper. -- This message was sent by Atlassian JIRA (v6.2#6252)
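Purely as a speculative illustration of what a common, pluggable API for such an engine could look like (the real interface is to be defined in the JIRA; every name below is invented):
{code}
import java.io.IOException;

// Speculative sketch only; not the HADOOP-10641 design.
interface CoordinationEngine<P> {
  /** Submit a proposal to be agreed upon; global ordering is decided by the engine. */
  void submitProposal(P proposal) throws IOException;

  /** Register the callback invoked, in agreed order, once consensus is reached. */
  void registerHandler(AgreementHandler<P> handler);

  interface AgreementHandler<P> {
    void onAgreement(long sequenceNumber, P agreedProposal);
  }
}
{code}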
Introducing ConsensusNode and a Coordination Engine
Hello hadoop developers, I just opened two jiras proposing to introduce ConsensusNode into HDFS and a Coordination Engine into Hadoop Common. The latter should benefit HDFS and HBase as well as potentially other projects. See HDFS-6469 and HADOOP-10641 for details. The effort is based on the system we built at Wandisco with my colleagues, who are glad to contribute it to Apache, as quite a few people in the community expressed interest in these ideas and their potential applications. We should probably keep technical discussions in the jiras. Here on the dev list I wanted to touch base on any logistics issues / questions.
- First of all, any ideas and help are very much welcome.
- We would like to set up a meetup to discuss this if people are interested. Hadoop Summit next week may be a potential time-place to meet. Not sure in what form. If not, we can organize one in our San Ramon office later on.
- The effort may take a few months depending on the contributors' schedules. Would it make sense to open a branch for the ConsensusNode work?
- APIs and the implementation of the Coordination Engine should be fairly independent, so it may be reasonable to add it directly to Hadoop Common trunk.
Thanks, --Konstantin
[jira] [Resolved] (HADOOP-10628) Javadoc and few code style improvement for Crypto input and output streams
[ https://issues.apache.org/jira/browse/HADOOP-10628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Charles Lamb resolved HADOOP-10628. --- Resolution: Fixed Thanks Yi, I committed this to fs-encryption. Javadoc and few code style improvement for Crypto input and output streams -- Key: HADOOP-10628 URL: https://issues.apache.org/jira/browse/HADOOP-10628 Project: Hadoop Common Issue Type: Sub-task Components: security Affects Versions: fs-encryption (HADOOP-10150 and HDFS-6134) Reporter: Yi Liu Assignee: Yi Liu Fix For: fs-encryption (HADOOP-10150 and HDFS-6134) Attachments: HADOOP-10628.patch There are some additional comments from [~clamb] related to javadoc and a few code style issues on HADOOP-10603; let's fix them in this follow-on JIRA. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HADOOP-10642) Provide option to limit heap memory consumed by dynamic metrics2 metrics
Ted Yu created HADOOP-10642: --- Summary: Provide option to limit heap memory consumed by dynamic metrics2 metrics Key: HADOOP-10642 URL: https://issues.apache.org/jira/browse/HADOOP-10642 Project: Hadoop Common Issue Type: Improvement Reporter: Ted Yu User sunweiei provided the following jmap output from an HBase 0.96 deployment:
{code}
num #instances #bytes class name
--
1: 14917882 3396492464 [C
2: 1996994 2118021808 [B
3: 43341650 1733666000 java.util.LinkedHashMap$Entry
4: 14453983 1156550896 [Ljava.util.HashMap$Entry;
5: 14446577 924580928 org.apache.hadoop.metrics2.lib.Interns$CacheWith2Keys$2
{code}
Heap consumption by Interns$CacheWith2Keys$2 (and indirectly by [C) could be due to calls to Interns.info() in DynamicMetricsRegistry, which was cloned off metrics2/lib/MetricsRegistry.java. This scenario would arise when a large number of regions is tracked through metrics2 dynamically. The Interns class doesn't provide an API to remove entries from its internal Map. One solution is to provide an option that allows skipping calls to Interns.info() in metrics2/lib/MetricsRegistry.java. -- This message was sent by Atlassian JIRA (v6.2#6252)
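The proposed option -- skipping Interns.info() so dynamically created metric names are not cached forever -- could be illustrated roughly as follows. The flag and helper class are hypothetical, not an existing metrics2 API.
{code}
import org.apache.hadoop.metrics2.MetricsInfo;
import org.apache.hadoop.metrics2.lib.Interns;

// Hypothetical sketch: when interning is disabled, build a throwaway
// MetricsInfo instead of caching one in Interns' internal map.
class MetricsInfoFactory {
  private final boolean internNames; // e.g. driven by a new metrics2 config option

  MetricsInfoFactory(boolean internNames) {
    this.internNames = internNames;
  }

  MetricsInfo info(final String name, final String description) {
    if (internNames) {
      return Interns.info(name, description); // current behaviour: cached forever
    }
    // Uncached instance: eligible for GC once the dynamic metric is dropped.
    return new MetricsInfo() {
      @Override public String name() { return name; }
      @Override public String description() { return description; }
    };
  }
}
{code}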
[jira] [Created] (HADOOP-10643) Add NativeS3Fs that delgates calls from FileContext apis to native s3 fs implementation
Sumit Kumar created HADOOP-10643: Summary: Add NativeS3Fs that delgates calls from FileContext apis to native s3 fs implementation Key: HADOOP-10643 URL: https://issues.apache.org/jira/browse/HADOOP-10643 Project: Hadoop Common Issue Type: New Feature Components: fs/s3 Affects Versions: 2.4.0 Reporter: Sumit Kumar The new set of filesystem-related APIs (FileContext/AbstractFileSystem) already supports the local filesystem, hdfs and viewfs; however, it doesn't support s3n. This patch is to add that support using configurations like fs.AbstractFileSystem.s3n.impl = org.apache.hadoop.fs.s3native.NativeS3Fs This patch, however, doesn't provide a new implementation; instead it relies on the DelegateToFileSystem abstract class to delegate all FileContext API calls for s3n to the NativeS3FileSystem implementation. -- This message was sent by Atlassian JIRA (v6.2#6252)
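Following the usual DelegateToFileSystem pattern, the delegating class described might look roughly like this. A sketch, not the submitted patch; in particular the authority-required flag value is an assumption.
{code}
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.DelegateToFileSystem;
import org.apache.hadoop.fs.s3native.NativeS3FileSystem;

// Illustrative sketch of an AbstractFileSystem binding for s3n that delegates
// every FileContext call to the existing NativeS3FileSystem implementation.
public class NativeS3Fs extends DelegateToFileSystem {
  public NativeS3Fs(URI theUri, Configuration conf)
      throws IOException, URISyntaxException {
    super(theUri, new NativeS3FileSystem(), conf, "s3n", false);
  }
}
{code}
It would then be wired up through the fs.AbstractFileSystem.s3n.impl configuration key mentioned in the description.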