Re: hadoop-2.5 - June end?
+1 to getting 2.5 out by end of June. I would love to help with the release in any way. Some of the YARN features, like work-preserving RM/NM restart, seem to be going well; it would be great to get 2.6 out as soon as they are done, maybe end of July. Again, I am willing to volunteer to do the necessary work. If this sounds reasonable, I can go ahead and update the Roadmap to reflect this.

On Wed, Jun 11, 2014 at 12:53 PM, Gangumalla, Uma wrote:
> Yes, Suresh.
>
> I have merged HDFS-2006 (Extended Attributes) to branch-2, so that it
> will be included in the 2.5 release.
>
> Regards,
> Uma
>
> -----Original Message-----
> From: Suresh Srinivas [mailto:sur...@hortonworks.com]
> Sent: Tuesday, June 10, 2014 10:15 PM
> To: mapreduce-...@hadoop.apache.org
> Cc: common-dev@hadoop.apache.org; hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org
> Subject: Re: hadoop-2.5 - June end?
>
> We should also include the extended attributes feature for HDFS from
> HDFS-2006 in release 2.5.
>
> On Mon, Jun 9, 2014 at 9:39 AM, Arun C Murthy wrote:
> > Folks,
> >
> > As you can see from the Roadmap wiki, it looks like several items are
> > still a bit away from being ready.
> >
> > I think rather than wait for them, it will be useful to create an
> > intermediate release (2.5) this month - I think ATS security is pretty
> > close, so we can ship that. I'm thinking of creating hadoop-2.5 by end
> > of the month, with a branch a couple of weeks prior.
> >
> > Thoughts?
> >
> > thanks,
> > Arun
> >
> > --
> > CONFIDENTIALITY NOTICE
> > NOTICE: This message is intended for the use of the individual or
> > entity to which it is addressed and may contain information that is
> > confidential, privileged and exempt from disclosure under applicable
> > law. If the reader of this message is not the intended recipient, you
> > are hereby notified that any printing, copying, dissemination,
> > distribution, disclosure or forwarding of this communication is
> > strictly prohibited.
> > If you have received this communication in error,
> > please contact the sender immediately and delete it from your system.
> > Thank You.
>
> --
> http://hortonworks.com/download/
[jira] [Created] (HADOOP-10685) Move RefreshCallQueue from its own protocol to the GenericRefreshProto
Chris Li created HADOOP-10685:
---------------------------------

Summary: Move RefreshCallQueue from its own protocol to the GenericRefreshProto
Key: HADOOP-10685
URL: https://issues.apache.org/jira/browse/HADOOP-10685
Project: Hadoop Common
Issue Type: Improvement
Reporter: Chris Li
Assignee: Chris Li
Priority: Minor

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Created] (HADOOP-10684) Extend HA support for more use cases
Paul Rubio created HADOOP-10684:
-----------------------------------

Summary: Extend HA support for more use cases
Key: HADOOP-10684
URL: https://issues.apache.org/jira/browse/HADOOP-10684
Project: Hadoop Common
Issue Type: Improvement
Components: ha
Reporter: Paul Rubio
Priority: Minor

We'd like the current HA framework to be more configurable from a behavior standpoint. In particular:

- Add the ability for an HAServiceTarget to survive a configurable number of health check failures (default of 0) before the HealthMonitor (HM) reports the service as not responding or unhealthy. For instance, base the HM on a state machine whose default implementation can be overridden by a method or constructor argument. The default would behave the same as today.
-- If a target fails a health check but does not exceed the maximum number of consecutive check failures, it'd be desirable if the target and/or controller were alerted.
--- i.e. introduce a SERVICE_DYING state.
-- Additionally, it'd be desirable if a mechanism existed, similar to fencing semantics, for "reviving" a service that transitioned to SERVICE_DYING.
--- i.e. attemptRevive(...)
- Add the ability to allow a service to completely fail (no failover or failback possible). There are scenarios where allowing a failover or failback could cause more damage.
-- E.g. a recovered master with stale data. The master may have been manually recovered (human error).
- Add affinity to a particular HAServiceTarget.
-- In other words, allow the controller to prefer one target over another when deciding leadership.
-- If a higher-affinity but previously unhealthy target becomes healthy, it should be allowed to become the leader.
-- Likewise, if two targets are racing for a ZooKeeper lock, the controller should "prefer" the higher-affinity target.
-- It might make more sense to add a different implementation/subclass of the ZKFailoverController (i.e. ZKAffinityFailoverController) than to modify current behavior.

Please comment with thoughts/ideas/etc...
Thanks.
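The first proposal above (tolerating a configurable number of consecutive health-check failures, with an intermediate SERVICE_DYING state) could be sketched as a small state machine. This is a hypothetical, self-contained illustration; the class and method names are invented for this sketch and are not part of the Hadoop HA framework:

```java
// Hypothetical sketch: survive up to maxConsecutiveFailures failed health
// checks (default 0, matching today's behavior) before reporting the
// service unhealthy; intermediate failures surface as SERVICE_DYING.
public class HealthStateMachine {
    public enum State { SERVICE_HEALTHY, SERVICE_DYING, SERVICE_UNHEALTHY }

    private final int maxConsecutiveFailures;
    private int consecutiveFailures = 0;

    public HealthStateMachine(int maxConsecutiveFailures) {
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    // Record one health-check result and return the resulting state.
    public State recordCheck(boolean passed) {
        if (passed) {
            consecutiveFailures = 0;       // any success resets the budget
            return State.SERVICE_HEALTHY;
        }
        consecutiveFailures++;
        return (consecutiveFailures > maxConsecutiveFailures)
            ? State.SERVICE_UNHEALTHY      // budget exceeded: report failure
            : State.SERVICE_DYING;         // failing, but not yet fatal
    }
}
```

With a threshold of 0 this degenerates to the current behavior (first failure is immediately unhealthy); a controller could hook an `attemptRevive`-style callback on the SERVICE_DYING transitions.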
RE: hadoop-2.5 - June end?
Yes, Suresh.

I have merged HDFS-2006 (Extended Attributes) to branch-2, so that it will be included in the 2.5 release.

Regards,
Uma

-----Original Message-----
From: Suresh Srinivas [mailto:sur...@hortonworks.com]
Sent: Tuesday, June 10, 2014 10:15 PM
To: mapreduce-...@hadoop.apache.org
Cc: common-dev@hadoop.apache.org; hdfs-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: hadoop-2.5 - June end?

We should also include the extended attributes feature for HDFS from HDFS-2006 in release 2.5.

On Mon, Jun 9, 2014 at 9:39 AM, Arun C Murthy wrote:
> Folks,
>
> As you can see from the Roadmap wiki, it looks like several items are
> still a bit away from being ready.
>
> I think rather than wait for them, it will be useful to create an
> intermediate release (2.5) this month - I think ATS security is pretty
> close, so we can ship that. I'm thinking of creating hadoop-2.5 by end
> of the month, with a branch a couple of weeks prior.
>
> Thoughts?
>
> thanks,
> Arun

--
http://hortonworks.com/download/
[jira] [Created] (HADOOP-10683) Users authenticated with KERBEROS are recorded as being authenticated with SIMPLE
Benoy Antony created HADOOP-10683:
-------------------------------------

Summary: Users authenticated with KERBEROS are recorded as being authenticated with SIMPLE
Key: HADOOP-10683
URL: https://issues.apache.org/jira/browse/HADOOP-10683
Project: Hadoop Common
Issue Type: Bug
Components: security
Reporter: Benoy Antony
Assignee: Benoy Antony

We have enabled Kerberos authentication in our clusters, but we see the following in the log files:

2014-06-11 11:07:05,903 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for x...@y.com (*auth:SIMPLE*)
2014-06-11 11:07:05,914 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for x...@y.com (auth:KERBEROS) for protocol=interface

This is quite confusing for administrators.
[jira] [Created] (HADOOP-10682) Metrics are not output in trunk
Akira AJISAKA created HADOOP-10682:
--------------------------------------

Summary: Metrics are not output in trunk
Key: HADOOP-10682
URL: https://issues.apache.org/jira/browse/HADOOP-10682
Project: Hadoop Common
Issue Type: Bug
Components: metrics
Reporter: Akira AJISAKA

Metrics are not output in trunk with the following configuration:

{code}
*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
*.period=10
namenode.sink.file.filename=namenode-metrics.out
{code}

The change below worked well:

{code}
- namenode.sink.file.filename=namenode-metrics.out
+ NameNode.sink.file.filename=namenode-metrics.out
{code}

This means that an old configuration doesn't work on trunk. We should either fix it or document that "NameNode" must be used.
Re: Change proposal for FileInputFormat isSplitable
On Wed, Jun 11, 2014 at 1:35 AM, Niels Basjes wrote:
> That's not what I meant. What I understood from what was described is that
> sometimes people use an existing file extension (like .gz) for a file that
> is not a gzipped file.

Understood, but this change also applies to other loaded codecs, like .lzo, .bz, etc. Adding a new codec changes the default behavior for all InputFormats that don't override this method.

> I consider "silently producing garbage" one of the worst kinds of problem
> to tackle.
> Because many custom file based input formats have stumbled (getting
> "silently produced garbage") over the current isSplitable implementation I
> really want to avoid any more of this in the future.
> That is why I want to change the implementations in this area of Hadoop in
> such a way that this "silently producing garbage" effect is taken out.

Adding validity assumptions to a common base class will affect a lot of users, most of whom are not InputFormat authors.

> So the question remains: What is the way this should be changed?
> I'm willing to build it and submit a patch.

Would a logged warning suffice? This would aid debugging without an incompatible change in behavior. It could also be disabled easily. -C

> > > The safest way would be either 2 or 4. Solution 3 would effectively be the
> > > same as the current implementation, yet it would catch the problem
> > > situations as long as people stick to normal file name conventions.
> > > Solution 3 would also allow removing some code duplication in several
> > > subclasses.
> > >
> > > I would go for solution 3.
> > >
> > > Niels Basjes

--
Best regards / Met vriendelijke groeten,

Niels Basjes
[jira] [Created] (HADOOP-10681) Compressor inner methods are all synchronized - within a tight loop
Gopal V created HADOOP-10681:
--------------------------------

Summary: Compressor inner methods are all synchronized - within a tight loop
Key: HADOOP-10681
URL: https://issues.apache.org/jira/browse/HADOOP-10681
Project: Hadoop Common
Issue Type: Bug
Components: performance
Affects Versions: 2.4.0, 2.2.0, 2.5.0
Reporter: Gopal V
Attachments: compress-cmpxchg-small.png, perf-top-spill-merge.png

The current implementation of SnappyCompressor spends more time in the Java loop copying from the user buffer into the direct buffer allocated to the compressor impl than it takes to compress the buffers.

!perf-top-spill-merge.png!

The bottleneck was found to be the Java monitor code inside SnappyCompressor. The methods are neatly inlined by the JIT into the parent caller (BlockCompressorStream::write), which unfortunately does not flatten out the synchronized blocks.

!compress-cmpxchg-small.png!

The loop writes small byte[] buffers (each IFile key+value). I counted approximately 6 monitor enter/exit blocks per k-v pair written.
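The cost pattern described above can be illustrated with a hypothetical, self-contained sketch (this is not the actual SnappyCompressor code; the class and counts are invented for illustration). Each small per-record write crosses several synchronized methods, so monitor enter/exit is paid multiple times per key-value pair:

```java
// Hypothetical sketch: a compressor whose API-facing methods are all
// synchronized, driven by a tight per-record write loop. The counter
// shows how many monitor entries the loop pays per record.
public class CompressorLoopSketch {
    static class LockCountingCompressor {
        private long monitorEnters = 0;
        // Each method is synchronized, mirroring the shape of the issue.
        synchronized void setInput(byte[] b, int off, int len) { monitorEnters++; }
        synchronized boolean needsInput() { monitorEnters++; return false; }
        synchronized int compress(byte[] out, int off, int len) { monitorEnters++; return len; }
        long enters() { return monitorEnters; }
    }

    // Simulates a per-record write loop (one iteration per k-v pair):
    // every iteration enters three synchronized methods.
    static void writeRecords(LockCountingCompressor c, int records) {
        for (int i = 0; i < records; i++) {
            c.setInput(new byte[16], 0, 16);      // monitor enter 1
            if (!c.needsInput()) {                // monitor enter 2
                c.compress(new byte[32], 0, 32);  // monitor enter 3
            }
        }
    }
}
```

Coarsening the locking (one synchronized block around the whole record, or none if the object is not shared) would cut the monitor traffic proportionally.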
[jira] [Created] (HADOOP-10680) Document metrics included in MXBean
Akira AJISAKA created HADOOP-10680:
--------------------------------------

Summary: Document metrics included in MXBean
Key: HADOOP-10680
URL: https://issues.apache.org/jira/browse/HADOOP-10680
Project: Hadoop Common
Issue Type: Improvement
Components: documentation
Reporter: Akira AJISAKA

Enhancement of HADOOP-6350. For example, the SnapshotInfo metrics in the NameNode and the DataNodeInfo and FsDatasetState metrics in the DataNode are not collected by {{MetricsSystem}} but registered to {{MXBean}} via {{MBeanServer}}, so these metrics can be seen via jmx/jconsole. These metrics should also be documented.
Re: Change proposal for FileInputFormat isSplitable
On Tue, Jun 10, 2014 at 8:10 PM, Chris Douglas wrote:
> On Fri, Jun 6, 2014 at 4:03 PM, Niels Basjes wrote:
> > and if you then give the file the .gz extension this breaks all common
> > sense / conventions about file names.
>
> That the suffix for all compression codecs in every context - and all
> future codecs - should determine whether a file can be split is not an
> assumption we can make safely. Again, that's not an assumption that
> held when people built their current systems, and they would be justly
> annoyed with the project for changing it.

That's not what I meant. What I understood from what was described is that sometimes people use an existing file extension (like .gz) for a file that is not a gzipped file.

Whether a file is splittable or not depends greatly on the actual codec implementation that is used to read it. Using the default GzipCodec, a .gz file is not splittable, but that can be changed with a different implementation, for example https://github.com/nielsbasjes/splittablegzip

So given a file extension, the file 'must' be in the format described by that extension. The flow is roughly as follows:
- What is the file extension?
- Get the codec class registered to that extension.
- Is this a splittable codec? (Does this class implement the splittable codec interface?)

> > I hold "correct data" much higher than performance and scalability; so the
> > performance impact is a concern but it is much less important than the list
> > of bugs we are facing right now.
>
> These are not bugs. NLineInputFormat doesn't support compressed input,
> and why would it? -C

I'm not saying it should (in fact, for this one I agree that it shouldn't). The reality is that it accepts the file, decompresses it, and then produces output that 'looks good' but really is garbage.

I consider "silently producing garbage" one of the worst kinds of problem to tackle. Because many custom file-based input formats have stumbled (getting "silently produced garbage") over the current isSplitable implementation, I really want to avoid any more of this in the future. That is why I want to change the implementations in this area of Hadoop in such a way that this "silently producing garbage" effect is taken out.

So the question remains: what is the way this should be changed? I'm willing to build it and submit a patch.

> > The safest way would be either 2 or 4. Solution 3 would effectively be the
> > same as the current implementation, yet it would catch the problem
> > situations as long as people stick to normal file name conventions.
> > Solution 3 would also allow removing some code duplication in several
> > subclasses.
> >
> > I would go for solution 3.
> >
> > Niels Basjes

--
Best regards / Met vriendelijke groeten,

Niels Basjes