Re: hadoop-2.5 - June end?

2014-06-11 Thread Karthik Kambatla
+1 to getting 2.5 by end of June. I would love to help with the release in
any way.

Some of the YARN features like work-preserving RM/NM restart seem to be going
well; it would be great to get 2.6 out as soon as they are done, maybe end
of July. Again, I am willing to volunteer to do the necessary work.

If this sounds reasonable, I can go ahead and update the Roadmap to reflect
this.



On Wed, Jun 11, 2014 at 12:53 PM, Gangumalla, Uma 
wrote:

> Yes. Suresh.
>
> I have merged HDFS-2006 (Extended Attributes) to branch-2, so that it
> will be included in the 2.5 release.
>
> Regards,
> Uma
>
> -Original Message-
> From: Suresh Srinivas [mailto:sur...@hortonworks.com]
> Sent: Tuesday, June 10, 2014 10:15 PM
> To: mapreduce-...@hadoop.apache.org
> Cc: common-dev@hadoop.apache.org; hdfs-...@hadoop.apache.org;
> yarn-...@hadoop.apache.org
> Subject: Re: hadoop-2.5 - June end?
>
> We should also include extended attributes feature for HDFS from HDFS-2006
> for release 2.5.
>
>
> On Mon, Jun 9, 2014 at 9:39 AM, Arun C Murthy  wrote:
>
> > Folks,
> >
> >  As you can see from the Roadmap wiki, it looks like several items are
> > still a bit away from being ready.
> >
> >  I think rather than wait for them, it will be useful to create an
> > intermediate release (2.5) this month - I think ATS security is pretty
> > close, so we can ship that. I'm thinking of creating hadoop-2.5 by end
> > of the month, with a branch a couple of weeks prior.
> >
> >  Thoughts?
> >
> > thanks,
> > Arun
> >
> >
> > --
> > CONFIDENTIALITY NOTICE
> > NOTICE: This message is intended for the use of the individual or
> > entity to which it is addressed and may contain information that is
> > confidential, privileged and exempt from disclosure under applicable
> > law. If the reader of this message is not the intended recipient, you
> > are hereby notified that any printing, copying, dissemination,
> > distribution, disclosure or forwarding of this communication is
> > strictly prohibited. If you have received this communication in error,
> > please contact the sender immediately and delete it from your system.
> > Thank You.
> >
>
>
>
> --
> http://hortonworks.com/download/
>
>


[jira] [Created] (HADOOP-10685) Move RefreshCallQueue from its own protocol to the GenericRefreshProto

2014-06-11 Thread Chris Li (JIRA)
Chris Li created HADOOP-10685:
-

 Summary: Move RefreshCallQueue from its own protocol to the 
GenericRefreshProto
 Key: HADOOP-10685
 URL: https://issues.apache.org/jira/browse/HADOOP-10685
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Chris Li
Assignee: Chris Li
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10684) Extend HA support for more use cases

2014-06-11 Thread Paul Rubio (JIRA)
Paul Rubio created HADOOP-10684:
---

 Summary: Extend HA support for more use cases
 Key: HADOOP-10684
 URL: https://issues.apache.org/jira/browse/HADOOP-10684
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ha
Reporter: Paul Rubio
Priority: Minor


We'd like the current HA framework to be more configurable from a behavior 
standpoint.  In particular:
- Add the ability for an HAServiceTarget to survive a configurable number of 
health-check failures (default 0) before the HealthMonitor (HM) reports service 
not responding or service unhealthy. For instance, base the HM on a state 
machine whose default implementation can be overridden by a method or constructor 
argument. The default would behave the same as today.
-- If a target fails a health check but does not exceed the maximum number of 
consecutive check failures, it’d be desirable if the target and/or controller 
were alerted.
--- i.e. Introduce a SERVICE_DYING state
-- Additionally, it’d be desirable if a mechanism existed, similar to fencing 
semantics, for “reviving” a service that transitioned to SERVICE_DYING.
--- i.e. attemptRevive(…)
- Add the ability to allow a service to completely fail (no failover or 
failback possible).  There are scenarios where allowing a failover or failback 
could cause more damage.
-- E.g. a recovered master with stale data.  The master may have been manually 
recovered (human error).
- Add affinity to a particular HAServiceTarget.
-- In other words, allow the controller to prefer one target over another when 
deciding leadership.
-- If a higher-affinity but previously unhealthy target becomes healthy, then 
it should be allowed to become the leader.
-- Likewise, if two targets are racing for a ZooKeeper lock, then the 
controller should "prefer" the higher-affinity target.
-- It might make more sense to add a different implementation/subclass of the 
ZKFailoverController (i.e. ZKAffinityFailoverController) than modify current 
behavior.
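The failure-threshold idea in the first bullet could be sketched as a small state machine. This is a hypothetical, standalone illustration, not an existing Hadoop class; the class and state names (other than the proposed SERVICE_DYING) are made up for the sketch:

```java
/**
 * Hypothetical sketch: a health monitor that tolerates a configurable number
 * of consecutive health-check failures, passing through a SERVICE_DYING state
 * (as proposed above) before declaring the service unhealthy.
 * With maxConsecutiveFailures = 0 it behaves like today's HealthMonitor:
 * the first failure is immediately reported as unhealthy.
 */
public class ThresholdHealthMonitor {
    public enum State { SERVICE_HEALTHY, SERVICE_DYING, SERVICE_UNHEALTHY }

    private final int maxConsecutiveFailures; // 0 reproduces current behavior
    private int consecutiveFailures = 0;
    private State state = State.SERVICE_HEALTHY;

    public ThresholdHealthMonitor(int maxConsecutiveFailures) {
        this.maxConsecutiveFailures = maxConsecutiveFailures;
    }

    /** Feed one health-check result into the state machine. */
    public State onHealthCheck(boolean healthy) {
        if (healthy) {
            consecutiveFailures = 0;          // any success fully revives
            state = State.SERVICE_HEALTHY;
        } else {
            consecutiveFailures++;
            state = (consecutiveFailures > maxConsecutiveFailures)
                ? State.SERVICE_UNHEALTHY     // budget exceeded: report it
                : State.SERVICE_DYING;        // failing, but within budget
        }
        return state;
    }

    public State getState() { return state; }
}
```

The SERVICE_DYING transitions are where the proposed alerting and attemptRevive(…) hooks would attach.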

Please comment with thoughts/ideas/etc...
Thanks.





RE: hadoop-2.5 - June end?

2014-06-11 Thread Gangumalla, Uma
Yes. Suresh.

I have merged HDFS-2006 (Extended Attributes) to branch-2, so that it will be 
included in the 2.5 release.

Regards,
Uma

-Original Message-
From: Suresh Srinivas [mailto:sur...@hortonworks.com] 
Sent: Tuesday, June 10, 2014 10:15 PM
To: mapreduce-...@hadoop.apache.org
Cc: common-dev@hadoop.apache.org; hdfs-...@hadoop.apache.org; 
yarn-...@hadoop.apache.org
Subject: Re: hadoop-2.5 - June end?

We should also include extended attributes feature for HDFS from HDFS-2006 for 
release 2.5.


On Mon, Jun 9, 2014 at 9:39 AM, Arun C Murthy  wrote:

> Folks,
>
>  As you can see from the Roadmap wiki, it looks like several items are 
> still a bit away from being ready.
>
>  I think rather than wait for them, it will be useful to create an 
> intermediate release (2.5) this month - I think ATS security is pretty 
> close, so we can ship that. I'm thinking of creating hadoop-2.5 by end 
> of the month, with a branch a couple of weeks prior.
>
>  Thoughts?
>
> thanks,
> Arun
>
>
>



--
http://hortonworks.com/download/



[jira] [Created] (HADOOP-10683) Users authenticated with KERBEROS are recorded as being authenticated with SIMPLE

2014-06-11 Thread Benoy Antony (JIRA)
Benoy Antony created HADOOP-10683:
-

 Summary: Users authenticated with KERBEROS are recorded as being 
authenticated with SIMPLE
 Key: HADOOP-10683
 URL: https://issues.apache.org/jira/browse/HADOOP-10683
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Reporter: Benoy Antony
Assignee: Benoy Antony


We have enabled kerberos authentication in our clusters, but we see the 
following in the log files 

2014-06-11 11:07:05,903 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth 
successful for x...@y.com (*auth:SIMPLE*)
2014-06-11 11:07:05,914 INFO 
SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
 Authorization successful for x...@y.com (auth:KERBEROS) for protocol=interface 

This is quite confusing for administrators.






[jira] [Created] (HADOOP-10682) Metrics are not output in trunk

2014-06-11 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created HADOOP-10682:
--

 Summary: Metrics are not output in trunk
 Key: HADOOP-10682
 URL: https://issues.apache.org/jira/browse/HADOOP-10682
 Project: Hadoop Common
  Issue Type: Bug
  Components: metrics
Reporter: Akira AJISAKA


Metrics are not output in trunk by the following configuration:
{code}
*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
*.period=10
namenode.sink.file.filename=namenode-metrics.out
{code}
The change below worked:
{code}
- namenode.sink.file.filename=namenode-metrics.out
+ NameNode.sink.file.filename=namenode-metrics.out
{code}
This means an old configuration doesn't work on trunk. We should either fix 
this or document that "NameNode" must be used.





Re: Change proposal for FileInputFormat isSplitable

2014-06-11 Thread Chris Douglas
On Wed, Jun 11, 2014 at 1:35 AM, Niels Basjes  wrote:
> That's not what I meant. What I understood from what was described is that
> sometimes people use an existing file extension (like .gz) for a file that
> is not a gzipped file.

Understood, but this change also applies to other loaded codecs, like
.lzo, .bz, etc. Adding a new codec changes the default behavior for
all InputFormats that don't override this method.

> I consider "silently producing garbage" one of the worst kinds of problem
> to tackle.
> Because many custom file-based input formats have stumbled over the current
> isSplitable implementation (getting "silently produced garbage"), I really
> want to avoid any more of this in the future.
> That is why I want to change the implementations in this area of Hadoop so
> that this "silently producing garbage" effect is taken out.

Adding validity assumptions to a common base class will affect a lot
of users, most of whom are not InputFormat authors.

> So the question remains: What is the way this should be changed?
> I'm willing to build it and submit a patch.

Would a logged warning suffice? This would aid debugging without an
incompatible change in behavior. It could also be disabled easily. -C
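A standalone sketch of what such a logged warning could look like. This is an illustrative assumption, not the actual FileInputFormat code: the class, logger wiring, and suffix set are stand-ins, and in Hadoop the check would live in FileInputFormat.isSplitable with suffixes taken from the loaded codecs:

```java
import java.util.Set;
import java.util.logging.Logger;

/**
 * Illustrative sketch of the "logged warning" option: keep the current
 * default behavior (treat the file as splittable unless a subclass
 * overrides isSplitable), but warn when the file name carries a suffix
 * that looks like a registered compression codec.
 */
public class SplitWarningSketch {
    private static final Logger LOG =
        Logger.getLogger(SplitWarningSketch.class.getName());

    // Hypothetical stand-in for the suffixes of the codecs loaded at runtime.
    private static final Set<String> CODEC_SUFFIXES =
        Set.of(".gz", ".bz2", ".lzo", ".snappy");

    /** Returns true (unchanged default) but logs if the name looks compressed. */
    public static boolean isSplitable(String fileName) {
        for (String suffix : CODEC_SUFFIXES) {
            if (fileName.endsWith(suffix)) {
                LOG.warning("Splitting " + fileName + ", which matches codec suffix "
                    + suffix + "; override isSplitable() if this is compressed data.");
                break;
            }
        }
        return true; // behavior is unchanged; only visibility improves
    }
}
```

Because behavior is untouched, existing jobs keep working, and the warning can be silenced through ordinary log configuration.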

>> > The safest way would be either 2 or 4. Solution 3 would effectively be
>> the
>> > same as the current implementation, yet it would catch the problem
>> > situations as long as people stick to normal file name conventions.
>> > Solution 3 would also allow removing some code duplication in several
>> > subclasses.
>> >
>> > I would go for solution 3.
>> >
>> > Niels Basjes
>>
>
>
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes


[jira] [Created] (HADOOP-10681) Compressor inner methods are all synchronized - within a tight loop

2014-06-11 Thread Gopal V (JIRA)
Gopal V created HADOOP-10681:


 Summary: Compressor inner methods are all synchronized - within a 
tight loop
 Key: HADOOP-10681
 URL: https://issues.apache.org/jira/browse/HADOOP-10681
 Project: Hadoop Common
  Issue Type: Bug
  Components: performance
Affects Versions: 2.4.0, 2.2.0, 2.5.0
Reporter: Gopal V
 Attachments: compress-cmpxchg-small.png, perf-top-spill-merge.png

The current implementation of SnappyCompressor spends more time in the Java 
loop copying from the user buffer into the direct buffer allocated to the 
compressor implementation than it spends compressing the buffers.

!perf-top-spill-merge.png!

The bottleneck was found to be the Java monitor code inside SnappyCompressor.

The methods are neatly inlined by the JIT into the parent caller 
(BlockCompressorStream::write), which unfortunately does not flatten out the 
synchronized blocks.

!compress-cmpxchg-small.png!

The loop does a write of small byte[] buffers (each IFile key+value). 

I counted approximately 6 monitor enter/exit blocks per k-v pair written.
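One way to picture the cost pattern, sketched outside Hadoop (this is not the SnappyCompressor code; the class below is a made-up illustration): each synchronized call in the hot loop pays a monitor enter/exit per tiny key+value write, whereas batching small writes in an unsynchronized local buffer acquires the monitor once per batch.

```java
import java.io.ByteArrayOutputStream;

/**
 * Illustrative sketch: many small writes funneled through a synchronized
 * method acquire a monitor per record. Batching them in an unsynchronized
 * local buffer and handing over larger chunks reduces the number of
 * monitor enter/exit pairs without changing the bytes produced.
 */
public class BatchingWriter {
    private final ByteArrayOutputStream sink = new ByteArrayOutputStream();
    private final byte[] local;
    private int used = 0;

    public BatchingWriter(int batchSize) { local = new byte[batchSize]; }

    /** Hot path: no monitor acquired for small writes that fit the batch. */
    public void write(byte[] b, int off, int len) {
        if (used + len > local.length) flush();
        if (len > local.length) { writeLocked(b, off, len); return; }
        System.arraycopy(b, off, local, used, len);
        used += len;
    }

    /** Hand the accumulated batch to the locked path in one call. */
    public void flush() {
        if (used > 0) { writeLocked(local, 0, used); used = 0; }
    }

    /** Stand-in for the synchronized compressor call (e.g. setInput). */
    private synchronized void writeLocked(byte[] b, int off, int len) {
        sink.write(b, off, len);
    }

    public synchronized byte[] toByteArray() { flush(); return sink.toByteArray(); }
}
```

With roughly 6 monitor operations per k-v pair today, even an uncontended lock shows up in the profile; whether the real fix is batching, removing the synchronization, or lock-free buffers is exactly what this JIRA would decide.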





[jira] [Created] (HADOOP-10680) Document metrics included in MXBean

2014-06-11 Thread Akira AJISAKA (JIRA)
Akira AJISAKA created HADOOP-10680:
--

 Summary: Document metrics included in MXBean
 Key: HADOOP-10680
 URL: https://issues.apache.org/jira/browse/HADOOP-10680
 Project: Hadoop Common
  Issue Type: Improvement
  Components: documentation
Reporter: Akira AJISAKA


Enhancement of HADOOP-6350.
For example, SnapshotInfo metrics in NameNode, and DataNodeInfo and FsDatasetState 
metrics in DataNode, are not collected by {{MetricsSystem}} but registered to 
{{MXBean}} via {{MBeanServer}}, so these metrics can only be seen via JMX/jconsole.
These metrics should also be documented.





Re: Change proposal for FileInputFormat isSplitable

2014-06-11 Thread Niels Basjes
On Tue, Jun 10, 2014 at 8:10 PM, Chris Douglas  wrote:

> On Fri, Jun 6, 2014 at 4:03 PM, Niels Basjes  wrote:
> > and if you then give the file the .gz extension this breaks all common
> > sense / conventions about file names.
>


> That the suffix for all compression codecs in every context- and all
> future codecs- should determine whether a file can be split is not an
> assumption we can make safely. Again, that's not an assumption that
> held when people built their current systems, and they would be justly
> annoyed with the project for changing it.


That's not what I meant. What I understood from what was described is that
sometimes people use an existing file extension (like .gz) for a file that
is not a gzipped file.
Whether a file is splittable depends greatly on the actual codec
implementation used to read it. Using the default GzipCodec, a .gz file is
not splittable, but that can be changed with a different implementation,
for example this one:
https://github.com/nielsbasjes/splittablegzip
So, given a file extension, the file 'must' actually be in the format
described by that extension.

The flow is roughly as follows:
- What is the file extension?
- Get the codec class registered to that extension.
- Is this a splittable codec? (Does this class implement the splittable
codec interface?)
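The extension-to-codec lookup above could be sketched like this. It is a toy, self-contained illustration: the map stands in for the codec registry, and in Hadoop the lookup would go through CompressionCodecFactory with a splittable-codec marker interface:

```java
import java.util.Map;

/**
 * Standalone sketch of the proposed check: map a file extension to the
 * registered codec and ask whether that codec supports splitting.
 * The map is a stand-in for the real codec registry.
 */
public class SplittabilityCheck {
    // extension -> whether the registered codec supports splitting
    private static final Map<String, Boolean> CODEC_SPLITTABLE = Map.of(
        ".gz", false,   // default GzipCodec: not splittable
        ".bz2", true    // a codec implementing the splittable interface
    );

    public static boolean isSplittable(String fileName) {
        for (Map.Entry<String, Boolean> e : CODEC_SPLITTABLE.entrySet()) {
            if (fileName.endsWith(e.getKey())) {
                return e.getValue(); // a codec is registered: ask it
            }
        }
        return true; // no codec registered for this extension: plain file
    }
}
```

The point of the proposal is that this decision follows from the codec registered for the extension, instead of every InputFormat hard-coding its own assumption.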

> > I hold "correct data" much higher than performance and scalability; so the
> > performance impact is a concern but it is much less important than the
> > list of bugs we are facing right now.
>
> These are not bugs. NLineInputFormat doesn't support compressed input,
> and why would it? -C
>

I'm not saying it should (in fact, for this one I agree that it shouldn't).
The reality is that it accepts the file, decompresses it and then produces
output that 'looks good' but really is garbage.

I consider "silently producing garbage" one of the worst kinds of problem
to tackle.
Because many custom file-based input formats have stumbled over the current
isSplitable implementation (getting "silently produced garbage"), I really
want to avoid any more of this in the future.
That is why I want to change the implementations in this area of Hadoop so
that this "silently producing garbage" effect is taken out.

So the question remains: What is the way this should be changed?
I'm willing to build it and submit a patch.




> > The safest way would be either 2 or 4. Solution 3 would effectively be
> the
> > same as the current implementation, yet it would catch the problem
> > situations as long as people stick to normal file name conventions.
> > Solution 3 would also allow removing some code duplication in several
> > subclasses.
> >
> > I would go for solution 3.
> >
> > Niels Basjes
>



-- 
Best regards / Met vriendelijke groeten,

Niels Basjes