Re: Bugfix release 2.0.4.1

2013-05-16 Thread Roman Shaposhnik
On Wed, May 15, 2013 at 11:09 PM, Vinod Kumar Vavilapalli wrote:
>
> It's a little dirty, but Mark and Jarek (presumably from BigTop) ran a 
> patched version with my change at MAPREDUCE-5240.

I've just updated the patch to apply cleanly on branch-2.0.4 (had to refactor
quite a few unit tests). Hopefully Cos can apply it tomorrow and we
can start testing Bigtop.

> Otherwise we may end up creating lots of bug-fix releases.

Agreed. The point of 2.0.4.1 is to unblock the Bigtop 0.6.0 release. We're
trying to come up with a stable baseline for Hadoop 2.0.x on which
all the downstream components will run smoothly.

At this point MAPREDUCE-5240 is the only thing holding us
up, but once we start our own round of testing other issues
could pop up. We shall see how it goes.

Thanks,
Roman.


Re: [VOTE] - Release 2.0.5-beta

2013-05-16 Thread Suresh Srinivas
> What I am seeing time and again in these endless discussion threads is this:
>
>   a) "downstream or bigtop: we are seeing a bunch of integration issues with
>      every new feature introduced/something even a commit made"
>   b) "feature developers: no-no, these features are developed for a long time,
>      tests are run, no need to be concerned"
>

Can you please let me know the list of issues you had in Bigtop due to HDFS
features? I would like to understand, first, whether there are such issues and
whether so much Bigtop discussion, in the context of HDFS features, is
warranted.

Regards,
Suresh


-- 
http://hortonworks.com/download/


Re: [VOTE] - Release 2.0.5-beta

2013-05-16 Thread Robert Evans
-0 (Binding)

I have made my opinion known in the previous thread/vote, but I have spent
enough time discussing this and need to get back to my day job. If the
community is able to get snapshots and everything else in this list merged
and stable without breaking the stack above it in two weeks it will be
wonderful, but I have serious doubts that it is going to actually be
possible.

--Bobby

On 5/15/13 12:57 PM, "Arun C Murthy"  wrote:

>Folks,
>
>A considerable number of people have expressed confusion regarding the
>recent vote on 2.0.5, beta status etc. given lack of specifics, the
>voting itself (validity of the vote itself, whose votes are binding) etc.
>
>IMHO technical arguments (incompatibility b/w 2.0 & 2.1, current
>stability of 3 features under debate etc.) have been lost in the
>discussion in favor of non-technical (almost dramatic) nuances such as
>"seizing the moment". There is now dangerous talk of tolerating
>incompatibility b/w 2.0 and 2.1 - this is a red flag for me;
>particularly when there are just 3 features being debated and active
>committers and contributors are confident of and ready to stand by their
>work. All patches, I believe, are ready to be merged in the next few
>days per discussions on jira. This will, clearly, not delay the other API
>work which everyone agrees is crucial. As a result, I feel no recourse
>but to restart a new vote - all attempts at calm, reasoned, civil
>discussion based on technical arguments have come to naught - I apologize
>for the thrash caused to everyone's attention.
>
>To get past all of this confusion, I'd like to present an alternate,
>specific proposal for consideration.
>
>I propose we continue the original plan and make a 2.0.5-beta release by
>May end with the following content:
># HDFS-347
># HDFS Snapshots
># Windows support
># Necessary & final API/protocol changes such as:
> * Final YARN API changes: YARN-386
> * MR Binary Compatibility: MAPREDUCE-5108
> * Final RPC cleanup: HADOOP-8990
>
>People working on the above features have all expressed considerable
>comfort with them and are ready to stand by to help expedite any
>necessary bug-fixes etc. to get to stabilization quickly. I'm confident
>we can get this release out by end of May. This sets the stage for a
>hadoop-2.x GA release right after with some more testing - this means I
>think I can quickly turn around and make bug-fix releases as necessary
>right after 2.0.5-beta.
>
>I request that people consider helping out with this plan and sign up to
>help push hadoop-2.x to stability as outlined above. I believe this will
>help achieve our shared goals of quickly stabilizing hadoop-2 and help
>ensure we can support it for the foreseeable future in a compatible manner for
>the benefit of our users and downstream projects.
>
>Please vote, the vote will run the normal 7 days. Obviously, I'm +1.
>
>thanks,
>Arun
>
>PS: To keep this discussion grounded in technical details I've moved this
>to dev@ (bcc general@).
>



Re: [VOTE] - Release 2.0.5-beta

2013-05-16 Thread Steve Loughran
On 15 May 2013 23:19, Konstantin Boudnik  wrote:

> Guys,
>
> I guess what you're missing is that Bigtop isn't a testing framework for
> Hadoop. It is a stack framework that verifies that components are dealing
> with each other nicely.



Which, to me, means "some form of integration test".


> Every single stack is different: Bigtop 0.5.0 differs from
> 0.6.0, and so on. Bigtop - as any other ASF project - has its releases that
> might or might not be aligned with a particular version of Hadoop. Hence, a
> reference ("etalon") stack needs to be defined first and foremost.
>
> Before we even start talking about running it nightly (another question is
> on what hardware, let's not get there for now) let's understand who can
> help with triaging test failures: downstreams, Hadoop or Bigtop?
>



>
> Judging by a number of other emails there's a number of people on this list
> who care plenty about integration issues. Any volunteers to help with
> integration testing in the open?
>
>

As I said at the HUG, I want to get in the tests that are not specific to the
Swift FS and do things like run Pig jobs against any FS, though I also need a
home for some very Swift-specific partitioned file tests.

> Is this a previously solved problem?
>
> Yes. The problem is solved by separating actively developed (aka unstable)
> release from more mature and less volatile ones.


Not in filesystems. If you look at how long it took ext4 to be implemented and
then adopted, you can see that nobody put data they cared about on it until
they were happy that what you put in with write() came back on a read() [and
that stat() returned the amount of data, that [seek(X);read()] returned the
byte at offset X, and the other little details that those of us writing tests
for the filesystem APIs care about].
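
For illustration, the checks listed above (write/read round trip, stat() size, seek-then-read) could be sketched roughly as follows. This is a minimal Python sketch against the local filesystem, not against any Hadoop FileSystem implementation; the helper name check_basic_fs_contract, the file name, and the default payload and offset are all assumptions made for the example.

```python
import os
import tempfile


def check_basic_fs_contract(directory, payload=b"0123456789abcdef", offset=7):
    """Write a payload, then verify read(), stat() and seek()+read() agree with it.

    Hypothetical helper for illustration only; it exercises the local
    filesystem through Python's built-in file calls.
    """
    path = os.path.join(directory, "contract-check.bin")

    # What you put in with write() ...
    with open(path, "wb") as f:
        f.write(payload)

    # ... must come back on a read().
    with open(path, "rb") as f:
        read_back = f.read()

    # stat() must report the amount of data written.
    size = os.stat(path).st_size

    # seek(X); read() must return the byte at offset X.
    with open(path, "rb") as f:
        f.seek(offset)
        byte_at_offset = f.read(1)

    os.remove(path)
    return {
        "read_matches_write": read_back == payload,
        "stat_size_matches": size == len(payload),
        "seek_read_matches": byte_at_offset == payload[offset:offset + 1],
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(check_basic_fs_contract(tmp))
```

A real contract suite would run the same assertions against each filesystem under test rather than only the local one.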


Re: [VOTE] - Release 2.0.5-beta

2013-05-16 Thread Arun C Murthy
Cos,

On May 15, 2013, at 11:38 PM, Konstantin Boudnik wrote:

> What I am seeing time and again in these endless discussion threads is this:
>
>  a) "downstream or bigtop: we are seeing a bunch of integration issues with
>     every new feature introduced/something even a commit made"
>  b) "feature developers: no-no, these features are developed for a long time,
>     tests are run, no need to be concerned"

It's unfortunate you are continuing to take digs at people who actually are 
moving the project forward.

The 'cold facts' you describe do not lend any credence to your conclusions.

Let's review the bugs Bigtop has found over the course of this year; Vinod 
pointed them out:

> I quickly checked other bugs you reported in 2.0.x:
> - MAPREDUCE-5088 was caused by the fix for HADOOP-9299 which was again a long 
> standing issue in 2.0.x
> - MAPREDUCE-3728 is similar
> - MAPREDUCE-5117 is similar
> - MAPREDUCE-4219 was a security related feature request from you.
> - MAPREDUCE-3916 was because of new proxy-server added.

And now, MAPREDUCE-5240 - again, a long standing bug.

Given the above, please help me understand how 'feature developers' are hurting?

I've repeatedly asked you or Roman to run "CI" on branch-2. Instead of stepping 
up to help on a concrete proposal, you continue to take digs at a number of 
contributors here. Hopefully this will stop.

Arun



Re: [VOTE] - Release 2.0.5-beta

2013-05-16 Thread Nathan Roberts
(Initially responded on general@, sorry about that. Copied here.)

+1 (non-binding)

From my perspective:

* The key feature that will drive me to adopt 2.x is Rolling Upgrades
* In order to get to rolling upgrades, we need a compatibility story that
is significantly better than we have today
** We need a comprehensive definition of what compatibility really means
** We need better testing in place to verify we're not breaking
compatibility
** We need better definition and testing of what rolling upgrades really
means. Rolling between bug-fix releases - Required, Rolling between minor
releases - Required, Rolling between major releases - Desired.
** We need work-preserving restart on the YARN side. Restarting jobs
isn't sufficient.
** ...
* Given that Rolling upgrades aren't there yet, and there is still work to
be done to solidify the compatibility story, I'm ok with the feature
window remaining open until these are in place, especially given the fact
that the proposed features are likely to have non-zero impact on
compatibility/rolling_upgrades.
* I'd certainly like a release with rolling upgrades as soon as possible,
so I feel like the feature window needs to ramp down very quickly.
Something like 2.0.5-beta in May with the current list of proposed
features, then 2.0.6-beta in late summer with full rolling upgrade support
and a solid compatibility story, would seem like a reasonable timeline.
Once we have a beta release with rolling upgrades, I can look at pushing
2.x to some of our larger clusters.
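
As a rough illustration of the Required/Desired split in the list above, here is a minimal Python sketch. It assumes version strings of the form major.minor.patch with an optional suffix such as "-beta"; the function names and the lookup table are hypothetical and are not part of any Hadoop tooling.

```python
# Expectation for rolling upgrades by kind of release step, as listed above.
EXPECTATION = {"bug-fix": "Required", "minor": "Required", "major": "Desired"}


def parse_major_minor(version):
    """Return (major, minor) from strings like '2.0.5-beta' or '2.0.4.1'."""
    numeric = version.split("-", 1)[0]          # drop a suffix such as '-beta'
    parts = [int(part) for part in numeric.split(".")]
    return parts[0], parts[1]


def rolling_upgrade_expectation(old, new):
    """Classify the step from `old` to `new` and look up its expectation."""
    old_major, old_minor = parse_major_minor(old)
    new_major, new_minor = parse_major_minor(new)
    if old_major != new_major:
        step = "major"
    elif old_minor != new_minor:
        step = "minor"
    else:
        step = "bug-fix"
    return step, EXPECTATION[step]


if __name__ == "__main__":
    for old, new in [("2.0.4-alpha", "2.0.5-beta"),
                     ("2.0.5-beta", "2.1.0"),
                     ("1.2.1", "2.0.5-beta")]:
        print(old, "->", new, rolling_upgrade_expectation(old, new))
```

A table like this only records the policy; verifying it would still need the compatibility testing called for above.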

Nathan Roberts
nrobe...@yahoo-inc.com



On 5/15/13 1:06 PM, "Vinod Kumar Vavilapalli" 
wrote:

>
>Seems like you forgot to bcc. Forwarding this to general.
>
>Thanks,
>+Vinod
>On May 15, 2013, at 10:57 AM, Arun C Murthy wrote:
>
>> Folks,
>> 
>> A considerable number of people have expressed confusion regarding the
>>recent vote on 2.0.5, beta status etc. given lack of specifics, the
>>voting itself (validity of the vote itself, whose votes are binding) etc.
>> 
>> IMHO technical arguments (incompatibility b/w 2.0 & 2.1, current
>>stability of 3 features under debate etc.) have been lost in the
>>discussion in favor of non-technical (almost dramatic) nuances such as
>>"seizing the moment". There is now dangerous talk of tolerating
>>incompatibility b/w 2.0 and 2.1 - this is a red flag for me;
>>particularly when there are just 3 features being debated and active
>>committers and contributors are confident of and ready to stand by their
>>work. All patches, I believe, are ready to be merged in the next few
>>days per discussions on jira. This will, clearly, not delay the other
>>API work which everyone agrees is crucial. As a result, I feel no
>>recourse but to restart a new vote - all attempts at calm, reasoned,
>>civil discussion based on technical arguments have come to naught - I
>>apologize for the thrash caused to everyone's attention.
>> 
>> To get past all of this confusion, I'd like to present an alternate,
>>specific proposal for consideration.
>> 
>> I propose we continue the original plan and make a 2.0.5-beta release
>>by May end with the following content:
>> # HDFS-347
>> # HDFS Snapshots
>> # Windows support
>> # Necessary & final API/protocol changes such as:
>> * Final YARN API changes: YARN-386
>> * MR Binary Compatibility: MAPREDUCE-5108
>> * Final RPC cleanup: HADOOP-8990
>> 
>> People working on the above features have all expressed considerable
>>comfort with them and are ready to stand by to help expedite any
>>necessary bug-fixes etc. to get to stabilization quickly. I'm confident
>>we can get this release out by end of May. This sets the stage for a
>>hadoop-2.x GA release right after with some more testing - this means I
>>think I can quickly turn around and make bug-fix releases as necessary
>>right after 2.0.5-beta.
>> 
>> I request that people consider helping out with this plan and sign up
>>to help push hadoop-2.x to stability as outlined above. I believe this
>>will help achieve our shared goals of quickly stabilizing hadoop-2 and
>>help ensure we can support it for the foreseeable future in a compatible
>>manner for the benefit of our users and downstream projects.
>> 
>> Please vote, the vote will run the normal 7 days. Obviously, I'm +1.
>> 
>> thanks,
>> Arun
>> 
>> PS: To keep this discussion grounded in technical details I've moved
>>this to dev@ (bcc general@).
>> 
>



[jira] [Created] (HADOOP-9567) Provide auto-renewal for keytab based logins

2013-05-16 Thread Harsh J (JIRA)
Harsh J created HADOOP-9567:
---

         Summary: Provide auto-renewal for keytab based logins
             Key: HADOOP-9567
             URL: https://issues.apache.org/jira/browse/HADOOP-9567
         Project: Hadoop Common
      Issue Type: Improvement
      Components: security
Affects Versions: 2.0.0-alpha
        Reporter: Harsh J
        Priority: Minor


We renew cached tickets (obtained via kinit before using a Hadoop 
application), but we seem to explicitly avoid renewing keytab based 
logins (done from within the client code), even though we could do that as 
well via a similar thread.
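
To make the idea of "a similar thread" concrete, here is a minimal Python sketch of a background renewer. It is not Hadoop's implementation: the class KeytabRenewer, the callables get_ticket_times and relogin, and the 0.8 lifetime fraction are all assumptions made for the example.

```python
import threading
import time


def needs_relogin(issued_at, expires_at, now, window=0.8):
    """Return True once `window` of the ticket lifetime has elapsed."""
    lifetime = expires_at - issued_at
    return now >= issued_at + window * lifetime


class KeytabRenewer(threading.Thread):
    """Background thread that re-logs in from a keytab before the ticket expires.

    `get_ticket_times` (returns (issued_at, expires_at)) and `relogin` are
    hypothetical callables supplied by the caller; they stand in for whatever
    the real login code would do.
    """

    def __init__(self, get_ticket_times, relogin, poll_seconds=60.0,
                 clock=time.time):
        super().__init__(daemon=True)
        self._get_ticket_times = get_ticket_times
        self._relogin = relogin
        self._poll_seconds = poll_seconds
        self._clock = clock
        self._stop_event = threading.Event()

    def run(self):
        # wait() returns False on timeout and True once stop() was called.
        while not self._stop_event.wait(self._poll_seconds):
            issued_at, expires_at = self._get_ticket_times()
            if needs_relogin(issued_at, expires_at, self._clock()):
                self._relogin()

    def stop(self):
        self._stop_event.set()
```

The clock is injectable so the renewal decision can be tested without waiting for a real ticket to age.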

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-9568) Improve JobClient messaging by adding job name to the output

2013-05-16 Thread Scott Bressler (JIRA)
Scott Bressler created HADOOP-9568:
--

         Summary: Improve JobClient messaging by adding job name to the output
             Key: HADOOP-9568
             URL: https://issues.apache.org/jira/browse/HADOOP-9568
         Project: Hadoop Common
      Issue Type: Improvement
      Components: conf
     Environment: Hadoop 0.20.203, Linux RHEL5 64-bit
        Reporter: Scott Bressler


Currently, the JobClient outputs to the terminal something like the following 
when starting a job:
13/05/17 01:56:52 INFO mapred.JobClient: Running job: job_201305161755_0020

I would like to change this to add the job name, making it something like:
13/05/17 01:56:52 INFO mapred.JobClient: Running job: [JOB-NAME] (job_201305161755_0020)

Of course the job ID could be kept as the primary naming scheme if that's 
preferred, but I think showcasing the name is best.
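
The proposed message shape can be sketched as below. This is a minimal Python illustration of the formatting only, not the JobClient code itself; the function name format_running_job and the sample job name "wordcount" are hypothetical.

```python
def format_running_job(job_id, job_name=None):
    """Build the 'Running job' message, leading with the job name when known."""
    if job_name:
        return "Running job: {} ({})".format(job_name, job_id)
    # Fall back to the current ID-only form when no name is set.
    return "Running job: {}".format(job_id)


if __name__ == "__main__":
    print(format_running_job("job_201305161755_0020", "wordcount"))
    print(format_running_job("job_201305161755_0020"))
```

Falling back to the ID-only form keeps the existing output unchanged for jobs without a name.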

Will add a patch by the weekend if no one else does first.
