[jira] [Created] (HADOOP-10490) TestMapFile and TestBloomMapFile leak file descriptors.

2014-04-10 Thread Chris Nauroth (JIRA)
Chris Nauroth created HADOOP-10490:
--

 Summary: TestMapFile and TestBloomMapFile leak file descriptors.
 Key: HADOOP-10490
 URL: https://issues.apache.org/jira/browse/HADOOP-10490
 Project: Hadoop Common
  Issue Type: Bug
  Components: test
Affects Versions: 2.4.0, 3.0.0
Reporter: Chris Nauroth
Assignee: Chris Nauroth
Priority: Minor


Multiple tests in {{TestMapFile}} and {{TestBloomMapFile}} open files but don't 
close them.  On Windows, the leaked file descriptors cause subsequent tests to 
fail, because file locks are still held while trying to delete the test data 
directory.
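
A minimal sketch of the fix pattern (the test shape and names here are 
assumptions for illustration, not the actual patch): open the reader, use it, 
and guarantee close() in a finally block so the descriptor, and the file lock 
on Windows, is released even when an assertion fails.

{code}
// Hypothetical test helper illustrating the close-on-all-paths pattern.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class MapFileCloseSketch {
  static void readOneEntry(FileSystem fs, String dirName, Configuration conf)
      throws Exception {
    MapFile.Reader reader = null;
    try {
      reader = new MapFile.Reader(fs, dirName, conf);
      Text value = new Text();
      reader.get(new IntWritable(1), value);  // any read; may throw
    } finally {
      // Close quietly on all paths, releasing the descriptor (and the lock
      // on Windows) so the test data directory can be deleted afterwards.
      IOUtils.cleanup(null, reader);
    }
  }
}
{code}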



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10489) UserGroupInformation#getTokens and UserGroupInformation#addToken can lead to ConcurrentModificationException

2014-04-10 Thread Jing Zhao (JIRA)
Jing Zhao created HADOOP-10489:
--

 Summary: UserGroupInformation#getTokens and 
UserGroupInformation#addToken can lead to ConcurrentModificationException
 Key: HADOOP-10489
 URL: https://issues.apache.org/jira/browse/HADOOP-10489
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Jing Zhao


Currently UserGroupInformation#getTokens and UserGroupInformation#addToken use 
UGI's monitor to protect the iteration and modification of 
Credentials#tokenMap. Per the 
[discussion|https://issues.apache.org/jira/browse/HADOOP-10475?focusedCommentId=13965851&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13965851]
 in HADOOP-10475, this can still lead to a ConcurrentModificationException.
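
For illustration, a self-contained sketch (plain Java with simplified, assumed 
shapes, not the real UGI/Credentials code) of why per-method synchronization 
alone is insufficient: if getTokens hands back a live view of the map, 
iteration happens after the monitor is released, so any modification fails the 
iterator. The demo below triggers it deterministically from a single thread.

{code}
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Simplified stand-in for UGI/Credentials (assumed shapes, not real source).
class TokenHolder {
  private final Map<String, String> tokenMap = new HashMap<String, String>();

  synchronized Collection<String> getTokens() {
    // Live view: the monitor is released when this returns, but the caller
    // keeps iterating the backing map afterwards.
    return tokenMap.values();
  }

  synchronized void addToken(String alias, String token) {
    tokenMap.put(alias, token);  // structural modification of the same map
  }
}

public class CmeSketch {
  public static void main(String[] args) {
    TokenHolder ugi = new TokenHolder();
    ugi.addToken("a", "t1");
    ugi.addToken("b", "t2");

    Iterator<String> it = ugi.getTokens().iterator();
    it.next();                // fine
    ugi.addToken("c", "t3");  // stands in for a concurrent caller
    it.next();                // throws ConcurrentModificationException
  }
}
{code}

One common remedy (not necessarily the fix chosen here) is to build and return 
a defensive copy of the tokens while the lock is held, so iteration never 
touches the live map.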



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: [VOTE] Release Apache Hadoop 2.4.0

2014-04-10 Thread Chen He
+1 (non-binding)
Downloaded the source code.
Compiled successfully.
Ran wordcount and loadgen without problems.


On Tue, Apr 8, 2014 at 11:11 PM, Tsuyoshi OZAWA wrote:

> Hi Arun,
>
> I apologize for the late response.
> If I understand the problems correctly, +1 for the release (non-binding).
>
> * Ran examples on a pseudo-distributed cluster.
> * Ran tests.
> * Built from source.
>
> Let's fix the problems in the target version (2.4.1).
>
> Thanks,
> - Tsuyoshi
>
>
> On Wed, Apr 9, 2014 at 4:45 AM, sanjay Radia 
> wrote:
> >
> >
> > +1 binding
> > Verified binaries, ran from the binary on a single-node cluster. Tested
> > some HDFS CLIs and wordcount.
> >
> > sanjay
> > On Apr 7, 2014, at 9:52 AM, Suresh Srinivas 
> wrote:
> >
> >> +1 (binding)
> >>
> >> Verified the signatures and hashes for both src and binary tars. Built
> from
> >> the source, the binary distribution and the documentation. Started a
> >> single-node cluster and tested the following:
> >> # Started the HDFS cluster, verified hdfs CLI commands such as ls, copied
> >> data back and forth, verified the namenode web UI, etc.
> >> # Ran some tests such as sleep job, TestDFSIO, NNBench etc.
> >>
> >> I agree with Arun's analysis. At this time, the bar for blockers
> should be
> >> quite high. We can do a dot release if people want some more bug fixes.
> >>
> >>
> >> On Mon, Mar 31, 2014 at 2:22 AM, Arun C Murthy 
> wrote:
> >>
> >>> Folks,
> >>>
> >>> I've created a release candidate (rc0) for hadoop-2.4.0 that I would
> like
> >>> to get released.
> >>>
> >>> The RC is available at:
> >>> http://people.apache.org/~acmurthy/hadoop-2.4.0-rc0
> >>> The RC tag in svn is here:
> >>> https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0-rc0
> >>>
> >>> The maven artifacts are available via repository.apache.org.
> >>>
> >>> Please try the release and vote; the vote will run for the usual 7
> days.
> >>>
> >>> thanks,
> >>> Arun
> >>>
> >>> --
> >>> Arun C. Murthy
> >>> Hortonworks Inc.
> >>> http://hortonworks.com/
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >> --
> >> http://hortonworks.com/download/
> >>
> >
> >
>
>
>
> --
> - Tsuyoshi
>


Re: Plans of moving towards JDK7 in trunk

2014-04-10 Thread Alejandro Abdelnur
A bit of a different angle.

As the bottom of the stack, Hadoop has to be conservative in adopting
things, but that should not preclude consumers of Hadoop (downstream projects
and Hadoop application developers) from having additional requirements, such
as a higher JDK API than JDK6.

Hadoop 2.x should stick to the JDK6 API.
Hadoop 2.x should be tested with multiple runtimes: JDK6, JDK7, and
eventually JDK8.
Downstream projects and Hadoop application developers are free to require
any JDK6+ version for development and runtime.

Hadoop 3.x should allow using the JDK7 API, bumping the minimum runtime
requirement to JDK7, and be tested with JDK7 and JDK8 runtimes.
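
To make the API-vs-runtime distinction concrete, here is the kind of JDK7-only
code the 3.x line would permit (a sketch, not from the Hadoop codebase):
compiled with -source/-target 1.7 it produces class files that a JDK6 runtime
refuses to load.

  import java.io.BufferedReader;
  import java.io.IOException;
  import java.nio.charset.StandardCharsets;
  import java.nio.file.Files;
  import java.nio.file.Paths;
  import java.util.ArrayList;
  import java.util.List;

  public class Jdk7OnlySketch {
    static List<String> readLines(String path) throws IOException {
      List<String> lines = new ArrayList<>();          // diamond operator (JDK7)
      // try-with-resources (JDK7): the reader is closed automatically.
      try (BufferedReader r = Files.newBufferedReader(
          Paths.get(path), StandardCharsets.UTF_8)) {  // java.nio.file (JDK7)
        String line;
        while ((line = r.readLine()) != null) {
          lines.add(line);
        }
      }
      return lines;
    }
  }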

Thanks.



On Thu, Apr 10, 2014 at 10:04 AM, Eli Collins  wrote:

> > On Thu, Apr 10, 2014 at 1:11 AM, Steve Loughran wrote:
>
> > On 9 April 2014 23:52, Eli Collins  wrote:
> >
> > >
> > >
> > > For the sake of this discussion we should separate the runtime from
> > > the programming APIs. Users are already migrating to the java7 runtime
> > > for most of the reasons listed below (support, performance, bugs,
> > > etc), and the various distributions cert their Hadoop 2 based
> > > distributions on java7.  This gives users many of the benefits of
> > > java7, without forcing users off java6. Ie Hadoop does not need to
> > > switch to the java7 programming APIs to make sure everyone has a
> > > supported runtime.
> > >
> > >
> > +1: you can use Java 7 today; I'm not sure how tested Java 8 is
> >
> >
> > > The question here is really about when Hadoop, and the Hadoop
> > > ecosystem (since adjacent projects often end up in the same classpath)
> > > start using the java7 programming APIs and therefore break
> > > compatibility with java6 runtimes. I think our java6 runtime users
> > > would consider dropping support for their java runtime in an update of
> > > a major release to be an incompatible change (the binaries stop
> > > working on their current jvm).
> >
> >
> > do you mean major 2.x -> 3.y or minor 2.x -> 2.(x+1)  here?
> >
>
> I mean 2.x --> 2.(x+1).  Ie I'm running the 2.4 stable and upgrading to
> 2.5.
>
>
> >
> >
> > > That may be worth it if we can
> > > articulate sufficient value to offset the cost (they have to upgrade
> > > their environment, might make rolling upgrades stop working, etc), but
> > > I've not yet heard an argument that articulates the value relative to
> > > the cost.  Eg upgrading to the java7 APIs allows us to pull in
> > > dependencies with new major versions, but only if those dependencies
> > > don't break compatibility (which is likely given that our classpaths
> > > aren't so isolated), and, realistically, only if the entire Hadoop
> > > stack moves to java7 as well
> >
> >
> >
> >
> > > (eg we have to recompile HBase to
> > > generate v1.7 binaries even if they stick on API v1.6). I'm not aware
> > > of a feature, bug etc that really motivates this.
> > >
> > > I don't see that being needed unless we move up to new java7+ only
> > libraries and HBase needs to track this.
> >
> >  The big "recompile to work" issue is google guava, which is troublesome
> > enough I'd be tempted to say "can we drop it entirely"
> >
> >
> >
> > > An alternate approach is to keep the current stable release series
> > > (v2.x) as is, and start using new APIs in trunk (for v3). This will be
> > > a major upgrade for Hadoop and therefore an incompatible change like
> > > this is to be expected (it would be great if this came with additional
> > > changes to better isolate classpaths and dependencies from each
> > > other). It allows us to continue to support multiple types of users
> > > with different branches, vs forcing all users onto a new version. It
> > > of course means that 2.x users will not get the benefits of the new
> > > API, but it's unclear what those benefits are given they can already
> > > get the benefits of adopting the newer java runtimes today.
> > >
> > >
> > >
> > I'm (personally) +1 to this, I also think we should plan to do the switch
> > some time this year to not only get the benefits, but discover the costs
> >
>
>
> Agree
>
>
>
> >
>



-- 
Alejandro


Re: Plans of moving towards JDK7 in trunk

2014-04-10 Thread Eli Collins
On Thu, Apr 10, 2014 at 6:49 AM, Raymie Stata  wrote:

> I think the problem to be solved here is to define a point in time
> when the average Hadoop contributor can start using Java7 dependencies
> in their code.
>
> The "use Java7 dependencies in trunk(/branch3)" plan, by itself, does
> not solve this problem.  The average Hadoop contributor wants to see
> their contributions make it into a stable release in a predictable
> amount of time.  Putting code with a Java7 dependency into trunk means
> the exact opposite: there is no timeline to a stable release.  So most
> contributors will stay away from Java7 dependencies, despite the
> nominal policy that they're allowed in trunk.  (And the few that do
> use Java7 dependencies are people who do not value releasing code into
> stable releases, which arguably could lead to a situation that the
> Java7-dependent code in trunk is, on average, on the buggy side.)
>
> I'm not saying the "branch2-in-the-future" plan is the only way to
> solve the problem of putting Java7 dependencies on a known time-table,
> but at least it solves it.  Is there another solution?
>

All good reasons why we should start thinking about a plan for v3. The
points above pertain to any features for trunk that break compatibility,
not just ones that use new Java APIs.  We shouldn't permit incompatible
changes to merge to v2 just because we don't yet have a timeline for v3;
we should figure out the latter instead. This also motivates finishing the
work to isolate dependencies between Hadoop code, other framework code, and
user code.

Let's speak less abstractly: are there particular features or new
dependencies that you would like to contribute (or see contributed) that
require using the Java 1.7 APIs?  Breaking compatibility in v2 and rolling a
v3 release are both non-trivial, and not something I suspect we'd want to do
just because it would be, for example, nicer to have a newer version of Jetty.

Thanks,
Eli






>
> On Thu, Apr 10, 2014 at 1:11 AM, Steve Loughran 
> wrote:
> > On 9 April 2014 23:52, Eli Collins  wrote:
> >
> >>
> >>
> >> For the sake of this discussion we should separate the runtime from
> >> the programming APIs. Users are already migrating to the java7 runtime
> >> for most of the reasons listed below (support, performance, bugs,
> >> etc), and the various distributions cert their Hadoop 2 based
> >> distributions on java7.  This gives users many of the benefits of
> >> java7, without forcing users off java6. Ie Hadoop does not need to
> >> switch to the java7 programming APIs to make sure everyone has a
> >> supported runtime.
> >>
> >>
> > +1: you can use Java 7 today; I'm not sure how tested Java 8 is
> >
> >
> >> The question here is really about when Hadoop, and the Hadoop
> >> ecosystem (since adjacent projects often end up in the same classpath)
> >> start using the java7 programming APIs and therefore break
> >> compatibility with java6 runtimes. I think our java6 runtime users
> >> would consider dropping support for their java runtime in an update of
> >> a major release to be an incompatible change (the binaries stop
> >> working on their current jvm).
> >
> >
> > do you mean major 2.x -> 3.y or minor 2.x -> 2.(x+1)  here?
> >
> >
> >> That may be worth it if we can
> >> articulate sufficient value to offset the cost (they have to upgrade
> >> their environment, might make rolling upgrades stop working, etc), but
> >> I've not yet heard an argument that articulates the value relative to
> >> the cost.  Eg upgrading to the java7 APIs allows us to pull in
> >> dependencies with new major versions, but only if those dependencies
> >> don't break compatibility (which is likely given that our classpaths
> >> aren't so isolated), and, realistically, only if the entire Hadoop
> >> stack moves to java7 as well
> >
> >
> >
> >
> >> (eg we have to recompile HBase to
> >> generate v1.7 binaries even if they stick on API v1.6). I'm not aware
> >> of a feature, bug etc that really motivates this.
> >>
> >> I don't see that being needed unless we move up to new java7+ only
> > libraries and HBase needs to track this.
> >
> >  The big "recompile to work" issue is google guava, which is troublesome
> > enough I'd be tempted to say "can we drop it entirely"
> >
> >
> >
> >> An alternate approach is to keep the current stable release series
> >> (v2.x) as is, and start using new APIs in trunk (for v3). This will be
> >> a major upgrade for Hadoop and therefore an incompatible change like
> >> this is to be expected (it would be great if this came with additional
> >> changes to better isolate classpaths and dependencies from each
> >> other). It allows us to continue to support multiple types of users
> >> with different branches, vs forcing all users onto a new version. It
> >> of course means that 2.x users will not get the benefits of the new
> >> API, but it's unclear what those benefits are given they can already
> >> get the benefits of adopting the newer java runtimes today.
> >>
> >>
>

Re: Plans of moving towards JDK7 in trunk

2014-04-10 Thread Eli Collins
On Thu, Apr 10, 2014 at 1:11 AM, Steve Loughran wrote:

> On 9 April 2014 23:52, Eli Collins  wrote:
>
> >
> >
> > For the sake of this discussion we should separate the runtime from
> > the programming APIs. Users are already migrating to the java7 runtime
> > for most of the reasons listed below (support, performance, bugs,
> > etc), and the various distributions cert their Hadoop 2 based
> > distributions on java7.  This gives users many of the benefits of
> > java7, without forcing users off java6. Ie Hadoop does not need to
> > switch to the java7 programming APIs to make sure everyone has a
> > supported runtime.
> >
> >
> +1: you can use Java 7 today; I'm not sure how tested Java 8 is
>
>
> > The question here is really about when Hadoop, and the Hadoop
> > ecosystem (since adjacent projects often end up in the same classpath)
> > start using the java7 programming APIs and therefore break
> > compatibility with java6 runtimes. I think our java6 runtime users
> > would consider dropping support for their java runtime in an update of
> > a major release to be an incompatible change (the binaries stop
> > working on their current jvm).
>
>
> do you mean major 2.x -> 3.y or minor 2.x -> 2.(x+1)  here?
>

I mean 2.x --> 2.(x+1).  Ie I'm running the 2.4 stable and upgrading to 2.5.


>
>
> > That may be worth it if we can
> > articulate sufficient value to offset the cost (they have to upgrade
> > their environment, might make rolling upgrades stop working, etc), but
> > I've not yet heard an argument that articulates the value relative to
> > the cost.  Eg upgrading to the java7 APIs allows us to pull in
> > dependencies with new major versions, but only if those dependencies
> > don't break compatibility (which is likely given that our classpaths
> > aren't so isolated), and, realistically, only if the entire Hadoop
> > stack moves to java7 as well
>
>
>
>
> > (eg we have to recompile HBase to
> > generate v1.7 binaries even if they stick on API v1.6). I'm not aware
> > of a feature, bug etc that really motivates this.
> >
> > I don't see that being needed unless we move up to new java7+ only
> libraries and HBase needs to track this.
>
>  The big "recompile to work" issue is google guava, which is troublesome
> enough I'd be tempted to say "can we drop it entirely"
>
>
>
> > An alternate approach is to keep the current stable release series
> > (v2.x) as is, and start using new APIs in trunk (for v3). This will be
> > a major upgrade for Hadoop and therefore an incompatible change like
> > this is to be expected (it would be great if this came with additional
> > changes to better isolate classpaths and dependencies from each
> > other). It allows us to continue to support multiple types of users
> > with different branches, vs forcing all users onto a new version. It
> > of course means that 2.x users will not get the benefits of the new
> > API, but it's unclear what those benefits are given they can already
> > get the benefits of adopting the newer java runtimes today.
> >
> >
> >
> I'm (personally) +1 to this, I also think we should plan to do the switch
> some time this year to not only get the benefits, but discover the costs
>


Agree





Re: DISCUSS: use SLF4J APIs in new modules?

2014-04-10 Thread Karthik Kambatla
+1 to using slf4j. I would actually vote for (1) new modules must use it,
(2) new classes in existing modules are strongly encouraged to use it, and
(3) existing classes may switch to it. That would get us to using slf4j
everywhere faster.


On Thu, Apr 10, 2014 at 8:17 AM, Alejandro Abdelnur wrote:

> +1 on slf4j.
>
> One thing, Jay: the issues with log4j will still be there, as log4j will
> still be under the hood.
>
> thx
>
> Alejandro
> (phone typing)
>
> > On Apr 10, 2014, at 7:35, Andrew Wang  wrote:
> >
> > +1 from me, it'd be lovely to get rid of all those isDebugEnabled checks.
> >
> >
> >> On Thu, Apr 10, 2014 at 4:13 AM, Jay Vyas  wrote:
> >>
> >> Slf4j is definitely a great step forward.  Log4j is restrictive for
> >> complex and multi-tenant apps like Hadoop.
> >>
> >> Also the fact that slf4j doesn't use any magic when binding to its log
> >> provider makes it way easier to swap out its implementation than tools
> >> of the past.
> >>
>  On Apr 10, 2014, at 2:16 AM, Steve Loughran 
> >>> wrote:
> >>>
> >>> If we're thinking of future progress, here's a little low-level one:
> >> adopt
> >>> SLF4J as the API for logging
> >>>
> >>>
> >>>  1. it's the new de facto standard of logging APIs
> >>>  2. it's a lot better than commons-logging, with on-demand inline string
> >>>  expansion of varargs arguments.
> >>>  3. we already ship it, as jetty uses it
> >>>  4. we already depend on it, client-side and server-side in the
> >>>  hadoop-auth package
> >>>  5. it lets people log via logback if they want to. That's client-side,
> >>>  even if the server stays on log4j
> >>>  6. It's way faster than using String.format()
> >>>
> >>>
> >>> The best initial thing about SLF4J is how it only expands its arguments
> >>> string values if needed
> >>>
> >>> LOG.debug("Initialized, principal [{}] from keytab [{}]",
> principal,
> >>> keytab);
> >>>
> >>> not logging at debug? No need to test first. That alone saves code and
> >>> improves readability.
> >>>
> >>> The slf4j expansion code handles null values as well as calling
> toString()
> >>> on non-null arguments. Oh and it does arrays too.
> >>>
> >>> int[] array = {1, 2, 3};
> >>> String undef = null;
> >>>
> >>> LOG.info("a = {}, u = {}", array, undef)  -> "a = [1, 2, 3],  u = null"
> >>>
> >>> Switching to SLF4J from commons-logging is as trivial as changing the
> >> type
> >>> of the logger created, but with one logger per class that does get
> >>> expensive in terms of change. Moving to SLF4J across the board would
> be a
> >>> big piece of work -but doable.
> >>>
> >>> Rather than push for a dramatic change, why not adopt a policy of
> >> demanding
> >>> it in new maven subprojects? hadoop-auth shows we permit it, so why not
> >> say
> >>> "you MUST"?
> >>>
> >>> Once people have experience in using it, and are happy, then we could
> >> think
> >>> about switching to the new APIs in the core modules. The only
> troublespot
> >>> there is where code calls getLogger() on the commons log to get at the
> >>> log4j appender -there's ~3 places in production code that does this,
> 200+
> >>> in tests -tests that do it to turn back log levels. Those tests can
> stay
> >>> with commons-logging, same for the production uses. Mixing
> >> commons-logging
> >>> and slf4j isn't drastic -they both route to log4j or a.n.other back
> end.
> >>>
> >>> -Steve
> >>>
> >>
>


Re: DISCUSS: use SLF4J APIs in new modules?

2014-04-10 Thread Alejandro Abdelnur
+1 on slf4j.

One thing, Jay: the issues with log4j will still be there, as log4j will still 
be under the hood.

thx

Alejandro
(phone typing)

> On Apr 10, 2014, at 7:35, Andrew Wang  wrote:
> 
> +1 from me, it'd be lovely to get rid of all those isDebugEnabled checks.
> 
> 
>> On Thu, Apr 10, 2014 at 4:13 AM, Jay Vyas  wrote:
>> 
>> Slf4j is definitely a great step forward.  Log4j is restrictive for complex
>> and multi-tenant apps like Hadoop.
>> 
>> Also the fact that slf4j doesn't use any magic when binding to its log
>> provider makes it way easier to swap out its implementation than tools of
>> the past.
>> 
 On Apr 10, 2014, at 2:16 AM, Steve Loughran 
>>> wrote:
>>> 
>>> If we're thinking of future progress, here's a little low-level one:
>> adopt
>>> SLF4J as the API for logging
>>> 
>>> 
>>>  1. it's the new de facto standard of logging APIs
>>>  2. it's a lot better than commons-logging, with on-demand inline string
>>>  expansion of varargs arguments.
>>>  3. we already ship it, as jetty uses it
>>>  4. we already depend on it, client-side and server-side in the
>>>  hadoop-auth package
>>>  5. it lets people log via logback if they want to. That's client-side,
>>>  even if the server stays on log4j
>>>  6. It's way faster than using String.format()
>>> 
>>> 
>>> The best initial thing about SLF4J is how it only expands its arguments
>>> string values if needed
>>> 
>>> LOG.debug("Initialized, principal [{}] from keytab [{}]", principal,
>>> keytab);
>>> 
>>> not logging at debug? No need to test first. That alone saves code and
>>> improves readability.
>>> 
>>> The slf4j expansion code handles null values as well as calling toString()
>>> on non-null arguments. Oh and it does arrays too.
>>> 
>>> int[] array = {1, 2, 3};
>>> String undef = null;
>>> 
>>> LOG.info("a = {}, u = {}", array, undef)  -> "a = [1, 2, 3],  u = null"
>>> 
>>> Switching to SLF4J from commons-logging is as trivial as changing the
>> type
>>> of the logger created, but with one logger per class that does get
>>> expensive in terms of change. Moving to SLF4J across the board would be a
>>> big piece of work -but doable.
>>> 
>>> Rather than push for a dramatic change, why not adopt a policy of
>> demanding
>>> it in new maven subprojects? hadoop-auth shows we permit it, so why not
>> say
>>> "you MUST"?
>>> 
>>> Once people have experience in using it, and are happy, then we could
>> think
>>> about switching to the new APIs in the core modules. The only troublespot
>>> there is where code calls getLogger() on the commons log to get at the
>>> log4j appender -there's ~3 places in production code that does this, 200+
>>> in tests -tests that do it to turn back log levels. Those tests can stay
>>> with commons-logging, same for the production uses. Mixing
>> commons-logging
>>> and slf4j isn't drastic -they both route to log4j or a.n.other back end.
>>> 
>>> -Steve
>>> 
>> 


Re: DISCUSS: use SLF4J APIs in new modules?

2014-04-10 Thread Andrew Wang
+1 from me, it'd be lovely to get rid of all those isDebugEnabled checks.


On Thu, Apr 10, 2014 at 4:13 AM, Jay Vyas  wrote:

> Slf4j is definitely a great step forward.  Log4j is restrictive for complex
> and multi-tenant apps like Hadoop.
>
> Also the fact that slf4j doesn't use any magic when binding to its log
> provider makes it way easier to swap out its implementation than tools of
> the past.
>
> > On Apr 10, 2014, at 2:16 AM, Steve Loughran 
> wrote:
> >
> > If we're thinking of future progress, here's a little low-level one:
> adopt
> > SLF4J as the API for logging
> >
> >
> >   1. it's the new de facto standard of logging APIs
> >   2. it's a lot better than commons-logging, with on-demand inline string
> >   expansion of varargs arguments.
> >   3. we already ship it, as jetty uses it
> >   4. we already depend on it, client-side and server-side in the
> >   hadoop-auth package
> >   5. it lets people log via logback if they want to. That's client-side,
> >   even if the server stays on log4j
> >   6. It's way faster than using String.format()
> >
> >
> > The best initial thing about SLF4J is how it only expands its arguments
> > string values if needed
> >
> >  LOG.debug("Initialized, principal [{}] from keytab [{}]", principal,
> > keytab);
> >
> > not logging at debug? No need to test first. That alone saves code and
> > improves readability.
> >
> > The slf4j expansion code handles null values as well as calling toString()
> > on non-null arguments. Oh and it does arrays too.
> >
> > int[] array = {1, 2, 3};
> > String undef = null;
> >
> > LOG.info("a = {}, u = {}", array, undef)  -> "a = [1, 2, 3],  u = null"
> >
> > Switching to SLF4J from commons-logging is as trivial as changing the
> type
> > of the logger created, but with one logger per class that does get
> > expensive in terms of change. Moving to SLF4J across the board would be a
> > big piece of work -but doable.
> >
> > Rather than push for a dramatic change, why not adopt a policy of
> demanding
> > it in new maven subprojects? hadoop-auth shows we permit it, so why not
> say
> > "you MUST"?
> >
> > Once people have experience in using it, and are happy, then we could
> think
> > about switching to the new APIs in the core modules. The only troublespot
> > there is where code calls getLogger() on the commons log to get at the
> > log4j appender -there's ~3 places in production code that does this, 200+
> > in tests -tests that do it to turn back log levels. Those tests can stay
> > with commons-logging, same for the production uses. Mixing
> commons-logging
> > and slf4j isn't drastic -they both route to log4j or a.n.other back end.
> >
> > -Steve
> >
>


[jira] [Resolved] (HADOOP-10382) Add Apache Tez to the Hadoop homepage as a related project

2014-04-10 Thread Arun C Murthy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Arun C Murthy resolved HADOOP-10382.


Resolution: Fixed

I just committed this.

> Add Apache Tez to the Hadoop homepage as a related project
> --
>
> Key: HADOOP-10382
> URL: https://issues.apache.org/jira/browse/HADOOP-10382
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation
>Reporter: Arun C Murthy
>Assignee: Arun C Murthy
> Attachments: HADOOP-10382.patch, HADOOP-10382.patch
>
>
> Add Apache Tez to the Hadoop homepage as a related project



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Plans of moving towards JDK7 in trunk

2014-04-10 Thread Raymie Stata
I think the problem to be solved here is to define a point in time
when the average Hadoop contributor can start using Java7 dependencies
in their code.

The "use Java7 dependencies in trunk(/branch3)" plan, by itself, does
not solve this problem.  The average Hadoop contributor wants to see
their contributions make it into a stable release in a predictable
amount of time.  Putting code with a Java7 dependency into trunk means
the exact opposite: there is no timeline to a stable release.  So most
contributors will stay away from Java7 dependencies, despite the
nominal policy that they're allowed in trunk.  (And the few that do
use Java7 dependencies are people who do not value releasing code into
stable releases, which arguably could lead to a situation that the
Java7-dependent code in trunk is, on average, on the buggy side.)

I'm not saying the "branch2-in-the-future" plan is the only way to
solve the problem of putting Java7 dependencies on a known time-table,
but at least it solves it.  Is there another solution?

On Thu, Apr 10, 2014 at 1:11 AM, Steve Loughran  wrote:
> On 9 April 2014 23:52, Eli Collins  wrote:
>
>>
>>
>> For the sake of this discussion we should separate the runtime from
>> the programming APIs. Users are already migrating to the java7 runtime
>> for most of the reasons listed below (support, performance, bugs,
>> etc), and the various distributions cert their Hadoop 2 based
>> distributions on java7.  This gives users many of the benefits of
>> java7, without forcing users off java6. Ie Hadoop does not need to
>> switch to the java7 programming APIs to make sure everyone has a
>> supported runtime.
>>
>>
> +1: you can use Java 7 today; I'm not sure how tested Java 8 is
>
>
>> The question here is really about when Hadoop, and the Hadoop
>> ecosystem (since adjacent projects often end up in the same classpath)
>> start using the java7 programming APIs and therefore break
>> compatibility with java6 runtimes. I think our java6 runtime users
>> would consider dropping support for their java runtime in an update of
>> a major release to be an incompatible change (the binaries stop
>> working on their current jvm).
>
>
> do you mean major 2.x -> 3.y or minor 2.x -> 2.(x+1)  here?
>
>
>> That may be worth it if we can
>> articulate sufficient value to offset the cost (they have to upgrade
>> their environment, might make rolling upgrades stop working, etc), but
>> I've not yet heard an argument that articulates the value relative to
>> the cost.  Eg upgrading to the java7 APIs allows us to pull in
>> dependencies with new major versions, but only if those dependencies
>> don't break compatibility (which is likely given that our classpaths
>> aren't so isolated), and, realistically, only if the entire Hadoop
>> stack moves to java7 as well
>
>
>
>
>> (eg we have to recompile HBase to
>> generate v1.7 binaries even if they stick on API v1.6). I'm not aware
>> of a feature, bug etc that really motivates this.
>>
>> I don't see that being needed unless we move up to new java7+ only
> libraries and HBase needs to track this.
>
>  The big "recompile to work" issue is google guava, which is troublesome
> enough I'd be tempted to say "can we drop it entirely"
>
>
>
>> An alternate approach is to keep the current stable release series
>> (v2.x) as is, and start using new APIs in trunk (for v3). This will be
>> a major upgrade for Hadoop and therefore an incompatible change like
>> this is to be expected (it would be great if this came with additional
>> changes to better isolate classpaths and dependencies from each
>> other). It allows us to continue to support multiple types of users
>> with different branches, vs forcing all users onto a new version. It
>> of course means that 2.x users will not get the benefits of the new
>> API, but it's unclear what those benefits are given they can already
>> get the benefits of adopting the newer java runtimes today.
>>
>>
>>
> I'm (personally) +1 to this, I also think we should plan to do the switch
> some time this year to not only get the benefits, but discover the costs
>


Re: DISCUSS: use SLF4J APIs in new modules?

2014-04-10 Thread Jay Vyas
Slf4j is definitely a great step forward.  Log4j is restrictive for complex and 
multi-tenant apps like Hadoop.

Also the fact that slf4j doesn't use any magic when binding to its log provider 
makes it way easier to swap out its implementation than tools of the past.

> On Apr 10, 2014, at 2:16 AM, Steve Loughran  wrote:
> 
> If we're thinking of future progress, here's a little low-level one: adopt
> SLF4J as the API for logging
> 
> 
>   1. it's the new de facto standard of logging APIs
>   2. it's a lot better than commons-logging, with on-demand inline string
>   expansion of varargs arguments.
>   3. we already ship it, as jetty uses it
>   4. we already depend on it, client-side and server-side in the
>   hadoop-auth package
>   5. it lets people log via logback if they want to. That's client-side,
>   even if the server stays on log4j
>   6. It's way faster than using String.format()
> 
> 
> The best initial thing about SLF4J is how it only expands its arguments
> string values if needed
> 
>  LOG.debug("Initialized, principal [{}] from keytab [{}]", principal,
> keytab);
> 
> not logging at debug? No need to test first. That alone saves code and
> improves readability.
> 
> The slf4j expansion code handles null values as well as calling toString()
> on non-null arguments. Oh and it does arrays too.
> 
> int[] array = {1, 2, 3};
> String undef = null;
> 
> LOG.info("a = {}, u = {}", array, undef)  -> "a = [1, 2, 3],  u = null"
> 
> Switching to SLF4J from commons-logging is as trivial as changing the type
> of the logger created, but with one logger per class that does get
> expensive in terms of change. Moving to SLF4J across the board would be a
> big piece of work -but doable.
> 
> Rather than push for a dramatic change, why not adopt a policy of demanding
> it in new maven subprojects? hadoop-auth shows we permit it, so why not say
> "you MUST"?
> 
> Once people have experience in using it, and are happy, then we could think
> about switching to the new APIs in the core modules. The only troublespot
> there is where code calls getLogger() on the commons log to get at the
> log4j appender -there's ~3 places in production code that does this, 200+
> in tests -tests that do it to turn back log levels. Those tests can stay
> with commons-logging, same for the production uses. Mixing commons-logging
> and slf4j isn't drastic -they both route to log4j or a.n.other back end.
> 
> -Steve
> 


Build failed in Jenkins: Hadoop-Common-trunk #1095

2014-04-10 Thread Apache Jenkins Server
See 

Changes:

[tucu] HADOOP-10428. JavaKeyStoreProvider should accept keystore password via 
configuration falling back to ENV VAR. (tucu)

[vinodkv] YARN-1910. Fixed a race condition in TestAMRMTokens that causes the 
test to fail more often on Windows. Contributed by Xuan Gong.

[wheat9] HDFS-6225. Remove the o.a.h.hdfs.server.common.UpgradeStatusReport. 
Contributed by Haohui Mai.

[cnauroth] HDFS-6208. DataNode caching can leak file descriptors. Contributed 
by Chris Nauroth.

[wheat9] HDFS-6170. Support GETFILESTATUS operation in WebImageViewer. 
Contributed by Akira Ajisaka.

[tucu] HADOOP-10429. KeyStores should have methods to generate the materials 
themselves, KeyShell should use them. (tucu)

[tucu] HADOOP-10427. KeyProvider implementations should be thread safe. (tucu)

[tucu] HADOOP-10432. Refactor SSLFactory to expose static method to determine 
HostnameVerifier. (tucu)

[szetszwo] HDFS-6209. TestValidateConfigurationSettings should use random 
ports.  Contributed by Arpit Agarwal

[wheat9] HADOOP-10485. Remove dead classes in hadoop-streaming. Contributed by 
Haohui Mai.

[szetszwo] HDFS-6204. Fix TestRBWBlockInvalidation: change the last sleep to a 
loop.

[szetszwo] HDFS-6206. Fix NullPointerException in 
DFSUtil.substituteForWildcardAddress.

[szetszwo] HADOOP-10473. TestCallQueueManager should interrupt before counting 
calls.

[jeagles] HDFS-6215. Wrong error message for upgrade. (Kihwal Lee via jeagles)

[arp] HDFS-6160. TestSafeMode occasionally fails. (Contributed by Arpit Agarwal)

[kihwal] YARN-1907. TestRMApplicationHistoryWriter#testRMWritingMassiveHistory 
intermittently fails. Contributed by Mit Desai.

[stevel] HADOOP-10104. Update jackson to 1.9.13 (Akira Ajisaka via stevel)

--
[...truncated 60536 lines...]
Adding reference: maven.local.repository
[DEBUG] Initialize Maven Ant Tasks
parsing buildfile 
jar:file:/home/jenkins/.m2/repository/org/apache/maven/plugins/maven-antrun-plugin/1.7/maven-antrun-plugin-1.7.jar!/org/apache/maven/ant/tasks/antlib.xml
 with URI = 
jar:file:/home/jenkins/.m2/repository/org/apache/maven/plugins/maven-antrun-plugin/1.7/maven-antrun-plugin-1.7.jar!/org/apache/maven/ant/tasks/antlib.xml
 from a zip file
parsing buildfile 
jar:file:/home/jenkins/.m2/repository/org/apache/ant/ant/1.8.2/ant-1.8.2.jar!/org/apache/tools/ant/antlib.xml
 with URI = 
jar:file:/home/jenkins/.m2/repository/org/apache/ant/ant/1.8.2/ant-1.8.2.jar!/org/apache/tools/ant/antlib.xml
 from a zip file
Class org.apache.maven.ant.tasks.AttachArtifactTask loaded from parent loader 
(parentFirst)
 +Datatype attachartifact org.apache.maven.ant.tasks.AttachArtifactTask
Class org.apache.maven.ant.tasks.DependencyFilesetsTask loaded from parent 
loader (parentFirst)
 +Datatype dependencyfilesets org.apache.maven.ant.tasks.DependencyFilesetsTask
Setting project property: test.build.dir -> 

Setting project property: test.exclude.pattern -> _
Setting project property: hadoop.assemblies.version -> 3.0.0-SNAPSHOT
Setting project property: test.exclude -> _
Setting project property: distMgmtSnapshotsId -> apache.snapshots.https
Setting project property: project.build.sourceEncoding -> UTF-8
Setting project property: java.security.egd -> file:///dev/urandom
Setting project property: distMgmtSnapshotsUrl -> 
https://repository.apache.org/content/repositories/snapshots
Setting project property: distMgmtStagingUrl -> 
https://repository.apache.org/service/local/staging/deploy/maven2
Setting project property: avro.version -> 1.7.4
Setting project property: test.build.data -> 

Setting project property: commons-daemon.version -> 1.0.13
Setting project property: hadoop.common.build.dir -> 

Setting project property: testsThreadCount -> 4
Setting project property: maven.test.redirectTestOutputToFile -> true
Setting project property: jdiff.version -> 1.0.9
Setting project property: build.platform -> Linux-i386-32
Setting project property: project.reporting.outputEncoding -> UTF-8
Setting project property: distMgmtStagingName -> Apache Release Distribution 
Repository
Setting project property: protobuf.version -> 2.5.0
Setting project property: failIfNoTests -> false
Setting project property: protoc.path -> ${env.HADOOP_PROTOC_PATH}
Setting project property: jersey.version -> 1.9
Setting project property: distMgmtStagingId -> apache.staging.https
Setting project property: distMgmtSnapshotsName -> Apache Development Snapshot 
Repository
Setting project property: ant.file -> 

[DEBUG] Setting

DISCUSS: use SLF4J APIs in new modules?

2014-04-10 Thread Steve Loughran
If we're thinking of future progress, here's a little low-level one: adopt
SLF4J as the API for logging


   1. it's the new de facto standard of logging APIs
   2. it's a lot better than commons-logging, with on-demand inline string
   expansion of varargs arguments.
   3. we already ship it, as jetty uses it
   4. we already depend on it, client-side and server-side in the
   hadoop-auth package
   5. it lets people log via logback if they want to. That's client-side,
   even if the server stays on log4j
   6. It's way faster than using String.format()


The best initial thing about SLF4J is how it only expands its arguments
string values if needed

  LOG.debug("Initialized, principal [{}] from keytab [{}]", principal,
keytab);

not logging at debug? No need to test first. That alone saves code and
improves readability.

The slf4j expansion code handles null values as well as calling toString()
on non-null arguments. Oh and it does arrays too.

 int[] array = {1, 2, 3};
 String undef = null;

 LOG.info("a = {}, u = {}", array, undef)  -> "a = [1, 2, 3],  u = null"

Switching to SLF4J from commons-logging is as trivial as changing the type
of the logger created, but with one logger per class that does get
expensive in terms of change. Moving to SLF4J across the board would be a
big piece of work -but doable.
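
As a sketch (hypothetical class name, for illustration only), the per-class
change is just the logger declaration; call sites can then drop their guards:

  import org.slf4j.Logger;
  import org.slf4j.LoggerFactory;

  public class MyService {
    // Before, with commons-logging:
    //   private static final Log LOG = LogFactory.getLog(MyService.class);
    //   if (LOG.isDebugEnabled()) {
    //     LOG.debug("Serving block " + block + " to " + remote);
    //   }

    // After: a one-line declaration change; no guard needed, because the
    // {} arguments are only rendered when debug logging is enabled.
    private static final Logger LOG = LoggerFactory.getLogger(MyService.class);

    void serve(Object block, Object remote) {
      LOG.debug("Serving block {} to {}", block, remote);
    }
  }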

Rather than push for a dramatic change, why not adopt a policy of demanding
it in new maven subprojects? hadoop-auth shows we permit it, so why not say
"you MUST"?

Once people have experience in using it, and are happy, then we could think
about switching to the new APIs in the core modules. The only troublespot
there is where code calls getLogger() on the commons log to get at the
log4j appender -there's ~3 places in production code that does this, 200+
in tests -tests that do it to turn back log levels. Those tests can stay
with commons-logging, same for the production uses. Mixing commons-logging
and slf4j isn't drastic -they both route to log4j or a.n.other back end.

-Steve



Re: Plans of moving towards JDK7 in trunk

2014-04-10 Thread Steve Loughran
On 9 April 2014 23:52, Eli Collins  wrote:

>
>
> For the sake of this discussion we should separate the runtime from
> the programming APIs. Users are already migrating to the java7 runtime
> for most of the reasons listed below (support, performance, bugs,
> etc), and the various distributions cert their Hadoop 2 based
> distributions on java7.  This gives users many of the benefits of
> java7, without forcing users off java6. Ie Hadoop does not need to
> switch to the java7 programming APIs to make sure everyone has a
> supported runtime.
>
>
+1: you can use Java 7 today; I'm not sure how tested Java 8 is


> The question here is really about when Hadoop, and the Hadoop
> ecosystem (since adjacent projects often end up in the same classpath)
> start using the java7 programming APIs and therefore break
> compatibility with java6 runtimes. I think our java6 runtime users
> would consider dropping support for their java runtime in an update of
> a major release to be an incompatible change (the binaries stop
> working on their current jvm).


do you mean major 2.x -> 3.y or minor 2.x -> 2.(x+1)  here?


> That may be worth it if we can
> articulate sufficient value to offset the cost (they have to upgrade
> their environment, might make rolling upgrades stop working, etc), but
> I've not yet heard an argument that articulates the value relative to
> the cost.  Eg upgrading to the java7 APIs allows us to pull in
> dependencies with new major versions, but only if those dependencies
> don't break compatibility (which is likely given that our classpaths
> aren't so isolated), and, realistically, only if the entire Hadoop
> stack moves to java7 as well




> (eg we have to recompile HBase to
> generate v1.7 binaries even if they stick on API v1.6). I'm not aware
> of a feature, bug etc that really motivates this.
>
> I don't see that being needed unless we move up to new java7+ only
libraries and HBase needs to track this.

 The big "recompile to work" issue is google guava, which is troublesome
enough I'd be tempted to say "can we drop it entirely"



> An alternate approach is to keep the current stable release series
> (v2.x) as is, and start using new APIs in trunk (for v3). This will be
> a major upgrade for Hadoop and therefore an incompatible change like
> this is to be expected (it would be great if this came with additional
> changes to better isolate classpaths and dependencies from each
> other). It allows us to continue to support multiple types of users
> with different branches, vs forcing all users onto a new version. It
> of course means that 2.x users will not get the benefits of the new
> API, but it's unclear what those benefits are given they can already
> get the benefits of adopting the newer java runtimes today.
>
>
>
I'm (personally) +1 to this, I also think we should plan to do the switch
some time this year to not only get the benefits, but discover the costs
