2.8 Release activities

2017-07-12 Thread Haohui Mai
Hi,

Just curious -- what is the current status of the 2.8 release? It looks
like the release process has been stalled for some time.

There are 5 or 6 blocker / critical bugs targeting the upcoming 2.8 release:

https://issues.apache.org/jira/browse/YARN-6654?jql=project%20in%20(HDFS%2C%20HADOOP%2C%20MAPREDUCE%2C%20YARN)%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20priority%20in%20(Blocker%2C%20Critical)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.8.2%2C%202.8.3)

I think we can address them with a reasonable amount of effort.

We are interested in putting 2.8.x in production and it would be great to
have a maintenance Apache release for the 2.8 line.

I wonder, are there any concerns blocking the release? We might be able to
get some help internally to fix the issues in the 2.8 line. I can also
volunteer to be the release manager for 2.8.2 if more coordination effort
is needed to push the release out.

Regards,
Haohui


Re: About 2.7.4 Release

2017-04-25 Thread Haohui Mai
It would be great to backport HDFS-9710 to 2.7.4 as this is one of the
critical fixes for scalability. Maybe we should create a JIRA to track
this?

~Haohui

On Tue, Apr 25, 2017 at 12:06 AM, Akira Ajisaka  wrote:
> Ping
>
> I too can help with the release process.
>
> Now there are 0 blocker and 6 critical issues targeted for 2.7.4.
> https://s.apache.org/HsIu
>
> If there are critical/blocker issues that need to be fixed in branch-2.7,
> please set Target Version/s to 2.7.4. That way the issues can be found by
> the above query.
>
> I'll check if there are conflicts among JIRA, git commit log, and the change
> logs.
>
> Regards,
> Akira
>
>
> On 2017/04/18 15:40, Brahma Reddy Battula wrote:
>>
>> Hi All
>>
>> Any update on 2.7.4..? Gentle reminder!! Let me know if there is anything I
>> can help with on this.
>>
>>
>>
>> Regards
>> Brahma Reddy Battula
>>
>> -Original Message-
>> From: Andrew Wang [mailto:andrew.w...@cloudera.com]
>> Sent: 08 March 2017 04:22
>> To: Sangjin Lee
>> Cc: Marton Elek; Hadoop Common; yarn-...@hadoop.apache.org; Hdfs-dev;
>> mapreduce-...@hadoop.apache.org
>> Subject: Re: About 2.7.4 Release
>>
>> Our release steps are documented on the wiki:
>>
>> 2.6/2.7:
>>
>> https://wiki.apache.org/hadoop/HowToReleasePreDSBCR
>>
>> 2.8+:
>> https://wiki.apache.org/hadoop/HowToRelease
>>
>> I think given the push toward 2.8 and 3.0, there's less interest in
>> streamlining the 2.6 and 2.7 release processes. CHANGES.txt is the biggest
>> pain, and that's fixed in 2.8+.
>>
>> Current pain points for 2.8+ include:
>>
>> # fixing up JIRA versions and the release notes, though I somewhat
>>   addressed this with the versions script for 3.x
>> # making and staging an RC and sending the vote email still requires a lot
>>   of manual steps
>> # publishing the release is also quite manual
>>
>> I think the RC issues can be attacked with enough scripting. Steve had an
>> ant file that automated a lot of this for slider. I think it'd be nice to
>> have a nightly Jenkins job that builds an RC, since I've spent a day or two
>> for each 3.x alpha fixing build issues.
>>
>> Publishing can be attacked via a mix of scripting and revamping the darned
>> website. Forrest is pretty bad compared to the newer static site generators
>> out there (e.g. need to write XML instead of markdown, it's hard to review a
>> staging site because of all the absolute links, hard to customize, did I
>> mention XML?), and the look and feel of the site is from the 00s. We don't
>> actually have that much site content, so it should be possible to migrate to
>> a new system.
>>
>> On Tue, Mar 7, 2017 at 9:13 AM, Sangjin Lee  wrote:
>>
>>> I don't think there should be any linkage between releasing 2.8.0 and
>>> 2.7.4. If we have a volunteer for releasing 2.7.4, we should go full
>>> speed ahead. We still need a volunteer from a PMC member or a
>>> committer as some tasks may require certain privileges, but I don't
>>> think it precludes working with others to close down the release.
>>>
>>> I for one would like to see more frequent releases, and being able to
>>> automate release steps more would go a long way.
>>>
>>> On Tue, Mar 7, 2017 at 2:16 AM, Marton Elek 
>>> wrote:
>>>
 Is there any reason to wait for 2.8 with 2.7.4?

 Unfortunately the previous thread about release cadence ended without a
 final decision. But if I understood well, there was more or less an
 agreement that it would be great to achieve more frequent releases, if
 possible (with or without written rules and an EOL policy).

 I personally prefer to stay closer to the scheduling part of the
 proposal:

 "A minor release on the latest major line should be every 6 months,
 and a maintenance release on a minor release (as there may be
 concurrently maintained minor releases) every 2 months".

 I don't know what the hardest part of creating new minor/maintenance
 releases is. But if the problems are technical (smoke testing, unit tests,
 the old release script, anything else) I would be happy to take on any task
 for new maintenance releases (or more frequent releases).

 Regards,
 Marton


 
 From: Akira Ajisaka 
 Sent: Tuesday, March 07, 2017 7:34 AM
 To: Brahma Reddy Battula; Hadoop Common; yarn-...@hadoop.apache.org;
 Hdfs-dev; mapreduce-...@hadoop.apache.org
 Subject: Re: About 2.7.4 Release

 Probably 2.8.0 will be released soon.
 https://issues.apache.org/jira/browse/HADOOP-13866?focusedCommentId=15898379&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15898379

 I'm thinking the 2.7.4 release process starts after the 2.8.0 release, so
 2.7.4 will be released in April or May (hopefully).

 Thoughts?

 Regards,
 Akira

 On 2017/03/01 21:01, Brahma Reddy Battula wrote:
>
> Hi All
>

[jira] [Resolved] (HADOOP-9994) Incorrect ANT_HOME references in build.xml in branch-1

2017-04-24 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HADOOP-9994.

Resolution: Won't Fix

> Incorrect ANT_HOME references in build.xml in branch-1
> --
>
> Key: HADOOP-9994
> URL: https://issues.apache.org/jira/browse/HADOOP-9994
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Haohui Mai
>Assignee: Haohui Mai
>Priority: Minor
> Attachments: HDFS-5261.000.patch
>
>
> In branch-1, {code}ant eclipse{code} reads build.xml and generates Eclipse
> projects.
> However, the reference to ANT_HOME is incorrect, so ant fails to
> generate a working Eclipse project out-of-the-box.
>  






[jira] [Created] (HADOOP-13136) shade protobuf in the hadoop-common jar

2016-05-12 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-13136:
---

 Summary: shade protobuf in the hadoop-common jar
 Key: HADOOP-13136
 URL: https://issues.apache.org/jira/browse/HADOOP-13136
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Haohui Mai


While Protobuf has good wire compatibility, its implementation has changed
from time to time. It might be a good idea to shade it into the hadoop-common
jar for better compatibility.






Re: 2.7.3 release plan

2016-04-03 Thread Haohui Mai
+1 on option 3. I volunteer to help on releasing 2.8.0 as 2.7.3 +
HDFS-8578 and HDFS-8791.

~Haohui

On Fri, Apr 1, 2016 at 2:54 PM, Chris Trezzo  wrote:
> A few thoughts:
>
> 1. To echo Andrew Wang, HDFS-8578 (parallel upgrades) should be a
> prerequisite for HDFS-8791. Without that patch, upgrades can be very slow
> for data nodes depending on your setup.
>
> 2. We have already deployed this patch internally so, with my Twitter hat
> on, I would be perfectly happy as long as it makes it into trunk and 2.8.
> That being said, I would be hesitant to deploy the current 2.7.x or 2.6.x
> releases on a large production cluster that has a diverse set of block ids
> without this patch, especially if your data nodes have a large number of
> disks or you are using federation. To be clear though: this highly depends
> on your setup and at a minimum you should verify that this regression will
> not affect you. The current block-id based layout in 2.6.x and 2.7.2 has a
> performance regression that gets worse over time. When you see it happening
> on a live cluster, it is one of the harder issues to identify a root cause
> and debug. I do understand that this is currently only affecting a smaller
> number of users, but I also think this number has potential to increase as
> time goes on. Maybe we can issue a warning in the release notes for future
> 2.7.x and 2.6.x releases?
>
> 3. One option (this was suggested on HDFS-8791 and I think Sean alluded to
> this proposal on this thread) would be to cut a 2.8 release off of the
> 2.7.3 release with the new layout. What people currently think of as 2.8
> would then become 2.9. This would give customers a stable release that they
> could deploy with the new layout and would not break upgrade and downgrade
> expectations.
>
> On Fri, Apr 1, 2016 at 11:32 AM, Andrew Purtell  wrote:
>
>> As a downstream consumer of Apache Hadoop 2.7.x releases, I expect we would
>> patch the release to revert HDFS-8791 before pushing it out to production.
>> For what it's worth.
>>
>>
>> On Fri, Apr 1, 2016 at 11:23 AM, Andrew Wang 
>> wrote:
>>
>> > One other thing I wanted to bring up regarding HDFS-8791, we haven't
>> > backported the parallel DN upgrade improvement (HDFS-8578) to branch-2.6.
>> > HDFS-8578 is a very important related fix since otherwise upgrade will be
>> > very slow.
>> >
>> > On Thu, Mar 31, 2016 at 10:35 AM, Andrew Wang 
>> > wrote:
>> >
>> > > As I expressed on HDFS-8791, I do not want to include this JIRA in a
>> > > maintenance release. I've only seen it crop up on a handful of our
>> > > customers' clusters, and large users like Twitter and Yahoo that seem
>> > > to be more affected are also the most able to patch this change in
>> > > themselves.
>> > >
>> > > Layout upgrades are quite disruptive, and I don't think it's worth
>> > > breaking upgrade and downgrade expectations when it doesn't affect the
>> > > (in my experience) vast majority of users.
>> > >
>> > > Vinod seemed to have a similar opinion in his comment on HDFS-8791, but
>> > > will let him elaborate.
>> > >
>> > > Best,
>> > > Andrew
>> > >
>> > > On Thu, Mar 31, 2016 at 9:11 AM, Sean Busbey 
>> > wrote:
>> > >
>> > >> As of 2 days ago, there were already 135 jiras associated with 2.7.3,
>> > >> if *any* of them end up introducing a regression the inclusion of
>> > >> HDFS-8791 means that folks will have cluster downtime in order to back
>> > >> things out. If that happens to any substantial number of downstream
>> > >> folks, or any particularly vocal downstream folks, then it is very
>> > >> likely we'll lose the remaining trust of operators for rolling out
>> > >> maintenance releases. That's a pretty steep cost.
>> > >>
>> > >> Please do not include HDFS-8791 in any 2.6.z release. Folks having to
>> > >> be aware that an upgrade from e.g. 2.6.5 to 2.7.2 will fail is an
>> > >> unreasonable burden.
>> > >>
>> > >> I agree that this fix is important, I just think we should either cut
>> > >> a version of 2.8 that includes it or find a way to do it that gives an
>> > >> operational path for rolling downgrade.
>> > >>
>> > >> On Thu, Mar 31, 2016 at 10:10 AM, Junping Du 
>> > wrote:
>> > >> > Thanks for bringing up this topic, Sean.
>> > >> > When I released our latest Hadoop release 2.6.4, the patch of
>> > >> > HDFS-8791 hadn't been committed yet, so that's why we didn't discuss
>> > >> > this earlier.
>> > >> > I remember that in the JIRA discussion we treated this layout change
>> > >> > as a Blocker bug fixing a significant performance regression, not a
>> > >> > normal performance improvement. And I believe the HDFS community
>> > >> > already did their best, carefully and patiently, to deliver the fix
>> > >> > and other related patches (like the upgrade fix in HDFS-8578). Take
>> > >> > HDFS-8578 as an example: you can see 30+ rounds of patch review back
>> > >> > and forth by senior committers, not to mention the outstanding
>> > >> > performance test data in HDFS-8791.
>> > >> > I woul

[jira] [Created] (HADOOP-12921) hadoop-minicluster should depend on hadoop-auth

2016-03-13 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-12921:
---

 Summary: hadoop-minicluster should depend on hadoop-auth
 Key: HADOOP-12921
 URL: https://issues.apache.org/jira/browse/HADOOP-12921
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Haohui Mai


Downstream projects fail to spawn a {{MiniDFSCluster}} as AuthenticationFilter 
is not in the dependency of the {{hadoop-minicluster}} package:

{noformat}
java.lang.NoSuchMethodError: 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.constructSecretProvider(Ljavax/servlet/ServletContext;Ljava/util/Properties;Z)Lorg/apache/hadoop/security/authentication/util/SignerSecretProvider;
at 
org.apache.hadoop.http.HttpServer2.constructSecretProvider(HttpServer2.java:447)
at org.apache.hadoop.http.HttpServer2.(HttpServer2.java:340)
at org.apache.hadoop.http.HttpServer2.(HttpServer2.java:114)
at org.apache.hadoop.http.HttpServer2$Builder.build(HttpServer2.java:290)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:126)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:752)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:638)
at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:811)
at org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:795)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1488)
  ...
{noformat}
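
For context, a minimal sketch of the kind of downstream test that hits this; the
MiniDFSCluster builder API is the real one, while the test itself is hypothetical:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;

public class MiniClusterSmoke {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Building the cluster starts the NameNode HTTP server, which is where
    // the hadoop-auth AuthenticationFilter classes are pulled in; with only
    // hadoop-minicluster declared as a dependency this is the point where
    // the NoSuchMethodError above surfaces.
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
        .numDataNodes(1)
        .build();
    try {
      FileSystem fs = cluster.getFileSystem();
      fs.mkdirs(new Path("/smoke"));
      System.out.println("mini cluster up: " + fs.exists(new Path("/smoke")));
    } finally {
      cluster.shutdown();
    }
  }
}
{code}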





Re: node.js and more as dependencies

2016-03-02 Thread Haohui Mai
+1 on adding npm / gulp / mocha to be part of the build dependencies
(not runtime). The main benefit is modularity -- for example, in the
HDFS UI we manually duplicate the navigation bars and footer. We don't
have unit tests for them due to the lack of infrastructure.

In my opinion, introducing npm will effectively bridge the gap.

However, I'm not entirely convinced by the Ember.js argument -- I
understand it might provide better integration with Ambari, but there
are clear trends that the industry is moving to a more reactive UI
design. I think the decision to use ember.js exclusively might be worth
revisiting. To me it makes more sense to move both HDFS / YARN towards
React and go from there.

~Haohui

On Mon, Feb 29, 2016 at 5:14 PM, Wangda Tan  wrote:
> Hi Colin,
>
> Thanks for comment, I think your concerns are all valid but also arguable:
>
> First, YARN UI is different from HDFS UI, it is much more complex:
> 1. We have many data models, such as app/container/system/node, etc. UI of
> HDFS is more like a file explorer.
> 2. We plan to add richer data visualization to the YARN UI so admins can
> more easily identify what happened. For example, using a sunburst map
> to render usage breakdown of cluster/queue/user/app, etc.
> 3. We need to get data from different sources with different formats. For
> example, an application's running container information is stored at the RM
> and its finished container information is stored at the Timeline server. We
> need to get data from both daemons, normalize the data (because the REST
> APIs are different) and aggregate it.
>
> Ember.js could simplify what we should do a lot:
> - It has great data store design so we can easily normalize data model from
> different sources with different formats (adapter)
> - It can easily bind data model with view, so any changes to data store
> (like application status updated) can trigger page re-rendering without any
> additional works.
> - Besides binding data with view, it can also bind data to other computed
> properties. For example, if property of a model relies on another model,
> all properties/models will be updated altogether.
> - Integrates bower/broccoli/watchman to help with package
> management/build/development.
> - For other benefits, please refer to Why Ember?
>  slides.
>
> The plan for the nextgen YARN UI is not only to inherit and prettify the
> existing YARN UI. I hope it can let users get deep insight into what happens
> in their cluster. As you said, a simple JS framework can also achieve what
> we want to do, but using a well-designed framework avoids reinventing the
> wheel.
>
> Regarding your concerns about JS compilation/compaction, I think it does
> not conflict with open source: in the source folder (git repository), all
> code is readable. Compiled/compacted code only exists in released code. I
> agree that we don't need obfuscation at all, but source code compaction
> could increase performance a lot, since we could have heavy rendering tasks
> such as visualization from the statuses of 10K+ applications. Just like
> Hadoop's Java code, no user will try to get source code from a running
> cluster :).
>
> I will make sure the integration with Maven is as minimal as possible; we
> should only need one single sub-module, and limit all changes to that module
> only.
>
> Please let me know if you have any other concerns.
>
> Thanks,
> Wangda
>
> On Mon, Feb 29, 2016 at 8:51 AM, Colin P. McCabe  wrote:
>
>> Hmm.  Devil's advocate here: Do we really need to have a "JS build"?
>>
>> The main use-cases for "JS builds" seem to be if you want to minimize
>> or obfuscate your JS.  Given that this is open source code,
>> obfuscation seems unnecessary.  Given that it's a low-traffic
>> management interface, minimizing the JS seems like a premature
>> optimization.
>>
>> The HDFS user interface is based on dust.js, and it just requires JS
>> files to be copied into the correct location.
>>
>> Perhaps there are advantages to ember.js that I am missing.  But
>> there's also a big advantage to not having to manage a node.js build
>> system separate from Maven and CMake.  What do you think?
>>
>> best,
>> Colin
>>
>> On Thu, Feb 25, 2016 at 11:18 AM, Wangda Tan  wrote:
>> > Hi Allen,
>> >
>> > YARN-3368 is using Ember.JS and Ember.JS depends on npm (Node.JS Package
>> > Manager) to manage packages.
>> >
>> > One thing to clarify is: npm dependency is only required by build stage
>> (JS
>> > build is stitching source files and renaming variables). After JS build
>> > completes, there's no dependency of Node.JS any more. Server such as RM
>> > only needs to run a HTTP server to host JS files, and browser will take
>> > care of page rendering, just like HDFS/Spark/Mesos UI.
>> >
>> > There are a couple of other Apache projects using Ember.JS, such as
>> > Tez/Ambari. Ember.JS can help front-end developers more easily manage
>> > models, pages, events and packages.
>> >
>> > Thanks,
>> > Wa

Re: Introduce Apache Kerby to Hadoop

2016-02-29 Thread Haohui Mai
Handling Kerberos is similar to what we have done for WebHDFS now. Kerby
will be in the picture but things are much simpler.

If protobuf is a concern, why not shade it into hadoop-common? The
generated binaries might not be compatible but the wire format is.


On Mon, Feb 29, 2016 at 1:55 AM Steve Loughran 
wrote:

>
> > On 27 Feb 2016, at 19:02, Haohui Mai  wrote:
> >
> > Have we evaluated GRPC? A robust RPC requires significant effort.
> Migrating
> > to GRPC can save ourselves a lot of headache.
> >
>
> That's the google protobuf 3 based GRPC? More specifically,
> protobufVersion = '3.0.0-beta-2'?
>
> That's successor to the protobuf.jar whose Alejandro-choreographed
> cross-project upgrade caused the "great protobuf upgrade of 2013"? That's
> the protobuf library where some of us have seriously considered forking the
> library so that we could have a version of protobuf which would link across
> java classes generated with older versions?
>
> We have enough problems working with a released version of protobuf
> breaking across minor point releases, whose guava JARs are a recurrent
> source of cross version compatibility pain?
>
>
> I would rather stab myself in the leg with a fork —repeatedly— than adopt
> something based on a beta-release of a google artifact as critical path of
> the Hadoop RPC chain.
>
> While google are pretty obsessive about wire format compatibility across
> languages and versions, we just can't trust google to maintain binary
> compatibility, primarily due to a build process which clean builds
> everything from scratch. They don't have the same problem of trying to
> nudge things up across a loosely coupled set of projects, including those
> who still have requirements of JAR-sharing compatibility with older hadoop
> versions. Indeed, for those projects, being backwards compatible with
> Hadoop 1.x (no protobuf) is easier than working with Hadoop 2.205, purely
> due to to that protobuf difference.
>
>
>  Even when protobuf 3.0 finally ships, we should hold back even adopting
> it for its current role until 3.1 comes out so we can asses google's
> compatibility policy in the 3.x line.
>
>
> > Haohui
> > On Sat, Feb 27, 2016 at 1:35 AM Andrew Purtell  >
> > wrote:
> >
>> I get excited thinking about the prospect of better performance with
> >> auth-conf QoP. HBase RPC is an increasingly distant fork but still close
> >> enough to Hadoop in that respect. Our bulk data transfer protocol isn't
> a
> >> separate thing like in HDFS, which avoids a SASL wrapped
> implementation, so
> >> we really suffer when auth-conf is negotiated. You'll see the same
> impact
> >> where there might be a high frequency of NameNode RPC calls or similar
> >> still. Throughput drops 3-4x, or worse.
> >>
> >>> On Feb 22, 2016, at 4:56 PM, Zheng, Kai  wrote:
> >>>
> >>> Thanks for the confirm and further inputs, Steve.
> >>>
> >>>>> the latter would dramatically reduce the cost of wire-encrypting IPC.
> >>> Yes to optimize Hadoop IPC/RPC encryption is another opportunity Kerby
> >> can help with, it's possible because we may hook Chimera or AES-NI thing
> >> into the Kerberos layer by leveraging the Kerberos library. As it may be
> >> noted, HADOOP-12725 is on the going for this aspect. There may be good
> >> result and further update on this recently.
> >>>
> >>>>> For now, I'd like to see basic steps -upgrading minkdc to krypto, see
> >> how it works.
> >>> Yes, starting with this initial steps upgrading MiniKDC to use Kerby is
> >> the right thing we could do. After some interactions with Kerby
> project, we
> >> may have more ideas how to proceed on the followings.
> >>>
> >>>>> Long term, I'd like Hadoop 3 to be Kerby-ized
> >>> This sounds great! With necessary support from the community like
> >> feedback and patch reviewing, we can speed up the related work.
> >>>
> >>> Regards,
> >>> Kai
> >>>
> >>> -Original Message-
> >>> From: Steve Loughran [mailto:ste...@hortonworks.com]
> >>> Sent: Monday, February 22, 2016 6:51 PM
> >>> To: common-dev@hadoop.apache.org
> >>> Subject: Re: Introduce Apache Kerby to Hadoop
> >>>
> >>>
> >>>
> >>> I've discussed this offline with Kai, as part of the "let's fix
> >> kerberos" project. Not only is it a better Kerbero

Re: Introduce Apache Kerby to Hadoop

2016-02-27 Thread Haohui Mai
Have we evaluated GRPC? A robust RPC requires significant effort. Migrating
to GRPC can save ourselves a lot of headache.

Haohui
On Sat, Feb 27, 2016 at 1:35 AM Andrew Purtell 
wrote:

> I get excited thinking about the prospect of better performance with
> auth-conf QoP. HBase RPC is an increasingly distant fork but still close
> enough to Hadoop in that respect. Our bulk data transfer protocol isn't a
> separate thing like in HDFS, which avoids a SASL wrapped implementation, so
> we really suffer when auth-conf is negotiated. You'll see the same impact
> where there might be a high frequency of NameNode RPC calls or similar
> still. Throughput drops 3-4x, or worse.
>
> > On Feb 22, 2016, at 4:56 PM, Zheng, Kai  wrote:
> >
> > Thanks for the confirm and further inputs, Steve.
> >
> >>> the latter would dramatically reduce the cost of wire-encrypting IPC.
> > Yes to optimize Hadoop IPC/RPC encryption is another opportunity Kerby
> can help with, it's possible because we may hook Chimera or AES-NI thing
> into the Kerberos layer by leveraging the Kerberos library. As it may be
> noted, HADOOP-12725 is on the going for this aspect. There may be good
> result and further update on this recently.
> >
> >>> For now, I'd like to see basic steps -upgrading minkdc to krypto, see
> how it works.
> > Yes, starting with this initial steps upgrading MiniKDC to use Kerby is
> the right thing we could do. After some interactions with Kerby project, we
> may have more ideas how to proceed on the followings.
> >
> >>> Long term, I'd like Hadoop 3 to be Kerby-ized
> > This sounds great! With necessary support from the community like
> feedback and patch reviewing, we can speed up the related work.
> >
> > Regards,
> > Kai
> >
> > -Original Message-
> > From: Steve Loughran [mailto:ste...@hortonworks.com]
> > Sent: Monday, February 22, 2016 6:51 PM
> > To: common-dev@hadoop.apache.org
> > Subject: Re: Introduce Apache Kerby to Hadoop
> >
> >
> >
> > I've discussed this offline with Kai, as part of the "let's fix
> kerberos" project. Not only is it a better Kerberos engine, we can do more
> diagnostics, get better algorithms and ultimately get better APIs for doing
> Kerberos and SASL —the latter would dramatically reduce the cost of
> wire-encrypting IPC.
> >
> > For now, I'd like to see basic steps -upgrading minkdc to krypto, see
> how it works.
> >
> > Long term, I'd like Hadoop 3 to be Kerby-ized
> >
> >
> >> On 22 Feb 2016, at 06:41, Zheng, Kai  wrote:
> >>
> >> Hi folks,
> >>
> >> I'd like to mention Apache Kerby [1] here to the community and propose
> to introduce the project to Hadoop, a sub project of Apache Directory
> project.
> >>
> >> Apache Kerby is a Kerberos-centric project and aims to provide the first
> >> Java Kerberos library that contains both client and server support. The
> >> relevant features include:
> >> * full Kerberos encryption types aligned with both MIT KDC and MS AD;
> >> * client APIs that allow login via password, credential cache, keytab
> >>   file, etc.;
> >> * utilities to generate, operate on, and inspect keytab and credential
> >>   cache files;
> >> * a simple KDC server that borrows some ideas from Hadoop-MiniKDC and can
> >>   be used in tests with minimal overhead in external dependencies;
> >> * a brand new token mechanism that can be experimentally used; with it a
> >>   JWT token can be exchanged for a TGT or service ticket;
> >> * anonymous PKINIT support, which can be experimentally used, as the
> >>   first Java library that supports this major Kerberos extension.
> >>
> >> The project stands alone and is ensured to only depend on JRE for
> easier usage. It has made the first release (1.0.0-RC1) and 2nd release
> (RC2) is upcoming.
> >>
> >>
> >> As an initial step, this proposal suggests using Apache Kerby to
> upgrade the existing codes related to ApacheDS for the Kerberos support.
> The advantageous:
> >>
> >> 1. The kerby-kerb library is all the need, which is purely in Java,
> >> SLF4J is the only dependency, the whole is rather small;
> >>
> >> 2. There is a SimpleKDC in the library for test usage, which borrowed
> >> the MiniKDC idea and implemented all the support existing in MiniKDC.
> >> We had a POC that rewrote MiniKDC using Kerby SimpleKDC and it works
> >> fine;
> >>
> >> 3. Full Kerberos encryption types (many of them are not available in
> >> JRE but supported by major Kerberos vendors) and more functionalities
> >> like credential cache support;
> >>
> >> 4. Perhaps the biggest concern: Hadoop MiniKDC etc. depend on the
> >> old Kerberos implementation in the Directory Server project, but that
> >> implementation is no longer maintained. The Directory project has a
> >> plan to replace the implementation using Kerby. MiniKDC can use Kerby
> >> directly to simplify the deps;
> >>
> >> 5. Extensively tested with all kinds of unit tests, already being used
> >> for some time (like PSU), even in production environment;
> >>
> >> 6. Actively developed, and can be fixed and rel

Re: Google Cloud Storage connector into Hadoop

2015-12-07 Thread Haohui Mai
Hi,

Thanks for reaching out. It would be great to see this in the Hadoop ecosystem.

In Hadoop we have AWS S3 support. IMO they address similar use cases,
so I think it should be relatively straightforward to adopt the code.

The only catch in my head right now is properly isolating the dependencies.
Not only does the code need to be put into a separate module, but many
Hadoop applications also depend on different versions of Guava. I
think it might be a problem that needs some attention at the very
beginning.
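
To make the comparison with the S3 support concrete, a rough usage sketch; the
gs:// scheme, bucket name, and implementation class name here are assumptions for
illustration, following the fs.<scheme>.impl pattern of the existing connectors:

{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GcsListing {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumed wiring: the connector registers a FileSystem implementation
    // for the gs:// scheme, analogous to the existing S3 connectors.
    conf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
    FileSystem fs = FileSystem.get(URI.create("gs://example-bucket/"), conf);
    for (FileStatus status : fs.listStatus(new Path("gs://example-bucket/data/"))) {
      System.out.println(status.getPath() + " " + status.getLen());
    }
  }
}
{code}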

Please feel free to reach out if you have any other questions.

Regards,
Haohui


On Mon, Dec 7, 2015 at 2:35 PM, James Malone
 wrote:
> Hello,
>
> We're from a team within Google Cloud Platform focused on OSS and data
> technologies, especially Hadoop (and Spark.) Before we cut a JIRA for
> something we’d like to do, we wanted to reach out to this list to ask a two
> quick questions, describe our proposed action, and check for any major
> objections.
>
> Proposed action:
> We have a Hadoop connector[1] (more info[2]) for Google Cloud Storage (GCS)
> which we have been building and maintaining for some time. After we clean
> up our code and tests to conform (to these[3] and other requirements) we
> would like to contribute it to Hadoop. We have many customers using the
> connector in high-throughput production Hadoop clusters; we’d like to make
> it easier and faster to use Hadoop and GCS.
>
> Timeline:
> Presently, we are working on the beta of Google Cloud Dataproc[4] which
> limits our time a bit, so we’re targeting late Q1 2016 for creating a JIRA
> issue and adapting our connector code as needed.
>
> Our (quick) questions:
> * Do we need to take any (non-coding) action for this beyond submitting a
> JIRA when we are ready?
> * Are there any up-front concerns or questions which we can (or will need
> to) address?
>
> Thank you!
>
> James Malone
> On behalf of the Google Big Data OSS Engineering Team
>
> Links:
> [1] - https://github.com/GoogleCloudPlatform/bigdata-interop/tree/master/gcs
> [2] - https://cloud.google.com/hadoop/google-cloud-storage-connector
> [3] - https://github.com/GoogleCloudPlatform/bigdata-interop/tree/master/gcs
> [4] - https://cloud.google.com/dataproc


Re: Nightly Jenkins job for maintenance releases

2015-12-04 Thread Haohui Mai
Ping...

I would really appreciate it if I could have permissions to create Jenkins
jobs so that I can tweak things to help the release process.

Thanks,
Haohui

On Mon, Nov 30, 2015 at 1:42 PM, Sangjin Lee  wrote:
> FWIW when I used it during the 2.6.2 release (thanks Andrew for setting
> this up), the builds never finished completely due to memory issues. Some
> tuning and debugging would be needed to make this work reliably.
>
> Sangjin
>
> On Mon, Nov 30, 2015 at 11:18 AM, Andrew Wang 
> wrote:
>
>> I set up a parameterized branch-2 job during the 2.6.2 release process, so
>> you can do on-demand builds there for whatever branch:
>>
>>
>> https://builds.apache.org/view/H-L/view/Hadoop/job/Hadoop-branch2-parameterized/
>>
>> We already have a nightly branch-2 jobs and the set of flaky tests normally
>> isn't that different from branch to branch, so the parameterized job might
>> suffice for release purposes.
>>
>> Best,
>> Andrew
>>
>> On Mon, Nov 30, 2015 at 9:58 AM, Haohui Mai  wrote:
>>
>> > Hi,
>> >
>> > I'm helping Junping to get Hadoop 2.6.3 out of the door. Sanjin found
>> > out that there are some unit tests that are failing.
>> >
>> > I wonder, is it possible to set up some nightly Jenkins jobs for the
>> > maintenance lines so that we have better ideas on the quality of the
>> > releases? We can just turn it on during the release time to minimize
>> > the work on the Jenkins machines.
>> >
>> > Does it sound reasonable? I appreciate if someone can grant me the
>> > permissions on ASF Jenkins so that I can work on this.
>> >
>> > Regards,
>> > Haohui
>> >
>>


Nightly Jenkins job for maintenance releases

2015-11-30 Thread Haohui Mai
Hi,

I'm helping Junping to get Hadoop 2.6.3 out of the door. Sangjin found
out that some unit tests are failing.

I wonder, is it possible to set up some nightly Jenkins jobs for the
maintenance lines so that we have a better idea of the quality of the
releases? We can turn them on only around release time to minimize
the load on the Jenkins machines.

Does it sound reasonable? I would appreciate it if someone could grant me
the permissions on ASF Jenkins so that I can work on this.

Regards,
Haohui


[jira] [Resolved] (HADOOP-12601) findbugs highlights problem with FsPermission

2015-11-25 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HADOOP-12601.
-
Resolution: Duplicate

Fixed in HDFS-9451.

> findbugs highlights problem with FsPermission
> -
>
> Key: HADOOP-12601
> URL: https://issues.apache.org/jira/browse/HADOOP-12601
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build, io
>Affects Versions: 3.0.0
> Environment: yetus
>Reporter: Steve Loughran
>
> Findbugs is warning of a problem in {{FsPermission}}
> {code}
> n class org.apache.hadoop.fs.permission.FsPermission
> In method org.apache.hadoop.fs.permission.FsPermission.getUMask(Configuration)
> Local variable named oldUmask
> At FsPermission.java:[line 249]
> {code}
> This may actually be a sign of a bug in the code, but as it's reading a key 
> tagged as deprecated since 2010 and to be culled in 0.23, maybe cutting the 
> line is the strategy. After all, if the code has been broken, and nobody 
> complained, that deprecation worked





[jira] [Created] (HADOOP-12592) Remove guava usage in the hdfs-client module

2015-11-23 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-12592:
---

 Summary: Remove guava usage in the hdfs-client module
 Key: HADOOP-12592
 URL: https://issues.apache.org/jira/browse/HADOOP-12592
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Haohui Mai


The following classes in hdfs-client use Google's guava library:

{noformat}
./src/main/java/org/apache/hadoop/hdfs/BlockReaderFactory.java
./src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
./src/main/java/org/apache/hadoop/hdfs/ClientContext.java
./src/main/java/org/apache/hadoop/hdfs/DFSClient.java
./src/main/java/org/apache/hadoop/hdfs/DFSClientFaultInjector.java
./src/main/java/org/apache/hadoop/hdfs/DFSInotifyEventInputStream.java
./src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
./src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java
./src/main/java/org/apache/hadoop/hdfs/DFSStripedInputStream.java
./src/main/java/org/apache/hadoop/hdfs/DFSStripedOutputStream.java
./src/main/java/org/apache/hadoop/hdfs/DFSUtilClient.java
./src/main/java/org/apache/hadoop/hdfs/DataStreamer.java
./src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
./src/main/java/org/apache/hadoop/hdfs/KeyProviderCache.java
./src/main/java/org/apache/hadoop/hdfs/NameNodeProxiesClient.java
./src/main/java/org/apache/hadoop/hdfs/PeerCache.java
./src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java
./src/main/java/org/apache/hadoop/hdfs/StripedDataStreamer.java
./src/main/java/org/apache/hadoop/hdfs/XAttrHelper.java
./src/main/java/org/apache/hadoop/hdfs/client/HdfsDataInputStream.java
./src/main/java/org/apache/hadoop/hdfs/client/HdfsDataOutputStream.java
./src/main/java/org/apache/hadoop/hdfs/client/impl/DfsClientConf.java
./src/main/java/org/apache/hadoop/hdfs/client/impl/LeaseRenewer.java
./src/main/java/org/apache/hadoop/hdfs/protocol/BlockStoragePolicy.java
./src/main/java/org/apache/hadoop/hdfs/protocol/CacheDirectiveInfo.java
./src/main/java/org/apache/hadoop/hdfs/protocol/CacheDirectiveIterator.java
./src/main/java/org/apache/hadoop/hdfs/protocol/DatanodeID.java
./src/main/java/org/apache/hadoop/hdfs/protocol/LocatedBlock.java
./src/main/java/org/apache/hadoop/hdfs/protocol/SnapshotDiffReport.java
./src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PacketHeader.java
./src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PacketReceiver.java
./src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/PipelineAck.java
./src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/DataTransferSaslUtil.java
./src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/sasl/SaslDataTransferClient.java
./src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientDatanodeProtocolTranslatorPB.java
./src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java
./src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
./src/main/java/org/apache/hadoop/hdfs/server/datanode/BlockMetadataHeader.java
./src/main/java/org/apache/hadoop/hdfs/shortcircuit/DfsClientShm.java
./src/main/java/org/apache/hadoop/hdfs/shortcircuit/DfsClientShmManager.java
./src/main/java/org/apache/hadoop/hdfs/shortcircuit/DomainSocketFactory.java
./src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitCache.java
./src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitReplica.java
./src/main/java/org/apache/hadoop/hdfs/shortcircuit/ShortCircuitShm.java
./src/main/java/org/apache/hadoop/hdfs/util/ByteArrayManager.java
./src/main/java/org/apache/hadoop/hdfs/util/StripedBlockUtil.java
./src/main/java/org/apache/hadoop/hdfs/web/ByteRangeInputStream.java
./src/main/java/org/apache/hadoop/hdfs/web/JsonUtilClient.java
./src/main/java/org/apache/hadoop/hdfs/web/TokenAspect.java
./src/main/java/org/apache/hadoop/hdfs/web/URLConnectionFactory.java
./src/main/java/org/apache/hadoop/hdfs/web/WebHdfsFileSystem.java
./src/main/java/org/apache/hadoop/hdfs/web/resources/UserParam.java
./src/test/java/org/apache/hadoop/hdfs/TestPeerCache.java
./src/test/java/org/apache/hadoop/hdfs/client/impl/TestLeaseRenewer.java
./src/test/java/org/apache/hadoop/hdfs/web/TestByteRangeInputStream.java
./src/test/java/org/apache/hadoop/hdfs/web/TestURLConnectionFactory.java
{noformat}

Guava has created quite a few dependency headaches for downstream projects; it
would be nice to not use Guava code in the hdfs-client module.
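
A typical substitution, sketched here as an assumption of what the cleanup could
look like (the helper below is hypothetical), is replacing Guava calls with JDK
equivalents where one exists:

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class GuavaFreeExample {
  // Before: Preconditions.checkNotNull(src, "src is null");
  // Before: Lists.newArrayList(src);
  public static List<String> copyNonNull(List<String> src) {
    Objects.requireNonNull(src, "src is null"); // JDK replacement for Preconditions.checkNotNull
    return new ArrayList<>(src);                // JDK replacement for Lists.newArrayList
  }

  public static void main(String[] args) {
    System.out.println(copyNonNull(Arrays.asList("a", "b")));
  }
}
{code}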





Re: continuing releases on Apache Hadoop 2.6.x

2015-11-18 Thread Haohui Mai
Hi,

I can help on releasing 2.6.3.

~Haohui


On Wed, Nov 18, 2015 at 8:20 AM, Chris Trezzo  wrote:
> Thanks Junping for the clarification! It was not my intention to violate
> the rules. I would be happy to work with you and help you manage the
> release in whatever way is most effective.
>
> Chris
>
> On Wednesday, November 18, 2015, Junping Du  wrote:
>
>> Thanks Chris Trezzo for volunteer on helping 2.6.3 release. I think
>> Sangjin was asking for a committer to serve as release manager for 2.6.3
>> according to Apache rules:
>> http://www.apache.org/dev/release-publishing.html.
>> I would like to serve as that role to work closely with you and Sangjin on
>> 2.6.3 release if no objects from others.
>>
>> Thanks,
>>
>> Junping
>> 
>> From: Chris Trezzo >
>> Sent: Wednesday, November 18, 2015 1:13 AM
>> To: yarn-...@hadoop.apache.org 
>> Cc: common-dev@hadoop.apache.org ;
>> hdfs-...@hadoop.apache.org ; mapreduce-...@hadoop.apache.org
>> 
>> Subject: Re: continuing releases on Apache Hadoop 2.6.x
>>
>> Hi Sangjin,
>>
>> I would be happy to volunteer to work with you as a release manager for
>> 2.6.3. Shooting for a time in early December seems reasonable to me. I also
>> agree that if we miss that window, January would be the next best option.
>>
>> Thanks,
>> Chris
>>
>> On Tue, Nov 17, 2015 at 5:10 PM, Sangjin Lee > > wrote:
>>
>> > I'd like to pick up this email discussion again. It is time that we
>> > started thinking about the next release in the 2.6.x line. IMO we want
>> > to walk the balance between maintaining a reasonable release cadence and
>> > getting a good amount of high-quality fixes. The timeframe is a little
>> > tricky as the holidays are approaching. If we have enough fixes
>> > accumulated in branch-2.6, some time early December might be a good
>> > target for cutting the first release candidate. Once we miss that
>> > window, I think we are looking at next January. I'd like to hear your
>> > thoughts on this.
>> >
>> > It'd be good if someone can volunteer for the release manager for 2.6.3.
>> > I'd be happy to help out in any way I can. Thanks!
>> >
>> > Regards,
>> > Sangjin
>> >
>> > On Mon, Nov 2, 2015 at 11:45 AM, Vinod Vavilapalli <
>> > vino...@hortonworks.com >
>> > wrote:
>> >
>> > > Just to stress on the following, it is very important that any critical
>> > > bug-fixes that we push into 2.8.0 or even trunk, we should consider them
>> > > for 2.6.3 and 2.7.3 if it makes sense. This is the only way we can avoid
>> > > extremely long release cycles like that of 2.6.1.
>> > >
>> > > Also, to clarify a little, use Target-version if you want a discussion of
>> > > the backport, but if you do end up backporting patches after that, you
>> > > should set the fix-version to be 2.6.1.
>> > >
>> > > Thanks
>> > > +Vinod
>> > >
>> > >
>> > > > On Nov 2, 2015, at 11:29 AM, Sangjin Lee > > wrote:
>> > > >
>> > > > As you may have seen, 2.6.2 is out. I have also retargeted all
>> > > > open issues that were targeted for 2.6.2 to 2.6.3.
>> > > >
>> > > > Continuing the discussion in the email thread here, I'd like us to
>> > > > maintain the cadence of monthly point releases in the 2.6.x line. It
>> > > > would be great if we can have 2.6.3 released before the year-end
>> > > > holidays.
>> > > >
>> > > > If you have any bugfixes and improvements that are targeted for 2.7.x
>> > > > (or 2.8) that you think are applicable to 2.6.x, please *set the
>> > > > target version to 2.6.3* and merge them to branch-2.6. Please use your
>> > > > judgment in terms of the applicability and quality of the changes so
>> > > > that we can ensure each point release is consistently better quality
>> > > > than the previous one. Thanks everyone!
>> > > >
>> > > > Regards,
>> > > > Sangjin
>> > >
>> > >
>> >
>>


[jira] [Created] (HADOOP-12579) Deprecate and remove WriteableRPCEngine

2015-11-17 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-12579:
---

 Summary: Deprecate and remove WriteableRPCEngine
 Key: HADOOP-12579
 URL: https://issues.apache.org/jira/browse/HADOOP-12579
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Haohui Mai


The {{WriteableRPCEngine}} depends on Java's serialization mechanisms for RPC
requests. Without proper checks, it has been shown that this can lead to security
vulnerabilities such as remote code execution (e.g., COLLECTIONS-580,
HADOOP-12577).

The current implementation has already migrated from {{WriteableRPCEngine}} to
{{ProtobufRPCEngine}}. This jira proposes to deprecate
{{WriteableRPCEngine}} in branch-2 and to remove it in trunk.
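
As a rough illustration, binding a protocol to the protobuf-based engine looks like
this; {{RPC.setProtocolEngine}} and {{ProtobufRpcEngine}} are the existing classes in
org.apache.hadoop.ipc, while the protocol interface below is a placeholder:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.ProtobufRpcEngine;
import org.apache.hadoop.ipc.RPC;

public class RpcEngineExample {
  // Placeholder protocol; a real one is generated from a .proto service
  // definition and annotated with @ProtocolInfo.
  public interface MyProtocolPB {}

  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Bind the protocol to the protobuf-based engine rather than the
    // Writable-based one this jira proposes to deprecate.
    RPC.setProtocolEngine(conf, MyProtocolPB.class, ProtobufRpcEngine.class);
    System.out.println("engine set for " + MyProtocolPB.class.getName());
  }
}
{code}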






Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-11 Thread Haohui Mai
bq. If and only if they take the Hadoop class path at face value.
Many applications don’t because of conflicting dependencies and
instead import specific jars.

We do make the assumption that applications need to pick up all the
dependencies (either automatically or manually). The situation is
similar to adding a new dependency to hdfs in a minor release.

Maven / gradle obviously help, but I'd love to hear more about how
you get it to work. In trunk hadoop-env.sh adds 118 jars into the
class path. Are you manually importing 118 jars for every single
application?



On Wed, Nov 11, 2015 at 3:09 PM, Haohui Mai  wrote:
> bq. currently pulling in hadoop-client gives downstream apps
> hadoop-hdfs-client, but not hadoop-hdfs server side, right?
>
> Right now hadoop-client pulls in hadoop-hdfs directly to ensure a
> smooth transition. Maybe we can revisit the decision in the 2.9 / 3.x?
>
> On Wed, Nov 11, 2015 at 3:00 PM, Steve Loughran  
> wrote:
>>
>>> On 11 Nov 2015, at 22:15, Haohui Mai  wrote:
>>>
>>> bq.  it basically makes the assumption that everyone recompiles for
>>> every minor release.
>>>
>>> I don't think that the statement holds. HDFS-6200 keeps classes in the
>>> same package. hdfs-client becomes a transitive dependency of the
>>> original hdfs jar.
>>>
>>> Applications continue to work without recompilation as the classes
>>> will be in the same name and will be available in the classpath. They
>>> have the option of switching to depending only on hdfs-client to
>>> minimize the dependency when they are comfortable.
>>>
>>> I'm not claiming that there are no bugs in HDFS-6200, but just like
>>> other features we discover bugs and fix them continuously.
>>>
>>> ~Haohui
>>>
>>
>> currently pulling in hadoop-client gives downstream apps hadoop-hdfs-client, 
>> but not hadoop-hdfs server side, right?


Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-11 Thread Haohui Mai
bq. currently pulling in hadoop-client gives downstream apps
hadoop-hdfs-client, but not hadoop-hdfs server side, right?

Right now hadoop-client pulls in hadoop-hdfs directly to ensure a
smooth transition. Maybe we can revisit the decision in the 2.9 / 3.x?

On Wed, Nov 11, 2015 at 3:00 PM, Steve Loughran  wrote:
>
>> On 11 Nov 2015, at 22:15, Haohui Mai  wrote:
>>
>> bq.  it basically makes the assumption that everyone recompiles for
>> every minor release.
>>
>> I don't think that the statement holds. HDFS-6200 keeps classes in the
>> same package. hdfs-client becomes a transitive dependency of the
>> original hdfs jar.
>>
>> Applications continue to work without recompilation as the classes
>> will be in the same name and will be available in the classpath. They
>> have the option of switching to depending only on hdfs-client to
>> minimize the dependency when they are comfortable.
>>
>> I'm not claiming that there are no bugs in HDFS-6200, but just like
>> other features we discover bugs and fix them continuously.
>>
>> ~Haohui
>>
>
> currently pulling in hadoop-client gives downstream apps hadoop-hdfs-client, 
> but not hadoop-hdfs server side, right?


Re: [DISCUSS] Looking to a 2.8.0 release

2015-11-11 Thread Haohui Mai
bq.  it basically makes the assumption that everyone recompiles for
every minor release.

I don't think that the statement holds. HDFS-6200 keeps classes in the
same package. hdfs-client becomes a transitive dependency of the
original hdfs jar.

Applications continue to work without recompilation as the classes
keep the same names and will be available in the classpath. They
have the option of switching to depending only on hdfs-client to
minimize the dependency when they are comfortable.

I'm not claiming that there are no bugs in HDFS-6200, but just like
other features we discover bugs and fix them continuously.

~Haohui


On Wed, Nov 11, 2015 at 12:43 PM, Allen Wittenauer  wrote:
>
>> On Nov 11, 2015, at 12:13 PM, Vinod Vavilapalli  
>> wrote:
>>
>>— HDFS-6200 Create a separate jar for hdfs-client: Compatible improvement 
>> - no dimension of alpha/betaness here.
>
> IMO: this feels like a massive break in backwards compatibility. 
> Anyone who is looking for specific methods in specific jars are going to have 
> a bad time. Also, it seems as though every week a new issue crops up that is 
> related to this change.  Is Slider still having problems with it?  The 
> reasoning “well, the pom sets the dependencies so it’s ok” feels like an 
> *extremely weak* reason this wasn’t marked incompatible— it basically makes 
> the assumption that everyone recompiles for every minor release.
>
>>— Compatibility tools to catch backwards, forwards compatibility issues 
>> at patch submission, release times. Some of it is captured at YARN-3292. 
>> This also involves resurrecting jdiff 
>> (HADOOP-11776/YARN-3426/MAPREDUCE-6310) and/or investing in new tools.
>
> There has been talk in the past about adding Java ACC support to 
> Yetus.
>
>> Thoughts?
>
> I’d rather see efforts on 3.x than another disastrous 2.x release.  
> The track record is not good.  At least a new major will signify that danger 
> looms ahead.  We’re already treating 2.x minor releases as effectively major 
> (see the list of incompatible JIRAs) so what difference does it make if we do 
> 2.x vs. 3.x anyway?
>
>>
>> Thanks
>> +Vinod
>> PS:As you may have noted above, this time around, I want to do something 
>> that we’ve always wanted to do, but never explicitly did. I’m calling out 
>> readiness of each feature as they stand today so we can inform our users 
>> better of what they can start relying on in production clusters.
>
> … except some of these changes are so deep reaching that even if you 
> don’t use the feature, you’re still impacted by it ...
>
>


Re: Github integration for Hadoop

2015-11-10 Thread Haohui Mai
I have some scripts that are tailored for the current git workflow,
which are available at

https://github.com/haohui/hdcli

It's relatively straightforward to make it support github.

~Haohui

On Tue, Nov 10, 2015 at 9:31 AM, Karthik Kambatla  wrote:
> Owen: Thanks for putting the documentation together. I think I understand
> the proposal better now.
>
> I agree that reviewing on github is easier than having to download the
> patch, apply locally, and for reviews copy-paste code to the JIRA. This, we
> get from RB or any other review tool as well.
>
> The committer process seems rather involved though. I am afraid the
> complicated process will adversely affect the amount of time people spend
> committing. The review is simpler though. By any chance, are there scripts
> available (from Spark etc.) or can we put some together to download a patch
> based on PRs? If yes, we could commit it as we have been doing so far.
>
> While admitting to my bias towards gerrit, even objectively, I feel using
> gerrit as a contributor/committer is way simpler.
>
>
>
> On Fri, Nov 6, 2015 at 7:21 AM, Owen O'Malley  wrote:
>
>> On Sun, Nov 1, 2015 at 2:07 PM, Chris Douglas  wrote:
>>
>> > Wow, this happened quickly.
>> >
>> > Owen, could you please create a Wiki describing the proposal and
>> > cataloging infra references so others can understand the
>> > implementation in detail? Even after reading this thread, I'm still
>> > confused what changes this proposes and how the integration works. A
>> > document pairing open questions with answers/workarounds would help
>> > this converge.
>> >
>> > Ok, I used Mahout's page as a basis. Take a look:
>>
>> https://wiki.apache.org/hadoop/GithubIntegration
>>


Re: Github integration for Hadoop

2015-10-29 Thread Haohui Mai
+1

On Thu, Oct 29, 2015 at 10:55 AM, Hitesh Shah  wrote:
> +1 on supporting patch contributions through github pull requests.
>
> — Hitesh
>
> On Oct 29, 2015, at 10:47 AM, Owen O'Malley  wrote:
>
>> All,
>>   For code & patch review, many of the newer projects are using the Github
>> pull request integration. You can read about it here:
>>
>> https://blogs.apache.org/infra/entry/improved_integration_between_apache_and
>>
>> It basically lets you:
>> * have mirroring between comments on pull requests and jira
>> * lets you close pull requests
>> * have mirroring between pull request comments and the Apache mail lists
>>
>> Thoughts?
>> .. Owen
>


Re: hadoop-hdfs-client splitoff is going to break code

2015-10-23 Thread Haohui Mai
All tests that need to spin up a MiniDFSCluster will need to stay in
hadoop-hdfs. Other client only tests are being moved to the
hadoop-hdfs-client module, which is tracked in HDFS-9168.

~Haohui

On Fri, Oct 23, 2015 at 2:14 PM, Kihwal Lee
 wrote:
> I am not sure whether it was mentioned by anyone before, but I noticed that
> client-only changes do not trigger running any test in hdfs-precommit. This is
> because hadoop-hdfs-client does not have any tests.
> Kihwal
>
>   From: Colin P. McCabe 
>  To: "hdfs-...@hadoop.apache.org" 
> Cc: "common-dev@hadoop.apache.org" 
>  Sent: Monday, October 19, 2015 4:01 PM
>  Subject: Re: hadoop-hdfs-client splitoff is going to break code
>
> Thanks for being proactive here, Steve.  I think this is a good example of
> why this change should have been done in a branch rather than having been
> done directly in trunk.
>
> regards,
> Colin
>
>
>
>
> On Wed, Oct 14, 2015 at 10:36 AM, Steve Loughran 
> wrote:
>
>> just an FYI, the split off of hadoop hdfs into client and server is going
>> to break things.
>>
>> I know that, as my code is broken; DFSConfigKeys off the path,
>> HdfsConfiguration, the class I've been loading to force pickup of
>> hdfs-site.xml -all missing.
>>
>> This is because hadoop-client  POM now depends on hadoop-hdfs-client, not
>> hadoop-hdfs, so the things I'm referencing are gone. I'm particularly sad
>> about DfsConfigKeys, as everybody uses it as the one hard-coded resource of
>> HDFS constants, HDFS-6566 covering the issue of making this public,
>> something that's been sitting around for a year.
>>
>> I'm fixing my build by explicitly adding a hadoop-hdfs dependency.
>>
>> Any application which used stuff which has now been declared server-side
>> isn't going to compile any more, which does appear to break the
>> compatibility guidelines we've adopted, specifically "The hadoop-client
>> artifact (maven groupId:artifactId) stays compatible within a major release"
>>
>>
>> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html#Build_artifacts
>>
>>
>> We need to do one of
>>
>> 1. agree that this change, is considered acceptable according to policy,
>> and mark it as incompatible in hdfs/CHANGES.TXT
>> 2. Change the POMs to add both hdfs-client and -hdfs server in
>> hadoop-client -with downstream users free to exclude the server code
>>
>> We unintentionally caused similar grief with the move of the s3n clients
>> to hadoop-aws , HADOOP-11074 -something we should have picked up and -1'd.
>> This time we know the problems are going to arise, so let's explicitly make a
>> decision this time, and share it with our users.
>>
>> -steve
>>
>
>
>
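
For reference, a minimal sketch of the client-side pattern Steve describes above:
forcing hdfs-site.xml to load via HdfsConfiguration and reading constants from
DFSConfigKeys, both of which stay in the server-side hadoop-hdfs artifact rather
than moving to hadoop-hdfs-client:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class ClientConfigExample {
  public static void main(String[] args) {
    // Instantiating HdfsConfiguration forces hdfs-default.xml and
    // hdfs-site.xml onto the Configuration resource list.
    Configuration conf = new HdfsConfiguration();
    // DFSConfigKeys is the de-facto catalogue of HDFS configuration keys.
    String nn = conf.get(DFSConfigKeys.DFS_NAMENODE_RPC_ADDRESS_KEY, "not set");
    System.out.println("NameNode RPC address: " + nn);
  }
}
{code}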


[jira] [Created] (HADOOP-12507) Move retry policy to hadoop-common-client

2015-10-23 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-12507:
---

 Summary: Move retry policy to hadoop-common-client
 Key: HADOOP-12507
 URL: https://issues.apache.org/jira/browse/HADOOP-12507
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai


The retry policy is used by both HDFS and YARN clients to implement client-side 
HA failover. This jira proposes to move them to the hadoop-common-client module.
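
As a rough sketch of the API being moved: RetryPolicies and RetryProxy are the
existing classes in org.apache.hadoop.io.retry, while the service interface here
is a placeholder:

{code}
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;
import org.apache.hadoop.io.retry.RetryProxy;

public class RetryPolicyExample {
  // Placeholder client-side interface; in HDFS/YARN this would be the
  // protocol translator that talks to the active server.
  public interface EchoService {
    String echo(String message);
  }

  public static void main(String[] args) {
    // Retry up to 3 times with a fixed 1-second sleep between attempts.
    RetryPolicy policy =
        RetryPolicies.retryUpToMaximumCountWithFixedSleep(3, 1, TimeUnit.SECONDS);
    EchoService flaky = message -> {
      throw new RuntimeException("simulated transient failure");
    };
    // RetryProxy wraps the underlying implementation so calls are retried
    // according to the policy before the failure is surfaced to the caller.
    EchoService retrying = (EchoService) RetryProxy.create(EchoService.class, flaky, policy);
    try {
      retrying.echo("hello");
    } catch (Exception e) {
      System.out.println("gave up after retries: " + e.getMessage());
    }
  }
}
{code}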





Re: Status: hadoop-common precommit

2015-10-22 Thread Haohui Mai
bq. It’s an extremely rare event that this type of re-ordering occurs,
so risk/reward favors risk.

Unfortunately HDFS-9214 requires building hadoop-hdfs-client before
building hadoop-hdfs, which leads to the failure. Same problem for
HADOOP-12500 as it introduces a new module hadoop-common-client.

Can you please post a more detailed pointer on how to fix this?

~Haohui

On Wed, Oct 21, 2015 at 5:48 PM, Allen Wittenauer  wrote:
>
> On Oct 21, 2015, at 5:11 PM, Allen Wittenauer  wrote:
>
>>
>>> https://issues.apache.org/jira/browse/HADOOP-12500?focusedCommentId=14968065&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14968065
>>
>>   This is yetus.
>>
>>>
>>> Looks like a bug?
>
>
> Actually, looking closer at this last one, this is a fun one.  I’m 
> still experimenting, but it looks like a backwards incompatible change that 
> breaks the build dependency ordering. It’s trying to grab a jar from the 
> maven repos that doesn’t exist yet. This *will* succeed if the local cache 
> already has a copy of it. Chances are, if this commit were to go in, it’ll 
> break every build that doesn’t run at root until the master maven repo or the 
> local cache is up-to-date.  It’s a flag day for building trunk.
>
> Unlike the older version, the Yetus’ hadoop module avoids building at 
> root to speed things up  and has a list of modules to build in a particular 
> order and even injecting other compiles that normally wouldn’t happen in 
> order to improve test coverage.  So old test-patch would let this go through 
> unabated, the new one does not.  It’s an extremely rare event that this type 
> of re-ordering occurs, so risk/reward favors risk.
>
> Depending upon your point of view, this is a feature, not a bug.  
> File a bug (and, preferably, a patch) against Yetus to update the hadoop 
> personality to tell yetus that a new order is needed.


Re: Status: hadoop-common precommit

2015-10-21 Thread Haohui Mai
It seems like changes that involve multiple modules are failing consistently.

https://issues.apache.org/jira/browse/HDFS-9241?focusedCommentId=14968008&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14968008
https://issues.apache.org/jira/browse/HADOOP-12500?focusedCommentId=14968065&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14968065

Looks like a bug?

~Haohui

On Tue, Oct 20, 2015 at 2:23 AM, Steve Loughran  wrote:
>
>> On 19 Oct 2015, at 19:09, Andrew Wang  wrote:
>>
>> These tests should still be fixed up to use target/ instead of
>> build/test/data though. Steve, I'm happy to review if you're chasing these
>> down, or we could trade roles.
>
> I'm not going to look at it today, I've pushed up the branch to github if you 
> want to start with it
>
> https://github.com/steveloughran/hadoop-trunk/tree/stevel/HDFS-9263_build_test_data
>
> Looking at the test runs (I didn't run them locally as I now appear to run 
> out of file handles), it's clear that some of the tests expect restarted mini 
> dfs clusters to always use the same path; adding a random subdir there is 
> breaking things.
>
>
>
> 2015-10-20 03:54:11,141 [main] WARN  namenode.FSNamesystem 
> (FSNamesystem.java:loadFromDisk(682)) - Encountered exception loading fsimage
> org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory 
> /testptch/hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/3/TzpG5hegiz/name-0-1
>  is in an inconsistent state: storage directory does not exist or is not 
> accessible.
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:323)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:211)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:973)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:680)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:571)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:628)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:833)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:812)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1505)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNode(MiniDFSCluster.java:1247)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.configureNameService(MiniDFSCluster.java:1016)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.createNameNodesAndSetConf(MiniDFSCluster.java:888)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.initMiniDFSCluster(MiniDFSCluster.java:820)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster.(MiniDFSCluster.java:479)
> at 
> org.apache.hadoop.hdfs.MiniDFSCluster$Builder.build(MiniDFSCluster.java:438)
> at 
> org.apache.hadoop.hdfs.TestSetTimes.testTimes(TestSetTimes.java:197)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
> at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
> at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
> at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
> at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
> at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
> at 
> org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
> at 
> org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
> at 
> org.apa

[jira] [Created] (HADOOP-12500) Change pom.xml to create the hadoop-common-client project

2015-10-21 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-12500:
---

 Summary: Change pom.xml to create the hadoop-common-client project
 Key: HADOOP-12500
 URL: https://issues.apache.org/jira/browse/HADOOP-12500
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai


This jira tracks the changes of changing the pom.xml to create the 
hadoop-common-client project.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12499) Create a client jar for hadoop-common

2015-10-21 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-12499:
---

 Summary: Create a client jar for hadoop-common
 Key: HADOOP-12499
 URL: https://issues.apache.org/jira/browse/HADOOP-12499
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai


HDFS-6200 has refactored the HDFS client implementations into the 
{{hadoop-hdfs-client}} module. However, the client module still depends on 
{{hadoop-common}}, which contains both the server-side and client-side 
implementations.

This jira proposes to separate the client-side implementation of {{hadoop-common}} 
into a new module, {{hadoop-common-client}}, so that the YARN and HDFS clients 
no longer need to transitively bring in server-side dependencies.

Per feedback from [~steve_l] and [~cmccabe], the development will happen in a 
separate branch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: hadoop-hdfs-client splitoff is going to break code

2015-10-14 Thread Haohui Mai
Option 2 sounds good to me. It might make sense to make hadoop-client
depend directly on hadoop-hdfs?


Haohui
On Wed, Oct 14, 2015 at 10:56 AM larry mccay  wrote:

> Interesting...
>
> As long as #2 provides full backward compatibility and the ability to
> explicitly exclude the server dependencies that seems the best way to go.
> That would get my non-binding +1.
> :)
>
> Perhaps we could add another artifact called hadoop-thin-client that would
> not be backward compatible at some point?
>
> On Wed, Oct 14, 2015 at 1:36 PM, Steve Loughran 
> wrote:
>
> > just an FYI, the split off of hadoop hdfs into client and server is going
> > to break things.
> >
> > I know that, as my code is broken: DFSConfigKeys is off the path, and
> > HdfsConfiguration, the class I've been loading to force pickup of
> > hdfs-site.xml, is missing too.
> >
> > This is because hadoop-client  POM now depends on hadoop-hdfs-client, not
> > hadoop-hdfs, so the things I'm referencing are gone. I'm particularly sad
> > about DfsConfigKeys, as everybody uses it as the one hard-coded resource
> of
> > HDFS constants, HDFS-6566 covering the issue of making this public,
> > something that's been sitting around for a year.
> >
> > I'm fixing my build by explicitly adding a hadoop-hdfs dependency.
> >
> > Any application which used stuff which has now been declared server-side
> > isn't going to compile any more, which does appear to break the
> > compatibility guidelines we've adopted, specifically "The hadoop-client
> > artifact (maven groupId:artifactId) stays compatible within a major
> release"
> >
> >
> >
> http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Compatibility.html#Build_artifacts
> >
> >
> > We need to do one of
> >
> > 1. agree that this change is considered acceptable according to policy,
> > and mark it as incompatible in hdfs/CHANGES.TXT
> > 2. Change the POMs to add both hdfs-client and -hdfs server in
> > hadoop-client -with downstream users free to exclude the server code
> >
> > We unintentionally caused similar grief with the move of the s3n clients
> > to hadoop-aws , HADOOP-11074 -something we should have picked up and
> -1'd.
> > This time we know the problems are going to arise, so let's explicitly make a
> > decision, and share it with our users.
> >
> > -steve
> >
>
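
For illustration, here is the kind of downstream snippet that stops compiling after the split. The two HDFS imports and the constant are the ones named in the thread; the class around them is only an example:

{code}
// Compiles when hadoop-hdfs is on the classpath; once hadoop-client pulls in
// only hadoop-hdfs-client, the two org.apache.hadoop.hdfs imports no longer
// resolve and the build breaks.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class DownstreamClientExample {
  public static void main(String[] args) {
    // Instantiating HdfsConfiguration forces hdfs-default.xml / hdfs-site.xml pickup.
    Configuration conf = new HdfsConfiguration();
    System.out.println(conf.get(DFSConfigKeys.DFS_REPLICATION_KEY, "3"));
  }
}
{code}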


Re: [DISCUSS] About the details of JDK-8 support

2015-10-13 Thread Haohui Mai
Just to echo Steve's idea -- if we're seriously considering supporting
JDK 8, maybe the first thing to do is to set up Jenkins to run
with JDK 8? I'm happy to help. Does anyone know who I can talk to if I
need to play around with all the Jenkins knobs?

~Haohui

On Tue, Oct 13, 2015 at 8:24 AM, Vinod Vavilapalli
 wrote:
> If you see the community discussion thread on 2.8, my proposal was to support 
> *both* JDK 7 and JDK 8 first. The last time we had discussion about dropping 
> JDKs it wasn’t fun, so let’s not go there for now.
>
> In terms of runtime support for JDK 8, yes, there is vast evidence that 
> things already work as they are in Hadoop and all the way up to dependent 
> projects. This was already the case as early as Hadoop 2.6/2.7 if I remember 
> correctly, so nothing new here.
>
> We should target source-level support of JDK 8 too, around which you outlined 
> a bunch of issues around dependencies. I also found a bunch of issues around 
> generating documentation, site etc. I propose that we track them under the 
> umbrella JIRA and make progress there first.
>
> Thanks
> +Vinod
>
>
>> On Oct 13, 2015, at 1:17 AM, Tsuyoshi Ozawa  wrote:
>>
>> Thank you for sharing lots of information and joining the discussion.
>>
>> About the runtime support of JDK-8,  let's describe it on wiki. It
>> would be great for users to clarify which JDK version they should use.
>> It's also useful to state explicitly that "1.8.0_40 or above" should be used.
>> Andrew, Elliott, could you update the wiki, or can I update it to add the
>> description?
>>
>> https://wiki.apache.org/hadoop/HadoopJavaVersions
>>
>> About the source code level, I summarize opinions by users and
>> developers as follows:
>>
>> 1. Moving trunk to the java-8 compatible dependencies.
>> 2. Creating branch-2-java-8 after 1.
>> 3. Dropping Java 7 support (maybe when we support JDK 9?) after 2.
>> Currently, we don't have any consensus.
>>
>>> I'd  be +1 for moving trunk to the java-8 compatible dependencies, and to 
>>> have the jenkins nightly builds only before java 8; we'd still have the 
>>> patch and branch-2 runs on Java 7. That way, people will only have one 
>>> nightly mail from Jenkins saying the build is broken, and maybe pay more 
>>> attention to the fact.
>>
>> This way looks good to me.
>> Steve, do you know how to do this? In fact, I don't know much about
>> how to change and update nightly builds. Should we contact the Yetus
>> community or the Apache Infra team?
>>
>> Best,
>> - Tsuyoshi
>>
>>
>>
>> On Sat, Oct 10, 2015 at 7:17 AM, Sangjin Lee  wrote:
>>> Yes, at least for us, dropping the java 7 support (e.g. moving to java 8
>>> source-wise) **at this point** would be an issue. I concur with the
>>> sentiment that we should preserve the java 7 support on branch-2 (i.e. not
>>> move to java 8 source level) but can consider it for trunk. My 2 cents.
>>>
>>> Thanks,
>>> Sangjin
>>>
>>> On Fri, Oct 9, 2015 at 10:43 AM, Steve Loughran 
>>> wrote:
>>>

> On 7 Oct 2015, at 22:39, Elliott Clark  wrote:
>
> On Mon, Oct 5, 2015 at 5:35 PM, Tsuyoshi Ozawa  wrote:
>
>> Do you have any concern about this? I’ve not
>> tested with HBase yet.
>>
>
> We've been running JDK 8u60 in production with Hadoop 2.6.X and HBase for
> quite a while. Everything has been very stable for us. We're running and
> compiling with jdk8.
>
> We had to turn off yarn.nodemanager.vmem-check-enabled. Otherwise mr jobs
> didn't do too well.
>

 maybe related to the initial memory requirements being higher?

 otherwise: did you file a JIRA?


> I'd be +1 on dropping jdk7 support. However as downstream developer it
> would be very weird for that to happen on anything but a major release.


 Past discussion (including a big contrib from twitter) was that breaking
 Java 7 support breaks all client apps too, which is not something people
 were ready for.

 I'd  be +1 for moving trunk to the java-8 compatible dependencies, and to
 have the jenkins nightly builds only before java 8; we'd still have the
 patch and branch-2 runs on Java 7. That way, people will only have one
 nightly mail from Jenkins saying the build is broken, and maybe pay more
 attention to the fact.
>>
>


[jira] [Reopened] (HADOOP-12469) distcp should not ignore the ignoreFailures option

2015-10-10 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-12469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai reopened HADOOP-12469:
-

Reopening the jira.

Sorry for the noise -- I realized that I hadn't posted my +1 on the patch 
before committing. Taking a closer look at the patch, it does not seem to really 
address the problem. I'm going to revert it now.

> distcp should not ignore the ignoreFailures option
> -
>
> Key: HADOOP-12469
> URL: https://issues.apache.org/jira/browse/HADOOP-12469
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: tools/distcp
>Affects Versions: 2.7.1
>Reporter: Gera Shegalov
>Assignee: Mingliang Liu
>Priority: Critical
> Fix For: 2.8.0
>
> Attachments: HADOOP-12469.000.patch
>
>
> {{RetriableFileCopyCommand.CopyReadException}} is double-wrapped via
> # via {{RetriableCommand::execute}}
> # via {{CopyMapper#copyFileWithRetry}}
> before {{CopyMapper::handleFailure}} tests 
> {code}
> if (ignoreFailures && exception.getCause() instanceof
> RetriableFileCopyCommand.CopyReadException
> {code}
> which is always false.
> Orthogonally, ignoring failures should be mutually exclusive with the atomic 
> option; otherwise an incomplete dir is eligible for commit, defeating the 
> purpose.
>  
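
For context, a self-contained sketch of the kind of check the description implies is needed -- walking the whole cause chain rather than looking only at the immediate cause. This is illustrative only, not the reverted patch:

{code}
// Returns true if an exception of the given type appears anywhere in the
// cause chain; with double wrapping, a single getCause() check is never enough.
public final class Causes {
  private Causes() {
  }

  public static boolean hasCause(Throwable t, Class<? extends Throwable> type) {
    for (Throwable cur = t; cur != null; cur = cur.getCause()) {
      if (type.isInstance(cur)) {
        return true;
      }
    }
    return false;
  }
}
{code}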



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: CHANGES.TXT

2015-09-12 Thread Haohui Mai
CHANGES.txt is always a pain. *sigh*

It seems that relying on human effort to maintain CHANGES.txt is
error-prone and not sustainable, and fixing it afterwards is painful.

I think aw has some scripts for option 2.

I would like to propose option 3, which might be more robust: (1) do a
git log on the branch to figure out which jiras have been committed to
the branch, and (2) generate CHANGES.txt by going through those jiras.
That might eliminate the fix-version issue.
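
A rough sketch of the first step, assuming we scrape the jira ids out of the git log subjects; the second step of querying JIRA for each id and rendering the sections is not shown:

{code}
// Sketch only: list the HADOOP/HDFS/YARN/MAPREDUCE jiras mentioned in the
// commit subjects of a branch (or a range such as "branch-2.7..branch-2").
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChangesFromGitLog {
  private static final Pattern JIRA_ID =
      Pattern.compile("\\b(HADOOP|HDFS|YARN|MAPREDUCE)-\\d+\\b");

  public static void main(String[] args) throws Exception {
    String range = args.length > 0 ? args[0] : "branch-2.7..branch-2";
    Process p = new ProcessBuilder("git", "log", "--pretty=%s", range).start();
    Set<String> jiras = new LinkedHashSet<String>();
    BufferedReader r = new BufferedReader(
        new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8));
    String line;
    while ((line = r.readLine()) != null) {
      Matcher m = JIRA_ID.matcher(line);
      while (m.find()) {
        jiras.add(m.group());
      }
    }
    r.close();
    for (String jira : jiras) {
      System.out.println(jira);
    }
  }
}
{code}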

I can volunteer some effort to help on this.

~Haohui


On Sat, Sep 12, 2015 at 11:03 AM, Steve Loughran  wrote:
>
> I've just been trying to get CHANGES.TXT between branch-2 and trunk more in 
> sync, so that cherry picking patches from branch-2 up to trunk is more 
> reliable.
>
> Once you look closely, you see it is a mess, specifically:
>
> trunk/CHANGES.TXT declares things as trunk-only which are actually in branch-2 
> and/or actual releases
>
>
> What to do?
>
> 1. audit trunk/CHANGES.TXT against branch-2/CHANGES.TXT; anything in 
> branch-2's (i.e. to come in 2.8) is placed into trunk at that location; the 
> "new in trunk" runk's version removed
>
> 2. go to JIRA-generated change logs. Though for that to be reliable, those 
> fix-version fields have to be 100% accurate too.
>
>


[jira] [Created] (HADOOP-12405) Expose NN RPC via HTTP / HTTPS

2015-09-11 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-12405:
---

 Summary: Expose NN RPC via HTTP / HTTPS
 Key: HADOOP-12405
 URL: https://issues.apache.org/jira/browse/HADOOP-12405
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Haohui Mai


WebHDFS needs to expose NN RPC calls to allow users to access HDFS via HTTP / 
HTTPS.

The current approach is to add REST APIs into WebHDFS one by one manually. It 
requires significant effort from a maintainability point of view, and we found 
that WebHDFS is consistently lagging behind. It's also hard to maintain the 
REST RPC stubs.

There is a lot of value in exposing the NN RPC in an HTTP / HTTPS-friendly way 
automatically.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Location of the protobuf library on Jenkins machines

2015-09-04 Thread Haohui Mai
Hi,

I wonder where the protobuf library is installed on the Jenkins
machines? I'm aware that the Jenkins machines have protoc installed, but I
don't know where the headers and the libraries are.

I'm looking into setting up a Jenkins build for the HDFS-8707 branch,
which requires the protobuf header and library files.

Your help is appreciated.

Thanks,
Haohui


[jira] [Created] (HADOOP-11765) Signal congestion on the DataNode

2015-03-27 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-11765:
---

 Summary: Signal congestion on the DataNode
 Key: HADOOP-11765
 URL: https://issues.apache.org/jira/browse/HADOOP-11765
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Haohui Mai
Assignee: Haohui Mai


The DataNode should signal congestion (i.e. "I'm too busy") in the PipelineAck 
using the mechanism introduced in HDFS-7270.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-11752) Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:2.4.0:protoc (compile-protoc) on project hadoop-common: org.apache.maven.plugin.MojoExecutionException: p

2015-03-25 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HADOOP-11752.
-
  Resolution: Invalid
Target Version/s: 2.6.0, 2.4.0  (was: 2.4.0, 2.6.0)

> Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:2.4.0:protoc 
> (compile-protoc) on project hadoop-common: 
> org.apache.maven.plugin.MojoExecutionException: protoc failure -> [Help 1]
> 
>
> Key: HADOOP-11752
> URL: https://issues.apache.org/jira/browse/HADOOP-11752
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.4.0, 2.6.0
> Environment: Operating System: Windows 8.1 64Bit
> Cygwin 64Bit
> protobuf-2.5.0
> protoc 2.5.0
> hadoop-2.4.0-src
> apache-maven-3.3.1
>Reporter: Venkata Sravan Kumar Talasila
>  Labels: build, maven
>
> while build of Hadoop, I am facing the below error
> Failed to execute goal org.apache.hadoop:hadoop-maven-plugins:2.4.0:protoc 
> (compile-protoc) on project hadoop-common: 
> org.apache.maven.plugin.MojoExecutionException: protoc failure -> [Help 1]
> [INFO] 
> 
> [INFO] Building Apache Hadoop Common 2.4.0
> [INFO] 
> 
> [INFO]
> [INFO] --- maven-antrun-plugin:1.7:run (create-testdirs) @ hadoop-common ---
> [INFO] Executing tasks
> main:
> [INFO] Executed tasks
> [INFO]
> [INFO] --- maven-enforcer-plugin:1.3.1:enforce (enforce-os) @ hadoop-common 
> ---
> [INFO]
> [INFO] --- hadoop-maven-plugins:2.4.0:protoc (compile-protoc) @ hadoop-common 
> ---
> [WARNING] [C:\cygwin64\usr\local\bin\protoc.exe, 
> --java_out=C:\cygwin64\usr\local\hadoop-2.4.0-src\hadoop-common-project\hadoop-common\target\generated-sources\java,
> -IC:\cygwin64\usr\local\hadoop-2.4.0-src\hadoop-common-project\hadoop-common\src\main\proto,
>  
> C:\cygwin64\usr\local\hadoop-2.4.0-src\hadoop-common-project\hadoop-common\src\main\proto\GetUserMappingsProtocol.proto,
>  
> C:\cygwin64\usr\local\hadoop-2.4.0-src\hadoop-common-project\hadoop-common\src\main\proto\HAServiceProtocol.proto,
>  
> C:\cygwin64\usr\local\hadoop-2.4.0-src\hadoop-common-project\hadoop-common\src\main\proto\IpcConnectionContext.proto,
>  
> C:\cygwin64\usr\local\hadoop-2.4.0-src\hadoop-common-project\hadoop-common\src\main\proto\ProtobufRpcEngine.proto,
>  
> C:\cygwin64\usr\local\hadoop-2.4.0-src\hadoop-common-project\hadoop-common\src\main\proto\ProtocolInfo.proto,
>  
> C:\cygwin64\usr\local\hadoop-2.4.0-src\hadoop-common-project\hadoop-common\src\main\proto\RefreshAuthorizationPolicyProtocol.proto,
>  
> C:\cygwin64\usr\local\hadoop-2.4.0-src\hadoop-common-project\hadoop-common\src\main\proto\RefreshCallQueueProtocol.proto,
>  
> C:\cygwin64\usr\local\hadoop-2.4.0-src\hadoop-common-project\hadoop-common\src\main\proto\RefreshUserMappingsProtocol.proto,
>  
> C:\cygwin64\usr\local\hadoop-2.4.0-src\hadoop-common-project\hadoop-common\src\main\proto\RpcHeader.proto,
>  
> C:\cygwin64\usr\local\hadoop-2.4.0-src\hadoop-common-project\hadoop-common\src\main\proto\Security.proto,
>  
> C:\cygwin64\usr\local\hadoop-2.4.0-src\hadoop-common-project\hadoop-common\src\main\proto\ZKFCProtocol.proto]
>  failed with error code 1
> [ERROR] protoc compiler error
> [INFO] 
> 
> [INFO] Reactor Summary:
> [INFO]
> [INFO] Apache Hadoop Main . SUCCESS [  6.080 
> s]
> [INFO] Apache Hadoop Project POM .. SUCCESS [  2.140 
> s]
> [INFO] Apache Hadoop Annotations .. SUCCESS [  2.691 
> s]
> [INFO] Apache Hadoop Project Dist POM . SUCCESS [  1.250 
> s]
> [INFO] Apache Hadoop Assemblies ... SUCCESS [  0.453 
> s]
> [INFO] Apache Hadoop Maven Plugins  SUCCESS [  6.932 
> s]
> [INFO] Apache Hadoop MiniKDC .. SUCCESS [01:59 
> min]
> [INFO] Apache Hadoop Auth . SUCCESS [11:02 
> min]
> [INFO] Apache Hadoop Auth Examples  SUCCESS [  3.697 
> s]
> [INFO] Apache Hadoop Common ... FAILURE [  4.067 
> s]
> [INFO] Apache Hadoop NFS .. SKIPPED
> [INFO] Apache Hadoop Com

[jira] [Created] (HADOOP-11748) Secrets for auth cookies can be specified in clear text

2015-03-25 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-11748:
---

 Summary: Secrets for auth cookies can be specified in clear text
 Key: HADOOP-11748
 URL: https://issues.apache.org/jira/browse/HADOOP-11748
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Haohui Mai
Priority: Critical


Based on the discussion on HADOOP-10670, this jira proposes to remove 
{{StringSecretProvider}} as it opens up possibilities for misconfiguration and 
security vulnerabilities.

{quote}

My understanding is that the use case of inlining the secret is never 
supported. The property is used to pass the secret internally. The way it works 
before HADOOP-10868 is the following:

* Users specify the initializer of the authentication filter in the 
configuration.
* AuthenticationFilterInitializer reads the secret file. The server will not 
start if the secret file does not exist. The initializer will set the property 
if it reads the file correctly.
* There is no way to specify the secret in the configuration out-of-the-box -- 
the secret is always overwritten by AuthenticationFilterInitializer.

{quote}
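
For reference, a stripped-down sketch of the file-based flow the quote describes. The property names follow the usual hadoop.http.authentication.* convention but should be treated as assumptions here, and this is not the actual initializer code:

{code}
// Illustrative sketch: read the signature secret from a file at startup, fail
// hard if it is missing, and always overwrite any inline value.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

public class SecretFileInitializerSketch {
  // Assumed names, for the sketch only.
  static final String SECRET_FILE_KEY =
      "hadoop.http.authentication.signature.secret.file";
  static final String SECRET_KEY = "signature.secret";

  public static Properties initFilterConfig(Properties conf) throws IOException {
    String secretFile = conf.getProperty(SECRET_FILE_KEY);
    if (secretFile == null) {
      throw new IOException(SECRET_FILE_KEY + " is not configured");
    }
    Path path = Paths.get(secretFile);
    if (!Files.isReadable(path)) {
      // Mirrors "the server will not start if the secret file does not exist".
      throw new IOException("Secret file " + path + " is missing or unreadable");
    }
    String secret =
        new String(Files.readAllBytes(path), StandardCharsets.UTF_8).trim();
    Properties filterConfig = new Properties();
    // Always overwrites any inline value, as described above.
    filterConfig.setProperty(SECRET_KEY, secret);
    return filterConfig;
  }
}
{code}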





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11744) Support OAuth2 in Hadoop

2015-03-24 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-11744:
---

 Summary: Support OAuth2 in Hadoop
 Key: HADOOP-11744
 URL: https://issues.apache.org/jira/browse/HADOOP-11744
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Haohui Mai


OAuth2 is a standardized mechanism for authentication and authorization. A 
notable use case of OAuth2 is SSO -- it would be nice to integrate OAuth2 with 
Hadoop.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11726) Allow applications to access both secure and insecure clusters at the same time

2015-03-18 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-11726:
---

 Summary: Allow applications to access both secure and insecure 
clusters at the same time
 Key: HADOOP-11726
 URL: https://issues.apache.org/jira/browse/HADOOP-11726
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Haohui Mai


Today there are multiple integration issues when an application (particularly 
distcp) accesses both secure and insecure clusters (e.g., HDFS-6776 / HDFS-7036).

There are four use cases in this scenario:

* Secure <-> Secure. Works.
* Insecure <-> Insecure. Works.
* Accessing secure clusters from an insecure client. Will not work as expected. 
The insecure client won't be able to authenticate with the secure cluster; 
otherwise it would be a security vulnerability.
* Accessing insecure clusters from a secure client. Currently this does not work, as 
the secure client won't be able to obtain a delegation token from the insecure 
cluster.

This jira proposes to fix the last use case.
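
For illustration, the existing client-side knob in this area is the fallback switch shown below; a Kerberized client has to opt in before it will talk SIMPLE auth to an insecure cluster. The jira is about what still breaks (the delegation token step) even with this set; the host name is made up:

{code}
// Minimal sketch: a secure (Kerberos) client allowing fallback to SIMPLE auth
// so it can reach an insecure cluster at all.
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MixedSecurityClientSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.setBoolean("ipc.client.fallback-to-simple-auth-allowed", true);
    FileSystem insecureFs =
        FileSystem.get(URI.create("hdfs://insecure-nn.example.com:8020"), conf);
    System.out.println(insecureFs.exists(new Path("/")));
  }
}
{code}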



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-10367) Hadoop 2.2 Building error

2015-03-13 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HADOOP-10367.
-
Resolution: Invalid

> Hadoop 2.2 Building error
> -
>
> Key: HADOOP-10367
> URL: https://issues.apache.org/jira/browse/HADOOP-10367
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 2.2.0
> Environment: Linux mars 3.2.0-4-amd64 #1 SMP Debian 3.2.41-2 x86_64 
> GNU/Linux
> java version "1.6.0_38"
> Java(TM) SE Runtime Environment (build 1.6.0_38-b05)
> Java HotSpot(TM) 64-Bit Server VM (build 20.13-b02, mixed mode)
> Apache Maven 3.0.5 (r01de14724cdef164cd33c7c8c2fe155faf9602da; 2013-02-19 
> 21:51:28+0800)
> Maven home: /usr/java/apache-maven-3.0.5
> Java version: 1.6.0_38, vendor: Sun Microsystems Inc.
> Java home: /usr/java/jdk1.6.0_38/jre
> Default locale: en_HK, platform encoding: UTF-8
> OS name: "linux", version: "3.2.0-4-amd64", arch: "amd64", family: "unix"
> hadoop-2.2.0.src
>Reporter: Shining
>
> mvn package -X -Pdist,native,docs,src -DskipTests -Dtar
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-antrun-plugin:1.6:run (make) on project 
> hadoop-hdfs: An Ant BuildException has occured: exec returned: 1 -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (make) on project 
> hadoop-hdfs: An Ant BuildException has occured: exec returned: 1
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:217)
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
>   at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
>   at 
> org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
>   at 
> org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
>   at 
> org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
>   at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:320)
>   at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
>   at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
>   at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
>   at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
>   at 
> org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
>   at 
> org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
>   at 
> org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)
> Caused by: org.apache.maven.plugin.MojoExecutionException: An Ant 
> BuildException has occured: exec returned: 1
>   at 
> org.apache.maven.plugin.antrun.AntRunMojo.execute(AntRunMojo.java:283)
>   at 
> org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:101)
>   at 
> org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209)
>   ... 19 more
> Caused by: 
> /home/nshi/workspace/hadoop-2.2.0-src/hadoop-hdfs-project/hadoop-hdfs/target/antrun/build-main.xml:5:
>  exec returned: 1
>   at org.apache.tools.ant.taskdefs.ExecTask.runExecute(ExecTask.java:650)
>   at org.apache.tools.ant.taskdefs.ExecTask.runExec(ExecTask.java:676)
>   at org.apache.tools.ant.taskdefs.ExecTask.execute(ExecTask.java:502)
>   at org.apache.tools.ant.UnknownElement.execute(UnknownElement.java:291)
>   at sun.reflect.GeneratedMethodAccessor20.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.tools.ant.dispatch.DispatchUtils.execute(DispatchUtils.java:106)
>   at org.apache.tools.ant.Task.perfo

[jira] [Resolved] (HADOOP-10795) unable to build hadoop 2.4.1 (redhat5.8 x64)

2015-03-13 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HADOOP-10795.
-
Resolution: Invalid

> unable to build hadoop 2.4.1 (redhat5.8 x64)
> --
>
> Key: HADOOP-10795
> URL: https://issues.apache.org/jira/browse/HADOOP-10795
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
> Environment: OS version rehat 5.8 x64
> maven version 3.3.1
> java version jdk 1.7_15 for x64
>Reporter: moses.wang
>
> unable to build hadoop 2.4.1 (redhat5.8 x64)
> WARNING] Some problems were encountered while building the effective model 
> for org.apache.hadoop:hadoop-project:pom:2.4.1
> [WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but 
> found duplicate declaration of plugin 
> org.apache.maven.plugins:maven-enforcer-plugin @ line 1015, column 15
> [WARNING] Some problems were encountered while building the effective model 
> for org.apache.hadoop:hadoop-common:jar:2.4.1
> [WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but 
> found duplicate declaration of plugin 
> org.apache.maven.plugins:maven-surefire-plugin @ line 479, column 15
> [WARNING] 'build.plugins.plugin.(groupId:artifactId)' must be unique but 
> found duplicate declaration of plugin 
> org.apache.maven.plugins:maven-enforcer-plugin @ 
> org.apache.hadoop:hadoop-project:2.4.1, 
> /home/software/Server/hadoop-2.4.1-src/hadoop-project/pom.xml, line 1015, 
> column 15
> WARNING] 
> /home/software/Server/hadoop-2.4.1-src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/FastByteComparisons.java:[25,15]
>  Unsafe is internal proprietary API and may be removed in a future release
> [WARNING] 
> /home/software/Server/hadoop-2.4.1-src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/nativeio/NativeIO.java:[42,15]
>  Unsafe is internal proprietary API and may be removed in a future release
> [WARNING] 
> /home/software/Server/hadoop-2.4.1-src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/SignalLogger.java:[21,15]
>  Signal is internal proprietary API and may be removed in a future release
> [WARNING] 
> /home/software/Server/hadoop-2.4.1-src/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/SignalLogger.java:[22,15]
>  SignalHandler is internal proprietary API and may be removed in a future 
> release
> [WARNING] 
> /home/software/Server/hadoop-2.4.1-src/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/ssl/KeyStoreTestUtil.java:[22,24]
>  AlgorithmId is internal proprietary API and may be removed in a future 
> release
> [WARNING] 
> /home/software/Server/hadoop-2.4.1-src/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/security/ssl/KeyStoreTestUtil.java:[23,24]
>  CertificateAlgorithmId is internal proprietary API and may be removed in a 
> future release
> testROBufferDirAndRWBufferDir[1](org.apache.hadoop.fs.TestLocalDirAllocator)  
> Time elapsed: 0.014 sec  <<< FAILURE!
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-surefire-plugin:2.16:test (default-test) on 
> project hadoop-common: There are test failures.
> [ERROR] 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-11192) Change old subversion links to git

2015-03-13 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HADOOP-11192.
-
Resolution: Invalid

site.xml no longer contains the svn information.

> Change old subversion links to git
> --
>
> Key: HADOOP-11192
> URL: https://issues.apache.org/jira/browse/HADOOP-11192
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Ravi Prakash
>
> e.g. hadoop-project/src/site/site.xml still references SVN. 
> We should probably check our wiki's and other documentation. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-11366) Fix findbug warnings after move to Java 7

2015-03-13 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HADOOP-11366.
-
Resolution: Duplicate

It has been fixed in HADOOP-10477.

> Fix findbug warnings after move to Java 7
> -
>
> Key: HADOOP-11366
> URL: https://issues.apache.org/jira/browse/HADOOP-11366
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Li Lu
>Assignee: Li Lu
>
> After move to Java 7, there are 65 findbugs warnings for Hadoop common 
> codebase. We may want to fix this. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-11633) Convert remaining branch-2 .apt.vm files to markdown

2015-03-13 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HADOOP-11633.
-
Resolution: Fixed

> Convert remaining branch-2 .apt.vm files to markdown
> 
>
> Key: HADOOP-11633
> URL: https://issues.apache.org/jira/browse/HADOOP-11633
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 2.7.0
>Reporter: Colin Patrick McCabe
>Assignee: Masatake Iwasaki
> Attachments: HADOOP-11633.001.patch, HADOOP-11633.002.patch, 
> HADOOP-11633.003.patch
>
>
> We should convert the remaining branch-2 .apt.vm files to markdown.
> Excluding the yarn files, which are covered by YARN-3168, we have remaining:
> {code}
> cmccabe@keter:~/hadoop> find -name '*.apt.vm'
> ./hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/apt/index.apt.vm
> ./hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/apt/UsingHttpTools.apt.vm
> ./hadoop-hdfs-project/hadoop-hdfs-httpfs/src/site/apt/ServerSetup.apt.vm
> ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/EncryptedShuffle.apt.vm
> ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/HadoopStreaming.apt.vm
> ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/MapredAppMasterRest.apt.vm
> ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/MapReduceTutorial.apt.vm
> ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/MapredCommands.apt.vm
> ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/MapReduce_Compatibility_Hadoop1_Hadoop2.apt.vm
> ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/DistributedCacheDeploy.apt.vm
> ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/site/apt/PluggableShuffleAndPluggableSort.apt.vm
> ./hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/site/apt/HistoryServerRest.apt.vm
> ./hadoop-common-project/hadoop-common/src/site/apt/CommandsManual.apt.vm
> ./hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm
> ./hadoop-common-project/hadoop-auth/src/site/apt/Configuration.apt.vm
> ./hadoop-common-project/hadoop-auth/src/site/apt/index.apt.vm
> ./hadoop-common-project/hadoop-auth/src/site/apt/BuildingIt.apt.vm
> ./hadoop-common-project/hadoop-auth/src/site/apt/Examples.apt.vm
> ./hadoop-common-project/hadoop-kms/src/site/apt/index.apt.vm
> ./hadoop-tools/hadoop-openstack/src/site/apt/index.apt.vm
> ./hadoop-tools/hadoop-sls/src/site/apt/SchedulerLoadSimulator.apt.vm
> ./hadoop-project/src/site/apt/index.apt.vm
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11716) Bump netty version to 4.1

2015-03-13 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-11716:
---

 Summary: Bump netty version to 4.1
 Key: HADOOP-11716
 URL: https://issues.apache.org/jira/browse/HADOOP-11716
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Haohui Mai


This jira proposes to bump the netty version from 4.0 to 4.1 so that it is 
possible to leverage the HTTP/2 support from netty.

Note that this is a compatible change: the dependency on netty 4.0 was 
introduced during the 2.7 timeframe and no release has been made in the 
meantime.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Request wiki access

2015-03-13 Thread Haohui Mai
Hi,

I would like to create a wiki page
https://wiki.apache.org/hadoop/GSoC2015 to record the ideas for GSoC
2015.

Your help in granting me permission to edit the wiki would be appreciated.

Thanks,
Haohui


[jira] [Resolved] (HADOOP-11712) Error : ExecuteStatement finished with operation state: CLOSED_STATE” :

2015-03-13 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HADOOP-11712.
-
Resolution: Not a Problem

The jira system only tracks bugs in core Hadoop. Please redirect the 
question to the vendor or to the Hive user mailing list.

> Error : ExecuteStatement finished with operation state: CLOSED_STATE” :
> ---
>
> Key: HADOOP-11712
> URL: https://issues.apache.org/jira/browse/HADOOP-11712
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build, scripts, tools, tools/distcp
> Environment: Cloudera ODBC Driver v. 2.5.13 (32 bit) used by 
> Microstrategy application to connect to HiveServer2 
>Reporter: ankush
>Assignee: ankush
>  Labels: build, features, hadoop
>
> when running a query to retrieve row data , receiving the other error 
> “ExecuteStatement finished with operation state: CLOSED_STATE” 
> Error type: Odbc error. Odbc operation attempted: SQLExecDirect. [S1000:35: 
> on HSTMT] [Cloudera][HiveODBC] (35) Error from Hive: error code: '0' error 
> message: 'ExecuteStatement finished with operation state: CLOSED_STATE'. 
> Connection String: DSN=MSTR_HIVE;UID=srv-hdp-mstry-d;. SQL Statement: select 
> a11.region_number  region_number,



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11682) Move the native code for libhadoop into a dedicated directory

2015-03-05 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-11682:
---

 Summary: Move the native code for libhadoop into a dedicated 
directory
 Key: HADOOP-11682
 URL: https://issues.apache.org/jira/browse/HADOOP-11682
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai


Currently the code of {{libhadoop.so}} lies in 
{{hadoop-common-project/hadoop-common/src/main/native}}. This jira proposes to 
move it into {{hadoop-common-project/hadoop-common/src/main/native/libhadoop}} 
to make it easier to add other native code projects.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11679) Create build infrastructure for the native RPC client

2015-03-05 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-11679:
---

 Summary: Create build infrastructure for the native RPC client
 Key: HADOOP-11679
 URL: https://issues.apache.org/jira/browse/HADOOP-11679
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-10477) Clean up findbug warnings found by findbugs 3.0.0

2015-02-23 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HADOOP-10477.
-
Resolution: Fixed

We fixed all new warnings for findbugs 3.0.0. Thanks everybody for the hard 
work.

> Clean up findbug warnings found by findbugs 3.0.0
> -
>
> Key: HADOOP-10477
> URL: https://issues.apache.org/jira/browse/HADOOP-10477
> Project: Hadoop Common
>  Issue Type: Improvement
>    Reporter: Haohui Mai
>        Assignee: Haohui Mai
>
> This is an umbrella jira to clean up the new findbug warnings found by 
> findbugs 3.0.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-11443) hadoop.auth cookie has invalid Expires if used with non-US default Locale

2014-12-22 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HADOOP-11443.
-
Resolution: Duplicate

Since we have moved to Java 7, we can use the {{Cookie}} class from Java 7 
instead of hacking it here and there.

> hadoop.auth cookie has invalid Expires if used with non-US default Locale
> -
>
> Key: HADOOP-11443
> URL: https://issues.apache.org/jira/browse/HADOOP-11443
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: security
>Affects Versions: 2.5.0
>Reporter: Gregory Chanan
>Assignee: Gregory Chanan
> Attachments: HADOOP-11443.patch
>
>
> The netscape cookie spec (http://curl.haxx.se/rfc/cookie_spec.html) does not 
> specify the language of the EXPIRES attribute:
> {code}
>  The date string is formatted as:
> Wdy, DD-Mon- HH:MM:SS GMT
> This is based on RFC 822, RFC 850, RFC 1036, and RFC 1123, with the 
> variations that the only legal time zone is GMT and the separators between 
> the elements of the date must be dashes. 
> {code}
> But RFC 822, lists the months as:
> {code}
>  month   =  "Jan"  /  "Feb" /  "Mar"  /  "Apr"
>  /  "May"  /  "Jun" /  "Jul"  /  "Aug"
>  /  "Sep"  /  "Oct" /  "Nov"  /  "Dec"
> {code}
> and some clients (i.e. httpclient) do not recognize Expires in other 
> languages, so it's best to just use US English (which is the only Locale 
> guaranteed to be supported by the jvm, see 
> http://www.oracle.com/technetwork/articles/javase/locale-140624.html).
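
For reference, a minimal sketch of the Locale-pinned formatting this is about (illustrative, not the attached patch):

{code}
// Always format the cookie Expires date with Locale.US so month names come out
// as "Jan" ... "Dec" regardless of the JVM's default Locale.
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class CookieExpiresSketch {
  public static String format(long expiresMillis) {
    SimpleDateFormat df =
        new SimpleDateFormat("EEE, dd-MMM-yyyy HH:mm:ss 'GMT'", Locale.US);
    df.setTimeZone(TimeZone.getTimeZone("GMT"));
    return df.format(new Date(expiresMillis));
  }

  public static void main(String[] args) {
    System.out.println(format(System.currentTimeMillis() + 3600 * 1000L));
  }
}
{code}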



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Upgrading findbugs

2014-12-18 Thread Haohui Mai
So far we have made great progress on fixing findbugs warnings. We're free of
findbugs warnings in hdfs, nfs, and a couple of other sub-projects. There are
two findbugs warnings left in hadoop-common. I saw there is some
progress on the YARN side as well.

Thanks very much to the contributors (Brandon Li, Li Lu, and many others)
who have worked on this. We have a cleaner code base now, and the newer
findbugs can help us to catch more issues during pre-commits. :-)

I plan to finish the remaining work in the 2.7 timeframe. Thanks again for
the contributions.

Thanks,
Haohui


On Tue, Dec 9, 2014 at 2:16 AM, Steve Loughran 
wrote:
>
> +1 to upgrade.
>
> regarding the newly surfacing issues, I'd recommend we look at them and see
> which are critical problems and fix them.
>
> One of the conclusions I got from the build is that there are a lot of
> javac and javadoc warnings that everyone ignores. Sitting down to fix them
> is time-consuming and doesn't directly fix anything or add new features,
> but it keeps the code cleaner, which is something we want to encourage.
>
>



[jira] [Resolved] (HADOOP-11389) Clean up byte to string encoding issues in hadoop-common

2014-12-11 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HADOOP-11389.
-
Resolution: Fixed

I've committed the patch to trunk and branch-2.

> Clean up byte to string encoding issues in hadoop-common
> 
>
> Key: HADOOP-11389
> URL: https://issues.apache.org/jira/browse/HADOOP-11389
> Project: Hadoop Common
>  Issue Type: Sub-task
>    Reporter: Haohui Mai
>Assignee: Haohui Mai
> Attachments: HADOOP-11389.000.patch, HADOOP-11389.001.patch
>
>
> Much code in hadoop-common converts bytes to strings using the default charset. 
> The behavior of the conversion depends on the platform's encoding settings, 
> which is flagged by newer versions of findbugs. This jira proposes to fix the 
> findbugs warnings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Reopened] (HADOOP-11389) Clean up byte to string encoding issues in hadoop-common

2014-12-10 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai reopened HADOOP-11389:
-

> Clean up byte to string encoding issues in hadoop-common
> 
>
> Key: HADOOP-11389
> URL: https://issues.apache.org/jira/browse/HADOOP-11389
> Project: Hadoop Common
>  Issue Type: Bug
>    Reporter: Haohui Mai
>        Assignee: Haohui Mai
> Attachments: HADOOP-11389.000.patch
>
>
> Much code in hadoop-common converts bytes to strings using the default charset. 
> The behavior of the conversion depends on the platform's encoding settings, 
> which is flagged by newer versions of findbugs. This jira proposes to fix the 
> findbugs warnings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-11389) Clean up byte to string encoding issues in hadoop-common

2014-12-10 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HADOOP-11389.
-
Resolution: Fixed

> Clean up byte to string encoding issues in hadoop-common
> 
>
> Key: HADOOP-11389
> URL: https://issues.apache.org/jira/browse/HADOOP-11389
> Project: Hadoop Common
>  Issue Type: Bug
>    Reporter: Haohui Mai
>        Assignee: Haohui Mai
> Attachments: HADOOP-11389.000.patch
>
>
> Much code in hadoop-common converts bytes to strings using the default charset. 
> The behavior of the conversion depends on the platform's encoding settings, 
> which is flagged by newer versions of findbugs. This jira proposes to fix the 
> findbugs warnings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11389) Clean up byte to string encoding issues in hadoop-common

2014-12-10 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-11389:
---

 Summary: Clean up byte to string encoding issues in hadoop-common
 Key: HADOOP-11389
 URL: https://issues.apache.org/jira/browse/HADOOP-11389
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Haohui Mai


Much code in hadoop-common converts bytes to strings using the default charset. The 
behavior of the conversion depends on the platform's encoding settings, which is 
flagged by newer versions of findbugs. This jira proposes to fix the findbugs 
warnings.
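
A minimal sketch of the class of fix involved -- pin an explicit charset instead of relying on the platform default (illustrative only):

{code}
// What findbugs flags (platform-dependent) vs. the deterministic alternative.
import java.nio.charset.StandardCharsets;

public class CharsetExample {
  public static void main(String[] args) {
    byte[] raw = "héllo".getBytes(StandardCharsets.UTF_8);

    String platformDependent = new String(raw);                      // default charset varies
    String deterministic = new String(raw, StandardCharsets.UTF_8);  // same result everywhere

    // May print false on a platform whose default charset is not UTF-8.
    System.out.println(platformDependent.equals(deterministic));
  }
}
{code}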



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11388) Remove deprecated o.a.h.metrics.file.FileContext

2014-12-10 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-11388:
---

 Summary: Remove deprecated o.a.h.metrics.file.FileContext
 Key: HADOOP-11388
 URL: https://issues.apache.org/jira/browse/HADOOP-11388
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Li Lu
Priority: Minor


The {{o.a.h.metrics.file.FileContext}} has been deprecated. This jira proposes 
to remove it from the repository.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11387) Simplify NetUtils#canonicalizeHost()

2014-12-09 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-11387:
---

 Summary: Simplify NetUtils#canonicalizeHost()
 Key: HADOOP-11387
 URL: https://issues.apache.org/jira/browse/HADOOP-11387
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Haohui Mai


Currently {{NetUtils#canonicalizeHost}} uses a {{ConcurrentHashMap}} to cache 
the canonicalized hostname.

{code}
  private static String canonicalizeHost(String host) {
// check if the host has already been canonicalized
String fqHost = canonicalizedHostCache.get(host);
if (fqHost == null) {
  try {
fqHost = SecurityUtil.getByName(host).getHostName();
// slight race condition, but won't hurt
canonicalizedHostCache.put(host, fqHost);
  } catch (UnknownHostException e) {
fqHost = host;
  }
}
return fqHost;
  }
{code}

The code can be simplified using {{CacheMap}}.
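
For illustration, here is roughly what that simplification could look like with Guava's LoadingCache (assuming that is what {{CacheMap}} refers to); it uses plain InetAddress instead of SecurityUtil to stay self-contained, and it is not the committed patch:

{code}
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostCanonicalizerSketch {
  private static final LoadingCache<String, String> CANONICAL_HOSTS =
      CacheBuilder.newBuilder().build(new CacheLoader<String, String>() {
        @Override
        public String load(String host) {
          try {
            return InetAddress.getByName(host).getCanonicalHostName();
          } catch (UnknownHostException e) {
            // Fall back to the raw host, mirroring the original catch block.
            return host;
          }
        }
      });

  public static String canonicalizeHost(String host) {
    return CANONICAL_HOSTS.getUnchecked(host);
  }
}
{code}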



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11385) Cross site scripting attack on JMXJSONServlet

2014-12-09 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-11385:
---

 Summary: Cross site scripting attack on JMXJSONServlet
 Key: HADOOP-11385
 URL: https://issues.apache.org/jira/browse/HADOOP-11385
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Haohui Mai
Priority: Critical


JMXJSONServlet allows passing a callback parameter in the JMX response, which 
was introduced in HADOOP-8922:

{code}
// "callback" parameter implies JSONP outpout
jsonpcb = request.getParameter(CALLBACK_PARAM);
if (jsonpcb != null) {
  response.setContentType("application/javascript; charset=utf8");
  writer.write(jsonpcb + "(");
} else {
  response.setContentType("application/json; charset=utf8");
}
{code}

The code writes the callback parameter directly to the output, allowing a 
cross-site scripting attack. This vulnerability allows an attacker to easily 
steal the user's credentials in the browser.

The original use case can be supported using Cross-origin resource sharing 
(CORS), which is used by the current NN web UI.

This jira proposes to move JMXJSONServlet to CORS.
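
A rough sketch of the CORS-based alternative: answer plain JSON and let the browser read it cross-origin via a response header, so no attacker-controlled callback text is ever echoed. The servlet below is illustrative only; the real JMX dump and the allowed-origin policy are omitted:

{code}
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CorsJmxServletSketch extends HttpServlet {
  @Override
  protected void doGet(HttpServletRequest request, HttpServletResponse response)
      throws IOException {
    response.setContentType("application/json; charset=utf8");
    // In a real deployment the allowed origins would be configurable.
    response.setHeader("Access-Control-Allow-Origin", "*");
    PrintWriter writer = response.getWriter();
    writer.write("{\"beans\":[]}");  // placeholder for the real JMX output
  }
}
{code}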



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Upgrading findbugs

2014-12-08 Thread Haohui Mai
Hi,

The recent change of moving to Java 7 triggers a bug in findbugs (
http://sourceforge.net/p/findbugs/bugs/918), which causes all pre-commit
runs (e.g., HADOOP-11287) to fail.

The current version of findbugs (1.3.9) used by Hadoop was released in 2009.
Given that:

(1) The current bug that we hit is fixed by a later version of findbugs.
(2) A newer findbugs (3.0.0) is required to analyze Hadoop when it is compiled
against Java 8.
(3) Newer findbugs are capable of catching more bugs. :-)

Is it a good time to consider upgrading findbugs, which would give us better
tools for ensuring the quality of the code base?

I ran findbugs 3.0.0 against trunk today. It reported 111 warnings for
hadoop-common, 44 for HDFS and 40+ for YARN. Many of them are possible
NPEs, resource leaks, and ignored exceptions, which are indeed bugs and are
worth addressing.

However, one issue that needs to be considered is how to deal with the
additional warnings reported by the newer findbugs without breaking the
Jenkins pre-commit runs.

Personally I can see three possible routes if we decide to upgrade findbugs:

(1) Fix all warnings before upgrading to newer findbugs.
(2) Add all new warnings to the exclude list and fix them slowly.
(3) Update test-patch.sh to make sure that new code won't introduce any new
findbugs warnings.

I proposed upgrading to findbugs 2.0.2 and fixing new warnings in
HADOOP-10476, which dates back to April 2014. I volunteer to
accelerate the effort if it is required.

Thoughts?

Regards,
Haohui



[jira] [Created] (HADOOP-11365) Use Java 7's HttpCookie class to handle Secure and HttpOnly flag

2014-12-08 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-11365:
---

 Summary: Use Java 7's HttpCookie class to handle Secure and 
HttpOnly flag
 Key: HADOOP-11365
 URL: https://issues.apache.org/jira/browse/HADOOP-11365
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Li Lu


HADOOP-10379 and HADOOP-10710 introduced support for the Secure and HttpOnly 
flags for the hadoop auth cookie. The current implementation includes custom code 
so that it can be compatible with Java 6. Since Hadoop has moved to Java 7, 
this code can be replaced by Java's HttpCookie class.
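
A minimal sketch of what that replacement looks like (illustrative; the cookie name hadoop.auth is the existing one, the rest is made up for the example):

{code}
// Java 7's java.net.HttpCookie carries the Secure and HttpOnly flags directly,
// so no hand-rolled Set-Cookie string building is needed.
import java.net.HttpCookie;

public class AuthCookieSketch {
  public static HttpCookie newAuthCookie(String token, boolean isSsl) {
    HttpCookie cookie = new HttpCookie("hadoop.auth", token);
    cookie.setPath("/");
    cookie.setSecure(isSsl);   // only sent over HTTPS
    cookie.setHttpOnly(true);  // not readable from JavaScript
    return cookie;
  }

  public static void main(String[] args) {
    System.out.println(newAuthCookie("u=alice&t=kerberos", true));
  }
}
{code}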



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11357) Print information of the build environment in test-patch.sh

2014-12-05 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-11357:
---

 Summary: Print information of the build environment in 
test-patch.sh
 Key: HADOOP-11357
 URL: https://issues.apache.org/jira/browse/HADOOP-11357
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Haohui Mai
Priority: Minor


Currently test-patch.sh lacks information such as the Java version used during the 
build, so debugging problems like HADOOP-10530 becomes difficult.

This jira proposes to print out more information in test-patch.sh to simplify 
debugging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Switching to Java 7

2014-12-01 Thread Haohui Mai
Hi Steve,

I think the pre-commit Jenkins builds are running Java 6; they need to be switched to 
Java 7 as well.

Haohui

> On Dec 1, 2014, at 5:41 AM, Steve Loughran  wrote:
> 
> I'm planning to flip the Javac language & JVM settings to java 7 this week
> 
> https://issues.apache.org/jira/browse/HADOOP-10530
> 
> the latest patch also has a profile that sets the language to java8, for
> the curious; one bit of code will need patching to compile there.
> 
> The plan for the change ASF-side is:
> 
> 1 -switch jenkins patch/regular commits to java7
> 2 -apply the HADOOP-10530 patch
> 
> locally, anyone who runs Jenkins with Java 6 will have to upgrade/switch
> JVM after (2), and anyone with JAVA_HOME set to a jdk 6 JDK is going to
> have to edit their environment variables.
> 
> Is there anything else we need to do before the big Java 7 switch?
> 
> -Steve
> 



[jira] [Resolved] (HADOOP-11297) how to get all the details of live node and dead node using hadoop api in java

2014-11-11 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HADOOP-11297.
-
Resolution: Invalid

> how to get all the details of live node and dead node using hadoop api in java
> --
>
> Key: HADOOP-11297
> URL: https://issues.apache.org/jira/browse/HADOOP-11297
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: sanjay gupta
>
> Hi all,
> I am trying to do cluster discovery; it is just my learning phase.
> I want to get the same details as the HTTP UI on port 50070 shows.
> So I am trying to use the FSNamesystem class as shown below:
> FSNamesystem f = FSNamesystem.getFSNamesystem();
> With this object f I want to call the respective methods to get my cluster
> details, but the object f is returned as null.
> Can anyone help with this?
> I also wanted to use the JspHelper class in the namenode package, but that
> throws a NullPointerException as well. I tried it as shown below:
> JspHelper j = new JspHelper();
> Nothing is working; I am stuck getting details about the live nodes and dead
> nodes of a running Hadoop 1.2 cluster.
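
For illustration only, and not part of the original report: the supported way to get 
this information is the public HDFS client API rather than the internal FSNamesystem 
or JspHelper classes. A rough sketch against the Hadoop 2.x API (class and package 
names are from 2.x and may differ on a 1.2 cluster):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;
import org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType;

public class DatanodeReportSketch {
  public static void main(String[] args) throws Exception {
    // Assumes fs.defaultFS (e.g. hdfs://namenode:8020) is set in the loaded config.
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    if (!(fs instanceof DistributedFileSystem)) {
      System.err.println("Not an HDFS filesystem: " + fs.getUri());
      return;
    }
    DistributedFileSystem dfs = (DistributedFileSystem) fs;
    // Ask the NameNode for the live and dead datanode reports.
    for (DatanodeInfo dn : dfs.getDataNodeStats(DatanodeReportType.LIVE)) {
      System.out.println("live: " + dn.getHostName());
    }
    for (DatanodeInfo dn : dfs.getDataNodeStats(DatanodeReportType.DEAD)) {
      System.out.println("dead: " + dn.getHostName());
    }
  }
}
{code}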



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11287) Simplify UGI#reloginFromKeytab for Java 7+

2014-11-09 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-11287:
---

 Summary: Simplify UGI#reloginFromKeytab for Java 7+
 Key: HADOOP-11287
 URL: https://issues.apache.org/jira/browse/HADOOP-11287
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Li Lu


HADOOP-10786 uses reflection to make {{UGI#reloginFromKeytab}} work with Java 
6/7/8. In 2.7, Java 6 will no longer be supported, so the code can be 
simplified.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Guava

2014-11-09 Thread Haohui Mai
Guava did make Hadoop development easier in many cases -- what I've been
consistently hearing is that the version of Guava used in Hadoop is so old
that it starts to hurt application developers.

I appreciate the value of Guava -- things like its caches are fairly
difficult to implement efficiently and correctly.

I think that creating separate client libraries for Hadoop can largely
alleviate the problem -- obviously these libraries cannot use Guava, but it
allows us to keep using Guava on the server side. For example, HDFS-6200 is
one of the initiatives.

Just my two cents.

Regards,
Haohui

On Sun, Nov 9, 2014 at 4:42 PM, Arun C Murthy  wrote:

> … has been a constant pain w.r.t compatibility etc.
>
> Should we consider adopting a policy to not use guava in Common/HDFS/YARN?
>
> MR doesn't matter too much since it's application-side issue, it does hurt
> end-users though since they still might want a newer guava-version, but at
> least they can modify MR.
>
> Thoughts?
>
> thanks,
> Arun
>
>
>



[jira] [Created] (HADOOP-11269) Add java 8 profile for hadoop-annotations

2014-11-04 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-11269:
---

 Summary: Add java 8 profile for hadoop-annotations
 Key: HADOOP-11269
 URL: https://issues.apache.org/jira/browse/HADOOP-11269
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Li Lu


hadoop-annotations fails to build out-of-the-box under Java 8 because it lacks 
the profile to add {{tools.jar}} into the classpath of {{javac}}. This jira 
proposes to add a new build profile for Java 8 in hadoop-annotations.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11268) Update BUILDING.txt to remove the workaround for tools.jar

2014-11-04 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-11268:
---

 Summary: Update BUILDING.txt to remove the workaround for tools.jar
 Key: HADOOP-11268
 URL: https://issues.apache.org/jira/browse/HADOOP-11268
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Li Lu
Priority: Minor


After HADOOP-10563 lands in branch-2, the workaround for tools.jar documented 
in BUILDING.txt is no longer required.

We should update the document to reflect this change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11246) Move jenkins to Java 7

2014-10-29 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-11246:
---

 Summary: Move jenkins to Java 7
 Key: HADOOP-11246
 URL: https://issues.apache.org/jira/browse/HADOOP-11246
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Haohui Mai


As Hadoop 2.7 will drop support for Java 6, the Jenkins slaves should be 
compiling the code using Java 7.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11219) Upgrade to netty 4

2014-10-22 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-11219:
---

 Summary: Upgrade to netty 4
 Key: HADOOP-11219
 URL: https://issues.apache.org/jira/browse/HADOOP-11219
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai


This is an umbrella jira to track the effort of upgrading to Netty 4.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11028) Use Java 7 HttpCookie to implement hadoop.auth cookie

2014-08-29 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-11028:
---

 Summary: Use Java 7 HttpCookie to implement hadoop.auth cookie
 Key: HADOOP-11028
 URL: https://issues.apache.org/jira/browse/HADOOP-11028
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Haohui Mai


There are various workarounds for Java 6 (e.g., HADOOP-10991) in the code to 
write the correct HttpCookie. These workarounds should be removed 
once Hadoop has moved to Java 7.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (HADOOP-8719) Workaround for kerberos-related log errors upon running any hadoop command on OSX

2014-07-10 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai reopened HADOOP-8719:



Reopening this issue since it breaks secure setups running under Mac OS X. A secure 
cluster won't be able to start with a null realm.

I propose to revert this patch.

> Workaround for kerberos-related log errors upon running any hadoop command on 
> OSX
> -
>
> Key: HADOOP-8719
> URL: https://issues.apache.org/jira/browse/HADOOP-8719
> Project: Hadoop Common
>  Issue Type: Improvement
>Affects Versions: 2.0.0-alpha
> Environment: Mac OS X 10.7, Java 1.6.0_26
>Reporter: Jianbin Wei
>Priority: Trivial
> Fix For: 3.0.0
>
> Attachments: HADOOP-8719.patch, HADOOP-8719.patch, HADOOP-8719.patch, 
> HADOOP-8719.patch
>
>
> When starting Hadoop on OS X 10.7 ("Lion") using start-all.sh, Hadoop logs 
> the following errors:
> 2011-07-28 11:45:31.469 java[77427:1a03] Unable to load realm info from 
> SCDynamicStore
> Hadoop does seem to function properly despite this.
> The workaround takes only 10 minutes.
> There are numerous discussions about this:
> Googling "Unable to load realm mapping info from SCDynamicStore" returns 1770 
> hits.  Each one has many discussions.  
> Assuming each discussion takes only 5 minutes, a 10-minute fix can save ~150 
> hours.  This does not count the time spent searching for this issue and its 
> solution/workaround, which can easily add up to (wasted) thousands of hours!!!



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10748) HttpServer2 should not load JspServlet

2014-06-24 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-10748:
---

 Summary: HttpServer2 should not load JspServlet
 Key: HADOOP-10748
 URL: https://issues.apache.org/jira/browse/HADOOP-10748
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 3.0.0
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HADOOP-10748.000.patch

Currently HttpServer2 loads the JspServlet by default. It should be removed as 
JSP support is no longer required.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Reopened] (HADOOP-10717) Missing JSP support in Jetty, 'NO JSP Support for /, did not find org.apache.jasper.servlet.JspServlet' when user want to start namenode.

2014-06-18 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai reopened HADOOP-10717:
-


> Missing JSP support in Jetty, 'NO JSP Support for /, did not find 
> org.apache.jasper.servlet.JspServlet' when user want to start namenode.
> -
>
> Key: HADOOP-10717
> URL: https://issues.apache.org/jira/browse/HADOOP-10717
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HADOOP-10717.patch
>
>
> When the user wants to start the NameNode, they get the following exception; it 
> is caused by the missing org.mortbay.jetty:jsp-2.1-jetty:jar:6.1.26 in the pom.xml
> 14/06/18 14:55:30 INFO http.HttpServer2: Added global filter 'safety' 
> (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
> 14/06/18 14:55:30 INFO http.HttpServer2: Added filter static_user_filter 
> (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to 
> context hdfs
> 14/06/18 14:55:30 INFO http.HttpServer2: Added filter static_user_filter 
> (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to 
> context static
> 14/06/18 14:55:30 INFO http.HttpServer2: Added filter static_user_filter 
> (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to 
> context logs
> 14/06/18 14:55:30 INFO http.HttpServer2: Added filter 
> 'org.apache.hadoop.hdfs.web.AuthFilter' 
> (class=org.apache.hadoop.hdfs.web.AuthFilter)
> 14/06/18 14:55:30 INFO http.HttpServer2: addJerseyResourcePackage: 
> packageName=org.apache.hadoop.hdfs.server.namenode.web.resources;org.apache.hadoop.hdfs.web.resources,
>  pathSpec=/webhdfs/v1/*
> 14/06/18 14:55:30 INFO http.HttpServer2: Jetty bound to port 50070
> 14/06/18 14:55:30 INFO mortbay.log: jetty-6.1.26
> 14/06/18 14:55:30 INFO mortbay.log: NO JSP Support for /, did not find 
> org.apache.jasper.servlet.JspServlet
> 14/06/18 14:57:38 WARN mortbay.log: EXCEPTION
> java.net.ConnectException: Connection timed out
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
> at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
> at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> at java.net.Socket.connect(Socket.java:529)
> at java.net.Socket.connect(Socket.java:478)
> at sun.net.NetworkClient.doConnect(NetworkClient.java:163)
> at sun.net.www.http.HttpClient.openServer(HttpClient.java:395)
> at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
> at sun.net.www.http.HttpClient.(HttpClient.java:234)
> at sun.net.www.http.HttpClient.New(HttpClient.java:307)
> at sun.net.www.http.HttpClient.New(HttpClient.java:324)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:970)
> at 
> sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:911)
> at 
> sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:836)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1172)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:677)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1315)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(XMLEntityManager.java:1282)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(XMLDTDScannerImpl.java:283)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(XMLDocumentScannerImpl.java:1194)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(XMLDocumentScannerImpl.java:1090)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1003)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentS

[jira] [Resolved] (HADOOP-10717) Missing JSP support in Jetty, 'NO JSP Support for /, did not find org.apache.jasper.servlet.JspServlet' when user want to start namenode.

2014-06-18 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HADOOP-10717.
-

Resolution: Invalid

> Missing JSP support in Jetty, 'NO JSP Support for /, did not find 
> org.apache.jasper.servlet.JspServlet' when user want to start namenode.
> -
>
> Key: HADOOP-10717
> URL: https://issues.apache.org/jira/browse/HADOOP-10717
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Dapeng Sun
>Assignee: Dapeng Sun
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: HADOOP-10717.patch
>
>
> When the user wants to start the NameNode, they get the following exception; it 
> is caused by the missing org.mortbay.jetty:jsp-2.1-jetty:jar:6.1.26 in the pom.xml
> 14/06/18 14:55:30 INFO http.HttpServer2: Added global filter 'safety' 
> (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
> 14/06/18 14:55:30 INFO http.HttpServer2: Added filter static_user_filter 
> (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to 
> context hdfs
> 14/06/18 14:55:30 INFO http.HttpServer2: Added filter static_user_filter 
> (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to 
> context static
> 14/06/18 14:55:30 INFO http.HttpServer2: Added filter static_user_filter 
> (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to 
> context logs
> 14/06/18 14:55:30 INFO http.HttpServer2: Added filter 
> 'org.apache.hadoop.hdfs.web.AuthFilter' 
> (class=org.apache.hadoop.hdfs.web.AuthFilter)
> 14/06/18 14:55:30 INFO http.HttpServer2: addJerseyResourcePackage: 
> packageName=org.apache.hadoop.hdfs.server.namenode.web.resources;org.apache.hadoop.hdfs.web.resources,
>  pathSpec=/webhdfs/v1/*
> 14/06/18 14:55:30 INFO http.HttpServer2: Jetty bound to port 50070
> 14/06/18 14:55:30 INFO mortbay.log: jetty-6.1.26
> 14/06/18 14:55:30 INFO mortbay.log: NO JSP Support for /, did not find 
> org.apache.jasper.servlet.JspServlet
> 14/06/18 14:57:38 WARN mortbay.log: EXCEPTION
> java.net.ConnectException: Connection timed out
> at java.net.PlainSocketImpl.socketConnect(Native Method)
> at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
> at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
> at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
> at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
> at java.net.Socket.connect(Socket.java:529)
> at java.net.Socket.connect(Socket.java:478)
> at sun.net.NetworkClient.doConnect(NetworkClient.java:163)
> at sun.net.www.http.HttpClient.openServer(HttpClient.java:395)
> at sun.net.www.http.HttpClient.openServer(HttpClient.java:530)
> at sun.net.www.http.HttpClient.(HttpClient.java:234)
> at sun.net.www.http.HttpClient.New(HttpClient.java:307)
> at sun.net.www.http.HttpClient.New(HttpClient.java:324)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:970)
> at 
> sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:911)
> at 
> sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:836)
> at 
> sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1172)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(XMLEntityManager.java:677)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(XMLEntityManager.java:1315)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(XMLEntityManager.java:1282)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(XMLDTDScannerImpl.java:283)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(XMLDocumentScannerImpl.java:1194)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(XMLDocumentScannerImpl.java:1090)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:1003)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648)
> at 
> com.sun.org.apache.xerces.internal.impl.XMLNSDocumentSca

[jira] [Created] (HADOOP-10563) Remove the dependency of jsp in trunk

2014-05-01 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-10563:
---

 Summary: Remove the dependency of jsp in trunk
 Key: HADOOP-10563
 URL: https://issues.apache.org/jira/browse/HADOOP-10563
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HADOOP-10563.000.patch

After HDFS-6252, neither HDFS nor YARN uses JSP, so the JSP dependency 
can be removed from the pom.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10551) Consolidate the logic of path resolution in FSDirectory

2014-04-29 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-10551:
---

 Summary: Consolidate the logic of path resolution in FSDirectory
 Key: HADOOP-10551
 URL: https://issues.apache.org/jira/browse/HADOOP-10551
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai


Currently both FSDirectory and INodeDirectory provide helpers to resolve paths 
to inodes. This jira proposes to move all these helpers into FSDirectory to 
simplify the code.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Text cmd in webhdfs

2014-04-25 Thread Haohui Mai
I suggest building a separate tool that streams from a filesystem instead of baking it
into webhdfs. One goal of webhdfs is to minimize the dependencies of both the
webhdfs server and the client so that other projects can adopt it easily.

If hadoop fs is too slow, it might be a good idea to build a new tool that
reads from stdin and parses the text format. For example, you can use it
like

curl http://foo/webhdfs/v1?op=OPEN|parse-text

~Haohui

On Fri, Apr 25, 2014 at 3:16 AM, Nikita Makeev  wrote:

> Hi.
>
> Sure I will, it will just take some time, as current patch is against 0.20
> and 2.0.0 and I'm going to add some more unit tests and make sure the patch
> conforms to requirements.
> Thanks.
>
> Nikita
>
>
> On Fri, Apr 25, 2014 at 12:17 PM, Akira AJISAKA
> wrote:
>
> > # Added hdfs-dev@
> >
> > Hi Nikita,
> >
> > I'm personally very interested in the functionality!
> > Please create a issue on ASF JIRA and attach your patch.
> >
> > Here is the documentation of creating a patch and attaching it.
> > http://wiki.apache.org/hadoop/HowToContribute
> >
> > Thanks,
> > Akira
> >
> >
> > (2014/04/25 2:09), Nikita Makeev wrote:
> >
> >> In my company we're using webhdfs a lot. One of usage pattern is
> exporting
> >> files from hdfs to external systems which was done before webhdfs by
> >> calling 'hadoop fs -text' to local filesystem and then transferring to
> >> final destination. We're using 'text' because of storing data on hdfs in
> >> sequence files and our external systems can't read them. We find webhdfs
> >> very handy, particularly because of very little overhead for starting up
> >> compared to 'hadoop fs', so we made a patch to make datanode capable of
> >> doing 'text' via webhdfs. Though I think our usage pattern is hardly
> >> common, I still asking the community, is there any interest in such
> >> functionality.
> >>
> >> SY,
> >>
> >>Nikita Makeev
> >>
> >>
> >
>



[jira] [Created] (HADOOP-10487) Racy code in UserGroupInformation#ensureInitialized()

2014-04-09 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-10487:
---

 Summary: Racy code in UserGroupInformation#ensureInitialized()
 Key: HADOOP-10487
 URL: https://issues.apache.org/jira/browse/HADOOP-10487
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Haohui Mai
Assignee: Haohui Mai


UserGroupInformation#ensureInitialized() uses the double-checked locking pattern 
to reduce the synchronization cost:

{code}
  private static void ensureInitialized() {
if (conf == null) {
  synchronized(UserGroupInformation.class) {
if (conf == null) { // someone might have beat us
  initialize(new Configuration(), false);
}
  }
}
  }
{code}

As [~tlipcon] pointed out in the original jira (HADOOP-9748), this pattern is 
incorrect. Please see more details in 
http://en.wikipedia.org/wiki/Double-checked_locking and 
http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html

This jira proposes to use the static class holder pattern to do it correctly.
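
For illustration only (this is not the committed patch), a minimal sketch of the 
initialization-on-demand holder idiom; the JVM guarantees that the nested holder 
class is initialized exactly once, under its own class-initialization lock:

{code}
import org.apache.hadoop.conf.Configuration;

// Sketch of the static holder idiom; the actual UGI fix may differ in detail.
public class LazyConf {
  private LazyConf() {}

  // Initialized lazily and exactly once by class loading, which the JVM
  // serializes internally -- no double-checked locking needed.
  private static class Holder {
    static final Configuration CONF = new Configuration();
  }

  public static Configuration get() {
    return Holder.CONF; // triggers Holder initialization on first use
  }
}
{code}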



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10486) Remove typedbytes support from hadoop-streaming

2014-04-08 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-10486:
---

 Summary: Remove typedbytes support from hadoop-streaming
 Key: HADOOP-10486
 URL: https://issues.apache.org/jira/browse/HADOOP-10486
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai


The typed record support in hadoop-streaming is based upon the deprecated 
records package. Neither of them is actively maintained. This jira proposes to 
remove them.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10485) Remove dead classes in hadoop-streaming

2014-04-08 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-10485:
---

 Summary: Remove dead classes in hadoop-streaming
 Key: HADOOP-10485
 URL: https://issues.apache.org/jira/browse/HADOOP-10485
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai
 Attachments: HADOOP-10485.000.patch

Hadoop-streaming no longer requires many classes in o.a.h.record. This jira 
removes the dead code.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10484) Remove o.a.h.conf.Reconfig*

2014-04-08 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-10484:
---

 Summary: Remove o.a.h.conf.Reconfig*
 Key: HADOOP-10484
 URL: https://issues.apache.org/jira/browse/HADOOP-10484
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai


A search reveals that these classes are not used by Hadoop or any downstream 
projects after 0.20. They have not been maintained since 2011.

This jira proposes to remove them from hadoop-common.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10482) Fix new findbugs warnings in hadoop-common

2014-04-08 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-10482:
---

 Summary: Fix new findbugs warnings in hadoop-common
 Key: HADOOP-10482
 URL: https://issues.apache.org/jira/browse/HADOOP-10482
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Haohui Mai


The following findbugs warnings need to be fixed:

{noformat}
[INFO] --- findbugs-maven-plugin:2.5.3:check (default-cli) @ hadoop-common ---
[INFO] BugInstance size is 97
[INFO] Error size is 0
[INFO] Total bugs: 97
[INFO] Found reliance on default encoding in 
org.apache.hadoop.conf.Configuration.getConfResourceAsReader(String): new 
java.io.InputStreamReader(InputStream) ["org.apache.hadoop.conf.Configuration"] 
At Configuration.java:[lines 169-2642]
[INFO] Null passed for nonnull parameter of set(String, String) in 
org.apache.hadoop.conf.Configuration.setPattern(String, Pattern) 
["org.apache.hadoop.conf.Configuration"] At Configuration.java:[lines 169-2642]
[INFO] Format string should use %n rather than \n in 
org.apache.hadoop.conf.ReconfigurationServlet.printHeader(PrintWriter, String) 
["org.apache.hadoop.conf.ReconfigurationServlet"] At 
ReconfigurationServlet.java:[lines 44-234]
[INFO] Format string should use %n rather than \n in 
org.apache.hadoop.conf.ReconfigurationServlet.printHeader(PrintWriter, String) 
["org.apache.hadoop.conf.ReconfigurationServlet"] At 
ReconfigurationServlet.java:[lines 44-234]
[INFO] Found reliance on default encoding in new 
org.apache.hadoop.crypto.key.KeyProvider$Metadata(byte[]): new 
java.io.InputStreamReader(InputStream) 
["org.apache.hadoop.crypto.key.KeyProvider$Metadata"] At 
KeyProvider.java:[lines 110-204]
[INFO] Found reliance on default encoding in 
org.apache.hadoop.crypto.key.KeyProvider$Metadata.serialize(): new 
java.io.OutputStreamWriter(OutputStream) 
["org.apache.hadoop.crypto.key.KeyProvider$Metadata"] At 
KeyProvider.java:[lines 110-204]
[INFO] Redundant nullcheck of clazz, which is known to be non-null in 
org.apache.hadoop.fs.FileSystem.createFileSystem(URI, Configuration) 
["org.apache.hadoop.fs.FileSystem"] At FileSystem.java:[lines 89-3017]
[INFO] Unread public/protected field: 
org.apache.hadoop.fs.HarFileSystem$Store.endHash 
["org.apache.hadoop.fs.HarFileSystem$Store"] At HarFileSystem.java:[lines 
492-500]
[INFO] Unread public/protected field: 
org.apache.hadoop.fs.HarFileSystem$Store.startHash 
["org.apache.hadoop.fs.HarFileSystem$Store"] At HarFileSystem.java:[lines 
492-500]
[INFO] Found reliance on default encoding in 
org.apache.hadoop.fs.HardLink.createHardLink(File, File): new 
java.io.InputStreamReader(InputStream) ["org.apache.hadoop.fs.HardLink"] At 
HardLink.java:[lines 51-546]
[INFO] Found reliance on default encoding in 
org.apache.hadoop.fs.HardLink.createHardLinkMult(File, String[], File, int): 
new java.io.InputStreamReader(InputStream) ["org.apache.hadoop.fs.HardLink"] At 
HardLink.java:[lines 51-546]
[INFO] Found reliance on default encoding in 
org.apache.hadoop.fs.HardLink.getLinkCount(File): new 
java.io.InputStreamReader(InputStream) ["org.apache.hadoop.fs.HardLink"] At 
HardLink.java:[lines 51-546]
[INFO] Bad attempt to compute absolute value of signed random integer in 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(String,
 long, Configuration, boolean) 
["org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext"] At 
LocalDirAllocator.java:[lines 247-549]
[INFO] Null passed for nonnull parameter of 
org.apache.hadoop.conf.Configuration.set(String, String) in 
org.apache.hadoop.fs.ftp.FTPFileSystem.initialize(URI, Configuration) 
["org.apache.hadoop.fs.ftp.FTPFileSystem"] At FTPFileSystem.java:[lines 51-593]
[INFO] Redundant nullcheck of dirEntries, which is known to be non-null in 
org.apache.hadoop.fs.ftp.FTPFileSystem.delete(FTPClient, Path, boolean) 
["org.apache.hadoop.fs.ftp.FTPFileSystem"] At FTPFileSystem.java:[lines 51-593]
[INFO] Redundant nullcheck of 
org.apache.hadoop.fs.ftp.FTPFileSystem.getFileStatus(FTPClient, Path), which is 
known to be non-null in 
org.apache.hadoop.fs.ftp.FTPFileSystem.exists(FTPClient, Path) 
["org.apache.hadoop.fs.ftp.FTPFileSystem"] At FTPFileSystem.java:[lines 51-593]
[INFO] Found reliance on default encoding in 
org.apache.hadoop.fs.shell.Display$AvroFileInputStream.read(): 
String.getBytes() ["org.apache.hadoop.fs.shell.Display$AvroFileInputStream"] At 
Display.java:[lines 259-309]
[INFO] Format string should use %n rather than \n in 
org.apache.hadoop.fs.shell.Display$Checksum.processPath(PathData) 
["org.apache.hadoop.fs.shell.Display$Checksum"] At Display.java:[lines 169-196]
[INFO] Format string should use %n rather than \n in 
org.apache.hadoop.fs.shell.Display$Checksum.processPath(PathData) 
["

[jira] [Created] (HADOOP-10481) Fix new findbugs warnings in hadoop-auth

2014-04-08 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-10481:
---

 Summary: Fix new findbugs warnings in hadoop-auth
 Key: HADOOP-10481
 URL: https://issues.apache.org/jira/browse/HADOOP-10481
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Haohui Mai


The following findbugs warnings need to be fixed:

{noformat}
[INFO] --- findbugs-maven-plugin:2.5.3:check (default-cli) @ hadoop-auth ---
[INFO] BugInstance size is 2
[INFO] Error size is 0
[INFO] Total bugs: 2
[INFO] Found reliance on default encoding in 
org.apache.hadoop.security.authentication.server.AuthenticationFilter.init(FilterConfig):
 String.getBytes() 
["org.apache.hadoop.security.authentication.server.AuthenticationFilter"] At 
AuthenticationFilter.java:[lines 76-455]
[INFO] Found reliance on default encoding in 
org.apache.hadoop.security.authentication.util.Signer.computeSignature(String): 
String.getBytes() ["org.apache.hadoop.security.authentication.util.Signer"] At 
Signer.java:[lines 34-96]
{noformat}
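
For reference, the usual fix for this class of warnings is to name the charset 
explicitly instead of relying on the platform default. A minimal sketch, not the 
actual patch; the helper names below are made up for illustration:

{code}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetSketch {
  // Hypothetical helpers for illustration; not the actual Hadoop methods.
  static byte[] toBytes(String secret) {
    return secret.getBytes(StandardCharsets.UTF_8);           // was: secret.getBytes()
  }

  static InputStreamReader reader(InputStream in) {
    return new InputStreamReader(in, StandardCharsets.UTF_8); // was: new InputStreamReader(in)
  }

  public static void main(String[] args) throws IOException {
    System.out.println(toBytes("hadoop").length);
    reader(new ByteArrayInputStream(new byte[0])).close();
  }
}
{code}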




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10480) Fix new findbugs warnings in hadoop-hdfs

2014-04-08 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-10480:
---

 Summary: Fix new findbugs warnings in hadoop-hdfs
 Key: HADOOP-10480
 URL: https://issues.apache.org/jira/browse/HADOOP-10480
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Haohui Mai


The following findbugs warnings need to be fixed:

{noformat}
[INFO] --- findbugs-maven-plugin:2.5.3:check (default-cli) @ hadoop-hdfs ---
[INFO] BugInstance size is 14
[INFO] Error size is 0
[INFO] Total bugs: 14
[INFO] Redundant nullcheck of curPeer, which is known to be non-null in 
org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp() 
["org.apache.hadoop.hdfs.BlockReaderFactory"] At BlockReaderFactory.java:[lines 
68-808]
[INFO] Increment of volatile field 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.restartingNodeIndex in 
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery()
 ["org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer"] At 
DFSOutputStream.java:[lines 308-1492]
[INFO] Found reliance on default encoding in 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(DataOutputStream,
 DataInputStream, DataOutputStream, String, DataTransferThrottler, 
DatanodeInfo[]): new java.io.FileWriter(File) 
["org.apache.hadoop.hdfs.server.datanode.BlockReceiver"] At 
BlockReceiver.java:[lines 66-905]
[INFO] b must be nonnull but is marked as nullable 
["org.apache.hadoop.hdfs.server.datanode.DatanodeJspHelper$2"] At 
DatanodeJspHelper.java:[lines 546-549]
[INFO] Found reliance on default encoding in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.addToReplicasMap(ReplicaMap,
 File, boolean): new java.util.Scanner(File) 
["org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice"] At 
BlockPoolSlice.java:[lines 58-427]
[INFO] Found reliance on default encoding in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.loadDfsUsed():
 new java.util.Scanner(File) 
["org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice"] At 
BlockPoolSlice.java:[lines 58-427]
[INFO] Found reliance on default encoding in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.saveDfsUsed():
 new java.io.FileWriter(File) 
["org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice"] At 
BlockPoolSlice.java:[lines 58-427]
[INFO] Redundant nullcheck of f, which is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(String,
 Block[]) 
["org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl"] At 
FsDatasetImpl.java:[lines 60-1910]
[INFO] Found reliance on default encoding in 
org.apache.hadoop.hdfs.server.namenode.FSImageUtil.(): String.getBytes() 
["org.apache.hadoop.hdfs.server.namenode.FSImageUtil"] At 
FSImageUtil.java:[lines 34-89]
[INFO] Found reliance on default encoding in 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListingInt(String, 
byte[], boolean): new String(byte[]) 
["org.apache.hadoop.hdfs.server.namenode.FSNamesystem"] At 
FSNamesystem.java:[lines 301-7701]
[INFO] Found reliance on default encoding in 
org.apache.hadoop.hdfs.server.namenode.INode.dumpTreeRecursively(PrintStream): 
new java.io.PrintWriter(OutputStream, boolean) 
["org.apache.hadoop.hdfs.server.namenode.INode"] At INode.java:[lines 51-744]
[INFO] Redundant nullcheck of fos, which is known to be non-null in 
org.apache.hadoop.hdfs.server.namenode.NamenodeFsck.copyBlocksToLostFound(String,
 HdfsFileStatus, LocatedBlocks) 
["org.apache.hadoop.hdfs.server.namenode.NamenodeFsck"] At 
NamenodeFsck.java:[lines 94-710]
[INFO] Found reliance on default encoding in 
org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.run(String[]):
 new java.io.PrintWriter(File) 
["org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB"] At 
OfflineImageViewerPB.java:[lines 45-181]
[INFO] Found reliance on default encoding in 
org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB.run(String[]):
 new java.io.PrintWriter(OutputStream) 
["org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB"] At 
OfflineImageViewerPB.java:[lines 45-181]
{noformat}




--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10479) Fix new findbugs warnings in hadoop-minikdc

2014-04-08 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-10479:
---

 Summary: Fix new findbugs warnings in hadoop-minikdc
 Key: HADOOP-10479
 URL: https://issues.apache.org/jira/browse/HADOOP-10479
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Haohui Mai


The following findbugs warnings need to be fixed:

{noformat}
[INFO] --- findbugs-maven-plugin:2.5.3:check (default-cli) @ hadoop-minikdc ---
[INFO] BugInstance size is 2
[INFO] Error size is 0
[INFO] Total bugs: 2
[INFO] Found reliance on default encoding in 
org.apache.hadoop.minikdc.MiniKdc.initKDCServer(): new 
java.io.InputStreamReader(InputStream) ["org.apache.hadoop.minikdc.MiniKdc"] At 
MiniKdc.java:[lines 112-557]
[INFO] Found reliance on default encoding in 
org.apache.hadoop.minikdc.MiniKdc.main(String[]): new java.io.FileReader(File) 
["org.apache.hadoop.minikdc.MiniKdc"] At MiniKdc.java:[lines 112-557]
{noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10478) Fix new findbugs warnings in hadoop-maven-plugins

2014-04-08 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-10478:
---

 Summary: Fix new findbugs warnings in hadoop-maven-plugins
 Key: HADOOP-10478
 URL: https://issues.apache.org/jira/browse/HADOOP-10478
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Haohui Mai


The following findbug warning needs to be fixed:

{noformat}
[INFO] --- findbugs-maven-plugin:2.5.3:check (default-cli) @ 
hadoop-maven-plugins ---
[INFO] BugInstance size is 1
[INFO] Error size is 0
[INFO] Total bugs: 1
[INFO] Found reliance on default encoding in new 
org.apache.hadoop.maven.plugin.util.Exec$OutputBufferThread(InputStream): new 
java.io.InputStreamReader(InputStream) 
["org.apache.hadoop.maven.plugin.util.Exec$OutputBufferThread"] At 
Exec.java:[lines 89-114]
{noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10477) Clean up findbug warnings found by findbugs 2.0.2

2014-04-08 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-10477:
---

 Summary: Clean up findbug warnings found by findbugs 2.0.2
 Key: HADOOP-10477
 URL: https://issues.apache.org/jira/browse/HADOOP-10477
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai


This is an umbrella jira to clean up the new findbug warnings found by findbugs 
2.0.2.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10476) Bumping the findbugs version to 2.5.3

2014-04-08 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-10476:
---

 Summary: Bumping the findbugs version to 2.5.3
 Key: HADOOP-10476
 URL: https://issues.apache.org/jira/browse/HADOOP-10476
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai


The findbugs version used by Hadoop is pretty old (1.3.9). The old version of 
findbugs itself has some bugs (like 
http://sourceforge.net/p/findbugs/bugs/918/, hit by HADOOP-10474). Furthermore, 
the newer version is able to catch more bugs.

It's a good time to bump the findbugs version to the latest stable version, 
2.5.3.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10474) Move o.a.h.record to hadoop-streaming

2014-04-08 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-10474:
---

 Summary: Move o.a.h.record to hadoop-streaming
 Key: HADOOP-10474
 URL: https://issues.apache.org/jira/browse/HADOOP-10474
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai


The classes in o.a.h.record have been deprecated for more than a year and a 
half. They should be removed. As the first step, the jira moves all these 
classes into the hadoop-streaming project, which is the only user of these 
classes.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10468) TestMetricsSystemImpl.testMultiThreadedPublish fails intermittently

2014-04-07 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-10468:
---

 Summary: TestMetricsSystemImpl.testMultiThreadedPublish fails 
intermittently
 Key: HADOOP-10468
 URL: https://issues.apache.org/jira/browse/HADOOP-10468
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai


{{TestMetricsSystemImpl.testMultiThreadedPublish}} can fail intermittently due 
to the insufficient size of the sink queue:

{code}
2014-04-06 21:34:55,269 WARN  impl.MetricsSinkAdapter 
(MetricsSinkAdapter.java:putMetricsImmediate(107)) - Collector has a full queue 
and can't consume the given metrics.
2014-04-06 21:34:55,270 WARN  impl.MetricsSinkAdapter 
(MetricsSinkAdapter.java:putMetricsImmediate(107)) - Collector has a full queue 
and can't consume the given metrics.
2014-04-06 21:34:55,271 WARN  impl.MetricsSinkAdapter 
(MetricsSinkAdapter.java:putMetricsImmediate(107)) - Collector has a full queue 
and can't consume the given metrics.
{code}

The unit test should increase the default queue size to avoid intermittent 
failures.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Plans of moving towards JDK7 in trunk

2014-04-07 Thread Haohui Mai
It looks to me that the majority of this thread welcomes JDK7. Just to
reiterate, there are two separate questions here:

1. When should hadoop trunk be buildable only on top of JDK7?
2. When should hadoop branch-2 be buildable only on top of JDK7?

The answers to the above questions directly imply when and how Hadoop can
break compatibility with the JDK6 runtime.

It looks like there are quite a few compatibility concerns around question
(2). Should we focus on question (1) and come up with a plan? Personally
I'd love to see (1) happen as soon as possible.

~Haohui

On Sun, Apr 6, 2014 at 11:37 AM, Steve Loughran wrote:

> On 5 April 2014 20:54, Raymie Stata  wrote:
>
> > To summarize the thread so far:
> >
> > a) Java7 is already a supported compile- and runtime environment for
> > Hadoop branch2 and trunk
> > b) Java6 must remain a supported compile- and runtime environment for
> > Hadoop branch2
> > c) (b) implies that branch2 must stick to Java6 APIs
> >
> > I wonder if point (b) should be revised.  We could immediately
> > deprecate Java6 as a runtime (and thus compile-time) environment for
> > Hadoop.  We could end support for it in some published time frame
> > (perhaps 3Q2014).  That is, we'd say that all future 2.x releases past
> > some date would not be guaranteed to run on Java6.  This would set us
> > up for using Java7 APIs into branch2.
> >
>
> I'll let others deal with that question.
>
>
> >
> > An alternative might be to keep branch2 on Java6 APIs forever, and to
> > start using Java7 APIs in trunk relatively soon.  The concern here
> > would be that trunk isn't getting the kind of production torture
> > testing that branch2 is subjected to, and won't be for a while.  If
> > trunk and branch2 diverge too much too quickly, trunk could become a
> > nest of bugs, endangering the timeline and quality of Hadoop 3.  This
> > would argue for keeping trunk and branch2 in closer sync (maybe until
> > a branch3 is created and starts getting used by bleeding-edge users).
> > However, as just suggested, keeping them in closer sync need _not_
> > imply that Java7 features be avoided indefinitely: again, with
> > sufficient warning, Java6 support could be sunset within branch2.
> >
>
> One thing we could do is have a policy towards new features where there's
> consensus that they won't go into branch-2, especially things in their own
> JARs.
>
> Here we could consider a policy of build set up to be Java 7 only, with
> Java7 APIs.
>
> That would be justified if there was some special java 7 feature -such as
> its infiniband support. Add a feature like that in its own module (under
> hadoop-tools, presumably), and use Java7 and Java 7 libraries. If someone
> really did want to use the feature in hadoop-2, they'd be able to, in a
> java7+ only backport.
>
>
> >
> > On a related note, Steve points out that we need to start thinking
> > about Java8.  YES!!  Lambdas are a Really Big Deal!  If we sunset
> > Java6 in a few quarters, maybe we can add Java8 compile and runtime
> > (but not API) support about the same time.  This does NOT imply
> > bringing Java8 APIs into branch2: Even if we do allow Java7 APIs into
> > branch2 in the future, I doubt that bringing Java8 APIs into it will
> > ever make sense.  However, if Java8 is a supported runtime environment
> > for Hadoop 2, that sets us up for using Java8 APIs for the eventual
> > branch3 sometime in 2015.
> >
> >
> Testing Hadoop on Java 8 would let the rest of the stack move forward.
>
> In the meantime, can I point out that both Scala-on-Java7 and
> Groovy-on-Java 7 offer closures quite nicely, with performance by way of
> INVOKEDYNAMIC opcodes.
>
> -steve
>
>



Re: Plans of moving towards JDK7 in trunk

2014-04-04 Thread Haohui Mai
bq. It might not be as clear cut...

Totally agree. I think the key is that we can do the work incrementally: we
can introduce the JDK7 dependency only on the server side. In order to
do this we need to split the client-side code into separate jars. I've
already proposed creating an hdfs-client jar on the hdfs-dev mailing list.

bq.  I would have thought it could be easily achieved by marking certain
project poms with source/target 1.6 in their maven compiler plugin
configuration while upgrading the default setting to 1.7. Do you anticipate
more issues?

Correct me if I'm wrong, but I think that's enough. The work should be
minimal.

~Haohui

On Fri, Apr 4, 2014 at 3:43 PM, Sangjin Lee  wrote:

> Please don't forget the mac os build on JDK 7. :)
>
>
> On Fri, Apr 4, 2014 at 3:15 PM, Haohui Mai  wrote:
>
> > I'm referring to the latter case. Indeed migrating to JDK7 for branch-2 is
> more
> > difficult.
> >
> > I think one reasonable approach is to put the hdfs / yarn clients into
> > separate jars. The client-side jars can only use JDK6 APIs, so that
> > downstream projects running on top of JDK6 continue to work.
> >
>
> It might not be as clear cut. For clients to run clean on JDK 6, not only
> the client projects/artifacts but also any of their dependencies must be
> free of JDK 7 code. And this obviously includes things like hadoop-common
> (or any downstream dependencies for that matter).
>
>
> >
> > The HDFS/YARN/MR servers need to be run on top of JDK7, and we're free to
> > use JDK7 APIs inside them. Given the fact that there're way more code in
> > the server-side compared to the client-side, having the ability to use
> JDK7
> > in the server-side only might still be a win.
> >
> > The downside I can think of is that it might complicate the effort of
> > publishing maven jars, but this should be a one-time issue.
> >
>
> Could you elaborate on why it would complicate maven jar publication?
> Perhaps I'm over-simplifying things, but I would have thought it could be
> easily achieved by marking certain project poms with source/target 1.6 in
> their maven compiler plugin configuration while upgrading the default
> setting to 1.7. Do you anticipate more issues?
>
>
> >
> > ~Haohui
> >
> >
> > On Fri, Apr 4, 2014 at 2:37 PM, Alejandro Abdelnur  > >wrote:
> >
> > > Haohui,
> > >
> > > Is the idea to compile/test with JDK7 and recommend it for runtime and
> > stop
> > > there? Or to start using JDK7 API stuff as well? If the latter is the
> > case,
> > > then backporting stuff to branch-2 may break and patches may have to be
> > > refactored for JDK6. Given that branch-2 got GA status not so long
> ago, I
> > > assume it will be active for a while.
> > >
> > > What are your thoughts on this regard?
> > >
> > > Thanks
> > >
> > >
> > > On Fri, Apr 4, 2014 at 2:29 PM, Haohui Mai 
> wrote:
> > >
> > > > Hi,
> > > >
> > > > There have been multiple discussions on deprecating supports of JDK6
> > and
> > > > moving towards JDK7. It looks to me that the consensus is that now
> > hadoop
> > > > is ready to drop the support of JDK6 and to move towards JDK7. Based
> on
> > > the
> > > > consensus, I wonder whether it is a good time to start the migration.
> > > >
> > > > Here are my understandings of the current status:
> > > >
> > > > 1. There is no more public updates of JDK6 since Feb 2013. Users no
> > > longer
> > > > get fixes of security vulnerabilities through official public
> updates.
> > > > 2. Hadoop core is stuck with out-of-date dependency unless moving
> > towards
> > > > JDK7. (see
> > > > http://hadoop.6.n7.nabble.com/very-old-dependencies-td71486.html)
> > > > The implementation can also benefit from it thanks to the new
> > > > functionalities in JDK7.
> > > > 3. The code is ready for JDK7. Cloudera and Hortonworks have
> successful
> > > > stories of supporting Hadoop on JDK7.
> > > >
> > > >
> > > > It seems that the real work of moving to JDK7 is minimal. We only
> need
> > to
> > > > (1) make sure the jenkins are running on top of JDK7, and (2) to
> update
> > > the
> > > > minimum required Java version from 6 to 7. Therefore I propose that
> > let's
> > > > move towards JDK7 in trunk in the short term.
> > > >

Re: Plans of moving towards JDK7 in trunk

2014-04-04 Thread Haohui Mai
I'm referring to the latter case. Indeed migrating to JDK7 for branch-2 is more
difficult.

I think one reasonable approach is to put the hdfs / yarn clients into
separate jars. The client-side jars can only use JDK6 APIs, so that
downstream projects running on top of JDK6 continue to work.

The HDFS/YARN/MR servers need to be run on top of JDK7, and we're free to
use JDK7 APIs inside them. Given that there is far more code on
the server side than on the client side, having the ability to use JDK7
on the server side only might still be a win.

The downside I can think of is that it might complicate the effort of
publishing maven jars, but this should be a one-time issue.

~Haohui


On Fri, Apr 4, 2014 at 2:37 PM, Alejandro Abdelnur wrote:

> Haohui,
>
> Is the idea to compile/test with JDK7 and recommend it for runtime and stop
> there? Or to start using JDK7 API stuff as well? If the latter is the case,
> then backporting stuff to branch-2 may break and patches may have to be
> refactored for JDK6. Given that branch-2 got GA status not so long ago, I
> assume it will be active for a while.
>
> What are your thoughts on this regard?
>
> Thanks
>
>
> On Fri, Apr 4, 2014 at 2:29 PM, Haohui Mai  wrote:
>
> > Hi,
> >
> > There have been multiple discussions on deprecating supports of JDK6 and
> > moving towards JDK7. It looks to me that the consensus is that now hadoop
> > is ready to drop the support of JDK6 and to move towards JDK7. Based on
> the
> > consensus, I wonder whether it is a good time to start the migration.
> >
> > Here are my understandings of the current status:
> >
> > 1. There is no more public updates of JDK6 since Feb 2013. Users no
> longer
> > get fixes of security vulnerabilities through official public updates.
> > 2. Hadoop core is stuck with out-of-date dependency unless moving towards
> > JDK7. (see
> > http://hadoop.6.n7.nabble.com/very-old-dependencies-td71486.html)
> > The implementation can also benefit from it thanks to the new
> > functionalities in JDK7.
> > 3. The code is ready for JDK7. Cloudera and Hortonworks have successful
> > stories of supporting Hadoop on JDK7.
> >
> >
> > It seems that the real work of moving to JDK7 is minimal. We only need to
> > (1) make sure the jenkins are running on top of JDK7, and (2) to update
> the
> > minimum required Java version from 6 to 7. Therefore I propose that let's
> > move towards JDK7 in trunk in the short term.
> >
> > Your feedbacks are appreciated.
> >
> > Regards,
> > Haohui
> >
> >
>
>
>
> --
> Alejandro
>



Plans of moving towards JDK7 in trunk

2014-04-04 Thread Haohui Mai
Hi,

There have been multiple discussions on deprecating support for JDK6 and
moving towards JDK7. It looks to me that the consensus is that Hadoop is now
ready to drop support for JDK6 and to move towards JDK7. Based on that
consensus, I wonder whether it is a good time to start the migration.

Here is my understanding of the current status:

1. There have been no public updates of JDK6 since Feb 2013. Users no longer
get fixes for security vulnerabilities through official public updates.
2. Hadoop core is stuck with out-of-date dependencies unless it moves towards
JDK7. (see http://hadoop.6.n7.nabble.com/very-old-dependencies-td71486.html)
The implementation can also benefit from the new language and library
features in JDK7 (a small sketch follows this list).
3. The code is ready for JDK7. Cloudera and Hortonworks have successful
stories of supporting Hadoop on JDK7.
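
For a flavor of the JDK7-only conveniences mentioned in (2), a small standalone
sketch, unrelated to any particular Hadoop patch: try-with-resources, multi-catch,
and the diamond operator.

{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class Jdk7FeaturesSketch {
  public static void main(String[] args) {
    // try-with-resources closes the reader automatically; multi-catch
    // collapses duplicated catch blocks.
    List<String> lines = new ArrayList<>();          // diamond operator
    try (BufferedReader r = Files.newBufferedReader(
        Paths.get("/etc/hosts"), StandardCharsets.UTF_8)) {
      String line;
      while ((line = r.readLine()) != null) {
        lines.add(line);
      }
    } catch (IOException | RuntimeException e) {
      System.err.println("read failed: " + e);
    }
    System.out.println(lines.size() + " lines");
  }
}
{code}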


It seems that the real work of moving to JDK7 is minimal. We only need to
(1) make sure the Jenkins jobs are running on top of JDK7, and (2) update the
minimum required Java version from 6 to 7. Therefore I propose that we
move towards JDK7 in trunk in the short term.

Your feedback is appreciated.

Regards,
Haohui



Re: [VOTE] Release Apache Hadoop 2.4.0

2014-04-03 Thread Haohui Mai
HDFS-6180 seems to be a blocker for 2.4. I'll post a patch later today.

~Haohui


On Thu, Apr 3, 2014 at 3:47 PM, Azuryy  wrote:

> Did you test RM failover on Hive? There is a bug.
>
>
> Sent from my iPhone5s
>
> > On 2014年4月4日, at 2:12, Xuan Gong  wrote:
> >
> > +1 non-binding
> >
> > Built from source code, tested on a single node cluster. Successfully
> ran a
> > few MR sample jobs.
> > Tested RM failover while job is running.
> >
> > Thanks
> >
> > Xuan Gong
> >
> >
> >> On Wed, Apr 2, 2014 at 10:21 PM, Zhijie Shen 
> wrote:
> >>
> >> +1 non-binding
> >>
> >> I built from source code, and setup a single node non-secure cluster
> with
> >> almost the default configurations and ran a number of MR example jobs
> and
> >> distributedshell jobs. I verified the generic and the per-framework (DS
> job
> >> only) historic information has been persisted, and such information is
> >> accessible via webUI, RESTful APIs and CLI.
> >>
> >> Thanks,
> >> Zhijie
> >>
> >>
> >>> On Wed, Apr 2, 2014 at 1:26 PM, Jian He  wrote:
> >>>
> >>> +1 non-binding
> >>>
> >>> Built from source code, tested on a single node cluster. Successfully
> >> ran a
> >>> few MR sample jobs.
> >>> Tested RM restart while job is running.
> >>>
> >>> Thanks,
> >>> Jian
> >>>
> >>> On Tue, Apr 1, 2014 at 5:42 PM, Travis Thompson <
> tthomp...@linkedin.com
>  wrote:
> >>>
>  +1 non-binding
> 
>  Built from git. Started with 120 node 2.3.0 cluster with security and
>  non HA, ran upgrade (non rolling) to 2.4.0.  Confirmed fsimage is OK
> >> and
>  HDFS successfully upgraded.  Also successfully ran some pig jobs and
>  mapreduce examples.  Haven't found any issues yet but will continue
>  testing.  Did not test Timeline Server since I'm using security.
> 
>  Thanks,
>  Travis
> 
> > On 03/31/2014 02:24 AM, Arun C Murthy wrote:
> > Folks,
> >
> > I've created a release candidate (rc0) for hadoop-2.4.0 that I would
>  like to get released.
> >
> > The RC is available at:
>  http://people.apache.org/~acmurthy/hadoop-2.4.0-rc0
> > The RC tag in svn is here:
>  https://svn.apache.org/repos/asf/hadoop/common/tags/release-2.4.0-rc0
> >
> > The maven artifacts are available via repository.apache.org.
> >
> > Please try the release and vote; the vote will run for the usual 7
> >>> days.
> >
> > thanks,
> > Arun
> >
> > --
> > Arun C. Murthy
> > Hortonworks Inc.
> > http://hortonworks.com/
> >>>
> >>
> >> --
> >> Zhijie Shen
> >> Hortonworks Inc.
> >> http://hortonworks.com/
> >>
> >
>


[jira] [Created] (HADOOP-10453) Do not use AuthenticatedURL in hadoop core

2014-03-31 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-10453:
---

 Summary: Do not use AuthenticatedURL in hadoop core
 Key: HADOOP-10453
 URL: https://issues.apache.org/jira/browse/HADOOP-10453
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Haohui Mai
Priority: Blocker


As [~daryn] has suggested in HDFS-4564:

{quote}
AuthenticatedURL is not used because it is buggy: in part it causes replay 
attacks, it double-attempts Kerberos authentication with the fallback 
authenticator if the TGT is expired, and it incorrectly uses the fallback 
authenticator (required by oozie servers) to add the username parameter, which 
webhdfs has already included in the URI.

AuthenticatedURL's attempt to do SPNEGO auth is a no-op because the JDK 
transparently does SPNEGO when the user's Subject (UGI) contains kerberos 
principals. Since AuthenticatedURL is no longer used, webhdfs has to check the 
TGT itself for token operations.
Bottom line is AuthenticatedURL is unnecessary and introduces nothing but 
problems for webhdfs. It's only useful for oozie's anon/non-anon support.
{quote}

However, several functionalities that rely on SPNEGO in secure mode suffer 
from the same problem. For example, NNs / JNs create HTTP connections to 
exchange the fsimage and edit logs. Currently all of them go through 
{{AuthenticatedURL}}. This needs to be fixed to avoid security vulnerabilities.

This jira proposes to remove {{AuthenticatedURL}} from hadoop core and to move 
it to oozie.
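
As a rough, hypothetical illustration of the behavior [~daryn] describes above: 
the connection can simply be opened inside a {{Subject.doAs}} block, and the 
JDK's HTTP stack (given the appropriate Kerberos / JAAS configuration, e.g. 
{{javax.security.auth.useSubjectCredsOnly}}) answers the {{Negotiate}} 
challenge on its own. This is a sketch, not the actual NN / JN code:

{code:java}
import java.net.HttpURLConnection;
import java.net.URL;
import java.security.PrivilegedExceptionAction;
import javax.security.auth.Subject;

public class SpnegoSketch {
  // Opens an HTTP connection while the given Kerberos-populated Subject is in
  // scope, so the JDK itself can respond to a 401 "WWW-Authenticate: Negotiate"
  // challenge without going through AuthenticatedURL.
  static int fetch(Subject subject, final String urlStr) throws Exception {
    return Subject.doAs(subject, new PrivilegedExceptionAction<Integer>() {
      @Override
      public Integer run() throws Exception {
        HttpURLConnection conn =
            (HttpURLConnection) new URL(urlStr).openConnection();
        return conn.getResponseCode();
      }
    });
  }
}
{code}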



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: very old dependencies

2014-03-28 Thread Haohui Mai
Given that Java 6 has been EOL for a while, I wonder when would be a good
time to drop support for Java 6 in trunk?

New functionality in JDK7, such as the file system watcher, can simplify the
implementation of the NameNode and the DataNode in HDFS; a rough sketch follows.
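
As a rough sketch (assuming a plain local directory, and not taken from the
actual NameNode / DataNode code), the JDK7 WatchService lets a daemon react to
new or changed files without polling:

import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public class WatchSketch {
  public static void main(String[] args) throws Exception {
    Path dir = Paths.get(args.length > 0 ? args[0] : "/tmp");
    WatchService watcher = FileSystems.getDefault().newWatchService();
    // Register for file creation and modification events in the directory.
    dir.register(watcher,
        StandardWatchEventKinds.ENTRY_CREATE,
        StandardWatchEventKinds.ENTRY_MODIFY);
    while (true) {
      WatchKey key = watcher.take();  // blocks until an event arrives
      for (WatchEvent<?> event : key.pollEvents()) {
        System.out.println(event.kind() + ": " + event.context());
      }
      if (!key.reset()) {  // the directory is no longer accessible
        break;
      }
    }
  }
}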


On Fri, Mar 28, 2014 at 10:30 AM, Steve Loughran wrote:

> I don't disagree about version age - I've just been downgrading a project to
> use some of the versions. The issue with YARN apps is that you get these on
> your classpath.
>
>
>1. There's a JIRA: https://issues.apache.org/jira/browse/HADOOP-9991
>2. Nobody is working full time on these, I sporadically apply the
>patches -I get to feel the pain downstream
>
> commons lang is at 2.6 now, which should keep you happy.
>
> jetty? Now that the shuffle is on netty we could think about this and
> jersey
>
> Guava is trouble. Leave it: new code doesn't work. Remove it and old code
> stops working.
> https://issues.apache.org/jira/browse/HADOOP-10101
>
> I think for branch-3 we should go ahead and apply the patch. For branch-2,
> it's too destructive.
>
> At some point we also have to commit to being Java 7+ only - this would
> actually help us move up to some of the later dependencies. That's clearly
> not a branch-2
>
>
>
>
> On 28 March 2014 14:59, Sangjin Lee  wrote:
>
> > Hi folks,
> >
> > Even as 2.3 was released, several dependencies of hadoop are quite dated.
> > And more and more of them are causing friction for hadoop-related
> libraries
> > and hadoop users in general, as these dependency versions are often
> > incompatibly different from the versions most people use these days. So
> > this is becoming a very real problem, if not one already.
> >
> > Some key ones include
> > - guava: 11.0.2 (current 16.0.1)
> > - jetty: 6.1.26 (current 9.1.3)
> > - commons-lang: 2.6 (current 3.3.1)
> >
> > In particular, guava is already causing a lot of issues as many
> developers
> > have moved onto a newer version and are using APIs that do not exist in
> > 11.0.2, and get NoSuchMethodErrors and such.
> >
> > Also, for jetty, 6.1.26 has been EOLed for some time now.
> >
> > Do we have a plan to review some of these dependencies and upgrade them
> to
> > reasonably up-to-date versions? I remember seeing some JIRAs on specific
> > dependencies, but has a review been done to determine a good set of
> > versions to upgrade to?
> >
> > It would be great if we could upgrade some of the more common ones at
> least
> > to modern versions.
> >
> > Thanks,
> > Sangjin
> >
>
>



[jira] [Resolved] (HADOOP-10439) Fix compilation error in branch-2 after HADOOP-10426

2014-03-26 Thread Haohui Mai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haohui Mai resolved HADOOP-10439.
-

   Resolution: Fixed
Fix Version/s: 2.5.0

I've committed it to branch-2. Thanks [~szetszwo] for the review. 

> Fix compilation error in branch-2 after HADOOP-10426
> 
>
> Key: HADOOP-10439
> URL: https://issues.apache.org/jira/browse/HADOOP-10439
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: 2.5.0
>Reporter: Haohui Mai
>Assignee: Haohui Mai
> Fix For: 2.5.0
>
> Attachments: HADOOP-10439.000.patch
>
>
> HADOOP-10426 removes the import of {{java.io.File}} in branch-2, which causes a 
> compilation error.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10439) Fix compilation error in branch-2 after HADOOP-10426

2014-03-26 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-10439:
---

 Summary: Fix compilation error in branch-2 after HADOOP-10426
 Key: HADOOP-10439
 URL: https://issues.apache.org/jira/browse/HADOOP-10439
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Haohui Mai
Assignee: Haohui Mai


HADOOP-10426 removes the import of {{java.io.File}} in branch-2, which causes a 
compilation error.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10379) Protect authentication cookies with the HttpOnly and Secure flags

2014-03-03 Thread Haohui Mai (JIRA)
Haohui Mai created HADOOP-10379:
---

 Summary: Protect authentication cookies with the HttpOnly and 
Secure flags
 Key: HADOOP-10379
 URL: https://issues.apache.org/jira/browse/HADOOP-10379
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Haohui Mai
Assignee: Haohui Mai


Browser vendors have adopted proposals to enhance the security of HTTP cookies. 
For example, the server can mark a cookie as {{Secure}} so that it will not be 
transferred over the plain-text HTTP protocol, and the server can mark a cookie 
as {{HttpOnly}} to prohibit JavaScript from accessing that cookie.

This jira proposes to adopt these flags in Hadoop to protect the HTTP cookie 
used for authentication purposes.
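
A minimal sketch of what adopting the flags could look like on the server side, 
writing the {{Set-Cookie}} header directly so that it also works on 
pre-Servlet-3.0 containers; the cookie name and value are illustrative, not the 
exact change proposed here:

{code:java}
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class AuthCookieSketch {
  // Writes the authentication cookie with the HttpOnly flag, and with the
  // Secure flag when the request actually arrived over HTTPS.
  // "hadoop.auth" is used here only as an illustrative cookie name.
  static void setAuthCookie(HttpServletRequest req, HttpServletResponse resp,
                            String token) {
    StringBuilder cookie = new StringBuilder();
    cookie.append("hadoop.auth=").append('"').append(token).append('"')
          .append("; Path=/; HttpOnly");
    if (req.isSecure()) {
      cookie.append("; Secure");
    }
    resp.addHeader("Set-Cookie", cookie.toString());
  }
}
{code}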



--
This message was sent by Atlassian JIRA
(v6.2#6252)

