Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-28 Thread Rohith Sharma K S
On 29 August 2017 at 06:24, Andrew Wang  wrote:

> So far I've seen no -1's to the branching proposal, so I plan to execute
> this tomorrow unless there's further feedback.
>
For the ongoing branch merge threads, i.e. TSv2, voting will close
tomorrow. Would it end up being merged into both trunk (3.1.0-SNAPSHOT) and
branch-3.0 (3.0.0-beta1-SNAPSHOT)? If so, would you be able to wait a
couple more days before creating branch-3.0 so that the TSv2 branch merge
can be done directly to trunk?



>
> Regarding the above discussion, I think Jason and I have essentially the
> same opinion.
>
> I hope that keeping trunk a release branch means a higher bar for merges
> and code review in general. In the past, I've seen some patches committed
> to trunk-only as a way of passing responsibility to a future user or
> reviewer. That doesn't help anyone; patches should be committed with the
> intent of running them in production.
>
> I'd also like to repeat the above thanks to the many, many contributors
> who've helped with release improvements. Allen's work on create-release and
> automated changes and release notes was essential, as was Xiao's work on
> LICENSE and NOTICE files. I'm also looking forward to Marton's site
> improvements, which address one of the remaining sore spots in the
> release process.
>
> Things have gotten smoother with each alpha we've done over the last year,
> and it's a testament to everyone's work that we have a good probability of
> shipping beta and GA later this year.
>
> Cheers,
> Andrew
>
>


Re: [DISCUSS] Merge HDFS-10467 to (Router-based federation) trunk

2017-08-28 Thread Iñigo Goiri
Brahma, thank you for the comments.
i) I can send a patch with the diff between branches.
ii) I am working with Giovanni on the review.
iii) We have some numbers from our cluster.
iv) We could have a Router just for giving a view of all the namespaces
without giving RPC access. Another case might be allowing only WebHDFS
and not RPC. We could consolidate nevertheless.
v) I will open a JIRA to extend the documentation with the configuration keys.
vi) I'm open to doing more tests. I think the guys from LinkedIn wanted to test
some more frameworks in their dev setup. In addition, before merging, I'd
run the version in trunk for a few days.
vii) Good catches, I'll open JIRAs for those.

On Mon, Aug 28, 2017 at 6:12 AM, Brahma Reddy Battula <
brahmareddy.batt...@huawei.com> wrote:

> Nice feature, great work guys. Looking forward to getting this in, now that
> YARN federation is already in.
>
> At first glance I have a few questions:
>
> i) Could we have a consolidated patch for easier review?
>
> ii) Hoping "Federation Metrics" and "Federation UI" will be included.
>
> iii) Do we have RPC benchmarks?
>
> iv) As of now "dfs.federation.router.rpc.enable" and
> "dfs.federation.router.store.enable" are set to "true"; do we need to keep
> these configs, since without them the Router might not be useful?
>
> v) bq. The rest of the options are documented in [hdfs-default.xml]
>  I feel it would be better to document all the configurations. I see there are
> so many; how about documenting them in a tabular format?
>
> vi) Downstream projects (Spark, HBase, Hive, ...) integration testing? It looks
> like you mentioned some; is that enough?
>
> vii) mvn install (and package) is failing with the following error:
>
> [INFO]   Adding ignore: *
> [WARNING] Rule 1: org.apache.maven.plugins.enforcer.BanDuplicateClasses
> failed with message:
> Duplicate classes found:
>
>   Found in:
> org.apache.hadoop:hadoop-client-minicluster:jar:3.0.0-
> beta1-SNAPSHOT:compile
> org.apache.hadoop:hadoop-client-runtime:jar:3.0.0-
> beta1-SNAPSHOT:compile
>   Duplicate classes:
> org/apache/hadoop/shaded/org/apache/curator/framework/api/
> DeleteBuilder.class
> org/apache/hadoop/shaded/org/apache/curator/framework/
> CuratorFramework.class
>
>
> I added "hadoop-client-minicluster" to ignore list to get success
>
> hadoop\hadoop-client-modules\hadoop-client-integration-tests\pom.xml
>
> <dependencies>
>   <dependency>
>     <groupId>org.apache.hadoop</groupId>
>     <artifactId>hadoop-annotations</artifactId>
>     <ignoreClasses>
>       <ignoreClass>*</ignoreClass>
>     </ignoreClasses>
>   </dependency>
>   <dependency>
>     <groupId>org.apache.hadoop</groupId>
>     <artifactId>hadoop-client-minicluster</artifactId>
>     <ignoreClasses>
>       <ignoreClass>*</ignoreClass>
>     </ignoreClasses>
>   </dependency>
> </dependencies>
>
>
> Please correct me if I am wrong.
>
>
> --Brahma Reddy Battula
>
> -Original Message-
> From: Chris Douglas [mailto:cdoug...@apache.org]
> Sent: 25 August 2017 06:37
> To: Andrew Wang
> Cc: Iñigo Goiri; hdfs-dev@hadoop.apache.org; su...@apache.org
> Subject: Re: [DISCUSS] Merge HDFS-10467 to (Router-based federation) trunk
>
> On Thu, Aug 24, 2017 at 2:25 PM, Andrew Wang 
> wrote:
> > Do you mind holding this until 3.1? Same reasoning as for the other
> > branch merge proposals, we're simply too late in the 3.0.0 release cycle.
>
> That wouldn't be too dire.
>
> That said, this has the same design and impact as YARN federation.
> Specifically, it sits almost entirely outside core HDFS, so it will not
> affect clusters running without R-BF.
>
> Merging would allow the two router implementations to converge on a common
> backend, which has started with HADOOP-14741 [1]. If the HDFS side only
> exists in 3.1, then that work would complicate maintenance of YARN in
> 3.0.x, which may require bug fixes as it stabilizes.
>
> Merging lowers costs for maintenance with a nominal risk to stability.
> The feature is well tested, deployed, and actively developed. The
> modifications to core HDFS [2] (~23k) are trivial.
>
> So I'd still advocate for this particular merge on those merits. -C
>
> [1] https://issues.apache.org/jira/browse/HADOOP-14741
> [2] git diff --diff-filter=M $(git merge-base apache/HDFS-10467
> apache/trunk)..apache/HDFS-10467
>
> > On Thu, Aug 24, 2017 at 1:39 PM, Chris Douglas 
> wrote:
> >>
> >> I'd definitely support merging this to trunk. The implementation is
> >> almost entirely outside of HDFS and, as Inigo detailed, has been
> >> tested at scale. The branch is in a functional state with
> >> documentation and tests. -C
> >>
> >> On Mon, Aug 21, 2017 at 6:11 PM, Iñigo Goiri  wrote:
> >> > Hi all,
> >> >
> >> >
> >> >
> >> > We would like to open a discussion on merging the Router-based
> >> > Federation feature to trunk.
> >> >
> >> > Last week, there was a thread about which branches would go into
> >> > 3.0 and, given that YARN federation is going in, this might be a good
> >> > time 

Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-28 Thread Andrew Wang
So far I've seen no -1's to the branching proposal, so I plan to execute
this tomorrow unless there's further feedback.

Regarding the above discussion, I think Jason and I have essentially the
same opinion.

I hope that keeping trunk a release branch means a higher bar for merges
and code review in general. In the past, I've seen some patches committed
to trunk-only as a way of passing responsibility to a future user or
reviewer. That doesn't help anyone; patches should be committed with the
intent of running them in production.

I'd also like to repeat the above thanks to the many, many contributors
who've helped with release improvements. Allen's work on create-release and
automated changes and release notes was essential, as was Xiao's work on
LICENSE and NOTICE files. I'm also looking forward to Marton's site
improvements, which address one of the remaining sore spots in the
release process.

Things have gotten smoother with each alpha we've done over the last year,
and it's a testament to everyone's work that we have a good probability of
shipping beta and GA later this year.

Cheers,
Andrew

On Mon, Aug 28, 2017 at 3:48 PM, Colin McCabe  wrote:

> On Mon, Aug 28, 2017, at 14:22, Allen Wittenauer wrote:
> >
> > > On Aug 28, 2017, at 12:41 PM, Jason Lowe  wrote:
> > >
> > > I think this gets back to the "if it's worth committing" part.
> >
> >   This brings us back to my original question:
> >
> >   "Doesn't this place an undue burden on the contributor with the
> first incompatible patch to prove worthiness?  What happens if it is
> decided that it's not good enough?"
>
> I feel like this line of argument is flawed by definition.  "What
> happens if the patch isn't worth breaking compatibility over"?  Then we
> shouldn't break compatibility over it.  We all know that most
> compatibility breaks are avoidable with enough effort.  And it's an
> effort we should make, for the good of our users.
>
> Most useful features can be implemented without compatibility breaks.
> And for the few that truly can't, the community should surely agree that
> it's worth breaking compatibility before we do it.  If it's a really
> cool feature, that approval will surely not be hard to get (I'm tempted
> to quote your earlier email about how much we love features...)
>
> >
> >   The answer, if I understand your position, is then at least a
> maybe leaning towards yes: a patch that, prior to this branching policy
> change, would have gone in without any notice now has a higher burden
> (i.e., major feature) to prove worthiness ... and in the process it eliminates
> a whole class of contributors and empowers others. Thus my concern ...
> >
> > > As you mentioned, people are already breaking compatibility left and
> right as it is, which is why I wondered if it was really any better in
> practice.  Personally I'd rather find out about a major breakage sooner
> than later, since if trunk remains an active area of development at all
> times it's more likely the community will sit up and take notice when
> something crazy goes in.  In the past, trunk was not really an actively
> deployed area for over 5 years, and all sorts of stuff went in without
> people really being aware of it.
> >
> >   Given the general acknowledgement that the compatibility
> guidelines are mostly useless in reality, maybe the answer is really that
> we're doing releases all wrong.  Would it necessarily be a bad thing if we
> moved to a model where incompatible changes are released gradually instead of
> one big batch every seven years?
>
> I haven't seen anyone "acknowledge that... compatibility guidelines are
> mostly useless"... even you.  Reading your posts from the past, I don't
> get that impression.  On the contrary, you are often upset about
> compatibility breakages.
>
> What would be positive about allowing compatibility breaks in minor
> releases?  Can you give a specific example of what would be improved?
>
> >
> >   Yes, I lived through the "walking on glass" days at Yahoo! and
> realize what I'm saying.  But I also think the rate of incompatible changes
> has slowed tremendously.  Entire groups of APIs aren't getting tossed out
> every week anymore.
> >
> > > It sounds like we agree on that part but disagree on the specifics of
> how to help trunk remain active.
> >
> >   Yup, and there is nothing wrong with that. ;)
> >
> > >  Given that historically trunk has languished for years I was hoping
> this proposal would help reduce the likelihood of it happening again.  If
> we eventually decide that cutting branch-3 now makes more sense then I'll
> do what I can to make that work well, but it would be good to see concrete
> proposals on how to avoid the problems we had with it over the last 6 years.
> >
> >
> >   Yup, agree. But proposals rarely seem to get much actual traction.
> (It's kind of fun reading the Hadoop bylaws and compatibility guidelines
> and old [VOTE] threads to realize 

[jira] [Created] (HDFS-12369) Edit log corruption due to hard lease recovery of not-closed file

2017-08-28 Thread Xiao Chen (JIRA)
Xiao Chen created HDFS-12369:


 Summary: Edit log corruption due to hard lease recovery of 
not-closed file
 Key: HDFS-12369
 URL: https://issues.apache.org/jira/browse/HDFS-12369
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Reporter: Xiao Chen
Assignee: Xiao Chen


HDFS-6257 and HDFS-7707 worked hard to prevent corruption from combinations of 
client operations.

Recently, we observed a NameNode that was unable to start, with the following exception:
{noformat}
2017-08-17 14:32:18,418 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: 
Failed to start namenode.
java.io.FileNotFoundException: File does not exist: 
/home/Events/CancellationSurvey_MySQL/2015/12/31/.part-0.9nlJ3M
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:66)
at 
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:429)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:897)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:750)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:318)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1125)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:789)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:844)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:823)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
{noformat}

Quoting a nice analysis of the edits:
{quote}
In the edits logged about 1 hour later, we see this failing OP_CLOSE. The 
sequence in the edits shows the file going through:

  OPEN
  ADD_BLOCK
  CLOSE
  ADD_BLOCK # perhaps this was an append
  DELETE
  (about 1 hour later) CLOSE

It is interesting that there was no CLOSE logged before the delete.
{quote}

Grepping for that file name, it turns out the close was triggered by the lease 
reaching the hard limit.
{noformat}
2017-08-16 15:05:45,927 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
  Recovering [Lease.  Holder: DFSClient_NONMAPREDUCE_-1997177597_28, pending 
creates: 75], 
  src=/home/Events/CancellationSurvey_MySQL/2015/12/31/.part-0.9nlJ3M
2017-08-16 15:05:45,927 WARN org.apache.hadoop.hdfs.StateChange: BLOCK* 
  internalReleaseLease: All existing blocks are COMPLETE, lease removed, file 
  /home/Events/CancellationSurvey_MySQL/2015/12/31/.part-0.9nlJ3M closed.
{noformat}
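For anyone who wants to reproduce this kind of analysis, a minimal sketch (the 
edits segment file name below is a placeholder; use the segment covering the 
relevant time range) using the offline edits viewer:

{noformat}
# Dump the edit log segment to XML (segment name is a placeholder).
hdfs oev -p xml -i edits_0000000000000000001-0000000000000000200 -o edits.xml

# List every op recorded against the affected path.
grep -B 5 -A 20 'part-0.9nlJ3M' edits.xml
{noformat}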



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-28 Thread Colin McCabe
On Mon, Aug 28, 2017, at 14:22, Allen Wittenauer wrote:
> 
> > On Aug 28, 2017, at 12:41 PM, Jason Lowe  wrote:
> > 
> > I think this gets back to the "if it's worth committing" part.
> 
>   This brings us back to my original question:
> 
>   "Doesn't this place an undue burden on the contributor with the first 
> incompatible patch to prove worthiness?  What happens if it is decided that 
> it's not good enough?"

I feel like this line of argument is flawed by definition.  "What
happens if the patch isn't worth breaking compatibility over"?  Then we
shouldn't break compatibility over it.  We all know that most
compatibility breaks are avoidable with enough effort.  And it's an
effort we should make, for the good of our users.

Most useful features can be implemented without compatibility breaks. 
And for the few that truly can't, the community should surely agree that
it's worth breaking compatibility before we do it.  If it's a really
cool feature, that approval will surely not be hard to get (I'm tempted
to quote your earlier email about how much we love features...)

> 
>   The answer, if I understand your position, is then at least a maybe 
> leaning towards yes: a patch that, prior to this branching policy change, 
> would have gone in without any notice now has a higher burden (i.e., major 
> feature) to prove worthiness ... and in the process it eliminates a whole class 
> of contributors and empowers others. Thus my concern ...
> 
> > As you mentioned, people are already breaking compatibility left and right 
> > as it is, which is why I wondered if it was really any better in practice.  
> > Personally I'd rather find out about a major breakage sooner than later, 
> > since if trunk remains an active area of development at all times it's more 
> > likely the community will sit up and take notice when something crazy goes 
> > in.  In the past, trunk was not really an actively deployed area for over 5 
> > years, and all sorts of stuff went in without people really being aware of 
> > it.
> 
>   Given the general acknowledgement that the compatibility guidelines are 
> mostly useless in reality, maybe the answer is really that we're doing 
> releases all wrong.  Would it necessarily be a bad thing if we moved to a 
> model where incompatible changes are released gradually instead of one big 
> batch every seven years?

I haven't seen anyone "acknowledge that... compatibility guidelines are
mostly useless"... even you.  Reading your posts from the past, I don't
get that impression.  On the contrary, you are often upset about
compatibility breakages.

What would be positive about allowing compatibility breaks in minor
releases?  Can you give a specific example of what would be improved?

> 
>   Yes, I lived through the "walking on glass" days at Yahoo! and realize 
> what I'm saying.  But I also think the rate of incompatible changes has 
> slowed tremendously.  Entire groups of APIs aren't getting tossed out every 
> week anymore.
> 
> > It sounds like we agree on that part but disagree on the specifics of how 
> > to help trunk remain active.
> 
>   Yup, and there is nothing wrong with that. ;)
> 
> >  Given that historically trunk has languished for years I was hoping this 
> > proposal would help reduce the likelihood of it happening again.  If we 
> > eventually decide that cutting branch-3 now makes more sense then I'll do 
> > what I can to make that work well, but it would be good to see concrete 
> > proposals on how to avoid the problems we had with it over the last 6 years.
> 
> 
>   Yup, agree. But proposals rarely seem to get much actual traction. 
> (It's kind of fun reading the Hadoop bylaws and compatibility guidelines and 
> old [VOTE] threads to realize how much stuff doesn't actually happen despite 
> everyone generally agreeing that abc is a good idea.)  To circle back a bit, I 
> do also agree that automation has a role to play.
> 
> Before anyone accuses me of being a hypocrite, or implies it (and I'm 
> sure someone eventually will, privately if not publicly), I'm sure some folks 
> don't realize I've been working on this set of problems from a different 
> angle for the past few years.
> 
>   There are a handful of people that know I was going to attempt to do a 
> 3.x release a few years ago. [Andrew basically beat me to it. :) ] But I ran 
> into the release process.  What a mess.  Way too much manual work, lots of 
> undocumented bits, violation of ASF rules(!) , etc, etc.  We've all heard the 
> complaints.
> 
>   My hypothesis:  if the release process itself is easier, then getting a 
> release based on trunk is easier too. The more we automate, the more 
> non-vendors ("non traditional release managers"?) will be willing to roll 
> releases.  The more people that feel comfortable rolling a release, the more 
> likelihood releases will happen.  The more likelihood of releases happening, 
> the greater chance trunk had of 

Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-28 Thread Allen Wittenauer

> On Aug 28, 2017, at 12:41 PM, Jason Lowe  wrote:
> 
> I think this gets back to the "if it's worth committing" part.

This brings us back to my original question:

"Doesn't this place an undue burden on the contributor with the first 
incompatible patch to prove worthiness?  What happens if it is decided that 
it's not good enough?"

The answer, if I understand your position, is then at least a maybe 
leaning towards yes: a patch that, prior to this branching policy change, 
would have gone in without any notice now has a higher burden (i.e., major 
feature) to prove worthiness ... and in the process it eliminates a whole class of 
contributors and empowers others. Thus my concern ...

> As you mentioned, people are already breaking compatibility left and right as 
> it is, which is why I wondered if it was really any better in practice.  
> Personally I'd rather find out about a major breakage sooner than later, 
> since if trunk remains an active area of development at all times it's more 
> likely the community will sit up and take notice when something crazy goes 
> in.  In the past, trunk was not really an actively deployed area for over 5 
> years, and all sorts of stuff went in without people really being aware of it.

Given the general acknowledgement that the compatibility guidelines are 
mostly useless in reality, maybe the answer is really that we're doing releases 
all wrong.  Would it necessarily be a bad thing if we moved to a model where 
incompatible changes are released gradually instead of one big batch every seven years?

Yes, I lived through the "walking on glass" days at Yahoo! and realize 
what I'm saying.  But I also think the rate of incompatible changes has slowed 
tremendously.  Entire groups of APIs aren't getting tossed out every week 
anymore.

> It sounds like we agree on that part but disagree on the specifics of how to 
> help trunk remain active.

Yup, and there is nothing wrong with that. ;)

>  Given that historically trunk has languished for years I was hoping this 
> proposal would help reduce the likelihood of it happening again.  If we 
> eventually decide that cutting branch-3 now makes more sense then I'll do 
> what I can to make that work well, but it would be good to see concrete 
> proposals on how to avoid the problems we had with it over the last 6 years.


Yup, agree. But proposals rarely seem to get much actual traction. 
(It's kind of fun reading the Hadoop bylaws and compatibility guidelines and 
old [VOTE] threads to realize how much stuff doesn't actually happen despite 
everyone generally agreeing that abc is a good idea.)  To circle back a bit, I do 
also agree that automation has a role to play.

 Before anyone accuses me of being a hypocrite, or implies it (and I'm 
sure someone eventually will, privately if not publicly), I'm sure some folks 
don't realize I've been working on this set of problems from a different angle 
for the past few years.

There are a handful of people that know I was going to attempt to do a 
3.x release a few years ago. [Andrew basically beat me to it. :) ] But I ran 
into the release process.  What a mess.  Way too much manual work, lots of 
undocumented bits, violation of ASF rules(!) , etc, etc.  We've all heard the 
complaints.

My hypothesis:  if the release process itself is easier, then getting a 
release based on trunk is easier too. The more we automate, the more 
non-vendors ("non traditional release managers"?) will be willing to roll 
releases.  The more people that feel comfortable rolling a release, the more 
likelihood releases will happen.  The more likelihood of releases happening, 
the greater chance trunk had of getting out the door.

That turned into years' worth of fixing and automating lots of stuff 
that was continually complained about but never fixed:  release notes, 
changes.txt, chunks of the build process, chunks of the release tar ball 
process, fixing consistency, etc.  Some of that became a part of Yetus, some of 
it didn't.  Some of that work leaked into branch-2 at some point. Many probably 
don't know why this stuff was happening.  Then there were the people that 
claimed I was "wasting my time" and that I should be focusing on "more 
important" things.  (Press release features, I'm assuming.)

So, yes, I'd like to see proposals, but I'd also like to challenge the 
community at large to spend more time on these build processes.  There's a 
tremendous amount of cruft and our usage of maven is still nearly primordial in 
implementation. (Shout out to Marton Elek who has some great although ambitious 
ideas.)  

Also kudos to Andrew for putting create-release and a lot of my other 
changes through their paces in the early days.  When he publicly stepped up to 
do the release, I don't know if he realized what he was walking into... 

Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-28 Thread Vinod Kumar Vavilapalli
+1 to Andrew’s proposal for 3.x releases.

We had fairly elaborate threads on this branching & compatibility topic before. 
One of them’s here: [1]

+1 to what Jason said.
 (a) Incompatible changes are not to be treated lightly.  We need to stop 
breaking stuff and ‘just dumping it on trunk’.
 (b) Major versions are expensive. We should hesitate before asking our users 
to move from 2.0 to 3.0 or 3.0 to 4.0 (with incompatible changes) *without* any 
other major value proposition.

Some of the incompatible changes can clearly wait while others cannot and so may 
mandate a major release. What are some of the common types of incompatible 
changes?
 - Renaming APIs, removing deprecated APIs, renaming configuration properties, 
changing the default value of a configuration, changing shell output / logging 
etc:
— Today, we do this on trunk even though the actual effort involved is very 
minimal compared to the overhead it forces in maintaining an incompatible trunk.
 - Dependency library updates - updating guava, protobuf, etc. in Hadoop breaks 
downstream applications. I am assuming Classpath Isolation [2] is still a 
blocker for 3.0 GA.
 - JDK upgrades: We tried two different ways with JDK 7 and JDK 8; we need a 
formal policy on this.

If we can manage the above common breaking changes, we can cause less pain to 
our end users.

Here’s what we can do for 3.x / 4.x specifically.
 - Stay on trunk based 3.x releases
 - Avoid all incompatible changes as much as possible
 - If we run into a bunch of minor incompatible changes that have to be done, we 
either (a) make the incompatible behavior optional or (b) just park them, say 
with a parked-incompatible-change label, if making it optional is not possible
 - We create a 4.0 only when (a) we hit the first major incompatible change 
because a major next step for Hadoop needs it (e.g. Erasure Coding), and/or 
(b) the number of parked incompatible changes passes a certain threshold. 
Unlike Jason, I don’t see the threshold as 1 for cases that don’t fit (a).

References
 [1] Looking to a Hadoop 3 release: 
http://markmail.org/thread/2daldggjaeewdmdf#query:+page:1+mid:m6x73t6srlchywsn+state:results
 

 [2] Classpath isolation for downstream client: 
https://issues.apache.org/jira/browse/HADOOP-11656 


Thanks
+Vinod

> On Aug 25, 2017, at 1:23 PM, Jason Lowe  wrote:
> 
> Allen Wittenauer wrote:
> 
> 
>> Doesn't this place an undue burden on the contributor with the first
>> incompatible patch to prove worthiness?  What happens if it is decided that
>> it's not good enough?
> 
> 
> It is a burden for that first, "this can't go anywhere else but 4.x"
> change, but arguably that should not be a change done lightly anyway.  (Or
> any other backwards-incompatible change for that matter.)  If it's worth
> committing then I think it's perfectly reasonable to send out the dev
> announce that there's reason for trunk to diverge from 3.x, cut branch-3,
> and move on.  This is no different than Andrew's recent announcement that
> there's now a need for separating trunk and the 3.0 line based on what's
> about to go in.
> 
> I do not think it makes sense to pay for the maintenance overhead of two
> nearly-identical lines with no backwards-incompatible changes between them
> until we have the need.  Otherwise if past trunk behavior is any
> indication, it ends up mostly enabling people to commit to just trunk,
> forgetting that the thing they are committing is perfectly valid for
> branch-3.  If we can agree that trunk and branch-3 should be equivalent
> until an incompatible change goes into trunk, why pay for the commit
> overhead and potential for accidentally missed commits until it is really
> necessary?
> 
> How many will it take before the dam breaks?  Or is a timeline
>> going to be given before trunk gets set to 4.x?
> 
> 
> I think the threshold count for the dam should be 1.  As soon as we have a
> JIRA that needs to be committed to move the project forward and we cannot
> ship it in a 3.x release then we create branch-3 and move trunk to 4.x.
> As for a timeline going to 4.x, again I don't see it so much as a "baking
> period" as a "when we need it" criteria.  If we need it in a week then we
> should cut it in a week.  Or a year then a year.  It all depends upon when
> that 4.x-only change is ready to go in.
> 
> Given the number of committers that openly ignore discussions like this,
>> who is going to verify that incompatible changes don't get in?
>> 
> 
> The same entities who are verifying other bugs don't get in, i.e.: the
> committers and the Hadoop QA bot running the tests.  Yes, I know that means
> it's inevitable that compatibility breakages will happen, and we can and
> should improve the automation around compatibility testing when possible.
> But I don't think there's a magic 

Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-28 Thread Jason Lowe
Allen Wittenauer wrote:


> > On Aug 25, 2017, at 1:23 PM, Jason Lowe  wrote:
> >
> > Allen Wittenauer wrote:
> >
> > > Doesn't this place an undue burden on the contributor with the first
> incompatible patch to prove worthiness?  What happens if it is decided that
> it's not good enough?
> >
> > It is a burden for that first, "this can't go anywhere else but 4.x"
> change, but arguably that should not be a change done lightly anyway.  (Or
> any other backwards-incompatible change for that matter.)  If it's worth
> committing then I think it's perfectly reasonable to send out the dev
> announce that there's reason for trunk to diverge from 3.x, cut branch-3,
> and move on.  This is no different than Andrew's recent announcement that
> there's now a need for separating trunk and the 3.0 line based on what's
> about to go in.
>
> So, by this definition as soon as a patch comes in to remove
> deprecated bits there will be no issue with a branch-3 getting created,
> correct?
>

I think this gets back to the "if it's worth committing" part.  I feel the
community should collectively decide when it's worth taking the hit to
maintain the separate code line.  IMHO removing deprecated bits alone is
not reason enough to diverge the code base and the additional maintenance
that comes along with the extra code line.  A new feature is traditionally
the reason to diverge because that's something users would actually care
enough about to take the compatibility hit when moving to the version that
has it.  That also helps drive a timely release of the new code line
because users want the feature that went into it.


> >  Otherwise if past trunk behavior is any indication, it ends up mostly
> enabling people to commit to just trunk, forgetting that the thing they are
> committing is perfectly valid for branch-3.
>
> I'm not sure there was any "forgetting" involved.  We likely
> wouldn't be talking about 3.x at all if it wasn't for the code diverging
> enough.
>

I don't think it was the myriad of small patches that went only into trunk
over the last 6 years that drove this.  Instead I think it was simply that
an "important enough" feature went in, like erasure coding, that gathered
momentum behind this release.  Trunk sat ignored for basically 5+ years,
and plenty of patches went into just trunk that should have gone into at
least branch-2 as well.  I don't think we as a community did the
contributors any favors by putting their changes into a code line that
didn't see a release for a very long time.  Yes, 3.x could have been released 
sooner to help solve that issue, but given the complete lack of excitement 
around 3.x until just recently, is there any reason this won't happen again 
with 4.x?  Seems to me 4.x will need to have something "interesting enough"
to drive people to release it relative to 3.x, which to me indicates we
shouldn't commit things only to there until we have an interest to do so.

> > Given the number of committers that openly ignore discussions like
> this, who is going to verify that incompatible changes don't get in?
> >
> > The same entities who are verifying other bugs don't get in, i.e.: the
> committers and the Hadoop QA bot running the tests.
> >  Yes, I know that means it's inevitable that compatibility breakages
> will happen, and we can and should improve the automation around
> compatibility testing when possible.
>
> The automation only goes so far.  At least while investigating
> Yetus bugs, I've seen more than enough blatantly and purposefully ignored
> errors and warnings that I'm not convinced it will be effective. ("That
> javadoc compile failure didn't come from my patch!"  Um, yes, yes it did.)
> PR for features has greatly trumped code correctness for a few years now.
>

I totally agree here.  We can and should do better about this outside of
automation.  I brought up automation since I see it as a useful part of the
total solution along with better developer education, oversight, etc.  I'm
thinking specifically about tools that can report on public API signature
changes, but that's just one aspect of compatibility.  Semantic behavior is
not something a static analysis tool can automatically detect, and the only
way to automate some of that is something like end-to-end compatibility
testing.  Bigtop may cover some of this with testing of older versions of
downstream projects like HBase, Hive, Oozie, etc., and we could set up some 
tests that stand up two different Hadoop clusters and run tests that verify 
interop between them.  But the tests will never be exhaustive and we will
still need educated committers and oversight to fill in the gaps.

>  But I don't think there's a magic bullet for preventing all
> compatibility bugs from being introduced, just like there isn't one for
> preventing general bugs.  Does having a trunk branch separate but
> essentially similar to branch-3 make this any better?
>
> Yes: it's been the process for over a decade 

[jira] [Created] (HDFS-12368) [branch-2] Enable DFSNetworkTopology as default

2017-08-28 Thread Chen Liang (JIRA)
Chen Liang created HDFS-12368:
-

 Summary: [branch-2] Enable DFSNetworkTopology as default
 Key: HDFS-12368
 URL: https://issues.apache.org/jira/browse/HDFS-12368
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Chen Liang
Assignee: Chen Liang


This JIRA is to backport HDFS-11998 to branch-2.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-28 Thread Allen Wittenauer

> On Aug 25, 2017, at 1:23 PM, Jason Lowe  wrote:
> 
> Allen Wittenauer wrote:
>  
> > Doesn't this place an undue burden on the contributor with the first 
> > incompatible patch to prove worthiness?  What happens if it is decided that 
> > it's not good enough?
> 
> It is a burden for that first, "this can't go anywhere else but 4.x" change, 
> but arguably that should not be a change done lightly anyway.  (Or any other 
> backwards-incompatible change for that matter.)  If it's worth committing 
> then I think it's perfectly reasonable to send out the dev announce that 
> there's reason for trunk to diverge from 3.x, cut branch-3, and move on.  
> This is no different than Andrew's recent announcement that there's now a 
> need for separating trunk and the 3.0 line based on what's about to go in.

So, by this definition as soon as a patch comes in to remove deprecated 
bits there will be no issue with a branch-3 getting created, correct?

>  Otherwise if past trunk behavior is any indication, it ends up mostly 
> enabling people to commit to just trunk, forgetting that the thing they are 
> committing is perfectly valid for branch-3. 

I'm not sure there was any "forgetting" involved.  We likely wouldn't 
be talking about 3.x at all if it wasn't for the code diverging enough.

> > Given the number of committers that openly ignore discussions like this, 
> > who is going to verify that incompatible changes don't get in?
>  
> The same entities who are verifying other bugs don't get in, i.e.: the 
> committers and the Hadoop QA bot running the tests.
>  Yes, I know that means it's inevitable that compatibility breakages will 
> happen, and we can and should improve the automation around compatibility 
> testing when possible.

The automation only goes so far.  At least while investigating Yetus 
bugs, I've seen more than enough blatantly and purposefully ignored errors and 
warnings that I'm not convinced it will be effective. ("That javadoc compile 
failure didn't come from my patch!"  Um, yes, yes it did.) PR for features has 
greatly trumped code correctness for a few years now.

In any case, I'm specifically thinking of the folks that commit maybe one 
or two patches a year.  They generally don't pay attention to *any* of this 
stuff and it doesn't seem like many people are actually paying attention to 
what gets committed until it breaks their universe.

>  But I don't think there's a magic bullet for preventing all compatibility 
> bugs from being introduced, just like there isn't one for preventing general 
> bugs.  Does having a trunk branch separate but essentially similar to 
> branch-3 make this any better?

Yes: it's been the process for over a decade now.  Unless there is some 
outreach done, it is almost a guarantee that someone will commit something to 
trunk they shouldn't because they simply won't know (or care?) the process has 
changed.  

> > Longer term:  what is the PMC doing to make sure we start doing major 
> > releases in a timely fashion again?  In other words, is this really an 
> > issue if we shoot for another major in (throws dart) 2 years?
> 
> If we're trying to do semantic versioning

FWIW: Hadoop has *never* done semantic versioning. A large percentage 
of our minors should really have been majors. 

> then we shouldn't have a regular cadence for major releases unless we have a 
> regular cadence of changes that break compatibility.  

But given that we don't follow semantic versioning

> I'd hope that's not something we would strive towards.  I do agree that we 
> should try to be better about shipping releases, major or minor, in a more 
> timely manner, but I don't agree that we should cut 4.0 simply based on a 
> duration since the last major release.

... the only thing we're really left with is (technically) time, either 
in the form of a volunteer saying "hey, I've got time to cut a release" or "my 
employer has a corporate goal based upon a feature in this release".   I would 
*love* for the PMC to define a policy or guidelines that says the community 
should strive for a major after x  incompatible changes, a minor after y 
changes, a micro after z fixes.  Even if it doesn't have any teeth, it would at 
least give people hope that their contributions won't be lost in the dustbin of 
history and may actually push others to work on getting a release out.  (Hadoop 
has made people committers based upon features that have never gotten into a 
stable release.  Needless to say, most of those people no longer contribute 
actively if at all.)

No one really has any idea of when releases will happen, so we have situations 
like the one we see with fsck:  a completely untenable number of options for things 
that shouldn't even be options.  It's incredibly user-unfriendly and a great 
example of why Hadoop comes off as hostile to its own users.  But because no 
one really knows when the next incompat 

Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2017-08-28 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/506/

[Aug 27, 2017 10:19:55 PM] (liuml07) MAPREDUCE-6945. TestMapFileOutputFormat 
missing @after annotation.




-1 overall


The following subsystems voted -1:
findbugs unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager
 
   Hard coded reference to an absolute pathname in 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(ContainerRuntimeContext)
 At DockerLinuxContainerRuntime.java:absolute pathname in 
org.apache.hadoop.yarn.server.nodemanager.containermanager.linux.runtime.DockerLinuxContainerRuntime.launchContainer(ContainerRuntimeContext)
 At DockerLinuxContainerRuntime.java:[line 490] 

Failed junit tests :

   hadoop.fs.sftp.TestSFTPFileSystem 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure130 
   hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy 
   hadoop.hdfs.TestLeaseRecoveryStriped 
   hadoop.hdfs.TestClientProtocolForPipelineRecovery 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure100 
   hadoop.hdfs.server.namenode.ha.TestPipelinesFailover 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure190 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure160 
   hadoop.hdfs.TestFileCreationDelete 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailureWithRandomECPolicy 
   
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation 
   
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestIncreaseAllocationExpirer
 
   
hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerResizing 
   hadoop.yarn.server.TestDiskFailures 
   hadoop.yarn.client.api.impl.TestAMRMProxy 
   hadoop.yarn.client.api.impl.TestDistributedScheduling 
   hadoop.yarn.sls.appmaster.TestAMSimulator 
   hadoop.yarn.sls.nodemanager.TestNMSimulator 
   hadoop.yarn.sls.TestReservationSystemInvariants 
   hadoop.yarn.sls.TestSLSRunner 

Timed out junit tests :

   org.apache.hadoop.hdfs.TestWriteReadStripedFile 
   org.apache.hadoop.hdfs.TestReadStripedFileWithDecoding 
   
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA 
   
org.apache.hadoop.yarn.server.resourcemanager.TestKillApplicationWithRMHA 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/506/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/506/artifact/out/diff-compile-javac-root.txt
  [292K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/506/artifact/out/diff-checkstyle-root.txt
  [17M]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/506/artifact/out/diff-patch-pylint.txt
  [20K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/506/artifact/out/diff-patch-shellcheck.txt
  [20K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/506/artifact/out/diff-patch-shelldocs.txt
  [12K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/506/artifact/out/whitespace-eol.txt
  [11M]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/506/artifact/out/whitespace-tabs.txt
  [1.2M]

   findbugs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/506/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager-warnings.html
  [8.0K]

   javadoc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/506/artifact/out/diff-javadoc-javadoc-root.txt
  [1.9M]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/506/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt
  [148K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/506/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [1.3M]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/506/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
  [64K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/506/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt
  [8.0K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/506/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
  [24K]
   

RE: [DISCUSS] Merge HDFS-10467 to (Router-based federation) trunk

2017-08-28 Thread Brahma Reddy Battula
Nice feature, great work guys. Looking forward to getting this in, now that YARN 
federation is already in.

At first glance I have a few questions:

i) Could we have a consolidated patch for easier review?

ii) Hoping "Federation Metrics" and "Federation UI" will be included.

iii) Do we have RPC benchmarks?

iv) As of now "dfs.federation.router.rpc.enable" and 
"dfs.federation.router.store.enable" are set to "true"; do we need to keep these 
configs, since without them the Router might not be useful?

v) bq. The rest of the options are documented in [hdfs-default.xml]
 I feel it would be better to document all the configurations. I see there are so 
many; how about documenting them in a tabular format?

vi) Downstream projects (Spark, HBase, Hive, ...) integration testing? It looks 
like you mentioned some; is that enough?

vii) mvn install (and package) is failing with the following error:

[INFO]   Adding ignore: *
[WARNING] Rule 1: org.apache.maven.plugins.enforcer.BanDuplicateClasses failed 
with message:
Duplicate classes found:

  Found in:
org.apache.hadoop:hadoop-client-minicluster:jar:3.0.0-beta1-SNAPSHOT:compile
org.apache.hadoop:hadoop-client-runtime:jar:3.0.0-beta1-SNAPSHOT:compile
  Duplicate classes:

org/apache/hadoop/shaded/org/apache/curator/framework/api/DeleteBuilder.class
org/apache/hadoop/shaded/org/apache/curator/framework/CuratorFramework.class


I added "hadoop-client-minicluster" to ignore list to get success

hadoop\hadoop-client-modules\hadoop-client-integration-tests\pom.xml

<dependencies>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-annotations</artifactId>
    <ignoreClasses>
      <ignoreClass>*</ignoreClass>
    </ignoreClasses>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client-minicluster</artifactId>
    <ignoreClasses>
      <ignoreClass>*</ignoreClass>
    </ignoreClasses>
  </dependency>
</dependencies>
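As an alternative to editing the pom, one possible workaround (untested here; 
enforcer.skip is the standard maven-enforcer-plugin switch) is to skip the 
enforcer checks for a single build while investigating the duplicate classes:

mvn install -Denforcer.skip=true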


Please correct me if I am wrong.


--Brahma Reddy Battula

-Original Message-
From: Chris Douglas [mailto:cdoug...@apache.org] 
Sent: 25 August 2017 06:37
To: Andrew Wang
Cc: Iñigo Goiri; hdfs-dev@hadoop.apache.org; su...@apache.org
Subject: Re: [DISCUSS] Merge HDFS-10467 to (Router-based federation) trunk

On Thu, Aug 24, 2017 at 2:25 PM, Andrew Wang  wrote:
> Do you mind holding this until 3.1? Same reasoning as for the other 
> branch merge proposals, we're simply too late in the 3.0.0 release cycle.

That wouldn't be too dire.

That said, this has the same design and impact as YARN federation.
Specifically, it sits almost entirely outside core HDFS, so it will not affect 
clusters running without R-BF.

Merging would allow the two router implementations to converge on a common 
backend, which has started with HADOOP-14741 [1]. If the HDFS side only exists 
in 3.1, then that work would complicate maintenance of YARN in 3.0.x, which may 
require bug fixes as it stabilizes.

Merging lowers costs for maintenance with a nominal risk to stability.
The feature is well tested, deployed, and actively developed. The modifications 
to core HDFS [2] (~23k) are trivial.

So I'd still advocate for this particular merge on those merits. -C

[1] https://issues.apache.org/jira/browse/HADOOP-14741
[2] git diff --diff-filter=M $(git merge-base apache/HDFS-10467
apache/trunk)..apache/HDFS-10467
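A hedged variant of the same command (same branch names assumed) that only 
summarizes the size of that core-HDFS delta instead of printing the full diff:

git diff --diff-filter=M --shortstat \
  $(git merge-base apache/HDFS-10467 apache/trunk)..apache/HDFS-10467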

> On Thu, Aug 24, 2017 at 1:39 PM, Chris Douglas  wrote:
>>
>> I'd definitely support merging this to trunk. The implementation is 
>> almost entirely outside of HDFS and, as Inigo detailed, has been 
>> tested at scale. The branch is in a functional state with 
>> documentation and tests. -C
>>
>> On Mon, Aug 21, 2017 at 6:11 PM, Iñigo Goiri  wrote:
>> > Hi all,
>> >
>> >
>> >
>> > We would like to open a discussion on merging the Router-based 
>> > Federation feature to trunk.
>> >
>> > Last week, there was a thread about which branches would go into 
>> > 3.0 and, given that YARN federation is going in, this might be a good 
>> > time for this to be merged too.
>> >
>> >
>> > We have been running "Router-based federation" in production for a year.
>> >
>> > Meanwhile, we have been releasing it in a feature branch 
>> > (HDFS-10467
>> > [1])
>> > for a while.
>> >
>> > We are reasonably confident that the state of the branch is about 
>> > to meet the criteria to be merged onto trunk.
>> >
>> >
>> > *Feature*:
>> >
>> > This feature aggregates multiple namespaces into a single one 
>> > transparently to the user.
>> >
>> > It has a similar architecture to YARN federation (YARN-2915).
>> >
>> > It consists of Routers that handle requests from the clients, forward 
>> > them to the right subcluster, and expose the same API as 
>> > the Namenode.
>> >
>> > Currently we use a mount table (similar to ViewFs), but this can be 
>> > replaced by other approaches.
>> >
>> > The Routers share their state in a State Store.
>> >
>> >
>> >
>> > The main advantage is that clients interact with the 

Re: [VOTE] Merge feature branch YARN-5355 (Timeline Service v2) to trunk

2017-08-28 Thread Rohith Sharma K S
+1 (binding)

Thank you very much for the great team work!

Built from source and deployed in a secured cluster. Below are the test
results.

Deployment :
Standard Hadoop security deployment, with both authentication and
authorization.
Branch-2 Hadoop and HBase security cluster.
Branch-3 Hadoop security cluster. The HBase client points to the branch-2
HBase cluster.
All security configurations are set in place.
Each service runs as its own user, e.g. HDFS runs as hdfs, YARN runs as
yarn, and HBase runs as hbase.
Smoke test user : test-user

Test Cases :

Authentication :
Verified that all daemons start up successfully : OK
Ran an MR job using test-user : OK
Verified the REST APIs within the scope of an application : OK
Verified the newly added REST APIs, i.e. outside the scope of an application : OK
RM restart / NM restart / RM work-preserving restart were executed and the
data was verified : OK. (Entity validation is done, but not entity data
validation.)
Token redistribution to the AM and NM is verified.

Authorization :
 1. Basic whitelisting of users to read has been validated. Works as
expected!

Disabling the TSv2 configuration has also been tested.


Thanks & Regards
Rohith Sharma K S

On 22 August 2017 at 12:02, Vrushali Channapattan 
wrote:

> Hi folks,
>
> Per earlier discussion [1], I'd like to start a formal vote to merge
> feature branch YARN-5355 [2] (Timeline Service v.2) to trunk. The vote will
> run for 7 days, and will end August 29 11:00 PM PDT.
>
> We have previously completed one merge onto trunk [3] and Timeline Service
> v2 has been part of Hadoop release 3.0.0-alpha1.
>
> Since then, we have been working on extending the capabilities of Timeline
> Service v2 in a feature branch [2] for a while, and we are reasonably
> confident that the state of the feature meets the criteria to be merged
> onto trunk and we'd love folks to get their hands on it in a test capacity
> and provide valuable feedback so that we can make it production-ready.
>
> In a nutshell, Timeline Service v.2 delivers significant scalability and
> usability improvements based on a new architecture. What we would like to
> merge to trunk is termed "alpha 2" (milestone 2). The feature has a
> complete end-to-end read/write flow with security and read level
> authorization via whitelists. You should be able to start setting it up and
> testing it.
>
> At a high level, the following are the key features that have been
> implemented since alpha1:
> - Security via Kerberos Authentication and delegation tokens
> - Read side simple authorization via whitelist
> - Client configurable entity sort ordering
> - Richer REST APIs for apps, app attempts, containers, fetching metrics by
> timerange, pagination, sub-app entities
> - Support for storing sub-application entities (entities that exist outside
> the scope of an application)
> - Configurable TTLs (time-to-live) for tables, configurable table prefixes,
> configurable hbase cluster
> - Flow level aggregations done as dynamic (table level) coprocessors
> - Uses latest stable HBase release 1.2.6
>
> There are a total of 82 subtasks that were completed as part of this
> effort.
>
> We paid close attention to ensure that Timeline Service v.2 does not impact
> existing functionality when disabled (which is the default).
>
> Special thanks to a team of folks who worked hard and contributed towards
> this effort with patches, reviews and guidance: Rohith Sharma K S, Varun
> Saxena, Haibo Chen, Sangjin Lee, Li Lu, Vinod Kumar Vavilapalli, Joep
> Rottinghuis, Jason Lowe, Jian He, Robert Kanter, Micheal Stack.
>
> Regards,
> Vrushali
>
> [1] http://www.mail-archive.com/yarn-dev@hadoop.apache.org/msg27383.html
> [2] https://issues.apache.org/jira/browse/YARN-5355
> [3] https://issues.apache.org/jira/browse/YARN-2928
> [4] https://github.com/apache/hadoop/commits/YARN-5355
>


[jira] [Created] (HDFS-12367) Ozone: Too many open files error while running corona

2017-08-28 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12367:
--

 Summary: Ozone: Too many open files error while running corona
 Key: HDFS-12367
 URL: https://issues.apache.org/jira/browse/HDFS-12367
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone, tools
Reporter: Weiwei Yang


The "Too many open files" error keeps happening to me while using corona. I have 
simply set up a single-node cluster and run corona to generate 1000 keys, but I 
keep getting the following error:

{noformat}
./bin/hdfs corona -numOfThreads 1 -numOfVolumes 1 -numOfBuckets 1 -numOfKeys 
1000
17/08/28 00:47:42 WARN util.NativeCodeLoader: Unable to load native-hadoop 
library for your platform... using builtin-java classes where applicable
17/08/28 00:47:42 INFO tools.Corona: Number of Threads: 1
17/08/28 00:47:42 INFO tools.Corona: Mode: offline
17/08/28 00:47:42 INFO tools.Corona: Number of Volumes: 1.
17/08/28 00:47:42 INFO tools.Corona: Number of Buckets per Volume: 1.
17/08/28 00:47:42 INFO tools.Corona: Number of Keys per Bucket: 1000.
17/08/28 00:47:42 INFO rpc.OzoneRpcClient: Creating Volume: vol-0-05000, with 
wwei as owner and quota set to 1152921504606846976 bytes.
17/08/28 00:47:42 INFO tools.Corona: Starting progress bar Thread.
...
ERROR tools.Corona: Exception while adding key: key-251-19293 in bucket: 
bucket-0-34960 of volume: vol-0-05000.
java.io.IOException: Exception getting XceiverClient.
at 
org.apache.hadoop.scm.XceiverClientManager.getClient(XceiverClientManager.java:156)
at 
org.apache.hadoop.scm.XceiverClientManager.acquireClient(XceiverClientManager.java:122)
at 
org.apache.hadoop.ozone.client.io.ChunkGroupOutputStream.getFromKsmKeyInfo(ChunkGroupOutputStream.java:289)
at 
org.apache.hadoop.ozone.client.rpc.OzoneRpcClient.createKey(OzoneRpcClient.java:487)
at 
org.apache.hadoop.ozone.tools.Corona$OfflineProcessor.run(Corona.java:352)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.common.util.concurrent.UncheckedExecutionException: 
java.lang.IllegalStateException: failed to create a child event loop
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2234)
at com.google.common.cache.LocalCache.get(LocalCache.java:3965)
at 
com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4764)
at 
org.apache.hadoop.scm.XceiverClientManager.getClient(XceiverClientManager.java:144)
... 9 more
Caused by: java.lang.IllegalStateException: failed to create a child event loop
at 
io.netty.util.concurrent.MultithreadEventExecutorGroup.(MultithreadEventExecutorGroup.java:68)
at 
io.netty.channel.MultithreadEventLoopGroup.(MultithreadEventLoopGroup.java:49)
at 
io.netty.channel.nio.NioEventLoopGroup.(NioEventLoopGroup.java:61)
at 
io.netty.channel.nio.NioEventLoopGroup.(NioEventLoopGroup.java:52)
at 
io.netty.channel.nio.NioEventLoopGroup.(NioEventLoopGroup.java:44)
at 
io.netty.channel.nio.NioEventLoopGroup.(NioEventLoopGroup.java:36)
at org.apache.hadoop.scm.XceiverClient.connect(XceiverClient.java:76)
at 
org.apache.hadoop.scm.XceiverClientManager$2.call(XceiverClientManager.java:151)
at 
org.apache.hadoop.scm.XceiverClientManager$2.call(XceiverClientManager.java:145)
at 
com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4767)
at 
com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3568)
at 
com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2350)
at 
com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2313)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2228)
... 12 more
Caused by: io.netty.channel.ChannelException: failed to open a new selector
at io.netty.channel.nio.NioEventLoop.openSelector(NioEventLoop.java:128)
at io.netty.channel.nio.NioEventLoop.(NioEventLoop.java:120)
at 
io.netty.channel.nio.NioEventLoopGroup.newChild(NioEventLoopGroup.java:87)
at 
io.netty.util.concurrent.MultithreadEventExecutorGroup.(MultithreadEventExecutorGroup.java:64)
... 25 more
Caused by: java.io.IOException: Too many open files
at sun.nio.ch.EPollArrayWrapper.epollCreate(Native Method)
at sun.nio.ch.EPollArrayWrapper.(EPollArrayWrapper.java:130)
at sun.nio.ch.EPollSelectorImpl.(EPollSelectorImpl.java:69)
at 
sun.nio.ch.EPollSelectorProvider.openSelector(EPollSelectorProvider.java:36)
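A quick sketch (assuming a Linux host; <pid> below is a placeholder for the 
corona JVM's process id) of how the descriptor exhaustion can be confirmed and 
worked around before re-running corona:

{noformat}
# Current per-process open-file limit for this shell.
ulimit -n

# Number of descriptors currently held by the corona JVM.
ls /proc/<pid>/fd | wc -l

# Raise the limit for the current shell, then re-run corona.
ulimit -n 65536
{noformat}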

[jira] [Created] (HDFS-12366) Ozone: Refactor KSM metadata class names to avoid confusion

2017-08-28 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12366:
--

 Summary: Ozone: Refactor KSM metadata class names to avoid 
confusion
 Key: HDFS-12366
 URL: https://issues.apache.org/jira/browse/HDFS-12366
 Project: Hadoop HDFS
  Issue Type: Sub-task
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Weiwei Yang
Priority: Trivial


Propose to rename 2 classes in package {{org.apache.hadoop.ozone.ksm}}

* MetadataManager -> KsmMetadataManager
* MetadataManagerImpl -> KsmMetadataManagerImpl

This is to avoid confusion with the Ozone metadata store classes, such as 
{{MetadataKeyFilters}}, {{MetadataStore}}, {{MetadataStoreBuilder}}, etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12365) Ozone: ListVolume displays incorrect createdOn time when the volume was created by OzoneRpcClient

2017-08-28 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12365:
--

 Summary: Ozone: ListVolume displays incorrect createdOn time when 
the volume was created by OzoneRpcClient
 Key: HDFS-12365
 URL: https://issues.apache.org/jira/browse/HDFS-12365
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Weiwei Yang
Assignee: Weiwei Yang


Reproduce steps

1. Create a key in Ozone with corona (this delegates the call to 
OzoneRpcClient), e.g.

{code}
[wwei@ozone1 hadoop-3.0.0-beta1-SNAPSHOT]$ ./bin/hdfs corona -numOfThreads 1 
-numOfVolumes 1 -numOfBuckets 1 -numOfKeys 1
{code}

2. Run listVolume

{code}
[wwei@ozone1 hadoop-3.0.0-beta1-SNAPSHOT]$ ./bin/hdfs oz -listVolume 
http://localhost:9864 -user wwei
{
  "owner" : {
"name" : "wwei"
  },
  "quota" : {
"unit" : "TB",
"size" : 1048576
  },
  "volumeName" : "vol-0-31437",
  "createdOn" : "Thu, 01 Jan 1970 00:00:00 GMT",
  "createdBy" : null
}
{
  "owner" : {
"name" : "wwei"
  },
  "quota" : {
"unit" : "TB",
"size" : 1048576
  },
  "volumeName" : "vol-0-38900",
  "createdOn" : "Thu, 01 Jan 1970 00:00:00 GMT",
  "createdBy" : null
}
{code}

Note that the times displayed in {{createdOn}} are both the incorrect epoch value 
{{Thu, 01 Jan 1970 00:00:00 GMT}}.




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12364) Compile Error:TestClientProtocolForPipelineRecovery#testUpdatePipeLineAfterDNReg

2017-08-28 Thread Jiandan Yang (JIRA)
Jiandan Yang  created HDFS-12364:


 Summary: Compile 
Error:TestClientProtocolForPipelineRecovery#testUpdatePipeLineAfterDNReg
 Key: HDFS-12364
 URL: https://issues.apache.org/jira/browse/HDFS-12364
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 2.8.2
Reporter: Jiandan Yang 
Assignee: Jiandan Yang 


Error line: {{dn1.setHeartbeatsDisabledForTests(true)}}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org